---
title: Operations in SameDiff
short_title: Ops
description: What kinds of operations are there in `SameDiff` and how to use them
category: SameDiff
weight: 4
---

# SameDiff operations

{{autogenerated}}

Operations in `SameDiff` work mostly the way you'd expect them to. You take variables - in our framework, those are
objects of type `SDVariable` - apply operations to them, and thus produce new variables. Before we proceed to the
overview of the available operations, let us list some of their common properties.

## Common properties of operations

- Variables of any *variable type* may be used in any operation, as long as their *data types* match those required
by the operation (again, see our [variables](./samediff/variables) section for what variable types are). Most
often an operation will require its `SDVariable` to have a floating point data type.
- Variables created by operations have the `ARRAY` variable type.
- For all operations, you may define a `String` name for your resulting variable, although for most operations this
is not obligatory. The name goes as the first argument in each operation, like so:
```java
SDVariable linear = weights.mmul("matrix_product", input).add(bias);
SDVariable output = sameDiff.nn.sigmoid("output", linear);
```

Named variables may be accessed from outside using the `SameDiff` method `getVariable(String name)`. For the code above,
this method will allow you to retrieve both `output` and the result of the `mmul` operation. Note that we
haven't even explicitly defined this result as a separate `SDVariable`, and yet a corresponding `SDVariable` will be
created internally and added to our instance of `SameDiff` under the `String` name `"matrix_product"`. In fact, a unique
`String` name is given to every `SDVariable` produced by operations: if you don't give a name explicitly, it is
assigned to the resulting `SDVariable` automatically based on the operation's name.

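For instance, continuing the snippet above, the internally created product variable may later be retrieved by its name
(a minimal sketch; `sameDiff` is the owning `SameDiff` instance):
```java
// look up the variable produced by the named mmul operation above
SDVariable matrixProduct = sameDiff.getVariable("matrix_product");
// the explicitly named output is retrievable in the same way
SDVariable outputByName = sameDiff.getVariable("output");
```
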
## Overview of operations
The number of currently available operations, including overloads, totals several hundred. They range in complexity from
simple additions and multiplications, via producing outputs of convolutional layers, to the creation of dedicated recurrent
neural network modules, and much more. The sheer number of operations would make it cumbersome to list them all on a
single page. So, if you are already looking for something specific, you'll be better off checking our
[javadoc](https://deeplearning4j.org/api/latest/), which contains detailed information on each operation, or
simply browsing through autocompletion suggestions (if your IDE supports that). Here we rather try to give you an
idea of what operations you may expect to find and where to look for them.

All operations may be split into two major branches: those that are methods of the `SDVariable` class and those that
are methods of `SameDiff`. Let us have a closer look at each:

### `SDVariable` operations
We have already seen `SDVariable` operations in previous examples, in expressions like
```java
SDVariable z = x.add(y);
```
where `x` and `y` are `SDVariable`s.

Among `SDVariable` methods, you will find:
- `BLAS`-type operations to perform linear algebra: things like `add`, `neg`, `mul` (used for both scaling and elementwise
multiplication), `mmul` (matrix multiplication), `dot`, `rdiv`, etc.;
- comparison operations like `gt` or `lte`, used both to compare each element to a fixed `double` value and for
elementwise comparison with another `SDVariable` of the same shape, and the like;
- basic reduction operations: things like `min`, `sum`, `prod` (product of elements in the array), `mean`, `norm2`,
`argmax` (index of the maximal element), `squaredDifference` and so on, which may be taken along specified dimensions
(see the short sketch after this list);
- basic statistics operations for computing mean and standard deviation along given dimensions: `mean` and `std`;
- operations for restructuring of the underlying array: `reshape` and `permute`, along with `shape` - an operation that
delivers the shape of a variable as an array of integers - the dimension sizes.

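For instance, a couple of the reductions and restructuring methods above, applied to a 2d variable `matrix` (which we
assume to have been defined beforehand), might look like this minimal sketch:
```java
SDVariable columnMeans = matrix.mean(0);        // reduce along dimension 0: mean of each column
SDVariable rowSums     = matrix.sum(1);         // reduce along dimension 1: sum of each row
SDVariable transposed  = matrix.permute(1, 0);  // swap the two dimensions
```
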
`SDVariable` operations may be easily chained, producing lines like:
```java
SDVariable regressionCost = weights.mmul(input).add("regression_prediction", bias).squaredDifference(labels);
```

### `SameDiff` operations
The operations that are methods of `SameDiff` are called via one of 6 auxiliary objects present in each `SameDiff`,
which split all operations into 6 uneven branches:
- `math` - for general mathematical operations;
- `random` - creating different random number generators;
- `nn` - general neural network tools;
- `cnn` - convolutional neural network tools;
- `rnn` - recurrent neural network tools;
- `loss` - loss functions.

In order to use a particular operation, you need to call one of these 6 objects from your `SameDiff` instance, and then
the operation itself, like this:
```java
SDVariable y = sameDiff.math.sin(x);
```
or
```java
SDVariable y = sameDiff.math().sin(x);
```

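Operations from these branches return ordinary `SDVariable`s, so they may be mixed freely with the `SDVariable` methods
shown earlier. For example (a sketch, with `x` assumed to be defined beforehand):
```java
SDVariable y = sameDiff.math.sin(x).mul(2.0).add(1.0);  // elementwise 2*sin(x) + 1
```
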
The distribution of operations among the auxiliary objects has no structural bearing beyond organizing things in a more
intuitive way. So, for instance, if you're not sure whether to look for, say, the `tanh` operation in `math` or in `nn`,
don't worry: we have it in both.

Let us briefly describe what kinds of operations you may expect to find in each of the branches:

### `math` - basic mathematical operations
The `math` module mostly consists of general mathematical functions and statistics methods. Those include:

- power functions, e.g. `square`, `cube`, `sqrt`, `pow`, `reciprocal` etc.;
- trigonometric functions, e.g. `sin`, `atan` etc.;
- exponential/hyperbolic functions, like `exp`, `sinh`, `log`, `atanh` etc.;
- miscellaneous elementwise operations, like taking absolute value, rounding and clipping, such as `abs`, `sign`,
`ceil`, `round`, `clipByValue`, `clipByNorm` etc.;
- reductions along specified dimensions: `min`, `amax`, `mean`, `asum`, `logEntropy`, and similar;
- distance (reduction) operations, such as `euclideanDistance`, `manhattanDistance`, `jaccardDistance`, `cosineDistance`,
`hammingDistance`, `cosineSimilarity`, along specified dimensions, for two identically shaped `SDVariables`;
- specific matrix operations: `matrixInverse`, `matrixDeterminant`, `diag` (creating a diagonal matrix), `trace`, `eye`
(creating an identity matrix with variable dimensions), and several others;
- more statistics operations: `standardize`, `moment`, `normalizeMoments`, `erf` and `erfc` (the Gaussian error function
and its complement);
- counting and indexing reductions: methods like `countZero` (number of zero elements), `iamin` (index of the element
with the smallest absolute value), `firstIndex` (the index of the first element satisfying a specified `Condition` function);
- reductions indicating properties of the underlying arrays. These include e.g. `isNaN` (elementwise checking), `isMax`
(shape-preserving along specified dimensions), `isNonDecreasing` (reduction along specified dimensions);
- elementwise logical operations: `and`, `or`, `xor`, `not`.

Most operations in `math` have a very simple structure, and are invoked like this:
```java
SDVariable activation = sameDiff.math.cube(input);
```

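Some `math` operations take additional scalar parameters. For instance, elementwise clipping into a fixed range might
look like the following sketch (here we assume `input` has been defined beforehand and is clipped into `[-1, 1]`):
```java
// keep every element of `input` within the range [-1.0, 1.0]
SDVariable clipped = sameDiff.math.clipByValue(input, -1.0, 1.0);
```
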
Operations may be chained, although in a more cumbersome way in comparison to the `SDVariable` operations, e.g.:
```java
SDVariable matrixNorm1 = sameDiff.math.max(sameDiff.math.sum(sameDiff.math.abs(matrix), 1));
```
Observe that the (integer) argument `1` in the `sum` operation tells it to sum the absolute values along dimension `1`;
the outer `max` then takes the maximum over the resulting sums.

### `random` - creating random values
These operations create variables whose underlying arrays will be filled with random numbers following some
distribution - say, Bernoulli, normal, binomial etc. These values will be reset at each iteration. If you wish, for
instance, to create a variable that will add Gaussian noise to entries of the MNIST database, you may do something like:
```java
double mean = 0.;
double deviation = 0.05;
long[] shape = new long[]{28, 28};
SDVariable noise_mnist = sameDiff.random.normal("noise_mnist", mean, deviation, shape);
```

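A variable like this can then be combined with other variables through the ordinary operations shown earlier; for
instance (a sketch, assuming a placeholder `input` with a matching `[28, 28]` shape has been defined):
```java
// elementwise addition of the freshly sampled noise to the input
SDVariable noisyInput = input.add(noise_mnist);
```
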
The shape of your random variable may vary. Suppose, for instance, that you have audio signals of varying length, and you
want to add noise to them. Then, you need to specify an `SDVariable`, say, `windowShape`, with an integer
[data type](./samediff/variables), and proceed like this:
```java
SDVariable noise_audio = sameDiff.random.normal("noise_audio", mean, deviation, windowShape);
```

### `nn` - general neural network tools
Here we store methods for neural networks that are not necessarily associated with convolutional ones. Among them are:
- creation of dense linear and ReLU layers (with or without bias), and separate bias addition: `linear`, `reluLayer`,
`biasAdd`;
- popular activation functions, e.g. `relu`, `sigmoid`, `tanh`, `softmax` as well as their less used versions like
`leakyRelu`, `elu`, `hardTanh`, and many more;
- padding for 2d arrays with the method `pad`, supporting several padding types, with both constant and variable padding width;
- explosion/overfitting prevention, such as `dropout`, as well as `layerNorm` and `batchNorm` for layer and batch
normalization, respectively.

Some methods were created for internal use, but are openly available. Those include:
- derivatives for several popular activation functions - these are mostly designed for speeding up
backpropagation;
- attention modules - basically, building blocks for the recurrent neural networks we shall discuss below.

While activations in `nn` are fairly simple, other operations become more involved. Say, to create a linear
or a ReLU layer, up to three predefined `SDVariable` objects may be required, as in the following code:
```java
SDVariable denseReluLayer = sameDiff.nn.reluLayer(input, weights, bias);
```
where `input`, `weights` and `bias` need to have dimensions suiting each other.

To create, say, a dense layer with softmax activation, you may proceed as follows:
```java
SDVariable linear = sameDiff.nn.linear(input, weights, bias);
SDVariable output = sameDiff.nn.softmax(linear);
```

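For completeness, here is a minimal sketch of how the `input`, `weights` and `bias` variables used above might be
declared in the first place (the shapes are purely illustrative; see the [variables](./samediff/variables) section for
the details of variable creation):
```java
SameDiff sameDiff = SameDiff.create();

// a placeholder for minibatches of flattened 28x28 images, and trainable parameters for a 10-class output
SDVariable input   = sameDiff.placeHolder("input", DataType.FLOAT, -1, 784);
SDVariable weights = sameDiff.var("weights", DataType.FLOAT, 784, 10);
SDVariable bias    = sameDiff.var("bias", DataType.FLOAT, 10);

SDVariable linear  = sameDiff.nn.linear(input, weights, bias);
SDVariable output  = sameDiff.nn.softmax(linear);
```
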
### `cnn` - convolutional neural network tools
The `cnn` module contains layers and operations typically used in convolutional neural networks -
different activations may be picked up from the `nn` module. Among `cnn` operations we currently have creation of:
- linear convolution layers, currently for tensors of dimension up to 3 (minibatch not included): `conv1d`, `conv2d`,
`conv3d`, `depthWiseConv2d`, `separableConv2D`/`sconv2d`;
- linear deconvolution layers, currently `deconv1d`, `deconv2d`, `deconv3d`;
- pooling, e.g. `maxPooling2D`, `avgPooling1D`;
- specialized reshaping methods: `batchToSpace`, `spaceToDepth`, `col2Im` and the like;
- upsampling, currently represented by the `upsampling2d` operation;
- local response normalization: `localResponseNormalization`, currently for 2d convolutional layers only.

Convolution and deconvolution operations are specified by a number of static parameters like kernel size,
dilation, having or not having bias etc. To facilitate the creation process, we pack the required parameters into
easily constructible and alterable configuration objects. Desired activations may be borrowed from the `nn` module. So,
for example, if we want to create a 3x3 convolutional layer with `relu` activation, we may proceed as follows:
```java
Conv2DConfig config2d = Conv2DConfig.builder().kW(3).kH(3).pW(2).pH(2).build();
SDVariable convolution2dLinear = sameDiff.cnn.conv2d(input, weights, config2d);
SDVariable convolution2dOutput = sameDiff.nn.relu(convolution2dLinear);
```

In the first line, we construct a convolution configuration using its builder. There we specify the
kernel size (this is mandatory) and the optional padding size, keeping other settings at their defaults (unit stride, no
dilation, no bias, `NCHW` data format). We then employ this configuration to create a linear convolution with predefined
`SDVariable`s for input and weights; the shape of `weights` is to be tuned to that of `input` and to `config2d`
beforehand. Thus, if in the above example `input` has shape, say, `[-1, nIn, height, width]`, then `weights` are to have
the form `[nIn, nOut, 3, 3]` (because we have a 3x3 convolution kernel). The shape of the resulting variable
`convolution2dLinear` will be predetermined by these parameters (in our case, it will be `[-1, nOut, height, width]`).
Finally, in the last line we apply a `relu` activation.

### `rnn` - recurrent neural network tools

This module contains arguably the most sophisticated methods in the framework. Currently it allows you to create:
- simple recurrent units, using the `sru` and `sruCell` methods;
- LSTM units, using `lstmCell`, `lstmBlockCell` and `lstmLayer`;
- gated recurrent units, using the `gru` method.

As of now, recurrent operations require special configuration objects as input, in which you need to pack all the
variables that will be used in a unit. This is subject to change in later versions. For instance, to
create a simple recurrent unit, you need to proceed like this:
```java
SRUConfiguration sruConfig = new SRUConfiguration(input, weights, bias, init);
SDVariable sruOutput = sameDiff.rnn().sru(sruConfig);
```

Here, the arguments in the `SRUConfiguration` constructor are variables that are to be defined beforehand. Obviously
their shapes should match, and these shapes predetermine the shape of `sruOutput`.

### `loss` - loss functions
In this branch we keep common loss functions. Most loss functions may be created quite simply, like this:
```java
SDVariable logLoss = sameDiff.loss.logLoss("logLoss", label, predictions);
```
where `label` and `predictions` are `SDVariable`s. A `String` name is a mandatory parameter in most `loss` methods,
yet it may be set to `null` - in this case, the name will be generated automatically. You may also create weighted loss
functions by adding another `SDVariable` parameter containing weights, as well as specify a reduction method (see below)
for the loss over the minibatch. Thus, a full-fledged `logLoss` operation may look like:
```java
SDVariable wLogLossMean = sameDiff.loss.logLoss("wLogLossMean", label, predictions, weights, LossReduce.MEAN_BY_WEIGHT);
```
Some loss operations may allow/require further arguments, depending on their type: e.g. a dimension along which the
loss is to be computed (as in `cosineLoss`), or some real-valued parameters.

As for the reduction methods over the minibatch, there are currently 4 of them available. First, loss values for
each sample of the minibatch are computed, then they are multiplied by weights (if specified), and finally one of the
following routines takes place:
- `NONE` - leaves the resulting (weighted) loss values as-is; the result is an `INDArray` with the length of the
minibatch: `output = weights * loss_per_sample`.
- `SUM` - sums the values, producing a scalar result: `sum_loss = sum(weights * loss_per_sample)`.
- `MEAN_BY_WEIGHT` - first computes the sum as above, and then divides it by the sum of all weights, producing a scalar
value: `mean_loss = sum(weights * loss_per_sample) / sum(weights)`. If weights are not
specified, they are all set to `1.0` and this reduction is equivalent to taking the mean loss value over the minibatch.
- `MEAN_BY_NONZERO_WEIGHT_COUNT` - divides the weighted sum by the number of nonzero weights, producing a scalar:
`mean_count_loss = sum(weights * loss_per_sample) / count(weights != 0)`. Useful e.g. when you want to compute the mean
only over a subset of *valid* samples, setting weights to either `0.` or `1.` (see the sketch after this list). When
weights are not given, it simply produces the mean, and is thus equivalent to `MEAN_BY_WEIGHT`.

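As an illustration of the last reduction mode, one might mask out invalid samples with a 0/1 weights variable. The
following sketch reuses the signature shown above; `maskWeights` is assumed to be a predefined `SDVariable` holding
`0.`/`1.` entries, one per sample:
```java
// mean log loss over the samples whose mask weight is nonzero
SDVariable maskedLoss = sameDiff.loss.logLoss("maskedLoss", label, predictions, maskWeights,
        LossReduce.MEAN_BY_NONZERO_WEIGHT_COUNT);
```
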
## The *don'ts* of operations

In order for `SameDiff` operations to work properly, several main rules are to be upheld. Failing to do so may result in
an exception or, even worse, in working code producing undesired results. All the things we mention in the current
section describe what **you better not** do.

- All variables in an operation have to belong to the same instance of `SameDiff` (see the [variables](./samediff/variables)
section on how variables are added to a `SameDiff` instance). In other words, **you better not**
```java
SDVariable x = sameDiff0.var(DataType.FLOAT, 1);
SDVariable y = sameDiff1.placeHolder(DataType.FLOAT, 1);
SDVariable z = x.add(y);
```
- At best, a new variable is to be created for the result of an operation or a chain of operations. In other words, **you
better not** redefine existing variables **and better not** leave operations returning no result. That is, try to
**avoid** code like this:
```java
SDVariable z = x.add(y);
//DON'T!!!
z.mul(2);
x = z.mul(y);
```
A properly working version of the above code (if we wanted to obtain 2xy+2y<sup>2</sup> in an unusual way) would be
```java
SDVariable z = x.add(y);
SDVariable _2z = z.mul(2);
SDVariable w = _2z.mul(y);
```
To learn more about why things work this way, see our [graph section](./samediff/graph).