---
title: Operations in SameDiff
short_title: Ops
description: What kinds of operations exist in `SameDiff` and how to use them
category: SameDiff
weight: 4
---
# SameDiff operations
Operations in `SameDiff` work mostly the way you'd expect them to. You take variables - in our framework, those are
objects of type `SDVariable` - apply operations to them, and thus produce new variables. Before we proceed to the
overview of the available operations, let us list some of their common properties.
## Common properties of operations
- Variables of any *variable type* may be used in any operation, as long as their *data types* match those that are
required by the operation (again, see our [variables](./samediff/variables) section for what variable types are). Most
often an operation will require its `SDVariable` to have a floating point data type.
- Variables created by operations have `ARRAY` variable type.
- For all operations, you may define a `String` name of your resulting variable, although for most operations this
is not obligatory. The name goes as the first argument in each operation, like so:
```java
SDVariable linear = weights.mmul("matrix_product", input).add(bias);
SDVariable output = sameDiff.nn.sigmoid("output", linear);
```
Named variables may be accessed from outside using the `SameDiff` method `getVariable(String name)`. For the code above,
this method lets you retrieve both `output` and the result of the `mmul` operation. Note that we haven't even
explicitly defined this result as a separate `SDVariable`, and yet a corresponding `SDVariable` will be created
internally and added to our instance of `SameDiff` under the `String` name `"matrix_product"`. In fact, a unique
`String` name is given to every `SDVariable` you produce by operations: if you don't give a name explicitly, one is
assigned to the resulting `SDVariable` automatically based on the operation's name.
## Overview of operations
The number of currently available operations, including overloads, totals several hundred. They range in complexity from
simple additions and multiplications, via producing outputs of convolutional layers, to the creation of dedicated
recurrent neural network modules, and much more. The sheer number of operations would make it cumbersome to list them all
on a single page. So, if you are already looking for something specific, you'll be better off checking our
[javadoc](https://deeplearning4j.org/api/latest/), which already contains detailed information on each operation, or
simply browsing through autocompletion suggestions (if your IDE supports that). Here we rather try to give you an
idea of what operations you may expect to find and where to look for them.
All operations may be split into two major branches: those which are methods of `SDVariable` and those of `SameDiff`
classes. Let us have a closer look at each:
### `SDVariable` operations
We have already seen `SDVariable` operations in previous examples, in expressions like
```java
SDVariable z = x.add(y);
```
where `x` and `y` are `SDVariable`s.
Among `SDVariable` methods, you will find:
- `BLAS`-type operations to perform linear algebra: things like `add`, `neg`, `mul` (used for both scaling and elementwise
multiplication), `mmul` (matrix multiplication), `dot`, `rdiv`, etc., along with the elementwise `squaredDifference`;
- comparison operations like `gt` or `lte`, used both to compare each element to a fixed `double` value and for
elementwise comparison with another `SDVariable` of the same shape, and the like;
- basic reduction operations: things like `min`, `sum`, `prod` (product of elements in array), `mean`, `norm2` and
`argmax` (index of the maximal element), which may be taken along specified dimensions;
- basic statistics operations for computing mean and standard deviation along given dimensions: `mean` and `std`;
- operations for restructuring the underlying array: `reshape` and `permute`, along with `shape`, an operation that
returns the shape of a variable as an array of integers (the dimension sizes).
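
To make the comparison semantics concrete, here is a minimal plain-Java sketch over `double[]` arrays rather than
`SDVariable`s. The class `ComparisonSketch` and its methods are hypothetical illustrations, not part of SameDiff; they
return a boolean mask with one entry per element, whereas the exact data type SameDiff returns may depend on your version.
```java
/**
 * Hypothetical plain-Java sketch of elementwise comparisons (not the SameDiff API).
 * Results are boolean masks with one entry per input element.
 */
final class ComparisonSketch {

    private ComparisonSketch() {
    }

    /** Compares every element to a fixed value, in the spirit of x.gt(value). */
    static boolean[] gt(double[] x, double value) {
        boolean[] mask = new boolean[x.length];
        for (int i = 0; i < x.length; i++) {
            mask[i] = x[i] > value;
        }
        return mask;
    }

    /** Compares two same-shaped arrays element by element, in the spirit of x.lte(y). */
    static boolean[] lte(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("Arrays must have the same shape");
        }
        boolean[] mask = new boolean[x.length];
        for (int i = 0; i < x.length; i++) {
            mask[i] = x[i] <= y[i];
        }
        return mask;
    }
}
```
For `x = {0.2, 0.7, 1.5}`, `gt(x, 0.5)` yields `{false, true, true}`.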
`SDVariable` operations may be easily chained, producing lines like:
```java
SDVariable regressionCost = weights.mmul(input).add("regression_prediction", bias).squaredDifference(labels);
```
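Assuming `weights` has shape `[nOut, nIn]` and `input` is a single column of length `nIn` (shapes chosen here for
illustration only), the chain above computes, element by element, the squared error of a linear prediction. The
hypothetical plain-Java class below spells this out with explicit loops; it is not SameDiff code:
```java
/**
 * Hypothetical plain-Java sketch of weights.mmul(input).add(bias).squaredDifference(labels)
 * for one sample. Assumed shapes: weights [nOut][nIn], input [nIn], bias and labels [nOut].
 */
final class RegressionCostSketch {

    private RegressionCostSketch() {
    }

    static double[] regressionCost(double[][] weights, double[] input, double[] bias, double[] labels) {
        int nOut = weights.length;
        double[] cost = new double[nOut];
        for (int i = 0; i < nOut; i++) {
            // mmul: dot product of row i of the weights with the input vector
            double prediction = 0.0;
            for (int j = 0; j < input.length; j++) {
                prediction += weights[i][j] * input[j];
            }
            // add: the bias term
            prediction += bias[i];
            // squaredDifference: (prediction - label)^2, element by element
            double diff = prediction - labels[i];
            cost[i] = diff * diff;
        }
        return cost;
    }
}
```
The result still has one entry per output; to obtain a scalar cost you would follow the chain with one of the
reductions listed above, such as `sum` or `mean`.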
### `SameDiff` operations
The operations that are methods of `SameDiff` are called via one of 6 auxiliary objects present in each `SameDiff`,
which split all operations into 6 uneven branches:
- `math` - for general mathematical operations;
- `random` - creating variables filled with random values;
- `nn` - general neural network tools;
- `cnn` - convolutional neural network tools;
- `rnn` - recurrent neural network tools;
- `loss` - loss functions;
In order to use a particular operation, you need to call one of these 6 objects from your `SameDiff` instance, and then
the operation itself, like this:
```java
SDVariable y = sameDiff.math.sin(x);
```
or
```java
SDVariable y = sameDiff.math().sin(x);
```
The distribution of operations among the auxiliary objects has no structural bearing beyond organizing things in a more
intuitive way. So, for instance, if you're not sure whether to look for, say, the `tanh` operation in `math` or in `nn`,
don't worry: we have it in both.
Let us briefly describe what kinds of operations you may expect to find in each of the branches:
### `math` - basic mathematical operations
Math module mostly consists of general mathematical functions and statistics methods. Those include:
- power functions, e.g. `square`, `cube`, `sqrt`, `pow`, `reciprocal` etc.;
- trigonometric functions, e.g. `sin`, `atan` etc.;
- exponential/hyperbolic functions, like `exp`, `sinh`, `log`, `atanh` etc.;
- miscellaneous elementwise operations, like taking absolute value, rounding and clipping, such as `abs`, `sign`,
`ceil`, `round`, `clipByValue`, `clipByNorm` etc.;
- reductions along specified dimensions: `min`, `amax`, `mean`, `asum`, `logEntropy`, and similar;
- distance (reduction) operations, such as `euclideanDistance`, `manhattanDistance`, `jaccardDistance`, `cosineDistance`,
`hammingDistance`, `cosineSimilarity`, along specified dimensions, for two identically shaped `SDVariables`;
- specific matrix operations: `matrixInverse`, `matrixDeterminant`, `diag` (creating a diagonal matrix), `trace`, `eye`
(creating identity matrix with variable dimensions), and several others;
- more statistics operations: `standardize`, `moment`, `normalizeMoments`, `erf` and `erfc` (the Gauss error function and
the complementary error function);
- counting and indexing reductions: methods like `countZero` (number of zero elements), `iamin` (index of the element
with the smallest absolute value), `firstIndex` (index of the first element satisfying a specified `Condition` function);
- operations indicating properties of the underlying arrays. These include e.g. `isNaN` (elementwise check), `isMax`
(shape-preserving along specified dimensions), `isNonDecreasing` (reduction along specified dimensions);
- elementwise logical operations: `and`, `or`, `xor`, `not`.
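
As an illustration of what some of the distance reductions compute when applied to two whole vectors (no dimensions
specified), here is a plain-Java sketch. The class `DistanceSketch` is hypothetical and mirrors only the textbook
definitions, not SameDiff's implementation:
```java
/** Hypothetical plain-Java sketch of textbook distance reductions over two whole vectors. */
final class DistanceSketch {

    private DistanceSketch() {
    }

    private static void requireSameLength(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("Vectors must have the same shape");
        }
    }

    /** sqrt(sum((x - y)^2)) */
    static double euclideanDistance(double[] x, double[] y) {
        requireSameLength(x, y);
        double sumOfSquares = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sumOfSquares += d * d;
        }
        return Math.sqrt(sumOfSquares);
    }

    /** sum(|x - y|) */
    static double manhattanDistance(double[] x, double[] y) {
        requireSameLength(x, y);
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += Math.abs(x[i] - y[i]);
        }
        return sum;
    }

    /** (x . y) / (|x| * |y|) */
    static double cosineSimilarity(double[] x, double[] y) {
        requireSameLength(x, y);
        double dot = 0.0;
        double normX = 0.0;
        double normY = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            normX += x[i] * x[i];
            normY += y[i] * y[i];
        }
        return dot / (Math.sqrt(normX) * Math.sqrt(normY));
    }
}
```
For `x = {0, 0}` and `y = {3, 4}`, the Euclidean distance is `5` and the Manhattan distance is `7`.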
Most operations in `math` have a very simple structure and are invoked like this:
```java
SDVariable activation = sameDiff.math.cube(input);
```
Operations may be chained, although in a more cumbersome way in comparison to the `SDVariable` operations, e.g.:
```java
SDVariable matrixNorm1 = sameDiff.math.max(sameDiff.math.sum(sameDiff.math.abs(matrix), 0));
```
Observe that the (integer) argument `0` in the `sum` operation tells us to sum the absolute values along dimension `0`,
i.e. down each column of the matrix; `max` then picks the largest column sum, which is exactly the matrix 1-norm.
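The same computation written with explicit loops in plain Java may help when checking which dimension to reduce over.
This is a hypothetical sketch for a 2D `double[][]` stored row by row, not SameDiff code: absolute values are summed
down each column, and the largest column sum is returned.
```java
/** Hypothetical plain-Java sketch of the matrix 1-norm (largest absolute column sum). */
final class MatrixNormSketch {

    private MatrixNormSketch() {
    }

    static double norm1(double[][] matrix) {
        int rows = matrix.length;
        int cols = matrix[0].length;
        // abs + sum along dimension 0: one absolute sum per column
        double[] columnSums = new double[cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                columnSums[j] += Math.abs(matrix[i][j]);
            }
        }
        // max: the largest of the column sums
        double max = Double.NEGATIVE_INFINITY;
        for (double sum : columnSums) {
            max = Math.max(max, sum);
        }
        return max;
    }
}
```
For `{{1, -2}, {3, 4}}` the column sums are `4` and `6`, so the 1-norm is `6`; summing along rows instead would give
`3` and `7`, whose maximum `7` is the infinity norm.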
### `random` - creating random values
These operations create variables whose underlying arrays will be filled with random numbers following some distribution,
say, Bernoulli, normal or binomial. These values will be reset at each iteration. If you wish, for instance,
to create a variable that will add Gaussian noise to entries of the MNIST database, you may do something like:
```java
double mean = 0.;
double deviation = 0.05;
long[] shape = new long[]{28, 28};
SDVariable noise_mnist = sameDiff.random.normal("noise_mnist", mean, deviation, shape);
```
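Outside SameDiff, the same kind of noise can be sketched in plain Java with `java.util.Random.nextGaussian()`, which
draws from the standard normal distribution; scaling by `deviation` and shifting by `mean` gives the desired
distribution. The class and method names below are hypothetical, and each call draws a fresh sample, loosely mirroring
how the random variable is refilled at each iteration:
```java
import java.util.Random;

/** Hypothetical plain-Java sketch of an array filled with N(mean, deviation^2) noise. */
final class GaussianNoiseSketch {

    private GaussianNoiseSketch() {
    }

    static double[][] normal(double mean, double deviation, int rows, int cols, Random rng) {
        double[][] noise = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                // nextGaussian() is standard normal; scale it by deviation and shift it by mean
                noise[i][j] = mean + deviation * rng.nextGaussian();
            }
        }
        return noise;
    }
}
```
For example, `GaussianNoiseSketch.normal(0.0, 0.05, 28, 28, new Random(42))` produces a 28x28 array of noise.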
The shape of your random variable may vary. Suppose, for instance, that you have audio signals of varying length, and you
want to add noise to them. Then, you need to specify an `SDVariable`, say, `windowShape`, with an integer
[data type](./samediff/variables), and proceed like this:
```java
SDVariable noise_audio = sameDiff.random.normal("noise_audio", mean, deviation, windowShape);
```
### `nn` - general neural network tools
Here we store methods for neural networks that are not necessarily associated with convolutional ones. Among them are:
- creation of dense linear and ReLU layers (with or without bias), and separate bias addition: `linear`, `reluLayer`,
`biasAdd`;
- popular activation functions, e.g. `relu`, `sigmoid`, `tanh`, `softmax`, as well as less common ones like
`leakyRelu`, `elu`, `hardTanh`, and many more;
- padding for 2d arrays with the `pad` method, supporting several padding types, with both constant and variable padding width;
- explosion and overfitting prevention: `dropout`, plus `layerNorm` and `batchNorm` for layer and batch normalization,
respectively.
Some methods were created for internal use, but are openly available. Those include:
- derivatives of several popular activation functions, mostly designed to speed up backpropagation;
- attention modules, basically building blocks for the recurrent neural networks we shall discuss below.
While activations in `nn` are fairly simple, other operations become more involved. Say, to create a linear
or a ReLU layer, up to three predefined `SDVariable` objects may be required, as in the following code:
```java
SDVariable denseReluLayer = sameDiff.nn.reluLayer(input, weights, bias);
```
where `input`, `weights` and `bias` must have mutually compatible shapes.
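Under the assumption that `input` is a single sample of length `nIn`, `weights` has shape `[nIn, nOut]` and `bias` has
length `nOut` (this layout is an assumption for the illustration), a ReLU layer computes
`max(0, input · weights + bias)`. A hypothetical plain-Java sketch, not SameDiff code:
```java
/**
 * Hypothetical plain-Java sketch of a dense ReLU layer for one sample:
 * max(0, input . weights + bias). Assumed shapes: input [nIn], weights [nIn][nOut], bias [nOut].
 */
final class ReluLayerSketch {

    private ReluLayerSketch() {
    }

    /** input . weights + bias */
    static double[] linear(double[] input, double[][] weights, double[] bias) {
        double[] out = new double[bias.length];
        for (int k = 0; k < bias.length; k++) {
            double sum = bias[k];
            for (int j = 0; j < input.length; j++) {
                sum += input[j] * weights[j][k];
            }
            out[k] = sum;
        }
        return out;
    }

    /** ReLU applied elementwise to the linear output. */
    static double[] reluLayer(double[] input, double[][] weights, double[] bias) {
        double[] z = linear(input, weights, bias);
        for (int k = 0; k < z.length; k++) {
            z[k] = Math.max(0.0, z[k]);
        }
        return z;
    }
}
```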
To create, say, a dense layer with softmax activation, you may proceed as follows:
```java
SDVariable linear = sameDiff.nn.linear(input, weights, bias);
SDVariable output = sameDiff.nn.softmax(linear);
```
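For reference, here is what the softmax step does to a single vector of logits, sketched in plain Java (the class name
is hypothetical, and this is not SameDiff's implementation). Subtracting the maximum before exponentiating is a common
trick to avoid overflow and does not change the result:
```java
/** Hypothetical plain-Java sketch of a numerically stable softmax over one vector. */
final class SoftmaxSketch {

    private SoftmaxSketch() {
    }

    static double[] softmax(double[] logits) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : logits) {
            max = Math.max(max, v);
        }
        double[] out = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            // Shift by the maximum so that exp() never overflows
            out[i] = Math.exp(logits[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }
}
```
For logits `{0, 0}` the output is `{0.5, 0.5}`.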
### `cnn` - convolutional neural networks tools
The `cnn` module contains layers and operations typically used in convolutional neural networks -
different activations may be picked up from the `nn` module. Among `cnn` operations we currently have creation of:
- linear convolution layers, currently for up to 3 spatial dimensions (minibatch and channels not included): `conv1d`,
`conv2d`, `conv3d`, `depthWiseConv2d`, `separableConv2d`/`sconv2d`;
- linear deconvolution layers, currently `deconv1d`, `deconv2d`, `deconv3d`;
- pooling, e.g. `maxPooling2d`, `avgPooling1d`;
- specialized reshaping methods: `batchToSpace`, `spaceToDepth`, `col2Im` and the like;
- upsampling, currently represented by the `upsampling2d` operation;
- local response normalization: `localResponseNormalization`, currently for 2d convolutional layers only.
Convolution and deconvolution operations are specified by a number of static parameters like kernel size, stride,
padding and dilation. To facilitate the creation process, we pack the required parameters into easily constructible and
alterable configuration objects. Desired activations may be borrowed from the `nn` module. So, for example, if we want
to create a 3x3 convolutional layer with `relu` activation, we may proceed as follows:
```java
Conv2DConfig config2d = Conv2DConfig.builder().kW(3).kH(3).pW(1).pH(1).build();
SDVariable convolution2dLinear = sameDiff.cnn.conv2d(input, weights, config2d);
SDVariable convolution2dOutput = sameDiff.nn.relu(convolution2dLinear);
```
In the first line, we construct a convolution configuration using its builder. We specify the kernel size (this is
mandatory) and an optional padding size, keeping other settings default (unit stride, no dilation, `NCHW` data format).
Since no bias variable is passed, the convolution has no bias term. We then employ this configuration to create a linear
convolution with predefined `SDVariables` for input and weights; the shape of `weights` has to be matched to that of
`input` and to `config2d` beforehand. Thus, if in the above example `input` has shape, say, `[-1, nIn, height, width]`,
then `weights` need the shape `[3, 3, nIn, nOut]`, i.e. `[kernelHeight, kernelWidth, inputChannels, outputChannels]`.
The shape of the resulting variable `convolution2dLinear` will be predetermined by these parameters: with a 3x3 kernel,
unit stride and padding `1`, it will be `[-1, nOut, height, width]`. Finally, in the last line we apply a `relu`
activation.
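The spatial output size of a convolution without "same" padding can be checked with the usual formula, sketched below
in plain Java. It is the standard formula for convolution with explicit padding; treat it as an assumption about how
your configuration is interpreted rather than as SameDiff's own code (the class name is hypothetical):
```java
/**
 * Hypothetical plain-Java helper for the spatial output size of a convolution
 * with explicit padding (no "same" mode):
 * out = (in + 2 * pad - (dilation * (kernel - 1) + 1)) / stride + 1, using integer division.
 */
final class ConvOutputSizeSketch {

    private ConvOutputSizeSketch() {
    }

    static int outputSize(int in, int kernel, int pad, int stride, int dilation) {
        // Size of the kernel once the dilation gaps are included
        int effectiveKernel = dilation * (kernel - 1) + 1;
        return (in + 2 * pad - effectiveKernel) / stride + 1;
    }
}
```
With a 3x3 kernel, unit stride and no dilation, padding `1` preserves the size (`outputSize(28, 3, 1, 1, 1)` is `28`),
while padding `2` would grow it to `30`.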
### `rnn` - Recurrent neural networks
This module contains arguably the most sophisticated methods in the framework. Currently it allows you to create:
- simple recurrent units, using `sru` and `sruCell` methods;
- LSTM units, using `lstmCell`, `lstmBlockCell` and `lstmLayer`;
- gated recurrent units (GRUs), using `gru` methods.
As of now, recurrent operations require special configuration objects as input, in which you need to pack all the
variables that will be used in a unit. This is subject to change in later versions. For instance, to
create a simple recurrent unit, you need to proceed as follows:
```java
SRUConfiguration sruConfig = new SRUConfiguration(input, weights, bias, init);
SDVariable sruOutput = sameDiff.rnn().sru(sruConfig);
```
Here, the arguments of the `SRUConfiguration` constructor are variables that have to be defined beforehand. Their
shapes must be compatible with each other, and they predetermine the shape of `sruOutput`.
### `loss` - Loss functions
In this branch we keep common loss functions. Most loss functions may be created quite simply, like that:
```java
SDVariable logLoss = sameDiff.loss.logLoss("logLoss", labels, predictions);
```
where `labels` and `predictions` are `SDVariable`s. A `String` name is a mandatory parameter in most `loss` methods,
yet it may be set to `null`; in this case, the name will be generated automatically. You may also create weighted loss
functions by adding another `SDVariable` parameter containing weights, as well as specify a reduction method (see below)
for the loss over the minibatch. Thus, a full-fledged `logLoss` operation may
look like:
```java
SDVariable wLogLossMean = sameDiff.loss.logLoss("wLogLossMean", labels, predictions, weights, LossReduce.MEAN_BY_WEIGHT);
```
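The per-sample values that `logLoss` reduces are, in the usual binary cross-entropy form, `-y log(p) - (1 - y) log(1 - p)`.
The plain-Java sketch below computes them with a small hypothetical `EPSILON` added inside the logarithms to avoid
`log(0)`; the exact epsilon and clipping used by SameDiff may differ, and the class name is hypothetical:
```java
/** Hypothetical plain-Java sketch of per-sample binary log loss (before any weighting or reduction). */
final class LogLossSketch {

    /** Hypothetical guard against log(0); an assumption of this sketch. */
    static final double EPSILON = 1e-7;

    private LogLossSketch() {
    }

    static double[] perSample(double[] labels, double[] predictions) {
        double[] loss = new double[labels.length];
        for (int i = 0; i < labels.length; i++) {
            double y = labels[i];
            double p = predictions[i];
            // -y * log(p) - (1 - y) * log(1 - p), with EPSILON keeping both logarithms finite
            loss[i] = -y * Math.log(p + EPSILON) - (1 - y) * Math.log(1 - p + EPSILON);
        }
        return loss;
    }
}
```
A prediction of `0.5` for a positive label gives a loss of about `ln 2 ≈ 0.693`.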
Some loss operations may allow/require further arguments, depending on their type: e.g. a dimension along which the
loss is to be computed (as in `cosineLoss`), or some real-valued parameters.
There are currently 4 reduction methods over the minibatch. First, the loss value for each sample of the minibatch is
computed; then it is multiplied by its weight (if weights are specified); finally, one of the following routines takes
place:
- `NONE` - leaving the resulting (weighted) loss values as-is; the result is an `INDArray` with the length of the
minibatch: `loss = weights * loss_per_sample` (elementwise).
- `SUM` - summing the weighted values, producing a scalar result: `sum_loss = sum(weights * loss_per_sample)`.
- `MEAN_BY_WEIGHT` - first computes the sum as above, and then divides it by the sum of all weights, producing a scalar
value: `mean_loss = sum(weights * loss_per_sample) / sum(weights)`. If weights are not
specified, they are all set to `1.0` and this reduction is equivalent to taking the mean loss value over the minibatch.
- `MEAN_BY_NONZERO_WEIGHT_COUNT` - divides the weighted sum by the number of nonzero weights, producing a scalar:
`mean_count_loss = sum(weights * loss_per_sample) / count(weights != 0)`. This is useful e.g. when you want to compute
the mean only over a subset of *valid* samples, setting each weight to either `0` or `1`. When weights are not given, it
simply produces the mean, and is thus equivalent to `MEAN_BY_WEIGHT`.
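
The four reductions can be sketched in plain Java as follows. The class `LossReductionSketch` is hypothetical and only
mirrors the formulas above; it is not the `LossReduce` implementation, and it does not handle the corner case where all
weights are zero:
```java
/** Hypothetical plain-Java sketch of the four minibatch loss reductions described above. */
final class LossReductionSketch {

    private LossReductionSketch() {
    }

    /** NONE: weights * lossPerSample elementwise, one value per sample. */
    static double[] none(double[] lossPerSample, double[] weights) {
        double[] weighted = new double[lossPerSample.length];
        for (int i = 0; i < lossPerSample.length; i++) {
            weighted[i] = weights[i] * lossPerSample[i];
        }
        return weighted;
    }

    /** SUM: sum(weights * lossPerSample). */
    static double sum(double[] lossPerSample, double[] weights) {
        double total = 0.0;
        for (double value : none(lossPerSample, weights)) {
            total += value;
        }
        return total;
    }

    /** MEAN_BY_WEIGHT: sum(weights * lossPerSample) / sum(weights). */
    static double meanByWeight(double[] lossPerSample, double[] weights) {
        double weightSum = 0.0;
        for (double w : weights) {
            weightSum += w;
        }
        return sum(lossPerSample, weights) / weightSum;
    }

    /** MEAN_BY_NONZERO_WEIGHT_COUNT: sum(weights * lossPerSample) / count(weights != 0). */
    static double meanByNonzeroWeightCount(double[] lossPerSample, double[] weights) {
        int nonzero = 0;
        for (double w : weights) {
            if (w != 0.0) {
                nonzero++;
            }
        }
        return sum(lossPerSample, weights) / nonzero;
    }
}
```
For per-sample losses `{2, 5, 4}` and weights `{2, 0, 1}`, the weighted values are `{4, 0, 4}`, so `SUM` gives `8`,
`MEAN_BY_WEIGHT` gives `8/3` and `MEAN_BY_NONZERO_WEIGHT_COUNT` gives `4`.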
## The *don'ts* of operations
In order for `SameDiff` operations to work properly, several main rules have to be upheld. Failing to do so may result in
an exception or, even worse, in working code that produces undesired results. All the things we mention in the current
section describe what **you better not** do.
- All variables in an operation have to belong to the same instance of `SameDiff` (see the [variables](./samediff/variables)
section on how variables are added to a `SameDiff` instance). In other words, **you better not**
```java
SDVariable x = sameDiff0.var(DataType.FLOAT, 1);
SDVariable y = sameDiff1.placeHolder(DataType.FLOAT, 1);
SDVariable z = x.add(y);
```
- Ideally, a new variable should be created for the result of each operation or chain of operations. That is, **you
better not** redefine existing variables **and better not** leave operations returning no result. In other words, try to
**avoid** code like this:
```java
SDVariable z = x.add(y);
//DON'T!!!
z.mul(2);
x = z.mul(y);
```
A properly working version of the above code (if we wanted to obtain 2xy+2y<sup>2</sup> in an unusual way) will be:
```java
SDVariable z = x.add(y);
SDVariable _2z = z.mul(2);
SDVariable w = _2z.mul(y);
```
To learn more why it functions like that, see our [graph section](./samediff/graph).