diff --git a/docs/samediff/templates/ops.md b/docs/samediff/templates/ops.md
index 12a04edb5..81f2b3900 100644
--- a/docs/samediff/templates/ops.md
+++ b/docs/samediff/templates/ops.md
@@ -1,3 +1,275 @@
-# SameDiff operations
-
-{{autogenerated}}
\ No newline at end of file
+---
+title: Operations in SameDiff
+short_title: Ops
+description: What kinds of operations there are in `SameDiff` and how to use them
+category: SameDiff
+weight: 4
+---
+
+# SameDiff operations
+
+Operations in `SameDiff` work mostly the way you'd expect them to. You take variables - in our framework, those are
+objects of type `SDVariable` - apply operations to them, and thus produce new variables. Before we proceed to the
+overview of the available operations, let us list some of their common properties.
+
+## Common properties of operations
+
+- Variables of any *variable type* may be used in any operation, as long as their *data types* match those required
+by the operation (again, see our [variables](./samediff/variables) section for what variable types are). Most
+often an operation will require its `SDVariable` to have a floating point data type.
+- Variables created by operations have the `ARRAY` variable type.
+- For all operations, you may define a `String` name for your resulting variable, although for most operations this
+is not obligatory. The name goes as the first argument in each operation, like so:
+```java
+SDVariable linear = weights.mmul("matrix_product", input).add(bias);
+SDVariable output = sameDiff.nn.sigmoid("output", linear);
+```
+Named variables may be accessed from outside using the `SameDiff` method `getVariable(String name)`. For the code above,
+this method will allow you to retrieve the value of both `output` and the result of the `mmul` operation. Note that we
+haven't even explicitly defined this result as a separate `SDVariable`, and yet a corresponding `SDVariable` is
+created internally and added to our instance of `SameDiff` under the `String` name `"matrix_product"`. In fact, a unique
+`String` name is given to every `SDVariable` produced by an operation: if you don't give a name explicitly, one is
+assigned to the resulting `SDVariable` automatically based on the operation's name.
+
+## Overview of operations
+The number of currently available operations, including overloads, totals several hundred. They range in complexity from
+simple additions and multiplications, through producing outputs of convolutional layers, to the creation of dedicated
+recurrent neural network modules, and much more. The sheer number of operations would make it cumbersome to list them
+all on a single page. So, if you are already looking for something specific, you'll be better off checking our
+[javadoc](https://deeplearning4j.org/api/latest/), which contains detailed information on each operation, or
+simply browsing through autocompletion suggestions (if your IDE supports that). Here we rather try to give you an
+idea of what operations you may expect to find and where to look for them.
+
+All operations may be split into two major branches: those that are methods of the `SDVariable` class and those that
+are methods of `SameDiff`. Let us have a closer look at each:
+
+### `SDVariable` operations
+We have already seen `SDVariable` operations in previous examples, in expressions like
+```java
+SDVariable z = x.add(y);
+```
+where `x` and `y` are `SDVariable`'s.
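+
+As a small, self-contained illustration of how operation results, their names and `getVariable` fit together, here is
+a sketch (the shapes and values are arbitrary, and we assume `SDVariable.eval()` - available in recent versions - for
+pulling a value back as an `INDArray`):
+```java
+SameDiff sameDiff = SameDiff.create();
+
+// two variables holding concrete arrays
+SDVariable x = sameDiff.var("x", Nd4j.rand(2, 3));
+SDVariable w = sameDiff.var("w", Nd4j.rand(3, 2));
+
+// the matrix product is named explicitly, the outer addition gets a name automatically
+SDVariable out = x.mmul("xw", w).add(1.0);
+
+// both the named intermediate result and the final result live in the graph
+SDVariable xw = sameDiff.getVariable("xw");
+System.out.println(xw.eval());   // value of the matrix product
+System.out.println(out.eval());  // value of the matrix product plus one
+```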
+
+Among `SDVariable` methods, you will find:
+- `BLAS`-type operations to perform linear algebra: things like `add`, `neg`, `mul` (used for both scaling and elementwise
+multiplication) and `mmul` (matrix multiplication), `dot`, `rdiv`, etc.;
+- comparison operations like `gt` or `lte`, used both to compare each element to a fixed `double` value and for
+elementwise comparison with another `SDVariable` of the same shape, and the like;
+- basic reduction operations: things like `min`, `sum`, `prod` (product of the elements in the array), `mean`, `norm2`,
+`argmax` (index of the maximal element), `squaredDifference` and so on, which may be taken along specified dimensions;
+- basic statistics operations for computing mean and standard deviation along given dimensions: `mean` and `std`;
+- operations for restructuring the underlying array: `reshape` and `permute`, along with `shape` - an operation that
+delivers the shape of a variable as an array of integers - the dimension sizes.
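+
+For instance, here are a couple of the reductions and comparisons from the list above in action (the shape and the
+threshold are chosen purely for illustration):
+```java
+SDVariable data = sameDiff.var("data", Nd4j.rand(4, 3));
+
+// mean along dimension 0 reduces the 4x3 array to 3 values, one per column
+SDVariable columnMeans = data.mean(0);
+
+// elementwise comparison with a scalar produces a variable of the same shape as `data`
+SDVariable aboveThreshold = data.gt(0.5);
+```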
+
+`SDVariable` operations may be easily chained, producing lines like:
+```java
+SDVariable regressionCost = weights.mmul(input).add("regression_prediction", bias).squaredDifference(labels);
+```
+
+### `SameDiff` operations
+The operations that are methods of `SameDiff` are called via one of 6 auxiliary objects present in each `SameDiff`
+instance, which split all operations into 6 uneven branches:
+- `math` - general mathematical operations;
+- `random` - creating different random number generators;
+- `nn` - general neural network tools;
+- `cnn` - convolutional neural network tools;
+- `rnn` - recurrent neural network tools;
+- `loss` - loss functions.
+
+In order to use a particular operation, you need to call one of these 6 objects from your `SameDiff` instance, and then
+the operation itself, like this:
+```java
+SDVariable y = sameDiff.math.sin(x);
+```
+or
+```java
+SDVariable y = samediff.math().sin(x);
+```
+The distribution of operations among the auxiliary objects has no structural bearing beyond organizing things in a more
+intuitive way. So, for instance, if you're not sure whether to look for, say, the `tanh` operation in `math` or in `nn`,
+don't worry: we have it in both.
+
+Let us briefly describe what kinds of operations you may expect to find in each of the branches:
+
+### `math` - basic mathematical operations
+The math module mostly consists of general mathematical functions and statistics methods. Those include:
+
+- power functions, e.g. `square`, `cube`, `sqrt`, `pow`, `reciprocal` etc.;
+- trigonometric functions, e.g. `sin`, `atan` etc.;
+- exponential, logarithmic and hyperbolic functions, like `exp`, `sinh`, `log`, `atanh` etc.;
+- miscellaneous elementwise operations, like taking the absolute value, rounding and clipping, such as `abs`, `sign`,
+`ceil`, `round`, `clipByValue`, `clipByNorm` etc.;
+- reductions along specified dimensions: `min`, `amax`, `mean`, `asum`, `logEntropy`, and similar;
+- distance (reduction) operations, such as `euclideanDistance`, `manhattanDistance`, `jaccardDistance`, `cosineDistance`,
+`hammingDistance`, `cosineSimilarity`, along specified dimensions, for two identically shaped `SDVariables`;
+- specific matrix operations: `matrixInverse`, `matrixDeterminant`, `diag` (creating a diagonal matrix), `trace`, `eye`
+(creating an identity matrix with variable dimensions), and several others;
+- more statistics operations: `standardize`, `moment`, `normalizeMoments`, `erf` and `erfc` (the Gaussian error function
+and its complement);
+- counting and indexing reductions: methods like `countZero` (number of zero elements), `iamin` (index of the element
+with the smallest absolute value), `firstIndex` (the index of the first element satisfying a specified `Condition`);
+- reductions indicating properties of the underlying arrays. These include e.g. `isNaN` (elementwise checking), `isMax`
+(shape-preserving along specified dimensions), `isNonDecreasing` (reduction along specified dimensions);
+- elementwise logical operations: `and`, `or`, `xor`, `not`.
+
+Most operations in `math` have a very simple structure and are invoked like this:
+```java
+SDVariable activation = sameDiff.math.cube(input);
+```
+Operations may be chained, although in a more cumbersome way than `SDVariable` operations, e.g.:
+```java
+SDVariable matrixNorm1 = sameDiff.math.max(sameDiff.math.sum(sameDiff.math.abs(matrix), 1));
+```
+Observe that the integer argument `1` in the `sum` operation tells it to sum the absolute values along dimension `1`;
+the outer `max` then takes the largest of the resulting sums.
+
+### `random` - creating random values
+These operations create variables whose underlying arrays will be filled with random numbers following some distribution
+- say, Bernoulli, normal, binomial etc. These values will be reset at each iteration. If you wish, for instance,
+to create a variable that will add Gaussian noise to entries of the MNIST database, you may do something like:
+```java
+double mean = 0.;
+double deviation = 0.05;
+long[] shape = new long[]{28, 28};
+SDVariable noise_mnist = sameDiff.random.normal("noise_mnist", mean, deviation, shape);
+```
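+Such a noise variable is then used like any other variable. For instance, to actually apply the noise to an input of
+matching shape (the placeholder name and shape here are purely illustrative):
+```java
+SDVariable mnistInput = sameDiff.placeHolder("mnist_input", DataType.FLOAT, 28, 28);
+SDVariable noisyInput = mnistInput.add(noise_mnist);
+```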
+The shape of your random variable may vary. Suppose, for instance, that you have audio signals of varying length, and you
+want to add noise to them. Then, you need to specify an `SDVariable`, say, `windowShape` with an integer
+[data type](./samediff/variables), and proceed like this:
+```java
+SDVariable noise_audio = sameDiff.random.normal("noise_audio", mean, deviation, windowShape);
+```
+
+### `nn` - general neural network tools
+Here we store methods for neural networks that are not necessarily associated with convolutional ones. Among them are:
+- creation of dense linear and ReLU layers (with or without bias), and separate bias addition: `linear`, `reluLayer`,
+`biasAdd`;
+- popular activation functions, e.g. `relu`, `sigmoid`, `tanh`, `softmax`, as well as their less used versions like
+`leakyRelu`, `elu`, `hardTanh`, and many more;
+- padding for 2d arrays with the `pad` method, supporting several padding types, with both constant and variable padding width;
+- overfitting and explosion prevention tools, such as `dropout`, as well as `layerNorm` and `batchNorm` for layer and
+batch normalization respectively.
+
+Some methods were created for internal use but are openly available. Those include:
+- derivatives for several popular activation functions - these are mostly designed for speeding up
+backpropagation;
+- attention modules - basically, building blocks for the recurrent neural networks we shall discuss below.
+
+While activations in `nn` are fairly simple, other operations become more involved. Say, to create a linear
+or a ReLU layer, up to three predefined `SDVariable` objects may be required, as in the following code:
+```java
+SDVariable denseReluLayer = sameDiff.nn.reluLayer(input, weights, bias);
+```
+where `input`, `weights` and `bias` need to have dimensions suiting each other.
+
+To create, say, a dense layer with softmax activation, you may proceed as follows:
+```java
+SDVariable linear = sameDiff.nn.linear(input, weight, bias);
+SDVariable output = sameDiff.nn.softmax(linear);
+```
+
+### `cnn` - convolutional neural network tools
+The `cnn` module contains layers and operations typically used in convolutional neural networks -
+different activations may be picked up from the `nn` module. Among `cnn` operations we currently have creation of:
+- linear convolution layers, currently for tensors of dimension up to 3 (minibatch not included): `conv1d`, `conv2d`,
+`conv3d`, `depthWiseConv2d`, `separableConv2D`/`sconv2d`;
+- linear deconvolution layers, currently `deconv1d`, `deconv2d`, `deconv3d`;
+- pooling, e.g. `maxPooling2D`, `avgPooling1D`;
+- specialized reshaping methods: `batchToSpace`, `spaceToDepth`, `col2Im` and the like;
+- upsampling, currently represented by the `upsampling2d` operation;
+- local response normalization: `localResponseNormalization`, currently for 2d convolutional layers only.
+
+Convolution and deconvolution operations are specified by a number of static parameters like kernel size,
+dilation, having or not having bias etc. To facilitate the creation process, we pack the required parameters into
+easily constructable and alterable configuration objects. Desired activations may be borrowed from the `nn` module. So,
+for example, if we want to create a 3x3 convolutional layer with `relu` activation, we may proceed as follows:
+```java
+Conv2DConfig config2d = Conv2DConfig.builder().kW(3).kH(3).pW(2).pH(2).build();
+SDVariable convolution2dLinear = sameDiff.cnn.conv2d(input, weights, config2d);
+SDVariable convolution2dOutput = sameDiff.nn.relu(convolution2dLinear);
+```
+In the first line, we construct a convolution configuration using its builder. We specify the
+kernel size (this is mandatory) and an optional padding size, keeping the other settings at their defaults (unit stride,
+no dilation, no bias, `NCHW` data format). We then employ this configuration to create a linear convolution with
+predefined `SDVariables` for input and weights; the shape of `weights` is to be tuned to that of `input` and to
+`config2d` beforehand. Thus, if in the above example `input` has shape, say, `[-1, nIn, height, width]`, then `weights`
+are to have the form `[nIn, nOut, 3, 3]` (because we have a 3x3 convolution kernel). The shape of the resulting variable
+`convolution2dLinear` will be predetermined by these parameters (in our case, it will be `[-1, nOut, height, width]`).
+Finally, in the last line we apply a `relu` activation.
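+
+Pooling may be set up in a similar way. The following is only a sketch - it assumes a `Pooling2DConfig` built
+analogously to `Conv2DConfig` above, so check the javadoc of your version for the exact builder fields and method names:
+```java
+// 2x2 max-pooling with stride 2, applied to the activation created above
+Pooling2DConfig poolConfig = Pooling2DConfig.builder().kH(2).kW(2).sH(2).sW(2).build();
+SDVariable pooled = sameDiff.cnn.maxPooling2d(convolution2dOutput, poolConfig);
+```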
+
+### `rnn` - recurrent neural networks
+
+This module contains arguably the most sophisticated methods in the framework. Currently, it allows you to create:
+- simple recurrent units, using the `sru` and `sruCell` methods;
+- LSTM units, using `lstmCell`, `lstmBlockCell` and `lstmLayer`;
+- GRU (gated recurrent unit) cells, using the `gru` method.
+
+As of now, recurrent operations require special configuration objects as input, in which you need to pack all the
+variables that will be used in a unit. This is subject to change in later versions. For instance, to
+create a simple recurrent unit, you need to proceed like this:
+```java
+SRUConfiguration sruConfig = new SRUConfiguration(input, weights, bias, init);
+SDVariable sruOutput = samediff.rnn().sru(sruConfig);
+```
+Here, the arguments in the `SRUConfiguration` constructor are variables that are to be defined beforehand. Obviously,
+their shapes should be matching, and these shapes predetermine the shape of `sruOutput`.
+
+### `loss` - loss functions
+In this branch we keep common loss functions. Most loss functions may be created quite simply, like this:
+```java
+SDVariable logLoss = sameDiff.loss.logLoss("logLoss", label, predictions);
+```
+where `label` and `predictions` are `SDVariable`'s. A `String` name is a mandatory parameter in most `loss` methods,
+yet it may be set to `null` - in this case, the name will be generated automatically. You may also create weighted loss
+functions by adding another `SDVariable` parameter containing the weights, as well as specify a reduction method (see
+below) for the loss over the minibatch. Thus, a full-fledged `logLoss` operation may look like:
+```java
+SDVariable wLogLossMean = sameDiff.loss.logLoss("wLogLossMean", label, predictions, weights, LossReduce.MEAN_BY_WEIGHT);
+```
+Some loss operations may allow/require further arguments, depending on their type: e.g. a dimension along which the
+loss is to be computed (as in `cosineLoss`), or some real-valued parameters.
+
+As for the reduction methods over the minibatch, there are currently 4 of them available. First, loss values for
+each sample of the minibatch are computed; then they are multiplied by the weights (if specified); and finally one of
+the following routines takes place:
+- `NONE` - leaving the resulting (weighted) loss values as-is; the result is an `INDArray` with the length of the
+minibatch: `loss = weights * loss_per_sample`.
+- `SUM` - summing the values, producing a scalar result: `sum_loss = sum(weights * loss_per_sample)`.
+- `MEAN_BY_WEIGHT` - first computes the sum as above, and then divides it by the sum of all weights, producing a scalar
+value: `mean_loss = sum(weights * loss_per_sample) / sum(weights)`. If weights are not
+specified, they all are set to `1.0`, and this reduction is equivalent to taking the mean loss value over the minibatch.
+- `MEAN_BY_NONZERO_WEIGHT_COUNT` - divides the weighted sum by the number of nonzero weights, producing a scalar:
+`mean_count_loss = sum(weights * loss_per_sample) / count(weights != 0)`. Useful e.g. when you want to compute the mean
+only over a subset of *valid* samples, setting the weights to either `0.` or `1.` (see the sketch below). When weights
+are not given, it simply produces the mean, and is thus equivalent to `MEAN_BY_WEIGHT`.
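+
+For instance, a sketch of the masking use case from the last bullet, with one weight per sample fed through a
+placeholder (the names and shapes are illustrative):
+```java
+// one weight per sample: 1.0 for samples that should count towards the mean, 0.0 for those to be ignored
+SDVariable mask = sameDiff.placeHolder("mask", DataType.FLOAT, -1, 1);
+SDVariable maskedLoss = sameDiff.loss.logLoss("maskedLoss", label, predictions, mask,
+        LossReduce.MEAN_BY_NONZERO_WEIGHT_COUNT);
+```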
+
+## The *don'ts* of operations
+
+In order for `SameDiff` operations to work properly, several main rules are to be upheld. Failing to do so may result
+in an exception or, even worse, in working code producing undesired results. Everything we mention in the current
+section describes what **you'd better not** do.
+
+- All variables in an operation have to belong to the same instance of `SameDiff` (see the [variables](./samediff/variables)
+section on how variables are added to a `SameDiff` instance). In other words, **you'd better not** do this:
+```java
+SDVariable x = sameDiff0.var("x", DataType.FLOAT, 1);
+SDVariable y = sameDiff1.placeHolder("y", DataType.FLOAT, 1);
+SDVariable z = x.add(y);
+```
+- Preferably, a new variable should be created for the result of an operation or a chain of operations. In other words,
+**you'd better not** redefine existing variables, **and better not** leave operations whose results are not assigned.
+That is, try to **avoid** code like this:
+```java
+SDVariable z = x.add(y);
+//DON'T!!!
+z.mul(2);
+x = z.mul(y);
+```
+A properly working version of the above code (if we wished to obtain 2xy + 2y² in an unusual way) would be
+```java
+SDVariable z = x.add(y);
+SDVariable _2z = z.mul(2);
+SDVariable w = _2z.mul(y);
+```
+To learn more about why it works this way, see our [graph section](./samediff/graph).
diff --git a/docs/samediff/templates/variables.md b/docs/samediff/templates/variables.md
index 3a309c698..5b52900ca 100644
--- a/docs/samediff/templates/variables.md
+++ b/docs/samediff/templates/variables.md
@@ -60,6 +60,7 @@ throw an exception otherwise. Also, as we shall discuss just below, all the `SDV
 supposed to be of floating point type.
 
 ## Common features of variables
+
 Before we go to the differences between variables, let us first look at the properties they all share
 
 - All variables are ultimately derived from an instance of `SameDiff`, serving as parts of its
 [graph](./samediff/graphs). In fact, each variable has a `SameDiff` as one of its fields.
@@ -87,13 +88,22 @@ SDVariable weights = samediff.var("weights", DataType.FLOAT, 784, 10);
 ```
 adds a variable constituting of a 784x10 array of `float` numbers - weights for a single layer MNIST perceptron
 in this case - to a pre-existing `SameDiff` instance `samediff`.
-However, this way the values within a variable will be set as zeros. If we wish them to be filled, say, with random
-numbers, we need to add a weight initialization scheme, for instance like that:
+
+However, this way the values within a variable will be set to zeros. You may also create a variable with values from
+a preset `INDArray`. Say,
+```java
+SDVariable weights = samediff.var("weights", Nd4j.randn(784, 10).div(28));
+```
+will create a variable filled with normally distributed random numbers with standard deviation `1/28`. You may put
+any other array creation method instead of `randn`, or any preset array, of course. Also, you may use some popular
+initialization scheme, like so:
+
 ```java
 SDVariable weights = samediff.var("weights", new XavierInitScheme('c', 784, 10), DataType.FLOAT, 784, 10);
 ```
 Now, the weights will be randomly initialized using the Xavier scheme. There are other ways to create and
-fill variables, you may see a full reference in our [javadoc](https://deeplearning4j.org/api/latest/).
+fill variables: you may look them up in the 'known subclasses' section of our
+[javadoc](https://deeplearning4j.org/api/latest/org/nd4j/weightinit/WeightInitScheme.html).
 
 ### Constants