diff --git a/docs/samediff/templates/ops.md b/docs/samediff/templates/ops.md
index 12a04edb5..81f2b3900 100644
--- a/docs/samediff/templates/ops.md
+++ b/docs/samediff/templates/ops.md
@@ -1,3 +1,275 @@
-# SameDiff operations
-
-{{autogenerated}}
\ No newline at end of file
+---
+title: Operations in SameDiff
+short_title: Ops
+description: What kinds of operations there are in `SameDiff` and how to use them
+category: SameDiff
+weight: 4
+---
+
+# SameDiff operations
+
+Operations in `SameDiff` work mostly the way you'd expect them to. You take variables - in our framework, those are
+objects of type `SDVariable` - apply operations to them, and thus produce new variables. Before we proceed to the
+overview of the available operations, let us list some of their common properties.
+
+## Common properties of operations
+
+- Variables of any *variable type* may be used in any operation, as long as their *data types* match those required
+by the operation (again, see our [variables](./samediff/variables) section for what variable types are). Most
+often an operation will require its `SDVariable` to have a floating point data type.
+- Variables created by operations have the `ARRAY` variable type.
+- For all operations, you may define a `String` name for your resulting variable, although for most operations this
+is not obligatory. The name goes as the first argument in each operation, like so:
+```java
+SDVariable linear = weights.mmul("matrix_product", input).add(bias);
+SDVariable output = sameDiff.nn.sigmoid("output", linear);
+```
+Named variables may be accessed from outside using the `SameDiff` method `getVariable(String name)`. For the code above,
+this method will allow you to retrieve the value of both `output` and the result of the `mmul` operation. Note that we
+haven't even explicitly defined this result as a separate `SDVariable`, and yet a corresponding `SDVariable` is
+created internally and added to our instance of `SameDiff` under the `String` name `"matrix_product"`. In fact, a unique
+`String` name is given to every `SDVariable` produced by an operation: if you don't give a name explicitly, one is
+assigned to the resulting `SDVariable` automatically based on the operation's name.
+
+## Overview of operations
+The number of currently available operations, including overloads, totals several hundred. They range in complexity from
+simple additions and multiplications, through producing outputs of convolutional layers, to the creation of dedicated
+recurrent neural network modules, and much more. The sheer number of operations would make it cumbersome to list them
+all on a single page. So, if you are already looking for something specific, you'll be better off checking our
+[javadoc](https://deeplearning4j.org/api/latest/), which contains detailed information on each operation, or
+simply browsing through autocompletion suggestions (if your IDE supports that). Here we rather try to give you an
+idea of what operations you may expect to find and where to look for them.
+
+All operations may be split into two major branches: those that are methods of the `SDVariable` class and those that
+are methods of `SameDiff`. Let us have a closer look at each:
+
+### `SDVariable` operations
+We have already seen `SDVariable` operations in previous examples, in expressions like
+```java
+SDVariable z = x.add(y);
+```
+where `x` and `y` are `SDVariable`'s.
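+
+As a small, self-contained illustration of how operation results, their names and `getVariable` fit together, here is
+a sketch (the shapes and values are arbitrary, and we assume `SDVariable.eval()` - available in recent versions - for
+pulling a value back as an `INDArray`):
+```java
+SameDiff sameDiff = SameDiff.create();
+
+// two variables holding concrete arrays
+SDVariable x = sameDiff.var("x", Nd4j.rand(2, 3));
+SDVariable w = sameDiff.var("w", Nd4j.rand(3, 2));
+
+// the matrix product is named explicitly, the outer addition gets a name automatically
+SDVariable out = x.mmul("xw", w).add(1.0);
+
+// both the named intermediate result and the final result live in the graph
+SDVariable xw = sameDiff.getVariable("xw");
+System.out.println(xw.eval());   // value of the matrix product
+System.out.println(out.eval());  // value of the matrix product plus one
+```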
+
+Among `SDVariable` methods, you will find:
+- `BLAS`-type operations to perform linear algebra: things like `add`, `neg`, `mul` (used for both scaling and elementwise
+multiplication) and `mmul` (matrix multiplication), `dot`, `rdiv`, etc.;
+- comparison operations like `gt` or `lte`, used both to compare each element to a fixed `double` value and for
+elementwise comparison with another `SDVariable` of the same shape, and the like;
+- basic reduction operations: things like `min`, `sum`, `prod` (product of the elements in the array), `mean`, `norm2`,
+`argmax` (index of the maximal element), `squaredDifference` and so on, which may be taken along specified dimensions;
+- basic statistics operations for computing mean and standard deviation along given dimensions: `mean` and `std`;
+- operations for restructuring the underlying array: `reshape` and `permute`, along with `shape` - an operation that
+delivers the shape of a variable as an array of integers - the dimension sizes.
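+
+For instance, here are a couple of the reductions and comparisons from the list above in action (the shape and the
+threshold are chosen purely for illustration):
+```java
+SDVariable data = sameDiff.var("data", Nd4j.rand(4, 3));
+
+// mean along dimension 0 reduces the 4x3 array to 3 values, one per column
+SDVariable columnMeans = data.mean(0);
+
+// elementwise comparison with a scalar produces a variable of the same shape as `data`
+SDVariable aboveThreshold = data.gt(0.5);
+```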
+
+`SDVariable` operations may be easily chained, producing lines like:
+```java
+SDVariable regressionCost = weights.mmul(input).add("regression_prediction", bias).squaredDifference(labels);
+```
+
+### `SameDiff` operations
+The operations that are methods of `SameDiff` are called via one of 6 auxiliary objects present in each `SameDiff`
+instance, which split all operations into 6 uneven branches:
+- `math` - general mathematical operations;
+- `random` - creating different random number generators;
+- `nn` - general neural network tools;
+- `cnn` - convolutional neural network tools;
+- `rnn` - recurrent neural network tools;
+- `loss` - loss functions.
+
+In order to use a particular operation, you need to call one of these 6 objects from your `SameDiff` instance, and then
+the operation itself, like this:
+```java
+SDVariable y = sameDiff.math.sin(x);
+```
+or
+```java
+SDVariable y = samediff.math().sin(x);
+```
+The distribution of operations among the auxiliary objects has no structural bearing beyond organizing things in a more
+intuitive way. So, for instance, if you're not sure whether to look for, say, the `tanh` operation in `math` or in `nn`,
+don't worry: we have it in both.
+
+Let us briefly describe what kinds of operations you may expect to find in each of the branches:
+
+### `math` - basic mathematical operations
+The math module mostly consists of general mathematical functions and statistics methods. Those include:
+
+- power functions, e.g. `square`, `cube`, `sqrt`, `pow`, `reciprocal` etc.;
+- trigonometric functions, e.g. `sin`, `atan` etc.;
+- exponential, logarithmic and hyperbolic functions, like `exp`, `sinh`, `log`, `atanh` etc.;
+- miscellaneous elementwise operations, like taking the absolute value, rounding and clipping, such as `abs`, `sign`,
+`ceil`, `round`, `clipByValue`, `clipByNorm` etc.;
+- reductions along specified dimensions: `min`, `amax`, `mean`, `asum`, `logEntropy`, and similar;
+- distance (reduction) operations, such as `euclideanDistance`, `manhattanDistance`, `jaccardDistance`, `cosineDistance`,
+`hammingDistance`, `cosineSimilarity`, along specified dimensions, for two identically shaped `SDVariables`;
+- specific matrix operations: `matrixInverse`, `matrixDeterminant`, `diag` (creating a diagonal matrix), `trace`, `eye`
+(creating an identity matrix with variable dimensions), and several others;
+- more statistics operations: `standardize`, `moment`, `normalizeMoments`, `erf` and `erfc` (the Gaussian error function
+and its complement);
+- counting and indexing reductions: methods like `countZero` (number of zero elements), `iamin` (index of the element
+with the smallest absolute value), `firstIndex` (the index of the first element satisfying a specified `Condition`);
+- reductions indicating properties of the underlying arrays. These include e.g. `isNaN` (elementwise checking), `isMax`
+(shape-preserving along specified dimensions), `isNonDecreasing` (reduction along specified dimensions);
+- elementwise logical operations: `and`, `or`, `xor`, `not`.
+
+Most operations in `math` have a very simple structure and are invoked like this:
+```java
+SDVariable activation = sameDiff.math.cube(input);
+```
+Operations may be chained, although in a more cumbersome way than `SDVariable` operations, e.g.:
+```java
+SDVariable matrixNorm1 = sameDiff.math.max(sameDiff.math.sum(sameDiff.math.abs(matrix), 1));
+```
+Observe that the integer argument `1` in the `sum` operation tells it to sum the absolute values along dimension `1`;
+the outer `max` then takes the largest of the resulting sums.
+
+### `random` - creating random values
+These operations create variables whose underlying arrays will be filled with random numbers following some distribution
+- say, Bernoulli, normal, binomial etc. These values will be reset at each iteration. If you wish, for instance,
+to create a variable that will add Gaussian noise to entries of the MNIST database, you may do something like:
+```java
+double mean = 0.;
+double deviation = 0.05;
+long[] shape = new long[]{28, 28};
+SDVariable noise_mnist = sameDiff.random.normal("noise_mnist", mean, deviation, shape);
+```
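+Such a noise variable is then used like any other variable. For instance, to actually apply the noise to an input of
+matching shape (the placeholder name and shape here are purely illustrative):
+```java
+SDVariable mnistInput = sameDiff.placeHolder("mnist_input", DataType.FLOAT, 28, 28);
+SDVariable noisyInput = mnistInput.add(noise_mnist);
+```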
+The shape of your random variable may vary. Suppose, for instance, that you have audio signals of varying length, and you
+want to add noise to them. Then, you need to specify an `SDVariable`, say, `windowShape` with an integer
+[data type](./samediff/variables), and proceed like this:
+```java
+SDVariable noise_audio = sameDiff.random.normal("noise_audio", mean, deviation, windowShape);
+```
+
+### `nn` - general neural network tools
+Here we store methods for neural networks that are not necessarily associated with convolutional ones. Among them are:
+- creation of dense linear and ReLU layers (with or without bias), and separate bias addition: `linear`, `reluLayer`,
+`biasAdd`;
+- popular activation functions, e.g. `relu`, `sigmoid`, `tanh`, `softmax`, as well as their less used versions like
+`leakyRelu`, `elu`, `hardTanh`, and many more;
+- padding for 2d arrays with the `pad` method, supporting several padding types, with both constant and variable padding width;
+- overfitting and explosion prevention tools, such as `dropout`, as well as `layerNorm` and `batchNorm` for layer and
+batch normalization respectively.
+
+Some methods were created for internal use but are openly available. Those include:
+- derivatives for several popular activation functions - these are mostly designed for speeding up
+backpropagation;
+- attention modules - basically, building blocks for the recurrent neural networks we shall discuss below.
+
+While activations in `nn` are fairly simple, other operations become more involved. Say, to create a linear
+or a ReLU layer, up to three predefined `SDVariable` objects may be required, as in the following code:
+```java
+SDVariable denseReluLayer = sameDiff.nn.reluLayer(input, weights, bias);
+```
+where `input`, `weights` and `bias` need to have dimensions suiting each other.
+
+To create, say, a dense layer with softmax activation, you may proceed as follows:
+```java
+SDVariable linear = sameDiff.nn.linear(input, weight, bias);
+SDVariable output = sameDiff.nn.softmax(linear);
+```
+
+### `cnn` - convolutional neural network tools
+The `cnn` module contains layers and operations typically used in convolutional neural networks -
+different activations may be picked up from the `nn` module. Among `cnn` operations we currently have creation of:
+- linear convolution layers, currently for tensors of dimension up to 3 (minibatch not included): `conv1d`, `conv2d`,
+`conv3d`, `depthWiseConv2d`, `separableConv2D`/`sconv2d`;
+- linear deconvolution layers, currently `deconv1d`, `deconv2d`, `deconv3d`;
+- pooling, e.g. `maxPooling2D`, `avgPooling1D`;
+- specialized reshaping methods: `batchToSpace`, `spaceToDepth`, `col2Im` and the like;
+- upsampling, currently represented by the `upsampling2d` operation;
+- local response normalization: `localResponseNormalization`, currently for 2d convolutional layers only.
+
+Convolution and deconvolution operations are specified by a number of static parameters like kernel size,
+dilation, having or not having bias etc. To facilitate the creation process, we pack the required parameters into
+easily constructable and alterable configuration objects. Desired activations may be borrowed from the `nn` module. So,
+for example, if we want to create a 3x3 convolutional layer with `relu` activation, we may proceed as follows:
+```java
+Conv2DConfig config2d = Conv2DConfig.builder().kW(3).kH(3).pW(2).pH(2).build();
+SDVariable convolution2dLinear = sameDiff.cnn.conv2d(input, weights, config2d);
+SDVariable convolution2dOutput = sameDiff.nn.relu(convolution2dLinear);
+```
+In the first line, we construct a convolution configuration using its builder. We specify the
+kernel size (this is mandatory) and an optional padding size, keeping the other settings at their defaults (unit stride,
+no dilation, no bias, `NCHW` data format). We then employ this configuration to create a linear convolution with
+predefined `SDVariables` for input and weights; the shape of `weights` is to be tuned to that of `input` and to
+`config2d` beforehand. Thus, if in the above example `input` has shape, say, `[-1, nIn, height, width]`, then `weights`
+are to have the form `[nIn, nOut, 3, 3]` (because we have a 3x3 convolution kernel). The shape of the resulting variable
+`convolution2dLinear` will be predetermined by these parameters (in our case, it will be `[-1, nOut, height, width]`).
+Finally, in the last line we apply a `relu` activation.
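+
+Pooling may be set up in a similar way. The following is only a sketch - it assumes a `Pooling2DConfig` built
+analogously to `Conv2DConfig` above, so check the javadoc of your version for the exact builder fields and method names:
+```java
+// 2x2 max-pooling with stride 2, applied to the activation created above
+Pooling2DConfig poolConfig = Pooling2DConfig.builder().kH(2).kW(2).sH(2).sW(2).build();
+SDVariable pooled = sameDiff.cnn.maxPooling2d(convolution2dOutput, poolConfig);
+```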
+
+### `rnn` - recurrent neural networks
+
+This module contains arguably the most sophisticated methods in the framework. Currently, it allows you to create:
+- simple recurrent units, using the `sru` and `sruCell` methods;
+- LSTM units, using `lstmCell`, `lstmBlockCell` and `lstmLayer`;
+- GRU (gated recurrent unit) cells, using the `gru` method.
+
+As of now, recurrent operations require special configuration objects as input, in which you need to pack all the
+variables that will be used in a unit. This is subject to change in later versions. For instance, to
+create a simple recurrent unit, you need to proceed like this:
+```java
+SRUConfiguration sruConfig = new SRUConfiguration(input, weights, bias, init);
+SDVariable sruOutput = samediff.rnn().sru(sruConfig);
+```
+Here, the arguments in the `SRUConfiguration` constructor are variables that are to be defined beforehand. Obviously,
+their shapes should be matching, and these shapes predetermine the shape of `sruOutput`.
+
+### `loss` - loss functions
+In this branch we keep common loss functions. Most loss functions may be created quite simply, like this:
+```java
+SDVariable logLoss = sameDiff.loss.logLoss("logLoss", label, predictions);
+```
+where `label` and `predictions` are `SDVariable`'s. A `String` name is a mandatory parameter in most `loss` methods,
+yet it may be set to `null` - in this case, the name will be generated automatically. You may also create weighted loss
+functions by adding another `SDVariable` parameter containing the weights, as well as specify a reduction method (see
+below) for the loss over the minibatch. Thus, a full-fledged `logLoss` operation may look like:
+```java
+SDVariable wLogLossMean = sameDiff.loss.logLoss("wLogLossMean", label, predictions, weights, LossReduce.MEAN_BY_WEIGHT);
+```
+Some loss operations may allow/require further arguments, depending on their type: e.g. a dimension along which the
+loss is to be computed (as in `cosineLoss`), or some real-valued parameters.
+
+As for the reduction methods over the minibatch, there are currently 4 of them available. First, loss values for
+each sample of the minibatch are computed; then they are multiplied by the weights (if specified); and finally one of
+the following routines takes place:
+- `NONE` - leaving the resulting (weighted) loss values as-is; the result is an `INDArray` with the length of the
+minibatch: `loss = weights * loss_per_sample`.
+- `SUM` - summing the values, producing a scalar result: `sum_loss = sum(weights * loss_per_sample)`.
+- `MEAN_BY_WEIGHT` - first computes the sum as above, and then divides it by the sum of all weights, producing a scalar
+value: `mean_loss = sum(weights * loss_per_sample) / sum(weights)`. If weights are not
+specified, they all are set to `1.0`, and this reduction is equivalent to taking the mean loss value over the minibatch.
+- `MEAN_BY_NONZERO_WEIGHT_COUNT` - divides the weighted sum by the number of nonzero weights, producing a scalar:
+`mean_count_loss = sum(weights * loss_per_sample) / count(weights != 0)`. Useful e.g. when you want to compute the mean
+only over a subset of *valid* samples, setting the weights to either `0.` or `1.` (see the sketch below). When weights
+are not given, it simply produces the mean, and is thus equivalent to `MEAN_BY_WEIGHT`.
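+
+For instance, a sketch of the masking use case from the last bullet, with one weight per sample fed through a
+placeholder (the names and shapes are illustrative):
+```java
+// one weight per sample: 1.0 for samples that should count towards the mean, 0.0 for those to be ignored
+SDVariable mask = sameDiff.placeHolder("mask", DataType.FLOAT, -1, 1);
+SDVariable maskedLoss = sameDiff.loss.logLoss("maskedLoss", label, predictions, mask,
+        LossReduce.MEAN_BY_NONZERO_WEIGHT_COUNT);
+```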
+
+## The *don'ts* of operations
+
+In order for `SameDiff` operations to work properly, several main rules are to be upheld. Failing to do so may result
+in an exception or, even worse, in working code producing undesired results. Everything we mention in the current
+section describes what **you'd better not** do.
+
+- All variables in an operation have to belong to the same instance of `SameDiff` (see the [variables](./samediff/variables)
+section on how variables are added to a `SameDiff` instance). In other words, **you'd better not** do this:
+```java
+SDVariable x = sameDiff0.var("x", DataType.FLOAT, 1);
+SDVariable y = sameDiff1.placeHolder("y", DataType.FLOAT, 1);
+SDVariable z = x.add(y);
+```
+- Preferably, a new variable should be created for the result of an operation or a chain of operations. In other words,
+**you'd better not** redefine existing variables, **and better not** leave operations whose results are not assigned.
+That is, try to **avoid** code like this:
+```java
+SDVariable z = x.add(y);
+//DON'T!!!
+z.mul(2);
+x = z.mul(y);
+```
+A properly working version of the above code (if we wished to obtain 2xy + 2y² in an unusual way) would be
+```java
+SDVariable z = x.add(y);
+SDVariable _2z = z.mul(2);
+SDVariable w = _2z.mul(y);
+```
+To learn more about why it works this way, see our [graph section](./samediff/graph).
diff --git a/docs/samediff/templates/variables.md b/docs/samediff/templates/variables.md
index 3a309c698..5b52900ca 100644
--- a/docs/samediff/templates/variables.md
+++ b/docs/samediff/templates/variables.md
@@ -60,6 +60,7 @@ throw an exception otherwise. Also, as we shall discuss just below, all the `SDV
 supposed to be of floating point type.
 
 ## Common features of variables
+
 Before we go to the differences between variables, let us first look at the properties they all share
 
 - All variables are ultimately derived from an instance of `SameDiff`, serving as parts of its
 [graph](./samediff/graphs). In fact, each variable has a `SameDiff` as one of its fields.
@@ -87,13 +88,22 @@ SDVariable weights = samediff.var("weights", DataType.FLOAT, 784, 10);
 ```
 adds a variable constituting of a 784x10 array of `float` numbers - weights for a single layer MNIST perceptron
 in this case - to a pre-existing `SameDiff` instance `samediff`.
-However, this way the values within a variable will be set as zeros. If we wish them to be filled, say, with random
-numbers, we need to add a weight initialization scheme, for instance like that:
+
+However, this way the values within a variable will be set to zeros. You may also create a variable with values from
+a preset `INDArray`. Say,
+```java
+SDVariable weights = samediff.var("weights", Nd4j.randn(784, 10).div(28));
+```
+will create a variable filled with normally distributed random numbers with standard deviation `1/28`. You may put
+any other array creation method instead of `randn`, or any preset array, of course. Also, you may use some popular
+initialization scheme, like so:
+
 ```java
 SDVariable weights = samediff.var("weights", new XavierInitScheme('c', 784, 10), DataType.FLOAT, 784, 10);
 ```
 Now, the weights will be randomly initialized using the Xavier scheme. There are other ways to create and
-fill variables, you may see a full reference in our [javadoc](https://deeplearning4j.org/api/latest/).
+fill variables: you may look them up in the 'known subclasses' section of our
+[javadoc](https://deeplearning4j.org/api/latest/org/nd4j/weightinit/WeightInitScheme.html).
 
 ### Constants