Drop stale documentation (#438)

The current documentation can be found at https://deeplearning4j.konduit.ai/ and will soon also be in a new git repository. The documentation in this repository has been stale for a while and keeping it around will only serve to confuse people.

Signed-off-by: Paul Dubs <paul.dubs@gmail.com>
2020-05-08 07:39:41 +02:00 · 2020-05-08 07:39:41 +02:00 · f1232f8221
commit f1232f8221
parent 53920a0724
154 changed files with 0 additions and 14839 deletions
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,62 +0,0 @@
-# DL4J auto-generated documentation
-
-## Building
-
-Run `./gen_all_docs.sh` to generate documentation from source for all supported projects. For each documentation module, files will be put into a `doc_sources` folder where they are staged for copying to the primary docs repository. Note that the autogen docs require Python 2.
-
-To deploy a new version of documentation, first make sure to set `$DL4J_DOCS_DIR` to your local copy of 
-https://github.com/eclipse/deeplearning4j-docs and set `$DL4J_VERSION` to a URI-friendly version string such as `v100-RC` (note the lack of decimals). Then run `./copy-to-dl4j-docs.sh`. This puts documentation
-into the right folders and you can use `git` to create a PR and update the live docs.
-
-The structure of this project (template files, generating code, mkdocs YAML) is closely aligned
-with the [Keras documentation](https://keras.io) and heavily inspired by the [Keras docs repository](https://github.com/keras-team/keras/tree/master/docs).
-
-## File structure
-
-Each major module or library in Eclipse Deeplearning4j has its own folder. Inside that folder are three essential files:
-
-* `templates/`
-* `pages.json`
-* `README.md`
-
-Note that the folder names don't exactly match up with the modules in the `pom.xml` definitions across DL4J. This is because some of the documentation is consolidated (such as DataVec) or omitted due to its experimental status or because it is low-level in the code.
-
-Templates must maintain a flat file structure. This is to accommodate Jekyll collections when the docs are published. Don't worry about having similarly named files in different doc modules - the module name is prepended when the docs are generated.
-
-## Creating templates
-
-Each template has a Jekyll header at the top:
-
-```markdown
----
-title: Deeplearning4j Autoencoders
-short_title: Autoencoders
-description: Supported autoencoder configurations.
-category: Models
-weight: 3
----
-```
-
-All of these definitions are necessary. 
-
-* `title` is the HTML title that appears for a Google result or at the top of the browser window.
-* `short_title` is a short name for simple navigation in the user guide.
-* `description` is the text that appears below the title in a search engine result.
-* `category` is the high-level category in the user guide.
-* `weight` determines the order in which the doc appears in navigation; the larger the weight, the lower the listing.
-
-## Creating links
-
-**All links to other docs need to be relative.** This prolongs the life of the documentation and reduces maintenance. The basic structure of a link to another doc looks like:
-
-```
-<module name>-<file name>
-```
-
-So if you created a DataVec doc with the name `iterators.md` in the `datavec` module, your relative link will look like:
-
-```
-./datavec-iterators
-```
-
-Note the omission of the file extension `.md`. Jekyll automatically generates a clean URL for us to use.
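The link convention in the removed `docs/README.md` (`<module name>-<file name>`, relative, with the `.md` extension dropped) can be expressed as a small function. This is a minimal sketch for illustration only; `DocLinks` and `relativeLink` are hypothetical names and not part of the DL4J docs tooling:

```java
// Hypothetical helper (not part of the DL4J docs tooling): builds the relative link
// described in the removed docs/README.md, i.e. "./<module name>-<file name>".
final class DocLinks {
    private DocLinks() {}

    static String relativeLink(String module, String fileName) {
        // Jekyll generates clean URLs, so the ".md" extension is omitted from links.
        String base = fileName.endsWith(".md")
                ? fileName.substring(0, fileName.length() - ".md".length())
                : fileName;
        return "./" + module + "-" + base;
    }
}
```

For the README's own example, `DocLinks.relativeLink("datavec", "iterators.md")` returns `"./datavec-iterators"`.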
--- a/docs/__init__.py
+++ b/docs/__init__.py
@@ -1,16 +0,0 @@
-################################################################################
-# Copyright (c) 2015-2019 Skymind, Inc.
-#
-# This program and the accompanying materials are made available under the
-# terms of the Apache License, Version 2.0 which is available at
-# https://www.apache.org/licenses/LICENSE-2.0.
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
-# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
-# License for the specific language governing permissions and limitations
-# under the License.
-#
-# SPDX-License-Identifier: Apache-2.0
-################################################################################
-
--- a/docs/arbiter/README.md
+++ b/docs/arbiter/README.md
@@ -1,10 +0,0 @@
-# arbiter documentation
-
-To generate docs into the `arbiter/doc_sources` folder, first `cd docs` then run:
-
-```shell
-python generate_docs.py \
-    --project arbiter \
-    --code ../arbiter \
-    --out_language en
-```
--- a/docs/arbiter/pages.json
+++ b/docs/arbiter/pages.json
@@ -1,61 +0,0 @@
-{
-  "excludes": [
-    "abstract"
-  ],
-  "indices": [
-  ],
-  "pages": [
-    {
-      "page": "overview.md",
-      "class": []
-    },
-    {
-      "page": "visualization.md",
-      "class": []
-    },
-    {
-      "page": "parameter-spaces.md",
-      "class": [
-        "arbiter-core/src/main/java/org/deeplearning4j/arbiter/optimize/parameter/continuous/ContinuousParameterSpace.java",
-        "arbiter-core/src/main/java/org/deeplearning4j/arbiter/optimize/parameter/discrete/DiscreteParameterSpace.java",
-        "arbiter-core/src/main/java/org/deeplearning4j/arbiter/optimize/parameter/integer/IntegerParameterSpace.java",
-        "arbiter-core/src/main/java/org/deeplearning4j/arbiter/optimize/parameter/BooleanSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/dropout/AlphaDropoutSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/dropout/GaussianDropoutSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/dropout/GaussianNoiseSpace.java",
-        "arbiter-core/src/main/java/org/deeplearning4j/arbiter/optimize/parameter/FixedValue.java",
-        "arbiter-core/src/main/java/org/deeplearning4j/arbiter/optimize/parameter/math/MathOp.java",
-        "arbiter-core/src/main/java/org/deeplearning4j/arbiter/optimize/parameter/math/PairMathOp.java"
-      ]
-    },
-    {
-      "page": "layer-spaces.md",
-      "class": [
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/ActivationLayerSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/AutoEncoderLayerSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/BatchNormalizationSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/Bidirectional.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/CenterLossOutputLayerSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/ConvolutionLayerSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/Deconvolution2DLayerSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/DenseLayerSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/DropoutLayerSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/EmbeddingLayerSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/FeedForwardLayerSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/GlobalPoolingLayerSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/GravesBidirectionalLSTMLayerSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/GravesLSTMLayerSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/LSTMLayerSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/LocalResponseNormalizationLayerSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/LossLayerSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/OCNNLayerSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/OutputLayerSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/RnnOutputLayerSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/SeparableConvolution2DLayerSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/SubsamplingLayerSpace.java",
-        "arbiter-deeplearning4j/src/main/java/org/deeplearning4j/arbiter/layers/VariationalAutoencoderLayerSpace.java"
-      ]
-    }
-  ]
-}
-
--- a/docs/arbiter/templates/layer-spaces.md
+++ b/docs/arbiter/templates/layer-spaces.md
@@ -1,11 +0,0 @@
----
-title: Arbiter Layer Spaces
-short_title: Layer Spaces
-description: Set search spaces for layers.
-category: Arbiter
-weight: 1
----
-
-## Layer Spaces
-
-{{autogenerated}}
--- a/docs/arbiter/templates/overview.md
+++ b/docs/arbiter/templates/overview.md
@@ -1,257 +0,0 @@
----
-title: Arbiter Overview
-short_title: Overview
-description: Introduction to using Arbiter for hyperparameter optimization.
-category: Arbiter
-weight: 0
----
-
-## Hyperparameter Optimization
-
-Machine learning techniques have a set of parameters that have to be chosen before any training can begin. These parameters are referred to as hyperparameters. Some examples of hyperparameters are ‘k’ in k-nearest-neighbors and the regularization parameter in Support Vector Machines. Neural Networks, in particular, have a wide variety of hyperparameters. Some of these define the architecture of the neural network, like the number of layers and their size. Others define the learning process, like the learning rate and regularization.
-
-Traditionally these choices are made based on existing rules of thumb or after extensive trial and error, both of which are less than ideal. Undoubtedly the choice of these parameters can have a significant impact on the results obtained after learning. Hyperparameter optimization attempts to automate this process using software that applies search strategies. 
-
-## Arbiter
-
-Arbiter is part of the DL4J Suite of Machine Learning/Deep Learning tools for the enterprise. It is dedicated to the hyperparameter optimization of neural networks created in or imported into DL4J. It allows users to set up search spaces for the hyperparameters and run either grid search or random search to select the best configuration based on a given scoring metric.
-
-### When to use Arbiter?
-Arbiter can be used to find good-performing models, potentially saving you time tuning your model's hyperparameters, at the expense of greater computational time. Note, however, that Arbiter doesn't completely automate the neural network tuning process; the user still needs to specify a search space. This search space defines the range of valid values for each hyperparameter (for example, the minimum and maximum allowable learning rate). If this search space is chosen poorly, Arbiter may not be able to find any good models.
-
-Add the following to your `pom.xml` to include Arbiter in your project, setting the version to the latest release of the DL4J stack:
-
-```xml
-<!-- Arbiter - used for hyperparameter optimization (grid/random search) -->
-<dependency>
-    <groupId>org.deeplearning4j</groupId>
-    <artifactId>arbiter-deeplearning4j</artifactId>
-    <version>{{page.version}}</version>
-</dependency>
-<dependency>
-    <groupId>org.deeplearning4j</groupId>
-    <artifactId>arbiter-ui_2.11</artifactId>
-    <version>{{page.version}}</version>
-</dependency>
-```
-
-Arbiter also comes with a handy UI that helps visualize the results from the optimization runs.
-
-As a prerequisite to using Arbiter, users should be familiar with the `NeuralNetConfiguration`, `MultiLayerConfiguration`, and `ComputationGraphConfiguration` classes in DL4J.
-
-## Usage
-This section will provide an overview of the important constructs necessary to use Arbiter. The sections that follow will dive into the details. 
-
-At the highest level, setting up hyperparameter optimization involves setting up an OptimizationConfiguration and running it via IOptimizationRunner. 
-
-Below is some code that demonstrates the fluent builder pattern in OptimizationConfiguration:
-
-```java
-OptimizationConfiguration configuration = new OptimizationConfiguration.Builder()
-    .candidateGenerator(candidateGenerator)
-    .dataSource(dataSourceClass,dataSourceProperties)
-    .modelSaver(modelSaver)
-    .scoreFunction(scoreFunction)
-    .terminationConditions(terminationConditions)
-    .build();
-```
-
-As indicated above, setting up an optimization configuration requires:
-* `CandidateGenerator`: proposes candidates (i.e., hyperparameter configurations) for evaluation. Candidates are generated based on some strategy; currently random search and grid search are supported. Valid configurations for the candidates are determined by the hyperparameter space associated with the candidate generator.
-* `DataSource`: used under the hood to provide training and test data to the generated candidates.
-* `ModelSaver`: specifies how the results of each hyperparameter optimization run should be saved, for example to local disk, to a database, to HDFS, or simply in memory.
-* `ScoreFunction`: a single-number metric that is minimized or maximized to determine the best candidate, e.g. model loss or classification accuracy.
-* `TerminationCondition`: determines when hyperparameter optimization should stop, e.g. after a given number of candidates have been evaluated or a certain amount of computation time has passed.
-
-The optimization configuration is then passed to an optimization runner along with a task creator. 
-
-If the generated candidates are MultiLayerNetworks, this is set up as follows:
-
-```java        
-IOptimizationRunner runner = new LocalOptimizationRunner(configuration, new MultiLayerNetworkTaskCreator());
-```
-
-Alternatively, if the generated candidates are ComputationGraphs, this is set up as follows:
-
-```java        
-IOptimizationRunner runner = new LocalOptimizationRunner(configuration, new ComputationGraphTaskCreator());
-```
-
-Currently, the only option available for the runner is the LocalOptimizationRunner, which is used to execute learning on a single machine (i.e., in the current JVM). In principle, other execution methods (for example, on Spark or cloud computing machines) could be implemented.
-
-To summarize, here are the steps to set up a hyperparameter optimization run:
-
-1. Specify a hyperparameter search space.
-1. Specify a candidate generator for the hyperparameter search space.
-1. Specify a data source.
-1. Specify a model saver.
-1. Specify a score function.
-1. Specify a termination condition.
-1. Use steps 2 to 6 to construct an `OptimizationConfiguration`.
-1. Run the configuration with an optimization runner.
-
-Steps 3 to 6 can be done in any order; steps 7 and 8 must come last, in that order.
-
-
-## Hyperparameter search space 
-
-Arbiter’s `ParameterSpace<T>` class defines the acceptable ranges of values a given hyperparameter may take. A `ParameterSpace` can be simple, like one that defines a continuous range of double values (say, for the learning rate), or complex, with multiple nested parameter spaces, as in the case of a `MultiLayerSpace` (which defines a search space for a `MultiLayerConfiguration`).
-
-
-## MultiLayerSpace and ComputationGraphSpace
-
-MultiLayerSpace and ComputationGraphSpace are Arbiter’s counterparts to DL4J’s MultiLayerConfiguration and ComputationGraphConfiguration. They are used to set up parameter spaces for valid hyperparameters in MultiLayerConfiguration and ComputationGraphConfiguration.
-
-In addition, users can set the number of epochs or an early stopping configuration to indicate when training on each candidate neural net should stop. If both an EarlyStoppingConfiguration and the number of epochs are specified, early stopping takes precedence.
-
-Setting up a MultiLayerSpace or ComputationGraphSpace is fairly straightforward once the user is familiar with Integer, Continuous and Discrete parameter spaces, LayerSpaces, and UpdaterSpaces.
-
-The only caveat is that while weightConstraints, l1Bias and l2Bias can be set as part of the NeuralNetConfiguration, in a MultiLayerSpace they have to be set up on a per-layer (LayerSpace) basis. In general, every property/hyperparameter available through the builder takes either a fixed value or a parameter space of that type. This means that almost every aspect of the MultiLayerConfiguration can be swept to test out a variety of architectures and initial values.
-
-Here is a simple example of a MultiLayerSpace:
-
-```java
-ParameterSpace<Boolean> biasSpace = new DiscreteParameterSpace<>(new Boolean[]{true, false});
-ParameterSpace<Integer> firstLayerSize = new IntegerParameterSpace(10,30);
-ParameterSpace<Integer> secondLayerSize = new MathOp<>(firstLayerSize, Op.MUL, 3);
-ParameterSpace<Double> firstLayerLR = new ContinuousParameterSpace(0.01, 0.1);
-ParameterSpace<Double> secondLayerLR = new MathOp<>(firstLayerLR, Op.ADD, 0.2);
-
-MultiLayerSpace mls =
-    new MultiLayerSpace.Builder().seed(12345)
-            .hasBias(biasSpace)
-            .layer(new DenseLayerSpace.Builder().nOut(firstLayerSize)
-                    .updater(new AdamSpace(firstLayerLR))
-                    .build())
-            .layer(new OutputLayerSpace.Builder().nOut(secondLayerSize)
-                    .updater(new AdamSpace(secondLayerLR))
-                    .build())
-            .setInputType(InputType.feedForward(10))
-            .numEpochs(20).build(); // Data will be fit for a fixed number of epochs
-```
-
-Of particular note is Arbiter’s ability to vary the number of layers in the MultiLayerSpace. Here is a simple example demonstrating this, which also shows how to set up a parameter search space for a weighted loss function:
-
-```java
-ILossFunction[] weightedLossFns = new ILossFunction[]{
-    new LossMCXENT(Nd4j.create(new double[]{1, 0.1})),
-    new LossMCXENT(Nd4j.create(new double[]{1, 0.05})),
-    new LossMCXENT(Nd4j.create(new double[]{1, 0.01}))};
-
-DiscreteParameterSpace<ILossFunction> weightLossFn = new DiscreteParameterSpace<>(weightedLossFns);
-MultiLayerSpace mls =
-    new MultiLayerSpace.Builder().seed(12345)
-        .addLayer(new DenseLayerSpace.Builder().nIn(10).nOut(10).build(),
-            new IntegerParameterSpace(2, 5)) //2 to 5 identical layers
-        .addLayer(new OutputLayerSpace.Builder()
-            .iLossFunction(weightLossFn)
-            .nIn(10).nOut(2).build())
-        .backprop(true).pretrain(false).build();
-```
-
-The two to five layers created above will be identical (stacked). Arbiter currently does not support creating independent layers.
-
-Finally it is also possible to create a fixed number of identical layers as shown in the following example:
-
-```java
-DiscreteParameterSpace<Activation> activationSpace = new DiscreteParameterSpace<>(new Activation[]{Activation.IDENTITY, Activation.ELU, Activation.RELU});
-MultiLayerSpace mls = new MultiLayerSpace.Builder().updater(new Sgd(0.005))
-    .addLayer(new DenseLayerSpace.Builder().activation(activationSpace).nIn(10).nOut(10).build(),
-        new FixedValue<Integer>(3))
-    .addLayer(new OutputLayerSpace.Builder().iLossFunction(new LossMCXENT()).nIn(10).nOut(2).build())
-    .backprop(true).build();
-```
-
-In this example, a grid search will create three separate architectures. They are identical in every way except for the activation function chosen for the non-output layers. Again, note that the layers created in each architecture are identical (stacked).
-
-Creating a ComputationGraphSpace is very similar to creating a MultiLayerSpace. However, only fixed graph structures are currently supported.
-
-Here is a simple example demonstrating setting up a ComputationGraphSpace:
-
-```java
-ComputationGraphSpace cgs = new ComputationGraphSpace.Builder()
-        .updater(new SgdSpace(new ContinuousParameterSpace(0.0001, 0.1)))
-        .l2(new ContinuousParameterSpace(0.2, 0.5))
-        .addInputs("in")
-        .addLayer("0", new DenseLayerSpace.Builder().nIn(10).nOut(10)
-                .activation(new DiscreteParameterSpace<>(Activation.RELU, Activation.TANH)).build(), "in")
-        // Output layer "1" takes the output of layer "0" as its input
-        .addLayer("1", new OutputLayerSpace.Builder().nIn(10).nOut(10)
-                .activation(Activation.SOFTMAX).build(), "0")
-        .setOutputs("1").setInputTypes(InputType.feedForward(10)).build();
-```
-
-### JSON serialization
-
-MultiLayerSpace, ComputationGraphSpace and OptimizationConfiguration have `toJson` methods as well as `fromJson` methods. You can store the JSON representation for later use.
-
-## Specifying a candidate generator
-As mentioned earlier, Arbiter currently supports grid search and random search. Setting up a random search is straightforward:
-
-```java
-MultiLayerSpace mls; // the search space defined earlier
-// ...
-CandidateGenerator candidateGenerator = new RandomSearchGenerator(mls);
-```
-Setting up a grid search is also simple. With a grid search, the user also specifies a discretization count and a mode. The discretization count determines how many values a continuous parameter is binned into. For example, a continuous parameter in the range [0, 1] is converted to [0.0, 0.5, 1.0] with a discretization count of 3. The mode determines the order in which the candidates are generated: `Sequential` (in order) or `RandomOrder`. In sequential order, the first hyperparameter changes most rapidly and, consequently, the last hyperparameter changes least rapidly. Note that both modes produce the same set of candidates, just in a different order.
-
-Here is a simple example of how a grid search is set up with a discretization count of 4 in sequential order:
-
-```java
-CandidateGenerator candidateGenerator = new GridSearchCandidateGenerator(mls, 4,
- GridSearchCandidateGenerator.Mode.Sequential);
-```
-
-
-## Specifying a data source
-
-The DataSource interface defines where the data for training the different candidates comes from. It is very straightforward to implement. Note that a no-argument constructor must be defined. Depending on the needs of the user, the DataSource implementation can be configured with properties, like the size of the minibatch. A simple implementation of the data source that uses the MNIST dataset is available in the example repo, which is covered later in this guide.
-It is important to note here that the number of epochs (as well as early stopping configurations) can be set via the MultiLayerSpace and ComputationGraphSpace builders. 
-
-
-## Specifying a model/result saver 
-
-Arbiter currently supports either saving models to local disk (`FileModelSaver`) or storing results in memory (`InMemoryResultSaver`). `InMemoryResultSaver` is not recommended for large models.
-
-Setting them up is trivial. The `FileModelSaver` constructor takes a path as a `String`. It saves the configuration, parameters and score to `baseDir/0/`, `baseDir/1/`, etc., where the index is given by `OptimizationResult.getIndex()`. `InMemoryResultSaver` requires no arguments.
-
-## Specifying a score function
-There are three main classes for score functions: `EvaluationScoreFunction`, `ROCScoreFunction` and `RegressionScoreFunction`. `EvaluationScoreFunction` uses a DL4J evaluation metric. Available metrics are ACCURACY, F1, PRECISION, RECALL, GMEASURE and MCC. Here is a simple example that uses accuracy:
-```java
-ScoreFunction scoreFunction = new EvaluationScoreFunction(Evaluation.Metric.ACCURACY);
-```
-
-`ROCScoreFunction` calculates AUC (area under the ROC curve) or AUPRC (area under the precision/recall curve) on the test set. Different ROC types (ROC, ROCBinary and ROCMultiClass) are supported. `RegressionScoreFunction` is used for regression and supports all DL4J RegressionEvaluation metrics (MSE, MAE, RMSE, RSE, PC, R2). Here are simple examples that use AUC and MSE, respectively:
-```java
-ScoreFunction rocScore = new ROCScoreFunction(ROCScoreFunction.ROCType.BINARY, ROCScoreFunction.Metric.AUC);
-ScoreFunction regressionScore = new RegressionScoreFunction(RegressionEvaluation.Metric.MSE);
-```
-
-## Specifying a termination condition
-
-Arbiter currently supports only two kinds of termination conditions: `MaxTimeCondition` and `MaxCandidatesCondition`. `MaxTimeCondition` specifies a time after which hyperparameter optimization is terminated. `MaxCandidatesCondition` specifies a maximum number of candidates after which hyperparameter optimization is terminated. Termination conditions can be specified as a list; hyperparameter optimization stops if any of the conditions is met.
-
-Here is a simple example where the run is terminated after fifteen minutes or after ten candidates have been trained, whichever comes first:
-
-```java
-TerminationCondition[] terminationConditions = { 
-    new MaxTimeCondition(15, TimeUnit.MINUTES),
-    new MaxCandidatesCondition(10)
-};
-```
-
-
-## Example Arbiter Run on MNIST data
-
-The DL4J example repo contains a BasicHyperparameterOptimizationExample on MNIST data, which users can walk through. This example also covers setting up the Arbiter UI. Arbiter uses the same storage and persistence approach as DL4J's UI, which is described in the DL4J UI documentation. The UI can be accessed at http://localhost:9000/arbiter.
-
-
-## Tips for hyperparameter tuning
-
-Please refer to the excellent section on hyperparameter optimization in the CS231n class notes from Stanford. A summary of these techniques is below:
-* Prefer random search over grid search. For a comparison of random and grid search methods, see Random Search for Hyper-parameter Optimization (Bergstra and Bengio, 2012).
-* Run search from coarse to fine: start with a coarse parameter search with one or two epochs, pick the best candidate for a finer search with more epochs, and iterate.
-* Use LogUniformDistribution for certain hyperparameters, such as the learning rate and L2.
-* Be mindful of values that fall close to the borders of the parameter search space.
-
-
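The termination behaviour described in the removed `overview.md` (optimization stops as soon as any listed condition is met) can be illustrated with a minimal sketch. `TerminationSketch`, `Progress`, `StopCondition`, `maxTime`, and `maxCandidates` are hypothetical stand-ins chosen for this example; they are not Arbiter's `MaxTimeCondition`/`MaxCandidatesCondition` API:

```java
import java.time.Duration;
import java.util.List;

// Minimal sketch of "stop as soon as any condition is met" semantics.
// All names here are hypothetical; this is not Arbiter's TerminationCondition API.
final class TerminationSketch {
    private TerminationSketch() {}

    /** Snapshot of optimization progress that the conditions are checked against. */
    static final class Progress {
        final Duration elapsed;
        final int candidatesEvaluated;

        Progress(Duration elapsed, int candidatesEvaluated) {
            this.elapsed = elapsed;
            this.candidatesEvaluated = candidatesEvaluated;
        }
    }

    /** A single stopping rule. */
    interface StopCondition {
        boolean isMet(Progress progress);
    }

    /** Met once the elapsed wall-clock time reaches the limit. */
    static StopCondition maxTime(Duration limit) {
        return p -> p.elapsed.compareTo(limit) >= 0;
    }

    /** Met once the given number of candidates have been evaluated. */
    static StopCondition maxCandidates(int limit) {
        return p -> p.candidatesEvaluated >= limit;
    }

    /** The run stops if ANY condition in the list is met. */
    static boolean shouldStop(List<StopCondition> conditions, Progress progress) {
        for (StopCondition condition : conditions) {
            if (condition.isMet(progress)) {
                return true;
            }
        }
        return false;
    }
}
```

With a 15-minute limit and a 10-candidate limit, `shouldStop` returns `true` once either ten candidates have been evaluated or fifteen minutes have elapsed, mirroring the example in the removed guide.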
--- a/docs/arbiter/templates/parameter-spaces.md
+++ b/docs/arbiter/templates/parameter-spaces.md
@@ -1,11 +0,0 @@
----
-title: Arbiter Parameter Spaces
-short_title: Parameter Spaces
-description: Set search spaces for parameters.
-category: Arbiter
-weight: 1
----
-
-## Parameter Spaces
-
-{{autogenerated}}
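The grid search description in the removed `overview.md` (a discretization count of 3 turns the range [0, 1] into [0.0, 0.5, 1.0], and `Sequential` mode changes the first hyperparameter most rapidly) can be illustrated with a minimal sketch. `GridSketch`, `discretize`, and `sequentialGrid` are hypothetical names; the sketch assumes evenly spaced bins that include both endpoints and is not Arbiter's `GridSearchCandidateGenerator` code:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of grid search over discretized continuous ranges.
// Names are hypothetical; this is not Arbiter's GridSearchCandidateGenerator.
final class GridSketch {
    private GridSketch() {}

    /** Bins [min, max] into `count` evenly spaced values, including both endpoints. */
    static double[] discretize(double min, double max, int count) {
        if (count < 1) throw new IllegalArgumentException("count must be >= 1");
        if (count == 1) return new double[] {min};
        double[] values = new double[count];
        double step = (max - min) / (count - 1);
        for (int i = 0; i < count; i++) {
            values[i] = min + i * step;
        }
        values[count - 1] = max; // avoid floating-point drift at the upper bound
        return values;
    }

    /**
     * Enumerates every combination in "sequential" order: the first hyperparameter
     * changes most rapidly and the last one changes least rapidly.
     */
    static List<double[]> sequentialGrid(List<double[]> axes) {
        int total = 1;
        for (double[] axis : axes) total *= axis.length;
        List<double[]> candidates = new ArrayList<>();
        for (int n = 0; n < total; n++) {
            double[] candidate = new double[axes.size()];
            int rest = n;
            // Mixed-radix counter: digit p selects the value of hyperparameter p.
            for (int p = 0; p < axes.size(); p++) {
                double[] axis = axes.get(p);
                candidate[p] = axis[rest % axis.length];
                rest /= axis.length;
            }
            candidates.add(candidate);
        }
        return candidates;
    }
}
```

For example, `discretize(0.0, 1.0, 3)` yields `{0.0, 0.5, 1.0}`, and for the axes `{1, 2}` and `{10, 20, 30}`, `sequentialGrid` yields `{1, 10}`, `{2, 10}`, `{1, 20}`, and so on, so the first hyperparameter cycles fastest. A random-order mode would simply shuffle the same list.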
--- a/docs/copy-to-dl4j-docs.sh
+++ b/docs/copy-to-dl4j-docs.sh
@@ -1,34 +0,0 @@
-#!/usr/bin/env bash
-
-################################################################################
-# Copyright (c) 2015-2018 Skymind, Inc.
-#
-# This program and the accompanying materials are made available under the
-# terms of the Apache License, Version 2.0 which is available at
-# https://www.apache.org/licenses/LICENSE-2.0.
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
-# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
-# License for the specific language governing permissions and limitations
-# under the License.
-#
-# SPDX-License-Identifier: Apache-2.0
-################################################################################
-
-# Make sure to set $DL4J_DOCS_DIR to your local copy of https://github.com/deeplearning4j/deeplearning4j-docs
-SOURCE_DIR=$(pwd)
-
-# print the current git status
-cd "$DL4J_DOCS_DIR"
-git status
-
-cd "$SOURCE_DIR"
-
-# each release is its own jekyll collection located in docs/_<version>
-DOCS_DEST="$DL4J_DOCS_DIR/docs/_$DL4J_VERSION"
-mkdir -p "$DOCS_DEST"
-echo "Copying to $DOCS_DEST"
-
-# copy the files directly inside each module's doc_sources folder (non-recursive: -maxdepth 1)
-find "$SOURCE_DIR"/*/doc_sources -maxdepth 1 -type f -exec cp '{}' "$DOCS_DEST" \;
--- a/docs/datavec/README.md
+++ b/docs/datavec/README.md
@@ -1,10 +0,0 @@
-# datavec documentation
-
-To generate docs into the `datavec/doc_sources` folder, first `cd docs` then run:
-
-```shell
-python generate_docs.py \
-    --project datavec \
-    --code ../datavec \
-    --out_language en
-```
--- a/docs/datavec/pages.json
+++ b/docs/datavec/pages.json
@@ -1,203 +0,0 @@
-{
-  "excludes": [
-    "abstract"
-  ],
-  "indices": [
-  ],
-  "pages": [
-    {
-      "page": "overview.md",
-      "class": []
-    },
-    {
-      "page": "normalization.md",
-      "module": [
-        "/../nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/dataset/api/preprocessor/"
-      ]
-    },
-    {
-      "page": "records.md",
-      "class": [
-        "datavec-api/src/main/java/org/datavec/api/records/impl/Record.java",
-        "datavec-api/src/main/java/org/datavec/api/records/impl/SequenceRecord.java"
-      ]
-    },
-    {
-      "page": "readers.md",
-      "class": [
-        "datavec-data/datavec-data-image/src/main/java/org/datavec/image/recordreader/ImageRecordReader.java",
-        "datavec-data/datavec-data-audio/src/main/java/org/datavec/audio/recordreader/NativeAudioRecordReader.java",
-        "datavec-data/datavec-data-audio/src/main/java/org/datavec/audio/recordreader/WavFileRecordReader.java",
-        "datavec-data/datavec-data-nlp/src/main/java/org/datavec/nlp/reader/TfidfRecordReader.java",
-        "datavec-api/src/main/java/org/datavec/api/records/reader/impl/FileRecordReader.java",
-        "datavec-api/src/main/java/org/datavec/api/records/reader/impl/LineRecordReader.java",
-        "datavec-api/src/main/java/org/datavec/api/records/reader/impl/ComposableRecordReader.java",
-        "datavec-api/src/main/java/org/datavec/api/records/reader/impl/csv/CSVRecordReader.java",
-        "datavec-api/src/main/java/org/datavec/api/records/reader/impl/csv/CSVRegexRecordReader.java",
-        "datavec-api/src/main/java/org/datavec/api/records/reader/impl/csv/CSVSequenceRecordReader.java",
-        "datavec-api/src/main/java/org/datavec/api/records/reader/impl/csv/CSVVariableSlidingWindowRecordReader.java",
-        "datavec-api/src/main/java/org/datavec/api/records/reader/impl/ConcatenatingRecordReader.java",
-        "datavec-api/src/main/java/org/datavec/api/records/reader/impl/transform/TransformProcessRecordReader.java",
-        "datavec-api/src/main/java/org/datavec/api/records/reader/impl/transform/TransformProcessSequenceRecordReader.java",
-        "datavec-api/src/main/java/org/datavec/api/records/reader/impl/collection/CollectionRecordReader.java",
-        "datavec-api/src/main/java/org/datavec/api/records/reader/impl/collection/CollectionSequenceRecordReader.java",
-        "datavec-api/src/main/java/org/datavec/api/records/reader/impl/collection/ListStringRecordReader.java",
-        "datavec-api/src/main/java/org/datavec/api/records/reader/impl/misc/LibSvmRecordReader.java",
-        "datavec-api/src/main/java/org/datavec/api/records/reader/impl/misc/MatlabRecordReader.java",
-        "datavec-api/src/main/java/org/datavec/api/records/reader/impl/misc/SVMLightRecordReader.java",
-        "datavec-api/src/main/java/org/datavec/api/records/reader/impl/regex/RegexLineRecordReader.java",
-        "datavec-api/src/main/java/org/datavec/api/records/reader/impl/regex/RegexSequenceRecordReader.java"
-      ]
-    },
-    {
-      "page": "executors.md",
-      "class": [
-        "datavec-local/src/main/java/org/datavec/local/transforms/LocalTransformExecutor.java",
-        "datavec-spark/src/main/java/org/datavec/spark/transform/SparkTransformExecutor.java"
-      ]
-    },
-    {
-      "page": "schema.md",
-      "class": [
-        "datavec-api/src/main/java/org/datavec/api/transform/schema/Schema.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/schema/SequenceSchema.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/schema/InferredSchema.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/join/Join.java"
-      ]
-    },
-    {
-      "page": "transforms.md",
-      "class": [
-        "datavec-api/src/main/java/org/datavec/api/transform/TransformProcess.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/categorical/CategoricalToIntegerTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/categorical/CategoricalToOneHotTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/categorical/IntegerToCategoricalTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/categorical/PivotTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/categorical/StringToCategoricalTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/column/AddConstantColumnTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/column/DuplicateColumnsTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/column/RemoveAllColumnsExceptForTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/column/RemoveColumnsTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/column/RenameColumnsTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/column/ReorderColumnsTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/doubletransform/DoubleColumnsMathOpTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/doubletransform/DoubleMathFunctionTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/doubletransform/DoubleMathOpTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/integer/IntegerColumnsMathOpTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/integer/IntegerMathOpTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/integer/IntegerToOneHotTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/integer/ReplaceEmptyIntegerWithValueTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/integer/ReplaceInvalidWithIntegerTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/longtransform/LongColumnsMathOpTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/longtransform/LongMathOpTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/nlp/TextToCharacterIndexTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/nlp/TextToTermIndexSequenceTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/sequence/SequenceDifferenceTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/sequence/SequenceMovingWindowReduceTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/sequence/SequenceOffsetTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/string/AppendStringColumnTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/string/ChangeCaseStringTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/string/ConcatenateStringColumns.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/string/MapAllStringsExceptListTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/string/RemoveWhiteSpaceTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/string/ReplaceEmptyStringTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/string/ReplaceStringTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/string/StringListToCategoricalSetTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/string/StringListToCountsNDArrayTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/string/StringListToIndicesNDArrayTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/string/StringMapTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/time/DeriveColumnsFromTimeTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/time/StringToTimeTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/time/TimeMathOpTransform.java",
-
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/condition/ConditionalCopyValueTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/condition/ConditionalReplaceValueTransform.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/condition/ConditionalReplaceValueTransformWithDefault.java",
-
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/doubletransform/ConvertToDouble.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/integer/ConvertToInteger.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/string/ConvertToString.java",
-
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/doubletransform/Log2Normalizer.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/doubletransform/MinMaxNormalizer.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/doubletransform/StandardizeNormalizer.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/transform/doubletransform/SubtractMeanNormalizer.java"
-      ]
-    },
-    {
-      "page": "operations.md",
-      "class": [
-        "datavec-api/src/main/java/org/datavec/api/transform/ops/AggregableCheckingOp.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/ops/AggregableMultiOp.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/ops/ByteWritableOp.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/ops/DispatchOp.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/ops/DispatchWithConditionOp.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/ops/DoubleWritableOp.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/ops/FloatWritableOp.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/ops/IntWritableOp.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/ops/LongWritableOp.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/ops/StringWritableOp.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/rank/CalculateSortedRank.java"
-      ]
-    },
-    {
-      "page": "conditions.md",
-      "class": [
-        "datavec-api/src/main/java/org/datavec/api/transform/condition/column/BooleanColumnCondition.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/condition/column/CategoricalColumnCondition.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/condition/column/DoubleColumnCondition.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/condition/column/InfiniteColumnCondition.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/condition/column/IntegerColumnCondition.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/condition/column/InvalidValueColumnCondition.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/condition/column/LongColumnCondition.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/condition/column/NaNColumnCondition.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/condition/column/NullWritableColumnCondition.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/condition/column/StringColumnCondition.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/condition/column/TimeColumnCondition.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/condition/column/TrivialColumnCondition.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/condition/sequence/SequenceLengthCondition.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/condition/string/StringRegexColumnCondition.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/condition/BooleanCondition.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/condition/SequenceConditionMode.java"
-      ]
-    },
-    {
-      "page": "filters.md",
-      "class": [
-        "datavec-api/src/main/java/org/datavec/api/transform/filter/Filter.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/filter/ConditionFilter.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/filter/FilterInvalidValues.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/filter/InvalidNumColumns.java"
-      ]
-    },
-    {
-      "page": "reductions.md",
-      "class": [
-        "datavec-api/src/main/java/org/datavec/api/transform/reduce/impl/GeographicMidpointReduction.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/stringreduce/StringReducer.java"
-      ]
-    },
-    {
-      "page": "serialization.md",
-      "class": [
-        "datavec-api/src/main/java/org/datavec/api/transform/serde/JsonSerializer.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/serde/YamlSerializer.java"
-      ]
-    },
-    {
-      "page": "visualization.md",
-      "class": [
-        "datavec-api/src/main/java/org/datavec/api/transform/ui/HtmlAnalysis.java",
-        "datavec-api/src/main/java/org/datavec/api/transform/ui/HtmlSequencePlotting.java"
-      ]
-    },
-    {
-      "page": "analysis.md",
-      "class": [
-        "datavec-spark/src/main/java/org/datavec/spark/transform/AnalyzeSpark.java",
-        "datavec-local/src/main/java/org/datavec/local/transforms/AnalyzeLocal.java"
-      ]
-    }
-  ]
-}
-
--- a/docs/datavec/templates/analysis.md
+++ b/docs/datavec/templates/analysis.md
@ -1,58 +0,0 @@
---
-title: DataVec Analysis
-short_title: Analysis
-description: Gather statistics on datasets.
-category: DataVec
-weight: 2
---
-
-## Analysis of data
-
-Sometimes datasets are too large or too abstract in their format to manually analyze and estimate statistics on certain columns or patterns. DataVec comes with helper utilities for analyzing data and computing maximums, minimums, means, and other useful metrics.
-
-## Using Spark for analysis
-
-If you have loaded your data into Apache Spark, DataVec has a special `AnalyzeSpark` class which can generate histograms, collect statistics, and return information about the quality of the data. Assuming you have already loaded your data into a Spark RDD, pass the `JavaRDD` and `Schema` to the class.
-
-If you are using DataVec in Scala and your data was loaded into a regular `RDD` class, you can convert it by calling `.toJavaRDD()` which returns a `JavaRDD`. If you need to convert it back, call `rdd()`.
-
-The code below demonstrates a few of the available analyses of a 2D (non-sequence) dataset in Spark, using the RDD `javaRdd` and the schema `mySchema`:
-
-```java
-import java.util.List;
-import org.datavec.spark.transform.AnalyzeSpark;
-import org.datavec.api.writable.Writable;
-import org.datavec.api.transform.analysis.*;
-import org.datavec.api.transform.quality.DataQualityAnalysis;
-
-// javaRdd is a JavaRDD<List<Writable>> and mySchema describes its columns
-int maxHistogramBuckets = 10;
-DataAnalysis analysis = AnalyzeSpark.analyze(mySchema, javaRdd, maxHistogramBuckets);
-
-DataQualityAnalysis qualityAnalysis = AnalyzeSpark.analyzeQuality(mySchema, javaRdd);
-
-Writable max = AnalyzeSpark.max(javaRdd, "myColumn", mySchema);
-
-int numSamples = 5;
-List<Writable> sample = AnalyzeSpark.sampleFromColumn(numSamples, "myColumn", mySchema, javaRdd);
-```
-
-Note that if you have sequence data, there are special methods for that as well:
-
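-```java
-// sequenceRdd is a JavaRDD<List<List<Writable>>> and seqSchema describes its columns
-SequenceDataAnalysis seqAnalysis = AnalyzeSpark.analyzeSequence(seqSchema, sequenceRdd);
-
-// Unique values of one column across all sequences
-List<Writable> uniqueValues = AnalyzeSpark.getUniqueSequence("myColumn", seqSchema, sequenceRdd);
-```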
-
-## Analyzing locally
-
-The `AnalyzeLocal` class mirrors its Spark counterpart and has a similar API. Instead of an RDD, it accepts a `RecordReader`, which it uses to iterate over the dataset.
-
-```java
-import org.datavec.api.transform.analysis.DataAnalysis;
-import org.datavec.local.transforms.AnalyzeLocal;
-
-int maxHistogramBuckets = 10;
-DataAnalysis analysis = AnalyzeLocal.analyze(mySchema, csvRecordReader, maxHistogramBuckets);
-```
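-
-To make these statistics concrete, here is a plain-Java sketch of what a per-column numeric analysis computes: count, minimum, maximum and mean. It is an illustration only, not DataVec's implementation; `ColumnStats` is a hypothetical helper, and it assumes the column values have already been parsed into doubles.
-
-```java
-import java.util.Arrays;
-import java.util.List;
-
-// Hypothetical helper, not part of DataVec: summary statistics for one numeric column
-final class ColumnStats {
-    final long count;
-    final double min;
-    final double max;
-    final double mean;
-
-    private ColumnStats(long count, double min, double max, double mean) {
-        this.count = count;
-        this.min = min;
-        this.max = max;
-        this.mean = mean;
-    }
-
-    // Single pass over the values, tracking min, max and a running sum
-    static ColumnStats of(List<Double> values) {
-        if (values.isEmpty()) {
-            throw new IllegalArgumentException("column has no values");
-        }
-        double min = Double.POSITIVE_INFINITY;
-        double max = Double.NEGATIVE_INFINITY;
-        double sum = 0.0;
-        for (double v : values) {
-            min = Math.min(min, v);
-            max = Math.max(max, v);
-            sum += v;
-        }
-        return new ColumnStats(values.size(), min, max, sum / values.size());
-    }
-
-    public static void main(String[] args) {
-        ColumnStats s = ColumnStats.of(Arrays.asList(2.0, 4.0, 9.0));
-        System.out.println(s.count + " " + s.min + " " + s.max + " " + s.mean); // 3 2.0 9.0 5.0
-    }
-}
-```
-
-DataVec's analysis classes also collect histograms and data-quality counts, which this sketch leaves out.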
-
-## Utilities
-
-{{autogenerated}}
--- a/docs/datavec/templates/conditions.md
+++ b/docs/datavec/templates/conditions.md
@ -1,11 +0,0 @@
---
-title: DataVec Conditions
-short_title: Conditions
-description: Rules for triggering operations and transformations.
-category: DataVec
-weight: 3
---
-
-## Available conditions
-
-{{autogenerated}}
--- a/docs/datavec/templates/executors.md
+++ b/docs/datavec/templates/executors.md
@ -1,43 +0,0 @@
---
-title: DataVec Executors
-short_title: Executors
-description: Execute ETL and vectorization locally or on Apache Spark.
-category: DataVec
-weight: 3
---
-
-## Local or remote execution?
-
-Because datasets are often large, you can choose the execution mechanism that best suits your needs. For example, if you are vectorizing a large training dataset, you can process it on a distributed Spark cluster. However, if you need to do real-time inference, DataVec also provides a local executor that doesn't require any additional setup.
-
-## Executing a transform process
-
-Once you've created your `TransformProcess` from your `Schema`, and you've either loaded your dataset into an Apache Spark `JavaRDD` or have a `RecordReader` that loads your dataset, you can execute the transform.
-
-Locally this looks like:
-
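-```java
-import java.util.ArrayList;
-import java.util.List;
-import org.datavec.api.writable.Writable;
-import org.datavec.local.transforms.LocalTransformExecutor;
-
-// The local executor works on in-memory lists, so read the records first
-List<List<Writable>> input = new ArrayList<>();
-while (recordReader.hasNext()) {
-    input.add(recordReader.next());
-}
-
-List<List<Writable>> transformed = LocalTransformExecutor.execute(input, transformProcess);
-// executeToSequence expects a TransformProcess that converts the records into sequences
-List<List<List<Writable>>> transformedSeq = LocalTransformExecutor.executeToSequence(input, transformProcess);
-
-List<List<Writable>> joined = LocalTransformExecutor.executeJoin(join, leftData, rightData);
-```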
-
-When using Spark this looks like:
-
-```java
-import java.util.List;
-import org.apache.spark.api.java.JavaRDD;
-import org.datavec.api.writable.Writable;
-import org.datavec.spark.transform.SparkTransformExecutor;
-
-JavaRDD<List<Writable>> transformed = SparkTransformExecutor.execute(inputRdd, transformProcess);
-
-JavaRDD<List<List<Writable>>> transformedSeq = SparkTransformExecutor.executeToSequence(inputRdd, transformProcess);
-
-JavaRDD<List<Writable>> joined = SparkTransformExecutor.executeJoin(join, leftRdd, rightRdd);
-```
-
-## Available executors
-
-{{autogenerated}}
--- a/docs/datavec/templates/filters.md
+++ b/docs/datavec/templates/filters.md
@ -1,23 +0,0 @@
---
-title: DataVec Filters
-short_title: Filters
-description: Selection of data using conditions.
-category: DataVec
-weight: 3
---
-
-## Using filters
-
-Filters are part of a transform process and provide a DSL for selecting which examples to keep: a `ConditionFilter` removes every example for which its condition is true. Filters can be one-liners for single conditions or include complex boolean logic.
-
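-```java
-import java.util.*;
-import org.datavec.api.transform.condition.ConditionOp;
-import org.datavec.api.transform.condition.column.CategoricalColumnCondition;
-import org.datavec.api.transform.filter.ConditionFilter;
-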
-TransformProcess tp = new TransformProcess.Builder(inputDataSchema)
-    .filter(new ConditionFilter(new CategoricalColumnCondition("MerchantCountryCode", ConditionOp.NotInSet, new HashSet<>(Arrays.asList("USA","CAN")))))
-    .build();
-```
-
-You can also write your own filters by implementing the `Filter` interface, though more often you will want to write a custom condition instead.
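-
-The filtering rule can be sketched in plain Java as "drop every record for which the condition is true". This is a hypothetical analogue of `ConditionFilter`, not DataVec code: records are lists of strings, and the country code is assumed to sit in column 1.
-
-```java
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.HashSet;
-import java.util.List;
-import java.util.Set;
-import java.util.function.Predicate;
-
-// Hypothetical plain-Java analogue of ConditionFilter: remove every record for which the condition is true
-final class FilterSketch {
-    static List<List<String>> removeIf(List<List<String>> records, Predicate<List<String>> condition) {
-        List<List<String>> kept = new ArrayList<>();
-        for (List<String> record : records) {
-            if (!condition.test(record)) {
-                kept.add(record);
-            }
-        }
-        return kept;
-    }
-
-    public static void main(String[] args) {
-        int countryColumn = 1; // assumed position of MerchantCountryCode
-        Set<String> allowed = new HashSet<>(Arrays.asList("USA", "CAN"));
-        List<List<String>> records = Arrays.asList(
-            Arrays.asList("m1", "USA"),
-            Arrays.asList("m2", "FR"),
-            Arrays.asList("m3", "CAN"));
-        // The condition "country NotInSet {USA, CAN}" is true for FR, so that record is removed
-        List<List<String>> kept = removeIf(records, r -> !allowed.contains(r.get(countryColumn)));
-        System.out.println(kept); // [[m1, USA], [m3, CAN]]
-    }
-}
-```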
-
-## Available filters
-
-{{autogenerated}}
--- a/docs/datavec/templates/normalization.md
+++ b/docs/datavec/templates/normalization.md
@ -1,15 +0,0 @@
---
-title: DataVec Normalization
-short_title: Normalization
-description: Preparing data in the right shape and range for learning.
-category: DataVec
-weight: 5
---
-
-## Why normalize?
-
-Neural networks usually train best when the data they’re fed is normalized, for example constrained to a range between -1 and 1. There are several reasons for this. One is that nets are trained using gradient descent, and their activation functions usually have an active range somewhere between -1 and 1. Even when using an activation function that doesn’t saturate quickly, it is still good practice to constrain your values to this range to improve performance.
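-
-One common way to constrain a feature to [-1, 1] is min-max scaling, x' = lo + (x - min) * (hi - lo) / (max - min). The plain-Java sketch below illustrates that formula only; it is not DataVec's normalizer code, and mapping a constant column to the midpoint of the range is an assumption made here to avoid dividing by zero.
-
-```java
-import java.util.Arrays;
-
-// Illustrative min-max scaling into [lo, hi]; a sketch, not DataVec's MinMaxNormalizer implementation
-final class MinMaxScaling {
-    static double[] scale(double[] values, double lo, double hi) {
-        double min = Arrays.stream(values).min().orElseThrow(IllegalArgumentException::new);
-        double max = Arrays.stream(values).max().orElseThrow(IllegalArgumentException::new);
-        double[] out = new double[values.length];
-        for (int i = 0; i < values.length; i++) {
-            // A constant column has no range; map it to the midpoint of [lo, hi]
-            out[i] = (max == min) ? (lo + hi) / 2.0 : lo + (values[i] - min) * (hi - lo) / (max - min);
-        }
-        return out;
-    }
-
-    public static void main(String[] args) {
-        double[] scaled = scale(new double[] {0.0, 5.0, 10.0}, -1.0, 1.0);
-        System.out.println(Arrays.toString(scaled)); // [-1.0, 0.0, 1.0]
-    }
-}
-```
-
-In practice the min and max should be computed on the training data only and then reused for validation and test data.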
-
-## Available preprocessors
-
-{{autogenerated}}
--- a/docs/datavec/templates/operations.md
+++ b/docs/datavec/templates/operations.md
@ -1,33 +0,0 @@
---
-title: DataVec Operations
-short_title: Operations
-description: Implementations for advanced transformation.
-category: DataVec
-weight: 3
---
-
-## Usage
-
-Operations, such as a `Function`, help execute transforms and load data into DataVec. The concept of operations is low-level, meaning that most of the time you will not need to worry about them.
-
-## Loading data into Spark
-
-If you're using Apache Spark, functions such as `StringToWritablesFunction` convert each raw line of the dataset into a `List<Writable>`, which lets you load the data into a Spark `RDD`.
-
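-```java
-import java.util.List;
-import org.apache.spark.SparkConf;
-import org.apache.spark.api.java.JavaRDD;
-import org.apache.spark.api.java.JavaSparkContext;
-import org.datavec.api.records.reader.RecordReader;
-import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
-import org.datavec.api.writable.Writable;
-import org.datavec.spark.transform.misc.StringToWritablesFunction;
-
-SparkConf conf = new SparkConf().setAppName("DataVecCsvLoading"); // the master is usually supplied by spark-submit
-JavaSparkContext sc = new JavaSparkContext(conf);
-
-// CSVRecordReader parses a line; StringToWritablesFunction applies it to every line of the file
-RecordReader rr = new CSVRecordReader();
-String customerInfoPath = "/path/to/CustomerInfo.csv"; // placeholder: local or HDFS path to your CSV
-JavaRDD<List<Writable>> customerInfo = sc.textFile(customerInfoPath).map(new StringToWritablesFunction(rr));
-```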
-
-The above code loads a CSV file into a `JavaRDD<List<Writable>>`, with one `List<Writable>` per line. Once your RDD is loaded, you can transform it, perform joins and use reducers to wrangle the data any way you want.
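-
-Conceptually, the per-line step splits one line of text into column values. The hypothetical sketch below shows that idea in plain Java with strings standing in for `Writable`s; unlike `CSVRecordReader`, it does not handle quoted fields that contain commas.
-
-```java
-import java.util.ArrayList;
-import java.util.List;
-
-// Hypothetical, simplified analogue of the per-line parsing: split one CSV line into column values.
-// Quoted fields containing commas are NOT handled.
-final class CsvLineSketch {
-    static List<String> parse(String line) {
-        List<String> columns = new ArrayList<>();
-        for (String field : line.split(",", -1)) { // limit -1 keeps trailing empty fields
-            columns.add(field.trim());
-        }
-        return columns;
-    }
-
-    public static void main(String[] args) {
-        System.out.println(parse("12345, Alice ,USA")); // [12345, Alice, USA]
-        System.out.println(parse("1,,").size());       // 3
-    }
-}
-```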
-
-## Available ops
-
-{{autogenerated}}
--- a/docs/datavec/templates/overview.md
+++ b/docs/datavec/templates/overview.md
@ -1,121 +0,0 @@
---
-title: DataVec Overview
-short_title: Overview
-description: Overview of the vectorization and ETL library for DL4J.
-category: DataVec
-weight: 0
---
-
-## DataVec: A Vectorization and ETL Library
-
-DataVec solves one of the most important obstacles to effective machine or deep learning: getting data into a format that neural nets can understand. Nets understand vectors. Vectorization is the first problem many data scientists will have to solve to start training their algorithms on data. DataVec should be used for 99% of your data transformations; if you are not sure whether this applies to you, please consult the [gitter](https://gitter.im/deeplearning4j/deeplearning4j). DataVec supports most data formats you could want out of the box, but you may also implement your own custom record reader.
-
-If your data is in CSV (Comma Separated Values) format stored in flat files that must be converted to numeric values and ingested, or your data is a directory structure of labelled images, then DataVec is the tool to help you organize that data for use in DeepLearning4J.
-
-
-Please **read this entire page**, particularly the section [Reading Records](#record) below, before working with DataVec.
-
-
-
-## Introductory Video
-
-This video describes the conversion of image data to a vector. 
-
-<iframe width="420" height="315" src="https://www.youtube.com/embed/EHHtyRKQIJ0" frameborder="0" allowfullscreen></iframe>
-
-## Key Aspects
-
- * [DataVec](https://github.com/eclipse/deeplearning4j/tree/master/datavec) uses an input/output format system, similar in some ways to how Hadoop MapReduce uses InputFormat to determine InputSplits and RecordReaders; DataVec also provides RecordReaders to serialize data
- * Designed to support all major types of input data (text, CSV, audio, image and video) with these specific input formats
- * Uses an output format system to specify an implementation-neutral type of vector format (SVMLight, etc.)
- * Can be extended for specialized input formats (such as exotic image formats); i.e., you can write your own custom input format and let the rest of the codebase handle the transformation pipeline
- * Makes vectorization a first-class citizen
- * Built-in transformation tools to convert and normalize data
- * See the [DataVec Javadoc](/api/{{page.version}}/)
-
-There's a brief tutorial in [Reading Records, Iterating Over Data](#record) below.
-
-## A Few Examples
-
- * Convert the CSV-based UCI Iris dataset into svmLight open vector text format
- * Convert the MNIST dataset from raw binary files to the svmLight text format.
- * Convert raw text into the Metronome vector format
- * Convert raw text into TF-IDF based vectors in a text vector format {svmLight, metronome}
- * Convert raw text into the word2vec in a text vector format {svmLight, metronome}
-
-## Targeted Vectorization Engines
-
- * Any CSV to vectors with a scriptable transform language
- * MNIST to vectors
- * Text to vectors
-    * TF-IDF
-    * Bag of Words
-    * word2vec
-
-## CSV Transformation Engine
-
-If your data is numeric and appropriately formatted, then `CSVRecordReader` may be sufficient. If, however, your data has non-numeric fields, such as strings representing booleans (T/F) or strings for labels, then a schema transformation will be required. DataVec can use Apache [Spark](http://spark.apache.org/) to perform transform operations, and it also provides a local executor. *Note: you do not need to know the internals of Spark to be successful with DataVec transforms.*
-
-## Schema Transformation Video
-
-A video tutorial of a simple DataVec transform along with code is available below.
-<iframe width="560" height="315" src="https://www.youtube.com/embed/MLEMw2NxjxE" frameborder="0" allowfullscreen></iframe>
-
-## Example Java Code
-
-Our [examples](https://github.com/eclipse/deeplearning4j-examples) include a collection of DataVec examples.   
-
-
-
-## <a name="record">Reading Records, Iterating Over Data</a>
-
-The following code shows how to work with one example, raw images, transforming them into a format that will work well with DL4J and ND4J:
-
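-``` java
-import java.io.File;
-import org.datavec.api.io.labels.ParentPathLabelGenerator;
-import org.datavec.api.records.reader.RecordReader;
-import org.datavec.api.split.FileSplit;
-import org.datavec.image.recordreader.ImageRecordReader;
-
-// Instantiate the RecordReader. Specify height, width and channels of the images.
-// Note that for grayscale output, channels = 1, whereas for RGB images, channels = 3.
-// ParentPathLabelGenerator labels each image with the name of its parent directory.
-RecordReader recordReader = new ImageRecordReader(28, 28, 3, new ParentPathLabelGenerator());
-
-// Point to the data path: a directory with one subdirectory per label
-recordReader.initialize(new FileSplit(new File(labeledPath)));
-```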
-
-The RecordReader is a class in DataVec that helps convert the byte-oriented input into data that's oriented toward a record; i.e. a collection of elements that are fixed in number and indexed with a unique ID. Converting data to records is the process of vectorization. The record itself is a vector, each element of which is a feature.
-
-The [ImageRecordReader](https://github.com/eclipse/deeplearning4j/tree/master/datavec/blob/a64389c08396bb39626201beeabb7c4d5f9288f9/datavec-data/datavec-data-image/src/main/java/org/datavec/image/recordreader/ImageRecordReader.java) is a subclass of the RecordReader that scales each image to the height and width passed to its constructor, so with the parameters above, LFW images are scaled to 28 pixels x 28 pixels. You can change the dimensions to match your custom images by changing the parameters fed to the ImageRecordReader, as long as you make sure to adjust the `nIn` hyperparameter accordingly; for a flattened input, it equals image height x image width x number of channels.
-
-To attach labels to images, pass a label generator such as `ParentPathLabelGenerator` (which uses each file's parent directory name as its label) to the `ImageRecordReader` constructor. Labels are the supervised values (e.g. targets) used to train and validate neural net model results. Here are all the RecordReader extensions that come pre-built with DataVec (you can find them by right-clicking on `RecordReader` in IntelliJ, clicking `Go To` in the drop-down menu, and selecting `Implementations`):
-
-![Alt text](/images/guide/recordreader_extensions.png)
-
-The DataSetIterator is a Deeplearning4J interface for traversing the elements of a dataset. Iterators pass through the data, access each item sequentially, keep track of how far they have progressed by pointing to the current element, and advance to the next element with each step of the traversal.
-
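-``` java
-import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
-import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
-
-// DataVec to DL4J: label index 1, since a labelled ImageRecordReader emits [image, label] per record
-DataSetIterator iter = new RecordReaderDataSetIterator(recordReader, batchSize, 1, labels.size());
-```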
-
-The DataSetIterator iterates through input datasets, fetching one or more new examples with each iteration, and loading those examples into a DataSet object that neural nets can work with. Note that ImageRecordReader produces image data with 4 dimensions that matches DL4J's expected activations layout. Thus, each 28x28 RGB image is represented as a 4d array, with dimensions [minibatch, channels, height, width] = [1, 3, 28, 28]. Note that the constructor line above also specifies the number of labels possible.
-Note also that ImageRecordReader does not normalize the image data, so each pixel/channel value will be in the range 0 to 255 and should generally be normalized separately, for example using ND4J's ImagePreProcessingScaler or another normalizer.
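-
-The arithmetic behind the two points above can be sketched in plain Java: dividing raw 8-bit pixel values by 255 maps them into [0, 1], and a flattened 28 x 28 RGB image has 3 x 28 x 28 = 2352 values. This is an illustration under the assumption of a [0, 1] target range, not ND4J's ImagePreProcessingScaler code.
-
-```java
-import java.util.Arrays;
-
-// Illustrative arithmetic only (plain Java, no ND4J): scale raw 8-bit pixels into [0, 1]
-// and compute how many values one image contributes once flattened.
-final class PixelScaling {
-    static final double MAX_PIXEL = 255.0;
-
-    static double[] scaleToUnit(int[] pixels) {
-        double[] out = new double[pixels.length];
-        for (int i = 0; i < pixels.length; i++) {
-            out[i] = pixels[i] / MAX_PIXEL; // 0 -> 0.0, 255 -> 1.0
-        }
-        return out;
-    }
-
-    static long flattenedSize(long channels, long height, long width) {
-        return channels * height * width;
-    }
-
-    public static void main(String[] args) {
-        System.out.println(flattenedSize(3, 28, 28)); // 2352
-        System.out.println(Arrays.toString(scaleToUnit(new int[] {0, 255}))); // [0.0, 1.0]
-    }
-}
-```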
-
-`RecordReaderDataSetIterator` can take as parameters the specific recordReader you want (for images, sound, etc.) and the batch size. For supervised learning, it will also take a label index and the number of possible labels that can be applied to the input (for LFW, the number of labels is 5,749).
-
-## Execution
-
-DataVec runs both as a local serial process and as a scale-out process (for example on Apache Spark) with no changes to your transform code.
-
-## Targeted Vector Formats
-* svmLight
-* libsvm
-* Metronome
-
-## Built-In General Functionality
-* Understands how to take general text and convert it into vectors with stock techniques such as kernel hashing and TF-IDF
--- a/docs/datavec/templates/readers.md
+++ b/docs/datavec/templates/readers.md
@ -1,32 +0,0 @@
---
-title: DataVec Readers
-short_title: Readers
-description: Read individual records from different formats.
-category: DataVec
-weight: 2
---
-
-## Why readers?
-
-Readers iterate over the records of a dataset in storage and load the data into DataVec. Readers are useful beyond reading individual entries: what if you wanted to train a text generator on a whole corpus, or programmatically combine two entries into a new record? Reader implementations are also useful for complex file types or distributed storage mechanisms.
-
-Readers return `Writable` classes that describe each column in a `Record`. These classes are used to convert each record to a tensor/NDArray format.
-
-## Usage
-
-Each reader implementation extends `BaseRecordReader` and provides a simple API for selecting the next record in a dataset, acting similarly to iterators.
-
-Useful methods include:
-
-* `next`: Return the next record as a `List<Writable>`; `next(int)` returns a batch of records.
-* `nextRecord`: Return a single `Record`, optionally with `RecordMetaData`.
-* `reset`: Reset the underlying iterator.
-* `hasNext`: Iterator method to determine if another record is available.
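-
-The contract of these methods resembles a resettable iterator. The hypothetical in-memory reader below sketches that behaviour in plain Java; it is not a DataVec class, and strings stand in for `Writable`s.
-
-```java
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.List;
-import java.util.NoSuchElementException;
-
-// Hypothetical in-memory reader illustrating hasNext/next/next(int)/reset; not a DataVec class
-final class InMemoryRecordReaderSketch {
-    private final List<List<String>> records;
-    private int position = 0;
-
-    InMemoryRecordReaderSketch(List<List<String>> records) {
-        this.records = records;
-    }
-
-    boolean hasNext() {
-        return position < records.size();
-    }
-
-    // Returns a single record
-    List<String> next() {
-        if (!hasNext()) {
-            throw new NoSuchElementException();
-        }
-        return records.get(position++);
-    }
-
-    // Returns up to num records as a batch
-    List<List<String>> next(int num) {
-        List<List<String>> batch = new ArrayList<>();
-        while (batch.size() < num && hasNext()) {
-            batch.add(next());
-        }
-        return batch;
-    }
-
-    // Rewinds to the first record
-    void reset() {
-        position = 0;
-    }
-
-    public static void main(String[] args) {
-        InMemoryRecordReaderSketch reader = new InMemoryRecordReaderSketch(Arrays.asList(
-            Arrays.asList("1", "a"), Arrays.asList("2", "b"), Arrays.asList("3", "c")));
-        System.out.println(reader.next(2));   // [[1, a], [2, b]]
-        System.out.println(reader.next());    // [3, c]
-        reader.reset();
-        System.out.println(reader.hasNext()); // true
-    }
-}
-```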
-
-## Listeners
-
-You can hook a custom `RecordListener` to a record reader for debugging or visualization purposes. Pass your custom listener to the `addListener` base method immediately after initializing your reader.
-
-## Types of readers
-
-{{autogenerated}}
--- a/docs/datavec/templates/records.md
+++ b/docs/datavec/templates/records.md
@ -1,19 +0,0 @@
---
-title: DataVec Records
-short_title: Records
-description: How to use data records in DataVec.
-category: DataVec
-weight: 1
---
-
-## What is a record?
-
-In the DataVec world, a Record represents a single entry in a dataset. DataVec differentiates types of records to make data manipulation easier with built-in APIs: a 2D record is a single row of values, while a sequence record is an ordered list of such rows (time steps).
-
-## Using records
-
-Most of the time you do not need to interact with the record classes directly, unless you are manually iterating records for the purpose of forwarding through a neural network.
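-
-The difference in shape mirrors the executor signatures used in DataVec, where 2D data is a `List<List<Writable>>` and sequence data a `List<List<List<Writable>>>`. The hypothetical sketch below uses strings in place of `Writable`s and turns 2D rows into sequences by grouping them on a key column; it is an illustration, not DataVec's own conversion.
-
-```java
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.LinkedHashMap;
-import java.util.List;
-import java.util.Map;
-
-// Hypothetical sketch: a 2D record is one row (List<String>); a sequence is an ordered list of rows.
-// Grouping rows by a key column turns a 2D dataset into sequence data. Strings stand in for Writables.
-final class RecordShapes {
-    static List<List<List<String>>> groupIntoSequences(List<List<String>> rows, int keyColumn) {
-        Map<String, List<List<String>>> byKey = new LinkedHashMap<>(); // keeps first-seen key order
-        for (List<String> row : rows) {
-            byKey.computeIfAbsent(row.get(keyColumn), k -> new ArrayList<>()).add(row);
-        }
-        return new ArrayList<>(byKey.values());
-    }
-
-    public static void main(String[] args) {
-        List<List<String>> rows = Arrays.asList(
-            Arrays.asList("alice", "t1"), Arrays.asList("bob", "t1"), Arrays.asList("alice", "t2"));
-        List<List<List<String>>> sequences = groupIntoSequences(rows, 0);
-        System.out.println(sequences.size()); // 2
-        System.out.println(sequences.get(0)); // [[alice, t1], [alice, t2]]
-    }
-}
-```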
-
-## Types of records
-
-{{autogenerated}}
--- a/docs/datavec/templates/reductions.md
+++ b/docs/datavec/templates/reductions.md
@ -1,11 +0,0 @@
---
-title: DataVec Reductions
-short_title: Reductions
-description: Operations for reducing complexity in data.
-category: DataVec
-weight: 1
---
-
-## Available reductions
-
-{{autogenerated}}
--- a/docs/datavec/templates/schema.md
+++ b/docs/datavec/templates/schema.md
@ -1,60 +0,0 @@
---
-title: DataVec Schema
-short_title: Schema
-description: Schemas for datasets and transformation.
-category: DataVec
-weight: 1
---
-
-## Why use schemas?
-
-The unfortunate reality is that data is *dirty*. When trying to vectorize a dataset for deep learning, it is quite rare to find files that have zero errors. A schema is important for preserving the meaning of the data before using it for something like training a neural network.
-
-## Using schemas
-
-Schemas are primarily used for programming transformations. Before you can properly execute a `TransformProcess` you will need to pass the schema of the data being transformed. 
-
-An example of a schema for merchant records may look like:
-
-```java
-import java.util.Arrays;
-import org.datavec.api.transform.schema.Schema;
-
-Schema inputDataSchema = new Schema.Builder()
-    .addColumnsString("DateTimeString", "CustomerID", "MerchantID")
-    .addColumnInteger("NumItemsInTransaction")
-    .addColumnCategorical("MerchantCountryCode", Arrays.asList("USA","CAN","FR","MX"))
-    .addColumnDouble("TransactionAmountUSD",0.0,null,false,false)   //$0.0 or more, no maximum limit, no NaN and no Infinite values
-    .addColumnCategorical("FraudLabel", Arrays.asList("Fraud","Legit"))
-    .build();
-```
-
-## Joining schemas
-
-If you have two different datasets that you want to merge together, DataVec provides a `Join` class with different join strategies such as `Inner` or `RightOuter`.
-
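-```java
-import java.util.Arrays;
-import org.datavec.api.transform.join.Join;
-import org.datavec.api.transform.schema.Schema;
-import org.joda.time.DateTimeZone;
-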
-Schema customerInfoSchema = new Schema.Builder()
-    .addColumnLong("customerID")
-    .addColumnString("customerName")
-    .addColumnCategorical("customerCountry", Arrays.asList("USA","France","Japan","UK"))
-    .build();
-
-Schema customerPurchasesSchema = new Schema.Builder()
-    .addColumnLong("customerID")
-    .addColumnTime("purchaseTimestamp", DateTimeZone.UTC)
-    .addColumnLong("productID")
-    .addColumnInteger("purchaseQty")
-    .addColumnDouble("unitPriceUSD")
-    .build();
-
-Join join = new Join.Builder(Join.JoinType.Inner)
-    .setJoinColumns("customerID")
-    .setSchemas(customerInfoSchema, customerPurchasesSchema)
-    .build();
-```
-
-Once you've defined your join and loaded the data into DataVec, you complete the join with an executor, for example `LocalTransformExecutor.executeJoin` or `SparkTransformExecutor.executeJoin`.
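-
-What an inner join on `customerID` does can be sketched as a plain-Java hash join: index one side by the key, then emit a combined row for every matching pair and drop unmatched rows. This is a hypothetical illustration, not DataVec's `Join` implementation; the output layout (left row followed by the right row without its key column) is an assumption of this sketch.
-
-```java
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.HashMap;
-import java.util.List;
-import java.util.Map;
-
-// Hypothetical hash-join sketch of an inner join on one key column; not DataVec's Join implementation.
-// Output rows are the left row followed by the right row minus its (duplicate) key column.
-final class InnerJoinSketch {
-    static List<List<String>> innerJoin(List<List<String>> left, int leftKey,
-                                        List<List<String>> right, int rightKey) {
-        // Index the left side by key; one key may map to several rows
-        Map<String, List<List<String>>> index = new HashMap<>();
-        for (List<String> row : left) {
-            index.computeIfAbsent(row.get(leftKey), k -> new ArrayList<>()).add(row);
-        }
-        List<List<String>> joined = new ArrayList<>();
-        for (List<String> r : right) {
-            List<List<String>> matches = index.get(r.get(rightKey));
-            if (matches == null) {
-                continue; // inner join: right rows without a match are dropped
-            }
-            for (List<String> l : matches) {
-                List<String> out = new ArrayList<>(l);
-                for (int i = 0; i < r.size(); i++) {
-                    if (i != rightKey) {
-                        out.add(r.get(i));
-                    }
-                }
-                joined.add(out);
-            }
-        }
-        return joined;
-    }
-
-    public static void main(String[] args) {
-        List<List<String>> customers = Arrays.asList(
-            Arrays.asList("1", "Alice"), Arrays.asList("2", "Bob"));
-        List<List<String>> purchases = Arrays.asList(
-            Arrays.asList("1", "p42"), Arrays.asList("3", "p7"), Arrays.asList("1", "p9"));
-        System.out.println(innerJoin(customers, 0, purchases, 0)); // [[1, Alice, p42], [1, Alice, p9]]
-    }
-}
-```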
-
-## Classes and utilities
-
-DataVec comes with a few `Schema` classes and helper utilities for 2D and sequence types of data.
-
-{{autogenerated}}
--- a/docs/datavec/templates/serialization.md
+++ b/docs/datavec/templates/serialization.md
@ -1,32 +0,0 @@
---
-title: DataVec Serialization
-short_title: Serialization
-description: Data wrangling and mapping from one schema to another.
-category: DataVec
-weight: 1
---
-
-## Serializing transforms
-
-DataVec comes with the ability to serialize transforms, which makes them more portable when they're needed in production environments. A `TransformProcess` is serialized to a human-readable format such as JSON or YAML and can be saved as a file.
-
-## Serialization
-
-The code below shows how you can serialize the transform process `tp`.
-
-```java
-String serializedTransformString = tp.toJson();
-```
-
-## Deserialization
-
-When you want to reinstantiate the transform process, call the static `from<format>` method.
-
-```java
-TransformProcess tp = TransformProcess.fromJson(serializedTransformString);
-```
-
-
-## Available serializers
-
-{{autogenerated}}
--- a/docs/datavec/templates/transforms.md
+++ b/docs/datavec/templates/transforms.md
@ -1,64 +0,0 @@
---
-title: DataVec Transforms
-short_title: Transforms
-description: Data wrangling and mapping from one schema to another.
-category: DataVec
-weight: 1
---
-
-## Data wrangling
-
-Transformations are one of the key tools in DataVec. DataVec helps the user map a dataset from one schema to another, and provides a list of operations to convert types, format data, and convert a 2D dataset to sequence data.
-
-## Building a transform process
-
-A transform process requires a `Schema` to successfully transform data. Both the schema and transform process classes come with a helper `Builder` class, which is useful for organizing code and avoiding complex constructors.
-
-Combined, they look like the sample code below. Note that `inputDataSchema` is passed into the `Builder` constructor; your transform process will fail to compile without it.
-
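-```java
-import org.datavec.api.transform.TransformProcess;
-import org.datavec.api.transform.condition.ConditionOp;
-import org.datavec.api.transform.condition.column.CategoricalColumnCondition;
-import org.datavec.api.transform.condition.column.DoubleColumnCondition;
-import org.datavec.api.transform.filter.ConditionFilter;
-import org.datavec.api.transform.transform.time.DeriveColumnsFromTimeTransform;
-import org.datavec.api.writable.DoubleWritable;
-import org.joda.time.DateTimeFieldType;
-import org.joda.time.DateTimeZone;
-import java.util.Arrays;
-import java.util.HashSet;
-
-// inputDataSchema is the Schema describing the raw input columns
-TransformProcess tp = new TransformProcess.Builder(inputDataSchema)
-    .removeColumns("CustomerID","MerchantID")
-    // ConditionFilter removes records for which the condition is true: here, merchants outside USA/CAN
-    .filter(new ConditionFilter(new CategoricalColumnCondition("MerchantCountryCode", ConditionOp.NotInSet, new HashSet<>(Arrays.asList("USA","CAN")))))
-    .conditionalReplaceValueTransform(
-        "TransactionAmountUSD",     //Column to operate on
-        new DoubleWritable(0.0),    //New value to use, when the condition is satisfied
-        new DoubleColumnCondition("TransactionAmountUSD",ConditionOp.LessThan, 0.0)) //Condition: amount < 0.0
-    // Joda-Time pattern: yyyy = year, MM = month, dd = day of month (DD would be day of year)
-    .stringToTimeTransform("DateTimeString","yyyy-MM-dd HH:mm:ss.SSS", DateTimeZone.UTC)
-    .renameColumn("DateTimeString", "DateTime")
-    .transform(new DeriveColumnsFromTimeTransform.Builder("DateTime").addIntegerDerivedColumn("HourOfDay", DateTimeFieldType.hourOfDay()).build())
-    .removeColumns("DateTime")
-    .build();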
-```
-
-## Executing a transformation
-
-Different "backends" for executors are available. Using the `tp` transform process above, here's how you can execute it locally using plain DataVec.
-
-```java
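-import org.datavec.api.writable.Writable;
-import org.datavec.local.transforms.LocalTransformExecutor;
-import java.util.List;
-
-// originalData: your input records, one List<Writable> per example
-List<List<Writable>> processedData = LocalTransformExecutor.execute(originalData, tp);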
-```
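-
-Conceptually, an executor applies each step of the transform process, in order, to every record. The plain-Java sketch below is not DataVec: the class `LocalPipelineSketch`, the `Step` type and the map-based record representation are hypothetical. It mimics three of the steps in `tp` above: removing a column, keeping only USA/CAN merchants, and replacing negative amounts with 0.0.
-
-```java
-import java.util.HashMap;
-import java.util.List;
-import java.util.Map;
-import java.util.Set;
-import java.util.function.UnaryOperator;
-import java.util.stream.Collectors;
-
-// Hypothetical sketch (not DataVec): a transform process as an ordered list of steps.
-// Each step maps a list of records (column name -> value) to a new list of records.
-class LocalPipelineSketch {
-    interface Step extends UnaryOperator<List<Map<String, Object>>> {}
-
-    static Step removeColumns(String... columns) {
-        return records -> records.stream().map(r -> {
-            Map<String, Object> copy = new HashMap<>(r);
-            for (String c : columns) copy.remove(c);
-            return copy;
-        }).collect(Collectors.toList());
-    }
-
-    // Keeps records whose value is in the set, i.e. removes those NOT in the set.
-    static Step keepIfIn(String column, Set<String> allowed) {
-        return records -> records.stream()
-                .filter(r -> allowed.contains(r.get(column)))
-                .collect(Collectors.toList());
-    }
-
-    // Replaces negative Double values in a column with 0.0.
-    static Step clampNegative(String column) {
-        return records -> records.stream().map(r -> {
-            Map<String, Object> copy = new HashMap<>(r);
-            Object v = copy.get(column);
-            if (v instanceof Double && (Double) v < 0.0) copy.put(column, 0.0);
-            return copy;
-        }).collect(Collectors.toList());
-    }
-
-    // Applies the steps in order; each step sees the previous step's output.
-    static List<Map<String, Object>> execute(List<Map<String, Object>> data, List<Step> steps) {
-        List<Map<String, Object>> current = data;
-        for (Step step : steps) {
-            current = step.apply(current);
-        }
-        return current;
-    }
-
-    static Map<String, Object> row(int customerId, String country, double amountUsd) {
-        Map<String, Object> r = new HashMap<>();
-        r.put("CustomerID", customerId);
-        r.put("MerchantCountryCode", country);
-        r.put("TransactionAmountUSD", amountUsd);
-        return r;
-    }
-
-    public static void main(String[] args) {
-        List<Map<String, Object>> out = execute(
-                List.of(row(1, "USA", -5.0), row(2, "FRA", 12.5)),
-                List.of(removeColumns("CustomerID"),
-                        keepIfIn("MerchantCountryCode", Set.of("USA", "CAN")),
-                        clampNegative("TransactionAmountUSD")));
-        // One record remains: the USA merchant, amount replaced by 0.0, no CustomerID column.
-        System.out.println(out);
-    }
-}
-```
-
-Because the steps run in sequence, their order matters: a step can only refer to columns that still exist after the steps before it, which is also why inspecting the schema after each step is a useful debugging aid.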
-
-## Debugging
-
-Each operation in a transform process represents a "step" in schema changes. Sometimes the resulting transformation is not what you intended. You can debug this by printing the schema after each step of the transform process `tp`:
-
-```java
-// Print the schema after each step:
-int numActions = tp.getActionList().size();
-
-for (int i = 0; i < numActions; i++) {
-    System.out.println("\n\n==================================================");
-    System.out.println("-- Schema after step " + i + " (" + tp.getActionList().get(i) + ") --");
-
-    System.out.println(tp.getSchemaAfterStep(i));
-}
-```
-
-## Available transformations and conversions
-
-{{autogenerated}}
--- a/docs/datavec/templates/visualization.md
+++ b/docs/datavec/templates/visualization.md
@ -1,11 +0,0 @@
---
-title: DataVec Visualization
-short_title: Visualization
-description: UI for visualizing data in DataVec.
-category: DataVec
-weight: 10
---
-
-## Utilities
-
-{{autogenerated}}
--- a/docs/deeplearning4j-nlp/README.md
+++ b/docs/deeplearning4j-nlp/README.md
@ -1,10 +0,0 @@
-# deeplearning4j-nlp documentation
-
-To generate docs into the `deeplearning4j-nlp/doc_sources` folder, first `cd docs` then run:
-
-```shell
-python generate_docs.py \
-    --project deeplearning4j-nlp \
-    --code ../deeplearning4j \
-    --out_language en
-```
--- a/docs/deeplearning4j-nlp/pages.json
+++ b/docs/deeplearning4j-nlp/pages.json
@ -1,34 +0,0 @@
-{
-  "excludes": [
-    "abstract"
-  ],
-  "indices": [
-  ],
-  "pages": [
-    {
-      "page": "overview.md",
-      "class": []
-    },
-    {
-      "page": "word2vec.md",
-      "class": []
-    },
-    {
-      "page": "doc2vec.md",
-      "class": []
-    },
-    {
-      "page": "sentence-iterator.md",
-      "class": []
-    },
-    {
-      "page": "tokenization.md",
-      "class": []
-    },
-    {
-      "page": "vocabulary-cache.md",
-      "class": []
-    }
-  ]
-}
-
--- a/docs/deeplearning4j-nlp/templates/doc2vec.md
+++ b/docs/deeplearning4j-nlp/templates/doc2vec.md
@ -1,47 +0,0 @@
---
-title: Doc2Vec, or Paragraph Vectors, in Deeplearning4j
-short_title: Doc2Vec
-description: Doc2Vec and arbitrary documents for language processing in DL4J.
-category: Language Processing
-weight: 10
---
-
-## Doc2Vec, or Paragraph Vectors, in Deeplearning4j
-
-The main purpose of Doc2Vec is associating arbitrary documents with labels, so labels are required. Doc2vec is an extension of word2vec that learns to correlate labels and words, rather than words with other words. Deeplearning4j's implementation is intended to serve the Java, [Scala](./scala.html) and Clojure communities.
-
-The first step is coming up with a vector that represents the "meaning" of a document, which can then be used as input to a supervised machine learning algorithm to associate documents with labels.
-
-In the ParagraphVectors builder pattern, the `labels()` method points to the labels to train on. In the example below, you can see labels related to sentiment analysis:
-
-``` java
-    .labels(Arrays.asList("negative", "neutral","positive"))
-```
-
-Here's a test that trains paragraph vectors on labeled sentences (for a full working example, see [classification with paragraph vectors](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/nlp/paragraphvectors/ParagraphVectorsClassifierExample.java)):
-
-``` java
-    public void testDifferentLabels() throws Exception {
-        ClassPathResource resource = new ClassPathResource("/labeled");
-        File file = resource.getFile();
-        LabelAwareSentenceIterator iter = LabelAwareUimaSentenceIterator.createWithPath(file.getAbsolutePath());
-
-        TokenizerFactory t = new UimaTokenizerFactory();
-
-        ParagraphVectors vec = new ParagraphVectors.Builder()
-                .minWordFrequency(1).labels(Arrays.asList("negative", "neutral","positive"))
-                .layerSize(100)
-                .stopWords(new ArrayList<String>())
-                .windowSize(5).iterate(iter).tokenizerFactory(t).build();
-
-        vec.fit();
-
-        assertNotEquals(vec.lookupTable().vector("UNK"), vec.lookupTable().vector("negative"));
-        assertNotEquals(vec.lookupTable().vector("UNK"), vec.lookupTable().vector("positive"));
-        assertNotEquals(vec.lookupTable().vector("UNK"), vec.lookupTable().vector("neutral"));
-    }
-```
-
-### Further Reading
-
-* [Distributed Representations of Sentences and Documents](https://cs.stanford.edu/~quocle/paragraph_vector.pdf)
-* [Word2vec: A Tutorial](./word2vec)
--- a/docs/deeplearning4j-nlp/templates/overview.md
+++ b/docs/deeplearning4j-nlp/templates/overview.md
@ -1,43 +0,0 @@
---
-title: Deeplearning4j's NLP Functionality
-short_title: Overview
-description: Overview of language processing in DL4J
-category: Language Processing
-weight: 0
---
-
-## Deeplearning4j's NLP Functionality
-
-Although not designed to be comparable to tools such as Stanford CoreNLP or NLTK, Deeplearning4j does include some core text processing tools that are described here.
-
-Deeplearning4j's NLP relies on [ClearTK](https://cleartk.github.io/cleartk/), an open-source machine learning and natural language processing framework for the Apache [Unstructured Information Management Architecture](https://uima.apache.org/), or UIMA. UIMA enables us to perform language identification, language-specific segmentation, sentence boundary detection and entity detection (proper nouns: persons, corporations, places and things).
-
-### SentenceIterator
-
-There are several steps involved in processing natural language. The first is to iterate over your corpus to create a list of documents, which can be as short as a tweet or as long as a newspaper article. This is performed by a SentenceIterator, which looks like this:
-
-<script src="https://gist-it.appspot.com/https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/nlp/word2vec/Word2VecRawTextExample.java?slice=33:41"></script>
-
-The SentenceIterator encapsulates a corpus or text, organizing it, say, as one Tweet per line. It is responsible for feeding text piece by piece into your natural language processor. The SentenceIterator is not analogous to a similarly named class, the DataSetIterator, which creates a dataset for training a neural net. Instead, it creates a collection of strings by segmenting a corpus.
-
-### Tokenizer
-
-A Tokenizer further segments the text at the level of single words or, alternatively, n-grams. ClearTK contains the underlying tokenizers, along with tools such as part-of-speech (PoS) tagging and parse trees, which allow for both dependency and constituency parsing, like that employed by a recursive neural tensor network (RNTN).
-
-A Tokenizer is created and wrapped by a [TokenizerFactory](https://github.com/eclipse/deeplearning4j/blob/6f027fd5075e3e76a38123ae5e28c00c17db4361/deeplearning4j-scaleout/deeplearning4j-nlp/src/main/java/org/deeplearning4j/text/tokenization/tokenizerfactory/UimaTokenizerFactory.java). The default tokens are words separated by spaces. The tokenization process also involves some machine learning to differentiate between uses of ambiguous symbols such as the period, which can end a sentence but also marks abbreviations such as Mr. and vs.
-
-Both Tokenizers and SentenceIterators work with Preprocessors to deal with anomalies in messy text like Unicode, and to render such text, say, as lowercase characters uniformly.
-
-<script src="https://gist-it.appspot.com/https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/nlp/word2vec/Word2VecRawTextExample.java?slice=43:57"></script>
-
-
-
-### Vocab
-
-Each document has to be tokenized to create a vocab, the set of words that matter for that document or corpus. Those words are stored in the vocab cache, which contains statistics about a subset of words counted in the document, the words that "matter". The line separating significant and insignificant words is mobile, but the basic idea of distinguishing between the two groups is that words occurring only once (or fewer than, say, five times) are hard to learn and their presence represents unhelpful noise.
-
-The vocab cache stores metadata for methods such as Word2vec and Bag of Words, which treat words in radically different ways. Word2vec creates representations of words, or neural word embeddings, in the form of vectors that are hundreds of coefficients long. Those coefficients help neural nets predict the likelihood of a word appearing in any given context; for example, after another word. Here's Word2vec, configured:
-
-<script src="https://gist-it.appspot.com/https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/nlp/word2vec/Word2VecRawTextExample.java"></script>
-
-Once you obtain word vectors, you can feed them into a deep net for classification, prediction, sentiment analysis and the like.
--- a/docs/deeplearning4j-nlp/templates/sentence-iterator.md
+++ b/docs/deeplearning4j-nlp/templates/sentence-iterator.md
@ -1,56 +0,0 @@
---
-title: Sentence Iteration
-short_title: Sentence Iteration
-description: Iteration of words, documents, and sentences for language processing in DL4J.
-category: Language Processing
-weight: 10
---
-
-## Sentence iterator
-
-A [sentence iterator](./doc/org/deeplearning4j/word2vec/sentenceiterator/SentenceIterator.html) is used in both [Word2vec](./word2vec.html) and [Bag of Words](./bagofwords-tf-idf.html).
-
-It feeds bits of text, which are turned into vectors along the way, into a neural network, and it also covers the concept of documents in text processing.
-
-In natural-language processing, a document or sentence is typically used to encapsulate a context which an algorithm should learn.
-
-A few examples include analyzing Tweets and full-blown news articles. The purpose of the [sentence iterator](./doc/org/deeplearning4j/word2vec/sentenceiterator/SentenceIterator.html) is to divide text into processable bits. Note that the sentence iterator is input-agnostic, so bits of text (a document) can come from a file system, the Twitter API or Hadoop.
-
-Depending on how input is processed, the output of a sentence iterator will then be passed to a [tokenizer](./doc/org/deeplearning4j/word2vec/tokenizer/Tokenizer.html) for the processing of individual tokens, which are usually words, but could also be n-grams, skip-grams or other units. The tokenizer is created on a per-sentence basis by a [tokenizer factory](./doc/org/deeplearning4j/word2vec/tokenizer/TokenizerFactory.html). The tokenizer factory is what is passed into a text-processing vectorizer.
-
-Some typical examples are below:
-
-         SentenceIterator iter = new LineSentenceIterator(new File("your file"));
-
-This assumes that each line in a file is a sentence.
-
-You can also treat a list of strings as sentences, as follows:
-
-    Collection<String> sentences = ...;
-    SentenceIterator iter = new CollectionSentenceIterator(sentences);
-
-This will assume that each string is a sentence (document). Remember this could be a list of Tweets or articles -- both are applicable.
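-
-A minimal sketch of what a collection-backed sentence iterator does, written in plain Java so it runs without DL4J. The class `SimpleCollectionSentenceIterator` and its methods are hypothetical; they only mirror the ideas described here: one string per sentence (document), an optional preprocessor, and a reset for another pass over the corpus.
-
-```java
-import java.util.Collection;
-import java.util.Iterator;
-import java.util.List;
-import java.util.function.UnaryOperator;
-
-// Hypothetical sketch of a collection-backed sentence iterator (not the DL4J class).
-class SimpleCollectionSentenceIterator {
-    private final Collection<String> sentences;
-    private Iterator<String> it;
-    private UnaryOperator<String> preProcessor = UnaryOperator.identity();
-
-    SimpleCollectionSentenceIterator(Collection<String> sentences) {
-        this.sentences = sentences;
-        this.it = sentences.iterator();
-    }
-
-    // Optional cleanup applied to every sentence, e.g. lowercasing.
-    void setPreProcessor(UnaryOperator<String> preProcessor) {
-        this.preProcessor = preProcessor;
-    }
-
-    boolean hasNext() {
-        return it.hasNext();
-    }
-
-    // Each element of the collection is treated as one sentence (document).
-    String nextSentence() {
-        return preProcessor.apply(it.next());
-    }
-
-    // Start over from the first sentence, e.g. for another training epoch.
-    void reset() {
-        it = sentences.iterator();
-    }
-
-    public static void main(String[] args) {
-        SimpleCollectionSentenceIterator iter = new SimpleCollectionSentenceIterator(
-                List.of("First Tweet!", "A much longer news article..."));
-        iter.setPreProcessor(String::toLowerCase);
-        while (iter.hasNext()) {
-            System.out.println(iter.nextSentence());
-        }
-        // prints "first tweet!" then "a much longer news article..."
-    }
-}
-```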
-
-You can iterate over files as follows:
-          
-          SentenceIterator iter = new FileSentenceIterator(new File("your dir or file"));
-
-This will parse the files line by line and return individual sentences on each one.
-
-For anything complex, we recommend an actual machine-learning level pipeline, represented by the [UimaSentenceIterator](./doc/org/deeplearning4j/text/sentenceiterator/UimaSentenceIterator.html).
-
-The UimaSentenceIterator is capable of tokenization, part-of-speech tagging and lemmatization, among other things. The UimaSentenceIterator iterates over a set of files and can segment sentences. You can customize its behavior based on the AnalysisEngine passed into it.
-
-The AnalysisEngine is the [UIMA](http://uima.apache.org/) concept of a text-processing pipeline. Deeplearning4j comes with standard analysis engines for all of these common tasks, allowing you to customize which text is being passed in and how you define sentences. The AnalysisEngines are thread-safe versions of the [OpenNLP](http://opennlp.apache.org/) pipelines. We also include [ClearTK](https://cleartk.github.io/cleartk/)-based pipelines for handling common tasks.
-
-For those using UIMA or curious about it, this employs the ClearTK type system for tokens, sentences, and other annotations within the type system.
-
-Here's how to create a UimaSentenceIterator:
-
-    SentenceIterator iter = UimaSentenceIterator.create("path/to/your/text/documents");
-
-You can also instantiate it directly:
-
-    SentenceIterator iter = new UimaSentenceIterator(path,
-            AnalysisEngineFactory.createEngine(AnalysisEngineFactory.createEngineDescription(
-                    TokenizerAnnotator.getDescription(), SentenceAnnotator.getDescription())));
-
-For those familiar with UIMA, this uses uimaFIT extensively to create analysis engines. You can also create custom sentence iterators by implementing the SentenceIterator interface.
--- a/docs/deeplearning4j-nlp/templates/tokenization.md
+++ b/docs/deeplearning4j-nlp/templates/tokenization.md
@ -1,31 +0,0 @@
---
-title: Tokenization
-short_title: Tokenization
-description: Breaking text into individual words for language processing in DL4J.
-category: Language Processing
-weight: 10
---
-
-## What is Tokenization?
-
-Tokenization is the process of breaking text down into individual words. Word windows are also composed of tokens. [Word2Vec](./word2vec.html) can output text windows that comprise training examples for input into neural nets.
-
-## Example
-
-Here's an example of tokenization done with DL4J tools:
-                 
-    // tokenization with lemmatization, part-of-speech tagging and sentence segmentation
-    TokenizerFactory tokenizerFactory = new UimaTokenizerFactory();
-    Tokenizer tokenizer = tokenizerFactory.create("mystring");
-
-    // iterate over the tokens
-    while (tokenizer.hasMoreTokens()) {
-        String token = tokenizer.nextToken();
-    }
-
-    // alternatively, get the whole list of tokens at once
-    List<String> tokens = tokenizer.getTokens();
-
-The above snippet creates a tokenizer capable of stemming.
-
-In Word2Vec, that's the recommended way of creating a vocabulary, because it averts various vocabulary quirks, such as the singular and plural of the same noun being counted as two different words.
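-
-The `Tokenizer` contract used above (`hasMoreTokens()`, `nextToken()`, `getTokens()`) can be illustrated with a much simpler tokenizer. The plain-Java sketch below splits on whitespace, lowercases, and trims punctuation from the ends of each token; it does no lemmatization or tagging, and the class `WhitespaceTokenizerSketch` is hypothetical rather than part of DL4J.
-
-```java
-import java.util.ArrayList;
-import java.util.List;
-import java.util.Locale;
-
-// Hypothetical whitespace tokenizer mirroring the hasMoreTokens/nextToken/getTokens shape.
-class WhitespaceTokenizerSketch {
-    private final List<String> tokens;
-    private int position = 0;
-
-    WhitespaceTokenizerSketch(String text) {
-        List<String> result = new ArrayList<>();
-        for (String raw : text.trim().split("\\s+")) {
-            // Lowercase and strip leading/trailing punctuation, e.g. "Mr." -> "mr".
-            String cleaned = raw.toLowerCase(Locale.ROOT)
-                    .replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
-            if (!cleaned.isEmpty()) {
-                result.add(cleaned);
-            }
-        }
-        this.tokens = result;
-    }
-
-    boolean hasMoreTokens() {
-        return position < tokens.size();
-    }
-
-    String nextToken() {
-        return tokens.get(position++);
-    }
-
-    // Returns all tokens, regardless of how many have been consumed.
-    List<String> getTokens() {
-        return new ArrayList<>(tokens);
-    }
-
-    public static void main(String[] args) {
-        WhitespaceTokenizerSketch t = new WhitespaceTokenizerSketch("The cats sat.  The cat sat!");
-        // Without lemmatization, "cats" and "cat" remain two distinct tokens.
-        System.out.println(t.getTokens()); // [the, cats, sat, the, cat, sat]
-    }
-}
-```
-
-The output shows the quirk described above: without lemmatization, "cats" and "cat" would enter the vocabulary as separate words.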
--- a/docs/deeplearning4j-nlp/templates/vocabulary-cache.md
+++ b/docs/deeplearning4j-nlp/templates/vocabulary-cache.md
@ -1,26 +0,0 @@
---
-title: Vocabulary Cache
-short_title: Vocab Cache
-description: Mechanism for handling general NLP tasks in DL4J.
-category: Language Processing
-weight: 10
---
-
-# How the Vocab Cache Works
-
-The vocabulary cache, or vocab cache, is a mechanism for handling general-purpose natural-language tasks in Deeplearning4j, including normal TF-IDF, word vectors and certain information-retrieval techniques. The goal of the vocab cache is to be a one-stop shop for text vectorization, encapsulating techniques common to bag of words and word vectors, among others.
-
-The vocab cache handles storage of tokens, word-count frequencies, inverse-document frequencies and document occurrences via an inverted index. The InMemoryLookupCache is the reference implementation.
-
-In order to use a vocab cache as you iterate over text and index tokens, you need to decide whether each token should be included in the vocab. The usual criterion is whether the token occurs more often than a certain pre-configured frequency in the corpus. Below that frequency, an individual token isn't a vocab word; it remains just a token.
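-
-The frequency criterion can be sketched in plain Java: count every token, then promote only the tokens that reach a minimum count to vocab words, each with its own index. This is an illustration under that assumption, not the `InMemoryLookupCache` implementation; the class `VocabSketch` and the threshold value are hypothetical.
-
-```java
-import java.util.ArrayList;
-import java.util.HashMap;
-import java.util.LinkedHashMap;
-import java.util.List;
-import java.util.Map;
-
-// Hypothetical sketch: tokens are everything seen; vocab words are tokens that
-// occur at least minWordFrequency times, and each vocab word gets an index.
-class VocabSketch {
-    final Map<String, Integer> tokenCounts = new LinkedHashMap<>();
-    final Map<String, Integer> vocabIndex = new HashMap<>();
-    final List<String> indexToWord = new ArrayList<>();
-
-    // Track every token and its count.
-    void addTokens(List<String> tokens) {
-        for (String tok : tokens) {
-            tokenCounts.merge(tok, 1, Integer::sum);
-        }
-    }
-
-    // Promote frequent tokens to vocab words, assigning consecutive indices.
-    void buildVocab(int minWordFrequency) {
-        for (Map.Entry<String, Integer> e : tokenCounts.entrySet()) {
-            if (e.getValue() >= minWordFrequency && !vocabIndex.containsKey(e.getKey())) {
-                vocabIndex.put(e.getKey(), indexToWord.size());
-                indexToWord.add(e.getKey());
-            }
-        }
-    }
-
-    public static void main(String[] args) {
-        VocabSketch vocab = new VocabSketch();
-        vocab.addTokens(List.of("the", "cat", "sat", "on", "the", "mat", "the", "cat"));
-        vocab.buildVocab(2);
-        System.out.println(vocab.tokenCounts); // {the=3, cat=2, sat=1, on=1, mat=1}
-        System.out.println(vocab.indexToWord); // [the, cat]
-    }
-}
-```
-
-Here "sat", "on" and "mat" are tracked as tokens but, with a threshold of 2, never become vocab words.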
-
-We track tokens as well. In order to track tokens, do the following:
-
-        addToken(new VocabWord(1.0,"myword"));
-
-When you want to add a vocab word, do the following:
-
-        addWordToIndex(0, Word2Vec.UNK);
-        putVocabWord(Word2Vec.UNK);
-
-Adding the word to the index sets the index. Then you declare it as a vocab word. (Declaring it as a vocab word will pull the word from the index.)
--- a/docs/deeplearning4j-nlp/templates/word2vec.md
+++ b/docs/deeplearning4j-nlp/templates/word2vec.md
@ -1,495 +0,0 @@
---
-title: Word2Vec in Deeplearning4j
-short_title: Word2Vec
-description: Neural word embeddings for NLP in DL4J.
-category: Language Processing
-weight: 2
---
-
-## Word2Vec, Doc2vec & GloVe: Neural Word Embeddings for Natural Language Processing
-
-Contents
-
-* <a href="#intro">Introduction</a>
-* <a href="#embed">Neural Word Embeddings</a>
-* <a href="#crazy">Amusing Word2vec Results</a>
-* <a href="#just">**Just Give Me the Code**</a>
-* <a href="#anatomy">Anatomy of Word2Vec</a>
-* <a href="#setup">Setup, Load and Train</a>
-* <a href="#code">A Code Example</a>
-* <a href="#trouble">Troubleshooting & Tuning Word2Vec</a>
-* <a href="#use">Word2vec Use Cases</a>
-* <a href="#foreign">Foreign Languages</a>
-* <a href="#glove">GloVe (Global Vectors) & Doc2Vec</a>
-
-## <a name="intro">Introduction to Word2Vec</a>
-
-Word2vec is a two-layer neural net that processes text. Its input is a text corpus and its output is a set of vectors: feature vectors for words in that corpus. While Word2vec is not a [deep neural network](https://skymind.ai/wiki/neural-network), it turns text into a numerical form that deep nets can understand. [Deeplearning4j](./deeplearning4j-quickstart) implements a distributed form of Word2vec for Java and Scala, which works on Spark with GPUs. 
-
-Word2vec's applications extend beyond parsing sentences in the wild. It can be applied just as well to <a href="#sequence">genes, code, likes, playlists, social media graphs and other verbal or symbolic series</a> in which patterns may be discerned. 
-
-Why? Because words are simply discrete states like the other data mentioned above, and we are just looking for the transitional probabilities between those states: the likelihood that they will co-occur. So gene2vec, like2vec and follower2vec are all possible. With that in mind, the tutorial below will help you understand how to create neural embeddings for any group of discrete and co-occurring states.
-
-The purpose and usefulness of Word2vec is to group the vectors of similar words together in vectorspace. That is, it detects similarities mathematically. Word2vec creates vectors that are distributed numerical representations of word features, features such as the context of individual words. It does so without human intervention.
-
-Given enough data, usage and contexts, Word2vec can make highly accurate guesses about a word’s meaning based on past appearances. Those guesses can be used to establish a word's association with other words (e.g. "man" is to "boy" what "woman" is to "girl"), or cluster documents and classify them by topic. Those clusters can form the basis of search, [sentiment analysis](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/recurrent/word2vecsentiment/Word2VecSentimentRNN.java) and recommendations in such diverse fields as scientific research, legal discovery, e-commerce and customer relationship management. 
-
-The output of the Word2vec neural net is a vocabulary in which each item has a vector attached to it, which can be fed into a deep-learning net or simply queried to detect relationships between words. 
-
-When measuring [cosine similarity](https://skymind.ai/wiki/glossary#cosine), no similarity is expressed as a 90-degree angle, while total similarity (a value of 1) is a 0-degree angle, i.e. complete overlap: Sweden equals Sweden. Norway has a cosine similarity of 0.760124 with Sweden, the highest of any country.
-
-Here's a list of words associated with "Sweden" using Word2vec, in order of proximity:
-
-![Cosine Distance](/images/guide/sweden_cosine_distance.png) 
-
-The nations of Scandinavia and several wealthy, northern European, Germanic countries are among the top nine. 
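-
-Cosine similarity is the dot product of two vectors divided by the product of their lengths. It equals 1 when the vectors point in the same direction (a 0-degree angle) and 0 when they are orthogonal (a 90-degree angle). A minimal plain-Java sketch; the class name and the toy vectors are made up for illustration and are not real word embeddings.
-
-```java
-// Cosine similarity between two vectors: dot(a, b) / (|a| * |b|).
-class CosineSimilarity {
-    static double cosine(double[] a, double[] b) {
-        if (a.length != b.length) {
-            throw new IllegalArgumentException("Vectors must have the same length");
-        }
-        double dot = 0.0, normA = 0.0, normB = 0.0;
-        for (int i = 0; i < a.length; i++) {
-            dot += a[i] * b[i];
-            normA += a[i] * a[i];
-            normB += b[i] * b[i];
-        }
-        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
-    }
-
-    public static void main(String[] args) {
-        double[] x = {1.0, 2.0, 3.0};
-        double[] sameDirection = {2.0, 4.0, 6.0};
-        double[] orthogonal = {3.0, 0.0, -1.0};
-        System.out.println(cosine(x, sameDirection)); // approximately 1.0 (0-degree angle)
-        System.out.println(cosine(x, orthogonal));    // 0.0 (90-degree angle)
-    }
-}
-```
-
-Because only the angle matters, vectors of very different lengths can still have a similarity of 1, which is why the vector for a frequent word and the vector for a rare synonym can end up "close".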
-
-## <a name="embed">Neural Word Embeddings</a>
-
-The vectors we use to represent words are called *neural word embeddings*, and representations are strange. One thing describes another, even though those two things are radically different. As Elvis Costello said: "Writing about music is like dancing about architecture." Word2vec "vectorizes" about words, and by doing so it makes natural language computer-readable -- we can start to perform powerful mathematical operations on words to detect their similarities. 
-
-So a neural word embedding represents a word with numbers. It's a simple, yet unlikely, translation. 
-
-Word2vec is similar to an autoencoder, encoding each word in a vector, but rather than training against the input words through [reconstruction](https://skymind.ai/wiki/variational-autoencoder), word2vec trains words against other words that neighbor them in the input corpus.
-
-It does so in one of two ways, either using context to predict a target word (a method known as continuous bag of words, or CBOW), or using a word to predict a target context, which is called skip-gram. We use the latter method because it produces more accurate results on large datasets.
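-
-To make skip-gram concrete: each word in a sentence becomes a target, and the words around it become the context words the model learns to predict. The plain-Java sketch below only generates those (target, context) training pairs and does not train any vectors. It assumes a symmetric window of `window` words on each side of the target; the class `SkipGramPairs` is hypothetical.
-
-```java
-import java.util.ArrayList;
-import java.util.List;
-
-// Hypothetical sketch: generate skip-gram (target, context) pairs from a token list.
-// Assumes a symmetric window of `window` words on each side of the target.
-class SkipGramPairs {
-    static List<String[]> pairs(List<String> tokens, int window) {
-        List<String[]> result = new ArrayList<>();
-        for (int i = 0; i < tokens.size(); i++) {
-            int start = Math.max(0, i - window);
-            int end = Math.min(tokens.size() - 1, i + window);
-            for (int j = start; j <= end; j++) {
-                if (j != i) {
-                    result.add(new String[] {tokens.get(i), tokens.get(j)});
-                }
-            }
-        }
-        return result;
-    }
-
-    public static void main(String[] args) {
-        for (String[] p : pairs(List.of("the", "quick", "brown", "fox"), 1)) {
-            System.out.println(p[0] + " -> " + p[1]);
-        }
-        // Prints, one per line: the -> quick, quick -> the, quick -> brown,
-        // brown -> quick, brown -> fox, fox -> brown
-    }
-}
-```
-
-CBOW uses the same windows in the opposite direction: the context words together are used to predict the target.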
-
-![word2vec diagram](/images/guide/word2vec_diagrams.png) 
-
-When the feature vector assigned to a word cannot be used to accurately predict that word's context, the components of the vector are adjusted. Each word's context in the corpus is the *teacher* sending error signals back to adjust the feature vector. The vectors of words judged similar by their context are nudged closer together by adjusting the numbers in the vector.
-
-Just as Van Gogh's painting of sunflowers is a two-dimensional mixture of oil on canvas that *represents* vegetable matter in a three-dimensional space in Paris in the late 1880s, so 500 numbers arranged in a vector can represent a word or group of words.
-
-Those numbers locate each word as a point in 500-dimensional vectorspace. Spaces of more than three dimensions are difficult to visualize. (Geoff Hinton, teaching people to imagine 13-dimensional space, suggests that students first picture 3-dimensional space and then say to themselves: "Thirteen, thirteen, thirteen." :) 
-
-A well trained set of word vectors will place similar words close to each other in that space. The words *oak*, *elm* and *birch* might cluster in one corner, while *war*, *conflict* and *strife* huddle together in another. 
-
-Similar things and ideas are shown to be "close". Their relative meanings have been translated to measurable distances. Qualities become quantities, and algorithms can do their work. But similarity is just the basis of many associations that Word2vec can learn. For example, it can gauge relations between words of one language, and map them to another.
-
-![word2vec translation](/images/guide/word2vec_translation.png) 
-
-These vectors are the basis of a more comprehensive geometry of words. As shown in the graph, capital cities such as Rome, Paris, Berlin and Beijing cluster near each other, and they will each have similar distances in vectorspace to their countries; i.e. Rome - Italy = Beijing - China. If you only knew that Rome was the capital of Italy, and were wondering about the capital of China, then the equation Rome - Italy + China would return Beijing. No kidding.
-
-![capitals output](/images/guide/countries_capitals.png) 
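-
-The analogy arithmetic can be sketched as follows: compute `Rome - Italy + China`, then return the word whose vector has the highest cosine similarity to the result, excluding the three query words. The 2-dimensional vectors below are invented so the example is easy to follow (real embeddings have hundreds of learned dimensions), and the class `AnalogySketch` is hypothetical, not the DL4J API.
-
-```java
-import java.util.HashSet;
-import java.util.LinkedHashMap;
-import java.util.List;
-import java.util.Map;
-import java.util.Set;
-
-// Hypothetical sketch of an analogy query: nearest word to (a - b + c) by cosine similarity.
-class AnalogySketch {
-    static double cosine(double[] a, double[] b) {
-        double dot = 0, na = 0, nb = 0;
-        for (int i = 0; i < a.length; i++) {
-            dot += a[i] * b[i];
-            na += a[i] * a[i];
-            nb += b[i] * b[i];
-        }
-        return dot / (Math.sqrt(na) * Math.sqrt(nb));
-    }
-
-    // Returns the word closest to vec(a) - vec(b) + vec(c), excluding a, b and c.
-    static String analogy(Map<String, double[]> vectors, String a, String b, String c) {
-        double[] va = vectors.get(a), vb = vectors.get(b), vc = vectors.get(c);
-        double[] target = new double[va.length];
-        for (int i = 0; i < target.length; i++) {
-            target[i] = va[i] - vb[i] + vc[i];
-        }
-        Set<String> exclude = new HashSet<>(List.of(a, b, c));
-        String best = null;
-        double bestScore = Double.NEGATIVE_INFINITY;
-        for (Map.Entry<String, double[]> e : vectors.entrySet()) {
-            if (exclude.contains(e.getKey())) continue;
-            double score = cosine(target, e.getValue());
-            if (score > bestScore) {
-                bestScore = score;
-                best = e.getKey();
-            }
-        }
-        return best;
-    }
-
-    public static void main(String[] args) {
-        // Invented vectors: the second coordinate loosely encodes "is a capital".
-        Map<String, double[]> v = new LinkedHashMap<>();
-        v.put("Italy", new double[] {1.0, 0.0});
-        v.put("Rome", new double[] {1.0, 1.0});
-        v.put("China", new double[] {3.0, 0.0});
-        v.put("Beijing", new double[] {3.0, 1.1});
-        v.put("France", new double[] {-2.0, 0.0});
-        v.put("Paris", new double[] {-2.0, 1.0});
-        System.out.println(analogy(v, "Rome", "Italy", "China")); // Beijing
-    }
-}
-```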
-
-## <a name="crazy">Amusing Word2Vec Results</a>
-
-Let's look at some other associations Word2vec can produce. 
-
-Instead of the plus, minus and equals signs, we'll give you the results in the notation of logical analogies, where `:` means "is to" and `::` means "as"; e.g. "Rome is to Italy as Beijing is to China" =  `Rome:Italy::Beijing:China`. In the last spot, rather than supplying the "answer", we'll give you the list of words that a Word2vec model proposes, when given the first three elements:
-
-    king:queen::man:[woman, Attempted abduction, teenager, girl] 
-    //Weird, but you can kind of see it
-    
-    China:Taiwan::Russia:[Ukraine, Moscow, Moldova, Armenia]
-    //Two large countries and their small, estranged neighbors
-    
-    house:roof::castle:[dome, bell_tower, spire, crenellations, turrets]
-    
-    knee:leg::elbow:[forearm, arm, ulna_bone]
-    
-    New York Times:Sulzberger::Fox:[Murdoch, Chernin, Bancroft, Ailes]
-    //The Sulzberger-Ochs family owns and runs the NYT.
-    //The Murdoch family owns News Corp., which owns Fox News. 
-    //Peter Chernin was News Corp.'s COO for 13 yrs.
-    //Roger Ailes was president of Fox News. 
-    //The Bancroft family sold the Wall St. Journal to News Corp.
-    
-    love:indifference::fear:[apathy, callousness, timidity, helplessness, inaction]
-    //the poetry of this single array is simply amazing...
-    
-    Donald Trump:Republican::Barack Obama:[Democratic, GOP, Democrats, McCain]
-    //It's interesting to note that, just as Obama and McCain were rivals,
-    //so too, Word2vec thinks Trump has a rivalry with the idea Republican.
-    
-    monkey:human::dinosaur:[fossil, fossilized, Ice_Age_mammals, fossilization]
-    //Humans are fossilized monkeys? Humans are what's left 
-    //over from monkeys? Humans are the species that beat monkeys
-    //just as Ice Age mammals beat dinosaurs? Plausible.
-    
-    building:architect::software:[programmer, SecurityCenter, WinPcap]
-
-This model was trained on the Google News vocab, which you can [import](#import) and play with. Contemplate, for a moment, that the Word2vec algorithm has never been taught a single rule of English syntax. It knows nothing about the world, and is unassociated with any rules-based symbolic logic or knowledge graph. And yet it learns more, in a flexible and automated fashion, than most knowledge graphs will learn after years of human labor. It comes to the Google News documents as a blank slate, and by the end of training, it can compute complex analogies that mean something to humans.
-
-You can also query a Word2vec model for other associations. Not everything has to be two analogies that mirror each other. ([We explain how below.](#eval))
-
-* Geopolitics: *Iraq - Violence = Jordan*
-* Distinction: *Human - Animal = Ethics*
-* *President - Power = Prime Minister*
-* *Library - Books = Hall*
-* Analogy: *Stock Market ≈ Thermometer*
-
-By building a sense of one word's proximity to other similar words, which do not necessarily contain the same letters, we have moved beyond hard tokens to a smoother and more general sense of meaning. 
-
-# <a name="just">Just Give Me the Code</a>
-
-## <a name="anatomy">Anatomy of Word2vec in DL4J</a>
-
-Here are Deeplearning4j's natural-language processing components:
-
-* **SentenceIterator/DocumentIterator**: Used to iterate over a dataset. A SentenceIterator returns strings and a DocumentIterator works with input streams.
-* **Tokenizer/TokenizerFactory**: Used in tokenizing the text. In NLP terms, a sentence is represented as a series of tokens. A TokenizerFactory creates an instance of a tokenizer for a "sentence." 
-* **VocabCache**: Used for tracking metadata, including word counts, document occurrences, the set of tokens (not vocab in this case, but rather tokens that have occurred), and vocab (the features included in both [bag of words](./bagofwords-tf-idf.html) and the word vector lookup table).
-* **Inverted Index**: Stores metadata about where words occurred. Can be used for understanding the dataset. A Lucene index with the Lucene implementation[1] is automatically created. 
-
-While Word2vec refers to a family of related algorithms, this implementation uses [Negative Sampling](https://skymind.ai/wiki/glossary#skipgram).
-
-## <a name="setup">Word2Vec Setup</a> 
-
-Create a new project in IntelliJ using Maven. If you don't know how to do that, see our [Quickstart page](./deeplearning4j-quickstart). Then specify these properties and dependencies in the `pom.xml` file in your project's root directory. (You can [check Maven](https://search.maven.org/#search%7Cga%7C1%7Cnd4j) for the most recent versions; please use those.)
-
-
-
-### Loading Data
-
-Now create and name a new class in Java. After that, you'll take the raw sentences in your .txt file, traverse them with your iterator, and subject them to some sort of preprocessing, such as converting all words to lowercase. 
-
-``` java
-        String filePath = new ClassPathResource("raw_sentences.txt").getFile().getAbsolutePath();
-
-        log.info("Load & Vectorize Sentences....");
-        // Strip white space before and after for each line
-        SentenceIterator iter = new BasicLineIterator(filePath);
-```
-
-If you want to load a text file besides the sentences provided in our example, you'd do this:
-
-``` java
-        log.info("Load data....");
-        SentenceIterator iter = new LineSentenceIterator(new File("/Users/cvn/Desktop/file.txt"));
-        iter.setPreProcessor(new SentencePreProcessor() {
-            @Override
-            public String preProcess(String sentence) {
-                return sentence.toLowerCase();
-            }
-        });
-```
-
-That is, get rid of the `ClassPathResource` and feed the absolute path of your `.txt` file into the `LineSentenceIterator`. 
-
-``` java
-SentenceIterator iter = new LineSentenceIterator(new File("/your/absolute/file/path/here.txt"));
-```
-
-In bash, you can find the absolute path of a directory by typing `pwd` on the command line from within that directory. Append the file name to that path and *voilà*.
-
-### Tokenizing the Data
-
-Word2vec needs to be fed words rather than whole sentences, so the next step is to tokenize the data. To tokenize a text is to break it up into its atomic units, creating a new token each time you hit a white space, for example. 
-
-``` java
-        // Split on white spaces in the line to get words
-        TokenizerFactory t = new DefaultTokenizerFactory();
-        t.setTokenPreProcessor(new CommonPreprocessor());
-```
-
-That should give you one token per word.
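-
-The following plain-Java sketch shows what this step does to a single line. It splits on whitespace, then a preprocessor lowercases each token and strips punctuation. It only approximates `DefaultTokenizerFactory` combined with `CommonPreprocessor`. The class name `SimpleTokenizer` and its exact cleaning rule are assumptions, not DL4J code.
-
-``` java
-import java.util.ArrayList;
-import java.util.List;
-import java.util.Locale;
-
-// Hypothetical stand-in for a tokenizer + preprocessor pair; not the DL4J implementation.
-class SimpleTokenizer {
-
-    // Split one line on runs of whitespace, lowercase each token and strip
-    // anything that is not a letter or digit. Empty tokens are dropped.
-    static List<String> tokenize(String line) {
-        List<String> tokens = new ArrayList<>();
-        for (String raw : line.trim().split("\\s+")) {
-            String cleaned = raw.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
-            if (!cleaned.isEmpty()) {
-                tokens.add(cleaned);
-            }
-        }
-        return tokens;
-    }
-
-    public static void main(String[] args) {
-        System.out.println(tokenize("  The day was long, and the night was short. "));
-    }
-}
-```
-
-Applied to the sample line in `main`, this yields the tokens `the`, `day`, `was`, `long`, `and`, `the`, `night`, `was` and `short`.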
-
-### Training the Model
-
-Now that the data is ready, you can configure the Word2vec neural net and feed in the tokens. 
-
-``` java
-        log.info("Building model....");
-        Word2Vec vec = new Word2Vec.Builder()
-                .minWordFrequency(5)
-                .layerSize(100)
-                .seed(42)
-                .windowSize(5)
-                .iterate(iter)
-                .tokenizerFactory(t)
-                .build();
-
-        log.info("Fitting Word2Vec model....");
-        vec.fit();
-```
-
-This configuration accepts a number of hyperparameters. A few require some explanation: 
-
-* *batchSize* is the number of words you process at a time.
-* *minWordFrequency* is the minimum number of times a word must appear in the corpus. Here, if it appears fewer than 5 times, it is not learned. Words must appear in multiple contexts to learn useful features about them. In very large corpora, it's reasonable to raise the minimum.
-* *useAdaGrad* toggles AdaGrad, which adapts the learning rate separately for each parameter. Here we are not concerned with that.
-* *layerSize* specifies the number of features in the word vector. This is equal to the number of dimensions in the feature space. Words represented by 500 features become points in a 500-dimensional space.
-* *learningRate* is the step size for each update of the coefficients, as words are repositioned in the feature space.
-* *minLearningRate* is the floor on the learning rate. The learning rate decays as training works through the corpus. If it shrank too far, the net's learning would no longer be efficient. The floor keeps the coefficients moving.
-* *iterate* tells the net which sentence iterator supplies its training data.
-* *tokenizerFactory* supplies the tokenizer that splits each sentence into words.
-* *vec.fit()* tells the configured net to begin training.
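-
-The relationship between *learningRate* and *minLearningRate* can be sketched as a linear decay with a floor. The schedule below is an assumption modeled on the original word2vec C code, where the rate shrinks in proportion to the share of the corpus already processed. DL4J's exact formula may differ, and `LearningRateSchedule.rateAt` is a hypothetical name.
-
-``` java
-// Hypothetical illustration of a linearly decaying learning rate with a floor.
-class LearningRateSchedule {
-
-    // Rate after `wordsProcessed` of `totalWords` words: it shrinks linearly
-    // toward zero as training progresses but never drops below `minRate`.
-    static double rateAt(double startRate, double minRate, long wordsProcessed, long totalWords) {
-        double remaining = 1.0 - (double) wordsProcessed / totalWords;
-        return Math.max(minRate, startRate * remaining);
-    }
-
-    public static void main(String[] args) {
-        long total = 1_000_000L;
-        for (long seen = 0; seen <= total; seen += 250_000L) {
-            System.out.printf("%,d words seen -> rate %.5f%n", seen, rateAt(0.025, 1e-4, seen, total));
-        }
-    }
-}
-```
-
-Without the `Math.max` floor, the rate would reach zero at the end of the corpus and the final updates would have no effect.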
-
-An example for [uptraining your previously trained word vectors is here](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/nlp/word2vec/Word2VecUptrainingExample.java).
-
-### <a name="eval">Evaluating the Model, Using Word2vec</a> 
-
-The next step is to evaluate the quality of your feature vectors. 
-
-``` java
-        // Write word vectors
-        WordVectorSerializer.writeWordVectors(vec, "pathToWriteto.txt");
-
-        log.info("Closest Words:");
-        Collection<String> lst = vec.wordsNearest("day", 10);
-        System.out.println(lst);
-        UiServer server = UiServer.getInstance();
-        System.out.println("Started on port " + server.getPort());
-        
-        //output: [night, week, year, game, season, during, office, until, -]
-```
-
-The line `vec.similarity("word1","word2")` will return the cosine similarity of the two words you enter. The closer it is to 1, the more similar the net perceives those words to be (see the Sweden-Norway example above). For example:
-
-``` java
-        double cosSim = vec.similarity("day", "night");
-        System.out.println(cosSim);
-        //output: 0.7704452276229858
-```
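-
-Cosine similarity is the dot product of the two vectors divided by the product of their lengths. It ranges from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction). The plain-Java sketch below computes it on `double[]` arrays. The class name is illustrative and not part of DL4J, which works on its own `INDArray` vectors.
-
-``` java
-// Illustrative cosine similarity between two dense vectors of equal length.
-class Cosine {
-
-    static double similarity(double[] a, double[] b) {
-        if (a.length != b.length) {
-            throw new IllegalArgumentException("Vectors must have the same length");
-        }
-        double dot = 0, normA = 0, normB = 0;
-        for (int i = 0; i < a.length; i++) {
-            dot += a[i] * b[i];
-            normA += a[i] * a[i];
-            normB += b[i] * b[i];
-        }
-        // A zero vector has no direction; the division then yields NaN.
-        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
-    }
-
-    public static void main(String[] args) {
-        System.out.println(similarity(new double[]{1, 2, 3}, new double[]{2, 4, 6})); // parallel vectors
-        System.out.println(similarity(new double[]{1, 0}, new double[]{0, 1}));       // orthogonal vectors
-    }
-}
-```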
-
-With `vec.wordsNearest("word1", numWordsNearest)`, the words printed to the screen allow you to eyeball whether the net has clustered semantically similar words. You can set the number of nearest words you want with the second parameter of wordsNearest. For example:
-
-``` java
-        Collection<String> lst3 = vec.wordsNearest("man", 10);
-        System.out.println(lst3);
-        //output: [director, company, program, former, university, family, group, such, general]
-```
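-
-Conceptually, a nearest-words query ranks every other word in the vocabulary by cosine similarity to the query word and keeps the top *n*. A brute-force sketch over a small in-memory map follows. The 2-dimensional vectors are invented for illustration. DL4J's own `wordsNearest` may normalize vectors or use other optimizations internally.
-
-``` java
-import java.util.ArrayList;
-import java.util.LinkedHashMap;
-import java.util.List;
-import java.util.Map;
-
-// Brute-force nearest-neighbour search by cosine similarity (illustrative only).
-class NearestWords {
-
-    static double cosine(double[] a, double[] b) {
-        double dot = 0, na = 0, nb = 0;
-        for (int i = 0; i < a.length; i++) {
-            dot += a[i] * b[i];
-            na += a[i] * a[i];
-            nb += b[i] * b[i];
-        }
-        return dot / (Math.sqrt(na) * Math.sqrt(nb));
-    }
-
-    // Return up to n words most similar to `query`, excluding the query itself.
-    static List<String> nearest(Map<String, double[]> vectors, String query, int n) {
-        double[] q = vectors.get(query);
-        if (q == null) {
-            return new ArrayList<>(); // unknown word: nothing to compare against
-        }
-        List<String> candidates = new ArrayList<>(vectors.keySet());
-        candidates.remove(query);
-        // Sort by descending similarity to the query vector.
-        candidates.sort((x, y) -> Double.compare(cosine(q, vectors.get(y)), cosine(q, vectors.get(x))));
-        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
-    }
-
-    // Made-up 2-D vectors, purely for demonstration.
-    static Map<String, double[]> toyVectors() {
-        Map<String, double[]> m = new LinkedHashMap<>();
-        m.put("day", new double[]{1.0, 0.1});
-        m.put("night", new double[]{0.9, 0.3});
-        m.put("week", new double[]{0.7, 0.7});
-        m.put("banana", new double[]{-0.2, 1.0});
-        return m;
-    }
-
-    public static void main(String[] args) {
-        System.out.println(nearest(toyVectors(), "day", 2));
-    }
-}
-```
-
-With these toy vectors, `nearest(toyVectors(), "day", 2)` returns `[night, week]`. `banana` points in a very different direction and ranks last.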
-
-### Visualizing the Model
-
-We rely on [TSNE](https://lvdmaaten.github.io/tsne/) to reduce the dimensionality of word feature vectors and project words into a two- or three-dimensional space. The full [DL4J/ND4J example for TSNE is here](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/nlp/tsne/TSNEStandardExample.java).
-
-``` java
-        //STEP 1: Initialization
-        int iterations = 100;   //number of t-SNE iterations (example value)
-        Nd4j.setDataType(DataBuffer.Type.DOUBLE);
-        List<String> cacheList = new ArrayList<>(); //cacheList is a dynamic array of strings used to hold all words
-
-        //STEP 2: Turn text input into a list of words
-        log.info("Load & Vectorize data....");
-        File wordFile = new ClassPathResource("words.txt").getFile();   //Open the file
-        //Get the data of all unique word vectors
-        Pair<InMemoryLookupTable,VocabCache> vectors = WordVectorSerializer.loadTxt(wordFile);
-        VocabCache cache = vectors.getSecond();
-        INDArray weights = vectors.getFirst().getSyn0();    //separate weights of unique words into their own list
-
-        for(int i = 0; i < cache.numWords(); i++)   //separate strings of words into their own list
-            cacheList.add(cache.wordAtIndex(i));
-
-        //STEP 3: build a dual-tree tsne to use later
-        log.info("Build model....");
-        BarnesHutTsne tsne = new BarnesHutTsne.Builder()
-                .setMaxIter(iterations).theta(0.5)
-                .normalize(false)
-                .learningRate(500)
-                .useAdaGrad(false)
-//                .usePca(false)
-                .build();
-
-        //STEP 4: establish the tsne values and save them to a file
-        log.info("Store TSNE Coordinates for Plotting....");
-        String outputFile = "target/archive-tmp/tsne-standard-coords.csv";
-        (new File(outputFile)).getParentFile().mkdirs();
-
-        tsne.fit(weights);
-        tsne.saveAsFile(cacheList, outputFile);
-```
-
-### Saving, Reloading & Using the Model
-
-You'll want to save the model. The normal way to save Word2vec models in Deeplearning4j is with the `WordVectorSerializer` utilities. Serialization converts an in-memory object into a *series* of bytes, much like pickling in Python.
-
-``` java
-        log.info("Save vectors....");
-        WordVectorSerializer.writeWord2VecModel(vec, "pathToSaveModel.txt");
-```
-
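-This saves the model to a file called `pathToSaveModel.txt`. Because this is a relative path, the file appears in the working directory of the JVM that trains Word2vec. Despite the `.txt` extension, `writeWord2VecModel` writes the full model as a compressed archive. For a plain-text file with one word per line, followed by the series of numbers that make up its vector representation, use `WordVectorSerializer.writeWordVectors` as in the evaluation step above.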
-
-To keep working with the vectors, simply call methods on `vec` like this:
-
-``` java
-Collection<String> kingList = vec.wordsNearest(Arrays.asList("king", "woman"), Arrays.asList("queen"), 10);
-```
-
-The classic example of Word2vec's arithmetic of words is "king - queen = man - woman" and its logical extension "king - queen + woman = man". 
-
-The example above will output the 10 nearest words to the vector `king - queen + woman`, which should include `man`. The first parameter for wordsNearest has to include the "positive" words `king` and `woman`, which have a + sign associated with them; the second parameter includes the "negative" word `queen`, which is associated with the minus sign (positive and negative here have no emotional connotation); the third is the length of the list of nearest words you would like to see. Remember to add this to the top of the file: `import java.util.Arrays;`.
-
-Any number of combinations is possible, but they will only return sensible results if the words you query occurred frequently enough in the corpus. The ability to return similar words (or documents) is at the foundation of both search and recommendation engines.
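-
-The positive/negative query can be sketched with the classic vector-offset method. Add the positive vectors, subtract the negative ones, then rank the remaining words by cosine similarity to the result. This is an illustration, not DL4J's implementation, which may normalize or average the vectors first. The toy vectors are made up so that `king - queen + woman` equals `man` exactly.
-
-``` java
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.LinkedHashMap;
-import java.util.List;
-import java.util.Map;
-
-// Illustrative vector-offset analogy query: sum(positive) - sum(negative), ranked by cosine.
-class Analogy {
-
-    static double cosine(double[] a, double[] b) {
-        double dot = 0, na = 0, nb = 0;
-        for (int i = 0; i < a.length; i++) {
-            dot += a[i] * b[i];
-            na += a[i] * a[i];
-            nb += b[i] * b[i];
-        }
-        return dot / (Math.sqrt(na) * Math.sqrt(nb));
-    }
-
-    // Add (sign = +1) or subtract (sign = -1) a word's vector into `target`.
-    static void accumulate(double[] target, Map<String, double[]> vectors, String word, int sign) {
-        double[] v = vectors.get(word);
-        if (v == null) {
-            throw new IllegalArgumentException("Word not in vocabulary: " + word);
-        }
-        for (int i = 0; i < target.length; i++) {
-            target[i] += sign * v[i];
-        }
-    }
-
-    static List<String> wordsNearest(Map<String, double[]> vectors, List<String> positive,
-                                     List<String> negative, int n) {
-        double[] target = new double[vectors.values().iterator().next().length];
-        for (String w : positive) accumulate(target, vectors, w, +1);
-        for (String w : negative) accumulate(target, vectors, w, -1);
-        List<String> candidates = new ArrayList<>(vectors.keySet());
-        candidates.removeAll(positive); // query words would otherwise rank highly
-        candidates.removeAll(negative);
-        candidates.sort((x, y) -> Double.compare(cosine(target, vectors.get(y)), cosine(target, vectors.get(x))));
-        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
-    }
-
-    // Invented 2-D vectors chosen so that king - queen + woman == man exactly.
-    static Map<String, double[]> toyVectors() {
-        Map<String, double[]> m = new LinkedHashMap<>();
-        m.put("king", new double[]{3, 2});
-        m.put("queen", new double[]{3, 0});
-        m.put("woman", new double[]{1, 0});
-        m.put("man", new double[]{1, 2});
-        m.put("apple", new double[]{-1, 0.5});
-        return m;
-    }
-
-    public static void main(String[] args) {
-        System.out.println(wordsNearest(toyVectors(), Arrays.asList("king", "woman"), Arrays.asList("queen"), 1));
-    }
-}
-```
-
-Here the query returns `[man]`. Excluding the query words matters, because on real embeddings the target vector is often closest to one of the words used to build it.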
-
-You can reload the vectors into memory like this:
-
-``` java
-        Word2Vec word2Vec = WordVectorSerializer.readWord2VecModel("pathToSaveModel.txt");
-```
-
-You can then use Word2vec as a lookup table:
-
-``` java
-        WeightLookupTable weightLookupTable = word2Vec.lookupTable();
-        Iterator<INDArray> vectors = weightLookupTable.vectors();
-        INDArray wordVectorMatrix = word2Vec.getWordVectorMatrix("myword");
-        double[] wordVector = word2Vec.getWordVector("myword");
-```
-
-Depending on the version, querying a word that isn't in the vocabulary returns either `null` or a vector of zeros, so check the result before using it.
-
-### <a name="import">Importing Word2vec Models</a>
-
-The [Google News Corpus model](https://dl4jdata.blob.core.windows.net/resources/wordvectors/GoogleNews-vectors-negative300.bin.gz) we use to test the accuracy of our trained nets is available for download. Users whose current hardware takes a long time to train on large corpora can simply download it to explore a Word2vec model without the prelude.
-
-If you trained with the [C vectors](https://docs.google.com/file/d/0B7XkCwpI5KDYaDBDQm1tZGNDRHc/edit) or Gensim, this line will import the model.
-
-``` java
-    File gModel = new File("/Developer/Vector Models/GoogleNews-vectors-negative300.bin.gz");
-    Word2Vec vec = WordVectorSerializer.readWord2VecModel(gModel);
-```
-
-Remember to add `import java.io.File;` to your imported packages.
-
-With large models, you may run into trouble with your heap space. The Google model may take as much as 10 GB of RAM, and the JVM's default maximum heap is often smaller than that, so you have to raise it. You can do that either with a `bash_profile` file (see our [Troubleshooting section](./deeplearning4j-troubleshooting-training)), or through IntelliJ's run configuration:
-
-```
-    //Click:
-    Run > Edit Configurations > VM options
-    //Then paste:
-    -Xms1024m
-    -Xmx10g
-```
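-
-After changing the VM options, you can check which heap limit the JVM actually received. `Runtime.maxMemory()` is part of the standard Java API. The class name and the 10 GB warning threshold, borrowed from the Google News figure above, are only illustrative.
-
-``` java
-// Illustrative check of the maximum heap the running JVM may use, e.g. to confirm -Xmx took effect.
-class HeapCheck {
-
-    static long maxHeapMegabytes() {
-        // maxMemory() returns Long.MAX_VALUE if there is no inherent limit.
-        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
-    }
-
-    public static void main(String[] args) {
-        long maxMb = maxHeapMegabytes();
-        System.out.println("Max heap: " + maxMb + " MB");
-        if (maxMb < 10L * 1024L) {
-            System.out.println("Heap is below 10 GB; loading the Google News model may fail.");
-        }
-    }
-}
-```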
-
-### <a name="grams">N-grams & Skip-grams</a>
-
-Words are read into the vector one at a time, *and scanned back and forth within a certain range*. Those ranges are n-grams. An n-gram is a contiguous sequence of *n* items from a given linguistic sequence. Unigrams, bigrams, trigrams, four-grams and five-grams are the n-grams with *n* = 1 through 5. A skip-gram simply drops items from the n-gram.
-
-The skip-gram representation popularized by Mikolov and used in the DL4J implementation has proven to be more accurate than other models, such as continuous bag of words, due to the more generalizable contexts generated. 
-
-This n-gram is then fed into a neural network to learn the significance of a given word vector; i.e. significance is defined as its usefulness as an indicator of certain larger meanings, or labels. 
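-
-The back-and-forth scan can be sketched by listing the (center word, context word) pairs that skip-gram training learns from. This fixed-window version is a simplification, because the original word2vec code also samples a smaller effective window for each word. The class name here is hypothetical.
-
-``` java
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.List;
-
-// Illustrative enumeration of skip-gram training pairs for one sentence with a fixed window.
-class SkipGramPairs {
-
-    static List<String[]> pairs(List<String> sentence, int window) {
-        List<String[]> result = new ArrayList<>();
-        for (int i = 0; i < sentence.size(); i++) {
-            // Context words lie up to `window` positions left and right, clipped to the sentence.
-            int from = Math.max(0, i - window);
-            int to = Math.min(sentence.size() - 1, i + window);
-            for (int j = from; j <= to; j++) {
-                if (j != i) {
-                    result.add(new String[]{sentence.get(i), sentence.get(j)});
-                }
-            }
-        }
-        return result;
-    }
-
-    public static void main(String[] args) {
-        List<String> sentence = Arrays.asList("the", "quick", "brown", "fox");
-        for (String[] p : pairs(sentence, 1)) {
-            System.out.println(p[0] + " -> " + p[1]);
-        }
-    }
-}
-```
-
-With a window of 1, the four-word sentence produces six pairs, from `the -> quick` to `fox -> brown`.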
-
-### <a name="code">A Working Example</a>
-
-**Please note:** The code below may be outdated. For updated examples, please see our [dl4j-examples repository on GitHub](https://github.com/eclipse/deeplearning4j-examples/tree/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/nlp).
-
-Now that you have a basic idea of how to set up Word2Vec, here's [one example](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/nlp/word2vec/Word2VecRawTextExample.java) of how it can be used with DL4J's API:
-
-<script src="https://gist-it.appspot.com/https://github.com/eclipse/deeplearning4j-examples/blob/master/src/main/java/org/deeplearning4j/examples/nlp/word2vec/Word2VecRawTextExample.java?slice=22:64"></script>
-
-After following the instructions in the [Quickstart](./deeplearning4j-quickstart), you can open this example in IntelliJ and hit run to see it work. If you query the Word2vec model with a word that isn't contained in the training corpus, it will return null.
-
-### <a name="trouble">Troubleshooting & Tuning Word2Vec</a>
-
-*Q: I get a lot of stack traces like this*
-
-``` java
-       java.lang.StackOverflowError: null
-       at java.lang.ref.Reference.<init>(Reference.java:254) ~[na:1.8.0_11]
-       at java.lang.ref.WeakReference.<init>(WeakReference.java:69) ~[na:1.8.0_11]
-       at java.io.ObjectStreamClass$WeakClassKey.<init>(ObjectStreamClass.java:2306) [na:1.8.0_11]
-       at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:322) ~[na:1.8.0_11]
-       at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134) ~[na:1.8.0_11]
-       at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) ~[na:1.8.0_11]
-```
-
-*A:* Look inside the directory where you started your Word2vec application. This can, for example, be an IntelliJ project home directory or the directory where you ran `java` at the command line. It should have some directories that look like:
-
-``` 
-       ehcache_auto_created2810726831714447871diskstore  
-       ehcache_auto_created4727787669919058795diskstore
-       ehcache_auto_created3883187579728988119diskstore  
-       ehcache_auto_created9101229611634051478diskstore
-```
-
-You can shut down your Word2vec application and try to delete them.
-
-*Q: Not all of the words from my raw text data are appearing in my Word2vec object…*
-
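-*A:* Words that appear fewer than `minWordFrequency` times in the corpus are left out of the vocabulary, so lower that threshold via **.minWordFrequency()** on your Word2Vec builder. Raising **.layerSize()** only changes the number of dimensions per vector, not which words are kept. For example:
-
-``` java
-        Word2Vec vec = new Word2Vec.Builder().minWordFrequency(1).layerSize(300).windowSize(5)
-                .iterate(iter).tokenizerFactory(t).build();
-```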
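-
-The effect of `minWordFrequency` can be sketched as a counting pass followed by a filter. Any word seen fewer times than the threshold never enters the vocabulary, so no vector is learned for it. This is a plain-Java illustration, not DL4J's vocabulary builder.
-
-``` java
-import java.util.Arrays;
-import java.util.List;
-import java.util.Map;
-import java.util.Set;
-import java.util.TreeMap;
-import java.util.TreeSet;
-
-// Illustrative frequency-thresholded vocabulary; not DL4J's VocabCache.
-class VocabFilter {
-
-    // Count every token, then keep only words that appear at least minFrequency times.
-    static Set<String> buildVocab(List<String> tokens, int minFrequency) {
-        Map<String, Integer> counts = new TreeMap<>();
-        for (String t : tokens) {
-            counts.merge(t, 1, Integer::sum);
-        }
-        Set<String> vocab = new TreeSet<>();
-        for (Map.Entry<String, Integer> e : counts.entrySet()) {
-            if (e.getValue() >= minFrequency) {
-                vocab.add(e.getKey());
-            }
-        }
-        return vocab;
-    }
-
-    public static void main(String[] args) {
-        List<String> tokens = Arrays.asList("day", "night", "day", "week", "day", "night");
-        System.out.println(buildVocab(tokens, 2)); // "week" appears once, so it is dropped
-        System.out.println(buildVocab(tokens, 1)); // every word is kept
-    }
-}
-```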
-
-*Q: How do I load my data? Why does training take forever?*
-
-*A:* If all of your sentences have been loaded as *one* sentence, Word2vec training could take a very long time. Word2vec is a sentence-level algorithm: co-occurrence statistics are gathered sentence by sentence, so sentence boundaries are very important. (For GloVe, sentence boundaries don't matter, because it looks at corpus-wide co-occurrence.) For many corpora, the average sentence length is six words. With a window size of 5, a six-word sentence gives you at most 30 rounds of skip-gram calculations (each of the six words paired with the other five). If you forget to specify your sentence boundaries, you may load a "sentence" that's 10,000 words long. In that case, Word2vec would attempt a full skip-gram cycle for the whole 10,000-word "sentence". In DL4J's implementation, a line is assumed to be a sentence, so you need to plug in your own SentenceIterator and Tokenizer. By asking you to specify how your sentences end, DL4J remains language-agnostic. UimaSentenceIterator is one way to do that. It uses OpenNLP for sentence boundary detection.
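-
-If your corpus is running text rather than one sentence per line, you can split it into sentences before handing it to a line-based iterator. The sketch below uses the JDK's rule-based `java.text.BreakIterator` in place of OpenNLP, and it is much cruder. Writing each returned sentence on its own line gives a line-based `SentenceIterator` the boundaries it needs.
-
-``` java
-import java.text.BreakIterator;
-import java.util.ArrayList;
-import java.util.List;
-import java.util.Locale;
-
-// Illustrative sentence splitter built on the JDK's BreakIterator (not UimaSentenceIterator).
-class SentenceSplitter {
-
-    // Split running text into sentences using the JDK's locale-aware rules.
-    static List<String> split(String text, Locale locale) {
-        BreakIterator it = BreakIterator.getSentenceInstance(locale);
-        it.setText(text);
-        List<String> sentences = new ArrayList<>();
-        int start = it.first();
-        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
-            String sentence = text.substring(start, end).trim();
-            if (!sentence.isEmpty()) {
-                sentences.add(sentence);
-            }
-        }
-        return sentences;
-    }
-
-    public static void main(String[] args) {
-        String text = "The day was long. The night was short! Was the week longer?";
-        // One sentence per line, the layout a line-based SentenceIterator expects.
-        split(text, Locale.ENGLISH).forEach(System.out::println);
-    }
-}
-```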
-
-
-*Q: Why is there such a difference in performance when feeding whole documents as one "sentence" vs splitting into Sentences?*
-
-*A:* If the average sentence contains 6 words and the window size is 5, no word ever reaches the theoretical maximum of 10 skip-gram rounds (5 context words on each side), because the sentence isn't long enough to fill the window. Each word in such a sentence gets at most about 5 rounds.
-
-But if your "sentence" is 1000k words long, every word in it gets 10 skip-gram rounds, except the first 5 and the last 5. So you'll spend far more time building the model, and the co-occurrence statistics will be skewed by the absence of sentence boundaries.
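-
-The arithmetic in this answer can be checked by counting context words per position. The count assumes a fixed window *w* on each side and ignores the random window shrinking that real implementations apply. Under that assumption, the word at position *i* in an *n*-word sentence pairs with min(*i*, *w*) words on its left and min(*n* - 1 - *i*, *w*) on its right.
-
-``` java
-// Illustrative count of skip-gram (center, context) pairs in one sentence for a fixed window.
-class SkipGramWork {
-
-    // Context words available to the word at `position` in a sentence of `length` words.
-    static int contextWords(int position, int length, int window) {
-        return Math.min(position, window) + Math.min(length - 1 - position, window);
-    }
-
-    // Total number of (center, context) pairs generated for one sentence.
-    static long totalPairs(int length, int window) {
-        long total = 0;
-        for (int i = 0; i < length; i++) {
-            total += contextWords(i, length, window);
-        }
-        return total;
-    }
-
-    public static void main(String[] args) {
-        int window = 5;
-        System.out.println("6-word sentence, word 3: " + contextWords(2, 6, window) + " context words");
-        System.out.println("10,000-word sentence, word 5,000: " + contextWords(4999, 10_000, window) + " context words");
-        System.out.println("6-word sentence, total pairs: " + totalPairs(6, window));
-    }
-}
-```
-
-For a 6-word sentence, every word gets exactly 5 context words (30 pairs in total). An interior word of a 10,000-word "sentence" gets 10, many of which come from what should have been neighbouring sentences.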
-
-*Q: How does Word2Vec Use Memory?*
-
-*A:* The major memory consumer in Word2vec is the weights matrix. The math is simple: NumberOfWords x NumberOfDimensions x 2 x bytes per value of the data type.
-
-So if you build a Word2vec model for 100k words using floats and 100 dimensions, your memory footprint will be 100k x 100 x 2 x 4 (bytes per float) = 80 MB of RAM just for the matrices, plus some space for strings, variables, threads, etc.
-
-If you load a pre-built model, it uses roughly half the RAM needed at build time, so about 40 MB.
-
-The most popular model used so far is the Google News model. It has 3M words and a vector size of 300, which gives 3M x 300 x 4 bytes = 3.6 GB just to load the model. On top of that come 3M strings, which do not have a constant size in Java. So a loaded model usually takes around 4-6 GB, depending on JVM version and vendor, GC state and the phase of the moon.
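-
-The rule of thumb above translates directly into code. This sketch reproduces the two estimates, counting 2 weight matrices while training and 1 for a loaded pre-built model, as the answer describes. It ignores strings and other JVM overhead, and the method name is illustrative.
-
-``` java
-// Illustrative back-of-the-envelope memory estimate for word2vec weight matrices.
-class Word2VecMemory {
-
-    // words x dimensions x matrices x bytesPerValue (4 bytes for float).
-    static long weightBytes(long words, long dimensions, int matrices, int bytesPerValue) {
-        return words * dimensions * matrices * bytesPerValue;
-    }
-
-    public static void main(String[] args) {
-        // Training a 100k-word, 100-dimension float model: two weight matrices.
-        System.out.println(weightBytes(100_000, 100, 2, 4) / 1_000_000 + " MB while training");
-        // Loading the pre-built Google News model: one 3M x 300 float matrix.
-        System.out.println(weightBytes(3_000_000, 300, 1, 4) / 1_000_000 + " MB to load");
-    }
-}
-```
-
-It prints 80 MB for the 100k-word training example and 3600 MB (3.6 GB) for loading the Google News model, matching the figures above.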
-
-
-*Q: I did everything you said and the results still don't look right.*
-
-*A:* Make sure you're not running into normalization issues. Some methods, like wordsNearest(), use normalized weights by default, while others require non-normalized weights. Pay attention to this difference.
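-
-Once every vector is scaled to unit length, the dot product of two vectors equals their cosine similarity. Raw dot products, by contrast, also grow with vector length. A plain-Java sketch shows the difference on two vectors that point the same way. The names are illustrative, not DL4J's.
-
-``` java
-// Illustrative L2 normalization: after scaling to unit length, dot product == cosine similarity.
-class Normalize {
-
-    // Scale v to length 1 (a zero vector would yield NaN components).
-    static double[] unitLength(double[] v) {
-        double norm = 0;
-        for (double x : v) norm += x * x;
-        norm = Math.sqrt(norm);
-        double[] out = new double[v.length];
-        for (int i = 0; i < v.length; i++) out[i] = v[i] / norm;
-        return out;
-    }
-
-    static double dot(double[] a, double[] b) {
-        double s = 0;
-        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
-        return s;
-    }
-
-    public static void main(String[] args) {
-        double[] a = {3, 4};
-        double[] b = {6, 8};   // same direction as a, twice as long
-        System.out.println("raw dot: " + dot(a, b));                                 // grows with length
-        System.out.println("normalized dot: " + dot(unitLength(a), unitLength(b)));  // cosine similarity
-    }
-}
-```
-
-The raw dot product of `{3, 4}` and `{6, 8}` is 50. After normalization it is 1, up to rounding, because both vectors point in the same direction.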
-
-
-
-### <a name="use">Use Cases</a>
-
-Google Scholar keeps a running tally of the papers citing [Deeplearning4j's implementation of Word2vec here](https://scholar.google.com/scholar?hl=en&q=deeplearning4j+word2vec&btnG=&as_sdt=1%2C5&as_sdtp=).
-
-Kenny Helsens, a data scientist based in Belgium, [applied Deeplearning4j's implementation of Word2vec](http://thinkdata.be/2015/06/10/word2vec-on-raw-omim-database/) to the NCBI's Online Mendelian Inheritance In Man (OMIM) database. He then looked for the words most similar to alk, a known oncogene of non-small cell lung carcinoma, and Word2vec returned: "nonsmall, carcinomas, carcinoma, mapdkd." From there, he established analogies between other cancer phenotypes and their genotypes. This is just one example of the associations Word2vec can learn on a large corpus. Exploring its potential for discovering new aspects of important diseases has only just begun, and outside of medicine, the opportunities are equally diverse.
-
-Andreas Klintberg trained Deeplearning4j's implementation of Word2vec on Swedish, and wrote a [thorough walkthrough on Medium](https://medium.com/@klintcho/training-a-word2vec-model-for-swedish-e14b15be6cb). 
-
-Word2Vec is especially useful in preparing text-based data for information retrieval and QA systems, which DL4J implements with [deep autoencoders](./deeplearning4j-nn-autoencoders). 
-
-Marketers might seek to establish relationships among products to build a recommendation engine. Investigators might analyze a social graph to surface members of a single group, or other relations they might have to location or financial sponsorship. 
-
-### <a name="patent">Google's Word2vec Patent</a>
-
-Word2vec is [a method of computing vector representations of words](https://arxiv.org/pdf/1301.3781.pdf) introduced by a team of researchers at Google led by Tomas Mikolov. Google [hosts an open-source version of Word2vec](https://code.google.com/p/word2vec/) released under an Apache 2.0 license. In 2014, Mikolov left Google for Facebook, and in May 2015, [Google was granted a patent for the method](http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PTXT&s1=9037464&OS=9037464&RS=9037464), which does not abrogate the Apache license under which it has been released. 
-
-### <a name="foreign">Foreign Languages</a>
-
-While words in all languages may be converted into vectors with Word2vec, and those vectors learned with Deeplearning4j, NLP preprocessing can be very language specific, and requires tools beyond our libraries. The [Stanford Natural Language Processing Group](http://nlp.stanford.edu/software/) has a number of Java-based tools for tokenization, part-of-speech tagging and  named-entity recognition for languages such as [Mandarin Chinese](http://nlp.stanford.edu/projects/chinese-nlp.shtml), Arabic, French, German and Spanish. For Japanese, NLP tools like [Kuromoji](http://www.atilika.org/) are useful. Other foreign-language resources, including [text corpora, are available here](http://www-nlp.stanford.edu/links/statnlp.html).
-
-### <a name="glove">GloVe: Global Vectors</a>
-
-GloVe vectors stored in text format can be loaded as word vectors like so:
-
-``` java
-        WordVectors wordVectors = WordVectorSerializer.loadTxtVectors(new File("glove.6B.50d.txt"));
-```
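-
-GloVe's text files, such as `glove.6B.50d.txt`, store one word per line followed by its vector components, separated by spaces. To inspect such a file without DL4J, you could use a minimal parser for a single line like the one below. The class and method names are hypothetical, and the parser assumes the word itself contains no whitespace.
-
-``` java
-import java.util.AbstractMap;
-import java.util.Arrays;
-import java.util.Map;
-
-// Hypothetical parser for one line of a GloVe-style text vector file.
-class GloveLine {
-
-    // Parse "word v1 v2 ... vn" into (word, vector).
-    static Map.Entry<String, double[]> parse(String line) {
-        String[] parts = line.trim().split("\\s+");
-        if (parts.length < 2) {
-            throw new IllegalArgumentException("Expected a word followed by numbers: " + line);
-        }
-        double[] vector = new double[parts.length - 1];
-        for (int i = 1; i < parts.length; i++) {
-            vector[i - 1] = Double.parseDouble(parts[i]);
-        }
-        return new AbstractMap.SimpleEntry<>(parts[0], vector);
-    }
-
-    public static void main(String[] args) {
-        Map.Entry<String, double[]> e = parse("day 0.25 -0.5 1.0");
-        System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
-    }
-}
-```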
-
-### <a name="sequence">Sequence Vectors</a>
-
-Deeplearning4j has a class called [SequenceVectors](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nlp-parent/deeplearning4j-nlp/src/main/java/org/deeplearning4j/models/sequencevectors/SequenceVectors.java), which is one level of abstraction above word vectors, and which allows you to extract features from any sequence, including social media profiles, transactions, proteins, etc. If data can be described as a sequence, it can be learned via skip-gram and hierarchical softmax with the AbstractVectors class. This is compatible with the [DeepWalk algorithm](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-graph/src/main/java/org/deeplearning4j/graph/models/deepwalk/DeepWalk.java), also implemented in Deeplearning4j.
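-
-DeepWalk's core idea is to turn a graph into sequences that a skip-gram model can consume. It starts at a vertex, repeatedly hops to a random neighbour, and treats each walk like a sentence of vertex IDs. A stdlib-only sketch of the walk-generation step follows. It is not DL4J's `DeepWalk` class, and the toy graph, seed and walk length are arbitrary.
-
-``` java
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.HashMap;
-import java.util.List;
-import java.util.Map;
-import java.util.Random;
-
-// Illustrative uniform random walks over an adjacency-list graph (not DL4J's DeepWalk).
-class RandomWalks {
-
-    // Walk of up to `length` vertices starting at `start`, choosing neighbours uniformly.
-    static List<String> walk(Map<String, List<String>> graph, String start, int length, Random rng) {
-        List<String> path = new ArrayList<>();
-        String current = start;
-        path.add(current);
-        while (path.size() < length) {
-            List<String> neighbours = graph.getOrDefault(current, new ArrayList<>());
-            if (neighbours.isEmpty()) {
-                break; // dead end: stop the walk early
-            }
-            current = neighbours.get(rng.nextInt(neighbours.size()));
-            path.add(current);
-        }
-        return path;
-    }
-
-    // Small made-up undirected graph.
-    static Map<String, List<String>> toyGraph() {
-        Map<String, List<String>> g = new HashMap<>();
-        g.put("a", Arrays.asList("b", "c"));
-        g.put("b", Arrays.asList("a", "c"));
-        g.put("c", Arrays.asList("a", "b", "d"));
-        g.put("d", Arrays.asList("c"));
-        return g;
-    }
-
-    public static void main(String[] args) {
-        Random rng = new Random(42);
-        // Each walk is a "sentence" of vertex IDs for a skip-gram model.
-        for (String start : Arrays.asList("a", "d")) {
-            System.out.println(walk(toyGraph(), start, 6, rng));
-        }
-    }
-}
-```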
-
-### <a name="features">Word2Vec Features on Deeplearning4j</a>
-
-* Weights update after model serialization/deserialization was added. That is, you can update model state with, say, 200GB of new text by calling `loadFullModel`, adding `TokenizerFactory` and `SentenceIterator` to it, and calling `fit()` on the restored model.
-* Option for multiple datasources for vocab construction was added.
-* Epochs and Iterations can be specified separately, although they are both typically "1".
-* Word2Vec.Builder has this option: `hugeModelExpected`. If set to `true`, the vocab will be periodically truncated during the build.
-* While `minWordFrequency` is useful for ignoring rare words in the corpus, any number of words can be excluded to customize.
-* Two new WordVectorSerializer methods have been introduced: `writeFullModel` and `loadFullModel`. These save and load a full model state.
-* A decent workstation should be able to handle a vocab with a few million words. Deeplearning4j's Word2vec implementation can model a few terabytes of data on a single machine. Roughly, the memory math is: `vectorSize * 4 * 3 * vocab.size()` bytes.
-
-### Doc2vec & Other NLP Resources
-
-* [DL4J Example of Text Classification With Word2vec & RNNs](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/recurrent/word2vecsentiment/Word2VecSentimentRNN.java)
-* [DL4J Example of Text Classification With Paragraph Vectors](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/nlp/paragraphvectors/ParagraphVectorsClassifierExample.java)
-* [Doc2vec, or Paragraph Vectors, With Deeplearning4j](./deeplearning4j-nlp-doc2vec)
-* [Thought Vectors, Natural Language Processing & the Future of AI](https://skymind.ai/wiki/thought-vectors)
-* [Quora: How Does Word2vec Work?](http://www.quora.com/How-does-word2vec-work)
-* [Quora: What Are Some Interesting Word2Vec Results?](http://www.quora.com/Word2vec/What-are-some-interesting-Word2Vec-results/answer/Omer-Levy)
-* [Word2Vec: an introduction](http://www.folgertkarsdorp.nl/word2vec-an-introduction/); Folgert Karsdorp
-* [Mikolov's Original Word2vec Code @Google](https://code.google.com/p/word2vec/)
-* [word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method](https://arxiv.org/pdf/1402.3722v1.pdf); Yoav Goldberg and Omer Levy
-* [Advances in Pre-Training Distributed Word Representations - by Mikolov et al](https://arxiv.org/abs/1712.09405)
-
-
-### <a name="doctorow">Word2Vec in Literature</a>
-
-    It's like numbers are language, like all the letters in the language are turned into numbers, and so it's something that everyone understands the same way. You lose the sounds of the letters and whether they click or pop or touch the palate, or go ooh or aah, and anything that can be misread or con you with its music or the pictures it puts in your mind, all of that is gone, along with the accent, and you have a new understanding entirely, a language of numbers, and everything becomes as clear to everyone as the writing on the wall. So as I say there comes a certain time for the reading of the numbers.
-        -- E.L. Doctorow, Billy Bathgate
--- a/docs/deeplearning4j-nn/README.md
+++ b/docs/deeplearning4j-nn/README.md
@ -1,10 +0,0 @@
-# deeplearning4j-nn documentation
-
-To generate docs into the `deeplearning4j-nn/doc_sources` folder, first `cd docs` then run:
-
-```shell
-python generate_docs.py \
-    --project deeplearning4j-nn \
-    --code ../deeplearning4j \
-    --out_language en
-```
--- a/docs/deeplearning4j-nn/pages.json
+++ b/docs/deeplearning4j-nn/pages.json
@ -1,187 +0,0 @@
-{
-  "excludes": [
-    "abstract"
-  ],
-  "indices": [
-  ],
-  "pages": [
-    {
-      "page": "evaluation.md",
-      "module": [
-        "/deeplearning4j-nn/src/main/java/org/deeplearning4j/eval/"
-      ]
-    },
-    {
-      "page": "model-persistence.md",
-      "class": [
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/util/ModelSerializer.java"
-      ]
-    },
-    {
-      "page": "visualization.md",
-      "class": []
-    },
-    {
-      "page": "tsne-visualization.md",
-      "class": []
-    },
-    {
-      "page": "transfer-learning.md",
-      "module": [
-        "/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/transferlearning/"
-      ]
-    },
-    {
-      "page": "listeners.md",
-      "module": [
-        "/deeplearning4j-nn/src/main/java/org/deeplearning4j/optimize/listeners/"
-      ]
-    },
-    {
-      "page": "iterators.md",
-      "module": [
-        "/deeplearning4j-data/deeplearning4j-datasets/src/main/java/org/deeplearning4j/datasets/iterator/impl/",
-        "/deeplearning4j-data/deeplearning4j-datavec-iterators/src/main/java/org/deeplearning4j/datasets/datavec/",
-        "/deeplearning4j-data/deeplearning4j-utility-iterators/src/main/java/org/deeplearning4j/datasets/iterator/"
-      ],
-      "class": [
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/datasets/iterator/impl/MultiDataSetIteratorAdapter.java"
-      ]
-    },
-    {
-      "page": "layers.md",
-      "class": [
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/OutputLayer.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/DropoutLayer.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/ActivationLayer.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/LossLayer.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/DenseLayer.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/EmbeddingLayer.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/EmbeddingSequenceLayer.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/GlobalPoolingLayer.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/LocalResponseNormalization.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/LocallyConnected1D.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/LocallyConnected2D.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/NoParamLayer.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/Pooling1D.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/Pooling2D.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/Subsampling1DLayer.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/SubsamplingLayer.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/Subsampling3DLayer.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/Upsampling1D.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/Upsampling2D.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/Upsampling3D.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/ZeroPadding1DLayer.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/ZeroPaddingLayer.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/ZeroPadding3DLayer.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/misc/ElementWiseMultiplicationLayer.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/misc/RepeatVector.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/objdetect/Yolo2OutputLayer.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/util/MaskLayer.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/util/MaskZeroLayer.java"
-      ]
-    },
-    {
-      "page": "autoencoders.md",
-      "class": [
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/variational/BernoulliReconstructionDistribution.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/variational/CompositeReconstructionDistribution.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/variational/ExponentialReconstructionDistribution.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/variational/GaussianReconstructionDistribution.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/variational/LossFunctionWrapper.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/variational/ReconstructionDistribution.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/variational/VariationalAutoencoder.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/AutoEncoder.java"
-      ]
-    },
-    {
-      "page": "convolutional.md",
-      "class": [
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/Convolution1D.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/Convolution2D.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/Convolution3D.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/Deconvolution2D.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/convolutional/Cropping1D.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/convolutional/Cropping2D.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/convolutional/Cropping3D.java"
-      ]
-    },
-    {
-      "page": "recurrent.md",
-      "class": [
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/GravesBidirectionalLSTM.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/GravesLSTM.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/LSTM.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/RnnOutputLayer.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/RnnLossLayer.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/recurrent/Bidirectional.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/recurrent/LastTimeStep.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/recurrent/SimpleRnn.java"
-      ]
-    },
-    {
-      "page": "custom-layer.md",
-      "class": [
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/layers/BaseLayer.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/FeedForwardLayer.java"
-      ]
-    },
-    {
-      "page": "vertices.md",
-      "class": [
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/vertex/impl/ElementWiseVertex.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/vertex/impl/InputVertex.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/vertex/impl/L2NormalizeVertex.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/vertex/impl/L2Vertex.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/vertex/impl/MergeVertex.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/vertex/impl/PoolHelperVertex.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/vertex/impl/ReshapeVertex.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/vertex/impl/ScaleVertex.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/vertex/impl/ShiftVertex.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/vertex/impl/StackVertex.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/vertex/impl/SubsetVertex.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/vertex/impl/UnstackVertex.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/vertex/impl/rnn/DuplicateToTimeSeriesVertex.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/vertex/impl/rnn/LastTimeStepVertex.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/vertex/impl/rnn/ReverseTimeSeriesVertex.java"
-      ]
-    },
-    {
-      "page": "early-stopping.md",
-      "class": [
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/EarlyStoppingConfiguration.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/EarlyStoppingModelSaver.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/EarlyStoppingResult.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/scorecalc/AutoencoderScoreCalculator.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/scorecalc/ClassificationScoreCalculator.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/scorecalc/DataSetLossCalculator.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/scorecalc/DataSetLossCalculatorCG.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/scorecalc/ROCScoreCalculator.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/scorecalc/RegressionScoreCalculator.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/scorecalc/VAEReconErrorScoreCalculator.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/scorecalc/VAEReconProbScoreCalculator.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/termination/ScoreImprovementEpochTerminationCondition.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/termination/BestScoreEpochTerminationCondition.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/termination/EpochTerminationCondition.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/termination/InvalidScoreIterationTerminationCondition.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/termination/IterationTerminationCondition.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/termination/MaxEpochsTerminationCondition.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/termination/MaxScoreIterationTerminationCondition.java",
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/earlystopping/termination/MaxTimeIterationTerminationCondition.java"
-      ]
-    },
-    {
-      "page": "computationgraph.md",
-      "class": [
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/ComputationGraph.java"
-      ]
-    },
-    {
-      "page": "multilayernetwork.md",
-      "class": [
-        "deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/multilayer/MultiLayerNetwork.java"
-      ]
-    }
-  ]
-}
-
--- a/docs/deeplearning4j-nn/templates/autoencoders.md
+++ b/docs/deeplearning4j-nn/templates/autoencoders.md
@ -1,19 +0,0 @@
---
-title: Deeplearning4j Autoencoders
-short_title: Autoencoders
-description: Supported autoencoder configurations.
-category: Models
-weight: 3
---
-
-## What are autoencoders?
-
-Autoencoders are neural networks for unsupervised learning. Eclipse Deeplearning4j supports certain autoencoder layers such as variational autoencoders.
-
-## Where's Restricted Boltzmann Machine?
-
-RBMs are no longer supported as of version 0.9.x. They are no longer best-in-class for most machine learning problems.
-
-## Supported layers
-
-{{autogenerated}}
--- a/docs/deeplearning4j-nn/templates/computationgraph.md
+++ b/docs/deeplearning4j-nn/templates/computationgraph.md
@ -1,258 +0,0 @@
---
-title: Complex Architectures with Computation Graph
-short_title: Computation Graph
-description: How to build complex networks with DL4J computation graph.
-category: Models
-weight: 3
---
-
-## Building Complex Network Architectures with Computation Graph
-
-This page describes how to build more complicated networks, using DL4J's Computation Graph functionality.
-
-**Contents**
-
-* [Overview of the Computation Graph](#overview)
-* [Computation Graph: Some Example Use Cases](#usecases)
-* [Configuring a ComputationGraph network](#config)
-  * [Types of Graph Vertices](#vertextypes)
-  * [Example 1: Recurrent Network with Skip Connections](#rnnskip)
-  * [Example 2: Multiple Inputs and Merge Vertex](#multiin)
-  * [Example 3: Multi-Task Learning](#multitask)
-  * [Automatically Adding PreProcessors and Calculating nIns](#preprocessors)
-* [Training Data for ComputationGraph](#data)
-  * [RecordReaderMultiDataSetIterator Example 1: Regression Data](#rrmdsi1)
-  * [RecordReaderMultiDataSetIterator Example 2: Classification and Multi-Task Learning](#rrmdsi2)
-
-
-## <a name="overview">Overview of Computation Graph</a>
-
-DL4J has two types of networks composed of multiple layers:
-
-* The [MultiLayerNetwork](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/multilayer/MultiLayerNetwork.java), which is essentially a stack of neural network layers (with a single input layer and a single output layer), and
-* The [ComputationGraph](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/ComputationGraph.java), which allows for greater freedom in network architectures
-
-
-Specifically, the ComputationGraph allows for networks to be built with the following features:
-
-* Multiple network input arrays
-* Multiple network outputs (including mixed classification/regression architectures)
-* Layers connected to other layers using a directed acyclic graph connection structure (instead of just a stack of layers)
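A directed acyclic graph can always be put into an execution order in which every vertex comes after all of its inputs. The sketch below illustrates that idea with Kahn's topological sort over vertices described by a name and a list of input names, loosely mirroring how `addLayer(name, layer, inputs...)` labels vertices later on this page. It is a library-independent illustration, not DL4J's internal graph code; `GraphOrderSketch` and `topologicalOrder` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, library-independent sketch; not DL4J's internal graph code.
class GraphOrderSketch {
    /**
     * Kahn's algorithm: returns the vertices in an order where each vertex follows all of its
     * inputs. Every vertex, including network inputs, must appear as a key of inputsOf.
     */
    static List<String> topologicalOrder(Map<String, List<String>> inputsOf) {
        Map<String, Integer> pending = new HashMap<>();        // inputs not yet computed, per vertex
        Map<String, List<String>> consumers = new HashMap<>(); // vertex -> vertices reading its output
        for (Map.Entry<String, List<String>> e : inputsOf.entrySet()) {
            pending.put(e.getKey(), e.getValue().size());
            for (String input : e.getValue()) {
                consumers.computeIfAbsent(input, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (String v : inputsOf.keySet()) {
            if (pending.get(v) == 0) {
                ready.add(v); // vertices with no inputs can run first
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            order.add(v);
            for (String c : consumers.getOrDefault(v, List.of())) {
                if (pending.merge(c, -1, Integer::sum) == 0) {
                    ready.add(c); // all inputs of c are now available
                }
            }
        }
        if (order.size() != inputsOf.size()) {
            throw new IllegalArgumentException("graph has a cycle or references an undeclared vertex");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        graph.put("input", List.of());
        graph.put("L1", List.of("input"));
        graph.put("L2", List.of("input", "L1")); // skip connection: L2 also reads "input"
        System.out.println(topologicalOrder(graph)); // [input, L1, L2]
    }
}
```

A plain stack of layers is the special case where every vertex has exactly one input, which is why MultiLayerNetwork does not need this extra bookkeeping.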
-
-As a general rule, when building networks with a single input layer, a single output layer, and an input->a->b->c->output type connection structure, MultiLayerNetwork is usually the preferred network. However, everything that MultiLayerNetwork can do, ComputationGraph can do as well, though the configuration may be a little more complicated.
-
-## <a name="usecases">Computation Graph: Some Example Use Cases</a>
-
-Examples of some architectures that can be built using ComputationGraph include:
-
-* Multi-task learning architectures
-* Recurrent neural networks with skip connections
-* [GoogLeNet](https://arxiv.org/abs/1409.4842), a complex type of convolutional neural network for image classification
-* [Image caption generation](https://arxiv.org/abs/1411.4555)
-* [Convolutional networks for sentence classification](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/convolution/sentenceclassification/CnnSentenceClassificationExample.java)
-* [Residual learning convolutional neural networks](https://arxiv.org/abs/1512.03385)
-
-
-## <a name="config">Configuring a Computation Graph</a>
-
-### <a name="vertextypes">Types of Graph Vertices</a>
-
-In a ComputationGraph, the core building block is the [GraphVertex](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/vertex/GraphVertex.java) rather than the layer. Layers (or, more accurately, [LayerVertex](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/vertex/impl/LayerVertex.java) objects) are only one type of vertex in the graph. Other types of vertices include:
-
-* Input Vertices
-* Element-wise operation vertices
-* Merge vertices
-* Subset vertices
-* Preprocessor vertices
-
-These types of graph vertices are described briefly below.
-
-**LayerVertex**: Layer vertices (graph vertices with neural network layers) are added using the ```.addLayer(String,Layer,String...)``` method. The first argument is the label for the layer, and the last arguments are the inputs to that layer.
-If you need to manually add an [InputPreProcessor](https://github.com/eclipse/deeplearning4j/tree/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/preprocessor) (usually this is unnecessary; see [Automatically Adding PreProcessors and Calculating nIns](#preprocessors)), you can use the ```.addLayer(String,Layer,InputPreProcessor,String...)``` method.
-
-**InputVertex**: Input vertices are specified by the ```addInputs(String...)``` method in your configuration. The strings used as inputs can be arbitrary: they are user-defined labels, and can be referenced later in the configuration. The number of strings provided defines the number of inputs; the order of the inputs also defines the order of the corresponding INDArrays in the fit methods (or the DataSet/MultiDataSet objects).
-
-**ElementWiseVertex**: Element-wise operation vertices apply an element-wise operation, such as addition or subtraction, to the activations output by one or more other vertices. Thus, the activations used as input for the ElementWiseVertex must all be the same size, and the output size of the element-wise vertex is the same as the size of the inputs.
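As a rough illustration of why all inputs must share one size, here is a plain-Java sketch of element-wise addition and subtraction over activation vectors. It is not DL4J's ElementWiseVertex; the class, enum and method names are hypothetical, and restricting subtraction to exactly two inputs is this sketch's own choice.

```java
import java.util.Arrays;

// Hypothetical, library-independent illustration of an element-wise vertex.
class ElementWiseSketch {
    enum Op { ADD, SUBTRACT }

    /** Combines equal-size activation vectors element by element; the output has the same size. */
    static double[] combine(Op op, double[]... inputs) {
        if (inputs.length == 0) {
            throw new IllegalArgumentException("at least one input is required");
        }
        if (op == Op.SUBTRACT && inputs.length != 2) {
            throw new IllegalArgumentException("this sketch only subtracts exactly two inputs");
        }
        int size = inputs[0].length;
        for (double[] in : inputs) {
            if (in.length != size) {
                throw new IllegalArgumentException("all inputs must have the same size");
            }
        }
        double[] out = inputs[0].clone();
        for (int k = 1; k < inputs.length; k++) {
            for (int i = 0; i < size; i++) {
                out[i] = (op == Op.ADD) ? out[i] + inputs[k][i] : out[i] - inputs[k][i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};
        double[] b = {0.5, 0.5, 0.5};
        System.out.println(Arrays.toString(combine(Op.ADD, a, b)));      // [1.5, 2.5, 3.5]
        System.out.println(Arrays.toString(combine(Op.SUBTRACT, a, b))); // [0.5, 1.5, 2.5]
    }
}
```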
-
-**MergeVertex**: The MergeVertex concatenates (merges) the input activations. For example, if a MergeVertex has 2 inputs of size 5 and 10 respectively, then the output size will be 5+10=15 activations. For convolutional network activations, examples are merged along the depth (channel) dimension: if the activations from one layer are 4 x width x height and those from the other layer are 5 x width x height, then the output will have (4+5) x width x height activations.
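The size arithmetic above can be sketched in plain Java: feed-forward activations are concatenated end to end, and convolutional activations stored as `[depth][height][width]` are stacked along depth. This is a hypothetical illustration, not DL4J's MergeVertex, and it ignores the minibatch dimension that real activations carry.

```java
import java.util.Arrays;

// Hypothetical illustration of MergeVertex-style concatenation; not DL4J code.
class MergeSketch {
    /** Feed-forward case: inputs of sizes 5 and 10 give an output of size 15. */
    static double[] concat(double[]... inputs) {
        int total = 0;
        for (double[] in : inputs) {
            total += in.length;
        }
        double[] out = new double[total];
        int offset = 0;
        for (double[] in : inputs) {
            System.arraycopy(in, 0, out, offset, in.length);
            offset += in.length;
        }
        return out;
    }

    /** Convolutional case: [depth][height][width] arrays stacked along depth (channels shared, not copied). */
    static double[][][] concatDepth(double[][][]... inputs) {
        int height = inputs[0][0].length;
        int width = inputs[0][0][0].length;
        int depth = 0;
        for (double[][][] in : inputs) {
            if (in[0].length != height || in[0][0].length != width) {
                throw new IllegalArgumentException("height and width must match");
            }
            depth += in.length;
        }
        double[][][] out = new double[depth][][];
        int d = 0;
        for (double[][][] in : inputs) {
            for (double[][] channel : in) {
                out[d++] = channel;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(concat(new double[5], new double[10]).length);                 // 15
        System.out.println(concatDepth(new double[4][8][8], new double[5][8][8]).length); // 9
        System.out.println(Arrays.toString(concat(new double[]{1, 2}, new double[]{3}))); // [1.0, 2.0, 3.0]
    }
}
```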
-
-**SubsetVertex**: The subset vertex allows you to get only part of the activations out of another vertex. For example, to get the first 5 activations out of another vertex with label "layer1", you can use ```.addVertex("subset1", new SubsetVertex(0,4), "layer1")```: this means that the 0th through 4th (inclusive) activations out of the "layer1" vertex will be used as output from the subset vertex.
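Because both indices of the subset range are inclusive, `(0,4)` selects 5 activations. The hypothetical helper below mirrors that inclusive convention on a plain array; it is an illustration only, not the SubsetVertex implementation.

```java
import java.util.Arrays;

// Hypothetical illustration of the inclusive index range described for SubsetVertex(from, to).
class SubsetSketch {
    /** Returns activations[from..to], with both ends inclusive. */
    static double[] subset(double[] activations, int from, int to) {
        if (from < 0 || from > to || to >= activations.length) {
            throw new IllegalArgumentException("invalid range [" + from + ", " + to + "]");
        }
        // Arrays.copyOfRange treats its end index as exclusive, hence to + 1.
        return Arrays.copyOfRange(activations, from, to + 1);
    }

    public static void main(String[] args) {
        double[] layer1 = {10, 11, 12, 13, 14, 15, 16};
        System.out.println(Arrays.toString(subset(layer1, 0, 4))); // [10.0, 11.0, 12.0, 13.0, 14.0]
    }
}
```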
-
-**PreProcessorVertex**: Occasionally, you might want to use the functionality of an [InputPreProcessor](https://github.com/eclipse/deeplearning4j/tree/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/preprocessor) without that preprocessor being associated with a layer. The PreProcessorVertex allows you to do this.
-
-Finally, it is also possible to define custom graph vertices by implementing both a [configuration](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/graph/GraphVertex.java) and [implementation](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/vertex/GraphVertex.java) class for your custom GraphVertex.
-
-
-### <a name="rnnskip">Example 1: Recurrent Network with Skip Connections</a>
-
-Suppose we wish to build the following recurrent neural network architecture:
-![RNN with Skip connections](/images/guide/lstm_skip_connection.png)
-
-For the sake of this example, let's assume our input data is of size 5. Our configuration would be as follows:
-
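-```java
-import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
-import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
-import org.deeplearning4j.nn.conf.layers.GravesLSTM;
-import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
-import org.deeplearning4j.nn.graph.ComputationGraph;
-import org.nd4j.linalg.learning.config.Sgd;
-
-ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
-    .updater(new Sgd(0.01))
-    .graphBuilder()
-    .addInputs("input") //can use any label for this
-    .addLayer("L1", new GravesLSTM.Builder().nIn(5).nOut(5).build(), "input")
-    .addLayer("L2", new RnnOutputLayer.Builder().nIn(5+5).nOut(5).build(), "input", "L1")
-    .setOutputs("L2") //We need to specify the network outputs and their order
-    .build();
-
-ComputationGraph net = new ComputationGraph(conf);
-net.init();
-```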
-
-Note that in the .addLayer(...) methods, the first string ("L1", "L2") is the name of that layer, and the strings at the end (["input"], ["input","L1"]) are the inputs to that layer.
-
-
-### <a name="multiin">Example 2: Multiple Inputs and Merge Vertex</a>
-
-Consider the following architecture:
-
-![Computation Graph with Merge Vertex](/images/guide/compgraph_merge.png)
-
-Here, the merge vertex takes the activations out of layers L1 and L2, and merges (concatenates) them: thus if layers L1 and L2 both have 4 output activations (.nOut(4)), then the output size of the merge vertex is 4+4=8 activations.
-
-To build the above network, we use the following configuration:
-
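-```java
-import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
-import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
-import org.deeplearning4j.nn.conf.graph.MergeVertex;
-import org.deeplearning4j.nn.conf.layers.DenseLayer;
-import org.deeplearning4j.nn.conf.layers.OutputLayer;
-import org.nd4j.linalg.learning.config.Sgd;
-
-ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
-    .updater(new Sgd(0.01))
-    .graphBuilder()
-    .addInputs("input1", "input2")
-    .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input1")
-    .addLayer("L2", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input2")
-    .addVertex("merge", new MergeVertex(), "L1", "L2")
-    .addLayer("out", new OutputLayer.Builder().nIn(4+4).nOut(3).build(), "merge")
-    .setOutputs("out")
-    .build();
-```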
-
-### <a name="multitask">Example 3: Multi-Task Learning</a>
-
-In multi-task learning, a neural network is used to make multiple independent predictions.
-Consider for example a simple network used for both classification and regression simultaneously. In this case, we have two output layers, "out1" for classification, and "out2" for regression.
-
-![Computation Graph for MultiTask Learning](/images/guide/compgraph_multitask.png)
-
-In this case, the network configuration is:
-
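-```java
-import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
-import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
-import org.deeplearning4j.nn.conf.layers.DenseLayer;
-import org.deeplearning4j.nn.conf.layers.OutputLayer;
-import org.nd4j.linalg.activations.Activation;
-import org.nd4j.linalg.learning.config.Sgd;
-import org.nd4j.linalg.lossfunctions.LossFunctions;
-
-ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
-    .updater(new Sgd(0.01))
-    .graphBuilder()
-    .addInputs("input")
-    .addLayer("L1", new DenseLayer.Builder().nIn(3).nOut(4).build(), "input")
-    .addLayer("out1", new OutputLayer.Builder()
-            .lossFunction(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
-            .activation(Activation.SOFTMAX)  //Class probabilities for classification
-            .nIn(4).nOut(3).build(), "L1")
-    .addLayer("out2", new OutputLayer.Builder()
-            .lossFunction(LossFunctions.LossFunction.MSE)
-            .activation(Activation.IDENTITY) //Unbounded outputs for regression
-            .nIn(4).nOut(2).build(), "L1")
-    .setOutputs("out1", "out2")
-    .build();
-```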
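Each output layer in this configuration has its own loss: negative log-likelihood for the classification head "out1" and MSE for the regression head "out2". The plain-Java sketch below computes both losses for a single example and adds them into one score. Summing the per-output losses is an assumption of this illustration, and the class and method names are hypothetical; DL4J's own loss implementations may differ in details such as averaging over a minibatch.

```java
// Hypothetical illustration of combining two heads' losses; summing them is an assumption here.
class MultiTaskLossSketch {
    /** Negative log-likelihood of the true class under softmax(logits), computed stably. */
    static double softmaxNll(double[] logits, int trueClass) {
        double max = Double.NEGATIVE_INFINITY;
        for (double z : logits) {
            max = Math.max(max, z);
        }
        double sumExp = 0.0;
        for (double z : logits) {
            sumExp += Math.exp(z - max);
        }
        // log softmax(z)_k = z_k - max - log(sum_j exp(z_j - max))
        return -(logits[trueClass] - max - Math.log(sumExp));
    }

    /** Mean squared error over the regression outputs. */
    static double mse(double[] predicted, double[] target) {
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double d = predicted[i] - target[i];
            sum += d * d;
        }
        return sum / predicted.length;
    }

    /** Score for one example: classification loss of "out1" plus regression loss of "out2". */
    static double totalLoss(double[] classLogits, int trueClass, double[] regPred, double[] regTarget) {
        return softmaxNll(classLogits, trueClass) + mse(regPred, regTarget);
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 0.5, -1.0}; // "out1": 3 classes
        double[] regression = {0.3, 1.2};   // "out2": 2 values
        System.out.println(totalLoss(logits, 0, regression, new double[]{0.0, 1.0}));
    }
}
```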
-
-### <a name="preprocessors">Automatically Adding PreProcessors and Calculating nIns</a>
-
-One feature of the ComputationGraphConfiguration is that you can specify the types of input to the network, using the ```.setInputTypes(InputType...)``` method in the configuration.
-
-The setInputTypes method has two effects:
-
-1. It will automatically add any [InputPreProcessor](https://github.com/eclipse/deeplearning4j/tree/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/preprocessor)s as required. InputPreProcessors are necessary to handle the interaction between, for example, fully connected (dense) and convolutional layers, or recurrent and fully connected layers.
-2. It will automatically calculate the number of inputs (.nIn(x) config) to a layer. Thus, if you are using the ```setInputTypes(InputType...)``` functionality, it is not necessary to manually specify the .nIn(x) options in your configuration. This can simplify building some architectures (such as convolutional networks with fully connected layers). If .nIn(x) is specified for a layer, the network will not override it when using the InputType functionality.
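Inferring nIn is mostly shape arithmetic. The sketch below shows two such calculations for a convolutional input feeding a dense layer: the spatial output size of a convolution, and the flattened size that becomes the dense layer's nIn. It assumes truncate-style sizing with no rounding up (other convolution modes, such as "same" padding, size outputs differently), and the helper names are hypothetical rather than DL4J API.

```java
// Hypothetical helper illustrating the arithmetic behind automatic nIn inference.
class InputTypeSketch {
    /** Output size of one spatial dimension, assuming truncate-style convolution (no rounding up). */
    static int convOutputSize(int inputSize, int kernel, int stride, int padding) {
        return (inputSize - kernel + 2 * padding) / stride + 1;
    }

    /** nIn of a dense layer that follows convolutional activations of shape [depth, height, width]. */
    static long flattenedSize(long depth, long height, long width) {
        return depth * height * width;
    }

    public static void main(String[] args) {
        // 28x28 input, 5x5 kernel, stride 1, no padding, 20 output channels
        int h = convOutputSize(28, 5, 1, 0); // 24
        int w = convOutputSize(28, 5, 1, 0); // 24
        System.out.println("dense nIn = " + flattenedSize(20, h, w)); // dense nIn = 11520
    }
}
```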
-
-
-For example, if your network has 2 inputs, one being a convolutional input and the other being a feed-forward input, you would use ```.setInputTypes(InputType.convolutional(height,width,depth), InputType.feedForward(feedForwardInputSize))```.
-
-
-## <a name="data">Training Data for ComputationGraph</a>
-
-There are two types of data that can be used with the ComputationGraph.
-
-### DataSet and the DataSetIterator
-
-The DataSet class was originally designed for use with the MultiLayerNetwork, but it can also be used with ComputationGraph, provided that the computation graph has a single input array and a single output array. For computation graph architectures with more than one input array, or more than one output array, DataSet and DataSetIterator cannot be used (instead, use MultiDataSet/MultiDataSetIterator).
-
-A DataSet object is basically a pair of INDArrays (features and labels) that hold your training data. In the case of RNNs, it may also include masking arrays (see [Using RNNs](http://deeplearning4j.org/usingrnns) for more details). A DataSetIterator is essentially an iterator over DataSet objects.
-
-### MultiDataSet and the MultiDataSetIterator
-
-MultiDataSet is the multiple-input and/or multiple-output version of DataSet. It may also include multiple mask arrays (one for each input/output array) in the case of recurrent neural networks. As a general rule, you should use DataSet/DataSetIterator unless you are dealing with multiple inputs and/or multiple outputs.
-
-There are currently two ways to use a MultiDataSetIterator:
-
-* By implementing the [MultiDataSetIterator](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/dataset/api/iterator/MultiDataSetIterator.java) interface directly
-* By using the [RecordReaderMultiDataSetIterator](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-datavec-iterators/src/main/java/org/deeplearning4j/datasets/datavec/RecordReaderMultiDataSetIterator.java) in conjunction with DataVec record readers
-
-
-The RecordReaderMultiDataSetIterator provides a number of options for loading data. In particular, the RecordReaderMultiDataSetIterator provides the following functionality:
-
-* Multiple DataVec RecordReaders may be used simultaneously
-* The record readers need not be the same modality: for example, you can use an image record reader with a CSV record reader
-* It is possible to use a subset of the columns in a RecordReader for different purposes - for example, the first 10 columns in a CSV could be your input, and the last 5 could be your output
-* It is possible to convert single columns from a class index to a one-hot representation
-
-
-Some basic examples on how to use the RecordReaderMultiDataSetIterator follow. You might also find [these unit tests](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-core/src/test/java/org/deeplearning4j/datasets/datavec/RecordReaderMultiDataSetIteratorTest.java) to be useful.
-
-### <a name="rrmdsi1">RecordReaderMultiDataSetIterator Example 1: Regression Data</a>
-
-Suppose we have a CSV file with 5 columns, and we want to use the first 3 as our input, and the last 2 columns as our output (for regression). We can build a MultiDataSetIterator to do this as follows:
-
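-```java
-import java.io.File;
-import org.datavec.api.records.reader.RecordReader;
-import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
-import org.datavec.api.split.FileSplit;
-import org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator;
-import org.nd4j.linalg.dataset.api.iterator.MultiDataSetIterator;
-
-int numLinesToSkip = 0;
-String fileDelimiter = ",";
-RecordReader rr = new CSVRecordReader(numLinesToSkip, fileDelimiter);
-String csvPath = "/path/to/my/file.csv";
-rr.initialize(new FileSplit(new File(csvPath)));
-
-int batchSize = 4;
-MultiDataSetIterator iterator = new RecordReaderMultiDataSetIterator.Builder(batchSize)
-        .addReader("myReader", rr)
-        .addInput("myReader", 0, 2)  //Input: columns 0 to 2 inclusive
-        .addOutput("myReader", 3, 4) //Output: columns 3 to 4 inclusive
-        .build();
-```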
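What `addInput("myReader", 0, 2)` and `addOutput("myReader", 3, 4)` select from each record can be pictured with a small, library-free sketch: parse each CSV line into numbers, then copy inclusive column ranges into separate feature and label arrays. The class and method names are hypothetical, and batching into groups of `batchSize` records is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, library-free illustration of inclusive column ranges; batching is not modeled.
class CsvColumnSplitSketch {
    /** Parses delimited lines into numeric rows (the delimiter is treated literally). */
    static List<double[]> parse(List<String> lines, String delimiter) {
        List<double[]> rows = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(Pattern.quote(delimiter));
            double[] row = new double[parts.length];
            for (int i = 0; i < parts.length; i++) {
                row[i] = Double.parseDouble(parts[i].trim());
            }
            rows.add(row);
        }
        return rows;
    }

    /** Copies columns from..to (both inclusive) of every row. */
    static double[][] columns(List<double[]> rows, int from, int to) {
        double[][] out = new double[rows.size()][];
        for (int r = 0; r < rows.size(); r++) {
            out[r] = Arrays.copyOfRange(rows.get(r), from, to + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        List<double[]> rows = parse(List.of("1,2,3,10,20", "4,5,6,40,50"), ",");
        System.out.println(Arrays.deepToString(columns(rows, 0, 2))); // [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
        System.out.println(Arrays.deepToString(columns(rows, 3, 4))); // [[10.0, 20.0], [40.0, 50.0]]
    }
}
```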
-
-
-### <a name="rrmdsi2">RecordReaderMultiDataSetIterator Example 2: Classification and Multi-Task Learning</a>
-
-Suppose we have two separate CSV files, one for our inputs, and one for our outputs. Further suppose we are building a multi-task learning architecture, whereby we have two outputs: one for regression and one for classification.
-For this example, let's assume the data is as follows:
-
-* Input file: myInput.csv, and we want to use all columns as input (without modification)
-* Output file: myOutput.csv.
-  - Network output 1 - regression: columns 0 to 3
-  - Network output 2 - classification: column 4 is the class index for classification, with 3 classes. Thus column 4 contains integer values [0,1,2] only, and we want to convert these indexes to a one-hot representation for classification.
-
-In this case, we can build our iterator as follows:
-
-```java
-import java.io.File;
-import org.datavec.api.records.reader.RecordReader;
-import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
-import org.datavec.api.split.FileSplit;
-import org.deeplearning4j.datasets.datavec.RecordReaderMultiDataSetIterator;
-import org.nd4j.linalg.dataset.api.iterator.MultiDataSetIterator;
-
-int numLinesToSkip = 0;
-String fileDelimiter = ",";
-
-RecordReader featuresReader = new CSVRecordReader(numLinesToSkip, fileDelimiter);
-String featuresCsvPath = "/path/to/my/myInput.csv";
-featuresReader.initialize(new FileSplit(new File(featuresCsvPath)));
-
-RecordReader labelsReader = new CSVRecordReader(numLinesToSkip, fileDelimiter);
-String labelsCsvPath = "/path/to/my/myOutput.csv";
-labelsReader.initialize(new FileSplit(new File(labelsCsvPath)));
-
-int batchSize = 4;
-int numClasses = 3;
-MultiDataSetIterator iterator = new RecordReaderMultiDataSetIterator.Builder(batchSize)
-        .addReader("csvInput", featuresReader)
-        .addReader("csvLabels", labelsReader)
-        .addInput("csvInput") //Input: all columns from input reader
-        .addOutput("csvLabels", 0, 3) //Output 1: columns 0 to 3 inclusive
-        .addOutputOneHot("csvLabels", 4, numClasses) //Output 2: column 4 -> convert to one-hot for classification
-        .build();
-```
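The one-hot conversion requested by `addOutputOneHot("csvLabels", 4, numClasses)` maps each class index to a row with a single 1.0. The hypothetical helper below shows that mapping for the 3-class case described above; it is a plain-Java illustration, not the iterator's implementation.

```java
import java.util.Arrays;

// Hypothetical illustration of converting a class-index column to one-hot rows.
class OneHotSketch {
    static double[][] oneHot(int[] classIndices, int numClasses) {
        double[][] out = new double[classIndices.length][numClasses]; // all zeros initially
        for (int row = 0; row < classIndices.length; row++) {
            int c = classIndices[row];
            if (c < 0 || c >= numClasses) {
                throw new IllegalArgumentException("class index out of range: " + c);
            }
            out[row][c] = 1.0;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] column4 = {0, 2, 1}; // values of column 4 for three records
        System.out.println(Arrays.deepToString(oneHot(column4, 3)));
        // [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    }
}
```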
--- a/docs/deeplearning4j-nn/templates/convolutional.md
+++ b/docs/deeplearning4j-nn/templates/convolutional.md
@ -1,15 +0,0 @@
---
-title: Supported Convolutional Layers
-short_title: Convolutional
-description: Supported convolutional layers.
-category: Models
-weight: 3
---
-
-## What is a convolutional neural network?
-
-Each layer in a neural network configuration represents a group of hidden units. When layers are stacked together, they represent a *deep neural network*.
-
-## Available layers
-
-{{autogenerated}}
--- a/docs/deeplearning4j-nn/templates/custom-layer.md
+++ b/docs/deeplearning4j-nn/templates/custom-layer.md
@ -1,52 +0,0 @@
---
-title: Custom Layers
-short_title: Custom Layers
-description: Extend DL4J functionality for custom layers.
-category: Models
-weight: 10
---
-
-## Writing Your Custom Layer
-
-There are two components to adding a custom layer:
-
-1. Adding the layer configuration class: extends org.deeplearning4j.nn.conf.layers.Layer
-2. Adding the layer implementation class: implements org.deeplearning4j.nn.api.Layer
-
-The configuration class ((1) above) handles the settings. It's the one you would
-use when constructing a MultiLayerNetwork or ComputationGraph. You can add custom
-settings here, and use them in your layer.
-
-The implementation class ((2) above) has parameters, and handles the network forward
-pass, backpropagation, etc. It is created from the org.deeplearning4j.nn.conf.layers.Layer.instantiate(...)
-method. In other words, the instantiate method is how we go from the configuration
-to the implementation; MultiLayerNetwork or ComputationGraph will call this method
-when initializing the network.
-
-Examples of these are CustomLayer (the configuration class) and CustomLayerImpl (the
-implementation class) in the custom layer example. Both of these classes have extensive
-comments regarding their methods.
-
-You'll note that in Deeplearning4j there are two DenseLayer classes, two GravesLSTM classes,
-etc.: one is for the configuration, and the other is for the implementation.
-The custom layer example does not follow this "same name" pattern, to avoid confusion.
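The configuration/implementation split can be sketched without DL4J: one class holds only settings, and an `instantiate` method builds a second class that owns the parameters and performs the forward pass. Everything below (`SketchLayerConf`, `SketchLayerImpl`, the ReLU dense computation) is hypothetical and only mirrors the roles described above; a real custom layer must extend and implement the DL4J types named earlier.

```java
import java.util.Arrays;
import java.util.Random;

// Configuration side: holds only settings. Hypothetical and library-free.
class SketchLayerConf {
    final int nIn;
    final int nOut;

    SketchLayerConf(int nIn, int nOut) {
        this.nIn = nIn;
        this.nOut = nOut;
    }

    /** Plays the role of instantiate(...): builds the implementation from this configuration. */
    SketchLayerImpl instantiate(long seed) {
        return new SketchLayerImpl(this, seed);
    }

    public static void main(String[] args) {
        SketchLayerImpl layer = new SketchLayerConf(3, 2).instantiate(42L);
        System.out.println(Arrays.toString(layer.activate(new double[]{1.0, -1.0, 0.5})));
    }
}

// Implementation side: owns the parameters and performs the forward pass.
class SketchLayerImpl {
    final SketchLayerConf conf;
    final double[][] weights; // shape [nOut][nIn]

    SketchLayerImpl(SketchLayerConf conf, long seed) {
        this.conf = conf;
        Random rng = new Random(seed);
        weights = new double[conf.nOut][conf.nIn];
        for (double[] row : weights) {
            for (int i = 0; i < row.length; i++) {
                row[i] = rng.nextGaussian() * 0.01; // small random initialization
            }
        }
    }

    /** Forward pass: ReLU(W x). */
    double[] activate(double[] input) {
        if (input.length != conf.nIn) {
            throw new IllegalArgumentException("expected " + conf.nIn + " inputs");
        }
        double[] out = new double[conf.nOut];
        for (int o = 0; o < conf.nOut; o++) {
            double sum = 0.0;
            for (int i = 0; i < conf.nIn; i++) {
                sum += weights[o][i] * input[i];
            }
            out[o] = Math.max(0.0, sum);
        }
        return out;
    }
}
```

Keeping settings in a separate, parameter-free class is what makes the configuration easy to serialize, while the implementation can hold large parameter arrays.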
-
-## Testing Your Custom Layer
-
-Once you have added a custom layer, it is necessary to run some tests to ensure
-it is correct.
-
-These tests should at a minimum include the following:
-
-1. Tests to ensure that the JSON configuration (to/from JSON) works correctly.
-   This is necessary for networks with your custom layer to function with both
-   model serialization (saving) and Spark training.
-2. Gradient checks to ensure that the implementation is correct.
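A gradient check compares the gradients your layer computes analytically with numerical estimates from finite differences. The sketch below is a generic central-difference check on a small example function, not DL4J's own gradient-check utility; the names, the example loss, the step size and the relative-error formula `|a - n| / (|a| + |n|)` are illustrative choices.

```java
import java.util.function.Function;

// Generic central-difference gradient check; not DL4J's own gradient-check utility.
class GradientCheckSketch {
    /**
     * Compares an analytic gradient with (f(w + eps) - f(w - eps)) / (2 * eps), one parameter
     * at a time, and returns the largest relative error |a - n| / (|a| + |n|).
     */
    static double maxRelativeError(Function<double[], Double> f, double[] analytic, double[] w, double eps) {
        double worst = 0.0;
        for (int i = 0; i < w.length; i++) {
            double original = w[i];
            w[i] = original + eps;
            double plus = f.apply(w);
            w[i] = original - eps;
            double minus = f.apply(w);
            w[i] = original; // restore the parameter before moving on
            double numeric = (plus - minus) / (2 * eps);
            double denom = Math.abs(analytic[i]) + Math.abs(numeric);
            if (denom > 0) {
                worst = Math.max(worst, Math.abs(analytic[i] - numeric) / denom);
            }
        }
        return worst;
    }

    // Example loss f(w) = w0^3 + w1^3 + w2^3 + w0 * w1
    static double loss(double[] w) {
        return w[0] * w[0] * w[0] + w[1] * w[1] * w[1] + w[2] * w[2] * w[2] + w[0] * w[1];
    }

    // Its analytic gradient: df/dw0 = 3 w0^2 + w1, df/dw1 = 3 w1^2 + w0, df/dw2 = 3 w2^2
    static double[] gradient(double[] w) {
        return new double[]{3 * w[0] * w[0] + w[1], 3 * w[1] * w[1] + w[0], 3 * w[2] * w[2]};
    }

    public static void main(String[] args) {
        double[] w = {0.5, -1.2, 2.0};
        double err = maxRelativeError(GradientCheckSketch::loss, gradient(w), w, 1e-6);
        System.out.println(err < 1e-6 ? "gradient check passed" : "gradient check FAILED: " + err);
    }
}
```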
-
-## Example
-
-A full custom layer example is available in our [examples repository](https://github.com/eclipse/deeplearning4j-examples/tree/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/misc/customlayers).
-
-## API
-
-{{autogenerated}}
--- a/docs/deeplearning4j-nn/templates/early-stopping.md
+++ b/docs/deeplearning4j-nn/templates/early-stopping.md
@ -1,83 +0,0 @@
---
-title: Early Stopping
-short_title: Early Stopping
-description: Terminate a training session given certain conditions.
-category: Tuning & Training
-weight: 10
---
-
-## What is early stopping?
-
-When training neural networks, numerous decisions need to be made regarding the settings (hyperparameters) used, in order to obtain good performance. One such hyperparameter is the number of training epochs: that is, how many full passes of the data set (epochs) should be used? If we use too few epochs, we might underfit (i.e., not learn everything we can from the training data); if we use too many epochs, we might overfit (i.e., fit the 'noise' in the training data, and not the signal).
-
-Early stopping attempts to remove the need to manually set this value. It can also be considered a type of regularization method (like L1/L2 weight decay and dropout) in that it can stop the network from overfitting.
-
-The idea behind early stopping is relatively simple:
-
-* Split data into training and test sets
-* At the end of each epoch (or, every N epochs):
-  * evaluate the network performance on the test set
-  * if the network outperforms the previous best model: save a copy of the network at the current epoch
-* Take as our final model the model that has the best test set performance
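The procedure in the list above can be simulated in a few lines. The sketch replays a fixed sequence of made-up test-set scores (lower is better), remembers the best epoch, and stops either at a maximum epoch count or after `patience` epochs without improvement. The patience rule is an illustrative choice in the spirit of a score-improvement termination condition; it is not DL4J's implementation, and the names are hypothetical.

```java
// Hypothetical simulation of early stopping over precomputed per-epoch test scores.
class EarlyStoppingSketch {
    /** Returns the 0-based epoch whose model would be kept, or -1 if no epoch ran. */
    static int bestEpoch(double[] testScores, int maxEpochs, int patience) {
        int best = -1;
        double bestScore = Double.POSITIVE_INFINITY;
        int sinceImprovement = 0;
        for (int epoch = 0; epoch < Math.min(maxEpochs, testScores.length); epoch++) {
            double score = testScores[epoch];   // "evaluate the network performance on the test set"
            if (score < bestScore) {            // "save a copy of the network at the current epoch"
                bestScore = score;
                best = epoch;
                sinceImprovement = 0;
            } else if (++sinceImprovement >= patience) {
                break;                          // no improvement for `patience` epochs
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] scores = {1.0, 0.7, 0.55, 0.5, 0.52, 0.6, 0.7, 0.8};
        System.out.println("best epoch: " + bestEpoch(scores, 30, 3)); // best epoch: 3
    }
}
```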
-
-
-This is shown graphically below:
-
-![Early Stopping](/images/guide/earlystopping.png)
-
-The best model is the one saved at the time of the vertical dotted line - i.e., the model with the best accuracy on the test set.
-
-
-Using DL4J's early stopping functionality requires you to provide a number of configuration options:
-
-* A score calculator, such as the *DataSetLossCalculator* ([JavaDoc](https://deeplearning4j.org/api/{{page.version}}/org/deeplearning4j/earlystopping/scorecalc/DataSetLossCalculator.html), [Source Code](https://github.com/eclipse/deeplearning4j/blob/c152293ef8d1094c281f5375ded61ff5f8eb6587/deeplearning4j-core/src/main/java/org/deeplearning4j/earlystopping/scorecalc/DataSetLossCalculator.java)) for a MultiLayerNetwork, or *DataSetLossCalculatorCG* ([JavaDoc](https://deeplearning4j.org/api/{{page.version}}/org/deeplearning4j/earlystopping/scorecalc/DataSetLossCalculatorCG.html), [Source Code](https://github.com/eclipse/deeplearning4j/blob/c152293ef8d1094c281f5375ded61ff5f8eb6587/deeplearning4j-core/src/main/java/org/deeplearning4j/earlystopping/scorecalc/DataSetLossCalculatorCG.java)) for a ComputationGraph. The score calculator computes the score used to compare models (for example, the loss function value on a test set, or the accuracy on the test set)
-* How frequently we want to calculate the score function (default: every epoch)
-* One or more termination conditions, which tell the training process when to stop. There are two classes of termination conditions:
-  * Epoch termination conditions: evaluated every N epochs
-  * Iteration termination conditions: evaluated once per minibatch
-* A model saver, that defines how models are saved
-
-The following example uses an epoch termination condition of at most 30 epochs and an iteration termination condition of at most 20 minutes of training time, calculates the score every epoch, and saves the intermediate results to disk:
-
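-```java
-import java.util.concurrent.TimeUnit;
-import org.deeplearning4j.earlystopping.EarlyStoppingConfiguration;
-import org.deeplearning4j.earlystopping.EarlyStoppingResult;
-import org.deeplearning4j.earlystopping.saver.LocalFileModelSaver;
-import org.deeplearning4j.earlystopping.scorecalc.DataSetLossCalculator;
-import org.deeplearning4j.earlystopping.termination.MaxEpochsTerminationCondition;
-import org.deeplearning4j.earlystopping.termination.MaxTimeIterationTerminationCondition;
-import org.deeplearning4j.earlystopping.trainer.EarlyStoppingTrainer;
-import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
-import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
-import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
-
-MultiLayerConfiguration myNetworkConfiguration = ...;
-DataSetIterator myTrainData = ...;
-DataSetIterator myTestData = ...;
-String directory = ...; //Directory in which the intermediate models are saved
-
-EarlyStoppingConfiguration<MultiLayerNetwork> esConf = new EarlyStoppingConfiguration.Builder<MultiLayerNetwork>()
-        .epochTerminationConditions(new MaxEpochsTerminationCondition(30))
-        .iterationTerminationConditions(new MaxTimeIterationTerminationCondition(20, TimeUnit.MINUTES))
-        .scoreCalculator(new DataSetLossCalculator(myTestData, true))
-        .evaluateEveryNEpochs(1)
-        .modelSaver(new LocalFileModelSaver(directory))
-        .build();
-
-EarlyStoppingTrainer trainer = new EarlyStoppingTrainer(esConf, myNetworkConfiguration, myTrainData);
-
-//Conduct early stopping training:
-EarlyStoppingResult<MultiLayerNetwork> result = trainer.fit();
-
-//Print out the results:
-System.out.println("Termination reason: " + result.getTerminationReason());
-System.out.println("Termination details: " + result.getTerminationDetails());
-System.out.println("Total epochs: " + result.getTotalEpochs());
-System.out.println("Best epoch number: " + result.getBestModelEpoch());
-System.out.println("Score at best epoch: " + result.getBestModelScore());
-
-//Get the best model:
-MultiLayerNetwork bestModel = result.getBestModel();
-```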
-
-You can also implement your own iteration and epoch termination conditions.
-
-## Early Stopping w/ Parallel Wrapper
-
-The early stopping implementation described above will only work with a single device. However, `EarlyStoppingParallelTrainer` provides similar functionality to early stopping and allows you to optimize for either multiple CPUs or GPUs. `EarlyStoppingParallelTrainer` wraps your model in a `ParallelWrapper` class and performs localized distributed training.
-
-Note that `EarlyStoppingParallelTrainer` doesn't support all of the functionality of its single-device counterpart. It is not UI-compatible and may not work with complex iteration listeners. This is due to how the model is distributed and copied in the background.
-
-## API
-
-{{autogenerated}}
--- a/docs/deeplearning4j-nn/templates/evaluation.md
+++ b/docs/deeplearning4j-nn/templates/evaluation.md
@ -1,212 +0,0 @@
---
-title: Evaluation Classes for Neural Networks
-short_title: Evaluation
-description: Tools and classes for evaluating neural network performance
-category: Tuning & Training
-weight: 3
---
-
-
-## Why evaluate?
-
-When training or deploying a neural network, it is useful to know the accuracy of your model. In DL4J, the Evaluation class and its variants are available to evaluate your model's performance.
-
-
-### <a name="classification">Evaluation for Classification</a>
-
-The Evaluation class is used to evaluate the performance for binary and multi-class classifiers (including time series classifiers). This section covers basic usage of the Evaluation Class.
-
-Given a dataset in the form of a DataSetIterator, the easiest way to perform evaluation is to use the built-in evaluate methods on MultiLayerNetwork and ComputationGraph:
-```
-DataSetIterator myTestData = ...;
-Evaluation eval = model.evaluate(myTestData);
-```
-
-However, evaluation can also be performed on individual minibatches. Here is an example taken from dataexamples/CSVExample in our [Examples](https://github.com/eclipse/deeplearning4j-examples) project.
-
-The CSV example has CSV data for 3 classes of flowers and builds a simple feed-forward neural network to classify the flowers based on 4 measurements.
-
-```
-Evaluation eval = new Evaluation(3);
-INDArray output = model.output(testData.getFeatures());
-eval.eval(testData.getLabels(), output);
-log.info(eval.stats());
-```
-
-The first line creates an Evaluation object with 3 classes.
-The second line gets the model's predictions (the network output) for the test features.
-The third line uses the eval method to compare the labels from the test data with the predictions generated by the model.
-The fourth line logs the evaluation statistics to the console.
-
-The output is:
-
-```
-Examples labeled as 0 classified by model as 0: 24 times
-Examples labeled as 1 classified by model as 1: 11 times
-Examples labeled as 1 classified by model as 2: 1 times
-Examples labeled as 2 classified by model as 2: 17 times
-
-
-==========================Scores========================================
- # of classes:    3
- Accuracy:        0.9811
- Precision:       0.9815
- Recall:          0.9722
- F1 Score:        0.9760
-Precision, recall & F1: macro-averaged (equally weighted avg. of 3 classes)
-========================================================================
-```
-
-By default, the `.stats()` method displays the confusion matrix entries (one per line), accuracy, precision, recall and F1 score. The `Evaluation` class can also calculate and return the following values:
-
-* Confusion Matrix
-* False Positive/Negative Rate
-* True Positive/Negative
-* Class Counts
-* F-beta, G-measure, Matthews Correlation Coefficient and more, see [Evaluation JavaDoc](https://deeplearning4j.org/api/latest/org/deeplearning4j/eval/Evaluation.html)
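-
-As a cross-check on what `stats()` reports, the headline numbers can be recomputed by hand from the confusion-matrix entries in the output above. The following is a plain-Java sketch, not the DL4J implementation, and the class and method names are hypothetical. It assumes "macro-averaged" means an unweighted mean of the per-class precision, recall and F1 values; with the counts above it reproduces 0.9811, 0.9815, 0.9722 and 0.9760, matching the sample output.
-
-```java
-import java.util.function.IntToDoubleFunction;
-
-// Hypothetical helper: classification metrics from a confusion matrix cm[actual][predicted] (counts).
-class ConfusionMetrics {
-    static double accuracy(int[][] cm) {
-        long correct = 0, total = 0;
-        for (int a = 0; a < cm.length; a++) {
-            for (int p = 0; p < cm[a].length; p++) {
-                total += cm[a][p];
-                if (a == p) correct += cm[a][p];   // diagonal = correctly classified
-            }
-        }
-        return (double) correct / total;
-    }
-
-    // Precision for class c: correct predictions of c / all predictions of c (column sum)
-    static double precision(int[][] cm, int c) {
-        long predicted = 0;
-        for (int[] row : cm) predicted += row[c];
-        return predicted == 0 ? 0.0 : (double) cm[c][c] / predicted;
-    }
-
-    // Recall for class c: correct predictions of c / all examples labeled c (row sum)
-    static double recall(int[][] cm, int c) {
-        long actual = 0;
-        for (int count : cm[c]) actual += count;
-        return actual == 0 ? 0.0 : (double) cm[c][c] / actual;
-    }
-
-    static double f1(int[][] cm, int c) {
-        double p = precision(cm, c), r = recall(cm, c);
-        return (p + r) == 0.0 ? 0.0 : 2 * p * r / (p + r);
-    }
-
-    // Macro average: every class weighted equally, regardless of how many examples it has
-    static double macro(int numClasses, IntToDoubleFunction perClass) {
-        double sum = 0;
-        for (int c = 0; c < numClasses; c++) sum += perClass.applyAsDouble(c);
-        return sum / numClasses;
-    }
-
-    public static void main(String[] args) {
-        // Counts from the stats() output above (rows = actual class, columns = predicted class)
-        int[][] cm = {{24, 0, 0}, {0, 11, 1}, {0, 0, 17}};
-        System.out.printf("Accuracy:  %.4f%n", accuracy(cm));
-        System.out.printf("Precision: %.4f%n", macro(3, c -> precision(cm, c)));
-        System.out.printf("Recall:    %.4f%n", macro(3, c -> recall(cm, c)));
-        System.out.printf("F1 Score:  %.4f%n", macro(3, c -> f1(cm, c)));
-    }
-}
-```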
-
-To display the confusion matrix:
-
-```
-System.out.println(eval.confusionToString());
-```
-
-This displays:
-
-```
-Predicted:         0      1      2
-Actual:
-0  0          |      16      0      0
-1  1          |       0     19      0
-2  2          |       0      0     18
-```
-
-Additionally, the confusion matrix can be accessed directly, or converted to HTML or CSV, using:
-
-```java
-eval.getConfusionMatrix();
-eval.getConfusionMatrix().toHTML();
-eval.getConfusionMatrix().toCSV();
-```
-
-
-### <a name="regression">Evaluation for Regression</a>
-
-To evaluate a network performing regression, use the `RegressionEvaluation` class.
-
-As with the Evaluation class, RegressionEvaluation on a DataSetIterator can be performed as follows:
-```
-DataSetIterator myTestData = ...
-RegressionEvaluation eval = model.evaluateRegression(myTestData);
-```
-
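-Here is a code snippet for a single output column; in this case, the neural network was predicting the age of shellfish based on measurements.
-
-```
-RegressionEvaluation eval = new RegressionEvaluation(1);
-INDArray output = model.output(testData.getFeatures());
-eval.eval(testData.getLabels(), output);
-```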
-
-Print the statistics for the Evaluation. 
-
-```
-System.out.println(eval.stats());
-```
-
-This prints:
-
-```
-Column    MSE            MAE            RMSE           RSE            R^2            
-col_0     7.98925e+00    2.00648e+00    2.82653e+00    5.01481e-01    7.25783e-01    
-```
-
-The columns are Mean Squared Error, Mean Absolute Error, Root Mean Squared Error, Relative Squared Error, and R^2 (Coefficient of Determination).
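-
-To make the column definitions concrete, here is a plain-Java sketch computing them for one output column. It uses common textbook definitions and is not the DL4J implementation (class and method names are hypothetical): RSE is the sum of squared errors divided by the sum of squared deviations of the labels from their mean, and R^2 is taken as the coefficient of determination, 1 - RSE. Note that the sample output above does not satisfy R^2 = 1 - RSE (0.726 vs. 0.499), so the library's R^2 column may be computed differently in your version; check the JavaDoc before comparing numbers.
-
-```java
-// Hypothetical helper: regression metrics for a single output column.
-class RegressionMetrics {
-    static double mse(double[] y, double[] pred) {
-        double sum = 0;
-        for (int i = 0; i < y.length; i++) {
-            double e = pred[i] - y[i];
-            sum += e * e;
-        }
-        return sum / y.length;
-    }
-
-    static double mae(double[] y, double[] pred) {
-        double sum = 0;
-        for (int i = 0; i < y.length; i++) sum += Math.abs(pred[i] - y[i]);
-        return sum / y.length;
-    }
-
-    static double rmse(double[] y, double[] pred) {
-        return Math.sqrt(mse(y, pred));
-    }
-
-    // Relative squared error: squared error relative to always predicting the label mean
-    static double rse(double[] y, double[] pred) {
-        double mean = 0;
-        for (double v : y) mean += v;
-        mean /= y.length;
-        double sse = 0, sst = 0;
-        for (int i = 0; i < y.length; i++) {
-            sse += (pred[i] - y[i]) * (pred[i] - y[i]);
-            sst += (y[i] - mean) * (y[i] - mean);
-        }
-        return sse / sst;
-    }
-
-    // Coefficient of determination (textbook definition)
-    static double rSquared(double[] y, double[] pred) {
-        return 1.0 - rse(y, pred);
-    }
-}
-```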
-
-See [RegressionEvaluation JavaDoc](https://deeplearning4j.org/api/{{page.version}}/org/deeplearning4j/eval/RegressionEvaluation.html)
-
-### <a name="multiple">Performing Multiple Evaluations Simultaneously</a>
-
-When performing multiple types of evaluation (for example, `Evaluation` and `ROC` on the same network and dataset), it is more efficient to do this in a single pass over the dataset, as follows:
-
-```
-DataSetIterator testData = ...
-Evaluation eval = new Evaluation();
-ROC roc = new ROC();
-model.doEvaluation(testData, eval, roc);
-```
-
-### <a name="timeseries">Evaluation of Time Series</a>
-
-Time series evaluation is very similar to the evaluation approaches above. Evaluation in DL4J is performed on all (non-masked) time steps separately: for example, a time series of length 10 will contribute 10 predictions/labels to an `Evaluation` object.
-One difference with time series is the (optional) presence of mask arrays, which are used to mark some time steps as missing or not present. See [Using RNNs - Masking](./deeplearning4j-nn-recurrent) for more details on masking.
-
-For most users, it is sufficient to use `MultiLayerNetwork.evaluate(DataSetIterator)`, `MultiLayerNetwork.evaluateRegression(DataSetIterator)` and similar methods. These methods will properly handle masking, if mask arrays are present.
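-
-To illustrate what "evaluated on all (non-masked) time steps separately" means, here is a plain-Java sketch of accuracy over a minibatch of time series. It is not the DL4J implementation and the names are hypothetical. It assumes predictions laid out as `[example][class][timeStep]` (the RNN layout), labels given as a class index per time step, and a mask in which 0 marks a padded step that is skipped entirely.
-
-```java
-// Hypothetical helper: accuracy over all non-masked time steps of a minibatch.
-class MaskedTimeSeriesAccuracy {
-    // probs[i][c][t]: predicted probability of class c at step t of example i
-    // labels[i][t]: true class index; mask[i][t]: 1 = present, 0 = padding
-    static double accuracy(double[][][] probs, int[][] labels, int[][] mask) {
-        int correct = 0, counted = 0;
-        for (int i = 0; i < probs.length; i++) {
-            int numClasses = probs[i].length;
-            int length = probs[i][0].length;
-            for (int t = 0; t < length; t++) {
-                if (mask[i][t] == 0) continue;       // padded step: contributes nothing
-                int best = 0;                        // argmax over classes at this step
-                for (int c = 1; c < numClasses; c++) {
-                    if (probs[i][c][t] > probs[i][best][t]) best = c;
-                }
-                counted++;
-                if (best == labels[i][t]) correct++;
-            }
-        }
-        return counted == 0 ? Double.NaN : (double) correct / counted;
-    }
-}
-```
-
-Each unmasked step counts as one prediction, so a length-10 series with no mask contributes 10 predictions.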
-
-
-### <a name="binary">Evaluation for Binary Classifiers</a>
-
-The `EvaluationBinary` class is used for evaluating networks with binary classification outputs; these networks usually have sigmoid activation functions and XENT (binary cross-entropy) loss functions. The typical classification metrics, such as accuracy, precision, recall and F1 score, are calculated for each output separately.
-
-```
-EvaluationBinary eval = new EvaluationBinary(int size)
-```
-
-See [EvaluationBinary JavaDoc](https://deeplearning4j.org/api/{{page.version}}/org/deeplearning4j/eval/EvaluationBinary.html)
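-
-The per-output idea can be sketched in plain Java: each sigmoid output is thresholded independently, and metrics are counted per output column. This only illustrates the concept (accuracy only) and is not the `EvaluationBinary` source; the 0.5 threshold used in the example and the helper name are assumptions.
-
-```java
-// Hypothetical helper: per-output accuracy for a network with several independent binary outputs.
-class PerOutputBinaryAccuracy {
-    // probs[i][k]: sigmoid output k for example i; labels[i][k]: 0 or 1
-    static double[] accuracy(double[][] probs, int[][] labels, double threshold) {
-        int numOutputs = probs[0].length;
-        double[] result = new double[numOutputs];
-        for (int k = 0; k < numOutputs; k++) {
-            int correct = 0;
-            for (int i = 0; i < probs.length; i++) {
-                int predicted = probs[i][k] >= threshold ? 1 : 0;   // each output decided on its own
-                if (predicted == labels[i][k]) correct++;
-            }
-            result[k] = (double) correct / probs.length;
-        }
-        return result;
-    }
-}
-```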
-
-
-### <a name="roc">ROC</a>
-
-ROC (Receiver Operating Characteristic) is another commonly used metric for the evaluation of classifiers. Three ROC variants exist in DL4J:
-
-* `ROC` - for a single binary label (as a single-column probability, or a 2-column 'softmax' probability distribution).
-* `ROCBinary` - for multiple binary labels.
-* `ROCMultiClass` - for evaluation of non-binary classifiers, using a "one vs. all" approach.
-
-These classes have the ability to calculate the area under ROC curve (AUROC) and area under Precision-Recall curve (AUPRC), via the ```calculateAUC()``` and ```calculateAUPRC()``` methods. Furthermore, the ROC and Precision-Recall curves can be obtained using ```getRocCurve()``` and ```getPrecisionRecallCurve()```.
-
-The ROC and Precision-Recall curves can be exported to HTML using `EvaluationTools.exportRocChartsToHtmlFile(ROC, File)`, which exports an HTML file with both the ROC and P-R curves that can be viewed in a browser.
-
-
-Note that all three support two modes of operation/calculation:
-* Thresholded (approximate AUROC/AUPRC calculation, no memory issues)
-* Exact (exact AUROC/AUPRC calculation, but can require a large amount of memory with very large datasets, i.e., datasets with many millions of examples)
-
-The number of threshold steps (bins) can be set using the constructors. Exact mode can be selected using the default constructor `new ROC()` or explicitly using `new ROC(0)`.
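-
-The difference between the two modes can be illustrated with a plain-Java sketch for a single binary label. `exactAuc` compares every positive/negative pair, while `thresholdedAuc` only evaluates the ROC curve at evenly spaced thresholds and integrates it with the trapezoidal rule, so its memory use does not grow with the number of predictions. Both helpers are hypothetical illustrations, not how DL4J implements ROC internally; they assume probabilities in [0, 1] and at least one positive and one negative example.
-
-```java
-// Hypothetical helpers: exact vs. thresholded (binned) AUROC for a single binary label.
-class AucSketch {
-    // Exact AUROC: fraction of (positive, negative) pairs ranked correctly; ties count as half.
-    static double exactAuc(double[] scores, int[] labels) {
-        double wins = 0;
-        long pairs = 0;
-        for (int i = 0; i < scores.length; i++) {
-            if (labels[i] != 1) continue;
-            for (int j = 0; j < scores.length; j++) {
-                if (labels[j] != 0) continue;
-                pairs++;
-                if (scores[i] > scores[j]) wins += 1.0;
-                else if (scores[i] == scores[j]) wins += 0.5;
-            }
-        }
-        return wins / pairs;
-    }
-
-    // Thresholded AUROC: (FPR, TPR) at thresholds 0, 1/steps, ..., 1, plus one threshold above 1.
-    static double thresholdedAuc(double[] scores, int[] labels, int steps) {
-        int positives = 0;
-        for (int label : labels) positives += label;
-        int negatives = labels.length - positives;
-        double area = 0, prevFpr = 1.0, prevTpr = 1.0;   // threshold 0: everything predicted positive
-        for (int s = 1; s <= steps + 1; s++) {
-            double threshold = (double) s / steps;       // s = steps + 1 lies above every probability
-            int tp = 0, fp = 0;
-            for (int i = 0; i < scores.length; i++) {
-                if (scores[i] >= threshold) {
-                    if (labels[i] == 1) tp++; else fp++;
-                }
-            }
-            double tpr = (double) tp / positives, fpr = (double) fp / negatives;
-            area += (prevFpr - fpr) * (prevTpr + tpr) / 2.0;   // trapezoid between curve points
-            prevFpr = fpr;
-            prevTpr = tpr;
-        }
-        return area;
-    }
-}
-```
-
-With coarse thresholds, scores that fall between the same two thresholds cannot be told apart: for scores `{0.51, 0.52}` with labels `{1, 0}`, `exactAuc` returns 0.0, while `thresholdedAuc` with 10 steps returns 0.5.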
-
-See the [ROC JavaDoc](https://deeplearning4j.org/api/{{page.version}}/org/deeplearning4j/eval/ROC.html) for details.
-
-### <a name="calibration">Evaluating Classifier Calibration</a>
-
-Deeplearning4j also has the `EvaluationCalibration` class, which is designed to analyze the calibration of a classifier. It provides a number of tools for this purpose:
-
-- Counts of the number of labels and predictions for each class
-- Reliability diagram (or reliability curve)
-- Residual plot (histogram)
-- Histograms of probabilities, including probabilities for each class separately
-
-Evaluation of a classifier using `EvaluationCalibration` is performed in a similar manner to the other evaluation classes.
-The various plots/histograms can be exported to HTML for viewing using `EvaluationTools.exportevaluationCalibrationToHtmlFile(EvaluationCalibration, File)`.
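-
-For intuition about the reliability diagram, the sketch below computes the data behind one for a single class in plain Java: predictions are grouped into equal-width probability bins, and each bin's mean predicted probability is compared with the fraction of examples in it that were actually positive. It is a hypothetical illustration, not the `EvaluationCalibration` implementation; for a well-calibrated classifier the two numbers should be close in every bin.
-
-```java
-// Hypothetical helper: the data behind a reliability diagram for one class.
-class ReliabilitySketch {
-    // Returns {meanPredicted, observedFrequency} per bin; NaN for empty bins.
-    static double[][] reliability(double[] probs, int[] labels, int numBins) {
-        double[] sumProb = new double[numBins];
-        int[] positives = new int[numBins];
-        int[] counts = new int[numBins];
-        for (int i = 0; i < probs.length; i++) {
-            // bin index in [0, numBins - 1]; a probability of exactly 1.0 goes into the last bin
-            int bin = Math.min((int) (probs[i] * numBins), numBins - 1);
-            sumProb[bin] += probs[i];
-            positives[bin] += labels[i];
-            counts[bin]++;
-        }
-        double[][] result = new double[numBins][2];
-        for (int b = 0; b < numBins; b++) {
-            result[b][0] = counts[b] == 0 ? Double.NaN : sumProb[b] / counts[b];
-            result[b][1] = counts[b] == 0 ? Double.NaN : (double) positives[b] / counts[b];
-        }
-        return result;
-    }
-}
-```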
-
-### <a name="spark">Distributed Evaluation for Spark Networks</a>
-
-SparkDl4jMultiLayer and SparkComputationGraph both have similar methods for evaluation:
-```
-Evaluation eval = SparkDl4jMultiLayer.evaluate(JavaRDD<DataSet>);
-
-//Multiple evaluations in one pass:
-SparkDl4jMultiLayer.doEvaluation(JavaRDD<DataSet>, IEvaluation...);
-```
-
-
-### <a name="multitask">Evaluation for Multi-task Networks</a>
-
-A multi-task network is a network that is trained to produce multiple outputs. For example, a network given audio samples can be trained to predict both the language spoken and the gender of the speaker. Multi-task configuration is briefly described [here](./deeplearning4j-nn-computationgraph).
-
-Evaluation classes useful for multi-task networks:
-
-See [ROCMultiClass JavaDoc](https://deeplearning4j.org/api/{{page.version}}/org/deeplearning4j/eval/ROCMultiClass.html)
-
-See [ROCBinary JavaDoc](https://deeplearning4j.org/api/{{page.version}}/org/deeplearning4j/eval/ROCBinary.html)
-
-## Available evaluations
-
-{{autogenerated}}
--- a/docs/deeplearning4j-nn/templates/iterators.md
+++ b/docs/deeplearning4j-nn/templates/iterators.md
@ -1,44 +0,0 @@
---
-title: Deeplearning4j Iterators
-short_title: Iterators
-description: Data iteration tools for loading into neural networks.
-category: Models
-weight: 5
---
-
-## What is an iterator?
-
-A dataset iterator allows for easy loading of data into neural networks and helps organize batching, conversion, and masking. The iterators included in Eclipse Deeplearning4j help with either user-provided data, or automatic loading of common benchmarking datasets such as MNIST and Iris.
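-
-To illustrate the batching part of that job, here is a deliberately minimal plain-Java iterator that hands out fixed-size minibatches and can be reset for another pass over the data. It is hypothetical and much simpler than DL4J's `DataSetIterator` (no labels, conversion or masks); it only shows the shape of the idea.
-
-```java
-import java.util.ArrayList;
-import java.util.Iterator;
-import java.util.List;
-import java.util.NoSuchElementException;
-
-// Hypothetical, stripped-down illustration of minibatching over in-memory examples.
-class MiniBatchIterator implements Iterator<List<double[]>> {
-    private final List<double[]> examples;
-    private final int batchSize;
-    private int cursor = 0;
-
-    MiniBatchIterator(List<double[]> examples, int batchSize) {
-        this.examples = examples;
-        this.batchSize = batchSize;
-    }
-
-    @Override
-    public boolean hasNext() {
-        return cursor < examples.size();
-    }
-
-    @Override
-    public List<double[]> next() {
-        if (!hasNext()) throw new NoSuchElementException();
-        int end = Math.min(cursor + batchSize, examples.size());   // the last batch may be smaller
-        List<double[]> batch = new ArrayList<>(examples.subList(cursor, end));
-        cursor = end;
-        return batch;
-    }
-
-    // Rewind so the data can be iterated again (e.g. for the next epoch)
-    void reset() {
-        cursor = 0;
-    }
-}
-```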
-
-## Usage
-
-For most use cases, initializing an iterator and passing a reference to a `MultiLayerNetwork` or `ComputationGraph` `fit()` method is all you need to begin a task for training:
-
-```java
-MultiLayerNetwork model = new MultiLayerNetwork(conf);
-model.init();
-
-// pass an MNIST data iterator that automatically fetches data
-DataSetIterator mnistTrain = new MnistDataSetIterator(batchSize, true, rngSeed);
-model.fit(mnistTrain);
-```
-
-Many other methods also accept iterators for tasks such as evaluation:
-
-```java
-// passing directly to the neural network
-DataSetIterator mnistTest = new MnistDataSetIterator(batchSize, false, rngSeed);
-Evaluation result = model.evaluate(mnistTest);
-
-// using an evaluation class
-mnistTest.reset(); // rewind the iterator, which was consumed by the evaluation above
-Evaluation eval = new Evaluation(10); // create an evaluation object with 10 possible classes
-while (mnistTest.hasNext()) {
-    DataSet next = mnistTest.next();
-    INDArray output = model.output(next.getFeatures()); // get the network's prediction
-    eval.eval(next.getLabels(), output); // check the prediction against the true class
-}
-```
-
-## Available iterators
-
-{{autogenerated}}
--- a/docs/deeplearning4j-nn/templates/layers.md
+++ b/docs/deeplearning4j-nn/templates/layers.md
@ -1,23 +0,0 @@
---
-title: Supported Layers
-short_title: Layers
-description: Supported neural network layers.
-category: Models
-weight: 3
---
-
-## What are layers?
-
-Each layer in a neural network configuration represents a group of units (neurons). When layers are stacked together, they form a *deep neural network*.
-
-## Using layers
-
-All layers available in Eclipse Deeplearning4j can be used either in a `MultiLayerNetwork` or `ComputationGraph`. When configuring a neural network, you pass the layer configuration and the network will instantiate the layer for you.
-
-## Layers vs. vertices
-
-If you are configuring complex networks such as InceptionV4, you will need to use the `ComputationGraph` API and join different branches together using vertices. See the vertices documentation for more information.
-
-## General layers
-
-{{autogenerated}}
--- a/docs/deeplearning4j-nn/templates/listeners.md
+++ b/docs/deeplearning4j-nn/templates/listeners.md
@ -1,26 +0,0 @@
---
-title: Deeplearning4j Listeners
-short_title: Listeners
-description: Adding hooks and listeners on DL4J models.
-category: Models
-weight: 5
---
-
-## What are listeners?
-
-Listeners allow users to "hook" into certain events in Eclipse Deeplearning4j. This allows you to collect or print information useful for tasks like training. For example, a `ScoreIterationListener` allows you to print training scores from the output layer of a neural network.
-
-## Usage
-
-To add one or more listeners to a `MultiLayerNetwork` or `ComputationGraph`, use the `setListeners` method (or `addListeners` to keep any listeners that are already set):
-
-```java
-MultiLayerNetwork model = new MultiLayerNetwork(conf);
-model.init();
-//print the score every iteration
-model.setListeners(new ScoreIterationListener(1));
-```
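-
-Under the hood this is the listener (observer) pattern: the training loop calls every registered listener after each iteration. The plain-Java sketch below shows that pattern with hypothetical names (`TrainingHook`, `ScoreRecorder`, `ToyTrainer`); it is not the DL4J listener interface, and the "score" it reports is a made-up decreasing number rather than a real loss.
-
-```java
-import java.util.ArrayList;
-import java.util.List;
-
-// Hypothetical hook interface: called by the training loop after every iteration.
-interface TrainingHook {
-    void iterationDone(int iteration, double score);
-}
-
-// Records the score every `frequency` iterations, similar in spirit to ScoreIterationListener.
-class ScoreRecorder implements TrainingHook {
-    private final int frequency;
-    final List<String> log = new ArrayList<>();
-
-    ScoreRecorder(int frequency) {
-        this.frequency = frequency;
-    }
-
-    @Override
-    public void iterationDone(int iteration, double score) {
-        if (iteration % frequency == 0) {
-            log.add("Score at iteration " + iteration + " is " + score);
-        }
-    }
-}
-
-class ToyTrainer {
-    private final List<TrainingHook> hooks = new ArrayList<>();
-
-    void addHook(TrainingHook hook) {
-        hooks.add(hook);
-    }
-
-    // Stand-in for fit(): the "score" is a fake decreasing value, not a computed loss.
-    void fit(int iterations) {
-        for (int i = 0; i < iterations; i++) {
-            double score = 1.0 / (i + 1);
-            for (TrainingHook hook : hooks) hook.iterationDone(i, score);
-        }
-    }
-}
-```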
-
-## Available listeners
-
-{{autogenerated}}
--- a/docs/deeplearning4j-nn/templates/model-persistence.md
+++ b/docs/deeplearning4j-nn/templates/model-persistence.md
@ -1,28 +0,0 @@
---
-title: Deeplearning4j Model Persistence
-short_title: Model Persistence
-description: Saving and loading of neural networks.
-category: Models
-weight: 10
---
-
-## Saving and Loading a Neural Network
-
-The `ModelSerializer` class handles loading and saving models. The linked examples show two ways of saving models: the first saves a normal multilayer network, the second saves a [computation graph](https://deeplearning4j.org/docs/latest/deeplearning4j-nn-computationgraph).
-
-Here is a [basic example](https://github.com/eclipse/deeplearning4j-examples/tree/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/misc/modelsaving) with code to save a computation graph using the `ModelSerializer` class, as well as an example of using `ModelSerializer` to save a neural network built using a `MultiLayerConfiguration`.
-
-### RNG Seed
-
-If your model uses randomness (e.g. DropOut/DropConnect), it may make sense to save the RNG seed separately and apply it after the model is restored, i.e.:
-
-```java
-Nd4j.getRandom().setSeed(12345);
-MultiLayerNetwork restored = ModelSerializer.restoreMultiLayerNetwork(modelFile);
-```
-
-This helps ensure equal results between sessions/JVMs.
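-
-The reasoning behind resetting the seed is that dropout masks are drawn from a random number generator, so the same seed yields the same masks. The plain-Java sketch below illustrates this with `java.util.Random`; ND4J uses its own RNG, so this only demonstrates the principle, and the helper is hypothetical.
-
-```java
-import java.util.Random;
-
-// Illustration only: a dropout mask drawn from a seeded RNG is identical whenever the seed is reused.
-class SeededDropout {
-    // Each unit is kept (1) with probability `retain`, otherwise dropped (0)
-    static int[] dropoutMask(long seed, int size, double retain) {
-        Random rng = new Random(seed);
-        int[] mask = new int[size];
-        for (int i = 0; i < size; i++) {
-            mask[i] = rng.nextDouble() < retain ? 1 : 0;
-        }
-        return mask;
-    }
-}
-```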
-
-## Model serializer
-
-{{autogenerated}}
--- a/docs/deeplearning4j-nn/templates/multilayernetwork.md
+++ b/docs/deeplearning4j-nn/templates/multilayernetwork.md
@ -1,81 +0,0 @@
---
-title: Multilayer Network
-short_title: Multilayer Network
-description: Simple and sequential network configuration.
-category: Models
-weight: 3
---
-
-## Why use MultiLayerNetwork?
-
-The `MultiLayerNetwork` class is the simplest network configuration API available in Eclipse Deeplearning4j. This class is useful for beginners or users who do not need a complex and branched network graph. 
-
-You will not want to use `MultiLayerNetwork` configuration if you are creating complex loss functions, using graph vertices, or doing advanced training such as a triplet network. This includes popular complex networks such as InceptionV4.
-
-## Usage
-
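-The example below shows how to build a simple classifier with one hidden `DenseLayer` (a fully connected layer, as used in a basic multilayer perceptron).
-
-```java
-import org.deeplearning4j.nn.api.OptimizationAlgorithm;
-import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
-import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
-import org.deeplearning4j.nn.conf.layers.DenseLayer;
-import org.deeplearning4j.nn.conf.layers.OutputLayer;
-import org.deeplearning4j.nn.weights.WeightInit;
-import org.nd4j.linalg.activations.Activation;
-import org.nd4j.linalg.learning.config.Nesterovs;
-import org.nd4j.linalg.lossfunctions.LossFunctions;
-
-// seed, learningRate, numInputs, numHiddenNodes and numOutputs are defined elsewhere
-MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
-    .seed(seed)
-    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
-    .updater(new Nesterovs(learningRate, 0.9))   // learning rate and momentum are set on the updater
-    .weightInit(WeightInit.XAVIER)               // applies to all layers unless overridden
-    .list()
-    .layer(0, new DenseLayer.Builder().nIn(numInputs).nOut(numHiddenNodes)
-            .activation(Activation.RELU)
-            .build())
-    .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
-            .activation(Activation.SOFTMAX)
-            .nIn(numHiddenNodes).nOut(numOutputs).build())
-    .build();
-```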
-
-You can also create convolutional configurations:
-
-```java
-import org.deeplearning4j.nn.conf.inputs.InputType;
-import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
-import org.deeplearning4j.nn.conf.layers.SubsamplingLayer;
-// (plus the imports from the previous example)
-
-MultiLayerConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
-    .seed(seed)
-    .l2(0.0005)
-    .weightInit(WeightInit.XAVIER)
-    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
-    .updater(new Nesterovs(0.01, 0.9))
-    .list()
-    .layer(0, new ConvolutionLayer.Builder(5, 5)
-            //nIn and nOut specify depth. nIn here is the nChannels and nOut is the number of filters to be applied
-            .nIn(nChannels)
-            .stride(1, 1)
-            .nOut(20)
-            .activation(Activation.IDENTITY)
-            .build())
-    .layer(1, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
-            .kernelSize(2, 2)
-            .stride(2, 2)
-            .build())
-    .layer(2, new ConvolutionLayer.Builder(5, 5)
-            //Note that nIn need not be specified in later layers: it is inferred via setInputType below
-            .stride(1, 1)
-            .nOut(50)
-            .activation(Activation.IDENTITY)
-            .build())
-    .layer(3, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
-            .kernelSize(2, 2)
-            .stride(2, 2)
-            .build())
-    .layer(4, new DenseLayer.Builder().activation(Activation.RELU)
-            .nOut(500).build())
-    .layer(5, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
-            .nOut(outputNum)
-            .activation(Activation.SOFTMAX)
-            .build())
-    // assumes flattened image input; height and width are assumed to be defined elsewhere
-    .setInputType(InputType.convolutionalFlat(height, width, nChannels));
-```
-
-## API
-
-{{autogenerated}}
--- a/docs/deeplearning4j-nn/templates/recurrent.md
+++ b/docs/deeplearning4j-nn/templates/recurrent.md
@ -1,355 +0,0 @@
---
-title: Recurrent Neural Networks in DL4J
-short_title: RNN
-description: Recurrent Neural Network implementations in DL4J.
-category: Models
-weight: 10
---
-
-## Recurrent Neural Networks in DL4J
-
-This document outlines the specific training features and the practicalities of how to use them in DeepLearning4J. It is not an introduction to recurrent neural networks, and assumes some familiarity with both their use and terminology.
-
-**Contents**
-
-* [The Basics: Data and Network Configuration](#basics)
-* [RNN Training Features](#trainingfeatures)
-  * [Truncated Back Propagation Through Time](#tbptt)
-  * [Masking: One-to-Many, Many-to-One, and Sequence Classification](#masking)
-    * [Masking and Sequence Classification After Training](#testtimemasking)
-  * [Combining RNN Layers with Other Layer Types](#otherlayertypes)
-* [Test Time: Prediction One Step at a Time](#rnntimestep)
-* [Importing Time Series Data](#data)
-* [Examples](#examples)
-
-## <a name="basics">The Basics: Data and Network Configuration</a>
-DL4J currently supports the following types of recurrent neural network:
-* GravesLSTM (Long Short-Term Memory)
-* GravesBidirectionalLSTM
-* BaseRecurrentLayer (the base class for recurrent layers)
-
-Javadoc is available for each: [GravesLSTM](https://deeplearning4j.org/api/{{page.version}}/org/deeplearning4j/nn/conf/layers/GravesLSTM.html), 
-[GravesBidirectionalLSTM](https://deeplearning4j.org/api/{{page.version}}/org/deeplearning4j/nn/conf/layers/GravesBidirectionalLSTM.html), [BaseRecurrentLayer](https://deeplearning4j.org/api/latest/org/deeplearning4j/nn/conf/layers/BaseRecurrentLayer.html)
-
-#### Data for RNNs
-Consider for the moment a standard feed-forward network (a multi-layer perceptron or 'DenseLayer' in DL4J). These networks expect input and output data that is two-dimensional: that is, data with "shape" [numExamples,inputSize]. This means that the data into a feed-forward network has ‘numExamples’ rows/examples, where each row consists of ‘inputSize’ columns. A single example would have shape [1,inputSize], though in practice we generally use multiple examples for computational and optimization efficiency. Similarly, output data for a standard feed-forward network is also two dimensional, with shape [numExamples,outputSize].
-
-Conversely, data for RNNs are time series. Thus, they have 3 dimensions: one additional dimension for time. Input data thus has shape [numExamples,inputSize,timeSeriesLength], and output data has shape [numExamples,outputSize,timeSeriesLength]. This means that the data in our INDArray is laid out such that the value at position (i,j,k) is the jth value at the kth time step of the ith example in the minibatch. This data layout is shown below.
-
-![Data: Feed Forward vs. RNN](/images/guide/rnn_data.png)
-
-When importing time series data using the `CSVSequenceRecordReader` class, each line in the data files represents one time step, with the earliest observation in the first row (or the first row after the header, if present) and the most recent observation in the last row of the CSV. Each feature time series is a separate column of the CSV file. For example, if you have five features in a time series, each with 120 observations, and 53 sequences in your training and test data, then there will be 106 CSV files (53 input files and 53 label files). The 53 input CSV files will each have five columns and 120 rows. For sequence classification (one label per sequence), the label CSV files will each have one column (the label) and one row.
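-
-The mapping from one CSV file per sequence (rows = time steps, columns = features) to the `[numExamples,inputSize,timeSeriesLength]` layout is essentially a transpose per example. A plain-Java sketch, assuming all sequences have the same length and have already been parsed into numbers (the helper name is hypothetical; in practice DL4J's record readers and DataSet iterators typically do this for you):
-
-```java
-// Hypothetical helper: rearrange parsed CSV rows into the RNN layout [example][feature][timeStep].
-class SequenceLayout {
-    // csvPerExample[i][t][j]: row t (time step), column j (feature) of example i's CSV file
-    static double[][][] toRnnLayout(double[][][] csvPerExample) {
-        int numExamples = csvPerExample.length;
-        int length = csvPerExample[0].length;        // number of rows = time steps
-        int inputSize = csvPerExample[0][0].length;  // number of columns = features
-        double[][][] out = new double[numExamples][inputSize][length];
-        for (int i = 0; i < numExamples; i++) {
-            for (int t = 0; t < length; t++) {
-                for (int j = 0; j < inputSize; j++) {
-                    out[i][j][t] = csvPerExample[i][t][j];   // value (i,j,k): feature j at step k of example i
-                }
-            }
-        }
-        return out;
-    }
-}
-```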
-
-#### RnnOutputLayer
-
-`RnnOutputLayer` is a type of layer used as the final layer with many recurrent neural network systems (for both regression and classification tasks). `RnnOutputLayer` handles things like score calculation and error calculation (of prediction vs. actual) given a loss function. Functionally, it is very similar to the 'standard' `OutputLayer` class (which is used with feed-forward networks); however, it both outputs 3D time series data and expects 3D time series data as labels/targets.
-
-Configuration for the `RnnOutputLayer` follows the same design as other layers: for example, to set the third layer in a `MultiLayerNetwork` to an `RnnOutputLayer` for classification:
-
-    .layer(2, new RnnOutputLayer.Builder(LossFunction.MCXENT).activation(Activation.SOFTMAX)
-    .weightInit(WeightInit.XAVIER).nIn(prevLayerSize).nOut(nOut).build())
-
-Use of RnnOutputLayer in practice can be seen in the examples, linked at the end of this document.
-
-## <a name="trainingfeatures">RNN Training Features</a>
-
-### <a name="tbptt">Truncated Back Propagation Through Time</a>
-Training neural networks (including RNNs) can be quite computationally demanding. For recurrent neural networks, this is especially the case when we are dealing with long sequences - i.e., training data with many time steps.
-
-Truncated backpropagation through time (BPTT) was developed in order to reduce the computational complexity of each parameter update in a recurrent neural network. In summary, it allows us to train networks faster (by performing more frequent parameter updates), for a given amount of computational power. It is recommended to use truncated BPTT when your input sequences are long (typically, more than a few hundred time steps).
-
-Consider what happens when training a recurrent neural network with a time series of length 12 time steps. Here, we need to do a forward pass of 12 steps, calculate the error (based on predicted vs. actual), and do a backward pass of 12 time steps:
-
-![Standard Backprop Training](/images/guide/rnn_tbptt_1.png)
-
-For 12 time steps, in the image above, this is not a problem. Consider, however, that instead the input time series was 10,000 or more time steps. In this case, standard backpropagation through time would require 10,000 time steps for each of the forward and backward passes for each and every parameter update. This is of course very computationally demanding.
-
-In practice, truncated BPTT splits the forward and backward passes into a set of smaller forward/backward pass operations. The specific length of these forward/backward pass segments is a parameter set by the user. For example, if we use truncated BPTT of length 4 time steps, learning looks like the following:
-
-![Truncated BPTT](/images/guide/rnn_tbptt_2.png)
-
-Note that the overall complexity of truncated BPTT and standard BPTT is approximately the same: both process the same number of time steps during the forward/backward passes. Using this method, however, we get 3 parameter updates instead of one for approximately the same amount of effort. The cost is not exactly the same, though, as there is a small amount of overhead per parameter update.
-
-The downside of truncated BPTT is that the length of the dependencies learned in truncated BPTT can be shorter than in full BPTT. This is easy to see: consider the images above, with a TBPTT length of 4. Suppose that at time step 10, the network needs to store some information from time step 0 in order to make an accurate prediction. In standard BPTT, this is ok: the gradients can flow backwards all the way along the unrolled network, from time 10 to time 0. In truncated BPTT, this is problematic: the gradients from time step 10 simply don't flow back far enough to cause the required parameter updates that would store the required information. This tradeoff is usually worth it, and (as long as the truncated BPTT lengths are set appropriately), truncated BPTT works well in practice.
-
-Using truncated BPTT in DL4J is quite simple: just add the following code to your network configuration (at the end, before the final .build() in your network configuration)
-
-    .backpropType(BackpropType.TruncatedBPTT)
-    .tBPTTLength(100)
-
-The above code snippet will cause any network training (i.e., calls to MultiLayerNetwork.fit() methods) to use truncated BPTT with segments of length 100 steps.
-
-Some things of note:
-
-* By default (if a backprop type is not manually specified), DL4J will use BackpropType.Standard (i.e., full BPTT).
-* The tBPTTLength configuration parameter sets the length of the truncated BPTT passes. Typically, this is somewhere on the order of 50 to 200 time steps, though it depends on the application and data.
-* The truncated BPTT length is typically a fraction of the total time series length (e.g., 200 vs. a sequence length of 1000), but variable-length time series in the same minibatch are OK when using TBPTT (for example, a minibatch with two sequences, one of length 100 and another of length 1000, with a TBPTT length of 200, will work correctly).
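-
-The bookkeeping is simple enough to sketch in plain Java: a sequence is cut into consecutive segments of at most `tBPTTLength` steps, and each segment gets its own forward/backward pass and parameter update. The helper below is hypothetical and only computes the segment boundaries; it does not model the recurrent state carried from one segment to the next. It reproduces the numbers in this section: 12 steps with a length of 4 gives 3 updates, and 1000 steps with a length of 200 gives 5.
-
-```java
-import java.util.ArrayList;
-import java.util.List;
-
-// Hypothetical helper: split a sequence of length T into truncated-BPTT segments.
-class TbpttSegments {
-    // Returns {start, endExclusive} pairs, one per parameter update; the last segment may be shorter.
-    static List<int[]> segments(int sequenceLength, int tbpttLength) {
-        List<int[]> result = new ArrayList<>();
-        for (int start = 0; start < sequenceLength; start += tbpttLength) {
-            result.add(new int[]{start, Math.min(start + tbpttLength, sequenceLength)});
-        }
-        return result;
-    }
-}
-```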
-
-### <a name="masking">Masking: One-to-Many, Many-to-One, and Sequence Classification</a>
-
-DL4J supports a number of related training features for RNNs, based on the idea of padding and masking. Padding and masking allow us to support training situations including one-to-many and many-to-one, as well as variable-length time series (in the same minibatch).
-
-Suppose we want to train a recurrent neural network with inputs or outputs that don't occur at every time step. Examples of this (for a single example) are shown in the image below. DL4J supports training networks for all of these situations:
-
-![RNN Training Types](/images/guide/rnn_masking_1.png)
-
-Without masking and padding, we are restricted to the many-to-many case (above, left): that is, (a) All examples are of the same length, and (b) Examples have both inputs and outputs at all time steps.
-
-The idea behind padding is simple. Consider two time series of lengths 50 and 100 time steps, in the same mini-batch. The training data is a rectangular array; thus, we pad (i.e., add zeros to) the shorter time series (for both input and output), such that the input and output are both the same length (in this example: 100 time steps).
-
-Of course, if this was all we did, it would cause problems during training. Thus, in addition to padding, we use a masking mechanism. The idea behind masking is simple: we have two additional arrays that record whether an input or output is actually present for a given time step and example, or whether the input/output is just padding.
-
-Recall that with RNNs, our minibatch data has 3 dimensions, with shape [miniBatchSize,inputSize,timeSeriesLength] and [miniBatchSize,outputSize,timeSeriesLength] for the input and output respectively. The mask arrays are then 2-dimensional, with shape [miniBatchSize,timeSeriesLength] for both the input and output, with values of 0 ('absent') or 1 ('present') for each time step and example. The masking arrays for the input and output are stored in separate arrays.
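-
-A plain-Java sketch of padding plus mask construction, assuming a single feature per time step (so the `inputSize` dimension is dropped) and padding added at the end of each shorter sequence. The class is hypothetical; in DL4J the data pipeline normally builds these arrays for you.
-
-```java
-// Hypothetical helper: pad variable-length single-feature sequences to a common length
-// and build the matching [miniBatchSize][timeSeriesLength] mask (1 = present, 0 = padding).
-class PadAndMask {
-    final double[][] padded;
-    final double[][] mask;
-
-    PadAndMask(double[][] sequences) {
-        int maxLength = 0;
-        for (double[] s : sequences) maxLength = Math.max(maxLength, s.length);
-        padded = new double[sequences.length][maxLength];   // Java zero-initializes, so padding is 0
-        mask = new double[sequences.length][maxLength];
-        for (int i = 0; i < sequences.length; i++) {
-            for (int t = 0; t < sequences[i].length; t++) {
-                padded[i][t] = sequences[i][t];
-                mask[i][t] = 1.0;                             // real data present at this step
-            }
-        }
-    }
-}
-```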
-
-For a single example, the input and output masking arrays are shown below:
-
-![RNN Training Types](/images/guide/rnn_masking_2.png)
-
-For the “Masking not required” cases, we could equivalently use a masking array of all 1s, which will give the same result as not having a mask array at all. Also note that it is possible to use zero, one or two masking arrays when training RNNs: for example, the many-to-one case could have a masking array for the output only.
-
-In practice, these mask arrays are generally created during the data import stage (for example, by the `SequenceRecordReaderDataSetIterator`, discussed later), and are contained within the `DataSet` object. If a `DataSet` contains mask arrays, `MultiLayerNetwork.fit()` will automatically use them during training. If they are absent, no masking functionality is used.
-
-#### Evaluation and Scoring with Masking
-
-Mask arrays are also important when doing scoring and evaluation (i.e., when evaluating the accuracy of a RNN classifier). Consider for example the many-to-one case: there is only a single output for each example, and any evaluation should take this into account.
-
-To use the (output) mask array during evaluation, pass it to the following method:
-
-    Evaluation.evalTimeSeries(INDArray labels, INDArray predicted, INDArray outputMask)
-
-where labels are the actual output (3d time series), predicted is the network predictions (3d time series, same shape as labels), and outputMask is the 2d mask array for the output. Note that the input mask array is not required for evaluation.
-
-Score calculation will also make use of the mask arrays, via the MultiLayerNetwork.score(DataSet) method. Again, if the DataSet contains an output masking array, it will automatically be used when calculating the score (loss function - mean squared error, negative log likelihood etc) for the network.
-
-#### <a name="testtimemasking">Masking and Sequence Classification After Training</a>
-
-Sequence classification is one common use of masking. The idea is that although we have a sequence (time series) as input, we only want to provide a single label for the entire sequence (rather than one label at each time step in the sequence).
-
-However, RNNs by design output sequences of the same length as the input sequence. For sequence classification, masking allows us to train the network with this single label at the final time step: we essentially tell the network that there isn't *actually* label data anywhere except for the last time step.
-
-Now, suppose we've trained our network, and want to get the last time step for predictions, from the time series output array. How do we do that?
-
-
-To get the last time step, there are two cases to be aware of. First, when we have a single example, we don't actually need to use the mask arrays: we can just get the last time step in the output array:
-
-```java
-    INDArray timeSeriesFeatures = ...;
-    INDArray timeSeriesOutput = myNetwork.output(timeSeriesFeatures);
-    long timeSeriesLength = timeSeriesOutput.size(2);    // size of the time dimension
-    INDArray lastTimeStepProbabilities = timeSeriesOutput.get(NDArrayIndex.point(0), NDArrayIndex.all(), NDArrayIndex.point(timeSeriesLength - 1));
-```
-
-Assuming classification (same process for regression, however) the last line above gives us probabilities at the last time step - i.e., the class probabilities for our sequence classification.
-
-
-The slightly more complex case is when we have multiple examples in the one minibatch (features array), where the lengths of each example differ. (If all are the same length: we can use the same process as above).
-
-In this 'variable length' case, we need to get the last time step *for each example separately*. If we have the time series lengths for each example from our data pipeline, it becomes straightforward: we just iterate over examples, replacing the ```timeSeriesLength``` in the above code with the length of that example.
-
-If we don't have the lengths of the time series directly, we need to extract them from the mask array.
-
-If we have a labels mask array (which is a one-hot vector, like [0,0,0,1,0] for each time series):
-
-```java
-    INDArray labelsMaskArray = ...;
-    INDArray lastTimeStepIndices = Nd4j.argMax(labelsMaskArray, 1);
-```
-
-Alternatively, if we have only the features mask, one quick and dirty approach is to use this:
-
-```java
-    INDArray featuresMaskArray = ...;
-    long longestTimeSeries = featuresMaskArray.size(1);
-    INDArray linspace = Nd4j.linspace(1, longestTimeSeries, longestTimeSeries);
-    INDArray temp = featuresMaskArray.mulRowVector(linspace);    // multiply each example's mask row by [1, 2, ..., longestTimeSeries]
-    INDArray lastTimeStepIndices = Nd4j.argMax(temp, 1);
-```
-To understand what is happening here, note that we start with a features mask like [1,1,1,1,0], from which we want the index of the last non-zero element. So we map [1,1,1,1,0] -> [1,2,3,4,0] and then take the index of the largest element (which is the last time step).
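-
-Both tricks reduce to an argmax over one mask row. The plain-Java sketch below (hypothetical helpers, one example at a time rather than a whole minibatch) makes the index arithmetic explicit; note that the returned indices are 0-based, so the mask [1,1,1,1,0] gives index 3.
-
-```java
-// Hypothetical helpers mirroring the two ND4J snippets above, for a single mask row.
-class LastTimeStep {
-    // Labels mask is one-hot at the final step, e.g. [0,0,0,1,0]: the index of the 1 is the answer.
-    static int fromLabelsMask(double[] labelsMask) {
-        int best = 0;
-        for (int t = 1; t < labelsMask.length; t++) {
-            if (labelsMask[t] > labelsMask[best]) best = t;
-        }
-        return best;
-    }
-
-    // Features mask like [1,1,1,1,0]: weight by [1,2,3,...] and take the argmax.
-    static int fromFeaturesMask(double[] featuresMask) {
-        int best = 0;
-        double bestValue = Double.NEGATIVE_INFINITY;
-        for (int t = 0; t < featuresMask.length; t++) {
-            double value = featuresMask[t] * (t + 1);   // [1,1,1,1,0] -> [1,2,3,4,0]
-            if (value > bestValue) {
-                bestValue = value;
-                best = t;
-            }
-        }
-        return best;
-    }
-}
-```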
-
-
-In either case, we can then do the following:
-
-```java
-    long numExamples = timeSeriesFeatures.size(0);
-    for (int i = 0; i < numExamples; i++) {
-        int thisTimeSeriesLastIndex = lastTimeStepIndices.getInt(i);
-        INDArray thisExampleProbabilities = timeSeriesOutput.get(NDArrayIndex.point(i), NDArrayIndex.all(), NDArrayIndex.point(thisTimeSeriesLastIndex));
-    }
-```
-
-
-### <a name="otherlayertypes">Combining RNN Layers with Other Layer Types</a>
-
-RNN layers in DL4J can be combined with other layer types. For example, it is possible to combine DenseLayer and LSTM layers in the same network; or combine Convolutional (CNN) layers and LSTM layers for video.
-
-Of course, the DenseLayer and Convolutional layers do not handle time series data: they expect a different type of input. To deal with this, we need to use the layer preprocessor functionality: for example, the CnnToRnnPreProcessor and FeedForwardToRnnPreProcessor classes. See [here](https://github.com/eclipse/deeplearning4j/tree/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/preprocessor) for all preprocessors. Fortunately, in most situations, the DL4J configuration system will automatically add these preprocessors as required. However, the preprocessors can also be added manually; doing so overrides the automatic addition of a preprocessor for that layer.
-
-For example, to manually add a preprocessor between layers 1 and 2, add the following to your network configuration: `.inputPreProcessor(2, new RnnToFeedForwardPreProcessor())`.
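-
-Conceptually, an RNN-to-feed-forward preprocessor turns 3d activations of shape [miniBatch,size,timeSteps] into a 2d array with one row per (example, time step) pair. A DenseLayer can then treat every time step as an independent input. The sketch below shows that reshape in plain Java. It is not the DL4J implementation, and the example-major row ordering is an assumption made for illustration only.
-
-```java
-// Conceptual sketch of an RNN -> feed-forward reshape; not DL4J's actual code.
-// Row ordering (all steps of example 0, then example 1, ...) is an assumption.
-class RnnToFeedForwardSketch {
-    // [miniBatch][size][timeSteps] -> [miniBatch * timeSteps][size]
-    static double[][] flatten(double[][][] rnnActivations) {
-        int mb = rnnActivations.length;
-        int size = rnnActivations[0].length;
-        int steps = rnnActivations[0][0].length;
-        double[][] out = new double[mb * steps][size];
-        for (int i = 0; i < mb; i++) {
-            for (int t = 0; t < steps; t++) {
-                for (int s = 0; s < size; s++) {
-                    out[i * steps + t][s] = rnnActivations[i][s][t];
-                }
-            }
-        }
-        return out;
-    }
-}
-```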
-
-## <a name="rnntimestep">Test Time: Predictions One Step at a Time</a>
-As with other types of neural networks, predictions can be generated for RNNs using the `MultiLayerNetwork.output()` and `MultiLayerNetwork.feedForward()` methods. These methods can be useful in many circumstances; however, they have the limitation that they always process the full time series, starting from scratch each and every time they are called.
-
-Consider for example the case where we want to generate predictions in a real-time system, where these predictions are based on a very large amount of history. In this case, it is impractical to use the output/feedForward methods, as they conduct the full forward pass over the entire data history, each time they are called. If we wish to make a prediction for a single time step, at every time step, these methods can be both (a) very costly, and (b) wasteful, as they do the same calculations over and over.
-
-For these situations, MultiLayerNetwork provides four methods of note:
-
-* `rnnTimeStep(INDArray)`
-* `rnnClearPreviousState()`
-* `rnnGetPreviousState(int layer)`
-* `rnnSetPreviousState(int layer, Map<String,INDArray> state)`
-
-The rnnTimeStep() method is designed to allow the forward pass (predictions) to be conducted efficiently, one or more steps at a time. Unlike the output/feedForward methods, the rnnTimeStep method keeps track of the internal state of the RNN layers when it is called. It is important to note that the output of rnnTimeStep and of the output/feedForward methods should be identical (for each time step), whether we make these predictions all at once (output/feedForward) or generate them one or more steps at a time (rnnTimeStep). Thus, the only difference should be the computational cost.
-
-In summary, the MultiLayerNetwork.rnnTimeStep() method does two things:
-
-1.	Generate output/predictions (forward pass), using the previous stored state (if any)
-2.	Update the stored state, storing the activations for the last time step (ready to be used next time rnnTimeStep is called)
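-
-This two-step behaviour can be sketched with a toy scalar RNN. The class is hypothetical, not DL4J code, and its update rule is h_t = tanh(w * x_t + u * h_{t-1}). `output` always starts from a zero state. `timeStep` resumes from the stored state and saves the last activation, so both give the same predictions:
-
-```java
-// Hypothetical toy RNN illustrating stored state; not the DL4J API.
-class ToyStatefulRnn {
-    private final double w, u;
-    private double state = 0.0; // zero initial state, as after clearing
-
-    ToyStatefulRnn(double w, double u) { this.w = w; this.u = u; }
-
-    // Like output(): full pass from a zero state; leaves the stored state untouched.
-    double[] output(double[] xs) {
-        double h = 0.0;
-        double[] out = new double[xs.length];
-        for (int t = 0; t < xs.length; t++) {
-            h = Math.tanh(w * xs[t] + u * h);
-            out[t] = h;
-        }
-        return out;
-    }
-
-    // Like rnnTimeStep(): continues from the stored state and stores the last activation.
-    double[] timeStep(double[] xs) {
-        double[] out = new double[xs.length];
-        for (int t = 0; t < xs.length; t++) {
-            state = Math.tanh(w * xs[t] + u * state);
-            out[t] = state;
-        }
-        return out;
-    }
-
-    // Like rnnClearPreviousState(): back to the zero state.
-    void clearState() { state = 0.0; }
-}
-```
-
-Feeding inputs [1,2,3] to `timeStep` and then [4] yields the same final value as a single `output` call over [1,2,3,4]. Calling `clearState()` plays the role of `rnnClearPreviousState()`.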
-
-For example, suppose we want to use an RNN to predict the weather, one hour in advance (based on the weather at say the previous 100 hours as input).
-If we were to use the output method, at each hour we would need to feed in the full 100 hours of data to predict the weather for hour 101. Then to predict the weather for hour 102, we would need to feed in the full 100 (or 101) hours of data; and so on for hours 103+.
-
-Alternatively, we could use the rnnTimeStep method. Of course, if we want to use the full 100 hours of history before we make our first prediction, we still need to do the full forward pass:
-
-![RNN Time Step](/images/guide/rnn_timestep_1.png)
-
-The first time we call rnnTimeStep, the only practical difference between the two approaches is that the activations/state of the last time step are stored - this is shown in orange. However, the next time we use the rnnTimeStep method, this stored state will be used to make the next predictions:
-
-![RNN Time Step](/images/guide/rnn_timestep_2.png)
-
-There are a number of important differences here:
-
-1. In the second image (second call of rnnTimeStep) the input data consists of a single time step, instead of the full history of data
-2. The forward pass is thus a single time step (as compared to hundreds of time steps, or more, in the first call)
-3. After the rnnTimeStep method returns, the internal state will automatically be updated. Thus, predictions for time 103 could be made in the same way as for time 102. And so on.
-
-However, if you want to start making predictions for a new (entirely separate) time series: it is necessary (and important) to manually clear the stored state, using the `MultiLayerNetwork.rnnClearPreviousState()` method. This will reset the internal state of all recurrent layers in the network.
-
-If you need to store or set the internal state of the RNN for use in predictions, you can use the rnnGetPreviousState and rnnSetPreviousState methods, for each layer individually. This can be useful for example during serialization (network saving/loading), as the internal network state from the rnnTimeStep method is *not* saved by default, and must be saved and loaded separately. Note that these get/set state methods return and accept a map, keyed by the type of activation. For example, in the LSTM model, it is necessary to store both the output activations, and the memory cell state.
-
-Some other points of note:
-
-* We can use the rnnTimeStep method for multiple independent examples/predictions simultaneously. In the weather example above, we might for example want to make predictions for multiple locations using the same neural network. This works in the same way as training and the forward pass / output methods: multiple rows (dimension 0 in the input data) are used for multiple examples.
-* If no history/stored state is set (i.e., initially, or after a call to rnnClearPreviousState), a default initialization (zeros) is used. This is the same approach as during training.
-* The rnnTimeStep method can be used for an arbitrary number of time steps simultaneously, not just one time step. However, it is important to note:
-  - For a single time step prediction: the data is 2 dimensional, with shape [numExamples,nIn]; in this case, the output is also 2 dimensional, with shape [numExamples,nOut]
-  - For multiple time step predictions: the data is 3 dimensional, with shape [numExamples,nIn,numTimeSteps]; the output will have shape [numExamples,nOut,numTimeSteps]. Again, the final time step activations are stored as before.
-* It is not possible to change the number of examples between calls of rnnTimeStep (in other words, if the first use of rnnTimeStep is for say 3 examples, all subsequent calls must be with 3 examples). After resetting the internal state (using rnnClearPreviousState()), any number of examples can be used for the next call of rnnTimeStep.
-* The rnnTimeStep method makes no changes to the parameters; it is intended for use only after training of the network has been completed.
-* The rnnTimeStep method works with networks containing single and stacked/multiple RNN layers, as well as with networks that combine other layer types (such as Convolutional or Dense layers).
-* The RnnOutputLayer layer type does not have any internal state, as it does not have any recurrent connections.
-
-## <a name="data">Importing Time Series Data</a>
-
-Data import for RNNs is complicated by the fact that we have multiple different types of data we could want to use for RNNs: one-to-many, many-to-one, variable length time series, etc. This section will describe the currently implemented data import mechanisms for DL4J.
-
-The methods described here utilize the SequenceRecordReaderDataSetIterator class, in conjunction with the CSVSequenceRecordReader class from DataVec. This approach currently allows you to load delimited (tab, comma, etc) data from files, where each time series is in a separate file.
-This method also supports:
-
-* Variable length time series input
-* One-to-many and many-to-one data loading (where input and labels are in different files)
-* Label conversion from an index to a one-hot representation for classification (e.g., '2' to [0,0,1,0])
-* Skipping a fixed/specified number of rows at the start of the data files (e.g., comment or header rows)
-
-Note that in all cases, each line in the data files represents one time step.
-
-(In addition to the examples below, you might find [these unit tests](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-core/src/test/java/org/deeplearning4j/datasets/datavec/RecordReaderDataSetiteratorTest.java) to be of some use.)
-
-#### Example 1: Time Series of Same Length, Input and Labels in Separate Files
-
-Suppose we have 10 time series in our training data, represented by 20 files: 10 files for the input of each time series, and 10 files for the output/labels. For now, assume these 20 files all contain the same number of time steps (i.e., same number of rows).
-
-To use the [SequenceRecordReaderDataSetIterator](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-datavec-iterators/src/main/java/org/deeplearning4j/datasets/datavec/SequenceRecordReaderDataSetIterator.java) and [CSVSequenceRecordReader](https://github.com/eclipse/deeplearning4j/blob/master/datavec/datavec-api/src/main/java/org/datavec/api/records/reader/impl/csv/CSVSequenceRecordReader.java) approaches, we first create two CSVSequenceRecordReader objects, one for input and one for labels:
-
-    SequenceRecordReader featureReader = new CSVSequenceRecordReader(1, ",");
-    SequenceRecordReader labelReader = new CSVSequenceRecordReader(1, ",");
-
-This particular constructor takes the number of lines to skip (1 row skipped here), and the delimiter (comma character used here).
-
-Second, we need to initialize these two readers, by telling them where to get the data from. We do this with an InputSplit object.
-Suppose that our time series are numbered, with file names "myInput_0.csv", "myInput_1.csv", ..., "myLabels_0.csv", etc. One approach is to use the [NumberedFileInputSplit](https://github.com/eclipse/deeplearning4j/blob/master/datavec/datavec-api/src/main/java/org/datavec/api/split/NumberedFileInputSplit.java):
-
-    featureReader.initialize(new NumberedFileInputSplit("/path/to/data/myInput_%d.csv", 0, 9));
-    labelReader.initialize(new NumberedFileInputSplit("/path/to/data/myLabels_%d.csv", 0, 9));
-
-In this particular approach, the "%d" is replaced by the corresponding number, and the numbers 0 to 9 (both inclusive) are used.
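-
-The file-name expansion can be pictured as one `String.format` call per index. The hypothetical helper below lists the paths that the split above would cover. It is a sketch that assumes the pattern contains a single `%d`, not the NumberedFileInputSplit source:
-
-```java
-import java.util.ArrayList;
-import java.util.List;
-
-// Hypothetical helper mirroring the "%d" expansion described above.
-class NumberedPaths {
-    // One path per index, with both minIdx and maxIdx included.
-    static List<String> expand(String pattern, int minIdx, int maxIdx) {
-        List<String> paths = new ArrayList<>();
-        for (int i = minIdx; i <= maxIdx; i++) {
-            paths.add(String.format(pattern, i));
-        }
-        return paths;
-    }
-}
-```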
-
-Finally, we can create our SequenceRecordReaderDataSetIterator:
-
-    DataSetIterator iter = new SequenceRecordReaderDataSetIterator(featureReader, labelReader, miniBatchSize, numPossibleLabels, regression);
-
-This DataSetIterator can then be passed to MultiLayerNetwork.fit() to train the network.
-
-The miniBatchSize argument specifies the number of examples (time series) in each minibatch. For example, with 10 time series in total (20 files), a miniBatchSize of 5 gives us two minibatches (DataSet objects), each containing 5 time series.
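-
-The minibatch sizes per epoch follow from simple integer arithmetic, assuming the final minibatch holds whatever time series remain. The hypothetical helper below makes that explicit:
-
-```java
-// Hypothetical helper; assumes the last minibatch holds the remainder.
-class MinibatchSizes {
-    static int[] batchSizes(int numSeries, int miniBatchSize) {
-        int n = (numSeries + miniBatchSize - 1) / miniBatchSize; // ceiling division
-        int[] sizes = new int[n];
-        for (int b = 0; b < n; b++) {
-            sizes[b] = Math.min(miniBatchSize, numSeries - b * miniBatchSize);
-        }
-        return sizes;
-    }
-}
-```
-
-With 10 time series and miniBatchSize = 5 this gives [5, 5]. With miniBatchSize = 3 it gives [3, 3, 3, 1].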
-
-Note that:
-
-* For classification problems: numPossibleLabels is the number of classes in your data set. Use regression = false.
-  * Labels data: one value per line, as a class index
-  * Label data will be converted to a one-hot representation automatically
-* For regression problems: numPossibleLabels is not used (set it to anything) and use regression = true.
-  * The number of values in the input and labels can be anything (unlike classification: can have an arbitrary number of outputs)
-  * No processing of the labels is done when regression = true
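-
-For classification, the conversion from a class index to a one-hot row can be sketched as below. This is plain Java with hypothetical names. For readability it lays the result out as [timeSteps][numPossibleLabels], whereas the iterator itself produces ND4J arrays.
-
-```java
-// Hypothetical sketch of index -> one-hot label conversion, one row per time step.
-class OneHotLabels {
-    // e.g. class index 2 with 4 classes -> [0, 0, 1, 0]
-    static double[][] toOneHot(int[] classIndices, int numPossibleLabels) {
-        double[][] out = new double[classIndices.length][numPossibleLabels];
-        for (int t = 0; t < classIndices.length; t++) {
-            int c = classIndices[t];
-            if (c < 0 || c >= numPossibleLabels) {
-                throw new IllegalArgumentException("Label " + c + " out of range");
-            }
-            out[t][c] = 1.0;
-        }
-        return out;
-    }
-}
-```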
-
-#### Example 2: Time Series of Same Length, Input and Labels in Same File
-
-Following on from the last example, suppose that instead of separate files for our input data and labels, we have both in the same file. However, each time series is still in a separate file.
-
-As of DL4J 0.4-rc3.8, this approach has the restriction of a single column for the output (either a class index, or a single real-valued regression output).
-
-In this case, we create and initialize a single reader. Again, we are skipping one header row, and specifying the format as comma delimited, and assuming our data files are named "myData_0.csv", ..., "myData_9.csv":
-
-    SequenceRecordReader reader = new CSVSequenceRecordReader(1, ",");
-    reader.initialize(new NumberedFileInputSplit("/path/to/data/myData_%d.csv", 0, 9));
-    DataSetIterator iterClassification = new SequenceRecordReaderDataSetIterator(reader, miniBatchSize, numPossibleLabels, labelIndex, false);
-
-`miniBatchSize` and `numPossibleLabels` are the same as the previous example. Here, `labelIndex` specifies which column the labels are in. For example, if the labels are in the fifth column, use labelIndex = 4 (i.e., columns are indexed 0 to numColumns-1).
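-
-The following plain-Java sketch shows how `labelIndex` separates the label column from the features for a single CSV line. The helper is hypothetical and assumes one label column and purely numeric values:
-
-```java
-// Hypothetical helper: split one CSV row into features and a label by column index.
-class SplitRow {
-    // All columns except labelIndex, in their original order.
-    static double[] features(String[] row, int labelIndex) {
-        double[] f = new double[row.length - 1];
-        int k = 0;
-        for (int c = 0; c < row.length; c++) {
-            if (c != labelIndex) f[k++] = Double.parseDouble(row[c].trim());
-        }
-        return f;
-    }
-
-    // The single label column (a class index or a regression target).
-    static double label(String[] row, int labelIndex) {
-        return Double.parseDouble(row[labelIndex].trim());
-    }
-}
-```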
-
-For regression on a single output value, we use:
-
-    DataSetIterator iterRegression = new SequenceRecordReaderDataSetIterator(reader, miniBatchSize, -1, labelIndex, true);
-
-Again, the numPossibleLabels argument is not used for regression.
-
-#### Example 3: Time Series of Different Lengths (Many-to-Many)
-
-Following on from the previous two examples, suppose that for each example individually, the input and labels are of the same length, but these lengths differ between time series.
-
-We can use the same approach (CSVSequenceRecordReader and SequenceRecordReaderDataSetIterator), though with a different constructor:
-
-    DataSetIterator variableLengthIter = new SequenceRecordReaderDataSetIterator(featureReader, labelReader, miniBatchSize, numPossibleLabels, regression, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);
-
-The arguments here are the same as in the previous example, with the exception of the AlignmentMode.ALIGN_END addition. This alignment mode input tells the SequenceRecordReaderDataSetIterator to expect two things:
-
-1. That the time series may be of different lengths
-2. To align the input and labels - for each example individually - such that their last values occur at the same time step.
-
-Note that if the features and labels are always of the same length (as is the assumption in example 3), then the two alignment modes (AlignmentMode.ALIGN_END and AlignmentMode.ALIGN_START) will give identical outputs. The alignment mode option is explained in the next section.
-
-Also note that variable length time series always start at time zero in the data arrays: padding, if required, will be added after the time series has ended.
-
-Unlike examples 1 and 2 above, the DataSet objects produced by the above variableLengthIter instance will also include features and labels mask arrays, as described earlier in this document.
-
-#### Example 4: Many-to-One and One-to-Many Data
-We can also use the AlignmentMode functionality in example 3 to implement a many-to-one RNN sequence classifier. Here, let us assume:
-
-* Input and labels are in separate delimited files
-* The labels files contain a single row (time step) (either a class index for classification, or one or more numbers for regression)
-* The input lengths may (optionally) differ between examples
-
-In fact, the same approach as in example 3 can do this:
-
-    DataSetIterator variableLengthIter = new SequenceRecordReaderDataSetIterator(featureReader, labelReader, miniBatchSize, numPossibleLabels, regression, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);
-
-Alignment modes are relatively straightforward. They specify whether to pad the start or the end of the shorter time series. The diagram below shows how this works, along with the masking arrays (as discussed earlier in this document):
-
-![Sequence Alignment](/images/guide/rnn_seq_alignment.png)
-
-The one-to-many case (similar to the last case above, but with only one input) is done by using AlignmentMode.ALIGN_START.
-
-Note that in the case of training data that contains time series of different lengths, the labels and inputs will be aligned for each example individually, and then the shorter time series will be padded as required:
-
-![Sequence Alignment](/images/guide/rnn_seq_alignment_2.png)
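-
-The padding and masking behaviour of the two alignment modes can be sketched in plain Java for a single 1-D sequence. The helper is hypothetical, not the DL4J implementation. With `alignEnd = true` the data is placed at the end of the padded range, as with AlignmentMode.ALIGN_END. With `alignEnd = false` it starts at time zero, as with ALIGN_START.
-
-```java
-// Hypothetical sketch of alignment padding and masks; not DL4J's implementation.
-class AlignSketch {
-    // Pads seq to totalLength; alignEnd=true places the values at the end.
-    static double[] pad(double[] seq, int totalLength, boolean alignEnd) {
-        double[] out = new double[totalLength];
-        int offset = alignEnd ? totalLength - seq.length : 0;
-        System.arraycopy(seq, 0, out, offset, seq.length);
-        return out;
-    }
-
-    // Mask: 1 where real data is present, 0 where padding was added.
-    static double[] mask(int seqLength, int totalLength, boolean alignEnd) {
-        double[] m = new double[totalLength];
-        int offset = alignEnd ? totalLength - seqLength : 0;
-        for (int t = offset; t < offset + seqLength; t++) {
-            m[t] = 1.0;
-        }
-        return m;
-    }
-}
-```
-
-For a many-to-one example whose input has 5 time steps, `mask(1, 5, true)` returns [0,0,0,0,1]. This is the same kind of one-hot labels mask used earlier to find the last time step.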
-
-## Available layers
-
-{{autogenerated}}
--- a/docs/deeplearning4j-nn/templates/transfer-learning.md
+++ b/docs/deeplearning4j-nn/templates/transfer-learning.md
@ -1,160 +0,0 @@
---
-title: Neural Network Transfer Learning
-short_title: Transfer Learning
-description:
-category: Tuning & Training
-weight: 5
---
-
-## DL4J’s Transfer Learning API
-
-The DL4J transfer learning API enables users to:
-
-* Modify the architecture of an existing model
-* Fine tune learning configurations of an existing model.
-* Hold parameters of a specified layer constant during training, also referred to as “frozen”.
-
-Holding certain layers frozen on a network and training is effectively the same as training on a transformed version of the input, the transformed version being the intermediate outputs at the boundary of the frozen layers. This is the process of “feature extraction” from the input data and will be referred to as “featurizing” in this document. 
-
-
-## The transfer learning helper
-
-The forward pass to “featurize” the input data on large, pretrained networks can be time consuming. DL4J also provides a TransferLearningHelper class with the following capabilities:
-
-* Featurize an input dataset to save for future use
-* Fit the model with frozen layers with a featurized dataset 
-* Output from the model with frozen layers given a featurized input.
-
-When running multiple epochs users will save on computation time since the expensive forward pass on the frozen layers/vertices will only have to be conducted once.
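-
-The saving comes from running the frozen part once per example and reusing the result every epoch. The toy sketch below uses hypothetical names. A fixed function `2 * x` stands in for the frozen layers, and a one-weight linear head trained with plain SGD stands in for the trainable layers. The sketch counts how often the frozen part runs:
-
-```java
-// Toy sketch of "featurize once, train many epochs"; not the TransferLearningHelper API.
-class FeaturizeThenTrain {
-    static int frozenCalls = 0;
-
-    // Stand-in for the expensive frozen layers (fixed parameters).
-    static double frozen(double x) {
-        frozenCalls++;
-        return 2.0 * x;
-    }
-
-    // Trains a one-weight head y = w * f on cached features with plain SGD.
-    static double trainHead(double[] xs, double[] ys, int epochs, double lr) {
-        double[] features = new double[xs.length];
-        for (int i = 0; i < xs.length; i++) {
-            features[i] = frozen(xs[i]); // featurize once, before any epoch
-        }
-        double w = 0.0;
-        for (int e = 0; e < epochs; e++) {
-            for (int i = 0; i < features.length; i++) {
-                double err = w * features[i] - ys[i];
-                w -= lr * err * features[i]; // gradient of 0.5 * err^2 w.r.t. w
-            }
-        }
-        return w;
-    }
-}
-```
-
-However many epochs are run, `frozenCalls` equals the number of examples. This is the same kind of saving TransferLearningHelper offers when it featurizes a dataset once.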
-
-
-## Show me the code
-
-This example will use VGG16 to classify images belonging to five categories of flowers. The dataset will automatically download from http://download.tensorflow.org/example_images/flower_photos.tgz
-
-#### I.  Import a zoo model
-
-As of 0.9.0 (0.8.1-SNAPSHOT) Deeplearning4j has a new native model zoo. Read about the [deeplearning4j-zoo](/model-zoo) module for more information on using pretrained models. Here, we load a pretrained VGG-16 model initialized with weights trained on ImageNet:
-
-```
-ZooModel zooModel = new VGG16();
-ComputationGraph pretrainedNet = (ComputationGraph) zooModel.initPretrained(PretrainedType.IMAGENET);
-```
-
-
-#### II.  Set up a fine-tune configuration
-
-```
-FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
-            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
-            .updater(new Nesterovs(5e-5))
-            .seed(seed)
-            .build();
-```
-
-#### III.  Build new models based on VGG16
-
-##### A. Modifying only the last layer, keeping the others frozen
-
-The final layer of VGG16 does a softmax regression on the 1000 classes in ImageNet. We modify the very last layer to give predictions for five classes keeping the other layers frozen.
-
-```
-ComputationGraph vgg16Transfer = new TransferLearning.GraphBuilder(pretrainedNet)
-    .fineTuneConfiguration(fineTuneConf)
-    .setFeatureExtractor("fc2")
-    .removeVertexKeepConnections("predictions")
-    .addLayer("predictions",
-        new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
-            .nIn(4096).nOut(numClasses)
-            .weightInit(WeightInit.XAVIER)
-            .activation(Activation.SOFTMAX).build(), "fc2")
-    .build();
-```
-After a mere thirty iterations, which in this case is exposure to 450 images, the model attains an accuracy > 75% on the test dataset. This is rather remarkable considering the complexity of training an image classifier from scratch.
-
-##### B. Attach new layers to the bottleneck (block5_pool)
-
-Here we hold all but the last three dense layers frozen and attach new dense layers to the network. Note that the primary intent here is to demonstrate the use of the API, rather than to obtain the best possible results.
-
-```
-ComputationGraph vgg16Transfer = new TransferLearning.GraphBuilder(pretrainedNet)
-    .fineTuneConfiguration(fineTuneConf)
-    .setFeatureExtractor("block5_pool")
-    .nOutReplace("fc2", 1024, WeightInit.XAVIER)
-    .removeVertexAndConnections("predictions")
-    .addLayer("fc3", new DenseLayer.Builder()
-        .activation(Activation.RELU)
-        .nIn(1024).nOut(256).build(), "fc2")
-    .addLayer("newpredictions", new OutputLayer
-        .Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
-        .activation(Activation.SOFTMAX)
-        .nIn(256).nOut(numClasses).build(), "fc3")
-    .setOutputs("newpredictions")
-    .build();
-```
-
-##### C. Fine tune layers from a previously saved model 
-
-Say we have saved our model from (B) and now want to allow the “block5” layers to train.
-
-```
-ComputationGraph vgg16FineTune = new TransferLearning.GraphBuilder(vgg16Transfer)
-              .fineTuneConfiguration(fineTuneConf)
-              .setFeatureExtractor("block4_pool")
-              .build();
-```
-
-#### IV.  Saving “featurized” datasets and training with them.
-
-We use the transfer learning helper API. Note this freezes the layers of the model passed in.
-
-Here is how you obtain the featurized version of the dataset at the specified layer “fc2”.
-
-```
-TransferLearningHelper transferLearningHelper =
-    new TransferLearningHelper(pretrainedNet, "fc2");
-while (trainIter.hasNext()) {
-    DataSet currentFeaturized = transferLearningHelper.featurize(trainIter.next());
-    saveToDisk(currentFeaturized, trainDataSaved, true);   // user-defined helper that writes the DataSet to disk
-    trainDataSaved++;
-}
-```
-
-Here is how you can fit with a featurized dataset, where trainIter iterates over the featurized data saved above. vgg16Transfer is the model set up in (A) of section III.
-
-```
-TransferLearningHelper transferLearningHelper = 
-    new TransferLearningHelper(vgg16Transfer);
-while (trainIter.hasNext()) {
-       transferLearningHelper.fitFeaturized(trainIter.next());
-}
-```
-
-## Notes
-
-* The TransferLearning builder returns a new instance of a dl4j model. 
-
-Keep in mind this is a second model that leaves the original one untouched. For large pretrained networks, take memory requirements into consideration and adjust your JVM heap space accordingly.
-
-* The trained model helper imports models from Keras without enforcing a training configuration. 
-
-As a result, the last layer (as seen when printing the summary) is a dense layer and not an output layer with a loss function. Therefore, to modify nOut of an output layer, we delete the layer vertex, keeping its connections, and add back a new output layer with the same name, a different nOut, a suitable loss function, etc.
-
-* Changing nOuts at a layer/vertex will modify nIn of the layers/vertices it fans into. 
-
-When changing nOut users can specify a weight initialization scheme or a distribution for the layer as well as a separate weight initialization scheme or distribution for the layers it fans out to.
-
-* Frozen layer configurations are not saved when writing the model to disk. 
-
-In other words, a model with frozen layers will not have any frozen layers after it is serialized and read back in. To continue training while holding specific layers constant, the user is expected to go through the transfer learning helper or the transfer learning API. There are two ways to “freeze” layers in a dl4j model:
-
-  - On a copy: with the transfer learning API, which will return a new model with the relevant frozen layers
-  - In place: with the transfer learning helper API, which will apply the frozen layers to the given model.
-
-* FineTune configurations will selectively update learning parameters. 
-
-For example, if a learning rate is specified, this learning rate will apply to all unfrozen/trainable layers in the model. However, newly added layers can override this learning rate by specifying their own learning rates in the layer builder.
-
-## Utilities
-
-{{autogenerated}}
--- a/docs/deeplearning4j-nn/templates/tsne-visualization.md
+++ b/docs/deeplearning4j-nn/templates/tsne-visualization.md
@ -1,72 +0,0 @@
---
-title: t-SNE's Data Visualization
-short_title: t-SNE Visualization
-description: Data visualization of higher-dimensional data with t-SNE.
-category: Tuning & Training
-weight: 10
---
-
-## t-SNE's Data Visualization
-
-[t-Distributed Stochastic Neighbor Embedding](https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding) (t-SNE) is a data-visualization tool created by Laurens van der Maaten at Delft University of Technology.
-
-While it can be used for any data, t-SNE (pronounced Tee-Snee) is only really meaningful with labeled data, which clarifies how the input is clustering. Below, you can see the kind of graphic you can generate in DL4J with t-SNE working on MNIST data.
-
-![Alt text](/images/guide/tsne.png)
-
-Look closely and you can see the numerals clustered near their likes, alongside the dots.
-
-Here's how t-SNE appears in Deeplearning4j code.
-
-```java
-public class TSNEStandardExample {
-
-    private static Logger log = LoggerFactory.getLogger(TSNEStandardExample.class);
-
-    public static void main(String[] args) throws Exception  {
-        //STEP 1: Initialization
-        int iterations = 100;
-        //use double precision for all n-dimensional arrays
-        DataTypeUtil.setDTypeForContext(DataBuffer.Type.DOUBLE);
-        List<String> cacheList = new ArrayList<>(); //cacheList is a dynamic array of strings used to hold all words
-
-        //STEP 2: Turn text input into a list of words
-        log.info("Load & Vectorize data....");
-        File wordFile = new ClassPathResource("words.txt").getFile();   //Open the file
-        //Get the data of all unique word vectors
-        Pair<InMemoryLookupTable,VocabCache> vectors = WordVectorSerializer.loadTxt(wordFile);
-        VocabCache cache = vectors.getSecond();
-        INDArray weights = vectors.getFirst().getSyn0();    //separate the weights of unique words into their own array
-
-        for(int i = 0; i < cache.numWords(); i++)   //separate the strings of words into their own list
-            cacheList.add(cache.wordAtIndex(i));
-
-        //STEP 3: build a Barnes-Hut t-SNE model to use later
-        log.info("Build model....");
-        BarnesHutTsne tsne = new BarnesHutTsne.Builder()
-                .setMaxIter(iterations).theta(0.5)
-                .normalize(false)
-                .learningRate(500)
-                .useAdaGrad(false)
-//                .usePca(false)
-                .build();
-
-        //STEP 4: establish the tsne values and save them to a file
-        log.info("Store TSNE Coordinates for Plotting....");
-        String outputFile = "target/archive-tmp/tsne-standard-coords.csv";
-        (new File(outputFile)).getParentFile().mkdirs();
-        tsne.plot(weights,2,cacheList,outputFile);
-        //This tsne will use the weights of the vectors as its matrix, have two dimensions, use the words strings as
-        //labels, and be written to the outputFile created on the previous line
-
-    }
-}
-```
-
-Here is an image of the tsne-standard-coords.csv file plotted using gnuplot.
-
-
-![Tsne data plot](/images/guide/tsne_output.png)
--- a/docs/deeplearning4j-nn/templates/vertices.md
+++ b/docs/deeplearning4j-nn/templates/vertices.md
@ -1,15 +0,0 @@
---
-title: Supported Vertices
-short_title: Vertices
-description: Computation graph nodes for advanced configuration.
-category: Models
-weight: 4
---
-
-## What is a vertex?
-
-In Eclipse Deeplearning4j a vertex is a type of layer that acts as a node in a `ComputationGraph`. It can accept multiple inputs, provide multiple outputs, and can help construct popular networks such as InceptionV4.
-
-## Available classes
-
-{{autogenerated}}
--- a/docs/deeplearning4j-nn/templates/visualization.md
+++ b/docs/deeplearning4j-nn/templates/visualization.md
@ -1,325 +0,0 @@
---
-title: Visualize, Monitor and Debug Neural Network Learning
-short_title: Visualization
-description: How to visualize, monitor and debug neural network learning.
-category: Tuning & Training
-weight: 2
---
-
-## Contents
-
-* [Visualizing Network Training with the Deeplearning4j Training UI](#ui)
-    * [Deeplearning4j UI: The Overview Page](#overviewpage)
-    * [Deeplearning4j UI: The Model Page](#modelpage)
-* [Deeplearning4J UI and Spark Training](#sparkui)
-* [Using the UI to Tune Your Network](#usingui)
-* [TSNE and Word2Vec](#tsne)
-* [Fixing UI Issue: "No configuration setting" exception](#issues)
-
-## <a name="ui">Visualizing Network Training with the Deeplearning4j Training UI</a>
-
-**Note**: The information here pertains to DL4J versions 0.7.0 and later.
-
-DL4J provides a user interface to visualize in your browser (in real time) the current network status and progress of training. The UI is typically used to help with tuning neural networks, i.e., the selection of hyperparameters (such as learning rate) to obtain good performance for a network.
-
-**Step 1: Add the Deeplearning4j UI dependency to your project.**
-
-```
-    <dependency>
-        <groupId>org.deeplearning4j</groupId>
-        <artifactId>deeplearning4j-ui_2.10</artifactId>
-        <version>{{ page.version }}</version>
-    </dependency>
-```
-
-Note the ```_2.10``` suffix: this is the Scala version (due to using the Play framework, a Scala library, for the backend). If you are not using other Scala libraries, either ```_2.10``` or ```_2.11``` is OK.
-
-**Step 2: Enable the UI in your project**
-
-This is relatively straightforward:
-
-```
-    //Initialize the user interface backend
-    UIServer uiServer = UIServer.getInstance();
-
-    //Configure where the network information (gradients, score vs. time etc) is to be stored. Here: store in memory.
-    StatsStorage statsStorage = new InMemoryStatsStorage();         //Alternative: new FileStatsStorage(File), for saving and loading later
-
-    //Attach the StatsStorage instance to the UI: this allows the contents of the StatsStorage to be visualized
-    uiServer.attach(statsStorage);
-
-    //Then add the StatsListener to collect this information from the network, as it trains
-    net.setListeners(new StatsListener(statsStorage));
-```
-
-To access the UI, open your browser and go to ```http://localhost:9000/train```.
-You can set the port by using the ```org.deeplearning4j.ui.port``` system property: e.g., to use port 9001, pass the following to the JVM on launch: ```-Dorg.deeplearning4j.ui.port=9001```
-
-Information will then be collected and routed to the UI when you call the ```fit``` method on your network.
-
-
-**Example:** [See a UI example here](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/userInterface/UIExample.java)
-
-The full set of UI examples is available [here](https://github.com/eclipse/deeplearning4j-examples/tree/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/userInterface).
-
-
-### <a name="overviewpage">Deeplearning4j UI: The Overview Page</a>
-
-![Overview Page](/images/guide/DL4J_UI_01.png)
-
-The overview page (one of 3 available pages) contains the following information:
-
-* Top left: score vs iteration chart - this is the value of the loss function on the current minibatch
-* Top right: model and training information
-* Bottom left: ratio of updates to parameters (by layer) for all network weights vs. iteration
-* Bottom right: standard deviations (vs. time) of: activations, gradients and updates
-
-The bottom two charts display the logarithm (base 10) of these values. Thus a value of -3 on the update:parameter ratio chart corresponds to a ratio of 10<sup>-3</sup> = 0.001.
-
-The ratio of updates to parameters is specifically the ratio of the mean magnitudes of these values, i.e., log<sub>10</sub>(mean(abs(updates)) / mean(abs(parameters))).
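The following minimal plain-Java sketch makes the formula concrete by computing the same quantity for one layer. It assumes that the layer's parameters and updates have already been flattened into `double[]` arrays. The class name `UpdateParamRatio` is hypothetical and is not a DL4J API.

```java
// Sketch of the plotted quantity: log10( mean(|updates|) / mean(|parameters|) )
// for one layer, with both arrays already flattened.
class UpdateParamRatio {
    // Mean of the absolute values ("mean magnitude")
    static double meanMagnitude(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += Math.abs(v);
        }
        return sum / values.length;
    }

    static double log10Ratio(double[] updates, double[] params) {
        return Math.log10(meanMagnitude(updates) / meanMagnitude(params));
    }

    public static void main(String[] args) {
        double[] params = {0.5, -1.5, 1.0, -1.0};        // mean magnitude 1.0
        double[] updates = {0.001, -0.001, 0.002, 0.0};  // mean magnitude 0.001
        System.out.println(log10Ratio(updates, params)); // approximately -3.0
    }
}
```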
-
-See the later section of this page on how to use these values in practice.
-
-### <a name="modelpage">Deeplearning4j UI: The Model Page</a>
-
-![Model Page](/images/guide/DL4J_UI_02.png)
-
-The model page contains a graph of the neural network layers, which operates as a selection mechanism. Click on a layer to display information for it.
-
-After you select a layer, the following charts are available on the right:
-
-* Table of layer information
-* Update to parameter ratio for this layer, as per the overview page. The components of this ratio (the parameter and update mean magnitudes) are also available via tabs.
-* Layer activations (mean and mean +/- 2 standard deviations) over time
-* Histograms of parameters and updates, for each parameter type
-* Learning rate vs. time (note this will be flat, unless learning rate schedules are used)
-
-
-*Note: parameters are labeled as follows: weights (W) and biases (b). For recurrent neural networks, W refers to the weights connecting the layer to the layer below, and RW refers to the recurrent weights (i.e., those between time steps).*
-
-
-
-
-## <a name="sparkui">Deeplearning4J UI and Spark Training</a>
-
-The DL4J UI can be used with Spark. However, as of 0.7.0, conflicting dependencies mean that running the UI and Spark in the same JVM can be difficult.
-
-Two alternatives are available:
-
-1. Collect and save the relevant stats, to be visualized (offline) at a later point
-2. Run the UI in a separate server, and use the remote UI functionality to upload the data from the Spark master to your UI instance
-
-**Collecting Stats for Later Offline Use**
-
-```
-    SparkDl4jMultiLayer sparkNet = new SparkDl4jMultiLayer(sc, conf, tm);
-
-    StatsStorage ss = new FileStatsStorage(new File("myNetworkTrainingStats.dl4j"));
-    sparkNet.setListeners(ss, Collections.singletonList(new StatsListener(null)));
-```
-
-Then, later you can load and display the saved information using:
-
-```
-    StatsStorage statsStorage = new FileStatsStorage(new File("myNetworkTrainingStats.dl4j"));    //If the file already exists: load the data from it
-    UIServer uiServer = UIServer.getInstance();
-    uiServer.attach(statsStorage);
-```
-
-**Using the Remote UI Functionality**
-
-First, in the JVM running the UI (note this is the server):
-
-```
-    UIServer uiServer = UIServer.getInstance();
-    uiServer.enableRemoteListener();        //Necessary: remote support is not enabled by default
-```
-This requires the ```deeplearning4j-ui_2.10``` or ```deeplearning4j-ui_2.11``` dependency. Note that this is the server side. The client, described below, uses ```deeplearning4j-ui-model``` instead.
-
-Second, on the client side, attach the listener to your network. This applies both to Spark training and to standalone networks that use only deeplearning4j-nn. The example below is for Spark, but ```ComputationGraph``` and ```MultiLayerNetwork``` both have an equivalent ```setListeners``` method ([example found here](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/userInterface/RemoteUIExample.java)):
-
-```
-    SparkDl4jMultiLayer sparkNet = new SparkDl4jMultiLayer(sc, conf, tm);
-
-    StatsStorageRouter remoteUIRouter = new RemoteUIStatsStorageRouter("http://UI_MACHINE_IP:9000");
-    sparkNet.setListeners(remoteUIRouter, Collections.singletonList(new StatsListener(null)));
-```
-To avoid dependency conflicts with Spark, you should use the ```deeplearning4j-ui-model``` dependency to get the StatsListener, *not* the full ```deeplearning4j-ui_2.10``` UI dependency.
-
-**Note to Scala users**:
-
-If you are on a newer Scala version, you need to use the remote UI method above. See the linked example above for the client.
-
-
-
-
-Note: you should replace ```UI_MACHINE_IP``` with the IP address of the machine running the user interface instance.
-
-
-
-
-## <a name="usingui">Using the UI to Tune Your Network</a>
-
-Here's an excellent [web page by Andrej Karpathy](http://cs231n.github.io/neural-networks-3/#baby) about visualizing neural net training. It is worth reading and understanding that page first.
-
-Tuning neural networks is often more an art than a science. However, here are some ideas that may be useful:
-
-**Overview Page - Model Score vs. Iteration Chart**
-
-The score vs. iteration should (overall) go down over time.
-
-* If the score increases consistently, your learning rate is likely set too high. Try reducing it until scores become more stable.
-* Increasing scores can also indicate other network issues, such as incorrect data normalization.
-* If the score is flat or decreases very slowly (over a few hundred iterations), either (a) your learning rate may be too low, or (b) you might be having difficulties with optimization. In the latter case, if you are using the SGD updater, try a different updater such as Nesterovs (momentum), RMSProp or Adagrad.
-* Data that isn't shuffled (i.e., each minibatch contains only one class, for classification) can result in very rough or abnormal-looking score vs. iteration graphs.
-* Some noise in this line chart is expected (i.e., the line will go up and down within a small range). However, if the score varies very significantly from one iteration to the next, this can be a problem:
-    - The issues mentioned above (learning rate, normalization, data shuffling) may contribute to this.
-    - Setting the minibatch size to a very small number of examples can also contribute to noisy score vs. iteration graphs, and *might* lead to optimization difficulties.
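These checks are visual, but the same idea can be roughly sketched in code. The following plain-Java heuristic is hypothetical and is not a DL4J feature. It compares the mean score over the first and last `window` iterations of a recorded score series, and reports whether the score is decreasing, flat or increasing. The values of `window` and `tolerance` are arbitrary illustration parameters.

```java
// Hypothetical heuristic (not part of DL4J) for a recorded score-vs-iteration series:
// compares the mean score of the first and last `window` iterations.
class ScoreTrend {
    enum Trend { DECREASING, FLAT, INCREASING }

    static double mean(double[] xs, int from, int to) {
        double sum = 0.0;
        for (int i = from; i < to; i++) {
            sum += xs[i];
        }
        return sum / (to - from);
    }

    // tolerance: relative change (e.g. 0.05 = 5%) still treated as "flat"
    static Trend classify(double[] scores, int window, double tolerance) {
        if (window <= 0 || scores.length < 2 * window) {
            throw new IllegalArgumentException("need window > 0 and at least 2 * window scores");
        }
        double start = mean(scores, 0, window);
        double end = mean(scores, scores.length - window, scores.length);
        // Guard against division by zero for a (near-)zero starting score
        double relativeChange = (end - start) / Math.max(Math.abs(start), 1e-12);
        if (relativeChange < -tolerance) return Trend.DECREASING;
        if (relativeChange > tolerance) return Trend.INCREASING;
        return Trend.FLAT;
    }

    public static void main(String[] args) {
        double[] scores = {2.3, 2.1, 2.2, 1.9, 1.6, 1.7, 1.4, 1.3};
        System.out.println(classify(scores, 2, 0.05)); // DECREASING
    }
}
```

Comparing averaged windows instead of single points makes the check less sensitive to the iteration-to-iteration noise described above.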
-
-**Overview Page and Model Page - Using the Update: Parameter Ratio Chart**
-
-* The ratio of mean magnitude of updates to parameters is provided on both the overview and model pages
-    - "Mean magnitude" = the average of the absolute value of the parameters or updates at the current time step
-* The most important use of this ratio is in selecting a learning rate. As a rule of thumb, this ratio should be around 1:1000 = 0.001. On the (log<sub>10</sub>) chart, this corresponds to a value of -3 (i.e., 10<sup>-3</sup> = 0.001)
-    - Note that this is a rough guide only, and may not be appropriate for all networks. It's often a good starting point, however.
-    - If the ratio diverges significantly from this (for example, > -2 (i.e., 10<sup>-2</sup> = 0.01) or < -4 (i.e., 10<sup>-4</sup> = 0.0001)), your parameters may be too unstable to learn useful features (ratio too high), or may change too slowly to learn useful features (ratio too low)
-    - To change this ratio, adjust your learning rate (or sometimes, parameter initialization). In some networks, you may need to set the learning rate differently for different layers.
-* Keep an eye out for unusually large spikes in the ratio: this may indicate exploding gradients
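The following hypothetical plain-Java helper applies the rule of thumb above to a single log<sub>10</sub> ratio value. The -2 and -4 thresholds come from the guidance above. Treat them as rough signals rather than hard limits. `RatioCheck` is an illustrative name and is not a DL4J class.

```java
// Hypothetical helper applying the rule of thumb to a log10(update:parameter) value.
// Thresholds (-2 and -4) are the rough guide from the text, not hard limits.
class RatioCheck {
    static String assess(double log10Ratio) {
        if (Double.isNaN(log10Ratio) || Double.isInfinite(log10Ratio)) {
            // e.g. log10(0) = -Infinity when all updates are zero
            return "invalid: check for zero or non-finite parameters/updates";
        }
        if (log10Ratio > -2.0) {
            return "high: consider lowering the learning rate";
        }
        if (log10Ratio < -4.0) {
            return "low: consider raising the learning rate";
        }
        return "ok: near the ~-3 target";
    }

    public static void main(String[] args) {
        for (double r : new double[] {-1.5, -3.1, -4.6}) {
            System.out.println(r + " -> " + assess(r));
        }
    }
}
```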
-
-
-**Model Page: Layer Activations (vs. Time) Chart**
-
-This chart can be used to detect vanishing or exploding activations (due to poor weight initialization, too much regularization, lack of data normalization, or too high a learning rate).
-
-* This chart should ideally stabilize over time (usually a few hundred iterations)
-* A good standard deviation for the activations is on the order of 0.5 to 2.0. Values significantly outside this range may indicate one of the problems mentioned above.
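The plain-Java sketch below computes summary values similar to those on the activations chart: the mean and the mean +/- 2 standard deviations. It also applies the rough 0.5 to 2.0 check described above. It assumes the population standard deviation. Whether the UI uses the population or the sample form is not stated here. `ActivationStats` is a hypothetical name.

```java
// Sketch: mean, population standard deviation and mean +/- 2 std band for one
// layer's (flattened) activations, plus the rough 0.5..2.0 std check from the text.
class ActivationStats {
    final double mean;
    final double std;

    ActivationStats(double[] activations) {
        double sum = 0.0;
        for (double a : activations) sum += a;
        mean = sum / activations.length;
        double sq = 0.0;
        for (double a : activations) sq += (a - mean) * (a - mean);
        std = Math.sqrt(sq / activations.length); // population standard deviation (assumption)
    }

    double lowerBand() { return mean - 2 * std; }
    double upperBand() { return mean + 2 * std; }

    boolean stdInTypicalRange() { return std >= 0.5 && std <= 2.0; }

    public static void main(String[] args) {
        ActivationStats s = new ActivationStats(new double[] {-1.0, 1.0, -1.0, 1.0});
        System.out.printf("mean=%.2f std=%.2f band=[%.2f, %.2f] ok=%b%n",
                s.mean, s.std, s.lowerBand(), s.upperBand(), s.stdInTypicalRange());
    }
}
```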
-
-**Model Page: Layer Parameters Histogram**
-
-The layer parameters histogram is displayed for the most recent iteration only.
-
-* For weights, these histograms should have an approximately Gaussian (normal) distribution, after some time
-* For biases, these histograms will generally start at 0, and will usually end up being approximately Gaussian
-    - One exception to this is for LSTM recurrent neural network layers: by default, the biases for one gate (the forget gate) are set to 1.0 (this is configurable), to help in learning dependencies across long time periods. This results in the bias graphs initially having many biases around 0.0, with another set of biases around 1.0
-* Keep an eye out for parameters that are diverging to +/- infinity: this may be due to too high a learning rate, or insufficient regularization (try adding some L2 regularization to your network).
-* Keep an eye out for biases that become very large. This can sometimes occur in the output layer for classification, if the distribution of classes is very imbalanced
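The following hypothetical plain-Java check looks for the warning signs above in a flattened parameter array: non-finite values (NaN or Infinity) and unusually large magnitudes. The `maxAllowed` threshold is an illustrative assumption. A sensible value depends on the network.

```java
// Hypothetical check (not a DL4J API) for diverging parameters: non-finite values
// or magnitudes above an illustrative, network-specific threshold.
class ParamHealth {
    static int countNonFinite(double[] values) {
        int n = 0;
        for (double v : values) {
            if (Double.isNaN(v) || Double.isInfinite(v)) n++;
        }
        return n;
    }

    static double maxAbs(double[] values) {
        double max = 0.0;
        for (double v : values) {
            if (!Double.isNaN(v)) max = Math.max(max, Math.abs(v));
        }
        return max;
    }

    static boolean looksDiverging(double[] values, double maxAllowed) {
        return countNonFinite(values) > 0 || maxAbs(values) > maxAllowed;
    }

    public static void main(String[] args) {
        double[] bias = {0.1, -0.2, 35.0};
        System.out.println(looksDiverging(bias, 10.0)); // true: 35.0 exceeds the threshold
    }
}
```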
-
-**Model Page: Layer Updates Histogram**
-
-The layer update histogram is displayed for the most recent iteration only.
-
-* Note that these are the updates, i.e., the gradients *after* applying learning rate, momentum, regularization etc.
-* As with the parameter graphs, these should have an approximately Gaussian (normal) distribution
-* Keep an eye out for very large values: this can indicate exploding gradients in your network
-    - Exploding gradients are problematic as they can 'mess up' the parameters of your network
-    - In this case, it may indicate a weight initialization, learning rate or input/labels data normalization issue
-    - In the case of recurrent neural networks, adding some [gradient normalization or gradient clipping](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/GradientNormalization.java) may help
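DL4J configures this through the linked `GradientNormalization` class. The plain-Java code below is not DL4J's implementation. It is a generic sketch of two common ideas: element-wise clipping to [-threshold, threshold], and rescaling a gradient whose L2 norm exceeds a threshold. The `GradientClipping` class is hypothetical.

```java
import java.util.Arrays;

// Generic sketch of two gradient-clipping strategies (not DL4J's own code).
class GradientClipping {
    // Clamp each element into [-threshold, threshold]
    static double[] clipElementWise(double[] grad, double threshold) {
        double[] out = new double[grad.length];
        for (int i = 0; i < grad.length; i++) {
            out[i] = Math.max(-threshold, Math.min(threshold, grad[i]));
        }
        return out;
    }

    // If the L2 norm exceeds threshold, scale the whole vector so its norm equals threshold
    static double[] clipL2Norm(double[] grad, double threshold) {
        double sumSq = 0.0;
        for (double g : grad) sumSq += g * g;
        double norm = Math.sqrt(sumSq);
        if (norm <= threshold) return grad.clone(); // small enough: unchanged
        double scale = threshold / norm;
        double[] out = new double[grad.length];
        for (int i = 0; i < grad.length; i++) out[i] = grad[i] * scale;
        return out;
    }

    public static void main(String[] args) {
        double[] grad = {3.0, -4.0}; // L2 norm = 5
        System.out.println(Arrays.toString(clipElementWise(grad, 1.0))); // [1.0, -1.0]
        System.out.println(Arrays.toString(clipL2Norm(grad, 1.0)));      // approximately [0.6, -0.8]
    }
}
```

Element-wise clipping can change the gradient's direction. Norm rescaling keeps the direction and only shrinks the step.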
-
-**Model Page: Parameter Learning Rates Chart**
-
-This chart simply shows the learning rates of the parameters of the selected layer, over time.
-
-If you are not using learning rate schedules, the chart will be flat. If you *are* using learning rate schedules, you can use this chart to track the current value of the learning rate (for each parameter), over time.
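To show what a non-flat chart might look like, the following plain-Java sketch uses a generic step-decay schedule: lr = initialLr × decayRate<sup>floor(iteration / stepSize)</sup>. This formula is an assumption chosen for illustration. It is not one of DL4J's schedule classes. With a schedule like this, the learning-rate chart would show a staircase instead of a flat line.

```java
// Generic step-decay schedule (illustrative assumption, not DL4J's schedule API):
// lr(iteration) = initialLr * decayRate^(floor(iteration / stepSize))
class StepDecay {
    static double learningRate(double initialLr, double decayRate, int stepSize, int iteration) {
        int steps = iteration / stepSize; // integer division = floor for non-negative iterations
        return initialLr * Math.pow(decayRate, steps);
    }

    public static void main(String[] args) {
        for (int it = 0; it <= 3000; it += 1000) {
            System.out.println(it + ": " + learningRate(0.1, 0.5, 1000, it));
        }
    }
}
```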
-
-
-## <a name="tsne">TSNE and Word2vec</a>
-
-We rely on [TSNE](https://lvdmaaten.github.io/tsne/) to reduce the dimensionality of [word feature vectors](./deeplearning4j-nlp-word2vec) and project words into a two- or three-dimensional space. Here's some code for using TSNE with Word2Vec:
-
-```java
-log.info("Plot TSNE....");
-BarnesHutTsne tsne = new BarnesHutTsne.Builder()
-        .setMaxIter(1000)
-        .stopLyingIteration(250)
-        .learningRate(500)
-        .useAdaGrad(false)
-        .theta(0.5)
-        .setMomentum(0.5)
-        .normalize(true)
-        .usePca(false)
-        .build();
-vec.lookupTable().plotVocab(tsne);
-```
-
-## <a name="issues">Fixing UI Issue: "No configuration setting" exception</a>
-
-A possible exception that can occur with the DL4J UI is the following:
-```
-com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'play.crypto.provider'
-        at com.typesafe.config.impl.SimpleConfig.findKeyOrNull(SimpleConfig.java:152)
-        at com.typesafe.config.impl.SimpleConfig.findOrNull(SimpleConfig.java:170)
-        ...
-        at play.server.Server.forRouter(Server.java:96)
-        at org.deeplearning4j.ui.play.PlayUIServer.runMain(PlayUIServer.java:206)
-        at org.deeplearning4j.ui.api.UIServer.getInstance(UIServer.java:27)
-```
-
-This exception is not due to DL4J directly, but is due to a missing application.conf file, required by the Play framework (the library that DL4J's UI is based on). This is originally present in the deeplearning4j-play dependency: however, if an uber-jar (i.e., a JAR file with dependencies) is built (say, via ```mvn package```), it may not be copied over correctly. For example, using the ```maven-assembly-plugin``` has caused this exception for some users.
-
-The recommended solution (for Maven) is to use the Maven Shade plugin to produce an uber-jar, configured as follows:
-
-```xml
-    <build>
-        <plugins>
-            <plugin>
-                <groupId>org.codehaus.mojo</groupId>
-                <artifactId>exec-maven-plugin</artifactId>
-                <version>${exec-maven-plugin.version}</version>
-                <executions>
-                    <execution>
-                        <goals>
-                            <goal>exec</goal>
-                        </goals>
-                    </execution>
-                </executions>
-                <configuration>
-                    <executable>java</executable>
-                </configuration>
-            </plugin>
-            <plugin>
-                <groupId>org.apache.maven.plugins</groupId>
-                <artifactId>maven-shade-plugin</artifactId>
-                <version>${maven-shade-plugin.version}</version>
-                <configuration>
-                    <shadedArtifactAttached>true</shadedArtifactAttached>
-                    <shadedClassifierName>${shadedClassifier}</shadedClassifierName>
-                    <createDependencyReducedPom>true</createDependencyReducedPom>
-                    <filters>
-                        <filter>
-                            <artifact>*:*</artifact>
-                            <excludes>
-                                <!--<exclude>org/datanucleus/**</exclude>-->
-                                <exclude>META-INF/*.SF</exclude>
-                                <exclude>META-INF/*.DSA</exclude>
-                                <exclude>META-INF/*.RSA</exclude>
-                            </excludes>
-                        </filter>
-                    </filters>
-
-                </configuration>
-                <executions>
-                    <execution>
-                        <phase>package</phase>
-                        <goals>
-                            <goal>shade</goal>
-                        </goals>
-                        <configuration>
-                            <transformers>
-                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
-                                    <resource>reference.conf</resource>
-                                </transformer>
-                                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
-                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer" />
-                            </transformers>
-                        </configuration>
-                    </execution>
-                </executions>
-            </plugin>
-        </plugins>
-    </build>
-```
-
-Then, create your uber-jar with ```mvn package``` and run via ```cd target && java -cp dl4j-examples-0.9.1-bin.jar org.deeplearning4j.examples.userInterface.UIExample```. Note the "-bin" suffix for the generated JAR file: this includes all dependencies.
-
-Note also that this Maven Shade approach is configured for DL4J's examples repository.
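To confirm that a packaged uber-jar still contains the configuration files, you can run a small diagnostic with the jar on the classpath. The following plain-Java sketch lists every copy of a named resource, such as `reference.conf` (merged by the `AppendingTransformer` above) or `application.conf`. It uses only the standard `ClassLoader.getResources` call. The `ResourceCheck` class is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Diagnostic sketch: lists every copy of a resource visible on the classpath.
// An empty result for "reference.conf" with the uber-jar on the classpath suggests
// the configuration was dropped during packaging.
class ResourceCheck {
    static List<URL> find(String resourceName) {
        ClassLoader loader = ResourceCheck.class.getClassLoader();
        try {
            return Collections.list(loader.getResources(resourceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "reference.conf";
        List<URL> urls = find(name);
        System.out.println(urls.size() + " copies of " + name + " found");
        for (URL url : urls) {
            System.out.println("  " + url);
        }
    }
}
```

For example, compile the class and run `java -cp <your-bin.jar>:. ResourceCheck reference.conf`, where the jar name is a placeholder.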
--- a/docs/deeplearning4j-scaleout/README.md
+++ b/docs/deeplearning4j-scaleout/README.md
@ -1,16 +0,0 @@
-# deeplearning4j-scaleout documentation
-
-Build and serve documentation for deeplearning4j-scaleout with MkDocs (install with `pip install mkdocs`).
-The source for the deeplearning4j-scaleout documentation is in this directory under `doc_sources/`.
-
-The structure of this project (template files, generating code, mkdocs YAML) is closely aligned
-with the [Keras documentation](https://keras.io) and heavily inspired by the [Keras docs repository](https://github.com/keras-team/keras/tree/master/docs).
-
-To generate docs into the `deeplearning4j-scaleout/doc_sources` folder, first `cd docs` then run:
-
-```shell
-python generate_docs.py \
-    --project deeplearning4j-scaleout \
-    --code ../deeplearning4j \
-    --out_language en
-```
--- a/docs/deeplearning4j-scaleout/pages.json
+++ b/docs/deeplearning4j-scaleout/pages.json
@ -1,34 +0,0 @@
-{
-  "excludes": [
-  ],
-  "indices": [
-  ],
-  "pages": [
-    {
-      "page": "intro.md",
-      "class": []
-    },
-    {
-      "page": "technicalref.md",
-      "class": []
-    },
-    {
-      "page": "howto.md",
-      "class": []
-    },
-    {
-      "page": "data-howto.md",
-      "class": []
-    },
-    {
-      "page": "apiref.md",
-      "class": [
-        "deeplearning4j-scaleout/spark/dl4j-spark/src/main/java/org/deeplearning4j/spark/impl/multilayer/SparkDl4jMultiLayer.java",
-        "deeplearning4j-scaleout/spark/dl4j-spark/src/main/java/org/deeplearning4j/spark/impl/graph/SparkComputationGraph.java",
-        "deeplearning4j-scaleout/spark/dl4j-spark-parameterserver/src/main/java/org/deeplearning4j/spark/parameterserver/training/SharedTrainingMaster.java",
-        "deeplearning4j-scaleout/spark/dl4j-spark/src/main/java/org/deeplearning4j/spark/impl/paramavg/ParameterAveragingTrainingMaster.java"
-      ]
-    }
-  ]
-}
-
--- a/docs/deeplearning4j-scaleout/templates/apiref.md
+++ b/docs/deeplearning4j-scaleout/templates/apiref.md
@ -1,13 +0,0 @@
---
-title: "Deeplearning4j on Spark: API Reference"
-short_title: API Reference
-description: "Deeplearning4j on Spark: API Reference"
-category: Distributed Deep Learning
-weight: 4
---
-
-# API Reference
-
-This page provides the API reference for key classes required to do distributed training with DL4J on Spark. Before going through these, make sure you have read the introduction guide for deeplearning4j Spark training [here](deeplearning4j-scaleout-intro).
-
-{{autogenerated}}
--- a/docs/deeplearning4j-scaleout/templates/data-howto.md
+++ b/docs/deeplearning4j-scaleout/templates/data-howto.md
@ -1,490 +0,0 @@
---
-title: "Deeplearning4j on Spark: How To Build Data Pipelines"
-short_title: Spark Data Pipelines Guide
-description: "Deeplearning4j on Spark: How To Build Data Pipelines"
-category: Distributed Deep Learning
-weight: 3
---
-
-# Deeplearning4j on Spark: How To Build Data Pipelines
-
-This page provides some guides on how to create data pipelines for both training and evaluation when using Deeplearning4j on Spark.
-
-This page assumes some familiarity with Spark (RDDs, master vs. workers, etc) and Deeplearning4j (networks, DataSet etc).
-
-As with training on a single machine, the final step of a data pipeline should be to produce a DataSet (a single features array and a single labels array) or a MultiDataSet (one or more feature arrays, one or more label arrays). In the case of DL4J on Spark, the final step of a data pipeline is data in one of the following formats:
-
-1. an ```RDD<DataSet>```/```JavaRDD<DataSet>```
-2. an ```RDD<MultiDataSet>```/```JavaRDD<MultiDataSet>```
-3. a directory of serialized DataSet/MultiDataSet (minibatch) objects on network storage such as HDFS, S3 or Azure blob storage
-4. a directory of minibatches in some other format
-
-Once data is in one of those four formats, it can be used for training or evaluation.
-
-**Note:** When training multiple models on a single dataset, it is best practice to preprocess your data once, and save it to network storage such as HDFS.
-Then, when training the network you can call ```SparkDl4jMultiLayer.fit(String path)``` or ```SparkComputationGraph.fit(String path)``` where ```path``` is the directory where you saved the files.
-
-
-Spark Data Preparation: How-To Guides
-* [How to prepare an RDD[DataSet] from CSV data for classification or regression](#csv)
-* [How to create a Spark data pipeline for training on images](#images)
-* [How to create an RDD[MultiDataSet] from one or more RDD[List[Writable]]](#multidataset)
-* [How to save an RDD[DataSet] or RDD[MultiDataSet] to network storage and use it for training](#saveloadrdd)
-* [How to prepare data on a single machine for use on a cluster: saving DataSets](#singletocluster)
-* [How to prepare data on a single machine for use on a cluster: map/sequence files](#singletocluster2)
-* [How to load multiple CSVs (one sequence per file) for RNN data pipelines](#csvseq)
-* [How to load prepared minibatches in custom format](#customformat)
-
-<br><br>
-
-## <a name="csv">How to prepare an RDD[DataSet] from CSV data for classification or regression</a>
-
-This guide shows how to load data contained in one or more CSV files and produce a ```JavaRDD<DataSet>``` for export, training or evaluation on Spark.
-
-The process is fairly straightforward. Note that the ```DataVecDataSetFunction``` is very similar to the ```RecordReaderDataSetIterator``` that is often used for single machine training.
-
-For example, suppose the CSV has the following format: 6 total columns, consisting of 5 features followed by an integer class index for classification, with 10 possible classes.
-
-```
-1.0,3.2,4.5,1.1,6.3,0
-1.6,2.4,5.9,0.2,2.2,1
-...
-```
-
-We could load this data for classification using the following code:
-```
-String filePath = "hdfs:///your/path/some_csv_file.csv";
-JavaSparkContext sc = new JavaSparkContext();
-JavaRDD<String> rddString = sc.textFile(filePath);
-RecordReader recordReader = new CSVRecordReader(',');
-JavaRDD<List<Writable>> rddWritables = rddString.map(new StringToWritablesFunction(recordReader));
-
-int labelIndex = 5;         //Labels: a single integer representing the class index in column number 5
-int numLabelClasses = 10;   //10 classes for the label
-JavaRDD<DataSet> rddDataSetClassification = rddWritables.map(new DataVecDataSetFunction(labelIndex, numLabelClasses, false));
-```
-
-If the dataset were instead for regression, again with 6 total columns (3 feature columns at positions 0, 1 and 2 in each row, and 3 label columns at positions 3, 4 and 5), we could load it using the same process as above, replacing the last 3 lines with:
-
-```
-int firstLabelColumn = 3;   //First column index for label
-int lastLabelColumn = 5;    //Last column index for label
-JavaRDD<DataSet> rddDataSetRegression = rddWritables.map(new DataVecDataSetFunction(firstLabelColumn, lastLabelColumn, true, null, null));
-```
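To show what each row becomes in the two cases above, the following plain-Java sketch converts one comma-separated row into a `{features, labels}` pair. This is not DataVec's implementation. The class name `CsvRowConversions` is hypothetical. The sample rows follow the example CSV format, and the regression row's last value is made up for the demo.

```java
import java.util.Arrays;

// Plain-Java sketch (not DataVec's implementation) of the two row conversions.
// Each method returns {features, labels} for one comma-separated row.
class CsvRowConversions {
    // Classification: column labelIndex holds an integer class index -> one-hot label vector
    static double[][] classification(String line, int labelIndex, int numClasses) {
        String[] cols = line.split(",");
        if (labelIndex < 0 || labelIndex >= cols.length) {
            throw new IllegalArgumentException("labelIndex out of range: " + labelIndex);
        }
        double[] features = new double[cols.length - 1];
        double[] labels = new double[numClasses];
        int f = 0;
        for (int i = 0; i < cols.length; i++) {
            if (i == labelIndex) {
                int classIdx = Integer.parseInt(cols[i].trim());
                if (classIdx < 0 || classIdx >= numClasses) {
                    throw new IllegalArgumentException("class index out of range: " + classIdx);
                }
                labels[classIdx] = 1.0;
            } else {
                features[f++] = Double.parseDouble(cols[i].trim());
            }
        }
        return new double[][] {features, labels};
    }

    // Regression: columns firstLabelColumn..lastLabelColumn (inclusive) are real-valued labels
    static double[][] regression(String line, int firstLabelColumn, int lastLabelColumn) {
        String[] cols = line.split(",");
        if (firstLabelColumn < 0 || lastLabelColumn >= cols.length || firstLabelColumn > lastLabelColumn) {
            throw new IllegalArgumentException("invalid label column range");
        }
        int numLabels = lastLabelColumn - firstLabelColumn + 1;
        double[] features = new double[cols.length - numLabels];
        double[] labels = new double[numLabels];
        int f = 0;
        for (int i = 0; i < cols.length; i++) {
            double v = Double.parseDouble(cols[i].trim());
            if (i >= firstLabelColumn && i <= lastLabelColumn) {
                labels[i - firstLabelColumn] = v;
            } else {
                features[f++] = v;
            }
        }
        return new double[][] {features, labels};
    }

    public static void main(String[] args) {
        double[][] c = classification("1.6,2.4,5.9,0.2,2.2,1", 5, 10);
        System.out.println(Arrays.toString(c[0]) + " -> " + Arrays.toString(c[1]));
        double[][] r = regression("1.0,3.2,4.5,1.1,6.3,0.7", 3, 5);
        System.out.println(Arrays.toString(r[0]) + " -> " + Arrays.toString(r[1]));
    }
}
```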
-
-<br><br>
-
-## <a name="multidataset">How to create an RDD[MultiDataSet] from one or more RDD[List[Writable]]</a>
-
-RecordReaderMultiDataSetIterator (RRMDSI) is the most common way to create MultiDataSet instances for single-machine training data pipelines.
-It is possible to use RRMDSI for Spark data pipelines, where data is coming from one or more of ```RDD<List<Writable>>``` (for 'standard' data) or ```RDD<List<List<Writable>>>``` (for sequence data).
-
-**Case 1: Single ```RDD<List<Writable>>``` to ```RDD<MultiDataSet>```**
-
-Consider the following *single node* (non-Spark) data pipeline for a CSV classification task.
-```
-int numLinesToSkip = 0;     //Number of header lines to skip
-char delimiter = ',';
-RecordReader recordReader = new CSVRecordReader(numLinesToSkip,delimiter);
-recordReader.initialize(new FileSplit(new ClassPathResource("iris.txt").getFile()));
-
-int batchSize = 32;
-int labelColumn = 4;
-int numClasses = 3;
-MultiDataSetIterator iter = new RecordReaderMultiDataSetIterator.Builder(batchSize)
-    .addReader("data", recordReader)
-    .addInput("data", 0, labelColumn-1)
-    .addOutputOneHot("data", labelColumn, numClasses)
-    .build();
-```
-
-The equivalent Spark data pipeline is as follows:
-
-```
-JavaRDD<List<Writable>> rdd = sc.textFile(f.getPath()).map(new StringToWritablesFunction(new CSVRecordReader()));
-
-RecordReaderMultiDataSetIterator rrmdsi = new RecordReaderMultiDataSetIterator.Builder(batchSize)
-    .addReader("data", new SparkSourceDummyReader(0))        //Note the use of the "SparkSourceDummyReader"
-    .addInput("data", 0, labelColumn-1)
-    .addOutputOneHot("data", labelColumn, numClasses)
-    .build();
-JavaRDD<MultiDataSet> mdsRdd = IteratorUtils.mapRRMDSI(rdd, rrmdsi);
-```
-
-For Sequence data (```List<List<Writable>>```) you can use SparkSourceDummySeqReader instead.
-
-**Case 2: Multiple ```RDD<List<Writable>>``` or ```RDD<List<List<Writable>>>``` to ```RDD<MultiDataSet>```**
-
-The process in this case is much the same. Internally, however, a join is used to combine the records from the different RDDs.
-
-```
-JavaRDD<List<Writable>> rdd1 = ...
-JavaRDD<List<Writable>> rdd2 = ...
-
-RecordReaderMultiDataSetIterator rrmdsi = new RecordReaderMultiDataSetIterator.Builder(batchSize)
-    .addReader("rdd1", new SparkSourceDummyReader(0))        //0 = use first rdd in list
-    .addReader("rdd2", new SparkSourceDummyReader(1))        //1 = use second rdd in list
-    .addInput("rdd1", 1, 2)         //Network input: columns 1 to 2 of rdd1
-    .addOutput("rdd2", 1, 2)        //Network output (labels): columns 1 to 2 of rdd2
-    .build();
-
-List<JavaRDD<List<Writable>>> list = Arrays.asList(rdd1, rdd2);
-int[] keyIdxs = new int[]{0,0};        //Column 0 in rdd1 and rdd2 is the 'key' used for joining
-boolean filterMissing = false;         //If true: filter out any records that don't have matching keys in all RDDs
-JavaRDD<MultiDataSet> mdsRdd = IteratorUtils.mapRRMDSI(list, null, keyIdxs, null, filterMissing, rrmdsi);
-```
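The following hypothetical plain-Java sketch illustrates the join semantics described by `keyIdxs` and `filterMissing`. It uses in-memory lists instead of RDDs and is not how DL4J performs the join internally. It assumes that each key appears at most once per source. With `filterMissing = true`, only keys present in both sources are kept. Otherwise the missing side is left as `null`.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of key-based join semantics on in-memory lists (not DL4J internals).
// Assumes each key appears at most once per source (a later duplicate overwrites an earlier one).
class KeyJoin {
    // Joined records for one key; a side is null when that source has no record for the key
    static final class Pair {
        final List<String> left;
        final List<String> right;

        Pair(List<String> left, List<String> right) {
            this.left = left;
            this.right = right;
        }
    }

    static Map<String, Pair> join(List<List<String>> a, int keyA,
                                  List<List<String>> b, int keyB,
                                  boolean filterMissing) {
        Map<String, List<String>> byKeyA = new LinkedHashMap<>();
        for (List<String> record : a) byKeyA.put(record.get(keyA), record);
        Map<String, List<String>> byKeyB = new LinkedHashMap<>();
        for (List<String> record : b) byKeyB.put(record.get(keyB), record);

        // Union of keys, in first-seen order
        Set<String> keys = new LinkedHashSet<>(byKeyA.keySet());
        keys.addAll(byKeyB.keySet());

        Map<String, Pair> joined = new LinkedHashMap<>();
        for (String key : keys) {
            List<String> left = byKeyA.get(key);
            List<String> right = byKeyB.get(key);
            if (filterMissing && (left == null || right == null)) {
                continue; // drop keys missing from either source
            }
            joined.put(key, new Pair(left, right));
        }
        return joined;
    }

    public static void main(String[] args) {
        List<List<String>> features = Arrays.asList(
                Arrays.asList("id1", "0.1", "0.2"),
                Arrays.asList("id2", "0.3", "0.4"));
        List<List<String>> labels = Arrays.asList(
                Arrays.asList("id1", "1.0", "0.0"));
        System.out.println(join(features, 0, labels, 0, true).keySet());  // [id1]
        System.out.println(join(features, 0, labels, 0, false).keySet()); // [id1, id2]
    }
}
```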
-
-<br><br>
-
-## <a name="saveloadrdd">How to save an RDD[DataSet] or RDD[MultiDataSet] to network storage and use it for training</a>
-
-As noted at the start of this page, it is considered a best practice to preprocess and export your data once (i.e., save to network storage such as HDFS and reuse), rather than fitting from an ```RDD<DataSet>``` or ```RDD<MultiDataSet>``` directly in each training job.
-
-There are a number of reasons for this:
-* Better performance (avoid redundant loading/calculation): When fitting multiple models from the same dataset, it is faster to preprocess this data once and save to disk rather than preprocessing it again for every single training run.
-* Minimizing memory and other resources: By exporting and fitting from disk, we only need to keep the DataSets we are currently using (plus a small async prefetch buffer) in memory, rather than also keeping many unused DataSet objects in memory. Exporting results in lower total memory use and hence we can use larger networks, larger minibatch sizes, or allocate fewer resources to our job.
-* Avoiding recomputation: When an RDD is too large to fit into memory, some parts of it may need to be recomputed before it can be used (depending on the cache settings). When this occurs, Spark will recompute parts of the data pipeline multiple times, costing us both time and memory. A pre-export step avoids this recomputation entirely.
-
-**Step 1: Saving**
-
-Saving the DataSet objects once you have an ```RDD<DataSet>``` is quite straightforward:
-```
-JavaRDD<DataSet> rddDataSet = ...
-int minibatchSize = 32;     //Minibatch size of the saved DataSet objects
-String exportPath = "hdfs:///path/to/export/data";
-JavaRDD<String> paths = rddDataSet.mapPartitionsWithIndex(new BatchAndExportDataSetsFunction(minibatchSize, exportPath), true);
-```
-Keep in mind that this is a map function, so no data will be saved until the paths RDD is executed - i.e., you should follow this with an operation such as:
-```
-paths.saveAsTextFile("hdfs:///path/to/text/file.txt");  //Specified file will contain paths/URIs of all saved DataSet objects
-```
-or
-```
-List<String> pathsList = paths.collect();    //Collection of paths/URIs of all saved DataSet objects
-```
-or
-```
-paths.foreach(new VoidFunction<String>() {
-    @Override
-    public void call(String path) {
-        //Some operation on each path
-    }
-});
-```
-
-
-Saving an ```RDD<MultiDataSet>``` can be done in the same way using ```BatchAndExportMultiDataSetsFunction``` instead, which takes the same arguments.
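The batching part of the export step above can be sketched conceptually in plain Java. Examples within a partition are grouped into minibatches of `minibatchSize`, and each minibatch is then saved as one file. This is not the actual `BatchAndExportDataSetsFunction` code. This sketch simply keeps a smaller final batch, and the real function's handling of leftover examples may differ. `Minibatcher` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of per-partition batching before export (not DL4J's implementation).
// The last minibatch may be smaller than minibatchSize in this sketch.
class Minibatcher {
    static <T> List<List<T>> batch(List<T> examples, int minibatchSize) {
        if (minibatchSize <= 0) {
            throw new IllegalArgumentException("minibatchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < examples.size(); start += minibatchSize) {
            int end = Math.min(start + minibatchSize, examples.size());
            batches.add(new ArrayList<>(examples.subList(start, end))); // copy, independent of the input list
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> examples = new ArrayList<>();
        for (int i = 0; i < 70; i++) examples.add(i);
        for (List<Integer> b : batch(examples, 32)) {
            System.out.println("minibatch of " + b.size() + " examples"); // sizes 32, 32 and 6
        }
    }
}
```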
-
-**Step 2: Loading and Fitting**
-
-The exported data can be used in a few ways.
-First, it can be used to fit a network directly:
-```
-String exportPath = "hdfs:///path/to/export/data";
-SparkDl4jMultiLayer net = ...
-net.fit(exportPath);      //Loads the serialized DataSet objects found in the 'exportPath' directory
-```
-Similarly, we can use ```SparkComputationGraph.fitMultiDataSet(String path)``` if we saved an ```RDD<MultiDataSet>``` instead.
-
-
-Alternatively, we can load the paths in a few different ways, depending on whether and how we saved them:
-
-```
-JavaSparkContext sc = new JavaSparkContext();
-
-//If we used saveAsTextFile:
-String saveTo = "hdfs:///path/to/text/file.txt";
-paths.saveAsTextFile(saveTo);                         //Save
-JavaRDD<String> loadedPaths = sc.textFile(saveTo);    //Load
-
-//If we used collecting:
-List<String> pathsList = paths.collect();                 //Collect
-JavaRDD<String> loadedPaths = sc.parallelize(pathsList);  //Parallelize
-
-//If we want to list the directory contents:
-String exportPath = "hdfs:///path/to/export/data";
-JavaRDD<String> loadedPaths = SparkUtils.listPaths(sc, exportPath);   //List paths using org.deeplearning4j.spark.util.SparkUtils
-```
-
-Then we can execute training on these paths by using methods such as ```SparkDl4jMultiLayer.fitPaths(JavaRDD<String>)```.
-
-
-<br><br>
-
-## <a name="singletocluster">How to prepare data on a single machine for use on a cluster: saving DataSets</a>
-
-Another possible workflow is to start with the data pipeline on a single machine, and export the DataSet or MultiDataSet objects for use on the cluster.
-This workflow clearly isn't as scalable as preparing data on a cluster (you are using just one machine to prepare data) but it can be an easy option in some cases, especially when you have an existing data pipeline.
-
-This section assumes you have an existing ```DataSetIterator``` or ```MultiDataSetIterator``` used for single-machine training. There are many different ways to create one, which is outside of the scope of this guide.
-
-**Step 1: Save the DataSets or MultiDataSets**
-
-Saving the contents of a DataSetIterator to a local directory can be done using the following code:
-```
-DataSetIterator iter = ...
-File rootDir = new File("/saving/directory/");
-int count = 0;
-while(iter.hasNext()){
-  DataSet ds = iter.next();
-  File outFile = new File(rootDir, "dataset_" + (count++) + ".bin");
-  ds.save(outFile);
-}
-```
-Note that for the purposes of Spark, the exact file names don't matter.
-The process for saving MultiDataSets is almost identical.
-
-As an aside: you can read these saved DataSet objects on a single machine (for non-Spark training) using [FileDataSetIterator](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-utility-iterators/src/main/java/org/deeplearning4j/datasets/iterator/file/FileDataSetIterator.java).
-
-An alternative approach is to save directly to the cluster using output streams, to (for example) HDFS. This can only be done if the machine running the code is properly configured with the required libraries and access rights. For example, to save the DataSets directly to HDFS you could use:
-
-```
-JavaSparkContext sc = new JavaSparkContext();
-FileSystem fileSystem = FileSystem.get(sc.hadoopConfiguration());
-String outputDir = "hdfs:///my/output/location/";
-
-DataSetIterator iter = ...
-int count = 0;
-while(iter.hasNext()){
-  DataSet ds = iter.next();
-  String filePath = outputDir + "dataset_" + (count++) + ".bin";
-  try (OutputStream os = new BufferedOutputStream(fileSystem.create(new Path(filePath)))) {
-    ds.save(os);
-  }
-}
-```
-
-
-**Step 2: Load and Train on a Cluster**
-
-The saved DataSet objects can then be copied to the cluster or network file storage (for example, using Hadoop FS utilities on a Hadoop cluster), and used as follows:
-```
-String dir = "hdfs:///data/copied/here";
-SparkDl4jMultiLayer net = ...
-net.fit(dir);      //Loads the serialized DataSet objects found in the 'dir' directory
-```
-or alternatively/equivalently, we can list the paths as an RDD using:
-```
-String dir = "hdfs:///data/copied/here";
-JavaRDD<String> paths = SparkUtils.listPaths(sc, dir);   //List paths using org.deeplearning4j.spark.util.SparkUtils
-```
-
-<br><br>
-
-## <a name="singletocluster2">How to prepare data on a single machine for use on a cluster: map/sequence files</a>
-
-An alternative approach is to use Hadoop MapFile and SequenceFiles, which are efficient binary storage formats.
-This can be used to convert the output of any DataVec ```RecordReader``` or ```SequenceRecordReader``` (including a custom record reader) to a format usable on Spark.
-MapFileRecordWriter and MapFileSequenceRecordWriter require the following dependencies:
-```
-<dependency>
-    <groupId>org.datavec</groupId>
-    <artifactId>datavec-hadoop</artifactId>
-    <version>${datavec.version}</version>
-</dependency>
-<dependency>
-    <groupId>org.apache.hadoop</groupId>
-    <artifactId>hadoop-common</artifactId>
-    <version>${hadoop.version}</version>
-    <!-- Optional exclusion for log4j in case you are using other logging frameworks -->
-    <!--
-    <exclusions>
-        <exclusion>
-            <groupId>log4j</groupId>
-            <artifactId>log4j</artifactId>
-        </exclusion>
-        <exclusion>
-            <groupId>org.slf4j</groupId>
-            <artifactId>slf4j-log4j12</artifactId>
-        </exclusion>
-    </exclusions>
-    -->
-</dependency>
-```
-
-**Step 1: Create a MapFile Locally**
-
-In the following example, a CSVRecordReader will be used, but any other RecordReader could be used in its place:
-```
-File csvFile = new File("/path/to/file.csv");
-RecordReader recordReader = new CSVRecordReader();
-recordReader.initialize(new FileSplit(csvFile));
-
-//Create map file writer
-String outPath = "/map/file/root/dir";
-MapFileRecordWriter writer = new MapFileRecordWriter(new File(outPath));
-
-//Convert to MapFile binary format:
-RecordReaderConverter.convert(recordReader, writer);
-```
-
-The process for using a ```SequenceRecordReader``` combined with a ```MapFileSequenceRecordWriter``` is virtually the same.
-
-Note also that ```MapFileRecordWriter``` and ```MapFileSequenceRecordWriter``` both support splitting - i.e., creating multiple smaller map files instead of creating one single (potentially multi-GB) map file. Using splitting is recommended when saving data in this manner for use with Spark.
-
-**Step 2: Copy to HDFS or other network file storage**
-
-The exact process is beyond the scope of this guide. However, it should be sufficient to simply copy the directory ("/map/file/root/dir" in the example above) to a location on HDFS.
-
-**Step 3: Read and Convert to ```RDD<DataSet>``` for Training**
-
-We can load the data for training using the following:
-```
-JavaSparkContext sc = new JavaSparkContext();
-String pathOnHDFS = "hdfs:///map/file/directory";
-JavaRDD<List<Writable>> rdd = SparkStorageUtils.restoreMapFile(pathOnHDFS, sc).values();     //restoreMapFile returns (index, record) pairs; keep only the records. Import: org.datavec.spark.storage.SparkStorageUtils
-
-//Note at this point: it's the same as the latter part of the CSV how-to guide
-int labelIndex = 5;         //Labels: a single integer representing the class index in column number 5
-int numLabelClasses = 10;   //10 classes for the label
-JavaRDD<DataSet> rddDataSetClassification = rdd.map(new DataVecDataSetFunction(labelIndex, numLabelClasses, false));
-```
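-
-To make the roles of `labelIndex` and `numLabelClasses` concrete, the following plain-Java sketch shows the kind of conversion performed for classification: the label column is removed from the feature vector and expanded into a one-hot vector. It assumes a single integer label column, and `OneHotRecord` is a hypothetical helper, not DataVecDataSetFunction's implementation.
-
-```java
-final class OneHotRecord {
-    final double[] features;
-    final double[] labels;
-
-    OneHotRecord(double[] features, double[] labels) {
-        this.features = features;
-        this.labels = labels;
-    }
-
-    // Splits one record into features (all columns except labelIndex) and a one-hot label vector.
-    static OneHotRecord fromRecord(double[] record, int labelIndex, int numClasses) {
-        int classIdx = (int) record[labelIndex];
-        if (classIdx < 0 || classIdx >= numClasses) {
-            throw new IllegalArgumentException("Class index out of range: " + classIdx);
-        }
-        double[] features = new double[record.length - 1];
-        for (int i = 0, j = 0; i < record.length; i++) {
-            if (i != labelIndex) features[j++] = record[i];   // copy every non-label column
-        }
-        double[] labels = new double[numClasses];
-        labels[classIdx] = 1.0;                               // one-hot encoding of the class
-        return new OneHotRecord(features, labels);
-    }
-}
-```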
-
-<br><br>
-
-## <a name="csvseq">How to load multiple CSVs (one sequence per file) for RNN data pipelines</a>
-
-This guide shows how to load CSV files for training an RNN.
-The assumption is that the dataset consists of multiple CSV files, where:
-
-* each CSV file represents one sequence
-* each row/line of the CSV contains the values for one time step (one or more columns/values, same number of values in all rows for all files) 
-* each CSV may contain a different number of lines than other CSVs (i.e., variable length sequences are OK here)
-* header lines either aren't present in any files, or are present in all files
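-
-These assumptions can be illustrated with a small plain-Java sketch that turns one CSV file's lines into a sequence (one list of values per time step), skipping a fixed number of header lines and rejecting rows with an inconsistent number of columns. It only mirrors the role of `CSVSequenceRecordReader` conceptually; `CsvSequenceParser` is a hypothetical name.
-
-```java
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.List;
-import java.util.regex.Pattern;
-
-final class CsvSequenceParser {
-    // Parses the lines of one CSV file (one sequence); each non-blank row becomes one time step.
-    static List<List<String>> parse(List<String> lines, int numHeaderLines, String delimiter) {
-        List<List<String>> sequence = new ArrayList<>();
-        int expectedColumns = -1;
-        for (int i = numHeaderLines; i < lines.size(); i++) {
-            String line = lines.get(i);
-            if (line.isEmpty()) continue;                       // ignore blank lines
-            List<String> step = Arrays.asList(line.split(Pattern.quote(delimiter), -1));
-            if (expectedColumns < 0) {
-                expectedColumns = step.size();                  // first data row fixes the width
-            } else if (step.size() != expectedColumns) {
-                throw new IllegalArgumentException("Row " + i + " has " + step.size()
-                        + " columns, expected " + expectedColumns);
-            }
-            sequence.add(step);
-        }
-        return sequence;
-    }
-}
-```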
-
-A data pipeline can be created using the following process:
-```
-String directoryWithCsvFiles = "hdfs:///path/to/directory";
-JavaPairRDD<String, PortableDataStream> origData = sc.binaryFiles(directoryWithCsvFiles);
-
-int numHeaderLinesEachFile = 0; //No header lines
-String delimiter = ",";         //Comma delimited files
-SequenceRecordReader seqRR = new CSVSequenceRecordReader(numHeaderLinesEachFile, delimiter);
-
-JavaRDD<List<List<Writable>>> sequencesRdd = origData.map(new SequenceRecordReaderFunction(seqRR));
-
-//Similar to the non-sequence CSV guide using DataVecDataSetFunction. Assuming classification here:
-int labelIndex = 5;             //Index of the label column. Occurs at position/column 5
-int numClasses = 10;            //Number of classes for classification
-JavaRDD<DataSet> dataSetRdd = sequencesRdd.map(new DataVecSequenceDataSetFunction(labelIndex, numClasses, false));
-```
-
-<br><br>
-
-## <a name="images">How to create a Spark data pipeline for training on images</a>
-
-This guide shows how to create an ```RDD<DataSet>``` for image classification, starting from images stored either locally, or on a network file system such as HDFS.
-
-The approach used here (added in 1.0.0-beta3) is to first preprocess the images into batches of files: [FileBatch](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-common/src/main/java/org/nd4j/api/loader/FileBatch.java) objects.
-The motivation for this approach is simple: the original image files typically use efficient compression (JPEG, for example), which is much more space- and network-efficient than a bitmap (int8 or 32-bit floating point) representation. At the same time, on a cluster we want to minimize disk reads due to the latency of remote storage: one file read/transfer is going to be faster than ```minibatchSize``` remote file reads.
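-
-The batching idea can be sketched in plain Java. This hypothetical helper only partitions a list of file paths into groups of `batchSize` (the last group may be smaller), and shows why, for example, 10,000 images with `batchSize = 32` need 313 remote reads instead of 10,000. It is not the FileBatch implementation, which also stores the raw file bytes.
-
-```java
-import java.util.ArrayList;
-import java.util.List;
-
-final class FileGrouping {
-    // Splits paths into consecutive groups of at most batchSize elements.
-    static <T> List<List<T>> partition(List<T> paths, int batchSize) {
-        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
-        List<List<T>> groups = new ArrayList<>();
-        for (int start = 0; start < paths.size(); start += batchSize) {
-            int end = Math.min(start + batchSize, paths.size());
-            groups.add(new ArrayList<>(paths.subList(start, end)));
-        }
-        return groups;
-    }
-
-    // Number of batch files (and hence remote reads) needed for numFiles inputs.
-    static int numBatches(int numFiles, int batchSize) {
-        return (numFiles + batchSize - 1) / batchSize;   // ceiling division
-    }
-}
-```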
-
-The [TinyImageNet example](https://github.com/eclipse/deeplearning4j-examples/tree/master/dl4j-spark-examples/dl4j-spark/src/main/java/org/deeplearning4j/tinyimagenet) also shows how this can be done.
-
-Note that one limitation of the implementation is that the set of classes (i.e., the class/category labels when doing classification) needs to be known, provided or collected manually. This differs from using ImageRecordReader for classification on a single machine, which can automatically infer the set of class labels.
-
-First, assume the images are in subdirectories based on their class labels. For example, if there are two classes, "cat" and "dog", the directory structure would look like:
-```
-rootDir/cat/img0.jpg
-rootDir/cat/img1.jpg
-...
-rootDir/dog/img0.jpg
-rootDir/dog/img1.jpg
-...
-```
-(Note the file names don't matter in this example - however, the parent directory names are the class labels)
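-
-Because the set of classes must be known up front, it can help to derive it from the directory layout before training. The sketch below (hypothetical helper names) takes the label of each image from its parent directory name, the convention ParentPathLabelGenerator follows, and collects the sorted set of labels that could then be passed to `setLabels`.
-
-```java
-import java.nio.file.Path;
-import java.nio.file.Paths;
-import java.util.List;
-import java.util.SortedSet;
-import java.util.TreeSet;
-
-final class ParentDirLabels {
-    // Label = name of the directory that directly contains the file.
-    static String labelOf(String filePath) {
-        Path parent = Paths.get(filePath).getParent();
-        if (parent == null || parent.getFileName() == null) {
-            throw new IllegalArgumentException("No parent directory: " + filePath);
-        }
-        return parent.getFileName().toString();
-    }
-
-    // Sorted, de-duplicated set of labels across all files.
-    static SortedSet<String> collectLabels(List<String> filePaths) {
-        SortedSet<String> labels = new TreeSet<>();
-        for (String p : filePaths) labels.add(labelOf(p));
-        return labels;
-    }
-}
-```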
-
-**Step 1 (option 1 of 2): Preprocess Locally**
-
-Local preprocessing can be done as follows:
-```
-String sourceDirectory = "/home/user/my_images";            //Where your data is located
-String destinationDirectory = "/home/user/preprocessed";    //Where the preprocessed data should be written
-int batchSize = 32;                                         //Number of examples (images) in each FileBatch object
-SparkDataUtils.createFileBatchesLocal(new File(sourceDirectory), NativeImageLoader.ALLOWED_FORMATS, true, new File(destinationDirectory), batchSize);
-```
-
-The full import for SparkDataUtils is ```org.deeplearning4j.spark.util.SparkDataUtils```.
-
-After preprocessing has been completed, the directory can be copied to the cluster for use in training (Step 2).
-
-**Step 1 (option 2 of 2): Preprocess using Spark**
-
-Alternatively, if the original images are on remote file storage (such as HDFS), we can use the following:
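-```
-String sourceDirectory = "hdfs:///data/my_images";          //Where your data is located
-String destinationDirectory = "hdfs:///data/preprocessed";  //Where the preprocessed data should be written
-int batchSize = 32;                                         //Number of examples (images) in each FileBatch object
-//List the image files (recursively, image formats only), then write them out as FileBatch objects
-JavaRDD<String> filePaths = SparkUtils.listPaths(sparkContext, sourceDirectory, true, NativeImageLoader.ALLOWED_FORMATS);
-SparkDataUtils.createFileBatchesSpark(filePaths, destinationDirectory, batchSize, sparkContext);
-```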
-
-**Step 2: Training**
-
-The data pipeline for image classification can be constructed as follows. This code is taken from the [TinyImageNet example](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-spark-examples/dl4j-spark/src/main/java/org/deeplearning4j/tinyimagenet/TrainSpark.java):
-```
-//Create data loader
-int imageHeightWidth = 64;      //64x64 pixel input to network
-int imageChannels = 3;          //RGB
-PathLabelGenerator labelMaker = new ParentPathLabelGenerator();
-ImageRecordReader rr = new ImageRecordReader(imageHeightWidth, imageHeightWidth, imageChannels, labelMaker);
-rr.setLabels(Arrays.asList("cat", "dog"));
-int numClasses = 2;
-RecordReaderFileBatchLoader loader = new RecordReaderFileBatchLoader(rr, minibatch, 1, numClasses);
-loader.setPreProcessor(new ImagePreProcessingScaler());   //Scale 0-255 valued pixels to 0-1 range
-
-
-//Fit the network
-String trainDataPath = "hdfs:///data/preprocessed";         //Where the preprocessed data is located
-JavaRDD<String> pathsTrain = SparkUtils.listPaths(sc, trainDataPath);
-for (int i = 0; i < numEpochs; i++) {
-    sparkNet.fitPaths(pathsTrain, loader);
-}
-```
-
-And that's it.
-
-Note: for other label generation cases (such as labels taken from the file name instead of the parent directory), or for tasks such as semantic segmentation, you can use a different PathLabelGenerator instead of ParentPathLabelGenerator. For example, if the label should come from the file name, you can use ```PatternPathLabelGenerator``` instead.
-Suppose images are named like "cat_img1234.jpg", "dog_2309.png", etc. We can use the following process:
-```
-PathLabelGenerator labelGenerator = new PatternPathLabelGenerator("_", 0);  //Split on the "_" character, and take the first value
-ImageRecordReader imageRecordReader = new ImageRecordReader(imageHeightWidth, imageHeightWidth, imageChannels, labelGenerator);
-```
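-
-The file-name convention can be illustrated with a short plain-Java sketch, assuming the label is the part of the file name before the first `_`. `FilenameLabels` is a hypothetical helper that only approximates the split-and-take-position idea; the exact matching rules of PatternPathLabelGenerator may differ.
-
-```java
-import java.nio.file.Path;
-import java.nio.file.Paths;
-import java.util.regex.Pattern;
-
-final class FilenameLabels {
-    // Splits the file name (not the full path) on the delimiter and returns the token at position.
-    static String labelOf(String filePath, String delimiter, int position) {
-        Path name = Paths.get(filePath).getFileName();
-        if (name == null) throw new IllegalArgumentException("No file name: " + filePath);
-        String[] parts = name.toString().split(Pattern.quote(delimiter));
-        if (position < 0 || position >= parts.length) {
-            throw new IllegalArgumentException("No token " + position + " in " + name);
-        }
-        return parts[position];
-    }
-}
-```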
-
-Note that PathLabelGenerator returns a Writable object, so for tasks like image segmentation, you can return an INDArray using the NDArrayWritable class in a custom PathLabelGenerator.
-
-<br><br>
-
-## <a name="customformat">How to load prepared minibatches in custom format</a>
-
-DL4J Spark training supports the ability to load data serialized in a custom format. The assumption is that each file on the remote/network storage represents a single minibatch of data in some readable format.
-
-Note that this approach is typically not required or recommended for most users, but is provided as an additional option for advanced users or those with pre-prepared data in a custom format or a format that is not natively supported by DL4J.
-When files represent a single record/example (instead of a minibatch) in a custom format, a custom RecordReader could be used instead.
-
-The interfaces of note are:
-
-* [DataSetLoader](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-core/src/main/java/org/deeplearning4j/api/loader/DataSetLoader.java)
-* [MultiDataSetLoader](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-core/src/main/java/org/deeplearning4j/api/loader/MultiDataSetLoader.java)
-
-Both extend the single-method [Loader](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-common/src/main/java/org/nd4j/api/loader/Loader.java) interface.
-
-Suppose an HDFS directory contains a number of files, each being a minibatch in some custom format.
-These can be loaded using the following process:
-```
-JavaSparkContext sc = new JavaSparkContext();
-String dataDirectory = "hdfs:///path/with/data";
-JavaRDD<String> loadedPaths = SparkUtils.listPaths(sc, dataDirectory);   //List paths using org.deeplearning4j.spark.util.SparkUtils
-
-SparkDl4jMultiLayer net = ...
-DataSetLoader myCustomLoader = new MyCustomLoader();
-net.fitPaths(loadedPaths, myCustomLoader);
-```
-
-Where the custom loader class looks something like:
-```
-public class MyCustomLoader implements DataSetLoader {
-    @Override
-    public DataSet load(Source source) throws IOException {
-        InputStream inputStream = source.getInputStream();
-        //...load custom data format from inputStream here...
-        INDArray features = ...;
-        INDArray labels = ...;
-        return new DataSet(features, labels);
-    }
-}
-```
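-
-As a purely hypothetical example of a custom format, suppose each file starts with two `int`s (number of examples and number of features), followed by the feature values as `float`s in row-major order. The parsing step inside `load(...)` could then look like the plain-Java sketch below; converting the resulting `float[][]` into an INDArray (for example with `Nd4j.create`) is left out so the sketch runs without ND4J.
-
-```java
-import java.io.DataInputStream;
-import java.io.IOException;
-import java.io.InputStream;
-
-final class CustomMinibatchFormat {
-    // Reads: int numExamples, int numFeatures, then numExamples*numFeatures floats (big-endian).
-    // The stream is not closed here; the caller that opened it owns it.
-    static float[][] readFeatures(InputStream in) throws IOException {
-        DataInputStream dis = new DataInputStream(in);
-        int numExamples = dis.readInt();
-        int numFeatures = dis.readInt();
-        if (numExamples < 0 || numFeatures < 0) {
-            throw new IOException("Corrupt header: " + numExamples + "x" + numFeatures);
-        }
-        float[][] features = new float[numExamples][numFeatures];
-        for (int i = 0; i < numExamples; i++) {
-            for (int j = 0; j < numFeatures; j++) {
-                features[i][j] = dis.readFloat();
-            }
-        }
-        return features;
-    }
-}
-```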
--- a/docs/deeplearning4j-scaleout/templates/howto.md
+++ b/docs/deeplearning4j-scaleout/templates/howto.md
@ -1,721 +0,0 @@
---
-title: "Deeplearning4j on Spark: How To Guides"
-short_title: How To Guide
-description: "Deeplearning4j on Spark: How To Guides"
-category: Distributed Deep Learning
-weight: 2
---
-
-# Deeplearning4j on Spark: How To Guides
-
-This page contains a number of how-to guides for common distributed training tasks.
-For guides on building data pipelines, see [here](deeplearning4j-scaleout-data-howto).
-
-Before going through these guides, make sure you have read the introduction guide for deeplearning4j Spark training [here](deeplearning4j-scaleout-intro).
-
-Before Training Guides
-* [How to build an uber-JAR for training via Spark submit using Maven](#uberjar)
-* [How to use GPUs for training on Spark](#gpus)
-* [How to use CPUs on master, GPUs on the workers](#cpusgpus)
-* [How to configure memory settings for Spark](#memory)
-* [How to Configure Garbage Collection for Workers](#gc)
-* [How to use Kryo Serialization with DL4J and ND4J](#kryo)
-* [How to use YARN and GPUs](#yarngpus)
-* [How to configure Spark Locality Configuration](#locality)
-
-During and After Training Guides
-* [How to configure encoding thresholds](#threshold)
-* [How to perform distributed test set evaluation](#evaluation)
-* [How to save (and load) neural networks trained on Spark](#saveload)
-* [How to perform distributed inference](#inference)
-
-Problems and Troubleshooting Guides
-* [How to debug common Spark dependency problems (NoClassDefFoundError and similar)](#dependencyproblems)
-* [How to fix "Error querying NTP server" errors](#ntperror)
-* [How to Cache RDD[INDArray] and RDD[DataSet] Safely](#caching)
-* [Fixing libgomp issues on Amazon Elastic MapReduce](#libgomp)
-* [Failed training on Ubuntu 16.04 (Ubuntu bug that may affect DL4J Spark users)](#ubuntu16)
-
-<br><br>
-
-# Before Training - How-To Guides
-
-## <a name="uberjar">How to build an uber-JAR for training via Spark submit using Maven</a>
-
-When submitting a training job to a cluster, a typical workflow is to build an "uber-jar" that is submitted to Spark submit. An uber-jar is a single JAR file containing all of the dependencies (libraries, class files, etc.) required to run a job.
-Spark submit (```spark-submit```) is a script included with Spark distributions; users pass their job (packaged as a JAR file) to it in order to begin execution of their Spark job.
-
-This guide assumes you already have code set up to train a network on Spark.
-
-**Step 1: Decide on the required dependencies.**
-
-The required dependencies overlap substantially with those for single-machine training with DL4J and ND4J. For example, for both single machine and Spark training you should include the standard set of deeplearning4j dependencies, such as:
-* deeplearning4j-core
-* deeplearning4j-spark
-* nd4j-native-platform (for CPU-only training)
-
-In addition, you will need to include Deeplearning4j's Spark module, ```dl4j-spark_2.10``` or ```dl4j-spark_2.11```. This module is required for both development and execution of Deeplearning4j Spark jobs.
-Be careful to use the module version that matches your cluster, for both the Spark version (Spark 1 vs. Spark 2) and the Scala version (2.10 vs. 2.11). If these are mismatched, your job will likely fail at runtime.
-
-Dependency example: Spark 2, Scala 2.11:
-```
-<dependency>
-  <groupId>org.deeplearning4j</groupId>
-  <artifactId>dl4j-spark_2.11</artifactId>
-  <version>1.0.0-beta2_spark_2</version>
-</dependency>
-```
-
-Dependency example: Spark 1, Scala 2.10:
-```
-<dependency>
-  <groupId>org.deeplearning4j</groupId>
-  <artifactId>dl4j-spark_2.10</artifactId>
-  <version>1.0.0-beta2_spark_1</version>
-</dependency>
-```
-
-Note that if you add a Spark dependency such as spark-core_2.11, this can be set to ```provided``` scope in your pom.xml (see [Maven docs](https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#Dependency_Scope) for more details), as Spark submit will add Spark to the classpath. Adding this dependency is not required for execution on a cluster, but may be needed if you want to test or debug a Spark-based job on your local machine.
-
-
-When training on CUDA GPUs, there are a couple of possible cases when adding CUDA dependencies:
-
-**Case 1: Cluster nodes have CUDA toolkit installed on the master and worker nodes**
-
-When the CUDA toolkit and CuDNN are available on the cluster nodes, we can use a smaller dependency:
-* If the OS building the uber-jar is the same OS as the cluster: include nd4j-cuda-x.x
-* If the OS building the uber-jar is different from the cluster OS (e.g., build on Windows, execute Spark on a Linux cluster): include nd4j-cuda-x.x-platform
-* In both cases, include deeplearning4j-cuda-x.x for cuDNN support
-
-where x.x is the CUDA version - for example, x.x=9.2 for CUDA 9.2.
-
-**Case 2: Cluster nodes do NOT have the CUDA toolkit installed on the master and worker nodes**
-
-When CUDA/CuDNN are NOT installed on the cluster nodes, we can do the following:
-* First, include the dependencies as per 'Case 1' above
-* Then include the "redist" javacpp-presets for the cluster operating system, as described here: [DL4J CuDNN Docs](./deeplearning4j-config-cudnn)
-
-
-**Step 2: Configure your pom.xml file for building an uber-jar**
-
-When using Spark submit, you will need an uber-jar to start and run your job. After configuring the relevant dependencies in step 1, we need to configure the pom.xml file to properly build the uber-jar.
-
-We recommend that you use the maven shade plugin for building an uber-jar. There are alternative tools/plugins for this purpose, but these do not always include all relevant files from the source jars, such as those required for Java's ServiceLoader mechanism to function correctly. (The ServiceLoader mechanism is used by ND4J and a lot of other software libraries).
-
-A Maven shade configuration suitable for this purpose is provided in the example standalone sample project [pom.xml file](https://github.com/eclipse/deeplearning4j-examples/blob/master/standalone-sample-project/pom.xml):
-```
-<build>
-    <plugins>
-        <!-- Other plugins here if required -->
-
-        <!-- Configure maven shade to produce an uber-jar when running "mvn package" -->
-        <plugin>
-            <groupId>org.apache.maven.plugins</groupId>
-            <artifactId>maven-shade-plugin</artifactId>
-            <version>${maven-shade-plugin.version}</version>
-            <configuration>
-                <shadedArtifactAttached>true</shadedArtifactAttached>
-                <shadedClassifierName>bin</shadedClassifierName>
-                <createDependencyReducedPom>true</createDependencyReducedPom>
-                <filters>
-                    <filter>
-                        <artifact>*:*</artifact>
-                        <excludes>
-                            <exclude>org/datanucleus/**</exclude>
-                            <exclude>META-INF/*.SF</exclude>
-                            <exclude>META-INF/*.DSA</exclude>
-                            <exclude>META-INF/*.RSA</exclude>
-                        </excludes>
-                    </filter>
-                </filters>
-            </configuration>
-
-            <executions>
-                <execution>
-                    <phase>package</phase>
-                    <goals>
-                        <goal>shade</goal>
-                    </goals>
-                    <configuration>
-                        <transformers>
-                            <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
-                                <resource>reference.conf</resource>
-                            </transformer>
-                            <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
-                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
-                            </transformer>
-                        </transformers>
-                    </configuration>
-                </execution>
-            </executions>
-        </plugin>
-    </plugins>
-</build>
-```
-
-
-**Step 3: Build the uber jar**
-
-Finally, open up a command line window (bash on Linux, cmd on Windows, etc.) and run ```mvn package -DskipTests``` to build the uber-jar for your project.
-Note that the uber-jar should be present under ```<project_root>/target/<artifactId>-<version>-bin.jar```.
-Be sure to use the large ```...-bin.jar``` file as this is the shaded jar with all of the dependencies.
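-
-Since ND4J relies on Java's ServiceLoader mechanism (see Step 2), it can be worth checking that the shaded jar actually contains the merged ```META-INF/services``` files before submitting it. The following plain-Java sketch (a hypothetical helper using only ```java.util.jar```) lists those files; ```jar tf``` on the command line gives the same information.
-
-```java
-import java.io.File;
-import java.io.IOException;
-import java.util.ArrayList;
-import java.util.Enumeration;
-import java.util.List;
-import java.util.jar.JarEntry;
-import java.util.jar.JarFile;
-
-final class JarServiceCheck {
-    // Returns the names of all META-INF/services/* files contained in the given jar.
-    static List<String> serviceFiles(File jar) throws IOException {
-        List<String> result = new ArrayList<>();
-        try (JarFile jf = new JarFile(jar)) {
-            Enumeration<JarEntry> entries = jf.entries();
-            while (entries.hasMoreElements()) {
-                JarEntry e = entries.nextElement();
-                if (!e.isDirectory() && e.getName().startsWith("META-INF/services/")) {
-                    result.add(e.getName());
-                }
-            }
-        }
-        return result;
-    }
-}
-```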
-
-That's it - you should now have an uber-jar that is suitable for submitting to spark-submit for training networks on Spark with CPUs or NVIDIA (CUDA) GPUs.
-
-
-<br><br>
-
-## <a name="gpus">How to use GPUs for training on Spark</a>
-
-Deeplearning4j and ND4J support GPU acceleration using NVIDIA GPUs. DL4J Spark training can also be performed using GPUs.
-
-DL4J and ND4J are designed in such a way that the code (neural network configuration, data pipeline code) is "backend independent". That is, you can write the code once, and execute it on either a CPU or GPU, simply by including the appropriate backend (nd4j-native backend for CPUs, or nd4j-cuda-x.x for GPUs). Executing on Spark is no different from executing on a single node in this respect: you need to simply include the appropriate ND4J backend, and make sure your machines (master/worker nodes in this case) are appropriately set up with the CUDA libraries (see the [uber-jar guide](#uberjar) for running on CUDA without needing to install CUDA/cuDNN on each node).
-
-When running on GPUs, there are a few components:
-
-* (a) The ND4J CUDA backend (nd4j-cuda-x.x dependency)
-* (b) The CUDA toolkit
-* (c) The Deeplearning4j CUDA dependency to gain cuDNN support (deeplearning4j-cuda-x.x)
-* (d) The cuDNN library files
-
-Both (a) and (b) must be available for ND4J/DL4J to run on a CUDA GPU.
-(c) and (d) are optional, though recommended for optimal performance - NVIDIA's cuDNN library can significantly speed up training for many layers, such as convolutional layers (ConvolutionLayer, SubsamplingLayer, BatchNormalization, etc.) and LSTM RNN layers.
-
-For configuring dependencies for Spark jobs, see the [uber-jar section](#uberjar) above.
-For configuring cuDNN on a single node, see [Using Deeplearning4j with CuDNN](./deeplearning4j-config-cudnn).
-
-<br><br>
-
-## <a name="cpusgpus">How to use CPUs on master, GPUs on the workers</a>
-
-In some cases, it may make sense to run the master using CPUs only, and the workers using GPUs.
-If resources (i.e., the number of available GPU machines) are not constrained, it may simply be easier to have a homogeneous cluster: i.e., set up the cluster so that the master is using a GPU for execution also.
-
-Assuming the master/driver is executing on a CPU machine, and the workers are executing on GPU machines, you can simply include both backends (i.e., both the ```nd4j-cuda-x.x``` and ```nd4j-native``` dependencies as described in the [uber-jar section](#uberjar)).
-
-When multiple backends are present on the classpath, by default the CUDA backend will be tried first. If this cannot be loaded, the CPU (nd4j-native) backend will be loaded second. Thus, if the driver does not have a GPU, it should fall back to using a CPU. However, this default behaviour can be changed by setting the ```BACKEND_PRIORITY_CPU``` or ```BACKEND_PRIORITY_GPU``` environment variables on the master/driver, as described [here](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-common/src/main/java/org/nd4j/config/ND4JEnvironmentVars.java).
-The exact process for setting environment variables may depend on the cluster manager - Spark standalone vs. YARN vs. Mesos. Please consult the documentation for each on how to set the environment variables for Spark jobs for the driver/master.
-
-<br><br>
-
-## <a name="memory">How to configure memory settings for Spark</a>
-
-For important background on how memory and memory configuration works for DL4J and ND4J, start by reading [Memory management for ND4J/DL4J](./deeplearning4j-config-memory).
-
-The memory management on Spark is similar to memory management for single node training:
-* On-heap memory is configured using the standard Java Xms and Xmx memory configuration settings
-* Off-heap memory is configured using the javacpp system properties
-
-However, memory configuration in the context of Spark adds some additional complications:
-1. Often, memory configuration has to be done separately (sometimes using different mechanisms) for the driver/master vs. the workers
-2. The approach for configuring memory can depend on the cluster resource manager - Spark standalone vs. YARN vs. Mesos, etc
-3. Cluster resource manager default memory settings are often not appropriate for libraries (such as DL4J/ND4J) that rely heavily on off-heap memory
-
-See the Spark documentation for your cluster manager:
-* [YARN](https://spark.apache.org/docs/latest/running-on-yarn.html)
-* [Mesos](https://spark.apache.org/docs/latest/running-on-mesos.html)
-* [Spark Standalone](https://spark.apache.org/docs/latest/spark-standalone.html)
-
-You should set 4 things:
-1. The worker on-heap memory (Xmx) - usually set as an argument for Spark submit (for example, ```--executor-memory 4g``` for YARN)
-2. The worker off-heap memory (javacpp system properties options) (for example, ```--conf "spark.executor.extraJavaOptions=-Dorg.bytedeco.javacpp.maxbytes=8G"```)
-3. The driver on-heap memory - usually set as an argument for Spark submit (for example, ```--driver-memory 4g```)
-4. The driver off-heap memory (for example, ```--conf "spark.driver.extraJavaOptions=-Dorg.bytedeco.javacpp.maxbytes=8G"```)
-
-
-Some notes:
-* On YARN, it is generally necessary to set the ```spark.yarn.driver.memoryOverhead``` and ```spark.yarn.executor.memoryOverhead``` properties. The default settings are much too small for DL4J training.
-* On Spark standalone, you can also configure memory by modifying the ```conf/spark-env.sh``` file on each node, as described in the [Spark configuration docs](https://spark.apache.org/docs/latest/configuration.html#environment-variables). For example, you could add the following lines to set 8GB heap for the driver, 12 GB off-heap for the driver, 12GB heap for the workers, and 18GB off-heap for the workers:
-    * ```SPARK_DRIVER_OPTS=-Dorg.bytedeco.javacpp.maxbytes=12G```
-    * ```SPARK_DRIVER_MEMORY=8G```
-    * ```SPARK_WORKER_OPTS=-Dorg.bytedeco.javacpp.maxbytes=18G```
-    * ```SPARK_WORKER_MEMORY=12G```
-
-Put together, this might look like the following (for YARN, with 4GB on-heap, 5GB off-heap, 6GB YARN off-heap overhead):
-```
---class my.class.name.here --num-executors 4 --executor-cores 8 --executor-memory 4G --driver-memory 4G --conf "spark.executor.extraJavaOptions=-Dorg.bytedeco.javacpp.maxbytes=5G" --conf "spark.driver.extraJavaOptions=-Dorg.bytedeco.javacpp.maxbytes=5G" --conf spark.yarn.executor.memoryOverhead=6144
-```
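-
-The numbers above can be sanity-checked with simple arithmetic. On YARN, each executor container is sized as the executor heap (```--executor-memory```) plus ```spark.yarn.executor.memoryOverhead```, and the JavaCPP off-heap limit (```maxbytes```) has to fit inside that overhead with some headroom for other native allocations. The sketch below is a rough check under those assumptions; the 1 GB headroom used in the usage note is an illustrative choice, not a documented requirement.
-
-```java
-final class YarnMemoryCheck {
-    // Container size requested from YARN per executor, in MB (heap + overhead).
-    static long containerMb(long heapMb, long overheadMb) {
-        return heapMb + overheadMb;
-    }
-
-    // True if the JavaCPP off-heap limit plus headroom fits inside the YARN overhead.
-    static boolean offHeapFits(long maxBytesMb, long overheadMb, long headroomMb) {
-        return maxBytesMb + headroomMb <= overheadMb;
-    }
-}
-```
-
-With the values from the command above (4096 MB heap, 5120 MB ```maxbytes```, 6144 MB overhead), each executor requests a 10240 MB container and leaves 1024 MB of the overhead for non-JavaCPP native memory.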
-
-<br><br>
-
-## <a name="gc">How to Configure Garbage Collection for Workers</a>
-
-One determinant of training performance is the frequency of garbage collection.
-When using [Workspaces](https://deeplearning4j.org/docs/latest/deeplearning4j-config-memory) (see also [this](https://deeplearning4j.org/docs/latest/deeplearning4j-config-workspaces)), which are enabled by default, it can be helpful to reduce the frequency of garbage collection.
-For single machine training (and on the driver) this is easy:
-```
-// this will limit frequency of gc calls to 5000 milliseconds
-Nd4j.getMemoryManager().setAutoGcWindow(5000);
-
-// OR you could totally disable it
-Nd4j.getMemoryManager().togglePeriodicGc(false);
-```
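-
-Conceptually, a GC window rate-limits explicit garbage collection: a collection is only triggered if at least the configured interval has passed since the previous one. The sketch below illustrates that idea with an injectable clock so it can be tested; ```GcThrottle``` is a hypothetical helper, not ND4J's MemoryManager implementation, and a caller would invoke ```System.gc()``` only when ```shouldCollect()``` returns true.
-
-```java
-import java.util.function.LongSupplier;
-
-final class GcThrottle {
-    private final long windowMillis;
-    private final LongSupplier clock;              // current time in milliseconds
-    private long lastGcMillis = Long.MIN_VALUE;    // MIN_VALUE means "never collected"
-    private boolean enabled = true;
-
-    GcThrottle(long windowMillis, LongSupplier clock) {
-        this.windowMillis = windowMillis;
-        this.clock = clock;
-    }
-
-    void setEnabled(boolean enabled) { this.enabled = enabled; }
-
-    // Returns true (and records the time) if a GC should be triggered now.
-    boolean shouldCollect() {
-        if (!enabled) return false;
-        long now = clock.getAsLong();
-        if (lastGcMillis == Long.MIN_VALUE || now - lastGcMillis >= windowMillis) {
-            lastGcMillis = now;
-            return true;
-        }
-        return false;
-    }
-}
-```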
-
-However, setting this on the driver will not change the settings on the workers.
-Instead, it can be set for the workers as follows:
-```
-new SharedTrainingMaster.Builder(voidConfiguration, minibatch)
-    //...other configuration...
-    .workerTogglePeriodicGC(true)       //Periodic garbage collection is enabled...
-    .workerPeriodicGCFrequency(5000)    //...and is configured to be performed every 5 seconds (every 5000ms)
-    .build();
-```
-
-
-The default (as of 1.0.0-beta3) is to perform periodic garbage collection every 5 seconds on the workers.
-
-<br><br>
-
-## <a name="kryo">How to use Kryo Serialization with DL4J and ND4J</a>
-
-Deeplearning4j and ND4J can utilize Kryo serialization, with appropriate configuration.
-Note that due to the off-heap memory of INDArrays, Kryo will offer less of a performance benefit compared to using Kryo in other contexts.
-
-To enable Kryo serialization, first add the [nd4j-kryo dependency](https://search.maven.org/search?q=nd4j-kryo):
-```
-<dependency>
-  <groupId>org.nd4j</groupId>
-  <artifactId>nd4j-kryo_2.11</artifactId>
-  <version>${dl4j-version}</version>
-</dependency>
-```
-where ```${dl4j-version}``` is the version used for DL4J and ND4J.
-
-Then, at the start of your training job, add the following code:
-```
-    SparkConf conf = new SparkConf();
-    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
-    conf.set("spark.kryo.registrator", "org.nd4j.Nd4jRegistrator");
-```
-
-Note that when using Deeplearning4j's SparkDl4jMultiLayer or SparkComputationGraph classes, a warning will be logged if the Kryo configuration is incorrect.
-
-<br><br>
-
-## <a name="yarngpus">How to use YARN and GPUs</a>
-
-For DL4J, the only requirement for CUDA GPUs is to use the appropriate backend, with the appropriate NVIDIA libraries either installed on each node, or provided in the uber-JAR (see the [uber-jar guide](#uberjar) above for more details).
-For recent versions of YARN, some additional configuration may be required in some cases - see the [YARN GPU documentation](https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/UsingGpus.html) for more details.
-
-Earlier versions of YARN (for example, 2.7.x and similar) did not support GPUs natively.
-For these versions, it is possible to use node labels to ensure that jobs are scheduled onto GPU nodes. For more details, see the Hadoop YARN [documentation](https://hadoop.apache.org/docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/NodeLabel.html).
-
-Note that YARN-specific memory configuration (see the [memory how-to](#memory)) is also required.
-
-<br><br>
-
-## <a name="locality">How to Configure Spark Locality Configuration</a>
-
-Configuring Spark locality settings is optional, but can improve training performance.
-
-The summary: adding ```--conf spark.locality.wait=0``` to your Spark submit configuration may marginally reduce training times, by scheduling the network fit operations to be started sooner.
-
-For more details, see [link 1](https://spark.apache.org/docs/latest/tuning.html#data-locality) and [link 2](https://spark.apache.org/docs/latest/configuration.html#scheduling).
-
-<br><br>
-
-# During and After Training Guides
-
-## <a name="threshold">How to Configure Encoding Thresholds</a>
-
-Deeplearning4j's Spark implementation uses a threshold encoding scheme for sending parameter updates between nodes. This encoding scheme results in a small quantized message, which significantly reduces the network cost of communicating updates. See the [technical explanation page](./deeplearning4j-scaleout-technicalref) for more details on this encoding process.
-
-This threshold encoding process introduces a "distributed training specific" hyperparameter - the encoding threshold.
-Both too large thresholds and too small thresholds can result in sub-optimal performance:
-
-* Large thresholds mean infrequent communication - too infrequent and convergence can suffer
-* Small thresholds mean more frequent communication - but smaller changes are communicated at each step
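-
-A simplified sketch of threshold encoding, assuming a single fixed threshold ```t```: the update is added to a residual vector, every residual element whose magnitude reaches ```t``` is communicated as ```+t``` or ```-t``` (so only its index and sign need to be sent), and ```t``` is subtracted from that element so the remainder carries over to later steps. The signed, 1-based index format below is an assumption made for illustration; DL4J's actual message format and bookkeeping may differ.
-
-```java
-import java.util.ArrayList;
-import java.util.List;
-
-final class ThresholdEncoder {
-    // Adds the update to the residual, then encodes every residual entry with |r| >= threshold.
-    // Encoded entries are signed, 1-based indices: +i means +threshold, -i means -threshold.
-    static List<Integer> encode(double[] update, double[] residual, double threshold) {
-        List<Integer> encoded = new ArrayList<>();
-        for (int i = 0; i < update.length; i++) {
-            residual[i] += update[i];
-            if (residual[i] >= threshold) {
-                encoded.add(i + 1);
-                residual[i] -= threshold;          // remainder stays in the residual
-            } else if (residual[i] <= -threshold) {
-                encoded.add(-(i + 1));
-                residual[i] += threshold;
-            }
-        }
-        return encoded;
-    }
-
-    // Reconstructs the quantized update on the receiving side.
-    static double[] decode(List<Integer> encoded, int length, double threshold) {
-        double[] out = new double[length];
-        for (int code : encoded) {
-            int idx = Math.abs(code) - 1;
-            out[idx] += code > 0 ? threshold : -threshold;
-        }
-        return out;
-    }
-}
-```
-
-Note that each element moves by at most one threshold step per iteration, so a large update is communicated gradually over several steps.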
-
-The encoding threshold to be used is controlled by the [ThresholdAlgorithm](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/optimize/solvers/accumulation/encoding/ThresholdAlgorithm.java). The specific implementation of the ThresholdAlgorithm determines what threshold should be used.
-
-The default behaviour for DL4J is to use [AdaptiveThresholdAlgorithm](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/optimize/solvers/accumulation/encoding/threshold/AdaptiveThresholdAlgorithm.java) which tries to keep the sparsity ratio in a certain range.
-* The sparsity ratio is defined as numValues(encodedUpdate)/numParameters: 1.0 means fully dense (all values communicated), 0.0 means fully sparse (no values communicated)
-* A larger threshold means sparser updates (less network communication), and a smaller threshold means denser updates (more network communication)
-* The AdaptiveThresholdAlgorithm tries to keep the sparsity ratio between 0.0001 and 0.01 by default. If the sparsity of the updates falls outside of this range, the threshold is increased or decreased until the sparsity is back within this range.
-* An initial threshold value still needs to be set (it is the AdaptiveThresholdAlgorithm constructor argument in the example below).
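To make the adaptive behaviour above more concrete, here is a minimal sketch of such a control loop. It is not DL4J's AdaptiveThresholdAlgorithm: the class name `AdaptiveThresholdSketch`, the multiplicative adjustment factor, and the simplification of computing sparsity from the raw update (ignoring the residual) are all assumptions for illustration. Only the sparsity definition and the default 0.0001 to 0.01 target range come from the text above.

```java
import java.util.Arrays;

/** Hypothetical sketch of an adaptive encoding threshold (not DL4J's AdaptiveThresholdAlgorithm). */
final class AdaptiveThresholdSketch {
    private final double minSparsity;   // lower bound of the target range, e.g. 1e-4
    private final double maxSparsity;   // upper bound of the target range, e.g. 1e-2
    private final double factor;        // assumed multiplicative step, must be > 1
    private double threshold;

    AdaptiveThresholdSketch(double initialThreshold, double minSparsity,
                            double maxSparsity, double factor) {
        if (factor <= 1.0) {
            throw new IllegalArgumentException("factor must be > 1");
        }
        this.threshold = initialThreshold;
        this.minSparsity = minSparsity;
        this.maxSparsity = maxSparsity;
        this.factor = factor;
    }

    /** Sparsity ratio = numValues(encodedUpdate) / numParameters. */
    static double sparsity(double[] update, double threshold) {
        long sent = Arrays.stream(update).filter(v -> Math.abs(v) >= threshold).count();
        return (double) sent / update.length;
    }

    /** Measures the sparsity of this step's update, adjusts the threshold and returns it. */
    double step(double[] update) {
        double s = sparsity(update, threshold);
        if (s > maxSparsity) {
            threshold *= factor;   // too dense: raise the threshold, send fewer values
        } else if (s < minSparsity) {
            threshold /= factor;   // too sparse: lower the threshold, send more values
        }
        return threshold;
    }

    double threshold() {
        return threshold;
    }
}
```

A multiplicative step keeps the threshold positive and behaves the same regardless of the overall scale of the updates; the real implementation may adjust the threshold differently.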
-
-In practice, we have found this adaptive threshold process to work well.
-The built-in implementations for threshold algorithms include:
-
-* AdaptiveThresholdAlgorithm
-* [FixedThresholdAlgorithm](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/optimize/solvers/accumulation/encoding/threshold/FixedThresholdAlgorithm.java): a fixed, non-adaptive threshold using the specified encoding threshold.
-* [TargetSparsityThresholdAlgorithm](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/optimize/solvers/accumulation/encoding/threshold/TargetSparsityThresholdAlgorithm.java): an adaptive threshold algorithm that targets a specific sparsity, and increases or decreases the threshold to try to match the target.
-
-In addition, DL4J has a [ResidualPostProcessor](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/optimize/solvers/accumulation/encoding/ResidualPostProcessor.java) interface, with the default implementation being [ResidualClippingPostProcessor](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/optimize/solvers/accumulation/encoding/residual/ResidualClippingPostProcessor.java) which clips the residual vector to a maximum of 5x the current threshold, every 5 steps.
-The motivation for this is that the "left over" parts of the updates (i.e., those parts not communicated) are stored in the residual vector. If the updates are much larger than the threshold, we can have a phenomenon we have termed "residual explosion": the residual values can continue to grow to many times the threshold (and hence would take many steps to communicate the gradient). The residual post processor is used to avoid this phenomenon.
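A simplified, single-process sketch of threshold encoding with a residual vector may help illustrate the residual explosion problem. Each update is added to the residual; any element whose accumulated value reaches the threshold is sent as plus or minus the threshold, and the remainder stays in the residual. The `postProcess` method mimics the idea behind ResidualClippingPostProcessor (clip to a multiple of the threshold every N steps). The class and method names are hypothetical, and the compact message format DL4J actually sends over the network is omitted.

```java
/** Hypothetical, simplified threshold encoder with a residual vector (single process, no networking). */
final class ThresholdEncoderSketch {
    private final double[] residual;
    private int steps = 0;

    ThresholdEncoderSketch(int numParams) {
        this.residual = new double[numParams];
    }

    /**
     * Adds the update to the residual and returns the quantized message:
     * +threshold, -threshold or 0 per parameter. Anything not sent stays in the residual.
     */
    double[] encode(double[] update, double threshold) {
        double[] message = new double[update.length];
        for (int i = 0; i < update.length; i++) {
            double v = residual[i] + update[i];
            if (Math.abs(v) >= threshold) {
                message[i] = Math.signum(v) * threshold;  // send one threshold's worth
                v -= message[i];                          // keep the remainder
            }
            residual[i] = v;
        }
        return message;
    }

    /** Every `frequency` calls, clips each residual value to +/- (multiple * threshold). */
    void postProcess(double threshold, double multiple, int frequency) {
        steps++;
        if (steps % frequency != 0) {
            return;
        }
        double limit = multiple * threshold;
        for (int i = 0; i < residual.length; i++) {
            residual[i] = Math.max(-limit, Math.min(limit, residual[i]));
        }
    }

    double[] residual() {
        return residual.clone();
    }
}
```

Because at most one threshold's worth of each element is sent per step, an element whose residual is 100 times the threshold needs around 100 steps to be fully communicated, which is why clipping the residual helps.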
-
-The threshold algorithm (and initial threshold) and the residual post processor can be set as follows:
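-```
-import org.deeplearning4j.optimize.solvers.accumulation.encoding.residual.ResidualClippingPostProcessor;
-import org.deeplearning4j.optimize.solvers.accumulation.encoding.threshold.AdaptiveThresholdAlgorithm;
-import org.deeplearning4j.spark.api.TrainingMaster;
-import org.deeplearning4j.spark.parameterserver.training.SharedTrainingMaster;
-
-double initialThreshold = 1e-3;    //Initial encoding threshold (example value only)
-TrainingMaster tm = new SharedTrainingMaster.Builder(voidConfiguration, minibatch)
-    .thresholdAlgorithm(new AdaptiveThresholdAlgorithm(initialThreshold))
-    .residualPostProcessor(new ResidualClippingPostProcessor(5, 5))    //Clip residual to 5x threshold, every 5 steps
-    //<other config>
-    .build();
-```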
-
-Finally, DL4J's SharedTrainingMaster also has an encoding debug mode, enabled by setting ```.encodingDebugMode(true)``` in the SharedTrainingMaster builder.
-When this is enabled, each of the workers will log the current threshold, sparsity, and various other statistics about the encoding.
-These statistics can be used to determine if the threshold is appropriately set: for example, many updates that are tens or hundreds of times the threshold may indicate the threshold is too low and should be increased; at the other end of the spectrum, very sparse updates (less than one in 10000 values being communicated) may indicate that the threshold should be decreased.
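The rules of thumb above can be expressed as a small helper. This is a hypothetical sketch, not part of DL4J: it inspects a single update vector rather than the encoding debug log, and the cut-offs for "many" large values (more than 1% of parameters) and for "large" (at least 10x the threshold) are assumed values chosen for illustration.

```java
/** Hypothetical helper applying the threshold rules of thumb to one update vector. */
final class ThresholdAdviceSketch {
    enum Advice { INCREASE, DECREASE, OK }

    static Advice advise(double[] update, double threshold) {
        int communicated = 0;   // values that would be sent at this threshold
        int large = 0;          // values at least 10x the threshold (assumed cut-off)
        for (double v : update) {
            double a = Math.abs(v);
            if (a >= threshold) {
                communicated++;
            }
            if (a >= 10 * threshold) {
                large++;
            }
        }
        double sparsity = (double) communicated / update.length;
        double largeFraction = (double) large / update.length;
        if (largeFraction > 0.01) {
            return Advice.INCREASE;   // many updates far above the threshold: threshold too low
        }
        if (sparsity < 1e-4) {
            return Advice.DECREASE;   // fewer than 1 in 10000 values sent: threshold too high
        }
        return Advice.OK;
    }
}
```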
-
-<br><br>
-
-## <a name="evaluation">How to perform distributed test set evaluation</a>
-
-Deeplearning4j supports most standard evaluation metrics for neural networks. For basic information on evaluation, see the [Deeplearning4j Evaluation Page](./deeplearning4j-nn-evaluation).
-
-All of the [evaluation metrics](./deeplearning4j-nn-evaluation) that Deeplearning4j supports can be calculated in a distributed manner using Spark.
-
-**Step 1: Prepare Your Data**
-
-Evaluation data for Deeplearning4j on Spark is very similar to training data. That is, you can use:
-* ```RDD<DataSet>``` or ```JavaRDD<DataSet>``` for evaluating single input/output networks
-* ```RDD<MultiDataSet>``` or ```JavaRDD<MultiDataSet>``` for evaluating multi input/output networks
-* ```RDD<String>``` or ```JavaRDD<String>``` where each String is a path that points to serialized DataSet/MultiDataSet (or other minibatch file-based formats) on network storage such as HDFS.
-
-See the [Deeplearning4j on Spark: How To Build Data Pipelines](./deeplearning4j-scaleout-data-howto) page for details on how to prepare your data into one of these formats.
-
-**Step 2: Prepare Your Network**
-
-Creating your network is straightforward.
-First, load your network (MultiLayerNetwork or ComputationGraph) into memory on the driver using the information from the following guide: [How to save (and load) neural networks trained on Spark](#saveload)
-
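-Then, wrap it for use with Spark, using ```SparkDl4jMultiLayer``` for a MultiLayerNetwork or ```SparkComputationGraph``` for a ComputationGraph: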
-
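-```
-import org.apache.spark.api.java.JavaSparkContext;
-import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
-import org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer;
-import java.io.File;
-
-JavaSparkContext sc = new JavaSparkContext();
-MultiLayerNetwork net = MultiLayerNetwork.load(new File("/path/to/network.zip"), false);  //Or load from HDFS (see the save/load guide)
-SparkDl4jMultiLayer sparkNet = new SparkDl4jMultiLayer(sc, net, null);
-```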
-
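-```
-import org.apache.spark.api.java.JavaSparkContext;
-import org.deeplearning4j.nn.graph.ComputationGraph;
-import org.deeplearning4j.spark.impl.graph.SparkComputationGraph;
-import java.io.File;
-
-JavaSparkContext sc = new JavaSparkContext();
-ComputationGraph net = ComputationGraph.load(new File("/path/to/network.zip"), false);  //Or load from HDFS (see the save/load guide)
-SparkComputationGraph sparkNet = new SparkComputationGraph(sc, net, null);
-```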
-
-Note that you don't need to configure a TrainingMaster (i.e., the 3rd argument is null above), as evaluation does not use it.
-
-
-**Step 3: Call the appropriate evaluation method**
-
-For common cases, you can call one of the standard evaluation methods on SparkDl4jMultiLayer or SparkComputationGraph:
-```
-evaluate(RDD<DataSet>)                //Accuracy/F1 etc for classifiers
-evaluate(JavaRDD<DataSet>)            //Accuracy/F1 etc for classifiers
-evaluateROC(JavaRDD<DataSet>)         //ROC for single output binary classifiers
-evaluateRegression(JavaRDD<DataSet>)  //For regression metrics
-```
-
-For performing multiple evaluations simultaneously (more efficient than performing them sequentially) you can use something like:
-```
-IEvaluation[] evaluations = new IEvaluation[]{new Evaluation(), new ROCMultiClass()};
-JavaRDD<DataSet> data = ...;
-sparkNet.doEvaluation(data, 32, evaluations);
-```
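Conceptually, distributed evaluation works because evaluation statistics can be computed independently on each partition and then merged, and computing several metrics in the same pass avoids reading the data more than once. The plain-Java sketch below illustrates that idea only; it is not DL4J's code, and the class `MergeableEvalSketch` and its fields are hypothetical.

```java
import java.util.List;

/** Conceptual sketch (not DL4J code): per-partition partial evaluations merged into one result. */
final class MergeableEvalSketch {
    /** Partial statistics for two "evaluations" (accuracy counts and a confusion matrix), built in one pass. */
    static final class Partial {
        final long[][] confusion;   // confusion[actual][predicted]
        long correct;
        long total;

        Partial(int numClasses) {
            confusion = new long[numClasses][numClasses];
        }

        void add(int actual, int predicted) {
            confusion[actual][predicted]++;
            if (actual == predicted) {
                correct++;
            }
            total++;
        }

        /** Merging is what lets partitions be evaluated independently. */
        Partial merge(Partial other) {
            for (int i = 0; i < confusion.length; i++) {
                for (int j = 0; j < confusion.length; j++) {
                    confusion[i][j] += other.confusion[i][j];
                }
            }
            correct += other.correct;
            total += other.total;
            return this;
        }

        double accuracy() {
            return total == 0 ? 0.0 : (double) correct / total;
        }
    }

    /** Each partition is a list of {actual, predicted} pairs. */
    static Partial evaluate(List<List<int[]>> partitions, int numClasses) {
        Partial result = new Partial(numClasses);
        for (List<int[]> partition : partitions) {   // on a cluster: one task per partition
            Partial p = new Partial(numClasses);
            for (int[] example : partition) {
                p.add(example[0], example[1]);       // both statistics updated in the same pass
            }
            result.merge(p);                         // on a cluster: a reduce step
        }
        return result;
    }
}
```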
-
-Note that some of the evaluation methods have overloads with extra parameters, including:
-* ```int evalNumWorkers``` - the number of evaluation workers - i.e., the number of copies of a network used for evaluation on each node (up to the maximum number of Spark threads per worker). For large networks (or limited cluster memory), you might want to reduce this to avoid running into memory problems.
-* ```int evalBatchSize``` - the minibatch size to use when performing evaluation. This needs to be large enough to efficiently use the hardware resources, but small enough to not run out of memory. Values of 32 to 128 are usually a good starting point; increase when more memory is available and for smaller networks; decrease if memory is a problem.
-* ```DataSetLoader loader``` and ```MultiDataSetLoader loader``` - these are available when evaluating on a ```RDD<String>``` or ```JavaRDD<String>```. They are interfaces to load a path into a DataSet or MultiDataSet using a custom user-defined function. Most users will not need these; they are provided for greater flexibility, for example when the saved minibatch files are not in DataSet/MultiDataSet format but in some other (possibly custom) format.
-
-
-Finally, if you want to save the results of evaluation (of any type), you can save them in JSON format directly to remote storage such as HDFS as follows:
-```
-JavaSparkContext sc = new JavaSparkContext();
-Evaluation eval = ...
-String json = eval.toJson();
-String writeTo = "hdfs:///output/directory/evaluation.json";
-SparkUtils.writeStringToFile(writeTo, json, sc); //Also supports local file paths - file://
-```
-The import for ```SparkUtils``` is ```org.datavec.spark.transform.utils.SparkUtils```
-
-The evaluation can be loaded using:
-```
-String json = SparkUtils.readStringFromFile(writeTo, sc);
-Evaluation eval = Evaluation.fromJson(json);
-```
-
-<br><br>
-
-## <a name="saveload">How to save (and load) neural networks trained on Spark</a>
-
-Deeplearning4j's Spark functionality is built around the idea of wrapper classes - i.e., ```SparkDl4jMultiLayer``` and ```SparkComputationGraph``` internally use the standard ```MultiLayerNetwork``` and ```ComputationGraph``` classes.
-You can access the internal MultiLayerNetwork/ComputationGraph classes using ```SparkDl4jMultiLayer.getNetwork()``` and ```SparkComputationGraph.getNetwork()``` respectively.
-
-To save on the master/driver's local file system, get the network as described above and simply use the ```ModelSerializer``` class or ```MultiLayerNetwork.save(File)/.load(File)``` and ```ComputationGraph.save(File)/.load(File)``` methods.
-
-To save to (or load from) a remote location or distributed file system such as HDFS, you can use input and output streams.
-
-For example,
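-```
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-import org.apache.spark.api.java.JavaSparkContext;
-import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
-import org.deeplearning4j.util.ModelSerializer;
-import java.io.BufferedOutputStream;
-
-JavaSparkContext sc = new JavaSparkContext();
-FileSystem fileSystem = FileSystem.get(sc.hadoopConfiguration());
-String outputPath = "hdfs:///my/output/location/file.bin";
-MultiLayerNetwork net = sparkNet.getNetwork();
-try (BufferedOutputStream os = new BufferedOutputStream(fileSystem.create(new Path(outputPath)))) {
-    ModelSerializer.writeModel(net, os, true);    //true: also save the updater state
-}
-```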
-
-Reading is a similar process:
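-```
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-import org.apache.spark.api.java.JavaSparkContext;
-import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
-import org.deeplearning4j.util.ModelSerializer;
-import java.io.BufferedInputStream;
-
-JavaSparkContext sc = new JavaSparkContext();
-FileSystem fileSystem = FileSystem.get(sc.hadoopConfiguration());
-String outputPath = "hdfs:///my/output/location/file.bin";
-MultiLayerNetwork net;
-try (BufferedInputStream is = new BufferedInputStream(fileSystem.open(new Path(outputPath)))) {
-    net = ModelSerializer.restoreMultiLayerNetwork(is);
-}
-```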
-
-<br><br>
-
-
-## <a name="inference">How to perform distributed inference</a>
-
-Deeplearning4j's Spark implementation supports distributed inference. That is, we can easily generate predictions on an RDD of inputs using a cluster of machines.
-This distributed inference can also be used for networks trained on a single machine and loaded for Spark (see the [saving/loading section](#saveload) for details on how to load a saved network for use with Spark).
-
-Note: If you want to perform evaluation (i.e., calculate accuracy, F1, MSE, etc), refer to the [evaluation how-to](#evaluation) instead.
-
-The method signatures for performing distributed inference are as follows:
-```
-SparkDl4jMultiLayer.feedForwardWithKey(JavaPairRDD<K, INDArray> featuresData, int batchSize) : JavaPairRDD<K, INDArray>
-SparkComputationGraph.feedForwardWithKey(JavaPairRDD<K, INDArray[]> featuresData, int batchSize) : JavaPairRDD<K, INDArray[]>
-```
-There are also overloads that accept input mask arrays, when required.
-
-Note the parameter ```K``` - this is a generic type to signify the unique 'key' used to identify each example. The key values are not used as part of the inference process. This key is required as Spark's RDDs are unordered - without this, we would have no way to know which element in the predictions RDD corresponds to which element in the input RDD.
-The batch size parameter is used to specify the minibatch size when performing inference. It does not impact the values returned, but instead is used to balance memory use vs. computational efficiency: large batches might compute a little quicker overall, but require more memory. In many cases, a batch size of 64 is a good starting point to try if you are unsure of what to use.
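The role of the key and the batch size can be illustrated with a plain-Java sketch of what happens within one partition: rows are grouped into minibatches, passed through a model, and each output row is re-attached to its key. The method `predictWithKey` and the `UnaryOperator<double[][]>` stand-in for the network are hypothetical; this is not DL4J's implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of keyed, minibatched inference over one partition. */
final class KeyedInferenceSketch {
    /**
     * Groups (key, features) pairs into minibatches of at most batchSize rows, runs the
     * model on each minibatch, and maps each output row back to the key of its input row.
     */
    static <K> Map<K, double[]> predictWithKey(List<Map.Entry<K, double[]>> data, int batchSize,
                                               UnaryOperator<double[][]> model) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<K, double[]> out = new HashMap<>();
        for (int start = 0; start < data.size(); start += batchSize) {
            int end = Math.min(start + batchSize, data.size());
            double[][] batch = new double[end - start][];
            for (int i = start; i < end; i++) {
                batch[i - start] = data.get(i).getValue();
            }
            double[][] predictions = model.apply(batch);   // one forward pass per minibatch
            for (int i = start; i < end; i++) {
                out.put(data.get(i).getKey(), predictions[i - start]);
            }
        }
        return out;
    }
}
```

Calling it with different batch sizes on the same data returns the same key-to-prediction mapping; only the number of forward passes (and hence peak memory use) changes.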
-
-<br><br><br>
-
-# Problems and Troubleshooting Guides
-
-## <a name="dependencyproblems">How to debug common Spark dependency problems (NoClassDefFoundError and similar)</a>
-
-Unfortunately, dependency problems at runtime can occur on a cluster if your project is not configured correctly. These problems can occur with any Spark jobs, not just those using DL4J - and they may be caused by other dependencies or libraries on the classpath, not by Deeplearning4j dependencies.
-
-When dependency problems occur, they typically produce exceptions like:
-* NoSuchMethodError
-* ClassNotFoundException / NoClassDefFoundError
-* AbstractMethodError
-
-For example, mismatched Spark versions (trying to use Spark 1 on a Spark 2 cluster) can look like:
-```
-java.lang.AbstractMethodError: org.deeplearning4j.spark.api.worker.ExecuteWorkerPathMDSFlatMap.call(Ljava/lang/Object;)Ljava/util/Iterator;
-```
-
-Another class of error is ```UnsupportedClassVersionError```, for example ```java.lang.UnsupportedClassVersionError: XYZ : Unsupported major.minor version 52.0```. This can result from trying to run code compiled for a newer Java version (for example Java 8, which is class file version 52) on a cluster that is set up with only an older JRE/JDK (for example Java 7).
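To check which Java version a class was compiled for, you can read its class-file header: the 4-byte magic number `0xCAFEBABE` is followed by the minor and major version numbers. The sketch below (the class name `ClassVersionSketch` is hypothetical) reads the major version and maps it to a Java release; the JDK's `javap -v` tool reports the same "major version" field.

```java
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Reads the class-file header to find which Java release a .class file targets. */
final class ClassVersionSketch {
    /** Returns the major version from the start of a class file (e.g. 52 for Java 8). */
    static int majorVersion(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes);   // big-endian by default, as in class files
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a class file");
        }
        buf.getShort();                                 // minor version, ignored here
        return buf.getShort() & 0xFFFF;                 // major version as an unsigned value
    }

    /** Maps a major version to a Java release; valid for Java 5 (major 49) and later. */
    static int javaRelease(int major) {
        return major - 44;   // 49 -> 5, 51 -> 7, 52 -> 8, 55 -> 11
    }

    public static void main(String[] args) throws Exception {
        int major = majorVersion(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(args[0] + ": major version " + major + " (Java " + javaRelease(major) + ")");
    }
}
```

For example, extract a class from your application JAR and run the sketch on it: if it reports a newer Java release than the cluster's JRE, you can expect an UnsupportedClassVersionError at runtime.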
-
-
-How to debug dependency problems:
-
-**Step 1: Collect Dependency Information**
-
-The first step (when using Maven) is to produce a dependency tree that you can refer to.
-Open a command line window (for example, bash on Linux, cmd on Windows), navigate to the root directory of your Maven project and run ```mvn dependency:tree```
-This will give you a list of dependencies (direct and transitive) that can be helpful to understand exactly what is on the classpath, and why.
-
-Note also that ```mvn dependency:tree -Dverbose``` will provide extra information, and can be useful when debugging problems related to mismatched library versions.
-
-**Step 2: Check your Spark Versions**
-
-When running into dependency issues, first check the Spark versions.
-
-If your cluster is running Spark 2, you should be using a version of dl4j-spark_2.10/2.11 (and DataVec) that ends with ```_spark_2```.
-
-Look through your dependency tree (from Step 1) and check the version suffix of each Spark-related DL4J and DataVec artifact.
-
-If you find a problem, you should change your project dependencies as follows:
-On a Spark 2 (Scala 2.11) cluster, use:
-```
-<dependency>
-    <groupId>org.deeplearning4j</groupId>
-    <artifactId>dl4j-spark_2.11</artifactId>
-    <version>1.0.0-beta2_spark_2</version>
-</dependency>
-```
-whereas on a Spark 1 (Scala 2.11) cluster, you should use:
-```
-<dependency>
-    <groupId>org.deeplearning4j</groupId>
-    <artifactId>dl4j-spark_2.11</artifactId>
-    <version>1.0.0-beta2_spark_1</version>
-</dependency>
-```
-
-**Step 3: Check the Scala Versions**
-
-Apache Spark is distributed with versions that support both Scala 2.10 and Scala 2.11.
-
-To avoid problems with Scala versions, you need to do two things:
-
-1. Ensure you don't have a mix of Scala 2.10 and Scala 2.11 (or 2.12) dependencies on your project classpath. Check your dependency tree for artifact IDs ending in ```_2.10``` or ```_2.11```: for example, ```org.apache.spark:spark-core_2.11:jar:1.6.3:compile``` is a Spark 1 (1.6.3) dependency using Scala 2.11.
-2. Ensure that your project matches what the cluster is using. For example, if your cluster is running Spark 2 with Scala 2.11, all of your Scala dependencies should also use 2.11. Note that Scala 2.11 is more common for Spark clusters.
-
-If you find mismatched Scala versions, you will need to align them by changing the dependency versions in your pom.xml (or similar configuration file for other dependency management systems). Many libraries (including Spark and DL4J) release dependencies with both Scala 2.10 and 2.11 versions.
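Checking for mixed Scala versions can be automated. This hypothetical helper (not part of any Maven plugin) collects the Scala binary-version suffixes such as `_2.10` and `_2.11` from the lines of `mvn dependency:tree` output; more than one entry in the result indicates a mix.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: collects Scala binary versions from artifactId suffixes in a dependency tree. */
final class ScalaSuffixCheck {
    // Matches the "_2.1x:" suffix of an artifactId, e.g. "spark-core_2.11:"
    private static final Pattern SCALA_SUFFIX = Pattern.compile("_(2\\.1\\d):");

    static Set<String> scalaVersions(List<String> treeLines) {
        Set<String> versions = new TreeSet<>();
        for (String line : treeLines) {
            Matcher m = SCALA_SUFFIX.matcher(line);
            while (m.find()) {
                versions.add(m.group(1));
            }
        }
        return versions;
    }
}
```

One way to use it is to run `mvn dependency:tree -DoutputFile=tree.txt` and pass `Files.readAllLines(Paths.get("tree.txt"))` to `scalaVersions`. The pattern only looks at `_2.1x:` suffixes, so it will not notice artifacts that encode the Scala version differently.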
-
-**Step 4: Check for Mismatched Library Versions**
-
-A number of common utility libraries that are widely used across the Java ecosystem are not compatible across versions. For example, Spark might rely on library X version Y and will fail to run when library X version Z is on the classpath. Furthermore, many of these libraries are split into multiple modules (i.e., multiple separate modular dependencies) that won't work correctly when mixing different versions.
-
-Some that can commonly cause problems include:
-* Jackson
-* Guava
-
-DL4J and ND4J use versions of these libraries that should avoid dependency conflicts with Spark.
-However, other (third-party) libraries can pull in different versions of these dependencies.
-
-Often, the exception will give a hint of where to look: for example, the stack trace might include a specific class, which can be used to identify the problematic library.
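Mismatched modules of a library such as Jackson often show up as one groupId resolving to several versions. The hypothetical helper below (a sketch, not part of Maven) scans `mvn dependency:tree` lines in `groupId:artifactId:jar:version:scope` form and reports groupIds that appear with more than one version. It only handles `jar` packaging and the standard scopes, and it is best run on the plain (non-verbose) tree, which lists resolved versions. Not every difference is a conflict (some projects version their modules independently), so treat the output as a list of candidates to inspect.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds groupIds whose artifacts resolve to more than one version. */
final class VersionMismatchCheck {
    // groupId:artifactId:jar:version:scope, e.g. com.google.guava:guava:jar:20.0:compile
    private static final Pattern COORD = Pattern.compile(
            "([\\w.\\-]+):([\\w.\\-]+):jar:([\\w.\\-]+):(?:compile|runtime|provided|test|system)");

    static Map<String, Set<String>> mismatchedGroups(List<String> treeLines) {
        Map<String, Set<String>> versionsByGroup = new TreeMap<>();
        for (String line : treeLines) {
            Matcher m = COORD.matcher(line);
            if (m.find()) {
                versionsByGroup.computeIfAbsent(m.group(1), g -> new TreeSet<>()).add(m.group(3));
            }
        }
        versionsByGroup.values().removeIf(versions -> versions.size() < 2);   // keep only groups with several versions
        return versionsByGroup;
    }
}
```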
-
-**Step 5: Once Identified, Fix the Dependency Conflict**
-
-To debug these sorts of problems, check the dependency tree (the output of ```mvn dependency:tree -Dverbose```) carefully. Where necessary, you can use [exclusions](https://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html) or force the version of the problematic dependency in your project by adding the version you want directly to your project as a direct dependency. Often, this is enough to solve the problem.
-
-Keep in mind that when using Spark submit, Spark will add a copy of Spark and its dependent libraries to the driver and worker classpaths.
-This means that for dependencies that are added by Spark, you can't simply exclude them in your project - Spark submit will add them at runtime whether you exclude them or not in your project.
-
-One additional setting worth knowing about is the pair of (experimental) Spark configuration options ```spark.driver.userClassPathFirst``` and ```spark.executor.userClassPathFirst``` (see the [Spark configuration docs](https://spark.apache.org/docs/latest/configuration.html) for more details). In some cases, these options may fix dependency issues.
-
-<br><br>
-
-## <a name="caching">How to Cache RDD[INDArray] and RDD[DataSet] Safely</a>
-
-Spark has some issues regarding how it handles Java objects with large off-heap components, such as the DataSet and INDArray objects used in Deeplearning4j. This section explains the issues related to caching/persisting these objects.
-
-The key points to know about are:
-
-* MEMORY_ONLY and MEMORY_AND_DISK persistence can be problematic with off-heap memory, due to Spark not properly estimating the size of objects in the RDD. This can lead to out of (off-heap) memory issues.
-* When persisting a ```RDD<DataSet>``` or ```RDD<INDArray>``` for re-use, use MEMORY_ONLY_SER or MEMORY_AND_DISK_SER
-
-**Why MEMORY_ONLY_SER or MEMORY_AND_DISK_SER Are Recommended**
-
-One of the ways that Apache Spark improves performance is by allowing users to cache data in memory. This can be done using ```RDD.cache()``` or ```RDD.persist(StorageLevel.MEMORY_ONLY())```, which store the contents in memory in deserialized (i.e., standard Java object) form.
-The basic idea is simple: if you persist a RDD, you can re-use it from memory (or disk, depending on configuration) without having to recalculate it. However, large RDDs may not entirely fit into memory. In this case, some parts of the RDD have to be recomputed or loaded from disk, depending on the storage level used. Furthermore, to avoid using too much memory, Spark will drop parts (blocks) of an RDD when required.
-
-The main storage levels available in Spark are listed below. For an explanation of  these, see the [Spark Programming Guide](https://spark.apache.org/docs/1.6.2/programming-guide.html#rdd-persistence).
-
-* MEMORY_ONLY
-* MEMORY_AND_DISK
-* MEMORY_ONLY_SER
-* MEMORY_AND_DISK_SER
-* DISK_ONLY
-
-The problem lies in how Spark handles memory. In particular, Spark will drop part of an RDD (a block) based on the estimated size of that block. The way Spark estimates the size of a block depends on the persistence level. For ```MEMORY_ONLY``` and ```MEMORY_AND_DISK``` persistence, this is done by walking the Java object graph, i.e., looking at the fields in an object and recursively estimating the size of those objects. This process does not, however, take into account the off-heap memory used by Deeplearning4j or ND4J. For objects like DataSets and INDArrays (which are stored almost entirely off-heap), Spark significantly under-estimates the true size of the objects using this process. Furthermore, Spark considers only the amount of on-heap memory use when deciding whether to keep or drop blocks. Because DataSet and INDArray objects have a very small on-heap size, Spark will keep too many of them around with ```MEMORY_ONLY``` and ```MEMORY_AND_DISK``` persistence, exhausting off-heap memory and causing out of memory issues.
-
-However, for ```MEMORY_ONLY_SER``` and ```MEMORY_AND_DISK_SER``` Spark stores blocks in *serialized* form, on the Java heap. The size of objects stored in serialized form can be estimated accurately by Spark (there is no off-heap memory component for the serialized objects) and consequently Spark will drop blocks when required - avoiding any out of memory issues.
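The effect of the size estimate can be shown with a toy model. All numbers below (a 200-byte on-heap wrapper, 1,000,000 bytes of off-heap data per block, a 10,000,000-byte cache budget) are hypothetical, and the admission rule is a simplification of Spark's real memory management: the cache keeps blocks while their *estimated* total size fits the budget.

```java
/** Toy model (hypothetical numbers): how a per-block size estimate decides how many blocks a cache keeps. */
final class CacheEstimateSketch {
    /** Blocks kept when a cache admits blocks only while their estimated total size fits the budget. */
    static int blocksKept(int numBlocks, long estimatedBytesPerBlock, long budgetBytes) {
        if (estimatedBytesPerBlock <= 0) {
            throw new IllegalArgumentException("estimate must be positive");
        }
        return (int) Math.min(numBlocks, budgetBytes / estimatedBytesPerBlock);
    }

    public static void main(String[] args) {
        long budget = 10_000_000L;   // memory the cache is allowed to use
        long onHeap = 200L;          // small Java object wrapping an off-heap array
        long offHeap = 1_000_000L;   // native memory behind each DataSet/INDArray
        int numBlocks = 1_000;
        long realPerBlock = onHeap + offHeap;

        // MEMORY_ONLY-style estimate: walks the on-heap object graph only
        int keptDeserialized = blocksKept(numBlocks, onHeap, budget);
        // MEMORY_ONLY_SER-style estimate: serialized bytes include the array contents (approximated here)
        int keptSerialized = blocksKept(numBlocks, realPerBlock, budget);

        System.out.println("deserialized: " + keptDeserialized + " blocks, real bytes " + keptDeserialized * realPerBlock);
        System.out.println("serialized:   " + keptSerialized + " blocks, real bytes " + keptSerialized * realPerBlock);
    }
}
```

With these assumed numbers, the on-heap estimate lets the cache keep all 1,000 blocks (about 1 GB of real memory against a 10 MB budget), while the serialized estimate keeps only 9 blocks, which fits within the budget.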
-
-<br><br>
-
-## <a name="ntperror">How to fix "Error querying NTP server" errors</a>
-
-DL4J's parameter averaging implementation has the option to collect training stats, by using ```SparkDl4jMultiLayer.setCollectTrainingStats(true)```.
-When this is enabled, internet access is required to connect to the NTP (network time protocol) server.
-
-It is possible to get errors like ```NTPTimeSource: Error querying NTP server, attempt 1 of 10```. Sometimes these failures are transient (later retries will work) and can be ignored. However, if the Spark cluster is configured such that one or more of the workers cannot access the internet (or specifically, the NTP server), all retries can fail.
-
-Two solutions are available:
-
-1. Don't use ```sparkNet.setCollectTrainingStats(true)``` - this functionality is optional (not required for training), and is disabled by default
-2. Set the system to use the local machine clock instead of the NTP server, as the time source (note however that the timeline information may be very inaccurate as a result)
-To use the system clock time source, add the following to Spark submit:
-```
--conf spark.driver.extraJavaOptions=-Dorg.deeplearning4j.spark.time.TimeSource=org.deeplearning4j.spark.time.SystemClockTimeSource
--conf spark.executor.extraJavaOptions=-Dorg.deeplearning4j.spark.time.TimeSource=org.deeplearning4j.spark.time.SystemClockTimeSource
-```
-
-<br><br>
-
-## <a name="ubuntu16">Failed training on Ubuntu 16.04 (Ubuntu bug that may affect DL4J users)</a>
-
-When running a Spark on YARN cluster on Ubuntu 16.04 machines, chances are that after finishing a job, all processes owned by the user running Hadoop/YARN are killed. This is related to a bug in Ubuntu, which is documented at https://bugs.launchpad.net/ubuntu/+source/procps/+bug/1610499. There's also a Stack Overflow discussion about it at https://stackoverflow.com/questions/38419078/logouts-while-running-hadoop-under-ubuntu-16-04.
-
-Some workarounds are suggested. 
-
-**Option 1**
-
-Add
-```
-[login]
-KillUserProcesses=no
-```
-to /etc/systemd/logind.conf, and reboot.
-
-**Option 2**
-
-Copy the /bin/kill binary from Ubuntu 14.04 and use that one instead. 
-
-**Option 3**
-
-Downgrade to Ubuntu 14.04 
-
-**Option 4**
-
-Run ```sudo loginctl enable-linger hadoop_user_name``` on the cluster nodes (replacing ```hadoop_user_name``` with the user that runs Hadoop/YARN).
--- a/docs/deeplearning4j-scaleout/templates/intro.md
+++ b/docs/deeplearning4j-scaleout/templates/intro.md
@ -1,153 +0,0 @@
---
-title: "Deeplearning4j on Spark: Introduction/Getting Started"
-short_title: Introduction/Getting Started
-description: "Deeplearning4j on Spark: Introduction"
-category: Distributed Deep Learning
-weight: 0
---
-
-# Distributed Deep Learning with DL4J and Spark
-
-Deeplearning4j supports neural network training on a cluster of CPU or GPU machines using Apache Spark. Deeplearning4j also supports distributed evaluation as well as distributed inference using Spark.
-
-## DL4J’s Distributed Training Implementations
-
-DL4J has two implementations of distributed training. 
-  * Gradient sharing (available as of 1.0.0-beta): an asynchronous SGD implementation with quantized and compressed updates, implemented in Spark+Aeron and based on [this](http://nikkostrom.com/publications/interspeech2015/strom_interspeech2015.pdf) paper by Nikko Strom.
-  * Parameter averaging: A synchronous SGD implementation with a single parameter server implemented entirely in Spark.
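For intuition, the core step of parameter averaging is simple: each worker fits its own copy of the network on part of the data, then the parameter vectors are averaged and the result is sent back to every worker. Below is a minimal sketch of just the averaging step; the class `ParameterAveragingSketch` is hypothetical and ignores details such as how often averaging happens.

```java
/** Minimal sketch of the averaging step in synchronous parameter averaging. */
final class ParameterAveragingSketch {
    /** Returns the element-wise mean of the workers' parameter vectors. */
    static double[] average(double[][] workerParams) {
        if (workerParams.length == 0) {
            throw new IllegalArgumentException("need at least one worker");
        }
        int n = workerParams[0].length;
        double[] avg = new double[n];
        for (double[] params : workerParams) {
            if (params.length != n) {
                throw new IllegalArgumentException("parameter length mismatch");
            }
            for (int i = 0; i < n; i++) {
                avg[i] += params[i];
            }
        }
        for (int i = 0; i < n; i++) {
            avg[i] /= workerParams.length;
        }
        return avg;   // broadcast back to all workers before the next round
    }
}
```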
-
-
-Users are directed towards the gradient sharing implementation, which superseded the parameter averaging implementation. The gradient sharing implementation results in faster training times and is implemented to be scalable and fault-tolerant (as of 1.0.0-beta3). For the sake of completeness, this page will also cover the parameter averaging approach. The [technical reference section](deeplearning4j-scaleout-technicalref) covers details on the implementation.
-
-In addition to distributed training DL4J also enables users to do distributed evaluation (including multiple evaluations simultaneously) and distributed inference. Refer to the [Deeplearning4j on Spark: How To Guides](deeplearning4j-scaleout-howto) for more details.
-
-### When to use Spark for Training Neural Networks
-
-Spark is not always the most appropriate tool for training neural networks.
-
-You should use Spark when:
-1. You have a cluster of machines for training (not just a single machine - this includes multi-GPU machines)
-2. You need more than a single machine to train the network
-3. Your network is large enough to justify a distributed implementation
-
-For a single machine with multiple GPUs or multiple physical processors, users should consider using DL4J's ParallelWrapper implementation as shown in [this example](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-cuda-specific-examples/src/main/java/org/deeplearning4j/examples/multigpu/MultiGpuLenetMnistExample.java). ParallelWrapper allows for easy data-parallel training of networks on a single machine with multiple cores. Spark has higher overheads compared to ParallelWrapper for single machine training.
-
-Similarly, if you don't need Spark (smaller networks and/or datasets), single machine training is recommended, as it is usually simpler to set up.
-
-As a rough guide to whether a network is large enough: if the network takes 100ms or longer to perform one iteration (100ms per fit operation on each minibatch), distributed training should work well with good scalability. At 10ms per iteration, we might expect sub-linear scaling of performance vs. number of nodes. At around 1ms or below per iteration, the communication overhead may be too much: training on a cluster may be no faster (or perhaps even slower) than on a single machine.
-For the benefits of parallelism to outweigh the communication overhead, users should consider the ratio of network transfer time to computation time and ensure that the computation time is large enough to mask the additional overhead of distributed training.
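The rough guide above can be reasoned about with a toy scaling model: if each node spends `computeMs` per iteration and every iteration also pays a fixed communication cost `overheadMs`, the speedup over one machine is about nodes * computeMs / (computeMs + overheadMs). The 10 ms overhead below is an assumed value, and the model ignores effects such as network contention and stragglers.

```java
/** Toy model of data-parallel scaling with a fixed per-iteration communication cost. */
final class ScalingEstimate {
    /** Approximate speedup over one machine: nodes * compute / (compute + overhead). */
    static double speedup(int nodes, double computeMs, double overheadMs) {
        return nodes * computeMs / (computeMs + overheadMs);
    }

    public static void main(String[] args) {
        double overheadMs = 10.0;   // assumed communication cost per iteration
        for (double computeMs : new double[] {100.0, 10.0, 1.0}) {
            System.out.printf("%6.1f ms/iteration -> %.2fx speedup on 8 nodes%n",
                    computeMs, speedup(8, computeMs, overheadMs));
        }
    }
}
```

With these assumptions, 8 nodes give roughly 7.3x at 100 ms per iteration, 4x at 10 ms, and about 0.7x (slower than a single machine) at 1 ms, which matches the qualitative guide.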
-
-### Setup and Dependencies
-
-To run training on GPUs make sure that you are specifying the correct backend in your pom file (nd4j-cuda-x.x for GPUs vs nd4j-native backend for CPUs) and have set up the machines with the appropriate CUDA libraries. Refer to the [Deeplearning4j on Spark: How To Guides](deeplearning4j-scaleout-howto) for more details.
-
-To use the gradient sharing implementation include the following dependency:
-
-```
-<dependency>
-    <groupId>org.deeplearning4j</groupId>
-    <artifactId>dl4j-spark-parameterserver_${scala.binary.version}</artifactId>
-    <version>${dl4j.version}</version>
-</dependency>
-```
-
-If using the parameter averaging implementation (again, the gradient sharing implementation should be preferred), include:
-
-```
-<dependency>
-    <groupId>org.deeplearning4j</groupId>
-    <artifactId>dl4j-spark_${scala.binary.version}</artifactId>
-    <version>${dl4j.version}</version>
-</dependency>
-```
-Note that `${scala.binary.version}` is a Maven property with the value 2.10 or 2.11, and it should match the Scala version of the Spark build you are using.
-
-## Key Concepts
-
-The following are key classes the user should be familiar with to get started with distributed training with DL4J.
-
-  * **TrainingMaster**: Specifies how distributed training will be conducted in practice. Implementations include Gradient Sharing (SharedTrainingMaster) and Parameter Averaging (ParameterAveragingTrainingMaster).
-  * **SparkDl4jMultiLayer and SparkComputationGraph**: These are wrappers around the MultiLayerNetwork and ComputationGraph classes in DL4J that enable the functionality related to distributed training. For training, they are configured with a TrainingMaster.
-  * **```RDD<DataSet>``` and ```RDD<MultiDataSet>```**: A Spark RDD of DL4J DataSet or MultiDataSet objects defines the source of the training data (or evaluation data). Note that the recommended best practice is to preprocess your data once and save it to network storage such as HDFS. Refer to the [Deeplearning4j on Spark: How To Build Data Pipelines](deeplearning4j-scaleout-data-howto) section for more details.
-
-
-The training workflow usually proceeds as follows:
-
-1. Prepare training code with a few components:
-    * Neural network configuration
-    * Data pipeline
-    * SparkDl4jMultiLayer/SparkComputationGraph plus TrainingMaster
-2. Create an uber-JAR file (see [Spark how-to guide](deeplearning4j-scaleout-howto) for details)
-3. Determine the arguments (memory, number of nodes, etc.) for `spark-submit`
-4. Submit the uber-JAR to `spark-submit` with the required arguments
-
-
-## Minimal Examples
-The following code snippets outline the general setup required. The [API reference](deeplearning4j-scaleout-apiref) outlines detailed usage of the various classes. The user can submit an uber-JAR to `spark-submit` for execution with the right options. See [Deeplearning4j on Spark: How To Guides](deeplearning4j-scaleout-howto) for further details.
-
-
-### Gradient Sharing (Preferred Implementation)
-
-```
-JavaSparkContext sc = ...;
-JavaRDD<DataSet> trainingData = ...;
-
-//Model setup as on a single node. Either a MultiLayerConfiguration or a ComputationGraphConfiguration
-//(for a ComputationGraphConfiguration, use SparkComputationGraph instead of SparkDl4jMultiLayer)
-MultiLayerConfiguration model = ...;
-
-// Configure distributed training required for gradient sharing implementation
-VoidConfiguration conf = VoidConfiguration.builder()
-        .unicastPort(40123)             //Port that workers will use to communicate. Use any free port
-        .networkMask("10.0.0.0/16")     //Network mask for communication. Examples: 10.0.0.0/24, 192.168.0.0/16 etc.
-        .controllerAddress("10.0.2.4")  //IP of the master/driver
-        .build();
-
-//Create the TrainingMaster instance
-TrainingMaster trainingMaster = new SharedTrainingMaster.Builder(conf)
-        .batchSizePerWorker(batchSizePerWorker) //Batch size for training
-        .updatesThreshold(1e-3)                 //Update threshold for quantization/compression. See technical explanation page
-        .workersPerNode(numWorkersPerNode)      //Equal to number of GPUs. For CPUs: use 1; use > 1 for large core count CPUs
-        .meshBuildMode(MeshBuildMode.MESH)      //Or MeshBuildMode.PLAIN for < 32 nodes
-        .build();
-
-//Create the SparkDl4jMultiLayer instance
-SparkDl4jMultiLayer sparkNet = new SparkDl4jMultiLayer(sc, model, trainingMaster);
-
-//Execute training:
-for (int i = 0; i < numEpochs; i++) {
-    sparkNet.fit(trainingData);
-}
-```
-
-
-
-### Parameter Averaging Implementation
-
-```
-JavaSparkContext sc = ...;
-JavaRDD<DataSet> trainingData = ...;
-
-//Model setup as on a single node. Either a MultiLayerConfiguration or a ComputationGraphConfiguration
-//(for a ComputationGraphConfiguration, use SparkComputationGraph instead of SparkDl4jMultiLayer)
-MultiLayerConfiguration model = ...;
-
-//Create the TrainingMaster instance
-int examplesPerDataSetObject = 1;
-TrainingMaster trainingMaster = new ParameterAveragingTrainingMaster.Builder(examplesPerDataSetObject)
-        //... other configuration options, e.g. .batchSizePerWorker(...), .averagingFrequency(...)
-        .build();
-
-//Create the SparkDl4jMultiLayer instance and fit the network using the training data:
-SparkDl4jMultiLayer sparkNet = new SparkDl4jMultiLayer(sc, model, trainingMaster);
-
-//Execute training:
-for (int i = 0; i < numEpochs; i++) {
-    sparkNet.fit(trainingData);
-}
-```
-
-## Further Reading
-
-* [Deeplearning4j on Spark: Technical Explanation](deeplearning4j-scaleout-technicalref)
-* [Deeplearning4j on Spark: How To Guides](deeplearning4j-scaleout-howto)
-* [Deeplearning4j on Spark: How To Build Data Pipelines](deeplearning4j-scaleout-data-howto)
-* [Deeplearning4j on Spark: API Reference](deeplearning4j-scaleout-apiref)
-* The [Deeplearning4j examples repo](https://github.com/eclipse/deeplearning4j-examples) contains a number of Spark examples that can serve as a reference.
--- a/docs/deeplearning4j-scaleout/templates/parameter-server.md
+++ b/docs/deeplearning4j-scaleout/templates/parameter-server.md
@ -1,161 +0,0 @@
---
-title: Distributed Training with Parameter Server
-short_title: Parameter Server
-description: Deeplearning4j supports fast distributed training with Spark and a parameter server.
-category: Distributed Deep Learning
-weight: 12
---
-
-# Distributed training with gradient sharing
-
-Deeplearning4j supports distributed training in the Apache Spark environment, using [Aeron](https://github.com/real-logic/Aeron) for high-performance inter-node communication outside of Spark. The idea is relatively simple: individual workers calculate gradients on their DataSets.
-
-Before gradients are applied to the network weights, they are accumulated in an intermediate storage mechanism (one for each machine). After aggregation, updated values above some configurable threshold are propagated across the network as a sparse binary array. Values below the threshold are stored and added to future updates, hence they are not lost, but merely delayed in their communication. 
-
-This thresholding approach reduces the network communication requirements by many orders of magnitude compared to a 
-naive approach of sending the entire dense update, or parameter vector, while maintaining high accuracy. 
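
As a rough illustration of the thresholding idea, here is a simplified, single-threaded Java sketch of threshold encoding with a residual vector. It is not DL4J's encoder (which runs in native code); the class name is hypothetical, and sending index `i` as `+(i + 1)` or `-(i + 1)` is an assumption made so that index 0 can still carry a sign.

```java
import java.util.Arrays;

/** Simplified threshold encoding with a residual vector (illustrative sketch only). */
final class ThresholdEncodingSketch {

    /**
     * Adds the new update into the residual, then emits every entry whose
     * accumulated value reaches +tau or -tau. Emitted entries are quantized to
     * exactly +/-tau and that amount is removed from the residual, so the
     * remainder is delayed, not lost.
     */
    static int[] encode(float[] update, float[] residual, float tau) {
        if (update.length != residual.length) {
            throw new IllegalArgumentException("update and residual lengths differ");
        }
        int[] buffer = new int[update.length];
        int count = 0;
        for (int i = 0; i < update.length; i++) {
            residual[i] += update[i];
            if (residual[i] >= tau) {
                buffer[count++] = i + 1;    // positive index means +tau
                residual[i] -= tau;
            } else if (residual[i] <= -tau) {
                buffer[count++] = -(i + 1); // negative index means -tau
                residual[i] += tau;
            }
        }
        return Arrays.copyOf(buffer, count);
    }

    /** Adds a received message into an update accumulator: each index contributes +/-tau. */
    static void decodeInto(int[] message, float tau, float[] target) {
        for (int encoded : message) {
            int index = Math.abs(encoded) - 1;
            target[index] += encoded > 0 ? tau : -tau;
        }
    }

    public static void main(String[] args) {
        float tau = 1e-3f;
        float[] residual = new float[4];
        float[] update = {0.0015f, -0.0002f, -0.0011f, 0.0004f};
        int[] message = encode(update, residual, tau);
        System.out.println(Arrays.toString(message));  // [1, -3]
        System.out.println(Arrays.toString(residual)); // sub-threshold parts remain here
    }
}
```

Each entry that crosses the threshold is sent as exactly ±τ, so any excess above τ stays in the residual and is sent in later iterations.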
-
-For more details on the thresholding approach, see [Strom, 2015 - Scalable Distributed DNN Training using Commodity GPU Cloud Computing](http://nikkostrom.com/publications/interspeech2015/strom_interspeech2015.pdf) and [Distributed Deep Learning, Part 1: An Introduction to Distributed Training of Neural Networks](http://engineering.skymind.io/distributed-deep-learning-part-1-an-introduction-to-distributed-training-of-neural-networks).
-
-DL4J adds a few enhancements to the original algorithm proposed by Nikko Strom:
-
-* Variable threshold: If the number of updates per iteration gets too low, the threshold is automatically decreased by a configurable step value.
-* Dense bitmap encoding: If the number of updates gets too high, another encoding scheme is used, which guarantees a "maximum number of bytes" sent over the wire for any given update message.
-* Periodically, we send "shake up" messages, encoded with a significantly smaller threshold, to share delayed weights that can't get above the current threshold.
- 
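
The variable threshold can be pictured as a small controller that runs once per iteration. The sketch below is hypothetical: the target band for the fraction of encoded parameters, the additive step and the floor are illustrative parameters, not DL4J's actual defaults or algorithm.

```java
/**
 * Hypothetical variable-threshold controller: after each iteration, compare the
 * fraction of encoded parameters against a target band and step the threshold
 * down (too few updates) or up (too many). All parameters are illustrative.
 */
final class AdaptiveThresholdSketch {
    private double threshold;
    private final double step;
    private final double minFraction;
    private final double maxFraction;
    private final double floor;

    AdaptiveThresholdSketch(double initial, double step,
                            double minFraction, double maxFraction, double floor) {
        this.threshold = initial;
        this.step = step;
        this.minFraction = minFraction;
        this.maxFraction = maxFraction;
        this.floor = floor;
    }

    double threshold() {
        return threshold;
    }

    /** Call once per iteration with how many of numParams entries were encoded. */
    double update(int encodedCount, int numParams) {
        double fraction = (double) encodedCount / numParams;
        if (fraction < minFraction) {
            threshold = Math.max(floor, threshold - step); // too few updates: lower the bar
        } else if (fraction > maxFraction) {
            threshold += step;                             // too many updates: raise the bar
        }
        return threshold;
    }

    public static void main(String[] args) {
        AdaptiveThresholdSketch t = new AdaptiveThresholdSketch(1e-3, 1e-4, 1e-4, 1e-2, 1e-5);
        int numParams = 1_000_000;
        for (int encoded : new int[] {10, 50, 500, 20_000, 3_000}) {
            System.out.printf("encoded=%d -> threshold=%.1e%n", encoded, t.update(encoded, numParams));
        }
    }
}
```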
-![Two phases within the cluster](/images/guide/distributed.png)
-
-Note that using Spark entails overhead. To determine whether Spark will help you, consider using the [Performance Listener](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/optimize/listeners/PerformanceListener.java) and look at the iteration time in milliseconds.
-If it's 150 ms or less, Spark may not be worth it.
-
-## Setting up Your Cluster
-
-All you need to run training is a Spark 1.x/2.x cluster and at least one open UDP port (both inbound/outbound).
-
-### Cluster Setup
-
-As mentioned above, Deeplearning4j supports both Spark 1.x and Spark 2.x clusters. However, this particular implementation also requires Java 8+ to run. If your cluster is running Java 7, you'll either have to upgrade or use our [Parameter Averaging training mode](./deeplearning4j-spark-training).
-
-### Network Environment
-
-Gradient sharing relies heavily on the UDP protocol for communication between the master and the worker nodes during training. If you're running your cluster in a cloud environment such as AWS or Azure, you need to allow one UDP port for inbound/outbound connections, and you have to specify that port in the `VoidConfiguration.unicastPort(int)` bean that is passed to the `SharedTrainingMaster` constructor.
-
-Another option to keep in mind: if you use YARN (or any other resource manager that handles Spark networking), you'll have to specify the network mask of the network that will be used for UDP communications. This can be done with something like `VoidConfiguration.builder().networkMask("10.1.1.0/24")`.
-
-An option of last resort for IP address selection is the `DL4J_VOID_IP` environment variable. Set that variable on each node you're running, with the local IP address to be used for communication.
-
-### Netmask
-
-A network mask, in CIDR notation, is just a way to tell the software which network interfaces should be used for communication. For example, if your cluster has 3 boxes with the following IP addresses: `192.168.1.23, 192.168.1.78, 192.168.2.133`, the common part of their network address is `192.168.*`, so the netmask is `192.168.0.0/16`. For a detailed explanation of netmasks, see Wikipedia: [https://en.wikipedia.org/wiki/Subnetwork](https://en.wikipedia.org/wiki/Subnetwork)
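
To see how a netmask selects addresses, the JDK-only sketch below checks whether an IPv4 address falls inside a CIDR block: the first `prefix` bits of the address must equal those of the network address. `NetmaskSketch` is an illustrative helper, not part of DL4J, which performs its own interface selection.

```java
/** Minimal IPv4 CIDR membership check using only the JDK (illustrative helper). */
final class NetmaskSketch {

    /** Parses dotted-quad IPv4 text into a 32-bit int (no DNS lookup). */
    static int toInt(String ipv4) {
        String[] parts = ipv4.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ipv4);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Bad octet: " + part);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** True if the address lies inside the CIDR block, e.g. "192.168.0.0/16". */
    static boolean matches(String address, String cidr) {
        String[] pieces = cidr.split("/");
        if (pieces.length != 2) {
            throw new IllegalArgumentException("Expected a.b.c.d/prefix: " + cidr);
        }
        int prefix = Integer.parseInt(pieces[1]);
        if (prefix < 0 || prefix > 32) {
            throw new IllegalArgumentException("Bad prefix length: " + prefix);
        }
        // Java masks int shift distances to 5 bits (-1 << 32 == -1), so handle /0 explicitly.
        int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
        return (toInt(address) & mask) == (toInt(pieces[0]) & mask);
    }

    public static void main(String[] args) {
        for (String ip : new String[] {"192.168.1.23", "192.168.1.78", "192.168.2.133"}) {
            System.out.println(ip + " in 192.168.0.0/16: " + matches(ip, "192.168.0.0/16"));
            System.out.println(ip + " in 192.168.1.0/24: " + matches(ip, "192.168.1.0/24"));
        }
    }
}
```

With the example cluster above, all three addresses match `192.168.0.0/16`, while `192.168.2.133` does not match `192.168.1.0/24`; a mask that is too narrow would leave that node out.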
-
-Netmasks are needed when the Spark cluster runs on top of Hadoop, or in any other environment that does not announce Spark IP addresses. In such cases, a valid netmask should be provided in the `VoidConfiguration` bean, and it will be used to pick the interface for out-of-Spark communication.
-
-### Dependencies
-
-Here's the template for the only required dependency:
-
-```
-<dependency>
-    <groupId>org.deeplearning4j</groupId>
-    <artifactId>dl4j-spark-parameterserver_${scala.binary.version}</artifactId>
-    <version>${dl4j.version}</version>
-</dependency>
-```
-
-For example, with Scala 2.11:
-
-```
-<dependency>
-    <groupId>org.deeplearning4j</groupId>
-    <artifactId>dl4j-spark-parameterserver_2.11</artifactId>
-    <version>${dl4j.version}</version>
-</dependency>
-```
-
-### Example Configuration:
-
-Below is a snippet from an example project taken from [our examples repo on GitHub](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-spark-examples/dl4j-spark/src/main/java/org/deeplearning4j/mlp/MnistMLPDistributedExample.java).
-
-```
-SparkConf sparkConf = new SparkConf();
-sparkConf.setAppName("DL4J Spark Example");
-JavaSparkContext sc = new JavaSparkContext(sparkConf);
-
-MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
-            .seed(12345)
-            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
-            ...
-            .build();
-
-/*
-    This is a ParameterServer configuration bean. In most cases you'll only need .unicastPort(int);
-    add .networkMask(String) if your environment requires it (see the Netmask section above)
-*/
-VoidConfiguration voidConfiguration = VoidConfiguration.builder()
-            .unicastPort(40123)
-            .build();
-
-/*
-    SharedTrainingMaster is the foundation of distributed training. It holds all logic required for training
-*/
-TrainingMaster tm = new SharedTrainingMaster.Builder(voidConfiguration, batchSizePerWorker)
-            .updatesThreshold(1e-3)
-            .rddTrainingApproach(RDDTrainingApproach.Export)
-            .batchSizePerWorker(batchSizePerWorker)
-            .workersPerNode(4)
-            .build();
-
-//Create the Spark network
-SparkDl4jMultiLayer sparkNet = new SparkDl4jMultiLayer(sc, conf, tm);
-
-//Execute training:
-for (int i = 0; i < numEpochs; i++) {
-    sparkNet.fit(trainData);
-    log.info("Completed Epoch {}", i);
-}
-```
-**_PLEASE NOTE_**: This configuration assumes that you have UDP port 40123 open on ALL nodes within your cluster.
-
-
-## Effective Scalability
-
-Network I/O has its own cost, and this algorithm does some I/O as well. The additional overhead to training time can be calculated as `updates encoding time + message serialization time + updates application from other workers`.
-
-The longer the original iteration time, the less relative impact will come from sharing, and the better hypothetical scalability you will get.
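
A back-of-the-envelope way to reason about this is to treat the sharing overhead from the formula above as time added to every iteration. The sketch below uses that simplified model, which is an assumption rather than a DL4J formula: with per-iteration compute time t and overhead o, efficiency is t / (t + o), and N workers give roughly N · t / (t + o) times the single-worker throughput. Because message passing is asynchronous, real runs may do better than this no-overlap estimate.

```java
/** No-overlap scalability model (an assumption for illustration, not a DL4J formula). */
final class ScalabilityEstimate {

    /** Per-iteration overhead: encoding + serialization + applying others' updates. */
    static double overheadMs(double encodeMs, double serializeMs, double applyMs) {
        return encodeMs + serializeMs + applyMs;
    }

    /** Fraction of each iteration spent on useful computation (1.0 means no overhead). */
    static double efficiency(double iterationMs, double overheadMs) {
        return iterationMs / (iterationMs + overheadMs);
    }

    /** Idealized speedup over one worker when every worker pays the same overhead. */
    static double speedup(int workers, double iterationMs, double overheadMs) {
        return workers * efficiency(iterationMs, overheadMs);
    }

    public static void main(String[] args) {
        double overhead = overheadMs(2.0, 1.0, 2.0); // hypothetical timings, 5 ms in total
        for (double iterationMs : new double[] {5.0, 50.0, 500.0}) {
            System.out.printf("iteration %.0f ms -> efficiency %.2f, 16 workers ~%.1fx%n",
                    iterationMs, efficiency(iterationMs, overhead), speedup(16, iterationMs, overhead));
        }
    }
}
```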
-
-Here's a simple form that'll help you with scalability expectations:
-{% include formscalability.html %}
-
-## Performance Hints
-
-### Executors, Cores, Parallelism
-
-By design, Spark allows you to configure the number of executors and cores per executor for your task. Imagine you have a cluster of 18 nodes with 32 cores in each node.
-
-In this case, your `--num-executors` value will be 18 and the recommended `--executor-cores` value will be somewhere between 2 and 32. This option will basically define how many partitions your RDD will be split into.
-
-You can also manually set the specific number of DL4J workers that will be used on each node, via the `SharedTrainingMaster.Builder.workersPerNode(int)` method.
-
-If your nodes are GPU-powered, it's usually a very good idea to set `workersPerNode(int)` to the number of GPUs per box or to keep its default value for auto-tuning.
-
-### Encoding Threshold
-
-A higher threshold value gives you sparser updates, which will boost network I/O performance, but it might (and probably will) affect the learning performance of your neural network.
-
-A lower threshold value will give you denser updates, so each individual update message will become larger. This will degrade network I/O performance. The "best threshold value" is impossible to predict since it may vary between architectures, but a default value of `1e-3` is a good value to start with.
-
-### Network Latency vs Bandwidth
-
-The rule of thumb is simple here: the faster your network, the better your performance. A 1 GbE network should be considered the absolute minimum, but a 10 GbE network will perform better due to its higher bandwidth and lower latency.
-
-Of course, performance depends on the network size and the amount of computation. Larger networks require greater bandwidth but also require more time per iteration (hence possibly leaving more time for asynchronous communication).
-
-### UDP Unicast vs UDP Broadcast
-
-To ensure maximum compatibility (for example, with cloud computing environments such as AWS and Azure, which do not support multicast), only UDP unicast is currently utilized in DL4J. 
-
-UDP Broadcast transfers should be faster, but for training performance, the difference should not be noticeable (except perhaps for very small workloads). 
-
-By design, each worker sends one update message per iteration, and this won't change regardless of the UDP transport type. Since message retransmission in UDP unicast transport is handled by the master node (which typically has low utilization), and since message passing is asynchronous, we simply require that update communication time is less than network iteration time for performance, which is usually the case.
-
-### Multi-GPU Environments
-
-The best results are to be expected on boxes where PCIe/NVLink P2P connectivity between devices is available. However, everything will still work without P2P, just somewhat slower.
--- a/docs/deeplearning4j-scaleout/templates/technicalref.md
+++ b/docs/deeplearning4j-scaleout/templates/technicalref.md
@ -1,140 +0,0 @@
---
-title: "Deeplearning4j on Spark: Technical Explanation"
-short_title: Technical Explanation
-description: "Deeplearning4j on Spark: Technical Explanation"
-category: Distributed Deep Learning
-weight: 1
---
-
-# DL4J Distributed Training: Technical Explanation
-
-This section will cover the technical details of Deeplearning4j's Apache Spark gradient sharing training implementation. Details on the parameter averaging implementation also follow.  Note that the parameter averaging implementation has been superseded by the gradient sharing implementation as of 1.0.0-beta. This guide assumes the reader is familiar with key concepts in distributed training like data parallelism and synchronous vs asynchronous SGD. This [blog post](https://blog.skymind.ai/distributed-deep-learning-part-1-an-introduction-to-distributed-training-of-neural-networks/) can provide an introduction.
-
-* [Asynchronous SGD Implementation](#asgd)
-* [Parameter Averaging Implementation](#parameteravg)
-* [Fault Tolerance](#faulttol)
-
-## <a name="asgd">Asynchronous SGD Implementation</a>
-DL4J's asynchronous SGD implementation is based on the [Strom 2015 neural network training paper](http://nikkostrom.com/publications/interspeech2015/strom_interspeech2015.pdf) by Nikko Strom, with some modifications.
-The next section will review the key features of the Strom paper followed by another section that describes the DL4J implementation and how it differs from the paper.
-
-### Strom's Approach
-When training a neural network on a cluster, the worker machines need to communicate changes to their parameters - either by communicating the new parameter values directly (such as in parameter averaging) or by communicating gradient/update information (as in gradient sharing).
-
-The key feature of this approach is that, instead of relaying all parameters/updates across the network, only updates that are above a user-specified threshold are communicated. Put another way: we start out with an update vector (1 entry per parameter) that needs to be communicated. Instead of communicating the vector as-is, we communicate only the large elements, in a quantized way (as a sparse binary vector).
-The motivation here is to reduce the amount of network communication required - this "sparse, 1-bit binary encoding" approach can reduce the size required for communicating updates by a factor of 1000x or more - see the Strom paper for some compression statistics.
-
-Note that updates below the threshold are not discarded but accumulated in a “residual” vector to be applied later. Also of note is the absence of a centralized parameter server, which is replaced by peer-to-peer communication as indicated in the image below.
-
-![Strom's ASGD implementation](/images/guide/Strom_ASGD.svg)
-
-The update vectors, δ<sub>i,j</sub> in the image above, are:
-
-1. Sparse: only some of the gradients are communicated in each vector δ<sub>i,j</sub> (the remainder are assumed to be 0); sparse entries are encoded using an integer index
-2. Quantized to a single bit: each element of the sparse update vector takes the value +τ or −τ. This value of τ is the same for all elements of the vector, hence only a single bit is required to differentiate between the two options
-3. Integer indexes (used to identify the entries in the sparse array) are optionally compressed using entropy coding to further reduce update sizes (the author quotes a further 3x reduction at the cost of additional computation, though the benefit may not be worth the additional cost)
-
-One of the main concerns of asynchronous SGD is the issue of stale gradients. Stale gradients need not be explicitly handled in Strom's approach - in most cases, the updates are applied very quickly on each node. The paper reports a reduction in network transfers by several orders of magnitude. Given a suitably computation intensive model (like an RNN or a CNN) this drastic reduction in network communication ensures that model equivalency is maintained across all nodes and stale gradients are not an issue.
-
-However, the approach is not without its downsides, as described below:
-
-1. Strom reports that convergence can suffer in the early stages of training (using fewer compute nodes for a fraction of an epoch seems to help)
-2. Compression and quantization are not free: these processes result in extra computation time per minibatch, and a small amount of memory overhead per executor
-3. The process introduces two additional hyperparameters to consider: the value for the threshold, τ and whether to use entropy coding for the updates or not (though notably both parameter averaging and async SGD also introduce additional hyperparameters)
-
-
-### DL4J's ASGD implementation
-
-The DL4J implementation differs from Strom's approach in the following ways:
-
-1. Not point-to-point: 
-The implementation allows the user to choose between two modes of network organization: plain mode and mesh mode. Plain mode is to be used when the cluster has fewer than 32 nodes, and mesh mode for larger clusters. Refer to the section on [different modes](#modes) for more details.
-2. Two encoding schemes:
-	DL4J uses two encoding schemes, dynamically switching between the two depending on which will provide less network communication. Refer to the section on [encoding](#encoding) for more details.
-3. Quantization thresholds adjusted:
-	The quantization threshold is stepped up or down depending on the distribution of the updates after each iteration. This is done on each node independently to make sure that updates are indeed sparse. In practice, this is implemented via the [ThresholdAlgorithm](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/optimize/solvers/accumulation/encoding/ThresholdAlgorithm.java) interface and the [implementations](https://github.com/eclipse/deeplearning4j/tree/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/optimize/solvers/accumulation/encoding/threshold) there-of.
-4. Residual clipping:
-    As noted earlier, the "left over" parts of the updates (i.e., those parts not communicated) are stored in the residual vector. If the updates are much larger than the threshold, we can have a phenomenon we have termed "residual explosion": the residual values can continue to grow to many times the threshold (and hence would take many steps to communicate the gradient). To avoid this, DL4J has a [ResidualPostProcessor](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/optimize/solvers/accumulation/encoding/ResidualPostProcessor.java) interface, with the default implementation being [ResidualClippingPostProcessor](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/optimize/solvers/accumulation/encoding/residual/ResidualClippingPostProcessor.java), which clips the residual vector to a maximum of 5x the current threshold, every 5 steps.
-5. Local parallelism via ParallelWrapper: 
-  This enables multi-CPU/GPU nodes to share information faster
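
The residual clipping step described in item 4 can be sketched as follows. The 5x multiple and the every-5-iterations frequency mirror the defaults described above, but the code is an illustration of the behaviour, not the source of `ResidualClippingPostProcessor`; the class name is hypothetical.

```java
import java.util.Arrays;

/** Illustrative residual clipping: periodically clamp residual entries to +/- multiple * tau. */
final class ResidualClippingSketch {
    private final double multiple;
    private final int frequency;

    ResidualClippingSketch(double multiple, int frequency) {
        if (multiple <= 0 || frequency <= 0) {
            throw new IllegalArgumentException("multiple and frequency must be positive");
        }
        this.multiple = multiple;
        this.frequency = frequency;
    }

    /** Clips in place when iteration is a multiple of frequency; returns whether it clipped. */
    boolean process(int iteration, double tau, float[] residual) {
        if (iteration % frequency != 0) {
            return false;
        }
        float limit = (float) (multiple * tau);
        for (int i = 0; i < residual.length; i++) {
            residual[i] = Math.max(-limit, Math.min(limit, residual[i]));
        }
        return true;
    }

    public static void main(String[] args) {
        ResidualClippingSketch clipper = new ResidualClippingSketch(5.0, 5);
        float[] residual = {0.02f, -0.03f, 0.0004f}; // first two have "exploded" past 5 * 1e-3
        clipper.process(5, 1e-3, residual);
        System.out.println(Arrays.toString(residual));
    }
}
```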
-
-
-As is evident from the description, an implementation of ASGD requires updates to be transferred with every iteration of training. Further communication between workers within the cluster is a requirement in mesh mode.
-
-To enable fast out-of-Spark communication, DL4J uses [Aeron](https://github.com/real-logic/aeron/wiki). Aeron is a high-performance messaging system that can run over UDP, InfiniBand or shared memory. Aeron is designed to provide the highest throughput with the lowest and most predictable latency of any messaging system. Building our own communications stack on top of Aeron allows us to have a custom implementation of the parameter server integrated with Spark, and yet control and minimize allocations right off the wire.
-
-#### <a name="modes">Plain Mode vs Mesh Mode</a>
-
-DL4J's gradient sharing implementation can be configured in 2 ways, depending on the cluster size.
-
-Below is an image describing how plain mode is organized:
-![Plain Mode](/images/guide/plainmode.png)
-
-
-In plain mode, quantized encoded updates are relayed by each node to the master, and the master then relays them to the remaining nodes. This ensures that the master always has an up-to-date version of the model, which is necessary for fault tolerance. The master node, however, is a potential bottleneck in this implementation. To scale to larger clusters (more than about 32 nodes, though this is network and hardware specific), use mesh mode as described below.
-
-Below is an image describing how mesh mode is organized:
-![Mesh Mode](/images/guide/meshmode.png)
-
-Mesh mode is a non-binary tree with the Spark master at its root. By default, each node can have a maximum of eight child nodes, and the tree can be a maximum of five levels deep. In mesh mode, each node relays encoded updates to all nodes connected to it, and each node aggregates updates received from all other nodes connected to it. The master is therefore no longer a bottleneck, as the amount of communication it receives directly is reduced. As of the writing of this document, the implementation has been tested with unicast as well as multicast (available in 1.0.0-beta3). Future support is planned for RDMA.
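
To get a feel for the tree's size limits, the sketch below places nodes breadth-first with at most eight children per node. Both the placement policy (children of node k are 8k+1 through 8k+8) and counting the master as level 1 are assumptions for illustration; DL4J's actual mesh construction may differ. Under those assumptions, five levels hold 1 + 8 + 64 + 512 + 4096 = 4681 nodes.

```java
/** Hypothetical breadth-first mesh layout with a fixed maximum number of children. */
final class MeshTreeSketch {
    static final int MAX_CHILDREN = 8;

    /** Parent of a node under breadth-first filling; node 0 is the master (root). */
    static int parentOf(int node) {
        if (node <= 0) {
            throw new IllegalArgumentException("the root has no parent");
        }
        return (node - 1) / MAX_CHILDREN;
    }

    /** Level of a node, counting the master as level 1. */
    static int levelOf(int node) {
        int level = 1;
        while (node > 0) {
            node = parentOf(node);
            level++;
        }
        return level;
    }

    /** Number of nodes a full tree with the given branching factor and levels can hold. */
    static long capacity(int branching, int levels) {
        long total = 0;
        long levelSize = 1;
        for (int l = 0; l < levels; l++) {
            total += levelSize;
            levelSize *= branching;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("parent of node 9: " + parentOf(9));                 // 1
        System.out.println("level of node 9: " + levelOf(9));                   // 3
        System.out.println("capacity, 5 levels: " + capacity(MAX_CHILDREN, 5)); // 4681
    }
}
```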
-
-#### <a name="encoding">Encoding Schemes</a>
-Updates are sent using one of two schemes, as described below.
-
-  * Threshold encoding: Sends an array of integers, each referring to the index of a parameter. A positive integer is sent for a positive threshold and a negative integer is sent for a negative threshold.
-  * Bitmap encoding: Each parameter update is encoded with two bits. The four states are used to indicate no change, a positive threshold change, a negative threshold change, and a half-threshold change that cycles between positive and negative.
-
-Using these two kinds of encoding schemes accommodates cases when the updates are dense. Since each node has its own threshold, its value is also communicated with each transfer. Encoding is pushed down to optimized native code (C++) for the sake of performance and GPU parallelization.
-The sparse threshold (integer index) encoding can result in very high compression rates, whereas the bitmap encoding results in a fixed size 16x compression ratio (i.e., 2 bits per parameter vs. 32 bits for the original update vector).
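
The following Java sketch illustrates the bitmap side and the size comparison used to pick a scheme. It packs 2 bits per parameter into 32-bit words (16 parameters per word, hence the 16x ratio) and compares payload bytes against 4 bytes per threshold-encoded index. The state numbering is an assumption, the fourth "half threshold" state is not modelled, and message headers (such as the threshold value) are ignored.

```java
/** Illustrative 2-bit bitmap encoding plus a payload-size comparison between schemes. */
final class BitmapEncodingSketch {
    static final int NO_CHANGE = 0;
    static final int PLUS_TAU = 1;
    static final int MINUS_TAU = 2;
    // State 3 ("half threshold") is not modelled in this sketch.

    /** Packs one 2-bit state per parameter, 16 parameters per 32-bit int. */
    static int[] encode(float[] residual, float tau) {
        int[] words = new int[(residual.length + 15) / 16];
        for (int i = 0; i < residual.length; i++) {
            int state = NO_CHANGE;
            if (residual[i] >= tau) {
                state = PLUS_TAU;
                residual[i] -= tau;   // the sent amount leaves the residual
            } else if (residual[i] <= -tau) {
                state = MINUS_TAU;
                residual[i] += tau;
            }
            words[i / 16] |= state << (2 * (i % 16));
        }
        return words;
    }

    /** Reads back the 2-bit state of one parameter. */
    static int stateAt(int[] words, int index) {
        return (words[index / 16] >>> (2 * (index % 16))) & 0b11;
    }

    /** Payload bytes if each encoded entry is sent as one 32-bit integer index. */
    static long thresholdBytes(int encodedEntries) {
        return 4L * encodedEntries;
    }

    /** Payload bytes for the bitmap: 2 bits per parameter, rounded up to whole ints. */
    static long bitmapBytes(int numParams) {
        return 4L * ((numParams + 15) / 16);
    }

    /** Picks whichever scheme needs fewer payload bytes for this update. */
    static String cheaperScheme(int encodedEntries, int numParams) {
        return thresholdBytes(encodedEntries) <= bitmapBytes(numParams) ? "threshold" : "bitmap";
    }

    public static void main(String[] args) {
        int numParams = 1_000_000;
        for (int encoded : new int[] {10_000, 100_000}) {
            System.out.println(encoded + " entries -> " + cheaperScheme(encoded, numParams));
        }
    }
}
```

Under this model, with 1,000,000 parameters, 10,000 encoded entries favour threshold encoding (40,000 vs 250,000 bytes), while 100,000 entries favour the bitmap (400,000 vs 250,000 bytes).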
-
-
-## <a name="parameteravg">Parameter Averaging Implementation</a>
-The parameter averaging implementation was the first distributed training implementation in DL4J. It has since been superseded by the gradient sharing implementation described in the previous section. Details on the parameter averaging implementation are included here for the sake of completeness.
-
-The parameter averaging implementation is a synchronous SGD approach implemented entirely in Spark. DL4J's parameter averaging implementation uses a single parameter server, a role served by the Spark master node. 
-
-Parameter averaging is the conceptually simplest approach to data parallelism. It requires the user to specify the frequency at which the workers synchronize with each other and the master. With parameter averaging, training proceeds as follows:
-
-1. The master (Spark driver) starts with an initial network configuration and parameters
-2. Data is split into a number of subsets, based on the configuration of the TrainingMaster.
-3. Iterate over the data splits. For each split of the training data:
-    * (a) Distribute the configuration, parameters (and if applicable, network updater state for momentum/rmsprop/adagrad) from the master to each worker
-    * (b) Fit each worker on its portion of the split
-    * (c) Average the parameters (and if applicable, updater state) and return the averaged results to the master
-4. Training is complete, with the master having a copy of the trained network
-
-Steps 3a through 3c are demonstrated in the image below. In this diagram, W represents the parameters (weights, biases) in the neural network. Subscripts are used to index the version of the parameters over time, and where necessary for each worker machine.
-
-![Parameter Averaging](/images/guide/parameteraveraging.svg)
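
Step 3c can be illustrated with a plain element-wise mean over the parameter vectors returned by the workers (the same operation would apply to updater state). This is a sketch with equal weights and a simple loop; the actual implementation combines results with Spark's `treeAggregate` instead, and the class name is hypothetical.

```java
import java.util.Arrays;

/** Illustrative element-wise parameter averaging across workers (equal weights). */
final class ParameterAveragingSketch {

    /** Element-wise mean of the workers' parameter vectors. */
    static double[] average(double[][] workerParams) {
        if (workerParams.length == 0) {
            throw new IllegalArgumentException("no worker results");
        }
        int n = workerParams[0].length;
        double[] result = new double[n];
        for (double[] params : workerParams) {
            if (params.length != n) {
                throw new IllegalArgumentException("parameter vectors differ in length");
            }
            for (int i = 0; i < n; i++) {
                result[i] += params[i];
            }
        }
        for (int i = 0; i < n; i++) {
            result[i] /= workerParams.length;
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] fromWorkers = {{1, 2}, {3, 4}, {5, 9}}; // toy W after each worker's fit
        System.out.println(Arrays.toString(average(fromWorkers))); // [3.0, 5.0]
    }
}
```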
-
-The implementation uses Spark's `treeAggregate` under the hood. There are a number of enhancements that could be made to this implementation that would result in faster training times. Even with these enhancements in place, the asynchronous SGD approach with quantized compressed updates is expected to remain much faster. Therefore, users are strongly encouraged to switch from the parameter averaging implementation to the asynchronous SGD gradient sharing approach.
-
-
-## <a name="faulttol">Fault Tolerance</a>
-
-Spark implementations of distributed training in DL4J are fault tolerant as of 1.0.0-beta3.
-The parameter averaging implementation has always been fault tolerant; the gradient sharing implementation was made fully fault tolerant after (not including) 1.0.0-beta2.
-
-Before going into the details of the implementation, let us first consider what happens when a node goes down. Since Spark is unaware of the updates sent via Aeron, the RDD lineage tracks back to the initial parameter and optimizer state. When Spark restores a node in place of one that went down, it will therefore resume training from its initial state. In other words, the restored node will be out of sync with the other nodes, and this will cause training to diverge.
-
-DL4J's gradient sharing implementation utilizes its own internal heartbeat mechanism outside of Spark to detect when a node goes down, as well as to detect when a recovered node comes online. To ensure that training continues without diverging, the restored node must resume training with a copy of the model identical to that on the other nodes at the current point. To ensure that updates are not applied multiple times, each update is tagged with a unique ID. The state of the updater/optimizer (RMSProp, AdaGrad etc.) as well as the iteration/epoch number are also required for network training to proceed from the state prior to the node failure.
-
-The following outlines what happens when a node goes down in plain mode and is restored:
-
-1. The restored node reconnects to the master node
-2. The restored node starts receiving updates and then sends a request for parameters, updater state and current epoch/iteration
-3. The master fulfils these requests (by itself or by proxy)
-4. The restored node applies ONLY relevant updates (relative to the parameter vector)
-5. Training continues on the RDD data on the new node, properly in sync with other nodes and properly converging
-
-Requesting a copy of the model after the node has started receiving updates makes sure that updates are not missed. Updates are tagged by unique IDs, and no update will be incorrectly applied twice. Since the master does not do any training, it does not hold the updater state. When it receives a request for the updater/optimizer state, it forwards the request to one of the other nodes, which then sends its updater state to the restored node.
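
The ID-based filtering can be pictured with the hypothetical sketch below: the restored node starts from the parameter snapshot it received, remembers which update IDs that snapshot already includes, and skips any buffered or incoming update with one of those IDs. The `String` IDs and the `Update` class are illustrative, not DL4J types.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical restored node that applies each uniquely tagged update at most once. */
final class UpdateDeduplicationSketch {

    /** An update message tagged with a unique ID (illustrative type). */
    static final class Update {
        final String id;
        final double[] delta;

        Update(String id, double[] delta) {
            this.id = id;
            this.delta = delta;
        }
    }

    private final Set<String> applied = new HashSet<>();
    private final double[] params;

    /** Starts from a received parameter snapshot and the IDs already reflected in it. */
    UpdateDeduplicationSketch(double[] snapshot, Set<String> idsInSnapshot) {
        this.params = snapshot.clone();
        this.applied.addAll(idsInSnapshot);
    }

    /** Applies the update unless its ID was seen before; returns whether it was applied. */
    boolean apply(Update update) {
        if (update.delta.length != params.length) {
            throw new IllegalArgumentException("update length mismatch");
        }
        if (!applied.add(update.id)) {
            return false; // duplicate: already part of params
        }
        for (int i = 0; i < params.length; i++) {
            params[i] += update.delta[i];
        }
        return true;
    }

    double[] params() {
        return params.clone();
    }

    public static void main(String[] args) {
        Set<String> inSnapshot = new HashSet<>(Arrays.asList("u1", "u2"));
        UpdateDeduplicationSketch node = new UpdateDeduplicationSketch(new double[] {1.0, 1.0}, inSnapshot);
        System.out.println(node.apply(new Update("u2", new double[] {0.5, 0.5})));  // false: already included
        System.out.println(node.apply(new Update("u3", new double[] {0.5, -0.5}))); // true
        System.out.println(Arrays.toString(node.params()));                         // [1.5, 0.5]
    }
}
```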
-
-The only additional step in mesh mode when a node fails is to remap the descendants of the failed node. In this case, one descendant of the failed node is mapped to the master, and all the remaining descendants are mapped to the one that was mapped to the master.
-
-Concretely, with the tree structure below, if node 2 fails, node 5 is mapped to the master, and nodes 6 and 7 are mapped to node 5.
-
-![Node Failure](/images/guide/nodefailure.png)
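
The remapping rule can be expressed over a child-to-parent map, as in the sketch below. Which child is promoted is not specified above, so picking the lowest node ID is an assumption; grandchildren keep their existing parents. `MeshRemapSketch` is a hypothetical helper, not DL4J code, and the toy tree in `main` is only consistent with the example, not a copy of the figure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical remapping of a failed node's children in a child-to-parent map. */
final class MeshRemapSketch {
    static final int MASTER = 0;

    /**
     * Returns a new map with the failed node removed: its lowest-ID child is
     * re-parented to the master, and its other children to that promoted node.
     */
    static Map<Integer, Integer> remap(Map<Integer, Integer> parentOf, int failed) {
        if (failed == MASTER) {
            throw new IllegalArgumentException("master failure cannot be recovered by remapping");
        }
        List<Integer> children = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry : parentOf.entrySet()) {
            if (entry.getValue() == failed) {
                children.add(entry.getKey());
            }
        }
        Collections.sort(children);
        Map<Integer, Integer> result = new HashMap<>(parentOf);
        result.remove(failed);
        if (!children.isEmpty()) {
            int promoted = children.get(0);
            result.put(promoted, MASTER);
            for (int i = 1; i < children.size(); i++) {
                result.put(children.get(i), promoted);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> tree = new TreeMap<>();
        tree.put(1, MASTER);
        tree.put(2, MASTER);
        tree.put(3, MASTER);
        tree.put(5, 2);
        tree.put(6, 2);
        tree.put(7, 2);
        System.out.println(new TreeMap<>(remap(tree, 2))); // {1=0, 3=0, 5=0, 6=5, 7=5}
    }
}
```

For the example in the text, removing node 2 maps node 5 to the master and nodes 6 and 7 to node 5.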
-
-
-The decision to remap to the master instead of to neighboring nodes was made because the master is assumed to be the most reliable option. Requests for a copy of the model (and related state) are also made to the master for the same reason. Note that, as with any Spark job, distributed neural network training with DL4J cannot withstand the master node failing. For this reason, users are advised to persist the state of the model frequently; if the master fails, training can then be restarted from the latest saved state.
-
-Limitations of fault tolerance: there are two main limitations of fault tolerance for the gradient sharing implementation.
-
-1. A small amount of data (a few minibatches) may be processed multiple times, because a failed node may process part of a partition (sending out updates) before failing. This is not a problem in practice: the number of duplicated minibatches is usually very small, and training typically runs for multiple epochs anyway, so each example is already seen multiple times.
-2. The master/driver node is a single point of failure. This is essentially a Spark limitation: DL4J could in principle implement functionality to recover from a failed master and continue training, but Apache Spark does not support fault tolerance for the master node.
-
--- a/docs/deeplearning4j-zoo/README.md
+++ b/docs/deeplearning4j-zoo/README.md
@ -1,16 +0,0 @@
-# deeplearning4j-zoo documentation
-
-Build and serve documentation for deeplearning4j-zoo with MkDocs (install with `pip install mkdocs`).
-The generated documentation sources are in this directory under `doc_sources/`.
-
-The structure of this project (template files, generating code, mkdocs YAML) is closely aligned
-with the [Keras documentation](https://keras.io) and heavily inspired by the [Keras docs repository](https://github.com/keras-team/keras/tree/master/docs).
-
-To generate docs into the `deeplearning4j-zoo/doc_sources` folder, first `cd docs` then run:
-
-```shell
-python generate_docs.py \
-    --project deeplearning4j-zoo \
-    --code ../deeplearning4j \
-    --out_language en
-```
--- a/docs/deeplearning4j-zoo/pages.json
+++ b/docs/deeplearning4j-zoo/pages.json
@ -1,35 +0,0 @@
-{
-  "excludes": [
-    "abstract"
-  ],
-  "indices": [
-  ],
-  "pages": [
-    {
-      "page": "overview.md",
-      "class": []
-    },
-    {
-      "page": "models.md",
-      "class": [
-        "deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/AlexNet.java",
-        "deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/Darknet19.java",
-        "deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/FaceNetNN4Small2.java",
-        "deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/InceptionResNetV1.java",
-        "deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/LeNet.java",
-        "deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/NASNet.java",
-        "deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/ResNet50.java",
-        "deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/SimpleCNN.java",
-        "deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/SqueezeNet.java",
-        "deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/TextGenerationLSTM.java",
-        "deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/TinyYOLO.java",
-        "deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/UNet.java",
-        "deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/VGG16.java",
-        "deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/VGG19.java",
-        "deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/Xception.java",
-        "deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/YOLO2.java"
-      ]
-    }
-  ]
-}
-
--- a/docs/deeplearning4j-zoo/templates/models.md
+++ b/docs/deeplearning4j-zoo/templates/models.md
@ -1,11 +0,0 @@
---
-title: Deeplearning4j Zoo Models
-short_title: Zoo Models
-description: Prebuilt model architectures and weights for out-of-the-box application.
-category: Models
-weight: 10
---
-
-## Available models
-
-{{autogenerated}}
--- a/docs/deeplearning4j-zoo/templates/overview.md
+++ b/docs/deeplearning4j-zoo/templates/overview.md
@ -1,131 +0,0 @@
---
-title: Deeplearning4j Model Zoo
-short_title: Zoo Usage
-description: Prebuilt model architectures and weights for out-of-the-box application.
-category: Models
-weight: 10
---
-
-## About the Deeplearning4j model zoo
-
-Deeplearning4j has a native model zoo that can be accessed and instantiated directly from DL4J. The model zoo also includes pretrained weights for different datasets; these are downloaded automatically and checked for integrity using a checksum.
-
-To use the model zoo, add it as a dependency. In a Maven POM, add the following:
-
-```xml
-<dependency>
-    <groupId>org.deeplearning4j</groupId>
-    <artifactId>deeplearning4j-zoo</artifactId>
-    <version>{{ page.version }}</version>
-</dependency>
-```
-
-## Getting started
-
-Once you've successfully added the zoo dependency to your project, you can start to import and use models. Each model extends the `ZooModel` abstract class and uses the `InstantiableModel` interface. These classes provide methods that help you initialize either an empty, fresh network or a pretrained network.
-
-### Initializing fresh configurations
-
-You can instantiate a model from the zoo using the `.init()` method. For example, to instantiate a fresh, untrained AlexNet network, use the following code:
-
-```java
-import org.deeplearning4j.nn.api.Model;
-import org.deeplearning4j.zoo.*;
-import org.deeplearning4j.zoo.model.AlexNet;
-
-...
-
-int numberOfClassesInYourData = 1000;
-int randomSeed = 123;
-
-ZooModel zooModel = AlexNet.builder()
-                .numClasses(numberOfClassesInYourData)
-                .seed(randomSeed)
-                .build();
-Model net = zooModel.init();
-```
-
-To tune parameters or change the optimization algorithm, obtain a reference to the underlying network configuration:
-
-```java
-import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
-
-ZooModel zooModel = AlexNet.builder()
-                .numClasses(numberOfClassesInYourData)
-                .seed(randomSeed)
-                .build();
-MultiLayerConfiguration conf = ((AlexNet) zooModel).conf();
-```
-
-### Initializing pretrained weights
-
-Some models have pretrained weights available, and a small number of models are pretrained on several different datasets. `PretrainedType` is an enum that lists the different weight types, including `IMAGENET`, `MNIST`, `CIFAR10`, and `VGGFACE`.
-
-For example, you can initialize a VGG-16 model with ImageNet weights like so:
-
-```java
-import org.deeplearning4j.nn.api.Model;
-import org.deeplearning4j.zoo.*;
-import org.deeplearning4j.zoo.model.VGG16;
-
-...
-
-ZooModel zooModel = VGG16.builder().build();
-Model net = zooModel.initPretrained(PretrainedType.IMAGENET);
-```
-
-And initialize another VGG16 model with weights trained on VGGFace:
-
-```java
-ZooModel zooModel = VGG16.builder().build();
-Model net = zooModel.initPretrained(PretrainedType.VGGFACE);
-```
-
-If you're not sure whether a model has pretrained weights, call its `.pretrainedAvailable()` method with a `PretrainedType` value; it returns `true` if weights of that type are available.
-
-Note that for convolutional models, input shape information follows the NCHW (batch, channels, height, width) convention, with the batch dimension omitted. So if a model's default input shape is `new int[]{3, 224, 224}`, the model expects 3 channels and a height and width of 224.
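-
-To make the convention concrete, here is a small plain-Java sketch. The class `NchwShapes` is hypothetical, not part of DL4J: it prepends a batch size to a zoo-style `{channels, height, width}` shape and computes the flat offset of one element, assuming row-major (C-order) storage.
-
-```java
-import java.util.Arrays;
-
-// Hypothetical helper for illustrating NCHW shapes; not part of DL4J.
-public class NchwShapes {
-
-    /** Prepends the batch dimension N to a zoo-style {C, H, W} input shape. */
-    public static long[] withBatch(int batchSize, int[] chw) {
-        return new long[]{batchSize, chw[0], chw[1], chw[2]};
-    }
-
-    /** Total number of values in an N x C x H x W tensor. */
-    public static long elementCount(long[] nchw) {
-        long count = 1;
-        for (long d : nchw) {
-            count *= d;
-        }
-        return count;
-    }
-
-    /** Flat offset of element (n, c, h, w), assuming row-major (C-order) storage. */
-    public static long offset(long[] nchw, long n, long c, long h, long w) {
-        long channels = nchw[1];
-        long height = nchw[2];
-        long width = nchw[3];
-        return ((n * channels + c) * height + h) * width + w;
-    }
-
-    public static void main(String[] args) {
-        long[] shape = withBatch(8, new int[]{3, 224, 224});
-        System.out.println(Arrays.toString(shape));    // [8, 3, 224, 224]
-        System.out.println(elementCount(shape));       // 1204224
-        System.out.println(offset(shape, 0, 1, 0, 0)); // 50176, the first value of channel 1
-    }
-}
-```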
-
-
-
-## What's in the zoo?
-
-The model zoo includes image recognition configurations that are well known in the deep learning community. It also includes an LSTM for text generation and a simple CNN for general image recognition.
-
-You can find a complete list of models at this [deeplearning4j-zoo GitHub link](https://github.com/eclipse/deeplearning4j/tree/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model).
-
-This includes image classification models such as VGG-16, ResNet-50, AlexNet, Inception-ResNet-v1, LeNet, and more.
-
-* [AlexNet](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/AlexNet.java)	
-* [Darknet19](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/Darknet19.java)	
-* [FaceNetNN4Small2](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/FaceNetNN4Small2.java)	
-* [InceptionResNetV1](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/InceptionResNetV1.java)	
-* [LeNet](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/LeNet.java)
-* [ResNet50](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/ResNet50.java)
-* [SimpleCNN](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/SimpleCNN.java)
-* [TextGenerationLSTM](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/TextGenerationLSTM.java)
-* [TinyYOLO](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/TinyYOLO.java)
-* [VGG16](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/VGG16.java)	
-* [VGG19](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/VGG19.java)
-
-## Advanced usage
-
-The zoo comes with a couple of additional features if you're looking to use the models for different use cases.
-
-### Changing Inputs
-
-Aside from passing configuration information to a zoo model's builder, you can also change its input shape using `.setInputShape()`. NOTE: this applies to fresh configurations only and does not affect pretrained models:
-
-```java
-import org.deeplearning4j.zoo.ZooModel;
-import org.deeplearning4j.zoo.model.ResNet50;
-
-int numberOfClassesInYourData = 10;
-int randomSeed = 123;
-
-ZooModel zooModel = ResNet50.builder()
-        .numClasses(numberOfClassesInYourData)
-        .seed(randomSeed)
-        .build();
-zooModel.setInputShape(new int[][]{{3, 28, 28}});
-```
-
-### Transfer Learning
-
-Pretrained models are perfect for transfer learning! You can read more about transfer learning using DL4J [here](./deeplearning4j-nn-transfer-learning).
-
-### Workspaces
-
-Initialization methods often have an additional parameter named `workspaceMode`. For the majority of users you will not need to use this; however, if you have a large machine that has "beefy" specifications, you can pass `WorkspaceMode.SINGLE` for models such as VGG-19 that have many millions of parameters. To learn more about workspaces, please see [this section](./deeplearning4j-config-workspaces).
--- a/docs/deeplearning4j/README.md
+++ b/docs/deeplearning4j/README.md
@ -1,10 +0,0 @@
-# deeplearning4j documentation
-
-To generate docs into the `deeplearning4j/doc_sources` folder, first `cd docs` then run:
-
-```shell
-python generate_docs.py \
-    --project deeplearning4j \
-    --code ../deeplearning4j \
-    --out_language en
-```
--- a/docs/deeplearning4j/pages.json
+++ b/docs/deeplearning4j/pages.json
@ -1,78 +0,0 @@
-{
-  "excludes": [
-    "abstract"
-  ],
-  "indices": [
-  ],
-  "pages": [
-    {
-      "page": "quickstart.md",
-      "class": []
-    },
-    {
-      "page": "examples-tour.md",
-      "class": []
-    },
-    {
-      "page": "cheat-sheet.md",
-      "class": []
-    },
-    {
-      "page": "android.md",
-      "class": []
-    },
-    {
-      "page": "android-prerequisites.md",
-      "class": []
-    },
-    {
-      "page": "android-linear-classifier.md",
-      "class": []
-    },
-    {
-      "page": "android-image-classification.md",
-      "class": []
-    },
-    {
-      "page": "beginners.md",
-      "class": []
-    },
-    {
-      "page": "benchmark.md",
-      "class": []
-    },
-    {
-      "page": "build-from-source.md",
-      "class": []
-    },
-    {
-      "page": "concepts.md",
-      "class": []
-    },
-    {
-      "page": "contribute.md",
-      "class": []
-    },
-    {
-      "page": "config-buildtools.md",
-      "class": []
-    },
-    {
-      "page": "config-maven.md",
-      "class": []
-    },
-    {
-      "page": "config-memory.md",
-      "class": []
-    },
-    {
-      "page": "config-workspaces.md",
-      "class": []
-    },
-    {
-      "page": "troubleshooting-training.md",
-      "class": []
-    }
-  ]
-}
-
--- a/docs/deeplearning4j/templates/android-image-classification.md
+++ b/docs/deeplearning4j/templates/android-image-classification.md
@ -1,395 +0,0 @@
---
-title: Using DL4J for Android Image Classification
-short_title: Android Image Classifier
-description: How to create an Android Image Classification app with Eclipse Deeplearning4j.
-category: Mobile
-weight: 3
---
-
-## Using Deeplearning4J in Android Applications
-
-Contents
-
-* [Setting the Dependencies](#head_link1)
-* [Training and loading the MNIST model in the Android project resources](#head_link2)
-* [Accessing the trained model using an AsyncTask](#head_link7)
-* [Handling images from user input](#head_link3)
-* [Updating the UI](#head_link5)
-* [Conclusion](#head_link6)
-
-
-## DL4JImageRecognitionDemo
-
-This example application uses a neural network trained on the standard MNIST dataset of 28x28 grayscale images (pixel values 0-255) of handwritten digits 0-9. The application's user interface lets the user draw a digit on the device screen, which is then tested against the trained network. The output displays the most probable digit and its probability score. This tutorial covers the use of a trained neural network in an Android application, the handling of user-generated images, and the output of the results to the UI from a background thread. More information on general prerequisites for building DL4J Android applications can be found [here](./deeplearning4j-android-prerequisites).
-
-![](/images/guide/screen2.png)
-
-
-## <a name="head_link1">Setting the Dependencies</a>
-
-Deeplearning4j applications require application-specific dependencies in the build.gradle file. Deeplearning4j in turn depends on ND4J and OpenBLAS, so these libraries must also be added to the dependencies declaration. Starting with Android Studio 3.0, annotation processors must also be declared (see the `annotationProcessor` line below). Include the native dependencies for the -x86 or -arm processors, depending on your device; both can be included without conflict, as is done in the example app.
-
-```groovy
-implementation (group: 'org.deeplearning4j', name: 'deeplearning4j-core', version: '{{page.version}}') {
-    exclude group: 'org.bytedeco', module: 'opencv-platform'
-    exclude group: 'org.bytedeco', module: 'leptonica-platform'
-    exclude group: 'org.bytedeco', module: 'hdf5-platform'
-    exclude group: 'org.nd4j', module: 'nd4j-base64'
-}
-implementation group: 'org.nd4j', name: 'nd4j-native', version: '{{page.version}}'
-implementation group: 'org.nd4j', name: 'nd4j-native', version: '{{page.version}}', classifier: "android-arm"
-implementation group: 'org.nd4j', name: 'nd4j-native', version: '{{page.version}}', classifier: "android-arm64"
-implementation group: 'org.nd4j', name: 'nd4j-native', version: '{{page.version}}', classifier: "android-x86"
-implementation group: 'org.nd4j', name: 'nd4j-native', version: '{{page.version}}', classifier: "android-x86_64"
-implementation group: 'org.bytedeco', name: 'openblas', version: '0.3.9-1.5.3'
-implementation group: 'org.bytedeco', name: 'openblas', version: '0.3.9-1.5.3', classifier: "android-arm"
-implementation group: 'org.bytedeco', name: 'openblas', version: '0.3.9-1.5.3', classifier: "android-arm64"
-implementation group: 'org.bytedeco', name: 'openblas', version: '0.3.9-1.5.3', classifier: "android-x86"
-implementation group: 'org.bytedeco', name: 'openblas', version: '0.3.9-1.5.3', classifier: "android-x86_64"
-implementation group: 'org.bytedeco', name: 'opencv', version: '4.3.0-1.5.3'
-implementation group: 'org.bytedeco', name: 'opencv', version: '4.3.0-1.5.3', classifier: "android-arm"
-implementation group: 'org.bytedeco', name: 'opencv', version: '4.3.0-1.5.3', classifier: "android-arm64"
-implementation group: 'org.bytedeco', name: 'opencv', version: '4.3.0-1.5.3', classifier: "android-x86"
-implementation group: 'org.bytedeco', name: 'opencv', version: '4.3.0-1.5.3', classifier: "android-x86_64"
-implementation group: 'org.bytedeco', name: 'leptonica', version: '1.79.0-1.5.3'
-implementation group: 'org.bytedeco', name: 'leptonica', version: '1.79.0-1.5.3', classifier: "android-arm"
-implementation group: 'org.bytedeco', name: 'leptonica', version: '1.79.0-1.5.3', classifier: "android-arm64"
-implementation group: 'org.bytedeco', name: 'leptonica', version: '1.79.0-1.5.3', classifier: "android-x86"
-implementation group: 'org.bytedeco', name: 'leptonica', version: '1.79.0-1.5.3', classifier: "android-x86_64"
-
-implementation 'com.google.code.gson:gson:2.8.2'
-annotationProcessor 'org.projectlombok:lombok:1.16.16'
-
-//This corrects for a junit version conflict.
-configurations.all {
-    resolutionStrategy.force 'junit:junit:4.12'
-}
-```
-
-These dependencies contain a very large number of methods, more than fit in a single DEX file, so it is necessary to set `multiDexEnabled` to true in `defaultConfig`:
-```groovy
-multiDexEnabled true
-```
-Finally, a conflict in the junit module versions produces the following error:
-
-> Conflict with dependency 'junit:junit' in project ':app'. Resolved versions for app (4.8.2) and test app (4.12) differ.
-
-This can be resolved by forcing all of the junit modules to use the same version (this block is already included in the dependency listing above):
-```groovy
-configurations.all {
-    resolutionStrategy.force 'junit:junit:4.12'
-}
-```
-
-## <a name="head_link2">Training and loading the MNIST model in the Android project resources</a>
-
-Using a neural network requires a significant amount of processing power, which is in limited supply on mobile devices. Therefore, loading the trained neural network and testing the user-drawn image should happen on a background thread, here using an AsyncTask. In this application, the canvas drawing code runs on the main thread, and an AsyncTask loads the drawn image from internal storage and tests it against the trained model on a background thread. First, let's look at how to save the trained neural network we will be using in the application.
-
-Begin by following the Deeplearning4j quickstart [guide](./deeplearning4j-quickstart) to set up, train, and save neural network models on a desktop computer. The DL4J example that trains and saves the MNIST model used in this application is *MnistImagePipelineExampleSave.java*, which is included in the quickstart guide referenced above. The code for the MNIST demo is also available [here](https://gist.github.com/tomthetrainer/7cb2fbc14a5c631a567a98c3134f7dd6). Running this demo trains the MNIST neural network model and saves it as *"trained_mnist_model.zip"* in the *dl4j\target* folder of the *dl4j-examples* directory. Copy the file into the raw resource folder (`res/raw`) of your Android project.
-
-![](/images/guide/rawFolder.PNG)
-
-## <a name="head_link7">Accessing the trained model using an AsyncTask</a>
-
-Now let's write our AsyncTask<*Params*, *Progress*, *Result*> to load and use the neural network on a background thread. The AsyncTask uses the parameter types <String, Integer, INDArray>. The *Params* type is String: it passes the path of the saved image to the AsyncTask when it is executed, and the doInBackground() method uses that path to locate the image. The trained MNIST model itself is loaded from the raw resources. The *Result* type is INDArray: it stores the output of the neural network and passes it to the onPostExecute() method, which runs on the main thread and can update the UI. For more on NDArrays, see https://nd4j.org/userguide. We will also override two more AsyncTask methods, onProgressUpdate() and onPostExecute(), later in the demo.
-```java
-private class AsyncTaskRunner extends AsyncTask<String, Integer, INDArray> {
-
-        // Runs in UI before background thread is called. 
-        @Override
-        protected void onPreExecute() {
-            super.onPreExecute();
-        }
-
-        @Override
-        protected INDArray doInBackground(String... params) {
-            // Main background thread: this loads the model and tests the input image.
-            // The dimensions of the images are set here.
-            int height = 28;
-            int width = 28;
-            int channels = 1;
-
-            //Now we load the model from the raw folder with a try / catch block
-            try {
-                // Load the pretrained network.
-                InputStream inputStream = getResources().openRawResource(R.raw.trained_mnist_model);
-                MultiLayerNetwork model = ModelSerializer.restoreMultiLayerNetwork(inputStream);
-
-                // Load the image file to test; params[0] is the directory path passed to execute().
-                File f = new File(params[0], "drawn_image.jpg");
-
-                //Use the nativeImageLoader to convert to numerical matrix
-                NativeImageLoader loader = new NativeImageLoader(height, width, channels);
-
-                //put image into INDArray
-                INDArray image = loader.asMatrix(f);
-
-                // Scale pixel values from 0..255 to 0..1.
-                DataNormalization scaler = new ImagePreProcessingScaler(0, 1);
-
-                // Apply the scaler to the image in place.
-                scaler.transform(image);
-
-                //pass through neural net and store it in output array
-                output = model.output(image);
-
-            } catch (IOException e) {
-                e.printStackTrace();
-            }
-            return output;
-        }
-```
-
-## <a name="head_link3">Handling images from user input</a>
- 
-Now let's add the code for the drawing canvas, which runs on the main thread and lets the user draw a digit on the screen. This is a generic drawing view written as an inner class within MainActivity. It extends View and overrides a series of methods. In the onTouchEvent() case for *MotionEvent.ACTION_UP*, the drawing is saved to internal storage and the AsyncTask is executed with the image path passed to it. As a result, the app returns results for an image automatically as soon as the user finishes drawing.
-```java
-//code for the drawing input
-    public class DrawingView extends View {
-
-        private Path    mPath;
-        private Paint   mBitmapPaint;
-        private Paint   mPaint;
-        private Bitmap  mBitmap;
-        private Canvas  mCanvas;
-
-        public DrawingView(Context c) {
-            super(c);
-
-            mPath = new Path();
-            mBitmapPaint = new Paint(Paint.DITHER_FLAG);
-            mPaint = new Paint();
-            mPaint.setAntiAlias(true);
-            mPaint.setStrokeJoin(Paint.Join.ROUND);
-            mPaint.setStrokeCap(Paint.Cap.ROUND);
-            mPaint.setStrokeWidth(60);
-            mPaint.setDither(true);
-            mPaint.setColor(Color.WHITE);
-            mPaint.setStyle(Paint.Style.STROKE);
-        }
-
-        @Override
-        protected void onSizeChanged(int W, int H, int oldW, int oldH) {
-            super.onSizeChanged(W, H, oldW, oldH);
-            mBitmap = Bitmap.createBitmap(W, H, Bitmap.Config.ARGB_4444);
-            mCanvas = new Canvas(mBitmap);
-        }
-
-        @Override
-        protected void onDraw(Canvas canvas) {
-            canvas.drawBitmap(mBitmap, 0, 0, mBitmapPaint);
-            canvas.drawPath(mPath, mPaint);
-        }
-
-        private float mX, mY;
-        private static final float TOUCH_TOLERANCE = 4;
-
-        private void touch_start(float x, float y) {
-            mPath.reset();
-            mPath.moveTo(x, y);
-            mX = x;
-            mY = y;
-        }
-        private void touch_move(float x, float y) {
-            float dx = Math.abs(x - mX);
-            float dy = Math.abs(y - mY);
-            if (dx >= TOUCH_TOLERANCE || dy >= TOUCH_TOLERANCE) {
-                mPath.quadTo(mX, mY, (x + mX)/2, (y + mY)/2);
-                mX = x;
-                mY = y;
-            }
-        }
-        private void touch_up() {
-            mPath.lineTo(mX, mY);
-            mCanvas.drawPath(mPath, mPaint);
-            mPath.reset();
-        }
-
-        @Override
-        public boolean onTouchEvent(MotionEvent event) {
-            float x = event.getX();
-            float y = event.getY();
-
-            switch (event.getAction()) {
-                case MotionEvent.ACTION_DOWN:
-                    invalidate();
-                    clear();
-                    touch_start(x, y);
-                    invalidate();
-                    break;
-                case MotionEvent.ACTION_MOVE:
-                    touch_move(x, y);
-                    invalidate();
-                    break;
-                case MotionEvent.ACTION_UP:
-                    touch_up();
-                    absolutePath = saveDrawing();
-                    invalidate();
-                    clear();
-                    loadImageFromStorage(absolutePath);
-                    onProgressBar();
-                    //launch the asyncTask now that the image has been saved
-                    AsyncTaskRunner runner = new AsyncTaskRunner();
-                    runner.execute(absolutePath);
-                    break;
-
-            }
-            return true;
-        }
-
-        public void clear(){
-            mBitmap.eraseColor(Color.TRANSPARENT);
-            invalidate();
-            System.gc();
-        }
-
-    }
-
-```
-Now we need to build a series of helper methods. First we write the saveDrawing() method. It uses getDrawingCache() to retrieve the drawing from drawingView as a bitmap. We then create a directory and a file for the bitmap called "drawn_image.jpg". Finally, a FileOutputStream inside a try / catch block writes the bitmap to that location. The method returns the absolute path of the directory containing the file; this path is used by the loadImageFromStorage() method and passed to the AsyncTask.
-```java
-public String saveDrawing(){
-        drawingView.setDrawingCacheEnabled(true);
-        Bitmap b = drawingView.getDrawingCache();
-
-        ContextWrapper cw = new ContextWrapper(getApplicationContext());
-        // set the path to storage
-        File directory = cw.getDir("imageDir", Context.MODE_PRIVATE);
-        // Create imageDir and store the file there. Each new drawing will overwrite the previous
-        File mypath=new File(directory,"drawn_image.jpg");
-
-        // Use a FileOutputStream in a try-with-resources block so it is always closed
-        try (FileOutputStream fos = new FileOutputStream(mypath)) {
-            b.compress(Bitmap.CompressFormat.JPEG, 100, fos);
-        } catch (IOException e) {
-            e.printStackTrace();
-        }
-        return directory.getAbsolutePath();
-    }
-```
-
-Next we will write the loadImageFromStorage method which will use the absolute path returned from saveDrawing() to load the saved image and display it in the UI as part of the output display. It uses a try / catch block and a FileInputStream to set the image to the ImageView *img* in the UI layout.
-
-```java
-    private void loadImageFromStorage(String path)
-    {
-
-        // Use a FileInputStream in a try-with-resources block so it is always closed
-        File f = new File(path, "drawn_image.jpg");
-        try (FileInputStream fis = new FileInputStream(f)) {
-            Bitmap b = BitmapFactory.decodeStream(fis);
-            ImageView img = (ImageView) findViewById(R.id.outputView);
-            img.setImageBitmap(b);
-        } catch (IOException e) {
-            e.printStackTrace();
-        }
-
-    }
-```
-
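-We also need two helper methods that extract the results from the neural network output: one returns the confidence score (the largest value in the output array) and the other returns the predicted digit (the index of that value). We will call them later when we complete the AsyncTask.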
-
-```java
-    // Helper method to return the largest value in the output array
-    public static double arrayMaximum(double[] arr) {
-        double max = Double.NEGATIVE_INFINITY;
-        for (double cur : arr)
-            max = Math.max(max, cur);
-        return max;
-    }
-
-    // Helper method to find the index (and therefore the digit) of the largest confidence score
-    public int getIndexOfLargestValue(double[] array) {
-        if (array == null || array.length == 0) return -1;
-        int largest = 0;
-        for (int i = 1; i < array.length; i++) {
-            if (array[i] > array[largest]) largest = i;
-        }
-        return largest;
-    }
-```
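-
-The two helpers scan the array twice. As a design alternative, a single pass can return both values at once. The sketch below is plain Java with a hypothetical `Argmax` class and made-up probabilities; because it uses a strict `>` comparison, ties resolve to the lowest digit, matching `getIndexOfLargestValue`.
-
-```java
-// Hypothetical single-pass alternative to the two helpers above; not part of the demo app.
-public final class Argmax {
-
-    /** The predicted digit and its probability. */
-    public static final class Result {
-        public final int index;
-        public final double value;
-
-        Result(int index, double value) {
-            this.index = index;
-            this.value = value;
-        }
-    }
-
-    /** Returns the index and value of the largest entry, or null for an empty array. */
-    public static Result of(double[] probs) {
-        if (probs == null || probs.length == 0) return null;
-        int best = 0;
-        for (int i = 1; i < probs.length; i++) {
-            if (probs[i] > probs[best]) best = i; // strict '>' keeps the first maximum on ties
-        }
-        return new Result(best, probs[best]);
-    }
-
-    public static void main(String[] args) {
-        // Made-up network output for the digits 0..9.
-        double[] probs = {0.01, 0.02, 0.01, 0.05, 0.01, 0.03, 0.01, 0.80, 0.04, 0.02};
-        Result r = Argmax.of(probs);
-        System.out.println("Prediction: " + r.index + ", confidence: " + r.value); // Prediction: 7, confidence: 0.8
-    }
-}
-```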
-
-Finally, we need two methods to control the visibility of an 'In Progress...' message while the background thread is running. These are called when the AsyncTask is executed and in the onPostExecute() method when the background thread completes.
-
-```java
-    public void onProgressBar(){
-        TextView bar = findViewById(R.id.processing);
-        bar.setVisibility(View.VISIBLE);
-    }
-
-    public void offProgressBar(){
-        TextView bar = findViewById(R.id.processing);
-        bar.setVisibility(View.INVISIBLE);
-    }
-```
-
-Now let's go to the onCreate() method to initialize the drawing canvas, and declare some fields used throughout the activity.
-
-```java
-public class MainActivity extends AppCompatActivity {
-
-    MainActivity.DrawingView drawingView;
-    String absolutePath;
-    public static INDArray output;
-
-    @Override
-    public void onCreate(Bundle savedInstanceState) {
-        super.onCreate(savedInstanceState);
-        setContentView(R.layout.activity_main);
-
-        RelativeLayout parent = findViewById(R.id.layout2);
-        drawingView = new MainActivity.DrawingView(this);
-        parent.addView(drawingView);
-    }
-```
-
-## <a name="head_link5">Updating the UI</a>
-
-Now we can complete our AsyncTask by overriding the onProgressUpdate() and onPostExecute() methods. Once the doInBackground() method completes, the classification results are passed to onPostExecute(), which runs on the main thread and can therefore update the UI with the results. Since we will not be using the onProgressUpdate() method, a call to its superclass suffices.
-
-```java
-@Override
-        protected void onProgressUpdate(Integer... values) {
-            super.onProgressUpdate(values);
-        }
-```
-
-The onPostExecute() method receives an INDArray containing the neural network results as a 1x10 array of probabilities, one for each possible digit (0..9). From this we need to determine which position of the array contains the largest value and what that value is. These two values determine which digit the neural network has classified the drawing as and how confident the network is; the UI refers to them as *Prediction* and *Confidence*, respectively. In the code below, the individual values of the INDArray are copied into a double array using the getDouble() method. We then get references to the TextViews to update and call our helper methods on the array to obtain the array maximum (confidence) and the index of the largest value (prediction). Note that we also limit the number of decimal places reported for the probability by setting a DecimalFormat pattern.
-
-```java
-
-        @Override
-        protected void onPostExecute(INDArray result) {
-            super.onPostExecute(result);
-
-            //used to control the number of decimal places for the output probability
-            DecimalFormat df2 = new DecimalFormat(".##");
-
-            //copy the 10 network outputs (one per digit) into a double array
-            double[] results = new double[10];
-            for (int i = 0; i < results.length; i++) {
-                results[i] = result.getDouble(0, i);
-            }
-
-            //find the UI TextViews that display the prediction and confidence values
-            TextView out1 = findViewById(R.id.prediction);
-            TextView out2 = findViewById(R.id.confidence);
-
-            //display the values using the arrayMaximum() and getIndexOfLargestValue() helpers
-            out2.setText(df2.format(arrayMaximum(results)));
-            out1.setText(String.valueOf(getIndexOfLargestValue(results)));
-
-            //helper function to hide the progress bar
-            offProgressBar();
-        }
-```
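-
-The onPostExecute() code above calls two helpers, arrayMaximum() and getIndexOfLargestValue(), whose bodies this tutorial does not show. Below is a minimal plain-Java sketch of what they might look like; the class name PredictionHelpers is hypothetical, and in the app these would more likely be private methods of MainActivity. Both scan the 10 probabilities once, and on a tie getIndexOfLargestValue() returns the first (lowest) index.
-
-```java
-/**
- * Hypothetical sketch of the helpers used by onPostExecute(): the largest
- * probability is shown as the Confidence and its index as the Prediction.
- */
-final class PredictionHelpers {
-
-    private PredictionHelpers() {
-    }
-
-    /** Returns the largest value in the array (the "Confidence"). */
-    static double arrayMaximum(double[] values) {
-        requireNonEmpty(values);
-        double max = values[0];
-        for (int i = 1; i < values.length; i++) {
-            if (values[i] > max) {
-                max = values[i];
-            }
-        }
-        return max;
-    }
-
-    /** Returns the index of the largest value (the "Prediction", a digit 0..9). */
-    static int getIndexOfLargestValue(double[] values) {
-        requireNonEmpty(values);
-        int best = 0;
-        for (int i = 1; i < values.length; i++) {
-            if (values[i] > values[best]) {
-                best = i;
-            }
-        }
-        return best;
-    }
-
-    private static void requireNonEmpty(double[] values) {
-        if (values == null || values.length == 0) {
-            throw new IllegalArgumentException("values must contain at least one element");
-        }
-    }
-}
-```
-
-For example, for the output {0.01, 0.02, 0.05, 0.80, 0.02, 0.03, 0.02, 0.02, 0.02, 0.01}, getIndexOfLargestValue() returns 3 and arrayMaximum() returns 0.8, so the UI would show a prediction of 3 with a confidence of 0.8 before the DecimalFormat pattern is applied.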
-
-## <a name="head_link6">Conclusion</a>
-
-This tutorial provides a basic framework for image recognition in an Android application using a DL4J neural network. It illustrates how to load a pre-trained DL4J model from the app's raw resources and how to classify user-generated images with that model on a background thread. The AsyncTask then returns the output to the main thread, which updates the UI.
-
-The complete code for this example is available [here.](https://github.com/eclipse/deeplearning4j-examples/tree/master/android/DL4JImageRecognitionDemo)
--- a/docs/deeplearning4j/templates/android-linear-classifier.md
+++ b/docs/deeplearning4j/templates/android-linear-classifier.md
@@ -1,280 +0,0 @@
---
-title: Android Classifier with DL4J
-short_title: Android Classifier
-description: How to create an IRIS classifier on Android using Eclipse Deeplearning4j.
-category: Mobile
-weight: 2
---
-
-# IRIS Classifier Demo
-
-This example application trains a small neural network on the device using Anderson’s Iris data set for iris flower classification. For a more in-depth look at configuring Android for DL4J, see the Prerequisites and Configuration documentation [here](./deeplearning4j-android-prerequisites). The application has a simple UI that takes measurements of petal length, petal width, sepal length, and sepal width from the user and returns the probability that the measurements belong to each of three iris species (*Iris setosa*, *Iris versicolor*, and *Iris virginica*). The data set contains 150 samples (50 per species), and training the model takes anywhere from 5 to 20 seconds, depending on the device.
-
-Contents
-
-* [Setting the Dependencies](#head_link1)
-* [Setting up the neural network on a background thread](#head_link2)
-* [Preparing the training data set and user input](#head_link3)
-* [Building and Training the Neural Network](#head_link4)
-* [Updating the UI](#head_link5)
-* [Conclusion](#head_link6)
-
-
-## DL4JIrisClassifierDemo
-
-## <a name="head_link1">Setting the Dependencies</a>
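-Deeplearning4J applications require several dependencies in the app's build.gradle file. DL4J in turn depends on ND4J and on native libraries such as OpenBLAS (along with OpenCV and Leptonica in the listing below), so these must also be declared. The native artifacts are added once per processor architecture, using the -arm, -arm64, -x86, and -x86_64 classifiers, so that the app can run on both ARM and x86 devices. The `implementation` configuration used below was introduced with Android Gradle plugin 3.0, so it requires Android Studio 3.0 or later.
-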
-```groovy
-implementation (group: 'org.deeplearning4j', name: 'deeplearning4j-core', version: '{{page.version}}') {
-    exclude group: 'org.bytedeco', module: 'opencv-platform'
-    exclude group: 'org.bytedeco', module: 'leptonica-platform'
-    exclude group: 'org.bytedeco', module: 'hdf5-platform'
-}
-implementation group: 'org.nd4j', name: 'nd4j-native', version: '{{page.version}}'
-implementation group: 'org.nd4j', name: 'nd4j-native', version: '{{page.version}}', classifier: "android-arm"
-implementation group: 'org.nd4j', name: 'nd4j-native', version: '{{page.version}}', classifier: "android-arm64"
-implementation group: 'org.nd4j', name: 'nd4j-native', version: '{{page.version}}', classifier: "android-x86"
-implementation group: 'org.nd4j', name: 'nd4j-native', version: '{{page.version}}', classifier: "android-x86_64"
-implementation group: 'org.bytedeco', name: 'openblas', version: '0.3.9-1.5.3'
-implementation group: 'org.bytedeco', name: 'openblas', version: '0.3.9-1.5.3', classifier: "android-arm"
-implementation group: 'org.bytedeco', name: 'openblas', version: '0.3.9-1.5.3', classifier: "android-arm64"
-implementation group: 'org.bytedeco', name: 'openblas', version: '0.3.9-1.5.3', classifier: "android-x86"
-implementation group: 'org.bytedeco', name: 'openblas', version: '0.3.9-1.5.3', classifier: "android-x86_64"
-implementation group: 'org.bytedeco', name: 'opencv', version: '4.3.0-1.5.3'
-implementation group: 'org.bytedeco', name: 'opencv', version: '4.3.0-1.5.3', classifier: "android-arm"
-implementation group: 'org.bytedeco', name: 'opencv', version: '4.3.0-1.5.3', classifier: "android-arm64"
-implementation group: 'org.bytedeco', name: 'opencv', version: '4.3.0-1.5.3', classifier: "android-x86"
-implementation group: 'org.bytedeco', name: 'opencv', version: '4.3.0-1.5.3', classifier: "android-x86_64"
-implementation group: 'org.bytedeco', name: 'leptonica', version: '1.79.0-1.5.3'
-implementation group: 'org.bytedeco', name: 'leptonica', version: '1.79.0-1.5.3', classifier: "android-arm"
-implementation group: 'org.bytedeco', name: 'leptonica', version: '1.79.0-1.5.3', classifier: "android-arm64"
-implementation group: 'org.bytedeco', name: 'leptonica', version: '1.79.0-1.5.3', classifier: "android-x86"
-implementation group: 'org.bytedeco', name: 'leptonica', version: '1.79.0-1.5.3', classifier: "android-x86_64"
-```
-
-These dependencies contain more methods than fit into a single DEX file (Android's 64K method reference limit), so it is necessary to enable multidex by setting multiDexEnabled to true in defaultConfig.
-
-```groovy
-defaultConfig {
-    multiDexEnabled true
-}
-```
-
-Finally, a conflict between junit module versions will likely cause the following error: *> Conflict with dependency 'junit:junit' in project ':app'. Resolved versions for app (4.8.2) and test app (4.12) differ.*
-This can be resolved by forcing all configurations to use the same junit version:
-
-```groovy
-configurations.all {
-    resolutionStrategy.force 'junit:junit:4.12'
-}
-```
-
-
-## <a name="head_link2">Setting up the neural network on a background thread</a>
-
-Training even a simple neural network like the one in this example requires significant processing power, which is in limited supply on mobile devices. It is therefore important to build and train the network on a background thread, which then returns its output to the main thread to update the UI. In this example we use an AsyncTask, which receives the measurements from the UI and passes them as doubles to its doInBackground() method. First, let's get references to the EditTexts in the UI layout that accept the iris measurements inside our onCreate() method. Then an OnClickListener executes our AsyncTask, passes it the measurements entered by the user, and shows a progress bar until we hide it again in onPostExecute().
-
-```java
-public class MainActivity extends AppCompatActivity {
-
-    @Override
-    public void onCreate(Bundle savedInstanceState) {
-        super.onCreate(savedInstanceState);
-        setContentView(R.layout.activity_main);
-
-        //get references to the EditTexts that take the measurements
-        final EditText PL = (EditText) findViewById(R.id.editText);
-        final EditText PW = (EditText) findViewById(R.id.editText2);
-        final EditText SL = (EditText) findViewById(R.id.editText3);
-        final EditText SW = (EditText) findViewById(R.id.editText4);
-
-        //onClick captures the input and launches the AsyncTask
-        Button button = (Button) findViewById(R.id.button);
-
-        button.setOnClickListener(new View.OnClickListener() {
-            @Override
-            public void onClick(View v) {
-
-                final double pl = Double.parseDouble(PL.getText().toString());
-                final double pw = Double.parseDouble(PW.getText().toString());
-                final double sl = Double.parseDouble(SL.getText().toString());
-                final double sw = Double.parseDouble(SW.getText().toString());
-
-                AsyncTaskRunner runner = new AsyncTaskRunner();
-
-                //pass the measurements as params to the AsyncTask
-                runner.execute(pl, pw, sl, sw);
-
-                ProgressBar bar = (ProgressBar) findViewById(R.id.progressBar);
-                bar.setVisibility(View.VISIBLE);
-            }
-        });
-    }
-```
-
-Now let’s write our AsyncTask<*Params*, *Progress*, *Result*>. The *Params* type is Double, so that the task can receive the decimal measurements from the UI. The *Result* type is INDArray: doInBackground() returns one, and it is passed to onPostExecute() for updating the UI. INDArrays are provided by the ND4J library and are n-dimensional arrays of numeric values. For more on INDArrays, see https://nd4j.org/userguide.
-
-```java
-private class AsyncTaskRunner extends AsyncTask<Double, Integer, INDArray> {
- 
-    // Runs on the UI thread before doInBackground() starts
-    @Override
-    protected void onPreExecute() {
-        super.onPreExecute();
-
-        //show the progress bar while the network trains
-        ProgressBar bar = (ProgressBar) findViewById(R.id.progressBar);
-        bar.setVisibility(View.VISIBLE);
-    }
-```
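-
-AsyncTask was deprecated in Android API level 30, so newer code often uses java.util.concurrent instead. The sketch below is a hypothetical alternative (the BackgroundRunner class is not part of this tutorial or of DL4J) that shows the same split: the work runs on a single worker thread, like doInBackground(), and its result is handed to a callback, like onPostExecute(). On Android the callback would still need to be posted to the main thread, for example through a Handler created for Looper.getMainLooper(); that step is omitted here so the sketch runs on a plain JVM.
-
-```java
-import java.util.concurrent.Callable;
-import java.util.concurrent.ExecutionException;
-import java.util.concurrent.ExecutorService;
-import java.util.concurrent.Executors;
-import java.util.concurrent.Future;
-import java.util.function.Consumer;
-
-/** Hypothetical replacement for AsyncTask built on a single-thread executor. */
-final class BackgroundRunner implements AutoCloseable {
-
-    // One worker thread, so tasks run one at a time in submission order
-    private final ExecutorService worker = Executors.newSingleThreadExecutor();
-
-    /**
-     * Runs work on the worker thread (the doInBackground() step), then passes
-     * the result to onResult (the onPostExecute() step). In this sketch onResult
-     * also runs on the worker thread; on Android it must be posted to the main thread.
-     */
-    <T> Future<?> execute(Callable<T> work, Consumer<T> onResult) {
-        return worker.submit(() -> {
-            T result = work.call();
-            onResult.accept(result);
-            return null;
-        });
-    }
-
-    /** Blocking variant, for tests only: runs work on the worker thread and waits. */
-    <T> T executeAndWait(Callable<T> work) {
-        try {
-            return worker.submit(work).get();
-        } catch (InterruptedException e) {
-            Thread.currentThread().interrupt();
-            throw new IllegalStateException(e);
-        } catch (ExecutionException e) {
-            throw new IllegalStateException(e.getCause());
-        }
-    }
-
-    @Override
-    public void close() {
-        worker.shutdown();
-    }
-}
-```
-
-Never call executeAndWait() from the UI thread: blocking there defeats the purpose of the background thread, just as calling AsyncTask.get() would.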
-
-
-## <a name="head_link3">Preparing the training data set and user input</a>
-
-The doInBackground() method handles formatting the training data, constructing the neural network, training it, and classifying the user input with the trained model. The user input has only 4 values, so we can add those directly to a 1x4 INDArray using the putScalar() method. The training data is much larger and is stored as flat arrays, which must be reshaped into matrices with nested *for* loops.
-
-The training data is stored in the app as two arrays: *irisData*, which holds the measurements of 150 iris flowers (4 values each), and *labelData*, which holds the corresponding iris type labels (3 values each). These are reshaped into 150x4 and 150x3 matrices, respectively, so that they can be converted into the INDArray objects the network uses for training.
-
-```java
-    // This runs on the background thread and does the heavy lifting for the neural net
-    @Override
-    protected INDArray doInBackground(Double... params) {
-        //Get the doubles from params, which is an array, so they are at indices 0,1,2,3
-        double pld = params[0];
-        double pwd = params[1];
-        double sld = params[2];
-        double swd = params[3];
-     
-        //Create input INDArray for the user measurements
-        INDArray actualInput = Nd4j.zeros(1,4);
-        actualInput.putScalar(new int[]{0,0}, pld);
-        actualInput.putScalar(new int[]{0,1}, pwd);
-        actualInput.putScalar(new int[]{0,2}, sld);
-        actualInput.putScalar(new int[]{0,3}, swd);
-     
-        //Convert the iris data into 150x4 matrix
-        int row=150;
-        int col=4;
-        double[][] irisMatrix=new double[row][col];
-        int i = 0;
-        for(int r=0; r<row; r++){
-            for( int c=0; c<col; c++){
-                irisMatrix[r][c] = com.example.jmerwin.irisclassifier.DataSet.irisData[i++];
-            }
-        }
-     
-        //Now do the same for the label data
-        int rowLabel=150;
-        int colLabel=3;
-        double[][] twodimLabel=new double[rowLabel][colLabel];
-        int ii = 0;
-        for(int r=0; r<rowLabel; r++){
-            for( int c=0; c<colLabel; c++){
-                twodimLabel[r][c]=com.example.jmerwin.irisclassifier.DataSet.labelData[ii++];
-            }
-        }
-     
-        //Converting the data matrices into training INDArrays is straightforward
-        INDArray trainingIn = Nd4j.create(irisMatrix);
-        INDArray trainingOut = Nd4j.create(twodimLabel);
-```
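-
-The nested loops above copy a flat, row-major array into a 2D array, one row of `col` values at a time. The hypothetical IrisArrays helper below sketches the same reshape with a length check, plus a one-hot label builder for the case where you start from class indices (0, 1, 2) instead of a precomputed labelData array. It assumes, as the 150x3 label matrix suggests, that each label row holds 1.0 in its class column and 0.0 elsewhere. The resulting double[][] arrays can be passed to Nd4j.create() exactly as in the code above.
-
-```java
-/** Hypothetical helpers mirroring the data preparation in doInBackground(). */
-final class IrisArrays {
-
-    private IrisArrays() {
-    }
-
-    /** Copies a flat, row-major array into a rows x cols matrix. */
-    static double[][] toMatrix(double[] flat, int rows, int cols) {
-        if (flat.length != rows * cols) {
-            throw new IllegalArgumentException(
-                    "expected " + (rows * cols) + " values but got " + flat.length);
-        }
-        double[][] matrix = new double[rows][cols];
-        for (int r = 0; r < rows; r++) {
-            // row r occupies flat[r * cols] .. flat[r * cols + cols - 1]
-            System.arraycopy(flat, r * cols, matrix[r], 0, cols);
-        }
-        return matrix;
-    }
-
-    /** Builds one-hot label rows: 1.0 in the class column, 0.0 elsewhere. */
-    static double[][] oneHot(int[] classIndices, int numClasses) {
-        double[][] labels = new double[classIndices.length][numClasses];
-        for (int i = 0; i < classIndices.length; i++) {
-            int c = classIndices[i];
-            if (c < 0 || c >= numClasses) {
-                throw new IllegalArgumentException("class index " + c + " out of range at row " + i);
-            }
-            labels[i][c] = 1.0;
-        }
-        return labels;
-    }
-}
-```
-
-With 600 measurement values, IrisArrays.toMatrix(irisData, 150, 4) would produce the same 150x4 matrix as the nested loops, but fails fast if the array has the wrong length instead of throwing an ArrayIndexOutOfBoundsException partway through.
-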
-## <a name="head_link4">Building and Training the Neural Network</a>
-
-Now that our data is ready, we can build a simple multi-layer perceptron. DL4J has no separate input layer class: the *DenseLayer* named "Input" below is simply the first layer that receives the data. It is followed by a second *DenseLayer* acting as the hidden layer and by an *OutputLayer*. The first layer's number of inputs (nIn) must equal the number of columns in the input INDArray (4), and each subsequent layer's nIn must equal the previous layer's number of outputs (nOut). Finally, the output layer's nOut must equal the number of possible classes, which is 3.
-
-```java
-    //define the layers of the network
-    DenseLayer inputLayer = new DenseLayer.Builder()
-            .nIn(4)
-            .nOut(3)
-            .name("Input")
-            .build();
- 
-    DenseLayer hiddenLayer = new DenseLayer.Builder()
-            .nIn(3)
-            .nOut(3)
-            .name("Hidden")
-            .build();
- 
-    OutputLayer outputLayer = new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
-            .nIn(3)
-            .nOut(3)
-            .name("Output")
-            .activation(Activation.SOFTMAX)
-            .build();
-```
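-
-The nIn/nOut chaining rule can be checked mechanically. The hypothetical LayerShapes helper below walks a list of (nIn, nOut) pairs, rejects any layer whose nIn differs from the previous layer's nOut, and counts the trainable parameters, assuming each layer has an nIn x nOut weight matrix plus a bias vector of length nOut. For the 4-3-3-3 network above that gives (4*3 + 3) + (3*3 + 3) + (3*3 + 3) = 15 + 12 + 12 = 39 parameters, which you can compare against MultiLayerNetwork.numParams() after init().
-
-```java
-/** Hypothetical shape check for a stack of fully connected layers. */
-final class LayerShapes {
-
-    private LayerShapes() {
-    }
-
-    /**
-     * layers[i] = {nIn, nOut} for layer i. Returns the total number of weights
-     * and biases, or throws if the layer sizes do not chain correctly.
-     */
-    static long countParams(int numFeatures, int numClasses, int[][] layers) {
-        if (layers.length == 0) {
-            throw new IllegalArgumentException("at least one layer is required");
-        }
-        if (layers[0][0] != numFeatures) {
-            throw new IllegalArgumentException(
-                    "first layer nIn " + layers[0][0] + " != input columns " + numFeatures);
-        }
-        long total = 0;
-        for (int i = 0; i < layers.length; i++) {
-            int nIn = layers[i][0];
-            int nOut = layers[i][1];
-            if (i > 0 && nIn != layers[i - 1][1]) {
-                throw new IllegalArgumentException(
-                        "layer " + i + " nIn " + nIn + " != previous nOut " + layers[i - 1][1]);
-            }
-            total += (long) nIn * nOut + nOut; // weight matrix + bias vector
-        }
-        int lastOut = layers[layers.length - 1][1];
-        if (lastOut != numClasses) {
-            throw new IllegalArgumentException(
-                    "output layer nOut " + lastOut + " != number of classes " + numClasses);
-        }
-        return total;
-    }
-}
-```
-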
-The next step is to configure and build the network using *nncBuilder*. The settings below (a fixed random seed, tanh activations, and Xavier weight initialization) are common defaults for a small network like this one. To learn more about tuning network training, see deeplearning4j.org.
-```java
-    NeuralNetConfiguration.Builder nncBuilder = new NeuralNetConfiguration.Builder();
-    long seed = 6;
-    nncBuilder.seed(seed);
-    nncBuilder.activation(Activation.TANH);
-    nncBuilder.weightInit(WeightInit.XAVIER);
-     
-    NeuralNetConfiguration.ListBuilder listBuilder = nncBuilder.list();
-    listBuilder.layer(0, inputLayer);
-    listBuilder.layer(1, hiddenLayer);
-    listBuilder.layer(2, outputLayer);
-     
-    MultiLayerNetwork myNetwork = new MultiLayerNetwork(listBuilder.build());
-    myNetwork.init();
-     
-    //Create a data set from the INDArrays and train the network for 1000 epochs
-    DataSet myData = new DataSet(trainingIn, trainingOut);
-    for (int l = 0; l < 1000; l++) {
-        myNetwork.fit(myData);
-    }
-     
-    //Evaluate the input data against the model
-    INDArray actualOutput = myNetwork.output(actualInput);
-    Log.d("myNetwork Output ", actualOutput.toString());
-     
-    //Here we return the INDArray to onPostExecute where it can be 
-    //used to update the UI
-    return actualOutput;
-}
-```
-## <a name="head_link5">Updating the UI</a>
-
-Once training and classification are complete, doInBackground() returns, and onPostExecute() runs on the main thread, where it can update the UI with the classification results. The number of decimal places shown for the probabilities is controlled by a DecimalFormat pattern.
-```java
-    //This is where we update the UI with our classification results
-    @Override
-    protected void onPostExecute(INDArray result) {
-        super.onPostExecute(result);
-
-        //Hide the progress bar now that we are finished
-        ProgressBar bar = (ProgressBar) findViewById(R.id.progressBar);
-        bar.setVisibility(View.INVISIBLE);
-
-        //Retrieve the three probabilities
-        double first = result.getDouble(0, 0);
-        double second = result.getDouble(0, 1);
-        double third = result.getDouble(0, 2);
-
-        //Find the TextViews that display the output
-        TextView setosa = (TextView) findViewById(R.id.textView11);
-        TextView versicolor = (TextView) findViewById(R.id.textView12);
-        TextView virginica = (TextView) findViewById(R.id.textView13);
-
-        //Limit the values to two decimal places using DecimalFormat
-        DecimalFormat df2 = new DecimalFormat(".##");
-
-        //Set the text of the TextViews in the UI to show the probabilities
-        setosa.setText(df2.format(first));
-        versicolor.setText(df2.format(second));
-        virginica.setText(df2.format(third));
-    }
-```
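-
-The ".##" pattern used above sets no minimum number of integer digits, so a probability of 0.5 is displayed as ".5" rather than "0.50". The small comparison below (the ProbabilityFormat class is hypothetical) contrasts ".##" with a "0.00" pattern, which always prints one integer digit and exactly two decimals. It pins Locale.US so the output is reproducible; the tutorial code uses the device's default locale, which may, for example, print a comma as the decimal separator.
-
-```java
-import java.text.DecimalFormat;
-import java.text.DecimalFormatSymbols;
-import java.util.Locale;
-
-/** Hypothetical comparison of the ".##" and "0.00" DecimalFormat patterns. */
-final class ProbabilityFormat {
-
-    private static final DecimalFormatSymbols US = DecimalFormatSymbols.getInstance(Locale.US);
-
-    private ProbabilityFormat() {
-    }
-
-    /** Pattern used in the tutorial: no leading zero, up to two decimals (0.5 -> ".5"). */
-    static String tutorialPattern(double probability) {
-        return new DecimalFormat(".##", US).format(probability);
-    }
-
-    /** Alternative: one integer digit and exactly two decimals (0.5 -> "0.50"). */
-    static String fixedTwoDecimals(double probability) {
-        return new DecimalFormat("0.00", US).format(probability);
-    }
-}
-```
-
-A new DecimalFormat is created per call because DecimalFormat instances are not thread-safe; in onPostExecute(), which always runs on the main thread, a single instance as in the tutorial is fine.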
-
-
-## <a name="head_link6">Conclusion</a>
-
-We hope this tutorial has illustrated how DL4J's compatibility with Android makes it straightforward to build, train, and evaluate neural networks on mobile devices. We used a simple UI to take measurements from the user and passed them as the *Params* of an AsyncTask. The processor-intensive steps of data preparation, network construction, model training, and classification of the user data were all performed in doInBackground() on a background thread, keeping the UI stable and responsive. Once training finished, we passed the output INDArray as the AsyncTask *Result* to onPostExecute(), where the UI was updated to show the classification results.
-
-The limited processing power and battery life of mobile devices make training robust, multi-layer networks on the device impractical. To address this limitation, we will next look at an example Android application that saves the trained model on the device, so that later runs are faster after the initial training.
-
-The complete code for this example is available [here.](https://github.com/eclipse/deeplearning4j-examples/tree/master/android/DL4JIrisClassifierDemo)
-
-
-
--- a/docs/deeplearning4j/templates/android-prerequisites.md
+++ b/docs/deeplearning4j/templates/android-prerequisites.md
@@ -1,410 +0,0 @@
---
-title: Prerequisites and Configurations for DL4J in Android
-short_title: Android Prerequisites
-description: Setting up and configuring Android Studio for DL4J.
-category: Mobile
-weight: 1
---
-
-## Prerequisites and Configurations for DL4J in Android
-
-Contents
-* [Prerequisites](#head_link1)
-* [Required Dependencies](#head_link2)
-* [Managing Dependencies with ProGuard](#head_link3)
-* [Memory Management](#head_link4)
-* [Saving and Loading Networks on Android](#head_link5)
-
-While neural networks are typically run on powerful computers using multiple GPUs, Deeplearning4J's compatibility with the Android platform makes it possible to use DL4J neural networks in Android applications. This tutorial covers the basics of setting up Android Studio for building DL4J applications. Below we outline the dependency configuration, memory management, and compilation exclusions needed to work within the limitations of low-powered mobile devices. If you just want to get a DL4J app running on your device, you can jump ahead to a simple demo application that trains a neural network for iris flower classification, available [here](./deeplearning4j-android-linear-classifier).
-
-
-## <a name="head_link1">Prerequisites</a>
-
-* Android Studio 2.2 or newer, which can be downloaded [here](https://developer.android.com/studio/index.html#Other). 
-* Android Studio 2.2 and later bundle an OpenJDK; however, it is recommended to install a JDK separately so that you can update it independently of Android Studio. Android Studio 3.0 and later support all Java 7 and a subset of Java 8 language features. JDKs can be downloaded from Oracle's website.
-* Within Android Studio, use the Android SDK Manager to install Android Build Tools 24.0.1 or later, SDK Platform 24 or later, and the Android Support Repository.
-* An Android device or an emulator running API level 21 or higher. A minimum of 200 MB of internal storage space free is recommended.
-
-It is also recommended that you download and install IntelliJ IDEA, Maven, and the complete dl4j-examples repository, so that you can build and train neural networks on your desktop instead of in Android Studio.
-
-
-## <a name="head_link2">Required Dependencies</a>
-
-In order to use Deeplearning4J in your Android projects, you will need to add the following dependencies to your app module’s build.gradle file. Depending on the type of neural network used in your application, you may need to add additional dependencies.
-
-```groovy
-implementation (group: 'org.deeplearning4j', name: 'deeplearning4j-core', version: '{{page.version}}') {
-    exclude group: 'org.bytedeco', module: 'opencv-platform'
-    exclude group: 'org.bytedeco', module: 'leptonica-platform'
-    exclude group: 'org.bytedeco', module: 'hdf5-platform'
-}
-implementation group: 'org.nd4j', name: 'nd4j-native', version: '{{page.version}}'
-implementation group: 'org.nd4j', name: 'nd4j-native', version: '{{page.version}}', classifier: "android-arm"
-implementation group: 'org.nd4j', name: 'nd4j-native', version: '{{page.version}}', classifier: "android-arm64"
-implementation group: 'org.nd4j', name: 'nd4j-native', version: '{{page.version}}', classifier: "android-x86"
-implementation group: 'org.nd4j', name: 'nd4j-native', version: '{{page.version}}', classifier: "android-x86_64"
-implementation group: 'org.bytedeco', name: 'openblas', version: '0.3.9-1.5.3'
-implementation group: 'org.bytedeco', name: 'openblas', version: '0.3.9-1.5.3', classifier: "android-arm"
-implementation group: 'org.bytedeco', name: 'openblas', version: '0.3.9-1.5.3', classifier: "android-arm64"
-implementation group: 'org.bytedeco', name: 'openblas', version: '0.3.9-1.5.3', classifier: "android-x86"
-implementation group: 'org.bytedeco', name: 'openblas', version: '0.3.9-1.5.3', classifier: "android-x86_64"
-implementation group: 'org.bytedeco', name: 'opencv', version: '4.3.0-1.5.3'
-implementation group: 'org.bytedeco', name: 'opencv', version: '4.3.0-1.5.3', classifier: "android-arm"
-implementation group: 'org.bytedeco', name: 'opencv', version: '4.3.0-1.5.3', classifier: "android-arm64"
-implementation group: 'org.bytedeco', name: 'opencv', version: '4.3.0-1.5.3', classifier: "android-x86"
-implementation group: 'org.bytedeco', name: 'opencv', version: '4.3.0-1.5.3', classifier: "android-x86_64"
-implementation group: 'org.bytedeco', name: 'leptonica', version: '1.79.0-1.5.3'
-implementation group: 'org.bytedeco', name: 'leptonica', version: '1.79.0-1.5.3', classifier: "android-arm"
-implementation group: 'org.bytedeco', name: 'leptonica', version: '1.79.0-1.5.3', classifier: "android-arm64"
-implementation group: 'org.bytedeco', name: 'leptonica', version: '1.79.0-1.5.3', classifier: "android-x86"
-implementation group: 'org.bytedeco', name: 'leptonica', version: '1.79.0-1.5.3', classifier: "android-x86_64"
-testImplementation 'junit:junit:4.12'
-```
-
-DL4J depends on ND4J, a library that provides fast n-dimensional arrays. ND4J in turn uses JavaCPP to load platform-specific native code, so you must include the nd4j-native artifact that matches the architecture of the Android device. Including the -arm, -arm64, -x86, and -x86_64 classifiers lets the app support all common device processor types.
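-
-On a device, android.os.Build.SUPPORTED_ABIS lists the ABIs the processor supports, most preferred first. The hypothetical helper below maps those ABI names to the nd4j-native classifiers from the dependency list, which can be handy for logging which native build a device should load. The mapping itself (for example armeabi-v7a to android-arm) is our assumption about how the JavaCPP Android builds are named, so check it against the artifacts you actually download.
-
-```java
-/**
- * Hypothetical mapping from Android ABI names (as reported by
- * android.os.Build.SUPPORTED_ABIS) to nd4j-native classifiers.
- */
-final class NativeClassifiers {
-
-    private NativeClassifiers() {
-    }
-
-    static String classifierFor(String abi) {
-        switch (abi) {
-            case "armeabi-v7a":
-                return "android-arm";
-            case "arm64-v8a":
-                return "android-arm64";
-            case "x86":
-                return "android-x86";
-            case "x86_64":
-                return "android-x86_64";
-            default:
-                throw new IllegalArgumentException("no nd4j-native classifier listed for ABI " + abi);
-        }
-    }
-
-    /** Returns the classifier for the first supported ABI that has one. */
-    static String preferredClassifier(String[] supportedAbis) {
-        for (String abi : supportedAbis) {
-            try {
-                return classifierFor(abi);
-            } catch (IllegalArgumentException ignored) {
-                // e.g. "armeabi" has no matching classifier; try the next ABI
-            }
-        }
-        throw new IllegalArgumentException("none of the device ABIs has a matching classifier");
-    }
-}
-```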
-
-The above dependencies contain several files with identical paths, which must be excluded with the following rules in your packagingOptions block:
-
-```groovy
-packagingOptions {
-    exclude 'META-INF/DEPENDENCIES'
-    exclude 'META-INF/DEPENDENCIES.txt'
-    exclude 'META-INF/LICENSE'
-    exclude 'META-INF/LICENSE.txt'
-    exclude 'META-INF/license.txt'
-    exclude 'META-INF/NOTICE'
-    exclude 'META-INF/NOTICE.txt'
-    exclude 'META-INF/notice.txt'
-    exclude 'META-INF/INDEX.LIST'
-}
-```
-After adding the above dependencies and exclusions to the build.gradle file, sync Gradle to see whether any other exclusions are needed. The error message identifies the file path to add to the list of exclusions, for example: *> More than one file was found with OS independent path 'org/bytedeco/javacpp/windows-x86_64/msvcp120.dll'*
-
-These dependencies contain more methods than fit into a single DEX file, so it is necessary to set multiDexEnabled to true in defaultConfig.
-
-```groovy
-defaultConfig {
-    multiDexEnabled true
-}
-```
-
-A conflict in the junit module versions often causes the following error:  *> Conflict with dependency 'junit:junit' in project ':app'. Resolved versions for app (4.8.2) and test app (4.12) differ*. This can be suppressed by forcing all of the junit modules to use the same version with the following:
-
-```groovy
-configurations.all {
-    resolutionStrategy.force 'junit:junit:4.12'
-}
-```
-
-
-## <a name="head_link3">Managing Dependencies with ProGuard</a>
-
-The DL4J dependencies add a large amount of code to your app. ProGuard can be used to minimize your APK file size: it detects and removes unused classes, fields, methods, and attributes from your packaged app, including those from code libraries. You can learn more about using ProGuard [here](https://developer.android.com/studio/build/shrink-code.html).
-To enable code shrinking with ProGuard, add `minifyEnabled true` to the appropriate build type in your build.gradle file.
-
-```groovy
-buildTypes {
-    release {
-        minifyEnabled true
-        proguardFiles getDefaultProguardFile('proguard-android.txt'), 'proguard-rules.pro'
-    }
-}
-```
-
-It is recommended to upgrade ProGuard in the Android SDK to the latest release (5.1 or higher). Note that upgrading the build tools or other parts of your SDK might reset ProGuard to the version shipped with the SDK. To force the build to use a ProGuard version other than the Android Gradle default, you can include this in the buildscript block of your `build.gradle` file:
-
-```groovy
-buildscript {
-    configurations.all {
-        resolutionStrategy {
-            force 'net.sf.proguard:proguard-gradle:5.3.2'
-        }
-    }
-}
-```
-
-ProGuard optimizes and reduces the amount of code in your Android application in order to make it smaller and faster. Unfortunately, ProGuard removes annotations by default, including the @Platform annotation used by JavaCPP. To make ProGuard preserve these annotations and keep native methods, add the following flags to the proguard-rules.pro file.
-
-```java
-# enable optimization
--optimizations !code/simplification/arithmetic,!code/simplification/cast,!field/*,!class/merging/*
--optimizationpasses 5
--allowaccessmodification
--dontwarn org.apache.lang.**
--ignorewarnings
-
--keepattributes *Annotation*
-# JavaCV
--keep @org.bytedeco.javacpp.annotation interface * {*;}
--keep @org.bytedeco.javacpp.annotation.Platform public class *
--keepclasseswithmembernames class * {@org.bytedeco.* <fields>;}
--keepclasseswithmembernames class * {@org.bytedeco.* <methods>;}
-
--keepattributes EnclosingMethod
--keep @interface org.bytedeco.javacpp.annotation.*,javax.inject.*
-
--keepattributes *Annotation*, Exceptions, Signature, Deprecated, SourceFile, SourceDir, LineNumberTable, LocalVariableTable, LocalVariableTypeTable, Synthetic, EnclosingMethod, RuntimeVisibleAnnotations, RuntimeInvisibleAnnotations, RuntimeVisibleParameterAnnotations, RuntimeInvisibleParameterAnnotations, AnnotationDefault, InnerClasses
--keep class org.bytedeco.javacpp.** {*;}
--dontwarn java.awt.**
--dontwarn org.bytedeco.javacv.**
--dontwarn org.bytedeco.javacpp.**
-# end javacv
-
-# This flag is needed to keep native methods
--keepclasseswithmembernames class * {
- native <methods>;
-}
-
--keep public class * extends android.view.View {
- public <init>(android.content.Context);
- public <init>(android.content.Context, android.util.AttributeSet);
- public <init>(android.content.Context, android.util.AttributeSet, int);
- public void set*(...);
-}
-
--keepclasseswithmembers class * {
- public <init>(android.content.Context, android.util.AttributeSet);
-}
-
--keepclasseswithmembers class * {
- public <init>(android.content.Context, android.util.AttributeSet, int);
-}
-
--keepclassmembers class * extends android.app.Activity {
- public void *(android.view.View);
-}
-
-# For enumeration classes
--keepclassmembers enum * {
- public static **[] values();
- public static ** valueOf(java.lang.String);
-}
-
--keep class * implements android.os.Parcelable {
- public static final android.os.Parcelable$Creator *;
-}
-
--keepclassmembers class **.R$* {
- public static <fields>;
-}
-
--keep class android.support.v7.app.** { *; }
--keep interface android.support.v7.app.** { *; }
--keep class com.actionbarsherlock.** { *; }
--keep interface com.actionbarsherlock.** { *; }
--dontwarn android.support.**
--dontwarn com.google.ads.**
-
-# Flags to keep standard classes
--keep public class * extends android.app.Activity
--keep public class * extends android.app.Application
--keep public class * extends android.app.Service
--keep public class * extends android.content.BroadcastReceiver
--keep public class * extends android.content.ContentProvider
--keep public class * extends android.app.backup.BackupAgent
--keep public class * extends android.preference.Preference
--keep public class * extends android.support.v7.app.Fragment
--keep public class * extends android.support.v7.app.DialogFragment
--keep public class * extends com.actionbarsherlock.app.SherlockListFragment
--keep public class * extends com.actionbarsherlock.app.SherlockFragment
--keep public class * extends com.actionbarsherlock.app.SherlockFragmentActivity
--keep public class * extends android.app.Fragment
--keep public class com.android.vending.licensing.ILicensingService
-```
-
-Testing your app is the best way to check whether any errors are caused by code that ProGuard removed inappropriately; however, you can also inspect what was removed by reviewing the usage.txt output file saved in `<module-name>/build/outputs/mapping/release/`.
-
-To fix errors and force ProGuard to retain certain code, add a -keep line in the ProGuard configuration file. For example:
-```java
--keep public class MyClass
-```
-
-
-## <a name="head_link4">Memory Management</a>
-
-It may also be advantageous to request a larger heap for your app by adding android:largeHeap="true" to the `<application>` element of the manifest file. A larger heap reduces the risk of an OutOfMemoryError during memory-intensive operations, although the system does not guarantee how much extra memory is granted.
-
-```xml
-<application
-    android:largeHeap="true">
-    <!-- activities and other components -->
-</application>
-```
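-
-To check whether the larger heap was actually granted, you can log the heap ceiling at startup. The sketch below uses the standard Runtime.maxMemory() call, which on Android should reflect the per-app heap limit and therefore the effect of largeHeap; the HeapInfo class name is hypothetical. ActivityManager.getMemoryClass() and getLargeMemoryClass() report the device's normal and large heap sizes, but they require an Android Context and are not shown here.
-
-```java
-/** Hypothetical helper that reports the maximum heap size available to the process. */
-final class HeapInfo {
-
-    private static final long MIB = 1024L * 1024L;
-
-    private HeapInfo() {
-    }
-
-    /** Returns the heap ceiling in MiB, or -1 if the runtime reports no limit. */
-    static long maxHeapMiB() {
-        long max = Runtime.getRuntime().maxMemory();
-        // Runtime.maxMemory() returns Long.MAX_VALUE when there is no inherent limit
-        return max == Long.MAX_VALUE ? -1 : max / MIB;
-    }
-}
-```
-
-On Android you might call this once in onCreate() and write the value to Logcat with Log.d(), then compare runs with and without android:largeHeap.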
-
-As of release 0.9.0, ND4J offers an additional memory-management model: workspaces. Workspaces let you reuse memory for cyclic workloads without relying on the JVM garbage collector to track off-heap memory. An ND4J workspace allows memory to be preallocated when it is opened in a try-with-resources block and reused over and over within that block.
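-
-The core idea of a workspace, allocating a block of memory once and reusing it on every cycle, can be illustrated without ND4J. The sketch below is a conceptual analogy only, not ND4J's implementation or API: DoubleArena (a hypothetical name) hands out slices of one preallocated array, and closing it at the end of a try-with-resources block makes the whole buffer available again for the next iteration. Unlike this sketch, which throws when it runs out of space, an ND4J workspace can spill allocations outside the workspace, depending on its configuration.
-
-```java
-/** Conceptual analogy of a workspace: one preallocated buffer reused every cycle. */
-final class DoubleArena implements AutoCloseable {
-
-    private final double[] buffer;
-    private int offset;
-
-    DoubleArena(int capacity) {
-        buffer = new double[capacity]; // allocated once, up front
-    }
-
-    /** Reserves n doubles in the buffer and returns the start offset of the slice. */
-    int allocate(int n) {
-        if (n < 0 || offset + n > buffer.length) {
-            throw new IllegalStateException(
-                    "arena exhausted: requested " + n + ", free " + (buffer.length - offset));
-        }
-        int start = offset;
-        offset += n;
-        return start;
-    }
-
-    double[] buffer() {
-        return buffer;
-    }
-
-    int used() {
-        return offset;
-    }
-
-    /** Ends the cycle: nothing is freed, the same memory is simply reused next time. */
-    @Override
-    public void close() {
-        offset = 0;
-    }
-}
-```
-
-Opening the same arena in a try-with-resources block on every training iteration reuses the same array each time, which is the effect getAndActivateWorkspace() with a fixed workspace ID is meant to achieve for INDArrays.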
-
-If your training process uses workspaces, it is recommended that you disable or reduce the frequency of periodic GC calls prior to your model.fit() call.
-
-```java
-// this will limit GC calls to at most one every 5000 milliseconds
-Nd4j.getMemoryManager().setAutoGcWindow(5000);
-
-// this will totally disable it
-Nd4j.getMemoryManager().togglePeriodicGc(false);
-```
-
-The example below illustrates using a workspace for memory allocation in the AsyncTask of an Android application. More information about ND4J workspaces can be found [here](https://deeplearning4j.org/workspaces).
-
-```java
-import org.nd4j.linalg.api.memory.MemoryWorkspace;
-import org.nd4j.linalg.api.memory.conf.WorkspaceConfiguration;
-import org.nd4j.linalg.api.memory.enums.AllocationPolicy;
-import org.nd4j.linalg.api.memory.enums.LearningPolicy;
-
-
-private class AsyncTaskRunner extends AsyncTask<String, Integer, INDArray> {
-
-    // Runs in UI before background thread is called
-    @Override
-    protected void onPreExecute() {
-        super.onPreExecute();
-    }
-
-    //Runs on background thread, this is where we will initiate the Workspace
-    protected INDArray doInBackground(String... params) {
-
-        // we will create configuration with 10MB memory space preallocated
-        WorkspaceConfiguration initialConfig = WorkspaceConfiguration.builder()
-                .initialSize(10 * 1024L * 1024L)
-                .policyAllocation(AllocationPolicy.STRICT)
-                .policyLearning(LearningPolicy.NONE)
-                .build();
-
-        INDArray result = null;
-
-        try(MemoryWorkspace ws = Nd4j.getWorkspaceManager().getAndActivateWorkspace(initialConfig, "SOME_ID")) {
-        // now, INDArrays created within this try block will be allocated from this workspace pool
-
-            //Load a trained model
-            File file = new File(Environment.getExternalStorageDirectory() + "/trained_model.zip");
-            MultiLayerNetwork restored = ModelSerializer.restoreMultiLayerNetwork(file);
-
-            // Create input in INDArray
-            INDArray inputData = Nd4j.zeros(1, 4);
-
-            inputData.putScalar(new int[]{0, 0}, 1);
-            inputData.putScalar(new int[]{0, 1}, 0);
-            inputData.putScalar(new int[]{0, 2}, 1);
-            inputData.putScalar(new int[]{0, 3}, 0);
-
-            result = restored.output(inputData);
-
-        }
-        catch(IOException ex){Log.d("AsyncTaskRunner2 ", "catchIOException = " + ex  );}
-
-        return result;
-    }
-
-    protected void onProgressUpdate(Integer... values) {
-        super.onProgressUpdate(values);
-    }
-
-    protected void onPostExecute(INDArray result) {
-        super.onPostExecute(result);
-     //Handle results and update UI here.
-    }
-
-}
-```
-
-
-## <a name="head_link5">Saving and Loading Networks on Android</a>
-
-Building Android applications that run neural networks requires practical consideration of performance limits. Training a neural network on a device is possible, but should only be attempted with networks that have a limited number of layers, nodes, and iterations. The first demo app, [DL4JIrisClassifierDemo](./deeplearning4j-android-linear-classifier), trains on a standard device in about 15 seconds.
-
-When training on the device is a reasonable option, application performance can be improved by saving the trained model to the phone's external storage once the initial training is complete. The trained model can then be reused as an application resource. This approach is useful for networks trained on data obtained from user input. The following code shows how to request storage access, save a trained network to the phone's external storage, and load it again.
-
-On API level 23 and higher, you need to declare the permissions in your manifest and also request the read and write permissions at runtime in your activity. The required manifest permissions are:
-
-```xml
-<manifest>
-        <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
-        <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
-        ...
-```
-
-You need to implement ActivityCompat.OnRequestPermissionsResultCallback in the activity and then check for permission status.
-
-```java
-import android.Manifest;
-import android.app.Activity;
-import android.content.pm.PackageManager;
-import android.os.Bundle;
-
-// AndroidX imports; with the older support library, use the android.support.* equivalents
-import androidx.appcompat.app.AppCompatActivity;
-import androidx.core.app.ActivityCompat;
-
-public class MainActivity extends AppCompatActivity
-        implements ActivityCompat.OnRequestPermissionsResultCallback {
-
-    private static final int REQUEST_EXTERNAL_STORAGE = 1;
-    private static final String[] PERMISSIONS_STORAGE = {
-            Manifest.permission.READ_EXTERNAL_STORAGE,
-            Manifest.permission.WRITE_EXTERNAL_STORAGE
-    };
-
-    @Override
-    protected void onCreate(Bundle savedInstanceState) {
-        super.onCreate(savedInstanceState);
-        setContentView(R.layout.activity_main);
-
-        verifyStoragePermission(MainActivity.this);
-        // ...
-    }
-
-    public static void verifyStoragePermission(Activity activity) {
-        // Check whether write permission has already been granted
-        int permission = ActivityCompat.checkSelfPermission(activity, Manifest.permission.WRITE_EXTERNAL_STORAGE);
-        if (permission != PackageManager.PERMISSION_GRANTED) {
-            // Permission not granted yet, so request it from the user
-            ActivityCompat.requestPermissions(
-                    activity,
-                    PERMISSIONS_STORAGE,
-                    REQUEST_EXTERNAL_STORAGE
-            );
-        }
-    }
-}
-```
-
-To save a network after training it on the device, write it to an `OutputStream` inside a try-with-resources block so that the stream is always closed:
-
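-```java
-// Uses java.io.File, java.io.FileOutputStream, java.io.IOException, java.io.OutputStream
-// and org.deeplearning4j.util.ModelSerializer
-File file = new File(Environment.getExternalStorageDirectory() + "/trained_model.zip");
-boolean saveUpdater = true; // also save the updater state so training can be resumed later
-try (OutputStream outputStream = new FileOutputStream(file)) {
-    ModelSerializer.writeModel(myNetwork, outputStream, saveUpdater);
-} catch (IOException e) {
-    Log.e("SaveModel", "Could not save model to external storage", e);
-}
-```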
-
-To load the trained network from storage, use the `ModelSerializer.restoreMultiLayerNetwork` method:
-
-```java
-File file = new File(Environment.getExternalStorageDirectory() + "/trained_model.zip");
-try {
-    // Load the model
-    MultiLayerNetwork restored = ModelSerializer.restoreMultiLayerNetwork(file);
-} catch (IOException e) {
-    Log.e("LoadModel", "Could not load model from external storage", e);
-}
-```
-
-For larger or more complex networks, such as convolutional or recurrent neural networks, training on the device is not a realistic option: their memory requirements risk an `OutOfMemoryError`, and long training times make for a poor user experience. Instead, the network can be trained on the desktop, saved with `ModelSerializer`, and then loaded as a pre-trained model in the application. To use a pre-trained model in your Android application:
-
-* Train your model on the desktop and save it with `ModelSerializer`.
-* Create a `raw` resource folder in the `res` directory of the application.
-* Copy the model file into the `raw` folder. Android resource file names may contain only lowercase letters, digits, and underscores, so use a name such as `your_model.zip`.
-* Access it from your resources using an `InputStream` within a try / catch block.
-
-```java
-// 'input' is an INDArray you have prepared with the shape your model expects
-try (InputStream is = getResources().openRawResource(R.raw.your_model)) {
-    // Load your_model.zip from res/raw
-    MultiLayerNetwork restored = ModelSerializer.restoreMultiLayerNetwork(is);
-
-    // Use the model
-    INDArray results = restored.output(input);
-    Log.d("Model", "Results: " + results);
-} catch (IOException e) {
-    // Handle the exception
-    Log.e("Model", "Could not load model from resources", e);
-}
-```
-
-
-## Next Step: Pretrained DL4J Models on Android
-
-An example application which uses a pretrained model can be found [here](./deeplearning4j-android-image-classification).
--- a/docs/deeplearning4j/templates/android.md
+++ b/docs/deeplearning4j/templates/android.md
@ -1,266 +0,0 @@
---
-title: Android for Deep Learning
-short_title: Android Overview
-description: Using Deep Learning and Neural Networks in Android Applications
-category: Mobile
-weight: 0
---
-
-## Using Deep Learning & Neural Networks in Android Applications
-
-Contents
-
-* [Prerequisites](#head_link1)
-* [Configuring Your Android Studio Project](#head_link2)
-* [Starting an Asynchronous Task](#head_link7)
-* [Creating a Neural Network](#head_link3)
-* [Creating Training Data](#head_link5)
-* [Conclusion](#head_link6)
-
-Generally speaking, training a neural network is a task best suited for powerful computers with multiple GPUs. But what if you want to do it on your humble Android phone or tablet? Well, it’s definitely possible. Considering an average Android device’s specifications, however, it will most likely be quite slow. If that’s not a problem for you, keep reading.
-
-In this tutorial, I’ll show you how to use [Deeplearning4J](https://deeplearning4j.org/quickstart), a popular Java-based deep learning library, to create and train a neural network on an Android device.
-
-
-## <a name="head_link1">Prerequisites</a>
-
-For best results, you’ll need the following:
-
-* An Android device or emulator that runs API level 21 or higher, and has about 200 MB of internal storage space free. I strongly suggest you use an emulator first because you can quickly tweak it in case you run out of memory or storage space.
-* Android Studio 2.2 or newer
-* Optionally, the more in-depth guide to using DL4J in Android applications, which covers dependencies, memory management, saving device-trained models, and loading pre-trained models in the application.
-
-
-## <a name="head_link2">Configuring Your Android Studio Project</a>
-
-To be able to use Deeplearning4J in your project, add the following implementation dependencies to your app module’s build.gradle file:
-
-``` groovy
-implementation (group: 'org.deeplearning4j', name: 'deeplearning4j-core', version: '{{page.version}}') {
-    exclude group: 'org.bytedeco', module: 'opencv-platform'
-    exclude group: 'org.bytedeco', module: 'leptonica-platform'
-    exclude group: 'org.bytedeco', module: 'hdf5-platform'
-}
-implementation group: 'org.nd4j', name: 'nd4j-native', version: '{{page.version}}'
-implementation group: 'org.nd4j', name: 'nd4j-native', version: '{{page.version}}', classifier: "android-arm"
-implementation group: 'org.nd4j', name: 'nd4j-native', version: '{{page.version}}', classifier: "android-arm64"
-implementation group: 'org.nd4j', name: 'nd4j-native', version: '{{page.version}}', classifier: "android-x86"
-implementation group: 'org.nd4j', name: 'nd4j-native', version: '{{page.version}}', classifier: "android-x86_64"
-implementation group: 'org.bytedeco', name: 'openblas', version: '0.3.9-1.5.3'
-implementation group: 'org.bytedeco', name: 'openblas', version: '0.3.9-1.5.3', classifier: "android-arm"
-implementation group: 'org.bytedeco', name: 'openblas', version: '0.3.9-1.5.3', classifier: "android-arm64"
-implementation group: 'org.bytedeco', name: 'openblas', version: '0.3.9-1.5.3', classifier: "android-x86"
-implementation group: 'org.bytedeco', name: 'openblas', version: '0.3.9-1.5.3', classifier: "android-x86_64"
-implementation group: 'org.bytedeco', name: 'opencv', version: '4.3.0-1.5.3'
-implementation group: 'org.bytedeco', name: 'opencv', version: '4.3.0-1.5.3', classifier: "android-arm"
-implementation group: 'org.bytedeco', name: 'opencv', version: '4.3.0-1.5.3', classifier: "android-arm64"
-implementation group: 'org.bytedeco', name: 'opencv', version: '4.3.0-1.5.3', classifier: "android-x86"
-implementation group: 'org.bytedeco', name: 'opencv', version: '4.3.0-1.5.3', classifier: "android-x86_64"
-implementation group: 'org.bytedeco', name: 'leptonica', version: '1.79.0-1.5.3'
-implementation group: 'org.bytedeco', name: 'leptonica', version: '1.79.0-1.5.3', classifier: "android-arm"
-implementation group: 'org.bytedeco', name: 'leptonica', version: '1.79.0-1.5.3', classifier: "android-arm64"
-implementation group: 'org.bytedeco', name: 'leptonica', version: '1.79.0-1.5.3', classifier: "android-x86"
-implementation group: 'org.bytedeco', name: 'leptonica', version: '1.79.0-1.5.3', classifier: "android-x86_64"
-
-```
-
-If you choose to use a SNAPSHOT version of the dependencies with Gradle, you will need to create a pom.xml file in the root directory and run `mvn -U compile` on it from the terminal. You will also need to include `mavenLocal()` in the `repositories {}` block of the build.gradle file. An example pom.xml file is provided below.
-
-``` xml
-<project>
-    <modelVersion>4.0.0</modelVersion>
-    <groupId>org.deeplearning4j</groupId>
-    <artifactId>snapshots</artifactId>
-    <version>1.0.0-SNAPSHOT</version>
-    <dependencies>
-       <dependency>
-            <groupId>org.nd4j</groupId>
-            <artifactId>nd4j-native-platform</artifactId>
-            <version>1.0.0-SNAPSHOT</version>
-        </dependency>
-        <dependency>
-            <groupId>org.deeplearning4j</groupId>
-            <artifactId>deeplearning4j-core</artifactId>
-            <version>1.0.0-SNAPSHOT</version>
-        </dependency>
-    </dependencies>
-    <repositories>
-        <repository>
-            <id>sonatype-nexus-snapshots</id>
-            <url>https://oss.sonatype.org/content/repositories/snapshots</url>
-            <releases>
-                <enabled>false</enabled>
-            </releases>
-            <snapshots>
-                <enabled>true</enabled>
-                <updatePolicy>always</updatePolicy>
-            </snapshots>
-        </repository>
-    </repositories>
-</project>
-
-```
-Android Studio 3.0 introduced a new version of the Android Gradle plugin, which no longer picks up annotation processors from the compile classpath. If your project uses annotation processors, declare them explicitly in the `annotationProcessor` configuration of your Gradle dependencies.
-As you can see, DL4J depends on ND4J, short for N-Dimensions for Java, which is a library that offers fast n-dimensional arrays. ND4J internally depends on a library called OpenBLAS, which contains platform-specific native code. Therefore, you must load a version of OpenBLAS and ND4J that matches the architecture of your Android device. 
-
-Dependencies of DL4J and ND4J contain several files with identical names. To avoid build errors, add the following exclude parameters to the `packagingOptions` block in the `android {}` section of your build.gradle file:
-
-```groovy
-packagingOptions {
-    exclude 'META-INF/DEPENDENCIES'
-    exclude 'META-INF/DEPENDENCIES.txt'
-    exclude 'META-INF/LICENSE'
-    exclude 'META-INF/LICENSE.txt'
-    exclude 'META-INF/license.txt'
-    exclude 'META-INF/NOTICE'
-    exclude 'META-INF/NOTICE.txt'
-    exclude 'META-INF/notice.txt'
-    exclude 'META-INF/INDEX.LIST'
-}
-```
-Your compiled code will have well over 65,536 methods, the maximum number of method references in a single DEX file. To handle this, enable multidex by adding the following option to the `defaultConfig` block:
-
-```groovy
-multiDexEnabled true
-```
-Now press **Sync Now** to update the project. Finally, make sure that your APK doesn't contain both `lib/armeabi` and `lib/armeabi-v7a` subdirectories. If it does, move all files into one or the other, as some Android devices have problems when both are present.
-
-
-## <a name="head_link7">Starting an Asynchronous Task</a>
-
-Training a neural network is CPU-intensive, which is why you wouldn’t want to do it in your application’s UI thread. DL4J's `fit()` method runs synchronously on the thread that calls it, so I’ll spawn a separate thread using the `AsyncTask` class.
-
-```java
-AsyncTask.execute(new Runnable() {
-    @Override
-    public void run() {
-        createAndUseNetwork();
-    }
-});
-```
-
-Because the method createAndUseNetwork() doesn’t exist yet, create it.
-
-```java
-private void createAndUseNetwork() {
-}
-```
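-
-If you prefer not to use `AsyncTask` (it was deprecated in Android API level 30), the same off-the-UI-thread pattern can be sketched with a plain `java.util.concurrent` executor. This is a minimal sketch under that assumption, not a DL4J API: the class `BackgroundTrainer` is hypothetical, and on Android you would post results back to the UI thread rather than block on the returned `Future`.
-
-```java
-import java.util.concurrent.ExecutorService;
-import java.util.concurrent.Executors;
-import java.util.concurrent.Future;
-
-// Hypothetical helper: runs training work on a single background thread
-final class BackgroundTrainer {
-    private final ExecutorService executor = Executors.newSingleThreadExecutor();
-
-    // Submits the work and returns immediately; the calling thread is not blocked
-    Future<?> start(Runnable work) {
-        return executor.submit(work);
-    }
-
-    // Lets the worker thread exit once the submitted work has finished
-    void shutdown() {
-        executor.shutdown();
-    }
-}
-```
-
-From an Activity, you might call `trainer.start(this::createAndUseNetwork)` and call `trainer.shutdown()` when the Activity is destroyed.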
-
-
-## <a name="head_link3">Creating a Neural Network</a>
-
-DL4J has a very intuitive API. Let us now use it to create a simple multi-layer perceptron with hidden layers. It will take two input values, and spit out one output value. To create the layers, we’ll use the DenseLayer and OutputLayer classes. Accordingly, add the following code to the createAndUseNetwork() method you created in the previous step:
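-``` java
-// The output layer below also needs these imports:
-// import org.nd4j.linalg.activations.Activation;
-// import org.nd4j.linalg.lossfunctions.LossFunctions;
-DenseLayer inputLayer = new DenseLayer.Builder()
-        .nIn(2)
-        .nOut(3)
-        .name("Input")
-        .build();
-DenseLayer hiddenLayer = new DenseLayer.Builder()
-        .nIn(3)
-        .nOut(2)
-        .name("Hidden")
-        .build();
-// A single sigmoid output trained with binary cross-entropy (XENT) fits a 0/1 target such as XOR
-OutputLayer outputLayer = new OutputLayer.Builder(LossFunctions.LossFunction.XENT)
-        .activation(Activation.SIGMOID)
-        .nIn(2)
-        .nOut(1)
-        .name("Output")
-        .build();
-```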
-Now that our layers are ready, let’s create a NeuralNetConfiguration.Builder object to configure our neural network.
-``` java
-NeuralNetConfiguration.Builder nncBuilder = new NeuralNetConfiguration.Builder();
-nncBuilder.updater(Updater.ADAM);
-```
-We must now create a NeuralNetConfiguration.ListBuilder object to actually connect our layers and specify their order.
-``` java
-NeuralNetConfiguration.ListBuilder listBuilder = nncBuilder.list();
-listBuilder.layer(0, inputLayer);
-listBuilder.layer(1, hiddenLayer);
-listBuilder.layer(2, outputLayer);
-```
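-Older DL4J releases also required you to enable backpropagation explicitly. In 1.0.0-beta4 and later this setting is deprecated and no longer used (backpropagation is always performed by `fit()`), so add the following code only if your version requires it: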
-``` java
-listBuilder.backprop(true);
-```
-At this point, we can generate and initialize our neural network as an instance of the MultiLayerNetwork class.
-
-``` java
-MultiLayerNetwork myNetwork = new MultiLayerNetwork(listBuilder.build());
-myNetwork.init();
-```
-
-## <a name="head_link5">Creating Training Data</a>
-To create our training data, we’ll be using the INDArray class, which is provided by ND4J. Here’s what our training data will look like:
-```
-INPUTS      EXPECTED OUTPUTS
------      ----------------
-0,0         0
-0,1         1
-1,0         1
-1,1         0
-
-```
-As you might have guessed, our neural network will behave like an XOR gate. The training data has four samples, so store that number in a constant:
-
-``` java
-final int NUM_SAMPLES = 4;
-```
-And now, create two INDArray objects for the inputs and expected outputs, and initialize them with zeroes.
-
-``` java
-INDArray trainingInputs = Nd4j.zeros(NUM_SAMPLES, inputLayer.getNIn());
-INDArray trainingOutputs = Nd4j.zeros(NUM_SAMPLES, outputLayer.getNOut());
-```
-Note that the number of columns in the inputs array is equal to the number of neurons in the input layer. Similarly, the number of columns in the outputs array is equal to the number of neurons in the output layer.
-
-Filling those arrays with the training data is easy. Just use the putScalar() method:
-
-
-``` java
-// If 0,0 show 0
-trainingInputs.putScalar(new int[]{0, 0}, 0);
-trainingInputs.putScalar(new int[]{0, 1}, 0);
-trainingOutputs.putScalar(new int[]{0, 0}, 0);
-// If 0,1 show 1
-trainingInputs.putScalar(new int[]{1, 0}, 0);
-trainingInputs.putScalar(new int[]{1, 1}, 1);
-trainingOutputs.putScalar(new int[]{1, 0}, 1);
-// If 1,0 show 1
-trainingInputs.putScalar(new int[]{2, 0}, 1);
-trainingInputs.putScalar(new int[]{2, 1}, 0);
-trainingOutputs.putScalar(new int[]{2, 0}, 1);
-// If 1,1 show 0
-trainingInputs.putScalar(new int[]{3, 0}, 1);
-trainingInputs.putScalar(new int[]{3, 1}, 1);
-trainingOutputs.putScalar(new int[]{3, 0}, 0);
-```
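-
-The four rows above follow the XOR rule, so the same training data can also be generated programmatically. The sketch below is plain Java with no ND4J dependency, and the class `XorData` is hypothetical. The resulting `double[][]` arrays could be wrapped with ND4J's `Nd4j.create(double[][])` instead of filling zero arrays with `putScalar()`.
-
-```java
-// Hypothetical helper that builds the XOR truth table used as training data
-final class XorData {
-    // Each row is one sample: {a, b}
-    static double[][] inputs() {
-        return new double[][]{{0, 0}, {0, 1}, {1, 0}, {1, 1}};
-    }
-
-    // Expected output for each input row: a XOR b, stored as a single column
-    static double[][] outputs() {
-        double[][] in = inputs();
-        double[][] out = new double[in.length][1];
-        for (int i = 0; i < in.length; i++) {
-            int a = (int) in[i][0];
-            int b = (int) in[i][1];
-            out[i][0] = a ^ b;
-        }
-        return out;
-    }
-}
-```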
-
-We won’t be using the INDArray objects directly. Instead, we’ll convert them into a DataSet.
-```java
-DataSet myData = new DataSet(trainingInputs, trainingOutputs);
-```
-
-At this point, we can start training by calling the `fit()` method of the neural network and passing the data set to it. The `for` loop controls how many times the data set is passed through the network; it is set to 1000 iterations in this example.
-
-```java
-for (int l = 0; l < 1000; l++) {
-    myNetwork.fit(myData);
-}
-```
-
-And that’s all there is to it. Your neural network is ready to be used.
-
-
-## <a name="head_link6">Conclusion</a>
-
-In this tutorial, you saw how easy it is to create and train a neural network using the Deeplearning4J library in an Android Studio project. I’d like to warn you, however, that training a neural network on a low-powered, battery operated device might not always be a good idea.
-
-A second example DL4J Android application, which includes a user interface, can be found [here](./deeplearning4j-android-linear-classifier). This example trains a neural network on the device using Anderson’s iris data set for iris flower type classification. The application lets the user enter flower measurements and returns the probability that these measurements belong to each of three iris types (*Iris setosa, Iris versicolor,* and *Iris virginica*).
-
-The limited processing power and battery life of mobile devices make training robust, multi-layer networks impractical. As an alternative to training a network on the device, the neural network used by your application can be trained on the desktop, saved via ModelSerializer, and then loaded as a pre-trained model in the application. A third example DL4J Android application, which loads a pre-trained MNIST network and uses it to classify user-drawn digits, can be found [here](./deeplearning4j-android-image-classification).
--- a/docs/deeplearning4j/templates/beginners.md
+++ b/docs/deeplearning4j/templates/beginners.md
@ -1,102 +0,0 @@
---
-title: Deep Learning for Beginners
-short_title: Beginners
-description: Road map for beginners new to deep learning.
-category: Get Started
-weight: 10
---
-
-## How Do I Start Using Deep Learning?
-
-Where you start depends on what you already know. 
-
-The prerequisites for really understanding deep learning are linear algebra, calculus and statistics, as well as programming and some machine learning. The prerequisites for applying it are just learning how to deploy a model. 
-
-In the case of Deeplearning4j, you should know Java well and be comfortable with tools like the IntelliJ IDE and the automated build tool Maven. [Skymind's SKIL](https://docs.skymind.ai/) also includes a managed Conda environment for machine learning tools using Python. 
-
-Below you'll find a list of resources. The sections are roughly organized in the order they will be useful. 
-
-## Free Machine- and Deep-learning Courses Online
-
-* [Andrew Ng's Machine-Learning Class on YouTube](https://www.youtube.com/watch?v=qeHZOdmJvFU) 
-* [Geoff Hinton's Neural Networks Class on YouTube](https://youtu.be/2fRnHVVLf1Y) 
-* [Patrick Winston's Introduction to Artificial Intelligence @MIT](http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-034-artificial-intelligence-fall-2010/) (For those interested in a survey of artificial intelligence.)
-* [Andrej Karpathy's Convolutional Neural Networks Class at Stanford](http://cs231n.github.io) (For those interested in image recognition.)
-* [ML@B: Machine Learning Crash Course: Part 1](https://ml.berkeley.edu/blog/2016/11/06/tutorial-1/)
-* [ML@B: Machine Learning Crash Course: Part 2](https://ml.berkeley.edu/blog/2016/12/24/tutorial-2/)
-* [Gradient descent, how neural networks learn, Deep learning, part 2](https://www.youtube.com/watch?v=IHZwWFHWa-w&feature=youtu.be)
-
-## Math
-
-The math involved in deep learning is basically linear algebra, calculus and probability, and if you have studied those at the undergraduate level, you will be able to understand most of the ideas and notation in deep-learning papers. If you haven't studied those in college, never fear. There are many free resources available (and some on this website).
-
-* [Calculus Made Easy, by Silvanus P. Thompson](http://www.gutenberg.org/ebooks/33283?msg=welcome_stranger)
-* [Seeing Theory: A Visual Introduction to Probability and Statistics](http://students.brown.edu/seeing-theory/)
-* [Andrew Ng's 6-Part Review of Linear Algebra](https://www.youtube.com/playlist?list=PLnnr1O8OWc6boN4WHeuisJWmeQHH9D_Vg)
-* [Khan Academy's Linear Algebra Course](https://www.khanacademy.org/math/linear-algebra)
-* [Linear Algebra for Machine Learning](https://www.youtube.com/watch?v=ZumgfOei0Ak); Patrick van der Smagt
-* [CMU's Linear Algebra Review](http://www.cs.cmu.edu/~zkolter/course/linalg/outline.html)
-* [Math for Machine Learning](https://www.umiacs.umd.edu/~hal/courses/2013S_ML/math4ml.pdf)
-* [Immersive Linear Algebra](http://immersivemath.com/ila/learnmore.html)
-* [Probability Cheatsheet](https://static1.squarespace.com/static/54bf3241e4b0f0d81bf7ff36/t/55e9494fe4b011aed10e48e5/1441352015658/probability_cheatsheet.pdf)
-* [The best linear algebra books](https://begriffs.com/posts/2016-07-24-best-linear-algebra-books.html)
-* [Markov Chains, Visually Explained](http://setosa.io/ev/markov-chains/)
-* [An Introduction to MCMC for Machine Learning](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13.7133&rep=rep1&type=pdf)
-* [Eigenvectors, Eigenvalues, PCA, Covariance and Entropy](https://skymind.ai/wiki/eigenvector)
-* [Markov Chain Monte Carlo (MCMC) & Machine Learning](https://skymind.ai/wiki/markov-chain-monte-carlo)
-* [Relearning Matrices as Linear Functions](https://www.dhruvonmath.com/2018/12/31/matrices/)
-
-## Programming
-
-If you do not know how to program yet, you can start with Java, but you might find other languages easier. Python and Ruby resources can convey the basic ideas in a faster feedback loop. "Learn Python the Hard Way" and "Learn to Program (Ruby)" are two great places to start. 
-
-* [Scratch: A Visual Programming Environment From MIT](https://scratch.mit.edu/)
-* [Learn to Program (Ruby)](https://pine.fm/LearnToProgram/)
-* [Grasshopper: A Mobile App to Learn Basic Coding (Javascript)](https://grasshopper.codes/)
-* [Intro to the Command Line](http://cli.learncodethehardway.org/book/)
-* [Additional command-line tutorial](http://www.learnenough.com/command-line)
-* [A Vim Tutorial and Primer](https://danielmiessler.com/study/vim/) (Vim is an editor accessible from the command line.)
-* [Intro to Computer Science (CS50 @Harvard edX)](https://www.edx.org/course/introduction-computer-science-harvardx-cs50x)
-* [A Gentle Introduction to Machine Fundamentals](https://marijnhaverbeke.nl/turtle/)
-* [Teaching C](https://blog.regehr.org/archives/1393)
-
-If you want to jump into deep-learning from here without Java, we recommend [Theano](http://deeplearning.net/) and the various Python frameworks built atop it, including [Keras](https://github.com/fchollet/keras) and [Lasagne](https://github.com/Lasagne/Lasagne).
-
-## Python
-
-* [Learn Python the Hard Way](http://learnpythonthehardway.org/)
-* [Google's Python Class](https://developers.google.com/edu/python/)
-* [Udemy: Complete Python 3 Masterclass Journey](https://www.udemy.com/complete-python-3-masterclass-journey/)
-* [MIT: Introduction to Computer Science and Python Programming](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-0001-introduction-to-computer-science-and-programming-in-python-fall-2016/) 
-* [David Beazley: Python Tutorials](http://www.dabeaz.com/tutorials.html)
-* [CS231n: Python Numpy Tutorial](http://cs231n.github.io/python-numpy-tutorial/)
-* [Pyret: A Python Learning Environment](https://www.pyret.org/)
-
-## Java
-
-Once you have programming basics down, tackle Java, one of the world's most widely used programming languages. Most large organizations operate on huge Java code bases. (There will always be Java jobs.) The big data stack -- Hadoop, Spark, Kafka, Lucene, Solr, Cassandra, Flink -- has largely been written for Java's compute environment, the JVM.
-
-* [Think Java: Interactive Web-based Dev Environment](https://books.trinket.io/thinkjava/)
-* [Learn Java The Hard Way](https://learnjavathehardway.org/)
-* [Introduction to JShell](https://docs.oracle.com/javase/10/jshell/introduction-jshell.htm#JSHEL-GUID-630F27C8-1195-4989-9F6B-2C51D46F52C8)
-* [JShell in 5 Minutes](https://dzone.com/articles/jshell-in-five-minutes)
-* [Java Resources](http://wiht.link/java-resources)
-* [Java Ranch: A Community for Java Beginners](http://javaranch.com/)
-* [Intro to Programming in Java @Princeton](http://introcs.cs.princeton.edu/java/home/)
-* [Head First Java](http://www.amazon.com/gp/product/0596009208)
-* [Java in a Nutshell](http://www.amazon.com/gp/product/1449370829)
-* [Java Programming for Complete Beginners in 250 Steps](https://www.udemy.com/java-tutorial/)
-
-## Deeplearning4j
-
-With that under your belt, we recommend you approach Deeplearning4j through its [examples](https://github.com/eclipse/deeplearning4j-examples). 
-
-* [Quickstart](./deeplearning4j-quickstart)
-
-You can also download a [free version of the Skymind Intelligence Layer](https://docs.skymind.ai/), which supports Python, Java and Scala machine-learning and data science tools. SKIL is a machine-learning backend that works on prem and in the cloud, and can ship with your software to provide a machine learning model server. 
-
-## Other Resources
-
-Most of what we know about deep learning is contained in academic papers. You can find some of the major research groups [here](https://skymind.ai/wiki/machine-learning-research-groups-labs).
-
-While individual courses have limits on what they can teach, the Internet does not. Most math and programming questions can be answered by Googling and searching sites like [Stackoverflow](https://stackoverflow.com) and [Math Stackexchange](https://math.stackexchange.com/).
--- a/docs/deeplearning4j/templates/benchmark.md
+++ b/docs/deeplearning4j/templates/benchmark.md
@ -1,314 +0,0 @@
---
-title: Benchmarking with DL4J and ND4J
-short_title: Benchmark Guide
-description: General guidelines for benchmarking in DL4J and ND4J.
-category: Get Started
-weight: 10
---
-
-## General Benchmarking Guidelines
-
-**Guideline 1: Run Warm-Up Iterations Before Benchmarking**
-
-A warm-up period is where you run a number of iterations (for example, a few hundred) of your benchmark without timing, before commencing timing for further iterations.
-
-Why is a warm-up required? The first few iterations of any ND4J/DL4J execution may be slower than those that come later, for a number of reasons:
-1. In the initial benchmark iterations, the JVM has not yet had time to perform just-in-time (JIT) compilation of code. Once JIT compilation has completed, code is likely to execute faster for all subsequent operations.
-2. ND4J and DL4J (and some other libraries) have some degree of lazy initialization: the first operation may trigger one-off initialization code.
-3. DL4J or ND4J (when using workspaces) can take some iterations to learn memory requirements for execution. During this learning phase, performance will be lower than after its completion.
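-
-A warm-up phase can be sketched in plain Java as follows. This is a hypothetical helper, not part of DL4J or ND4J: it runs the operation `warmupIterations` times without timing, then records the duration of each of `measuredIterations` further runs using `System.nanoTime()`.
-
-```java
-// Hypothetical benchmark helper: warm-up runs are executed but never timed
-final class WarmupBenchmark {
-    static long[] run(Runnable operation, int warmupIterations, int measuredIterations) {
-        // Warm-up: give JIT compilation, lazy initialization and workspace
-        // learning a chance to settle before any timing starts
-        for (int i = 0; i < warmupIterations; i++) {
-            operation.run();
-        }
-        // Measurement: time each remaining iteration individually
-        long[] durationsNanos = new long[measuredIterations];
-        for (int i = 0; i < measuredIterations; i++) {
-            long start = System.nanoTime();
-            operation.run();
-            durationsNanos[i] = System.nanoTime() - start;
-        }
-        return durationsNanos;
-    }
-}
-```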
-
-
-**Guideline 2: Run Multiple Iterations of All Benchmarks**
-
-Your benchmark isn't the only thing running on your computer (and if you are using cloud hardware, resources may be shared with other users). Operation runtimes are also not perfectly deterministic.
-
-For benchmark results to be reliable, it is important to run multiple iterations - and ideally report both mean and standard deviation for the runtime. Without this, it's impossible to compare the performance of operations, as performance differences may simply be due to random variation.
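-
-Reporting the mean and standard deviation takes only a few lines. The sketch below is a hypothetical helper (not a DL4J or ND4J API) that summarizes a set of per-iteration timings; it uses the sample standard deviation, dividing by n - 1.
-
-```java
-// Hypothetical helper: summarizes repeated timing measurements
-final class TimingStats {
-    static double mean(long[] samples) {
-        double sum = 0;
-        for (long s : samples) {
-            sum += s;
-        }
-        return sum / samples.length;
-    }
-
-    // Sample standard deviation (divides by n - 1); requires at least two samples
-    static double stdDev(long[] samples) {
-        if (samples.length < 2) {
-            throw new IllegalArgumentException("need at least two samples");
-        }
-        double m = mean(samples);
-        double sumSq = 0;
-        for (long s : samples) {
-            double diff = s - m;
-            sumSq += diff * diff;
-        }
-        return Math.sqrt(sumSq / (samples.length - 1));
-    }
-}
-```
-
-For example, for per-iteration times of 1, 2, 3, 4 and 5 ms, the mean is 3 ms and the sample standard deviation is about 1.58 ms.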
-
-
-
-**Guideline 3: Pay Careful Attention to What You Are Benchmarking**
-
-This is especially important when comparing frameworks. Before you declare that "performance on operation X is Y" or "A is faster than B", make sure that:
-
-1. You are benchmarking only the operations of interest.
-
-If your goal is to check the performance of an operation, make sure that only this operation is being timed.
-
-You should carefully check whether you are unintentionally including other things. For example, does your timing include:
-JVM initialization time? Library initialization time? Result array allocation time? Garbage collection time? Data loading time?
-
-Ideally, these should be excluded from any timing/performance results you report. If they cannot be excluded, make sure you note this whenever making performance claims.
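-
-To illustrate timing only the operation of interest, the sketch below allocates and fills its input and result arrays before starting the clock, so only the element-wise addition is measured. It is plain Java with a hypothetical class name; the same structure applies when the timed operation is an ND4J call.
-
-```java
-import java.util.Arrays;
-
-// Hypothetical example: setup work is kept outside the timed region
-final class ScopedTiming {
-    // Returns elapsed nanoseconds for the addition alone; 'result' receives a + b
-    static long timeAddition(double[] a, double[] b, double[] result) {
-        long start = System.nanoTime();
-        for (int i = 0; i < a.length; i++) {
-            result[i] = a[i] + b[i];
-        }
-        return System.nanoTime() - start;
-    }
-
-    static double[] filled(int n, double value) {
-        double[] arr = new double[n];
-        Arrays.fill(arr, value);
-        return arr;
-    }
-
-    static long demo(int n) {
-        // Allocation and data preparation happen before timing starts...
-        double[] a = filled(n, 1.0);
-        double[] b = filled(n, 2.0);
-        double[] result = new double[n];
-        // ...so only the addition itself is measured
-        return timeAddition(a, b, result);
-    }
-}
-```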
-
-
-2. What native libraries are you using?
-
-For example: what BLAS implementation (MKL, OpenBLAS, etc.)? If you are using CUDA, are you using CuDNN?
-ND4J and DL4J can use these libraries (MKL, CuDNN) when they are available, but they are not always available by default. If they are not, performance can be lower, sometimes considerably.
-
-This is especially important when comparing results between libraries: for example, if you compare two libraries (one using OpenBLAS, another using MKL), your results may simply reflect the performance difference between the BLAS libraries being used, and not the performance of the libraries being tested. Similarly, comparing one library with CuDNN against another without CuDNN may simply reflect the performance benefit of using CuDNN.
-
-
-3. How are things configured?
-
-For better or worse, DL4J and ND4J allow a lot of configuration. The default values for most of this configuration are adequate for most users, but sometimes manual configuration is required for optimal performance. This can be especially true in some benchmarks!
-Some of these configuration options allow users to trade off higher memory use for better performance, for example. Some configuration options of note:
-
-* [Memory configuration](./deeplearning4j-config-memory)
-* [Workspaces and garbage collection](./deeplearning4j-config-workspaces)
-* [CuDNN](./deeplearning4j-config-cudnn)
-* DL4J Cache Mode (enable using `.cacheMode(CacheMode.DEVICE)`)
-
-
-If you aren't sure whether you are measuring only what you intend to measure when running DL4J or ND4J code, you can use a profiler such as VisualVM or YourKit.
-
-
-4. What versions are you using?
-
-When benchmarking, you should use the latest version of whatever libraries you are benchmarking. There's no point identifying and reporting a bottleneck that was fixed 6 months ago. An exception to this would be when you are comparing performance over time between versions.
-Note also that snapshot versions of DL4J and ND4J are available - these may contain performance improvements (feel free to ask).
-
-
-**Guideline 4: Focus on Real-World Use Cases - And Run a Range of Sizes**
-
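-Consider, for example, a benchmark that adds two numbers:
-```java
-double x = 0;
-long start = System.nanoTime();                 // start timing
-x += 1.0;
-long elapsedNanos = System.nanoTime() - start;  // end timing
-```
-
-And something equivalent in ND4J:
-```java
-INDArray x = Nd4j.create(1);
-long start = System.nanoTime();                 // start timing
-x.addi(1.0);
-long elapsedNanos = System.nanoTime() - start;  // end timing
-```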
-
-Of course, the ND4J benchmark above is going to be much slower - method calls are required, input validation is performed, native code has to be called (with context switching overhead), and so on. One must ask the question, however: is this what users will actually be doing with ND4J or an equivalent linear algebra library? It's an extreme example - but the general point is a valid one.
-
-
-Note also that performance on mathematical operations can be size- and shape-specific.
-For example, if you are benchmarking matrix multiplication, the matrix dimensions can matter a lot. In some internal benchmarks, we found that different BLAS implementations (MKL vs OpenBLAS) - and different backends (CPU vs GPU) - can perform very differently with different matrix dimensions. None of the BLAS implementations (OpenBLAS, MKL, CUDA) we have tested internally were uniformly faster than others for all input shapes and sizes.
-
-Therefore - whenever you are running benchmarks, it's important to run those benchmarks with multiple different input shapes/sizes, to get the full performance picture.
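-
-As a rough illustration, the plain-Java sketch below times one operation over several sizes. It assumes a naive triple-loop matrix multiply as a stand-in workload (substitute the ND4J call you actually want to measure), and the sizes in `main` are arbitrary examples; `SizeSweep` is a hypothetical helper, not part of DL4J or ND4J.
-
-```java
-import java.util.Arrays;
-import java.util.LinkedHashMap;
-import java.util.Map;
-
-final class SizeSweep {
-    // Naive row-major C = A * B for n x n matrices (stand-in workload, not BLAS).
-    static double[] matmul(double[] a, double[] b, int n) {
-        double[] c = new double[n * n];
-        for (int i = 0; i < n; i++) {
-            for (int k = 0; k < n; k++) {
-                double aik = a[i * n + k];
-                for (int j = 0; j < n; j++) {
-                    c[i * n + j] += aik * b[k * n + j];
-                }
-            }
-        }
-        return c;
-    }
-
-    // Times the workload once per size (after warm-up) and returns size -> elapsed nanoseconds.
-    static Map<Integer, Long> sweep(int[] sizes, int warmupRuns) {
-        Map<Integer, Long> results = new LinkedHashMap<>();
-        for (int n : sizes) {
-            double[] a = new double[n * n];
-            double[] b = new double[n * n];
-            Arrays.fill(a, 1.0);
-            Arrays.fill(b, 2.0);
-            for (int w = 0; w < warmupRuns; w++) {
-                matmul(a, b, n); // give the JIT a chance to compile the hot loop first
-            }
-            long start = System.nanoTime();
-            matmul(a, b, n);
-            results.put(n, System.nanoTime() - start);
-        }
-        return results;
-    }
-
-    public static void main(String[] args) {
-        sweep(new int[]{64, 128, 256}, 3).forEach((n, ns) ->
-                System.out.printf("n=%d: %.3f ms%n", n, ns / 1e6));
-    }
-}
-```
-
-Timing a single run per size keeps the sketch short; in a real benchmark, repeat each size several times and report the mean and spread.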
-
-
-**Guideline 5: Understand Your Hardware**
-
-When comparing different hardware, it's important to be aware of what it excels at.
-For example, you might find that neural network training performs faster on a CPU with minibatch size 1 than on a GPU - yet larger minibatch sizes show exactly the opposite. Similarly, small layer sizes may not be able to adequately utilize the power of a GPU.
-
-Furthermore, some deep learning distributions may need to be specifically compiled to provide support for hardware features such as AVX2 (note that recent versions of ND4J are packaged with binaries for CPUs that support these features). When running benchmarks, the utilization (or lack thereof) of these features can make a considerable difference to performance.
-
-
-**Guideline 6: Make It Reproducible**
-
-When running benchmarks, it's important to make your benchmarks reproducible.
-Why? Good or bad performance may occur only under certain limited circumstances, so others need to be able to rerun your benchmark to confirm the result and investigate it.
-
-And finally - remember that (a) ND4J and DL4J are in constant development, and (b) benchmarks do sometimes identify performance bottlenecks (after all, ND4J includes hundreds of distinct operations). If you identify a performance bottleneck, great - we want to know about it so we can fix it. Any time a potential bottleneck is identified, we first need to reproduce it so that we can study it, understand it and ultimately fix it.
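-
-One cheap way to make results easier to reproduce is to publish the runtime environment alongside them. The sketch below collects a few standard JVM system properties; `BenchmarkEnvironment` is a hypothetical helper, and ND4J-specific details (backend, BLAS vendor, thread counts) are not included, so copy those from the ND4J startup log.
-
-```java
-import java.util.LinkedHashMap;
-import java.util.Map;
-
-final class BenchmarkEnvironment {
-    // Collects JVM and OS facts worth publishing next to benchmark results.
-    static Map<String, String> describe() {
-        Runtime rt = Runtime.getRuntime();
-        Map<String, String> env = new LinkedHashMap<>();
-        env.put("java.version", System.getProperty("java.version"));
-        env.put("java.vm.name", System.getProperty("java.vm.name"));
-        env.put("os.name", System.getProperty("os.name"));
-        env.put("os.arch", System.getProperty("os.arch"));
-        env.put("availableProcessors", Integer.toString(rt.availableProcessors()));
-        env.put("maxHeapBytes", Long.toString(rt.maxMemory()));
-        return env;
-    }
-
-    public static void main(String[] args) {
-        describe().forEach((key, value) -> System.out.println(key + "=" + value));
-    }
-}
-```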
-
-**Guideline 7: Understand the Limitations of Your Benchmarks**
-
-Linear algebra libraries contain hundreds of distinct operations. Neural network libraries contain dozens of layer types. When benchmarking, it's important to understand the limitations of those benchmarks. Benchmarking one type of operation or layer cannot tell you anything about the performance on other types of layers or operations - unless they share code that has been identified to be a performance bottleneck.
-
-**Guideline 8: If You Aren't Sure - Ask**
-
-The DL4J/ND4J developers are available on Gitter. You can ask questions about benchmarking and performance there: [https://gitter.im/deeplearning4j/deeplearning4j](https://gitter.im/deeplearning4j/deeplearning4j)
-
-And if you do happen to find a performance issue - let us know!
-
-
-
-## ND4J Specific Benchmarking
-
-
-**A Note on BLAS and Array Orders**
-
-BLAS - or Basic Linear Algebra Subprograms - refers to an interface and set of methods used for linear algebra operations. Some examples include 'gemm' - General Matrix Multiplication - and 'axpy', which implements ```Y = a*X+Y```.
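-
-For reference, this is what axpy computes, written as a naive Java loop. It is a sketch assuming unit strides and `double` data (`Axpy` is a hypothetical class); real BLAS libraries add strides, vectorization and multithreading, which is exactly why their performance differs.
-
-```java
-import java.util.Arrays;
-
-final class Axpy {
-    // Computes Y = a*X + Y in place, element by element.
-    static void axpy(double a, double[] x, double[] y) {
-        if (x.length != y.length) {
-            throw new IllegalArgumentException("x and y must have the same length");
-        }
-        for (int i = 0; i < x.length; i++) {
-            y[i] += a * x[i];
-        }
-    }
-
-    public static void main(String[] args) {
-        double[] x = {1, 2, 3};
-        double[] y = {10, 20, 30};
-        axpy(2.0, x, y);
-        System.out.println(Arrays.toString(y)); // prints [12.0, 24.0, 36.0]
-    }
-}
-```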
-
-
-ND4J can use multiple BLAS implementations - versions up to and including 1.0.0-beta have defaulted to OpenBLAS. However, if Intel MKL (free versions are available [here](https://software.intel.com/en-us/mkl)) is installed and available, ND4J will link with it for improved performance in many BLAS operations.
-
-Note that ND4J will log the BLAS backend used when it initializes. For example:
-```
-14:17:34,169 INFO  ~ Loaded [CpuBackend] backend
-14:17:34,672 INFO  ~ Number of threads used for NativeOps: 8
-14:17:34,823 INFO  ~ Number of threads used for BLAS: 8
-14:17:34,831 INFO  ~ Backend used: [CPU]; OS: [Windows 10]
-14:17:34,831 INFO  ~ Cores: [16]; Memory: [7.1GB];
-14:17:34,831 INFO  ~ Blas vendor: [OPENBLAS]
-```
-
-
-Performance can depend on the available BLAS library - in internal tests, we have found that OpenBLAS has been between 30% faster and 8x slower than MKL - depending on the array sizes and array orders.
-
-Array order also matters for performance. ND4J can represent arrays in either row major ('c') or column major ('f') order. See [this Wikipedia page](https://en.wikipedia.org/wiki/Row-_and_column-major_order) for more details. Performance in operations such as matrix multiplication - but also more general ND4J operations - depends on the input and result array orders.
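-
-The difference between the two orders comes down to how a 2D index maps to a position in the underlying flat buffer. A minimal sketch of that mapping, assuming a plain `rows x cols` matrix with no offsets or custom strides (`ArrayOrder` is a hypothetical helper):
-
-```java
-final class ArrayOrder {
-    // Position of element (i, j) of a rows x cols matrix in row-major ('c') order.
-    static int offsetC(int i, int j, int rows, int cols) {
-        return i * cols + j;
-    }
-
-    // Position of element (i, j) of a rows x cols matrix in column-major ('f') order.
-    static int offsetF(int i, int j, int rows, int cols) {
-        return i + j * rows;
-    }
-
-    public static void main(String[] args) {
-        // 2 x 3 matrix: the element to the right of (0, 0) is 1 slot away in 'c' order, 2 in 'f' order.
-        System.out.println(offsetC(0, 1, 2, 3)); // prints 1
-        System.out.println(offsetF(0, 1, 2, 3)); // prints 2
-    }
-}
-```
-
-Moving along a row touches adjacent elements in 'c' order but jumps `rows` elements at a time in 'f' order, which is why the best-performing order depends on how an operation walks its arrays.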
-
-For matrix multiplication, this means there are 8 possible combinations of array orders (c/f for each of input 1, input 2 and result arrays). Performance won't be the same for all cases.
-
-Similarly, an operation such as element-wise addition (i.e., z=x+y) will be much faster for some combinations of input orders than others - notably, when x, y and z are all the same order. In short, this is due to memory striding: it's cheaper to read a sequence of memory addresses when those addresses are adjacent to each other in memory than when they are spread far apart.
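-
-The sketch below illustrates the striding point with plain Java arrays, assuming 2D matrices stored in flat buffers with the same index mapping as the 'c'/'f' orders; `ElementwiseAdd` is hypothetical and is not how ND4J implements the operation. When all three orders match, the addition is a single pass over adjacent memory; otherwise at least one operand has to be read with a large stride.
-
-```java
-final class ElementwiseAdd {
-    // Linear offset of (i, j) in a rows x cols matrix: 'c' = row major, 'f' = column major.
-    static int offset(int i, int j, int rows, int cols, char order) {
-        return order == 'c' ? i * cols + j : i + j * rows;
-    }
-
-    // z = x + y, where each flat buffer is laid out in its own order ('c' or 'f').
-    static double[] add(double[] x, char xOrder, double[] y, char yOrder,
-                        char zOrder, int rows, int cols) {
-        double[] z = new double[rows * cols];
-        if (xOrder == yOrder && yOrder == zOrder) {
-            // Same order everywhere: a single pass over adjacent memory addresses.
-            for (int k = 0; k < z.length; k++) {
-                z[k] = x[k] + y[k];
-            }
-            return z;
-        }
-        // Mixed orders: walk z in its own order; any operand whose order differs from z
-        // is then read with a stride of `rows` or `cols` elements instead of sequentially.
-        boolean zRowMajor = zOrder == 'c';
-        int outer = zRowMajor ? rows : cols;
-        int inner = zRowMajor ? cols : rows;
-        for (int a = 0; a < outer; a++) {
-            for (int b = 0; b < inner; b++) {
-                int i = zRowMajor ? a : b;
-                int j = zRowMajor ? b : a;
-                z[offset(i, j, rows, cols, zOrder)] =
-                        x[offset(i, j, rows, cols, xOrder)] + y[offset(i, j, rows, cols, yOrder)];
-            }
-        }
-        return z;
-    }
-}
-```
-
-Whether the strided path is measurably slower depends on the array sizes and the CPU caches, so treat this as an explanation of the mechanism rather than a benchmark.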
-
-Note that, by default, ND4J expects result arrays (for matrix multiplication) to be defined in column major ('f') order, to be consistent across backends, given that CuBLAS (i.e., NVIDIA's BLAS library for CUDA) requires results to be in f order. As a consequence, some ways of performing matrix multiplication with the result array being in c order will have lower performance than if the same operation was executed with an 'f' order array.
-
-Finally, when it comes to CUDA, array orders/striding can matter even more than when running on CPU. For example, certain combinations of orders can be much faster than others - and input/output dimensions that are multiples of 32 or 64 typically perform faster (sometimes considerably) than dimensions that are not.
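-
-If a layer or input size is a free choice, a small helper like the hypothetical `roundUp` below can pick the next multiple of 32 or 64. This is a sketch, not a guarantee: changing a size changes the model, and whether the aligned size is actually faster should be measured on your hardware.
-
-```java
-final class DimensionAlignment {
-    // Rounds n up to the nearest multiple of `multiple` (for example 32 or 64).
-    static int roundUp(int n, int multiple) {
-        if (n <= 0 || multiple <= 0) {
-            throw new IllegalArgumentException("n and multiple must be positive");
-        }
-        return ((n + multiple - 1) / multiple) * multiple;
-    }
-
-    // True if n is already a multiple of `multiple`.
-    static boolean isAligned(int n, int multiple) {
-        return n % multiple == 0;
-    }
-
-    public static void main(String[] args) {
-        System.out.println(roundUp(100, 32)); // prints 128
-    }
-}
-```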
-
-
-
-## DL4J Specific Benchmarking
-
-
-Most of what has been said for ND4J also applies to DL4J.
-
-In addition:
-1. If you are using the nd4j-native (CPU) backend, ensure you are using Intel MKL. This is faster than the default of OpenBLAS in most cases.
-2. If you are using CUDA, ensure you are using CuDNN ([link](./deeplearning4j-config-cudnn)).
-3. Check the [Workspaces](./deeplearning4j-config-workspaces) and [Memory](./deeplearning4j-config-memory) guides. The defaults are usually good - but sometimes better performance can be obtained with some tweaking. This is especially important if you have a lot of Java objects (such as, Word2Vec vectors) in memory while training.
-4. Watch out for ETL bottlenecks. You can add PerformanceListener to your network training to see if ETL is a bottleneck.
-5. Don't forget that performance is dependent on minibatch sizes. Don't benchmark with minibatch size 1 - use something more realistic.
-6. If you need multi-GPU training or inference support, use ParallelWrapper or ParallelInference.
-7. Don't forget that CuDNN is configurable: you can configure DL4J/CuDNN to prefer performance - at the expense of memory - using the ```.cudnnAlgoMode(ConvolutionLayer.AlgoMode.PREFER_FASTEST)``` configuration on convolution layers.
-8. When using GPUs, multiples of 8 (or 32) for input sizes and layer sizes may perform better.
-9. When using RNNs (and manually creating INDArrays), use 'f' ordered arrays for both features and (RnnOutputLayer) labels. Otherwise, use 'c' ordered arrays. This is for faster memory access.
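-
-`PerformanceListener` is the DL4J way to see whether ETL is a bottleneck. To make the idea concrete, the plain-Java sketch below splits wall-clock time between waiting for the next batch and processing it. `EtlTimer` is a hypothetical helper, and the `step` callback stands in for whatever per-batch work (for example, fitting the network) you are measuring. If the ETL fraction is large, the data pipeline rather than the network is the bottleneck.
-
-```java
-import java.util.Arrays;
-import java.util.Iterator;
-import java.util.function.Consumer;
-
-final class EtlTimer {
-    long etlNanos;      // time spent in hasNext()/next(), i.e. waiting for data
-    long computeNanos;  // time spent in the per-batch step
-    int batches;
-
-    // Drains the iterator, passing each batch to `step`, and records where the time goes.
-    <T> void run(Iterator<T> data, Consumer<T> step) {
-        while (true) {
-            long t0 = System.nanoTime();
-            if (!data.hasNext()) {
-                etlNanos += System.nanoTime() - t0;
-                break;
-            }
-            T batch = data.next();
-            long t1 = System.nanoTime();
-            step.accept(batch);
-            long t2 = System.nanoTime();
-            etlNanos += t1 - t0;
-            computeNanos += t2 - t1;
-            batches++;
-        }
-    }
-
-    // Fraction of total measured time spent waiting for data (0.0 to 1.0).
-    double etlFraction() {
-        long total = etlNanos + computeNanos;
-        return total == 0 ? 0.0 : (double) etlNanos / total;
-    }
-
-    public static void main(String[] args) {
-        EtlTimer timer = new EtlTimer();
-        timer.run(Arrays.asList(1, 2, 3).iterator(), batch -> { /* per-batch work goes here */ });
-        System.out.printf("batches=%d, ETL fraction=%.2f%n", timer.batches, timer.etlFraction());
-    }
-}
-```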
-
-
-## Common Benchmark Mistakes
-
-Finally, here's a summary list of common benchmark mistakes:
-
-1. Not using the latest version of ND4J/DL4J (there's no point identifying a bottleneck that was fixed many releases back). Consider trying snapshots to get the latest performance improvements.
-2. Not paying attention to which native libraries (MKL, OpenBLAS, CuDNN etc.) are being used
-3. Providing no warm-up period before benchmarking begins
-4. Running only a single (or too few) iterations, or not reporting mean, standard deviation and number of iterations
-5. Not configuring workspaces, garbage collection, etc
-6. Running only one possible case - for example, benchmarking a single set of array dimensions/orders when benchmarking BLAS operations
-7. Running unusually small inputs - for example, minibatch size 1 on a GPU (which might be slower - but isn't realistic!)
-8. Not measuring exactly - and only - what you claim to be measuring (for example, not accounting for array allocation, initialization or garbage collection time)
-9. Not making your benchmarks reproducible (does the benchmark conclusion generalize? are there problems with the benchmark? what can we do to fix it?)
-10. Comparing results across different hardware, not accounting for differences (for example, testing on one machine with AVX2 support, and on another without)
-11. Not asking the devs (via the [DL4J/ND4J Gitter Channel](https://gitter.im/deeplearning4j/deeplearning4j)) - we are happy to provide suggestions and investigate if performance isn't where it should be!
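-
-Mistakes 3 and 4 in particular can be avoided with a small harness. The sketch below is a minimal plain-Java version, assuming the code under test can be wrapped in a `Runnable`; `MicroBenchmark` is hypothetical, and for serious JVM work a dedicated harness such as JMH (the OpenJDK Java Microbenchmark Harness) handles pitfalls this sketch ignores, such as dead-code elimination.
-
-```java
-import java.util.Arrays;
-
-final class MicroBenchmark {
-    final double meanNanos;
-    final double stdDevNanos;
-    final int iterations;
-
-    private MicroBenchmark(double meanNanos, double stdDevNanos, int iterations) {
-        this.meanNanos = meanNanos;
-        this.stdDevNanos = stdDevNanos;
-        this.iterations = iterations;
-    }
-
-    // Runs `warmup` untimed iterations, then times `iterations` runs of the task.
-    static MicroBenchmark run(Runnable task, int warmup, int iterations) {
-        if (iterations < 2) {
-            throw new IllegalArgumentException("need at least 2 timed iterations");
-        }
-        for (int i = 0; i < warmup; i++) {
-            task.run(); // warm-up: JIT compilation, caches, lazy initialization
-        }
-        long[] times = new long[iterations];
-        for (int i = 0; i < iterations; i++) {
-            long start = System.nanoTime();
-            task.run();
-            times[i] = System.nanoTime() - start;
-        }
-        return summarize(times);
-    }
-
-    // Mean and sample standard deviation of the recorded timings.
-    static MicroBenchmark summarize(long[] times) {
-        double mean = 0;
-        for (long t : times) mean += t;
-        mean /= times.length;
-        double sumSq = 0;
-        for (long t : times) sumSq += (t - mean) * (t - mean);
-        double stdDev = Math.sqrt(sumSq / (times.length - 1));
-        return new MicroBenchmark(mean, stdDev, times.length);
-    }
-
-    @Override
-    public String toString() {
-        return String.format("mean=%.1f ns, std=%.1f ns, n=%d", meanNanos, stdDevNanos, iterations);
-    }
-
-    public static void main(String[] args) {
-        double[] data = new double[100_000];
-        System.out.println(run(() -> Arrays.fill(data, 1.0), 1_000, 1_000));
-    }
-}
-```
-
-For example, `MicroBenchmark.run(() -> x.addi(1.0), 1_000, 10_000)` would time an in-place ND4J addition on an existing array `x`; keep the task's result reachable so the JIT cannot optimize the work away, and report all three numbers.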
-
-
-
-
-
-
-# How to Run Deeplearning4j Benchmarks - A Guide
-
-Total training time is always ETL plus computation. That is, both the data pipeline and the matrix manipulations determine how long a neural network takes to train on a dataset. 
-
-When programmers familiar with Python try to run benchmarks comparing Deeplearning4j to well-known Python frameworks, they usually end up comparing ETL + computation on DL4J to just computation on the Python framework. That is, they're comparing apples to oranges. We'll explain how to optimize several parameters below. 
-
-The JVM has knobs to tune, and if you know how to tune them, you can make it a very fast environment for deep learning. There are several things to keep in mind on the JVM. You need to:
-
-* Increase the [heap space](http://javarevisited.blogspot.com/2011/05/java-heap-space-memory-size-jvm.html)
-* Get garbage collection right
-* Make ETL asynchronous
-* Presave datasets (aka pickling)
-
-## Setting Heap Space
-
-Users have to reconfigure their JVMs themselves, including setting the heap space. We can't give it to you preconfigured, but we can show you how to do it. Here are the two most important knobs for heap space.
-
-* Xms sets the minimum heap space
-* Xmx sets the maximum heap space
-
-You can set these in IDEs like IntelliJ and Eclipse, as well as via the CLI like so:
-
-		java -Xms256m -Xmx1024m YourClassNameHere
-
-In [IntelliJ, this is a VM parameter](https://www.jetbrains.com/help/idea/2016.3/setting-configuration-options.html), not a program argument. When you hit run in IntelliJ (the green button), that sets up a run-time configuration. IJ starts a Java VM for you with the configurations you specify. 
-
-What’s the ideal amount to set `Xmx` to? That depends on how much RAM is on your computer. In general, allocate as much heap space as you think the JVM will need to get work done. Let’s say you’re on a 16G RAM laptop — allocate 8G of RAM to the JVM. A sound minimum on laptops with less RAM would be 3g, so 
-
-		java -Xmx3g
-
-It may seem counterintuitive, but you want the min and max to be the same; i.e. `Xms` should equal `Xmx`. If they are unequal, the JVM will progressively allocate more memory as needed until it reaches the max, and that process of gradual allocation slows things down. You want to pre-allocate it at the beginning. So 
-
-		java -Xms3g -Xmx3g YourClassNameHere
-
-IntelliJ will automatically specify the [Java main class](https://docs.oracle.com/javase/tutorial/getStarted/application/) in question.
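-
-To confirm that your `Xms`/`Xmx` settings actually reached the JVM (for example, when launching from an IDE), a few lines of standard Java are enough. A minimal sketch; `HeapSettings` is a hypothetical class name, and `maxMemory()` only approximates `Xmx` (some collectors may report slightly less, and `Long.MAX_VALUE` means there is no limit).
-
-```java
-final class HeapSettings {
-    // Converts bytes to whole mebibytes (MiB), rounding down.
-    static long toMiB(long bytes) {
-        return bytes / (1024L * 1024L);
-    }
-
-    public static void main(String[] args) {
-        Runtime rt = Runtime.getRuntime();
-        // maxMemory() approximates -Xmx; totalMemory() is the heap currently reserved (starts near -Xms).
-        System.out.println("max heap (MiB):     " + toMiB(rt.maxMemory()));
-        System.out.println("current heap (MiB): " + toMiB(rt.totalMemory()));
-        System.out.println("free in current heap (MiB): " + toMiB(rt.freeMemory()));
-    }
-}
-```
-
-Running it with `java -Xms3g -Xmx3g HeapSettings` should report a maximum heap close to 3072 MiB.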
-
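-Another way to do this is by setting environment variables. Here, you would alter your hidden `.bash_profile` file, which adds environment variables to bash. To see those variables, enter `env` in the command line. To add more heap space for Maven, append this line to your `.bash_profile` (use `>>`, since `>` would overwrite the whole file):
-
-		echo 'export MAVEN_OPTS="-Xmx512m"' >> ~/.bash_profile
-
-Note that `MAVEN_OPTS` applies to the JVM that runs Maven itself (which includes code launched in-process, for example with `mvn exec:java`).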
-
-We need to increase heap space because Deeplearning4j loads data in the background, which means we're taking more RAM in memory. By allowing more heap space for the JVM, we can cache more data in memory. 
-
-## Garbage Collection
-
-A garbage collector is a program which runs on the JVM and gets rid of objects no longer used by a Java application. It is automatic memory management. Creating a new object in Java takes on-heap memory: even an empty object occupies around 16 bytes on a typical 64-bit JVM, and its fields add to that. So every new `DataSetIterator` you create takes up more heap space that the garbage collector will later have to reclaim.
-
-You may need to alter the garbage collection algorithm that Java is using. This can be done via the command line like so:
-
-		java -XX:+UseG1GC
-
-Better garbage collection increases throughput. For a more detailed exploration of the issue, please read this [InfoQ article](https://www.infoq.com/articles/Make-G1-Default-Garbage-Collector-in-Java-9).
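-
-To check which collector is actually running and how much time it is taking, you can query the standard `java.lang.management` API. A minimal sketch (`GcReport` is a hypothetical class name); with `-XX:+UseG1GC`, the collector names should start with `G1`.
-
-```java
-import java.lang.management.GarbageCollectorMXBean;
-import java.lang.management.ManagementFactory;
-import java.util.ArrayList;
-import java.util.List;
-
-final class GcReport {
-    // One line per active collector: name, number of collections, total collection time.
-    static List<String> describe() {
-        List<String> lines = new ArrayList<>();
-        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
-            lines.add(gc.getName() + ": collections=" + gc.getCollectionCount()
-                    + ", timeMs=" + gc.getCollectionTime());
-        }
-        return lines;
-    }
-
-    public static void main(String[] args) {
-        describe().forEach(System.out::println);
-    }
-}
-```
-
-Calling `describe()` before and after a training run shows how much time went into garbage collection; if that is a significant share of the run, it is worth revisiting heap and workspace settings.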
-
-DL4J is tightly linked to the garbage collector. [JavaCPP](https://github.com/bytedeco/javacpp), the bridge between the JVM and C++, adheres to the heap space you set with `Xmx` and works extensively with off-heap memory. By default, the off-heap memory will not surpass the amount of heap space you specify.
-
-JavaCPP, created by a Skymind engineer, relies on the garbage collector to tell it what has been done. We rely on the Java GC to tell us what to collect; the Java GC points at things, and we know how to de-allocate them with JavaCPP. This applies equally to how we work with GPUs. 
-
-The larger the batch size you use, the more memory you use.
-
-## ETL & Asynchronous ETL
-
-In our `dl4j-examples` repo, we don't make the ETL asynchronous, because the point of examples is to keep them simple. But for real-world problems, you need asynchronous ETL, and we'll show you how to do it with examples. 
-
-Data is stored on disk, and disk is slow. That’s the default. So you run into bottlenecks when loading data from your hard drive. When optimizing throughput, the slowest component is always the bottleneck. For example, a distributed Spark job using three GPU workers and one CPU worker will have a bottleneck with the CPU. The GPUs have to wait for that CPU to finish.
-
-The Deeplearning4j interface `DataSetIterator` hides the complexity of loading data from disk. Code that uses a `DataSetIterator` always looks the same, but different implementations work differently:
-
-* one loads from disk 
-* one loads asynchronously
-* one loads pre-saved from RAM
-
-Here's how a `DataSetIterator` is uniformly invoked for MNIST:
-
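-        // mnistTest is a DataSetIterator over the MNIST test set, model is a trained network, eval is an Evaluation
-        while (mnistTest.hasNext()) {
-            DataSet ds = mnistTest.next();
-            INDArray output = model.output(ds.getFeatures(), false);
-            eval.eval(ds.getLabels(), output);
-        }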
-
-You can optimize by using an asynchronous loader in the background. Java can do real multi-threading: it can load data in the background while other threads take care of compute. So you load data into the GPU at the same time that compute is being run. The neural net trains even as new data is being fetched.
-
-This is the [relevant code](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-scaleout/deeplearning4j-scaleout-parallelwrapper/src/main/java/org/deeplearning4j/parallelism/ParallelWrapper.java#L136), in particular the third line:
-
-        MultiDataSetIterator iterator;
-        if (prefetchSize > 0 && source.asyncSupported()) {
-            iterator = new AsyncMultiDataSetIterator(source, prefetchSize);
-        } else iterator = source;
-
-There are actually two types of asynchronous dataset iterators. The `AsyncDataSetIterator` is what you would use most of the time. It's described in the [Javadoc here](https://deeplearning4j.org/api/{{page.version}}/org/deeplearning4j/datasets/iterator/AsyncDataSetIterator.html).
-
-For special cases such as recurrent neural nets applied to time series, or for computation graphs, you would use an `AsyncMultiDataSetIterator`, described in the [Javadoc here](https://deeplearning4j.org/api/{{page.version}}/org/deeplearning4j/datasets/iterator/AsyncMultiDataSetIterator.html).
-
-Notice in the code above that `prefetchSize` is another parameter to set. It counts minibatches, not examples: if your batch size is 1,000 examples and you set `prefetchSize` to 3, the iterator pre-fetches 3,000 examples.
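-
-To illustrate the idea behind these iterators (not their actual implementation), here is a plain-Java sketch of a prefetching iterator: a background thread pulls items from a source iterator into a bounded queue of `prefetchSize` items, so the consumer rarely has to wait. `PrefetchingIterator` is hypothetical; in DL4J you would use `AsyncDataSetIterator` or `AsyncMultiDataSetIterator` instead. The sketch assumes the source never returns `null` and never throws; a production version would also propagate producer errors and shut the thread down.
-
-```java
-import java.util.Iterator;
-import java.util.NoSuchElementException;
-import java.util.concurrent.ArrayBlockingQueue;
-import java.util.concurrent.BlockingQueue;
-
-final class PrefetchingIterator<T> implements Iterator<T> {
-    private static final Object END = new Object(); // end-of-stream marker
-    private final BlockingQueue<Object> queue;
-    private Object next; // item taken from the queue but not yet returned
-
-    PrefetchingIterator(Iterator<T> source, int prefetchSize) {
-        this.queue = new ArrayBlockingQueue<>(prefetchSize);
-        Thread producer = new Thread(() -> {
-            try {
-                while (source.hasNext()) {
-                    queue.put(source.next()); // blocks once prefetchSize items are waiting
-                }
-                queue.put(END);
-            } catch (InterruptedException e) {
-                Thread.currentThread().interrupt();
-            }
-        }, "prefetch");
-        producer.setDaemon(true);
-        producer.start();
-    }
-
-    @Override
-    public boolean hasNext() {
-        if (next == null) {
-            try {
-                next = queue.take(); // waits only if the producer has fallen behind
-            } catch (InterruptedException e) {
-                Thread.currentThread().interrupt();
-                return false;
-            }
-        }
-        return next != END;
-    }
-
-    @Override
-    @SuppressWarnings("unchecked")
-    public T next() {
-        if (!hasNext()) {
-            throw new NoSuchElementException();
-        }
-        T item = (T) next;
-        next = null;
-        return item;
-    }
-}
-```
-
-For example, `new PrefetchingIterator<>(batches.iterator(), 3)` keeps about three batches ready while the current one is being processed, mirroring the role of `prefetchSize` above.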
-
-## ETL: Comparing Python frameworks With Deeplearning4j
-
-In Python, programmers are converting their data into [pickles](https://docs.python.org/2/library/pickle.html), or binary data objects. And if they're working with a smallish toy dataset, they're loading all those pickles into RAM. So they're effectively sidestepping a major task in dealing with larger datasets. At the same time, when benchmarking against DL4J, they don't preload the DL4J data into RAM in the same way. So they're effectively comparing DL4J speed for training computations + ETL against only training computation time for Python frameworks.
-
-But Java has robust tools for moving big data, and if compared correctly, is much faster than Python. The Deeplearning4j community has reported up to 3700% increases in speed over Python frameworks, when ETL and computation are optimized.
-
-Deeplearning4j uses DataVec as its ETL and vectorization library. Unlike other deep-learning tools, DataVec does not force a particular format on your dataset. (Caffe forces you to use [hdf5](https://support.hdfgroup.org/HDF5/), for example.)
-
-We try to be more flexible. That means you can point DL4J at raw photos, and it will load the image, run the transforms and put it into an NDArray to generate a dataset on the fly. 
-
-But if your training pipeline is doing that every time, Deeplearning4j will seem about 10x slower than other frameworks, because you’re spending your time creating datasets. Every time you call `fit`, you're recreating a dataset, over and over again. We allow it to happen for ease of use, but we can show you how to speed things up. There are ways to make it just as fast. 
-
-One way is to pre-save the datasets, in a manner similar to the Python frameworks. (Pickles are pre-formatted data.) When you pre-save the dataset, you create a separate class.
-
-Here’s how you [pre-save datasets](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/misc/presave/PreSave.java).
-
-A `RecordReaderDataSetIterator` talks to DataVec and outputs datasets for DL4J.
-
-Here’s how you [load a pre-saved dataset](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/misc/presave/LoadPreSavedLenetMnistExample.java).
-
-Line 90 of that example is where you see the asynchronous ETL. In this case, it's wrapping the pre-saved iterator, so you're taking advantage of both methods, with the async iterator loading the pre-saved data in the background as the net trains.
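-
-The linked examples use DL4J's own serialization. Purely to show the principle, the plain-Java sketch below writes already-vectorized minibatches (assumed here to be `float[]` arrays) to a binary file once, then reads them back without re-running any parsing or transforms; `PreSavedBatches` and its file layout are hypothetical.
-
-```java
-import java.io.BufferedInputStream;
-import java.io.BufferedOutputStream;
-import java.io.DataInputStream;
-import java.io.DataOutputStream;
-import java.io.IOException;
-import java.nio.file.Files;
-import java.nio.file.Path;
-import java.util.ArrayList;
-import java.util.List;
-
-final class PreSavedBatches {
-    // File layout: batch count, then for each batch its length followed by its values.
-    static void save(List<float[]> batches, Path file) throws IOException {
-        try (DataOutputStream out = new DataOutputStream(
-                new BufferedOutputStream(Files.newOutputStream(file)))) {
-            out.writeInt(batches.size());
-            for (float[] batch : batches) {
-                out.writeInt(batch.length);
-                for (float v : batch) out.writeFloat(v);
-            }
-        }
-    }
-
-    // Reads the batches back; no record parsing or transforms are repeated.
-    static List<float[]> load(Path file) throws IOException {
-        try (DataInputStream in = new DataInputStream(
-                new BufferedInputStream(Files.newInputStream(file)))) {
-            int count = in.readInt();
-            List<float[]> batches = new ArrayList<>(count);
-            for (int b = 0; b < count; b++) {
-                float[] batch = new float[in.readInt()];
-                for (int i = 0; i < batch.length; i++) batch[i] = in.readFloat();
-                batches.add(batch);
-            }
-            return batches;
-        }
-    }
-}
-```
-
-The expensive ETL runs once when `save` is called; every later training run only pays for `load`, which is the same trade-off the pre-save examples make.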
-
-## MKL and Inference on CPUs
-
-If you are running inference benchmarks on CPUs, make sure you are using Deeplearning4j with Intel's MKL library, which is available via a clickwrap; i.e. unlike Anaconda, which libraries like PyTorch commonly use, Deeplearning4j does not bundle MKL.
--- a/docs/deeplearning4j/templates/build-from-source.md
+++ b/docs/deeplearning4j/templates/build-from-source.md
@ -1,388 +0,0 @@
---
-title: Building Deeplearning4j from Source
-short_title: Build from Source
-description: Instructions to build all DL4J libraries from source.
-category: Get Started
-weight: 10
---
-
-## Build Locally from Master
-
-**NOTE: MOST USERS SHOULD USE THE RELEASES ON MAVEN CENTRAL AS PER THE QUICK START GUIDE, AND NOT BUILD FROM SOURCE**
-
-*Unless you have a very good reason to build from source (such as developing new features - excluding custom layers, custom activation functions, custom loss functions, etc - all of which can be added without modifying DL4J directly) then you shouldn't build from source. Building from source can be quite complex, with no benefit in a lot of cases.*
-
-For those developers and engineers who prefer to use the most up-to-date version of Deeplearning4j or fork and build their own version, these instructions will walk you through building and installing Deeplearning4j. The preferred installation destination is your machine's local Maven repository. If you are not using the master branch, you can modify these steps as needed (e.g., switching Git branches and modifying the `build-dl4j-stack.sh` script).
-
-Building locally requires that you build the entire Deeplearning4j stack which includes:
-
-* [libnd4j](https://github.com/eclipse/deeplearning4j/tree/master/libnd4j)
-* [nd4j](https://github.com/eclipse/deeplearning4j/tree/master/nd4j)
-* [datavec](https://github.com/eclipse/deeplearning4j/tree/master/datavec)
-* [deeplearning4j](https://github.com/eclipse/deeplearning4j)
-
-Note that Deeplearning4j is designed to work on most platforms (Windows, OS X, and Linux) and also includes multiple "flavors" depending on the computing architecture you choose to utilize. This includes CPU (OpenBLAS, MKL, ATLAS) and GPU (CUDA). The DL4J stack also supports x86 and PowerPC architectures.
-
-## Prerequisites
-
-Your local machine will require some essential software and environment variables set *before* you try to build and install the DL4J stack. Depending on your platform and the version of your operating system, the instructions may vary in getting them to work. This software includes:
-
-* git
-* cmake (3.2 or higher)
-* OpenMP
-* gcc (4.9 or higher)
-* maven (3.3 or higher)
-
-Architecture-specific software includes:
-
-CPU options:
-
-* Intel MKL
-* OpenBLAS
-* ATLAS
-
-GPU options:
-
-* CUDA
-
-
-IDE-specific requirements:
-
-* IntelliJ Lombok plugin
-
-DL4J testing dependencies:
-
-* dl4j-test-resources
-
-### Installing Prerequisite Tools
-
-#### Linux
-
-**Ubuntu**
-Assuming you are using Ubuntu as your flavor of Linux and you are running as a non-root user, follow these steps to install prerequisite software:
-
-```
-sudo apt-get update
-sudo apt-get install maven build-essential cmake libgomp1
-```
-
-#### OS X
-
-Homebrew is the accepted method of installing prerequisite software. Assuming you have Homebrew installed locally, follow these steps to install your necessary tools.
-
-First, before using Homebrew we need to ensure an up-to-date version of Xcode is installed (it is used as a primary compiler):
-
-```
-xcode-select --install
-```
-
-Then install the prerequisite tools:
-
-```
-brew update
-brew install maven gcc5
-```
-Note: you *cannot* use clang, and you *cannot* use a newer version of gcc. If you have a newer version of gcc installed, see [this answer](https://apple.stackexchange.com/questions/190684/homebrew-how-to-switch-between-gcc-versions-gcc49-and-gcc) on switching between gcc versions.
-
-
-#### Windows
-
-libnd4j depends on some Unix utilities for compilation, so in order to compile it you will need to install [Msys2](https://msys2.github.io/).
-
-After you have set up Msys2 by following [their instructions](https://msys2.github.io/), you will have to install some additional development packages. Start the msys2 shell and set up the dev environment with:
-
-    pacman -S mingw-w64-x86_64-gcc mingw-w64-x86_64-cmake mingw-w64-x86_64-extra-cmake-modules make pkg-config grep sed gzip tar mingw64/mingw-w64-x86_64-openblas
-
-This will install the needed dependencies for use in the msys2 shell.
-
-You will also need to set up your PATH environment variable to include `C:\msys64\mingw64\bin` (or wherever you have decided to install msys2). If you have IntelliJ (or another IDE) open, you will have to restart it before this change takes effect for applications started through it. If you don't, you will probably see a "Can't find dependent libraries" error.
-
-### Installing Prerequisite Architectures
-
-Once you have installed the prerequisite tools, you can now install the required architectures for your platform.
-
-#### Intel MKL
-
-Of all the existing architectures available for CPU, Intel MKL is currently the fastest. However, it requires some "overhead" before you actually install it.
-
-1. Apply for a license at [Intel's site](https://software.intel.com/en-us/intel-mkl)
-2. After a few steps through Intel, you will receive a download link
-3. Download and install Intel MKL using [the setup guide](https://software.intel.com/sites/default/files/managed/94/bf/Install_Guide_0.pdf)
-
-#### OpenBLAS
-
-##### Linux
-
-**Ubuntu**
-Assuming you are using Ubuntu, you can install OpenBLAS via:
-
-```
-sudo apt-get install libopenblas-dev
-```
-
-If you built OpenBLAS yourself, you will also need to ensure that `/opt/OpenBLAS/lib` (or the directory where OpenBLAS is installed) is on your `LD_LIBRARY_PATH`. In order to get OpenBLAS to work with Apache Spark, you will also need to run the following from inside that library directory:
-
-```
-sudo cp libopenblas.so liblapack.so.3
-sudo cp libopenblas.so libblas.so.3
-```
-
-**CentOS**
-Enter the following in your terminal (or ssh session) as a root user:
-
-    yum groupinstall 'Development Tools'
-
-After that, you should see a lot of activity and installs on the terminal. To verify that you have, for example, *gcc*, enter this line:
-
-    gcc --version
-
-For more complete instructions, [go here](http://www.cyberciti.biz/faq/centos-linux-install-gcc-c-c-compiler/).
-
-##### OS X
-
-You can install OpenBLAS on OS X with Homebrew:
-
-```
-brew install openblas
-```
-
-##### Windows
-
-An OpenBLAS package is available for `msys2`. You can install it using the `pacman` command.
-
-#### ATLAS
-
-##### Linux
-
-**Ubuntu**
-An apt package is available for ATLAS on Ubuntu:
-
-```
-sudo apt-get install libatlas-base-dev libatlas-dev
-```
-
-**CentOS**
-You can install ATLAS on CentOS using:
-
-```
-sudo yum install atlas-devel
-```
-
-##### OS X
-
-Installing ATLAS on OS X is a somewhat complicated and lengthy process. However, the following commands will work on most machines:
-
-```
-# Quote the URL so the shell does not try to expand the '?'
-wget --content-disposition 'https://sourceforge.net/projects/math-atlas/files/latest/download?source=files'
-tar jxf atlas*.tar.bz2
-mkdir atlas                 # create a directory for ATLAS
-mv ATLAS atlas/src-3.10.1
-cd atlas/src-3.10.1
-# The ATLAS download may already contain this file, in which case this command is not needed
-wget http://www.netlib.org/lapack/lapack-3.5.0.tgz
-mkdir intel                 # create a build directory
-cd intel
-# Requires root access; recommended but not essential
-cpufreq-selector -g performance
-# Replace /path/to/atlas-install with the directory where you want ATLAS installed
-../configure --prefix=/path/to/atlas-install --shared --with-netlib-lapack-tarfile=../lapack-3.5.0.tgz
-make
-make check
-make ptcheck
-make time
-make install
-```
-
-#### CUDA
-
-##### Linux & OS X
-
-Detailed instructions for installing GPU architectures such as CUDA can be found [here](./deeplearning4j-config-gpu-cpu).
-
-##### Windows
-
-The CUDA Backend has some additional requirements before it can be built:
-
-* [CUDA SDK](https://developer.nvidia.com/cuda-downloads)
-* [Visual Studio 2012 or 2013](https://www.visualstudio.com/en-us/news/vs2013-community-vs.aspx) (Please note: Visual Studio 2015 is *NOT SUPPORTED* by CUDA 7.5 and below)
-
-In order to build the CUDA backend you will have to set up some more environment variables first, by calling `vcvars64.bat`.
-But first, set the system environment variable `SET_FULL_PATH` to `true`, so that all of the variables that `vcvars64.bat` sets up are passed to the mingw shell.
-
-1. Inside a normal cmd.exe command prompt, run `C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64\vcvars64.bat`
-2. Run `c:\msys64\mingw64_shell.bat` inside that
-3. Change to your libnd4j folder
-4. `./buildnativeoperations.sh -c cuda`
-
-This builds the CUDA nd4j.dll.
-
-#### IDE Requirements
-
-If you are building Deeplearning4j through an IDE such as IntelliJ, you will need to install certain plugins to ensure your IDE renders code highlighting appropriately. You will need to install a plugin for Lombok:
-
-* IntelliJ Lombok Plugin: https://plugins.jetbrains.com/plugin/6317-lombok-plugin
-* Eclipse Lombok Plugin: Follow instructions at https://projectlombok.org/download.html
-
-If you want to work on ScalNet, the Scala API, or on certain modules such as the DL4J UI, you will need to ensure your IDE has Scala support installed and available to you.
-
-#### Testing
-
-Deeplearning4j uses a separate repository that contains all resources necessary for testing. This keeps the central DL4J repository lightweight and avoids large blobs in the Git history. To run the tests you need to install the test resources from https://github.com/deeplearning4j/dl4j-test-resources (~10 GB). If you don't need the history, do a shallow clone:
-```bash
-git clone --depth 1 --branch master https://github.com/deeplearning4j/dl4j-test-resources
-cd dl4j-test-resources
-mvn install
-```
-
-Tests will run __only__ when `testresources` and a backend profile (such as `test-nd4j-native`) are selected:
-
-```bash
-mvn clean test -P  testresources,test-nd4j-native
-```
-
-Running the tests will take a while. To run tests of just a single maven module you can add a module constraint with `-pl deeplearning4j-core` (for details see [here](https://stackoverflow.com/questions/11869762/maven-run-only-single-test-in-multi-module-project))
-
-## Installing the DL4J Stack
-
-## OS X & Linux
-
-### Checking ENV
-
-Before running the DL4J stack build script, you must ensure certain environment variables are defined. These are outlined below depending on your architecture.
-
-#### LIBND4J_HOME
-
-You will need to know the exact path of the directory where you are running the DL4J build script (you are encouraged to use a clean empty directory). Otherwise, your build will fail. Once you determine this path, add `/libnd4j` to the end of that path and export it to your local environment. This will look like:
-
-```
-export LIBND4J_HOME=/home/user/directory/libnd4j
-```
-
-#### CPU architecture w/ MKL
-
-You can link with MKL either at build time, or at runtime with binaries initially linked with another BLAS implementation such as OpenBLAS. To build against MKL, simply add the path containing `libmkl_rt.so` (or `mkl_rt.dll` on Windows), say `/path/to/intel64/lib/`, to the `LD_LIBRARY_PATH` environment variable on Linux (or `PATH` on Windows) and build like before. On Linux though, to make sure it uses the correct version of OpenMP, we also might need to set these environment variables:
-
-```bash
-export MKL_THREADING_LAYER=GNU
-export LD_PRELOAD=/lib64/libgomp.so.1
-```
-
-When libnd4j cannot be rebuilt, we can use the MKL libraries after the fact and get them loaded instead of OpenBLAS at runtime, but things are a bit trickier. Please additionally follow the instructions below.
-
-1. Make sure that files such as `/lib64/libopenblas.so.0` and `/lib64/libblas.so.3` are not available (or appear later in the `PATH` on Windows), or libnd4j will load them by their absolute paths before anything else.
-2. Inside `/path/to/intel64/lib/`, create a symbolic link or copy of `libmkl_rt.so` (or `mkl_rt.dll` on Windows) with the name that libnd4j expects to load, for example:
-
-```bash
-ln -s libmkl_rt.so libopenblas.so.0
-ln -s libmkl_rt.so libblas.so.3
-```
-
-```cmd
-copy mkl_rt.dll libopenblas.dll
-copy mkl_rt.dll libblas3.dll
-```
-
-3. Finally, add `/path/to/intel64/lib/` to the `LD_LIBRARY_PATH` environment variable (or early in the `PATH` on Windows) and run your Java application as usual.
-
-
-### Build Script
-
-You can use the [build-dl4j-stack.sh](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/build-dl4j-stack.sh) script from the deeplearning4j repository to build the whole Deeplearning4j stack from source: libnd4j, nd4j, datavec and deeplearning4j. It clones the DL4J stack, builds each component, and installs them to your local Maven repository. The script works on both Linux and OS X.
-
-Read the following section carefully.
-
-For CPU architectures, use the build script as follows:
-
-```
-./build-dl4j-stack.sh
-```
-
-If you are on OS X, make sure gcc 5.x is set up and that you aren't using clang; see [issue #2668](https://github.com/eclipse/deeplearning4j/issues/2668).
-
-
-If you are using a GPU backend, use this instead:
-
-```
-./build-dl4j-stack.sh -c cuda
-```
-
-You can speed up your CUDA builds by using the `cc` flag as explained in the [libnd4j README](https://github.com/eclipse/deeplearning4j/tree/master/libnd4j).
-
-Scala users can also pass their Scala binary version, for Spark compatibility:
-
-```
-./build-dl4j-stack.sh -c cuda --scalav 2.11
-```
-
-The build script passes all options and flags through to the libnd4j `./buildnativeoperations.sh` script, so any flag that script accepts can be passed via `build-dl4j-stack.sh`.
-
-### Building Manually
-
-If you prefer, you can build each piece in the DL4J stack by hand. The procedure for each piece of software is essentially:
-
-1. Git clone
-2. Build
-3. Install
-
-The overall procedure is shown in the commands below; the exception is libnd4j's `./buildnativeoperations.sh`, which accepts parameters depending on the backend you are building for. Follow these instructions in the order given, or you will run into errors. The GPU-specific commands are commented out; substitute them for the CPU-specific commands when building for a GPU backend.
-
-``` shell
-# remove any existing checkout to ensure a clean build
-rm -rf deeplearning4j
-
-# clone the repository once; libnd4j, nd4j, datavec and deeplearning4j
-# are all subdirectories of this single repository
-git clone https://github.com/eclipse/deeplearning4j.git
-cd deeplearning4j
-
-# compile libnd4j
-cd libnd4j
-./buildnativeoperations.sh
-# and/or when using GPU
-# ./buildnativeoperations.sh -c cuda -cc INSERT_YOUR_DEVICE_ARCH_HERE
-# e.g. if you have a GTX 1070 device, use -cc 61
-export LIBND4J_HOME=`pwd`
-cd ..
-
-# build and install nd4j to the local Maven repository
-cd nd4j
-# cross-build across Scala versions (recommended)
-bash buildmultiplescalaversions.sh clean install -DskipTests -Dmaven.javadoc.skip=true -pl '!:nd4j-cuda-9.0,!:nd4j-cuda-9.0-platform,!:nd4j-tests'
-# or build for a single Scala version
-# mvn clean install -DskipTests -Dmaven.javadoc.skip=true -pl '!:nd4j-cuda-9.0,!:nd4j-cuda-9.0-platform,!:nd4j-tests'
-# or when using GPU
-# mvn clean install -DskipTests -Dmaven.javadoc.skip=true -pl '!:nd4j-tests'
-cd ..
-
-# build and install datavec
-# to build for a single Scala version, set SCALAV to the Scala binary version (e.g. 2.11)
-# and SCALA to the full Scala version before running this step
-cd datavec
-if [ -z "$SCALAV" ]; then
-  bash buildmultiplescalaversions.sh clean install -DskipTests -Dmaven.javadoc.skip=true
-else
-  mvn clean install -DskipTests -Dmaven.javadoc.skip=true -Dscala.binary.version="$SCALAV" -Dscala.version="$SCALA"
-fi
-cd ..
-
-# build and install deeplearning4j
-cd deeplearning4j
-# cross-build across Scala versions (recommended)
-./buildmultiplescalaversions.sh clean install -DskipTests -Dmaven.javadoc.skip=true
-# or build for a single Scala version
-# mvn clean install -DskipTests -Dmaven.javadoc.skip=true
-# If you skipped CUDA you may need to add
-# -pl '!./deeplearning4j-cuda/'
-# to the mvn clean install command to prevent the build from looking for CUDA libs
-cd ..
-```
-
-## Using Local Dependencies
-
-Once you've installed the DL4J stack to your local maven repository, you can now include it in your build tool's dependencies. Follow the typical [Getting Started](http://deeplearning4j.org/gettingstarted) instructions for Deeplearning4j, and appropriately replace versions with the SNAPSHOT version currently on the [master POM](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/pom.xml).
-
-Note that some build tools, such as Gradle and SBT, don't properly pull in platform-specific binaries. Follow the [ND4J dependency instructions](http://nd4j.org/dependencies.html) to set up your preferred build tool.
-
-## Support and Assistance
-
-If you encounter issues while building locally, the Deeplearning4j [Early Adopters Channel](https://gitter.im/deeplearning4j/deeplearning4j/earlyadopters) is a channel dedicated to assisting with build issues and other source problems. Please reach out on Gitter for help.
--- a/docs/deeplearning4j/templates/cheat-sheet.md
+++ b/docs/deeplearning4j/templates/cheat-sheet.md
@ -1,697 +0,0 @@
---
-title: Deeplearning4j Cheat Sheet
-short_title: Cheat Sheet
-description: Snippets and links for common functionality in Eclipse Deeplearning4j.
-category: Get Started
-weight: 2
---
-
-## Quick reference
-
-Deeplearning4j (and related projects) has a lot of functionality. This page summarizes it so that users know what exists and where to find more information.
-
-**Contents**
-
-* [Layers](#layers)
-    * [Feed-Forward Layers](#layers-ff)
-    * [Output Layers](#layers-out)
-    * [Convolutional Layers](#layers-conv)
-    * [Recurrent Layers](#layers-rnn)
-    * [Unsupervised Layers](#layers-unsupervised)
-    * [Other Layers](#layers-other)
-    * [Graph Vertices](#layers-vertices)
-    * [InputPreProcessors](#layers-preproc)
-* [Iteration/Training Listeners](#listeners)
-* [Evaluation](#evaluation)
-* [Network Saving and Loading](#saving)
-* [Network Configurations](#config)
-    * [Activation Functions](#config-afn)
-    * [Weight Initialization](#config-init)
-    * [Updaters (Optimizers)](#config-updaters)
-    * [Learning Rate Schedules](#config-schedules)
-    * [Regularization](#config-regularization)
-        * [L1/L2 regularization](#config-l1l2)
-        * [Dropout](#config-dropout)
-        * [Weight Noise](#config-weightnoise)
-        * [Constraints](#config-constraints)
-* [Data Classes](#data)
-    * [Iterators](#data-iter)
-        * [Iterators - Built-In (DL4J-Provided Data)](#data-iter-builtin)
-        * [Iterators - User Provided Data](#data-iter-user)
-        * [Iterators - Adapter and Utility Iterators](#data-iter-util)
-    * [Reading Raw Data: DataVec RecordReaders](#data-datavec)
-    * [Data Normalization](#data-norm)
-    * [Spark Network Training Data Classes](#data-spark)
-* [Transfer Learning](#transfer)
-* [Trained Model Library - Model Zoo](#zoo)
-* [SKIL - Model Deployment](#skil)
-* [Keras Import](#keras)
-* [Distributed Training (Spark)](#spark)
-* [Hyperparameter Optimization (Arbiter)](#arbiter)
-
-### <a name="layers">Layers</a>
-
-#### <a name="layers-ff">Feed-Forward Layers</a>
-
-* **DenseLayer** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/layers/feedforward/dense/DenseLayer.java)) - A simple/standard fully-connected layer
-* **EmbeddingLayer** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/layers/feedforward/embedding/EmbeddingLayer.java)) - Takes positive integer indexes as input, outputs vectors. Only usable as first layer in a model. Mathematically equivalent (when bias is enabled) to DenseLayer with one-hot input, but more efficient. See also: EmbeddingSequenceLayer.
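-
-The equivalence noted for EmbeddingLayer can be checked numerically. The following is a minimal sketch in plain Java, not the DL4J API; the class `EmbeddingSketch`, its methods, and the `[nIn][nOut]` weight layout are hypothetical and chosen only for illustration.
-
-```java
-import java.util.Arrays;
-
-// Illustration only: plain arrays, not DL4J's EmbeddingLayer or DenseLayer.
-class EmbeddingSketch {
-
-    // Dense layer without activation: out = x * W + b, with W of shape [nIn][nOut]
-    static double[] dense(double[] x, double[][] w, double[] b) {
-        double[] out = b.clone();
-        for (int i = 0; i < x.length; i++) {
-            for (int j = 0; j < out.length; j++) {
-                out[j] += x[i] * w[i][j];
-            }
-        }
-        return out;
-    }
-
-    // Embedding lookup: take row `index` of W and add the bias
-    static double[] embed(int index, double[][] w, double[] b) {
-        double[] out = b.clone();
-        for (int j = 0; j < out.length; j++) {
-            out[j] += w[index][j];
-        }
-        return out;
-    }
-
-    public static void main(String[] args) {
-        double[][] w = {{0.1, 0.2}, {0.3, 0.4}, {0.5, 0.6}}; // 3 possible indices, nOut = 2
-        double[] b = {1.0, -1.0};
-        double[] oneHot = {0, 0, 1};                          // one-hot encoding of index 2
-        System.out.println(Arrays.toString(dense(oneHot, w, b)));
-        System.out.println(Arrays.toString(embed(2, w, b)));
-    }
-}
-```
-
-Both calls print the same vector; the lookup simply skips the multiplications by zero, which is where the efficiency gain comes from.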
-
-#### <a name="layers-out">Output Layers</a>
-
-Output layers: usable only as the last layer in a network. Loss functions are set here.
-
-* **OutputLayer** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/OutputLayer.java)) - Output layer for standard classification/regression in MLPs/CNNs. Has a fully connected DenseLayer built in. 2d input/output (i.e., row vector per example).
-* **LossLayer** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/LossLayer.java)) - Output layer without parameters - only loss function and activation function. 2d input/output (i.e., row vector per example). Unlike OutputLayer, restricted to nIn = nOut.
-* **RnnOutputLayer** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/RnnOutputLayer.java)) - Output layer for recurrent neural networks. 3d (time series) input and output. Has time distributed fully connected layer built in.
-* **RnnLossLayer** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/RnnLossLayer.java)) - The 'no parameter' version of RnnOutputLayer. 3d (time series) input and output.
-* **CnnLossLayer** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/CnnLossLayer.java)) - Used with CNNs, where a prediction must be made at each spatial location of the output (for example: segmentation or denoising). No parameters, 4d input/output with shape [minibatch, depth, height, width]. When using softmax, this is applied depthwise at each spatial location.
-* **Cnn3DLossLayer** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/Cnn3DLossLayer.java)) - Used with 3D CNNs, where a prediction must be made at each spatial location (x/y/z) of the output. Layer has no parameters, 5d data in either NCDHW or NDHWC ("channels first" or "channels last") format (configurable). Supports masking. When using Softmax, this is applied along channels at each spatial location.
-* **Yolo2OutputLayer** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/objdetect/Yolo2OutputLayer.java)) - Implementation of the YOLO 2 model for object detection in images
-* **CenterLossOutputLayer** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/CenterLossOutputLayer.java)) - A version of OutputLayer that also attempts to minimize the intra-class distance of examples' activations - i.e., "If example x is in class Y, ensure that embedding(x) is close to average(embedding(y)) for all examples y in Y"
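-
-For CnnLossLayer, "applied depthwise at each spatial location" means the softmax normalizes across channels separately for every (height, width) position. A minimal plain-Java sketch for a single example stored as `[depth][height][width]`; the class and method names are hypothetical, and this is not DL4J's implementation.
-
-```java
-// Illustration only: softmax along the channel axis of one [depth][height][width] example.
-class ChannelSoftmaxSketch {
-
-    static double[][][] softmaxOverChannels(double[][][] act) {
-        int depth = act.length, height = act[0].length, width = act[0][0].length;
-        double[][][] out = new double[depth][height][width];
-        for (int y = 0; y < height; y++) {
-            for (int x = 0; x < width; x++) {
-                // subtract the per-location maximum for numerical stability
-                double max = Double.NEGATIVE_INFINITY;
-                for (int c = 0; c < depth; c++) {
-                    max = Math.max(max, act[c][y][x]);
-                }
-                double sum = 0.0;
-                for (int c = 0; c < depth; c++) {
-                    out[c][y][x] = Math.exp(act[c][y][x] - max);
-                    sum += out[c][y][x];
-                }
-                for (int c = 0; c < depth; c++) {
-                    out[c][y][x] /= sum; // probabilities over channels sum to 1 at (y, x)
-                }
-            }
-        }
-        return out;
-    }
-
-    public static void main(String[] args) {
-        // 2 channels, 1x2 spatial grid
-        double[][][] logits = {{{1.0, 2.0}}, {{1.0, 0.0}}};
-        double[][][] p = softmaxOverChannels(logits);
-        System.out.println(p[0][0][0] + " " + p[1][0][0]);
-    }
-}
-```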
-
-
-#### <a name="layers-conv">Convolutional Layers</a>
-
-* **ConvolutionLayer** / Convolution2D - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/ConvolutionLayer.java)) - Standard 2d convolutional neural network layer. Inputs and outputs have 4 dimensions with shape [minibatch,depthIn,heightIn,widthIn] and [minibatch,depthOut,heightOut,widthOut] respectively.
-* **Convolution1DLayer** / Convolution1D - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/Convolution1DLayer.java)) - Standard 1d convolution layer
-* **Convolution3DLayer** / Convolution3D - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/Convolution3D.java)) - Standard 3D convolution layer. Supports both NDHWC ("channels last") and NCDHW ("channels first") activations format.
-* **Deconvolution2DLayer** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/layers/convolution/Deconvolution2DLayer.java)) - also known as transpose or fractionally strided convolutions. Can be considered a "reversed" ConvolutionLayer; output size is generally larger than the input, whilst maintaining the spatial connection structure.
-* **SeparableConvolution2DLayer** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/layers/convolution/SeparableConvolution2DLayer.java)) - depthwise separable convolution layer
-* **SubsamplingLayer** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/layers/convolution/subsampling/SubsamplingLayer.java)) - Implements standard 2d spatial pooling for CNNs - with max, average and p-norm pooling available.
-* **Subsampling1DLayer** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/Subsampling1DLayer.java)) - 1D version of the subsampling layer.
-* **Upsampling2D** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/layers/convolution/upsampling/Upsampling2D.java)) - Upscale CNN activations by repeating the row/column values
-* **Upsampling1D** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/layers/convolution/upsampling/Upsampling1D.java)) - 1D version of the upsampling layer
-* **Cropping2D** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/convolutional/Cropping2D.java)) - Cropping layer for 2D convolutional neural networks
-* **DepthwiseConvolution2D** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/DepthwiseConvolution2D.java)) - 2d depthwise convolution layer
-* **ZeroPaddingLayer** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/layers/convolution/ZeroPaddingLayer.java)) - Very simple layer that adds the specified amount of zero padding to edges of the 4d input activations.
-* **ZeroPadding1DLayer** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/layers/convolution/ZeroPadding1DLayer.java)) - 1D version of ZeroPaddingLayer
-* **SpaceToDepth** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/SpaceToDepthLayer.java)) - Takes a 4D array as input and moves data from the spatial dimensions (HW) to the channels dimension (C) for a given blockSize
-* **SpaceToBatch** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/SpaceToBatchLayer.java)) - Moves data from the 2 spatial dimensions of a tensor into the batch dimension, according to the specified "blocks"
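-
-Output height and width for ConvolutionLayer and SubsamplingLayer follow standard convolution arithmetic. The plain-Java sketch below computes one spatial dimension without dilation; the class name is hypothetical, and the mapping of the two formulas onto DL4J's `ConvolutionMode.Truncate` and `ConvolutionMode.Same` is an assumption to check against the Javadoc.
-
-```java
-// Illustration only: standard output-size arithmetic for one spatial dimension, no dilation.
-class ConvShapeSketch {
-
-    // Truncate-style: floor((in - kernel + 2 * padding) / stride) + 1
-    static int outSizeTruncate(int in, int kernel, int stride, int padding) {
-        return Math.floorDiv(in - kernel + 2 * padding, stride) + 1;
-    }
-
-    // Same-style: ceil(in / stride); padding is chosen so that this size is reached
-    static int outSizeSame(int in, int stride) {
-        return (in + stride - 1) / stride;
-    }
-
-    public static void main(String[] args) {
-        System.out.println(outSizeTruncate(28, 5, 1, 0));  // 24: 28x28 input, 5x5 kernel, stride 1
-        System.out.println(outSizeTruncate(224, 7, 2, 3)); // 112: 224x224 input, 7x7 kernel, stride 2, padding 3
-        System.out.println(outSizeSame(28, 2));            // 14
-    }
-}
-```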
-
-
-#### <a name="layers-rnn">Recurrent Layers</a>
-
-* **LSTM** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/LSTM.java)) - LSTM RNN without peephole connections. Supports CuDNN.
-* **GravesLSTM** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/GravesLSTM.java)) - LSTM RNN with peephole connections. Does *not* support CuDNN (thus for GPUs, LSTM should be used in preference).
-* **GravesBidirectionalLSTM** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/GravesBidirectionalLSTM.java)) - A bidirectional LSTM implementation with peephole connections. Equivalent to Bidirectional(ADD, GravesLSTM). Deprecated on master since the addition of the Bidirectional wrapper (below).
-* **Bidirectional** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/recurrent/Bidirectional.java)) - A 'wrapper' layer - converts any standard uni-directional RNN into a bidirectional RNN (doubles number of params - forward/backward nets have independent parameters). Activations from forward/backward nets may be either added, multiplied, averaged or concatenated.
-* **SimpleRnn** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/recurrent/SimpleRnn.java)) - A standard/'vanilla' RNN layer. Usually not effective in practice with long time series dependencies - LSTM is generally preferred.
-* **LastTimeStep** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/recurrent/LastTimeStep.java)) - A 'wrapper' layer - extracts out the last time step of the (non-bidirectional) RNN layer it wraps. 3d input with shape [minibatch, size, timeSeriesLength], 2d output with shape [minibatch, size].
-* **EmbeddingSequenceLayer** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/EmbeddingSequenceLayer.java)) - A version of EmbeddingLayer that expects a fixed number (inputLength) of integer indices per example as input, each in the range 0 to numClasses - 1. The input thus has shape [numExamples, inputLength] or [numExamples, 1, inputLength]. The output of this layer is 3D (sequence/time series), with shape [numExamples, nOut, inputLength]. Can only be used as the first layer in a network.
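-
-LastTimeStep reduces 3d activations to 2d by keeping only the final time step. A minimal plain-Java sketch of that shape change (hypothetical class name, not the DL4J wrapper); unlike a real implementation, it assumes every sequence has the full length, i.e. no masking.
-
-```java
-import java.util.Arrays;
-
-// Illustration only: take the last time step of [minibatch][size][timeSeriesLength] activations.
-// Masking (variable-length sequences) is deliberately ignored here.
-class LastTimeStepSketch {
-
-    static double[][] lastTimeStep(double[][][] act) {
-        int minibatch = act.length, size = act[0].length, length = act[0][0].length;
-        double[][] out = new double[minibatch][size];
-        for (int m = 0; m < minibatch; m++) {
-            for (int s = 0; s < size; s++) {
-                out[m][s] = act[m][s][length - 1];
-            }
-        }
-        return out;
-    }
-
-    public static void main(String[] args) {
-        // minibatch 1, size 2, 3 time steps
-        double[][][] act = {{{1, 2, 3}, {4, 5, 6}}};
-        System.out.println(Arrays.deepToString(lastTimeStep(act))); // [[3.0, 6.0]]
-    }
-}
-```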
-
-
-#### <a name="layers-unsupervised">Unsupervised Layers</a>
-
-* **VariationalAutoencoder** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/variational/VariationalAutoencoder.java)) - A variational autoencoder implementation with MLP/dense layers for the encoder and decoder. Supports multiple different types of [reconstruction distributions](https://github.com/eclipse/deeplearning4j/tree/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/variational)
-* **AutoEncoder** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/AutoEncoder.java)) - Standard denoising autoencoder layer
-
-#### <a name="layers-other">Other Layers</a>
-
-* **GlobalPoolingLayer** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/GlobalPoolingLayer.java)) - Implements both pooling over time (for RNNs/time series - input size [minibatch, size, timeSeriesLength], out [minibatch, size]) and global spatial pooling (for CNNs - input size [minibatch, depth, h, w], out [minibatch, depth]). Available pooling modes: sum, average, max and p-norm.
-* **ActivationLayer** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/ActivationLayer.java)) - Applies an activation function (only) to the input activations. Note that most DL4J layers have activation functions built in as a config option.
-* **DropoutLayer** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/DropoutLayer.java)) - Implements dropout as a separate/single layer. Note that most DL4J layers have a "built-in" dropout configuration option.
-* **BatchNormalization** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/BatchNormalization.java)) - Batch normalization for 2d (feedforward), 3d (time series) or 4d (CNN) activations. For time series, parameter sharing across time; for CNNs, parameter sharing across spatial locations (but not depth).
-* **LocalResponseNormalization** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/LocalResponseNormalization.java)) - Local response normalization layer for CNNs. Not frequently used in modern CNN architectures.
-* **FrozenLayer** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/misc/FrozenLayer.java)) - Usually not used directly by users - added as part of transfer learning, to freeze a layer's parameters such that they don't change during further training.
-* **LocallyConnected2D** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/LocallyConnected2D.java)) - a 2d locally connected layer, assumes input is 4d data in NCHW ("channels first") format.
-* **LocallyConnected1D** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/LocallyConnected1D.java)) - a 1d locally connected layer, assumes input is 3d data in NCW ([minibatch, size, sequenceLength]) format
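-
-The pooling-over-time mode of GlobalPoolingLayer collapses the time axis with one of the listed reductions. The sketch below shows the four modes on plain arrays; the class, enum and parameter names are hypothetical (not the DL4J `PoolingType` API), and masking is ignored.
-
-```java
-// Illustration only: pooling over time for [minibatch][size][timeSeriesLength] activations,
-// producing [minibatch][size]. Masking is ignored.
-class GlobalPoolingSketch {
-
-    enum Mode { MAX, AVG, SUM, PNORM }
-
-    static double[][] poolOverTime(double[][][] act, Mode mode, int pnorm) {
-        int minibatch = act.length, size = act[0].length, length = act[0][0].length;
-        double[][] out = new double[minibatch][size];
-        for (int m = 0; m < minibatch; m++) {
-            for (int s = 0; s < size; s++) {
-                double acc = (mode == Mode.MAX) ? Double.NEGATIVE_INFINITY : 0.0;
-                for (int t = 0; t < length; t++) {
-                    double v = act[m][s][t];
-                    switch (mode) {
-                        case MAX:   acc = Math.max(acc, v); break;
-                        case AVG:
-                        case SUM:   acc += v; break;
-                        case PNORM: acc += Math.pow(Math.abs(v), pnorm); break;
-                    }
-                }
-                if (mode == Mode.AVG) acc /= length;
-                if (mode == Mode.PNORM) acc = Math.pow(acc, 1.0 / pnorm);
-                out[m][s] = acc;
-            }
-        }
-        return out;
-    }
-
-    public static void main(String[] args) {
-        double[][][] act = {{{3, -4}}}; // minibatch 1, size 1, 2 time steps
-        for (Mode mode : Mode.values()) {
-            System.out.println(mode + ": " + poolOverTime(act, mode, 2)[0][0]);
-        }
-    }
-}
-```
-
-Global spatial pooling for CNNs works the same way, with the reduction running over height and width instead of time.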
-
-
-#### <a name="layers-vertices">Graph Vertices</a>
-
-Graph vertices are used with ComputationGraph. They are similar to layers, but usually don't have any parameters, and may support multiple inputs.
-
-* **ElementWiseVertex** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/graph/ElementWiseVertex.java)) - Performs an element-wise operation on the inputs - add, subtract, product, average, max
-* **L2NormalizeVertex** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/graph/L2NormalizeVertex.java)) - normalizes the input activations by dividing by the L2 norm for each example. i.e., out <- out / l2Norm(out)
-* **L2Vertex** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/graph/L2Vertex.java)) - calculates the L2 distance between the two input arrays, for each example separately. Output is a single value for each example.
-* **MergeVertex** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/graph/MergeVertex.java)) - merges the input activations along dimension 1, to make a larger output array. For CNNs, this implements merging along the depth/channels dimension
-* **PreprocessorVertex** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/graph/PreprocessorVertex.java)) - a simple GraphVertex that contains an InputPreProcessor only
-* **ReshapeVertex** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/graph/ReshapeVertex.java)) - Performs arbitrary activation array reshaping. The preprocessors in the next section should usually be preferred.
-* **ScaleVertex** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/graph/ScaleVertex.java)) - implements simple multiplicative scaling of the inputs - i.e., out = scalar * input
-* **ShiftVertex** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/graph/ShiftVertex.java)) - implements simple scalar element-wise addition on the inputs - i.e., out = input + scalar
-* **StackVertex** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/graph/StackVertex.java)) - used to stack all inputs along the minibatch dimension. Analogous to MergeVertex, but along dimension 0 (minibatch) instead of dimension 1 (nOut/channels)
-* **SubsetVertex** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/graph/SubsetVertex.java)) - used to get a contiguous subset of the input activations along dimension 1. For example, two SubsetVertex instances could be used to split the activations from an input array into two separate activations. Essentially the opposite of MergeVertex.
-* **UnstackVertex** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/graph/UnstackVertex.java)) - similar to SubsetVertex, but along dimension 0 (minibatch) instead of dimension 1 (nOut/channels). Opposite of StackVertex
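-
-MergeVertex and SubsetVertex are inverses along dimension 1. A plain-Java sketch on 2d activations of shape `[minibatch][n]`; the class and method names are hypothetical, and treating the subset's end index as inclusive is an assumption made for the sketch.
-
-```java
-import java.util.Arrays;
-
-// Illustration only: MergeVertex-style concatenation and SubsetVertex-style slicing of
-// 2d activations with shape [minibatch][n], both along dimension 1.
-class MergeSubsetSketch {
-
-    // Concatenate x ([minibatch][a]) and y ([minibatch][b]) into [minibatch][a + b]
-    static double[][] merge(double[][] x, double[][] y) {
-        double[][] out = new double[x.length][];
-        for (int m = 0; m < x.length; m++) {
-            out[m] = new double[x[m].length + y[m].length];
-            System.arraycopy(x[m], 0, out[m], 0, x[m].length);
-            System.arraycopy(y[m], 0, out[m], x[m].length, y[m].length);
-        }
-        return out;
-    }
-
-    // Keep columns from..to along dimension 1 (both ends inclusive, by assumption)
-    static double[][] subset(double[][] x, int from, int to) {
-        double[][] out = new double[x.length][];
-        for (int m = 0; m < x.length; m++) {
-            out[m] = Arrays.copyOfRange(x[m], from, to + 1);
-        }
-        return out;
-    }
-
-    public static void main(String[] args) {
-        double[][] a = {{1, 2}, {3, 4}};
-        double[][] b = {{5}, {6}};
-        double[][] merged = merge(a, b);
-        System.out.println(Arrays.deepToString(merged));               // [[1.0, 2.0, 5.0], [3.0, 4.0, 6.0]]
-        System.out.println(Arrays.deepToString(subset(merged, 0, 1))); // [[1.0, 2.0], [3.0, 4.0]]
-    }
-}
-```
-
-StackVertex and UnstackVertex follow the same pattern along dimension 0 instead.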
-
-
-
-#### <a name="layers-preproc">InputPreProcessors</a>
-
-An InputPreProcessor is a simple class/interface that operates on the input to a layer. That is, a preprocessor is attached to a layer and performs some operation on the input before passing the result to the layer. Preprocessors also handle backpropagation; i.e., the preprocessing operations are generally differentiable.
-
-Note that in many cases (such as the XtoYPreProcessor classes), users won't need to (and shouldn't) add these manually, and can instead just use `.setInputType(InputType.feedForward(10))` or similar, which will infer and add the preprocessors as required.
-
-* **CnnToFeedForwardPreProcessor** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/preprocessor/CnnToFeedForwardPreProcessor.java)) - handles the activation reshaping necessary to transition from a CNN layer (ConvolutionLayer, SubsamplingLayer, etc) to DenseLayer/OutputLayer etc.
-* **CnnToRnnPreProcessor** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/preprocessor/CnnToRnnPreProcessor.java)) - handles reshaping necessary to transition from a (effectively, time distributed) CNN layer to an RNN layer.
-* **ComposableInputPreProcessor** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/preprocessor/ComposableInputPreProcessor.java)) - simple class that allows multiple preprocessors to be chained + used on a single layer
-* **FeedForwardToCnnPreProcessor** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/preprocessor/FeedForwardToCnnPreProcessor.java)) - handles activation reshaping to transition from a row vector (per example) to a CNN layer. Note that this transition/preprocessor only makes sense if the activations are actually CNN activations, but have been 'flattened' to a row vector.
-* **FeedForwardToRnnPreProcessor** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/preprocessor/FeedForwardToRnnPreProcessor.java)) - handles transition from a (time distributed) feed-forward layer to an RNN layer
-* **RnnToCnnPreProcessor** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/preprocessor/RnnToCnnPreProcessor.java)) - handles transition from a sequence of CNN activations with shape ```[minibatch, depth*height*width, timeSeriesLength]``` to time-distributed ```[numExamples*timeSeriesLength, numChannels, inputWidth, inputHeight]``` format
-* **RnnToFeedForwardPreProcessor** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/preprocessor/RnnToFeedForwardPreProcessor.java)) - handles transition from time series activations (shape ```[minibatch,size,timeSeriesLength]```) to time-distributed feed-forward (shape ```[minibatch*tsLength,size]```) activations.
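-
-The RnnToFeedForward and FeedForwardToRnn transitions are pure reshapes: each (example, time step) pair becomes one row. A plain-Java sketch of both directions; the class and method names are hypothetical, and the row ordering used here is an assumption for illustration, not necessarily the ordering DL4J uses internally.
-
-```java
-import java.util.Arrays;
-
-// Illustration only: reshape time series activations [minibatch][size][timeSeriesLength]
-// into time-distributed 2d activations [minibatch * timeSeriesLength][size], and back.
-// Row r = m * timeSeriesLength + t holds example m at time step t (ordering assumed).
-class RnnFeedForwardSketch {
-
-    static double[][] rnnToFeedForward(double[][][] act) {
-        int minibatch = act.length, size = act[0].length, length = act[0][0].length;
-        double[][] out = new double[minibatch * length][size];
-        for (int m = 0; m < minibatch; m++) {
-            for (int t = 0; t < length; t++) {
-                for (int s = 0; s < size; s++) {
-                    out[m * length + t][s] = act[m][s][t];
-                }
-            }
-        }
-        return out;
-    }
-
-    static double[][][] feedForwardToRnn(double[][] ff, int minibatch) {
-        int length = ff.length / minibatch, size = ff[0].length;
-        double[][][] out = new double[minibatch][size][length];
-        for (int m = 0; m < minibatch; m++) {
-            for (int t = 0; t < length; t++) {
-                for (int s = 0; s < size; s++) {
-                    out[m][s][t] = ff[m * length + t][s];
-                }
-            }
-        }
-        return out;
-    }
-
-    public static void main(String[] args) {
-        double[][][] act = {{{1, 2, 3}, {4, 5, 6}}}; // minibatch 1, size 2, 3 time steps
-        double[][] ff = rnnToFeedForward(act);
-        System.out.println(Arrays.deepToString(ff));                        // [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
-        System.out.println(Arrays.deepEquals(act, feedForwardToRnn(ff, 1))); // true
-    }
-}
-```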
-
-
-### <a name="listeners">Iteration/Training Listeners</a>
-
-IterationListener: can be attached to a model; it is called during training, once after every iteration (i.e., after each parameter update).
-TrainingListener: extends IterationListener. Has a number of additional methods that are called at different stages of training, i.e., after the forward pass, after gradient calculation, at the start/end of each epoch, etc.
-
-Neither type (iteration/training) is called outside of training (i.e., during output or feed-forward methods).
-
-
-* **ScoreIterationListener** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/optimize/listeners/ScoreIterationListener.java)) - Logs the loss function score every N training iterations
-* **PerformanceListener** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/optimize/listeners/PerformanceListener.java)) - Logs performance (examples per sec, minibatches per sec, ETL time), and optionally score, every N training iterations.
-* **EvaluativeListener** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/optimize/listeners/EvaluativeListener.java)) - Evaluates network performance on a test set every N iterations or epochs. Also has a system for callbacks, to (for example) save the evaluation results.
-* **CheckpointListener** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/optimize/listeners/checkpoint/CheckpointListener.java)) - Saves network checkpoints periodically, based on epochs, iterations or time (or some combination of all three).
-* **StatsListener** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-ui-parent/deeplearning4j-ui-model/src/main/java/org/deeplearning4j/ui/stats/StatsListener.java)) - Main listener for DL4J's web-based network training user interface. See [visualization page](https://deeplearning4j.org/visualization) for more details.
-* **CollectScoresIterationListener** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/optimize/listeners/CollectScoresIterationListener.java)) - Similar to ScoreIterationListener, but stores scores internally in a list (for later retrieval) instead of logging them
-* **TimeIterationListener** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/optimize/listeners/TimeIterationListener.java)) - Attempts to estimate time until training completion, based on current speed and specified total number of iterations
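-
-Several of these listeners act only every N iterations. The sketch below illustrates that callback pattern with a fake training loop; `IterationCallback` and `CollectEveryN` are hypothetical types invented for the example and are NOT DL4J's IterationListener/TrainingListener interfaces, whose method signatures differ.
-
-```java
-import java.util.ArrayList;
-import java.util.List;
-
-// Illustration only: a fake training loop driving a listener. IterationCallback and
-// CollectEveryN are hypothetical types, NOT DL4J's IterationListener/TrainingListener.
-class ListenerDemo {
-    public static void main(String[] args) {
-        CollectEveryN listener = new CollectEveryN(10);
-        for (int iteration = 0; iteration < 35; iteration++) {
-            double fakeScore = 1.0 / (iteration + 1); // stand-in for the loss after an update
-            listener.iterationDone(iteration, fakeScore);
-        }
-        System.out.println(listener.scores.size()); // 4 (iterations 0, 10, 20 and 30)
-    }
-}
-
-// Callback invoked once after every parameter update
-interface IterationCallback {
-    void iterationDone(int iteration, double score);
-}
-
-// Records the score every `frequency` iterations, similar in spirit to CollectScoresIterationListener
-class CollectEveryN implements IterationCallback {
-    private final int frequency;
-    final List<Double> scores = new ArrayList<>();
-
-    CollectEveryN(int frequency) {
-        this.frequency = frequency;
-    }
-
-    @Override
-    public void iterationDone(int iteration, double score) {
-        if (iteration % frequency == 0) {
-            scores.add(score);
-        }
-    }
-}
-```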
-
-### <a name="evaluation">Evaluation</a>
-
-Link: [Main evaluation page](https://deeplearning4j.org/evaluation)
-
-ND4J has a number of classes for evaluating the performance of a network against a test set. Deeplearning4j (and SameDiff) use these ND4J evaluation classes. Different evaluation classes are suitable for different types of networks.
-Note: in 1.0.0-beta3 (November 2018), all evaluation classes were moved from DL4J to ND4J.
-
-
-* **Evaluation** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/evaluation/classification/Evaluation.java)) - Used for the evaluation of multi-class classifiers (assumes standard one-hot labels, and softmax probability distribution over N classes for predictions). Calculates a number of metrics - accuracy, precision, recall, F1, F-beta, Matthews correlation coefficient, confusion matrix. Optionally calculates top N accuracy, custom binary decision thresholds, and cost arrays (for non-binary case). Typically used for softmax + mcxent/negative-log-likelihood networks.
-* **EvaluationBinary** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/evaluation/classification/EvaluationBinary.java)) - A multi-label binary version of the Evaluation class. Each network output is assumed to be a separate/independent binary class, with probability 0 to 1 independent of all other outputs. Typically used for sigmoid + binary cross entropy networks.
-* **EvaluationCalibration** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/evaluation/classification/EvaluationCalibration.java)) - Used to evaluate the calibration of a binary or multi-class classifier. Produces reliability diagrams, residual plots, and histograms of probabilities. Export plots to HTML using the ```exportevaluationCalibrationToHtmlFile``` method of [EvaluationTools](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-core/src/main/java/org/deeplearning4j/evaluation/EvaluationTools.java).
-* **ROC** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/evaluation/classification/ROC.java)) - Used only for single-output binary classifiers, i.e., networks with nOut(1) + sigmoid, or nOut(2) + softmax. Supports two modes: thresholded (approximate) or exact (the default). Calculates the area under the ROC curve and the area under the precision-recall curve. ROC and P-R curves can be plotted to HTML using [EvaluationTools](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-core/src/main/java/org/deeplearning4j/evaluation/EvaluationTools.java).
-* **ROCBinary** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/evaluation/classification/ROCBinary.java)) - a version of ROC that is used for multi-label binary networks (i.e., sigmoid + binary cross entropy), where each network output is assumed to be an independent binary variable.  
-* **ROCMultiClass** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/evaluation/classification/ROCMultiClass.java)) - a version of ROC that is used for multi-class (non-binary) networks (i.e., softmax + mcxent/negative-log-likelihood networks). As ROC metrics are only defined for binary classification, this treats the multi-class output as a set of 'one-vs-all' binary classification problems.
-* **RegressionEvaluation** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/evaluation/regression/RegressionEvaluation.java)) - An evaluation class used for regression models (including multi-output regression models). Reports metrics such as mean-squared error (MSE), mean-absolute error, etc. for each output/column.
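-
-The classification metrics listed above follow their standard definitions. As a rough illustration, the sketch below computes accuracy and per-class precision, recall and F1 from a confusion matrix. `ConfusionMetrics` is a hypothetical helper, not part of ND4J; it assumes rows index the actual class and columns the predicted class, and it does not reproduce how the `Evaluation` class averages per-class values.
-
-```java
-/**
- * Hypothetical helper (not part of ND4J): standard classification metrics
- * computed from a confusion matrix where cm[actual][predicted] is a count.
- */
-class ConfusionMetrics {
-
-    // Accuracy: correctly classified examples / all examples
-    static double accuracy(int[][] cm) {
-        long correct = 0, total = 0;
-        for (int a = 0; a < cm.length; a++) {
-            for (int p = 0; p < cm[a].length; p++) {
-                total += cm[a][p];
-                if (a == p) {
-                    correct += cm[a][p];
-                }
-            }
-        }
-        return total == 0 ? 0.0 : (double) correct / total;
-    }
-
-    // Precision for class c: true positives / all examples predicted as c
-    static double precision(int[][] cm, int c) {
-        long predicted = 0;
-        for (int[] row : cm) {
-            predicted += row[c];
-        }
-        return predicted == 0 ? 0.0 : (double) cm[c][c] / predicted;
-    }
-
-    // Recall for class c: true positives / all examples whose actual class is c
-    static double recall(int[][] cm, int c) {
-        long actual = 0;
-        for (int count : cm[c]) {
-            actual += count;
-        }
-        return actual == 0 ? 0.0 : (double) cm[c][c] / actual;
-    }
-
-    // F1 for class c: harmonic mean of precision and recall
-    static double f1(int[][] cm, int c) {
-        double p = precision(cm, c);
-        double r = recall(cm, c);
-        return (p + r) == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
-    }
-}
-```
-
-For example, with `cm = {{5, 1}, {2, 2}}` class 0 has precision 5/7 and recall 5/6, giving an F1 of 10/13.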
-
-
-## <a name="saving">Network Saving and Loading</a>
-
-```MultiLayerNetwork.save(File)``` and ```MultiLayerNetwork.load(File)``` methods can be used to save and load models. These use ModelSerializer internally. Similar save/load methods are also available for ComputationGraph.
-
-MultiLayerNetwork and ComputationGraph can be saved using the [ModelSerializer](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/util/ModelSerializer.java) class - and specifically the ```writeModel```, ```restoreMultiLayerNetwork``` and ```restoreComputationGraph``` methods.
-
-Examples: [Saving and loading network](https://github.com/eclipse/deeplearning4j-examples/tree/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/misc/modelsaving)
-
-Networks can be trained further after saving and loading; however, be sure to load the 'updater' (i.e., the historical state for updaters such as momentum). If no further training is required, the updater state can be omitted to save disk space and memory.
-
-Most Normalizers (implementing the ND4J ```Normalizer``` interface) can also be added to a model using the ```addNormalizerToModel``` method.
-
-Note that the format used for models in DL4J is .zip: it's possible to open/extract these files using programs supporting the zip format.
-
-
-
-## <a name="config">Network Configurations</a>
-
-This section lists the various configuration options that Deeplearning4j supports.
-
-### <a name="config-afn">Activation Functions</a>
-
-Activation functions can be defined in one of two ways:
-
-* By passing an [Activation](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/activations/Activation.java) enumeration value to the configuration - for example, ```.activation(Activation.TANH)```
-* By passing an [IActivation](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/activations/IActivation.java) instance - for example, ```.activation(new ActivationSigmoid())```
-
-Note that Deeplearning4j supports custom activation functions, which can be defined by extending [BaseActivationFunction](https://github.com/eclipse/deeplearning4j/tree/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/activations/impl).
-
-List of supported activation functions:
-* **CUBE** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/activations/impl/ActivationCube.java)) - ```f(x) = x^3```
-* **ELU** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/activations/impl/ActivationELU.java)) - Exponential linear unit ([Reference](https://arxiv.org/abs/1511.07289))
-* **HARDSIGMOID** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/activations/impl/ActivationHardSigmoid.java)) - a piecewise linear version of the standard sigmoid activation function. ```f(x) = min(1, max(0, 0.2*x + 0.5))```
-* **HARDTANH** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/activations/impl/ActivationHardTanH.java)) - a piecewise linear version of the standard tanh activation function: ```f(x) = min(1, max(-1, x))```
-* **IDENTITY** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/activations/impl/ActivationIdentity.java)) - a 'no op' activation function: ```f(x) = x```
-* **LEAKYRELU** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/activations/impl/ActivationLReLU.java)) - leaky rectified linear unit. ```f(x) = max(0, x) + alpha * min(0, x)``` with ```alpha=0.01``` by default.
-* **RATIONALTANH** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/activations/impl/ActivationRationalTanh.java)) - ```tanh(y) ~ sgn(y) * { 1 - 1/(1+|y|+y^2+1.41645*y^4)}``` which approximates ```f(x) = 1.7159 * tanh(2x/3)```, but should be faster to execute. ([Reference](https://arxiv.org/abs/1508.01292))
-* **RELU** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/activations/impl/ActivationReLU.java)) - standard rectified linear unit: ```f(x) = x``` if ```x>0``` or ```f(x) = 0``` otherwise
-* **RRELU** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/activations/impl/ActivationRReLU.java)) - randomized rectified linear unit. Deterministic during test time. ([Reference](https://arxiv.org/abs/1505.00853))
-* **SIGMOID** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/activations/impl/ActivationSigmoid.java)) - standard sigmoid activation function, ```f(x) = 1 / (1 + exp(-x))```
-* **SOFTMAX** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/activations/impl/ActivationSoftmax.java)) - standard softmax activation function
-* **SOFTPLUS** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/activations/impl/ActivationSoftPlus.java)) - ```f(x) = log(1+e^x)``` - shape is similar to a smooth version of the RELU activation function
-* **SOFTSIGN** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/activations/impl/ActivationSoftSign.java)) - ```f(x) = x / (1+|x|)``` - somewhat similar in shape to the standard tanh activation function (faster to calculate).
-* **TANH** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/activations/impl/ActivationTanH.java)) - standard tanh (hyperbolic tangent) activation function
-* **RECTIFIEDTANH** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/activations/impl/ActivationRectifiedTanh.java)) - ```f(x) = max(0, tanh(x))```
-* **SELU** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/activations/impl/ActivationSELU.java)) - scaled exponential linear unit - used with [self normalizing neural networks](https://arxiv.org/abs/1706.02515)
-* **SWISH** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/activations/impl/ActivationSwish.java)) - Swish activation function, ```f(x) = x * sigmoid(x)``` ([Reference](https://arxiv.org/abs/1710.05941))
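-
-To make a few of the formulas above concrete, here is a plain-Java sketch of them for scalar inputs (and for a vector input in the softmax case). `ActivationFormulas` is a hypothetical class for illustration only: in a real configuration you would use the `Activation` enumeration or an `IActivation` instance, and the ND4J implementations operate on INDArrays rather than on doubles. Subtracting the maximum inside softmax is a common numerical-stability trick, assumed here rather than taken from the ND4J source.
-
-```java
-/** Plain-Java versions of some activation formulas listed above (illustrative, not ND4J code). */
-class ActivationFormulas {
-
-    // HARDSIGMOID: f(x) = min(1, max(0, 0.2*x + 0.5))
-    static double hardSigmoid(double x) {
-        return Math.min(1.0, Math.max(0.0, 0.2 * x + 0.5));
-    }
-
-    // LEAKYRELU: f(x) = max(0, x) + alpha * min(0, x)
-    static double leakyRelu(double x, double alpha) {
-        return Math.max(0.0, x) + alpha * Math.min(0.0, x);
-    }
-
-    // SIGMOID: f(x) = 1 / (1 + exp(-x))
-    static double sigmoid(double x) {
-        return 1.0 / (1.0 + Math.exp(-x));
-    }
-
-    // SOFTSIGN: f(x) = x / (1 + |x|)
-    static double softsign(double x) {
-        return x / (1.0 + Math.abs(x));
-    }
-
-    // SWISH: f(x) = x * sigmoid(x)
-    static double swish(double x) {
-        return x * sigmoid(x);
-    }
-
-    // RECTIFIEDTANH: f(x) = max(0, tanh(x))
-    static double rectifiedTanh(double x) {
-        return Math.max(0.0, Math.tanh(x));
-    }
-
-    // SOFTMAX over a vector; subtracting the max first avoids overflow in exp()
-    static double[] softmax(double[] z) {
-        double max = Double.NEGATIVE_INFINITY;
-        for (double v : z) {
-            max = Math.max(max, v);
-        }
-        double sum = 0.0;
-        double[] out = new double[z.length];
-        for (int i = 0; i < z.length; i++) {
-            out[i] = Math.exp(z[i] - max);
-            sum += out[i];
-        }
-        for (int i = 0; i < z.length; i++) {
-            out[i] /= sum;
-        }
-        return out;
-    }
-}
-```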
-
-### <a name="config-init">Weight Initialization</a>
-
-Weight initialization refers to the method by which the initial parameters for a new network should be set.
-
-Weight initializations are usually defined using the [WeightInit](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/weights/WeightInit.java) enumeration.
-
-Custom weight initializations can be specified using, for example, ```.weightInit(WeightInit.DISTRIBUTION).dist(new NormalDistribution(0, 1))```. As of master (but not the 0.9.1 release), ```.weightInit(new NormalDistribution(0, 1))``` is also possible, and is equivalent to the previous approach.
-
-Available weight initializations (note that not all of these are available in the 0.9.1 release):
-
-* **DISTRIBUTION**: Sample weights from a provided distribution (specified via the ```dist``` configuration method).
-* **ZERO**: Generate weights as zeros
-* **ONES**: All weights are set to 1
-* **SIGMOID_UNIFORM**: A version of XAVIER_UNIFORM for sigmoid activation functions. U(-r,r) with r=4*sqrt(6/(fanIn + fanOut))
-* **NORMAL**: Normal/Gaussian distribution, with mean 0 and standard deviation 1/sqrt(fanIn). This is the initialization recommended in the [Klambauer et al. 2017, "Self-Normalizing Neural Networks"](https://arxiv.org/abs/1706.02515) paper. Equivalent to DL4J's XAVIER_FAN_IN and LECUN_NORMAL (i.e. Keras' "lecun_normal")
-* **LECUN_UNIFORM**: Uniform U[-a,a] with a=3/sqrt(fanIn).
-* **UNIFORM**: Uniform U[-a,a] with a=1/sqrt(fanIn). "Commonly used heuristic" as per Glorot and Bengio 2010
-* **XAVIER**: As per [Glorot and Bengio 2010](http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf): Gaussian distribution with mean 0, variance 2.0/(fanIn + fanOut)
-* **XAVIER_UNIFORM**: As per [Glorot and Bengio 2010](http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf): Uniform distribution U(-s,s) with s = sqrt(6/(fanIn + fanOut))
-* **XAVIER_FAN_IN**: Similar to XAVIER, but with variance 1/fanIn. Caffe originally used this.
-* **RELU**: [He et al. (2015), "Delving Deep into Rectifiers"](https://arxiv.org/abs/1502.01852). Normal distribution with variance 2.0/fanIn
-* **RELU_UNIFORM**: [He et al. (2015), "Delving Deep into Rectifiers"](https://arxiv.org/abs/1502.01852). Uniform distribution U(-s,s) with s = sqrt(6/fanIn)
-* **IDENTITY**: Weights are set to an identity matrix. Note: can only be used with square weight matrices
-* **VAR_SCALING_NORMAL_FAN_IN**: Gaussian distribution with mean 0, variance 1.0/(fanIn)
-* **VAR_SCALING_NORMAL_FAN_OUT**: Gaussian distribution with mean 0, variance 1.0/(fanOut)
-* **VAR_SCALING_NORMAL_FAN_AVG**: Gaussian distribution with mean 0, variance 1.0/((fanIn + fanOut)/2)
-* **VAR_SCALING_UNIFORM_FAN_IN**: Uniform U[-a,a] with a=3.0/(fanIn)
-* **VAR_SCALING_UNIFORM_FAN_OUT**: Uniform U[-a,a] with a=3.0/(fanOut)
-* **VAR_SCALING_UNIFORM_FAN_AVG**: Uniform U[-a,a] with a=3.0/((fanIn + fanOut)/2)
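-
-The sketch below evaluates a few of the initialization formulas above and samples weight matrices from them. `WeightInitFormulas` is hypothetical and is not DL4J's `WeightInit` code; it assumes a dense-layer weight matrix laid out as `[fanIn][fanOut]` and uses `java.util.Random` as the source of randomness.
-
-```java
-import java.util.Random;
-
-/**
- * Illustrative versions of a few WeightInit formulas (not DL4J's implementation).
- * Assumes a dense-layer weight matrix laid out as w[fanIn][fanOut].
- */
-class WeightInitFormulas {
-
-    // XAVIER_UNIFORM: U(-s, s) with s = sqrt(6 / (fanIn + fanOut))
-    static double xavierUniformLimit(int fanIn, int fanOut) {
-        return Math.sqrt(6.0 / (fanIn + fanOut));
-    }
-
-    // XAVIER: Gaussian, mean 0, variance 2 / (fanIn + fanOut); returns the standard deviation
-    static double xavierStd(int fanIn, int fanOut) {
-        return Math.sqrt(2.0 / (fanIn + fanOut));
-    }
-
-    // RELU (He et al. 2015): Gaussian, mean 0, variance 2 / fanIn; returns the standard deviation
-    static double reluStd(int fanIn) {
-        return Math.sqrt(2.0 / fanIn);
-    }
-
-    // Samples a fanIn x fanOut matrix from U(-s, s) using the XAVIER_UNIFORM limit
-    static double[][] xavierUniform(int fanIn, int fanOut, long seed) {
-        Random rng = new Random(seed);
-        double s = xavierUniformLimit(fanIn, fanOut);
-        double[][] w = new double[fanIn][fanOut];
-        for (int i = 0; i < fanIn; i++) {
-            for (int j = 0; j < fanOut; j++) {
-                // nextDouble() is in [0, 1), so this maps into [-s, s)
-                w[i][j] = (2.0 * rng.nextDouble() - 1.0) * s;
-            }
-        }
-        return w;
-    }
-
-    // Samples a fanIn x fanOut matrix from N(0, 2 / fanIn) for the RELU scheme
-    static double[][] relu(int fanIn, int fanOut, long seed) {
-        Random rng = new Random(seed);
-        double std = reluStd(fanIn);
-        double[][] w = new double[fanIn][fanOut];
-        for (int i = 0; i < fanIn; i++) {
-            for (int j = 0; j < fanOut; j++) {
-                w[i][j] = rng.nextGaussian() * std;
-            }
-        }
-        return w;
-    }
-}
-```
-
-Note how the RELU scheme depends only on fanIn, while the XAVIER variants also scale with fanOut.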
-
-
-### <a name="config-updaters">Updaters (Optimizers)</a>
-
-An 'updater' in DL4J is a class that takes raw gradients and modifies them to become updates. These updates will then be applied to the network parameters.
-The [CS231n course notes](http://cs231n.github.io/neural-networks-3/#ada) have a good explanation of some of these updaters.
-
-Supported updaters in Deeplearning4j:
-* **AdaDelta** - ([Source](https://github.com/eclipse/deeplearning4j/tree/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/learning/config/AdaDelta.java)) - [Reference](https://arxiv.org/abs/1212.5701)
-* **AdaGrad** - ([Source](https://github.com/eclipse/deeplearning4j/tree/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/learning/config/AdaGrad.java)) - [Reference](http://jmlr.org/papers/v12/duchi11a.html)
-* **AdaMax** - ([Source](https://github.com/eclipse/deeplearning4j/tree/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/learning/config/AdaMax.java)) - A variant of the Adam updater - [Reference](https://arxiv.org/abs/1412.6980)
-* **Adam** - ([Source](https://github.com/eclipse/deeplearning4j/tree/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/learning/config/Adam.java))
-* **Nadam** - ([Source](https://github.com/eclipse/deeplearning4j/tree/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/learning/config/Nadam.java)) - A variant of the Adam updater, using the Nesterov momentum update rule - [Reference](https://arxiv.org/abs/1609.04747)
-* **Nesterovs** - ([Source](https://github.com/eclipse/deeplearning4j/tree/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/learning/config/Nesterovs.java)) - Nesterov momentum updater
-* **NoOp** - ([Source](https://github.com/eclipse/deeplearning4j/tree/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/learning/config/NoOp.java)) - A 'no operation' updater. That is, gradients are not modified at all by this updater. Mathematically equivalent to the SGD updater with a learning rate of 1.0
-* **RmsProp** - ([Source](https://github.com/eclipse/deeplearning4j/tree/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/learning/config/RmsProp.java)) - [Reference - slide 29](http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf)
-* **Sgd** - ([Source](https://github.com/eclipse/deeplearning4j/tree/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/learning/config/Sgd.java)) - Standard stochastic gradient descent updater. This updater applies a learning rate only.
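-
-As an illustration of how an updater turns a raw gradient into an update, the sketch below implements the plain SGD rule and the textbook Adam rule (Kingma and Ba, with bias correction) on a parameter array. `UpdaterSketch` is hypothetical; it is not how ND4J implements its updaters, and the hyperparameter values mentioned below are chosen for illustration, not taken from DL4J's defaults.
-
-```java
-/** Textbook SGD and Adam update rules on a parameter array (a sketch, not ND4J's code). */
-class UpdaterSketch {
-
-    // Sgd: update = learningRate * gradient, subtracted from the parameters
-    static void sgdStep(double[] params, double[] grad, double lr) {
-        for (int i = 0; i < params.length; i++) {
-            params[i] -= lr * grad[i];
-        }
-    }
-
-    // Adam keeps running estimates of the gradient's first and second moments
-    static final class Adam {
-        final double lr, beta1, beta2, eps;
-        final double[] m, v;
-        int t = 0;
-
-        Adam(int n, double lr, double beta1, double beta2, double eps) {
-            this.lr = lr;
-            this.beta1 = beta1;
-            this.beta2 = beta2;
-            this.eps = eps;
-            this.m = new double[n];
-            this.v = new double[n];
-        }
-
-        void step(double[] params, double[] grad) {
-            t++;
-            // Bias corrections compensate for m and v starting at zero
-            double bc1 = 1.0 - Math.pow(beta1, t);
-            double bc2 = 1.0 - Math.pow(beta2, t);
-            for (int i = 0; i < params.length; i++) {
-                m[i] = beta1 * m[i] + (1.0 - beta1) * grad[i];
-                v[i] = beta2 * v[i] + (1.0 - beta2) * grad[i] * grad[i];
-                double mHat = m[i] / bc1;
-                double vHat = v[i] / bc2;
-                params[i] -= lr * mHat / (Math.sqrt(vHat) + eps);
-            }
-        }
-    }
-}
-```
-
-Calling `sgdStep(params, grad, 1.0)` applies the raw gradient unchanged, which matches the description of NoOp above as equivalent to SGD with a learning rate of 1.0. On its first step, `new UpdaterSketch.Adam(1, 0.01, 0.9, 0.999, 1e-8)` moves a parameter by roughly 0.01 regardless of the gradient's magnitude, since the bias-corrected ratio is close to the gradient's sign.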
-
-
-### <a name="config-schedules">Learning Rate Schedules</a>
-
-All updaters that support a learning rate also support learning rate schedules (the Nesterov momentum updater also supports a momentum schedule). Learning rate schedules can be specified either based on the number of iterations, or the number of epochs that have elapsed. Dropout (see below) can also make use of the schedules listed here.
-
-Configure using, for example: ```.updater(new Adam(new ExponentialSchedule(ScheduleType.ITERATION, 0.1, 0.99 )))```
-You can plot/inspect the learning rate that will be used at any point by calling ```ISchedule.valueAt(int iteration, int epoch)``` on the schedule object you have created.
-
-Available schedules:
-
-* **ExponentialSchedule** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/schedule/ExponentialSchedule.java)) - Implements ```value(i) = initialValue * gamma^i```
-* **InverseSchedule** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/schedule/InverseSchedule.java)) - Implements ```value(i) = initialValue * (1 + gamma * i)^(-power)```
-* **MapSchedule** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/schedule/MapSchedule.java)) - Learning rate schedule based on a user-provided map. Note that the provided map must have a value for iteration/epoch 0. Has a builder class to conveniently define a schedule.
-* **PolySchedule** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/schedule/PolySchedule.java)) - Implements ```value(i) = initialValue * (1 + i/maxIter)^(-power)```
-* **SigmoidSchedule** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/schedule/SigmoidSchedule.java)) - Implements ```value(i) = initialValue * 1.0 / (1 + exp(-gamma * (i - stepSize)))```
-* **StepSchedule** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/schedule/StepSchedule.java)) - Implements ```value(i) = initialValue * gamma^( floor(i/step) )```
-
-
-Note that custom schedules can be created by implementing the ISchedule interface.
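-
-Several of the schedule formulas above can be written as plain functions of `i` (the iteration or epoch count, depending on the `ScheduleType`). `ScheduleFormulas` is a hypothetical sketch for checking values by hand; with DL4J itself you would call `ISchedule.valueAt(iteration, epoch)` on the schedule object instead. PolySchedule and MapSchedule are omitted.
-
-```java
-/** Schedule formulas as plain functions of i (illustrative, not ND4J's ISchedule classes). */
-class ScheduleFormulas {
-
-    // ExponentialSchedule: value(i) = initialValue * gamma^i
-    static double exponential(double initialValue, double gamma, int i) {
-        return initialValue * Math.pow(gamma, i);
-    }
-
-    // InverseSchedule: value(i) = initialValue * (1 + gamma * i)^(-power)
-    static double inverse(double initialValue, double gamma, double power, int i) {
-        return initialValue * Math.pow(1.0 + gamma * i, -power);
-    }
-
-    // SigmoidSchedule: value(i) = initialValue / (1 + exp(-gamma * (i - stepSize)))
-    static double sigmoid(double initialValue, double gamma, int stepSize, int i) {
-        return initialValue / (1.0 + Math.exp(-gamma * (i - stepSize)));
-    }
-
-    // StepSchedule: value(i) = initialValue * gamma^floor(i / step)
-    static double step(double initialValue, double gamma, double step, int i) {
-        return initialValue * Math.pow(gamma, Math.floor(i / step));
-    }
-}
-```
-
-For example, a step schedule with `initialValue = 0.1`, `gamma = 0.5` and `step = 10` keeps the value at 0.1 for i = 0..9, then halves it every 10 further iterations (0.025 at i = 25).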
-
-
-### <a name="config-regularization">Regularization</a>
-
-#### <a name="config-l1l2">L1/L2 Regularization</a>
-
-L1 and L2 regularization can easily be added to a network via the configuration: ```.l1(0.1).l2(0.2)```.
-Note that on 0.9.1, ```.regularization(true)``` must also be enabled (this option was removed after the 0.9.1 release).
-
-By default, L1 and L2 regularization are applied to the weight parameters only. That is, ```.l1``` and ```.l2``` do not affect bias parameters; these can be regularized using ```.l1Bias(0.1).l2Bias(0.2)```.
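-
-The sketch below shows the textbook form of L1 and L2 regularization on a weight vector: a penalty added to the score, and its gradient added to the loss gradient. `RegularizationSketch` is hypothetical, and the `l2 / 2` factor is a common convention assumed here, not necessarily DL4J's exact scaling. Because bias parameters are excluded by default, only the weight array would be passed in; biases would use their own `l1Bias`/`l2Bias` coefficients.
-
-```java
-/**
- * Textbook L1/L2 penalty on a weight vector (illustrative, not DL4J's code).
- * Assumes the convention: penalty = l1 * sum|w| + (l2 / 2) * sum w^2.
- */
-class RegularizationSketch {
-
-    // Penalty term added to the network's score
-    static double penalty(double[] w, double l1, double l2) {
-        double sumAbs = 0.0, sumSq = 0.0;
-        for (double v : w) {
-            sumAbs += Math.abs(v);
-            sumSq += v * v;
-        }
-        return l1 * sumAbs + 0.5 * l2 * sumSq;
-    }
-
-    // Gradient of the penalty, added to the loss gradient: l1 * sign(w) + l2 * w
-    static double[] penaltyGradient(double[] w, double l1, double l2) {
-        double[] g = new double[w.length];
-        for (int i = 0; i < w.length; i++) {
-            g[i] = l1 * Math.signum(w[i]) + l2 * w[i];
-        }
-        return g;
-    }
-}
-```
-
-The L1 term pushes every weight toward zero by a constant amount, while the L2 term shrinks each weight in proportion to its size.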
-
-
-#### <a name="config-dropout">Dropout</a>
-
-All dropout types are applied at training time only. They are not applied at test time.
-
-* **Dropout** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/dropout/Dropout.java)) - Each input activation x is independently set to 0 with probability 1-p, or to x/p with probability p.
-* **GaussianDropout** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/dropout/GaussianDropout.java)) - This is a multiplicative Gaussian noise (mean 1) on the input activations. Each input activation x is independently set to: ```x * y```, where ```y ~ N(1, stdev = sqrt((1-rate)/rate))```
-* **GaussianNoise** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/dropout/GaussianNoise.java)) - Applies additive, mean-zero Gaussian noise to the input - i.e., ```x = x + N(0,stddev)```
-* **AlphaDropout** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/dropout/AlphaDropout.java)) - AlphaDropout is a dropout technique proposed by [Klambauer et al. 2017 - Self-Normalizing Neural Networks](https://arxiv.org/abs/1706.02515). Designed for self-normalizing neural networks (SELU activation, NORMAL weight init). Attempts to keep the mean and variance of the post-dropout activations the same (in expectation) as before alpha dropout was applied.
-
-As of current master (but not 0.9.1), the dropout parameters can also be specified using any of the schedule classes mentioned in the Learning Rate Schedules section.
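-
-The two multiplicative schemes described above can be sketched as follows. `DropoutSketch` is a hypothetical illustration, not DL4J's code: `p` is the retain probability (as in the Dropout description), and, like the real classes, it would only be applied at training time. Dividing by `p` keeps the expected value of each activation unchanged, which is why nothing needs to be rescaled at test time.
-
-```java
-import java.util.Random;
-
-/** Training-time Dropout and GaussianDropout as described above (illustrative, not DL4J's code). */
-class DropoutSketch {
-
-    // Dropout with retain probability p: x becomes 0 with probability 1-p, or x/p with probability p
-    static double[] dropout(double[] x, double p, Random rng) {
-        double[] out = new double[x.length];
-        for (int i = 0; i < x.length; i++) {
-            out[i] = rng.nextDouble() < p ? x[i] / p : 0.0;
-        }
-        return out;
-    }
-
-    // GaussianDropout: x becomes x * y, with y ~ N(1, stdev = sqrt((1 - rate) / rate))
-    static double[] gaussianDropout(double[] x, double rate, Random rng) {
-        double stdev = Math.sqrt((1.0 - rate) / rate);
-        double[] out = new double[x.length];
-        for (int i = 0; i < x.length; i++) {
-            out[i] = x[i] * (1.0 + stdev * rng.nextGaussian());
-        }
-        return out;
-    }
-}
-```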
-
-### <a name="config-weightnoise">Weight Noise</a>
-
-As with dropout, DropConnect and weight noise are applied only at training time.
-
-* **DropConnect** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/weightnoise/DropConnect.java)) - DropConnect is similar to dropout, but applied to the parameters of a network (instead of the input activations). [Reference](https://cs.nyu.edu/~wanli/dropc/dropc.pdf)
-* **WeightNoise** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/weightnoise/WeightNoise.java)) - Applies noise from the specified distribution to the weights at training time. Both additive and multiplicative modes are supported: for additive noise the distribution should have mean 0, and for multiplicative noise it should have mean 1.
-
-### <a name="config-constraints">Constraints</a>
-
-Constraints are deterministic limitations that are placed on a model's parameters at the end of each iteration (after the parameter update has occurred). They can be thought of as a type of regularization.
-
-* **MaxNormConstraint** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/constraint/MaxNormConstraint.java)) - Constrain the maximum L2 norm of the incoming weights for each unit to be less than or equal to the specified value. If the L2 norm exceeds the specified value, the weights will be scaled down to satisfy the constraint.
-* **MinMaxNormConstraint** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/constraint/MinMaxNormConstraint.java)) - Constrain the minimum AND maximum L2 norm of the incoming weights for each unit to be between the specified values. Weights will be scaled up/down if required.
-* **NonNegativeConstraint** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/constraint/NonNegativeConstraint.java)) - Constrain all parameters to be non-negative. Negative parameters will be replaced with 0.
-* **UnitNormConstraint** - ([Source](https://github.com/eclipse/deeplearning4j/blob/b841c0f549194dbdf88b42836df662d9b8ea8c6d/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/constraint/UnitNormConstraint.java)) - Constrain the L2 norm of the incoming weights for each unit to be 1.0.
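-
-A sketch of the MaxNorm and NonNegative constraints applied to a weight matrix after a parameter update. `ConstraintSketch` is hypothetical; it assumes a `[nIn][nOut]` layout so that column `j` holds the incoming weights of unit `j`, and it does not claim to match DL4J's handling of details such as numerical epsilons.
-
-```java
-/**
- * MaxNorm and NonNegative constraints (illustrative, not DL4J's code).
- * Assumes w[nIn][nOut], so column j holds the incoming weights of unit j.
- */
-class ConstraintSketch {
-
-    // L2 norm of column j (the incoming weights of unit j)
-    static double columnNorm(double[][] w, int j) {
-        double sq = 0.0;
-        for (double[] row : w) {
-            sq += row[j] * row[j];
-        }
-        return Math.sqrt(sq);
-    }
-
-    // MaxNormConstraint: scale down any column whose L2 norm exceeds maxNorm
-    static void maxNorm(double[][] w, double maxNorm) {
-        int nOut = w[0].length;
-        for (int j = 0; j < nOut; j++) {
-            double norm = columnNorm(w, j);
-            if (norm > maxNorm) {
-                double scale = maxNorm / norm;
-                for (double[] row : w) {
-                    row[j] *= scale;
-                }
-            }
-        }
-    }
-
-    // NonNegativeConstraint: replace negative parameters with 0
-    static void nonNegative(double[][] w) {
-        for (double[] row : w) {
-            for (int j = 0; j < row.length; j++) {
-                row[j] = Math.max(0.0, row[j]);
-            }
-        }
-    }
-}
-```
-
-For example, a column `(3, 4)` with norm 5 becomes `(1.2, 1.6)` under a max norm of 2, while columns already within the limit are left untouched.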
-
-
-## <a name="data">Data Classes</a>
-
-### <a name="data-iter">Iterators</a>
-
-DataSetIterator is an abstraction that DL4J uses to iterate over minibatches of training data. It returns DataSet objects, each of which is a minibatch holding at most one input array and one output array (INDArray).
-
-MultiDataSetIterator is similar to DataSetIterator, but returns MultiDataSet objects, which can have as many input and output arrays as required for the network.
-
-#### <a name="data-iter-builtin">Iterators - Built-In (DL4J-Provided Data)</a>
-
-These iterators download their data as required. The actual datasets they return are not customizable.
-
-* **MnistDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-datasets/src/main/java/org/deeplearning4j/datasets/iterator/impl/MnistDataSetIterator.java)) - DataSetIterator for the well-known MNIST digits dataset. By default, returns a row vector (1x784) per example, with values normalized to the range 0 to 1. Use ```.setInputType(InputType.convolutionalFlat(28, 28, 1))``` to use it with CNNs.
-* **EmnistDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-datasets/src/main/java/org/deeplearning4j/datasets/iterator/impl/EmnistDataSetIterator.java)) - Similar to the MNIST digits dataset, but with more examples, and also letters. Includes multiple different splits (letters only, digits only, letters + digits, etc). Same 1x784 format as MNIST, hence (other than different number of labels for some splits) can be used as a drop-in replacement for MnistDataSetIterator. [Reference 1](https://www.nist.gov/itl/iad/image-group/emnist-dataset), [Reference 2](https://arxiv.org/abs/1702.05373)
-* **IrisDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-datasets/src/main/java/org/deeplearning4j/datasets/iterator/impl/IrisDataSetIterator.java)) - An iterator for the well-known Iris dataset. 4 features, 3 output classes.
-* **CifarDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-datasets/src/main/java/org/deeplearning4j/datasets/iterator/impl/CifarDataSetIterator.java)) - An iterator for the CIFAR images dataset. 10 classes, 4d features/activations format for CNNs in DL4J: ```[minibatch,channels,height,width] = [minibatch,3,32,32]```. Features are *not* normalized; instead, they are in the range 0 to 255.
-* **LFWDataSetIterator** - ([Source]())
-* **TinyImageNetDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-datasets/src/main/java/org/deeplearning4j/datasets/iterator/impl/TinyImageNetDataSetIterator.java)) - A subset of the standard ImageNet dataset; 200 classes, 500 images per class
-* **UciSequenceDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-datasets/src/main/java/org/deeplearning4j/datasets/iterator/impl/UciSequenceDataSetIterator.java)) - UCI synthetic control time series dataset
-
-#### <a name="data-iter-user">Iterators - User Provided Data</a>
-
-The iterators in this subsection are used with user-provided data.
-
-* **RecordReaderDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-datavec-iterators/src/main/java/org/deeplearning4j/datasets/datavec/RecordReaderDataSetIterator.java)) - an iterator that takes a DataVec record reader (such as CsvRecordReader or ImageRecordReader) and handles conversion to DataSets, batching, masking, etc. One of the most commonly used iterators in DL4J. Handles only non-sequence input data (i.e., RecordReader, not SequenceRecordReader).
-* **RecordReaderMultiDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-datavec-iterators/src/main/java/org/deeplearning4j/datasets/datavec/RecordReaderMultiDataSetIterator.java)) - the MultiDataSet version of RecordReaderDataSetIterator, that supports multiple readers. Has a builder pattern for creating more complex data pipelines (such as different subsets of a reader's output to different input/output arrays, conversion to one-hot, etc). Handles both sequence and non-sequence data as input.
-* **SequenceRecordReaderDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-datavec-iterators/src/main/java/org/deeplearning4j/datasets/datavec/SequenceRecordReaderDataSetIterator.java)) - The sequence (SequenceRecordReader) version of RecordReaderDataSetIterator. Users may be better off using RecordReaderMultiDataSetIterator, in conjunction with
-* **DoublesDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-utility-iterators/src/main/java/org/deeplearning4j/datasets/iterator/DoublesDataSetIterator.java))
-* **FloatsDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-utility-iterators/src/main/java/org/deeplearning4j/datasets/iterator/FloatsDataSetIterator.java))
-* **INDArrayDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-utility-iterators/src/main/java/org/deeplearning4j/datasets/iterator/INDArrayDataSetIterator.java))
-
-
-#### <a name="data-iter-util">Iterators - Adapter and Utility Iterators</a>
-
-* **MultiDataSetIteratorAdapter** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/datasets/iterator/impl/MultiDataSetIteratorAdapter.java)) - Wrap a DataSetIterator to convert it to a MultiDataSetIterator
-* **SingletonMultiDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-utility-iterators/src/main/java/org/deeplearning4j/datasets/iterator/impl/SingletonMultiDataSetIterator.java)) - Wrap a MultiDataSet into a MultiDataSetIterator that returns one MultiDataSet (i.e., the wrapped MultiDataSet is *not* split up)
-* **AsyncDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-utility-iterators/src/main/java/org/deeplearning4j/datasets/iterator/AsyncDataSetIterator.java)) - Used automatically by MultiLayerNetwork and ComputationGraph where appropriate. Implements asynchronous prefetching of datasets to improve performance.
-* **AsyncMultiDataSetIterator**  - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-utility-iterators/src/main/java/org/deeplearning4j/datasets/iterator/AsyncMultiDataSetIterator.java)) - Used automatically by ComputationGraph where appropriate. Implements asynchronous prefetching of MultiDataSets to improve performance.
-* **AsyncShieldDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-utility-iterators/src/main/java/org/deeplearning4j/datasets/iterator/AsyncShieldDataSetIterator.java)) - Generally used only for debugging. Stops MultiLayerNetwork and ComputationGraph from using an AsyncDataSetIterator.
-* **AsyncShieldMultiDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-utility-iterators/src/main/java/org/deeplearning4j/datasets/iterator/AsyncShieldMultiDataSetIterator.java)) - The MultiDataSetIterator version of AsyncShieldDataSetIterator
-* **EarlyTerminationDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-utility-iterators/src/main/java/org/deeplearning4j/datasets/iterator/EarlyTerminationDataSetIterator.java)) - Wraps another DataSetIterator, ensuring that at most a specified number of minibatches (DataSet objects) are returned between resets. Can be used to 'cut short' an iterator, returning only the first N DataSets.
-* **EarlyTerminationMultiDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-utility-iterators/src/main/java/org/deeplearning4j/datasets/iterator/EarlyTerminationMultiDataSetIterator.java)) - The MultiDataSetIterator version of EarlyTerminationDataSetIterator
-* **ExistingDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-utility-iterators/src/main/java/org/deeplearning4j/datasets/iterator/ExistingDataSetIterator.java)) - Convert an ```Iterator<DataSet>``` or ```Iterable<DataSet>``` to a DataSetIterator. Does not split or combine the underlying DataSet objects.
-* **FileDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-utility-iterators/src/main/java/org/deeplearning4j/datasets/iterator/file/FileDataSetIterator.java)) - An iterator that iterates over DataSet files that have been previously saved with ```DataSet.save(File)```. Supports randomization, filtering, different output batch size vs. saved DataSet batch size, etc.
-* **FileMultiDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-utility-iterators/src/main/java/org/deeplearning4j/datasets/iterator/file/FileMultiDataSetIterator.java)) - A MultiDataSet version of FileDataSetIterator
-* **IteratorDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-utility-iterators/src/main/java/org/deeplearning4j/datasets/iterator/IteratorDataSetIterator.java)) - Convert an ```Iterator<DataSet>``` to a DataSetIterator. Unlike ExistingDataSetIterator, the underlying DataSet objects may be split/combined - i.e., the minibatch size may differ for the output, vs. the input iterator.
-* **IteratorMultiDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-utility-iterators/src/main/java/org/deeplearning4j/datasets/iterator/IteratorMultiDataSetIterator.java)) - The ```Iterator<MultiDataSet>``` version of IteratorDataSetIterator
-* **MultiDataSetWrapperIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-utility-iterators/src/main/java/org/deeplearning4j/datasets/iterator/MultiDataSetWrapperIterator.java)) - Convert a MultiDataSetIterator to a DataSetIterator. Note that this is only possible if each MultiDataSet has exactly one features array and one labels array.
-* **MultipleEpochsIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-utility-iterators/src/main/java/org/deeplearning4j/datasets/iterator/MultipleEpochsIterator.java)) - Treat multiple passes (epochs) of the underlying iterator as a single epoch, when training.
-* **WorkspacesShieldDataSetIterator** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-data/deeplearning4j-utility-iterators/src/main/java/org/deeplearning4j/datasets/iterator/WorkspacesShieldDataSetIterator.java)) - Generally used only for debugging, and rarely needed by users. Detaches/migrates DataSets coming out of the underlying DataSetIterator.
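-
-The difference between ExistingDataSetIterator (passes each DataSet through as-is) and IteratorDataSetIterator (may split or combine DataSets to reach a different minibatch size) can be pictured with the plain-Java sketch below. It is a hypothetical illustration, not DL4J code: `RebatchingIterator` is an invented name, and each example is modelled as a `double[]` row instead of an INDArray.
-
-```java
-import java.util.ArrayList;
-import java.util.Iterator;
-import java.util.List;
-import java.util.NoSuchElementException;
-
-// Hypothetical plain-Java illustration, not a DL4J class: re-batches a stream of
-// minibatches (each example modelled as a double[] row) into minibatches of a fixed size.
-class RebatchingIterator implements Iterator<List<double[]>> {
-    private final Iterator<List<double[]>> source;
-    private final int batchSize;
-    private final List<double[]> buffer = new ArrayList<>();
-
-    RebatchingIterator(Iterator<List<double[]>> source, int batchSize) {
-        if (batchSize <= 0) {
-            throw new IllegalArgumentException("batchSize must be positive");
-        }
-        this.source = source;
-        this.batchSize = batchSize;
-    }
-
-    // Pull input minibatches until a full output batch is buffered or the source is exhausted
-    private void fill() {
-        while (buffer.size() < batchSize && source.hasNext()) {
-            buffer.addAll(source.next());
-        }
-    }
-
-    @Override
-    public boolean hasNext() {
-        fill();
-        return !buffer.isEmpty();
-    }
-
-    @Override
-    public List<double[]> next() {
-        if (!hasNext()) {
-            throw new NoSuchElementException();
-        }
-        int n = Math.min(batchSize, buffer.size());
-        // input minibatches may be split or combined to form this output batch
-        List<double[]> batch = new ArrayList<>(buffer.subList(0, n));
-        buffer.subList(0, n).clear();
-        return batch;
-    }
-}
-```
-
-Wrapping input minibatches of sizes 3 and 2 with `new RebatchingIterator(source, 2)` yields output minibatches of sizes 2, 2 and 1; the final, smaller batch holds the leftover example.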
-
-
-### <a name="data-norm">Data Normalization</a>
-
-ND4J provides a number of classes for performing data normalization. These are implemented as DataSetPreProcessors.
-The basic pattern for normalization:
-
-1. Create your (unnormalized) DataSetIterator or MultiDataSetIterator: ```DataSetIterator myTrainData = ...```
-2. Create the normalizer you want to use: ```NormalizerMinMaxScaler normalizer = new NormalizerMinMaxScaler();``` (for a MultiDataSetIterator, use one of the MultiDataSet normalizers listed below)
-3. Fit the normalizer: ```normalizer.fit(myTrainData);```
-4. Set the normalizer/preprocessor on the iterator: ```myTrainData.setPreProcessor(normalizer);```
-
-End result: the data that comes from your DataSetIterator will now be normalized.
-
-In general, you should fit *only* on the training data, and do ```trainData.setPreProcessor(normalizer)``` and ```testData.setPreProcessor(normalizer)``` with the same/single normalizer that has been fit on the training data only.
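-
-To see why the statistics must come from the training data only, here is a minimal plain-Java sketch of per-feature standardization. `SimpleStandardizer` is a hypothetical class, not ND4J's NormalizerStandardize: `fit()` computes the mean and (population) standard deviation from the training rows, and `transform()` reuses those same numbers for any data, including the test set.
-
-```java
-// Hypothetical plain-Java sketch of per-feature standardization (not the ND4J class).
-// fit() sees training rows only; transform() applies the stored statistics to any rows.
-class SimpleStandardizer {
-    private double[] mean;
-    private double[] std;
-
-    void fit(double[][] train) {
-        int nFeatures = train[0].length;
-        mean = new double[nFeatures];
-        std = new double[nFeatures];
-        for (double[] row : train)
-            for (int j = 0; j < nFeatures; j++) mean[j] += row[j] / train.length;
-        for (double[] row : train)
-            for (int j = 0; j < nFeatures; j++) std[j] += (row[j] - mean[j]) * (row[j] - mean[j]) / train.length;
-        for (int j = 0; j < nFeatures; j++) std[j] = Math.sqrt(std[j]);
-    }
-
-    double[][] transform(double[][] data) {
-        double[][] out = new double[data.length][];
-        for (int i = 0; i < data.length; i++) {
-            out[i] = new double[data[i].length];
-            for (int j = 0; j < data[i].length; j++) {
-                // guard against constant features (zero standard deviation in training)
-                double s = std[j] == 0 ? 1.0 : std[j];
-                out[i][j] = (data[i][j] - mean[j]) / s;
-            }
-        }
-        return out;
-    }
-}
-```
-
-After `fit(new double[][]{{1, 10}, {3, 10}})`, the test row `{5, 12}` becomes `{3.0, 2.0}`: the second feature was constant in training, so this sketch divides by 1 instead of 0.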
-
-Note that for NormalizerStandardize and NormalizerMinMaxScaler, statistics such as mean, standard deviation, min and max are shared across time steps (for time series) and across x/y locations (for image data). For image data they are still computed separately for each channel (depth).
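-
-A minimal plain-Java sketch of the per-channel pooling described above, assuming images stored as `images[example][channel][y][x]` (a layout chosen for this illustration; `ChannelStats` is a hypothetical name). The mean for each channel is taken over all examples and all x/y locations, so there is one value per channel rather than one per pixel.
-
-```java
-// Hypothetical sketch: per-channel mean for images stored as images[example][channel][y][x].
-// Values are pooled over examples and all x/y locations, but kept separate per channel.
-class ChannelStats {
-    static double[] channelMeans(double[][][][] images) {
-        int channels = images[0].length;
-        double[] sum = new double[channels];
-        long[] count = new long[channels];
-        for (double[][][] image : images) {
-            for (int c = 0; c < channels; c++) {
-                for (double[] row : image[c]) {
-                    for (double v : row) {
-                        sum[c] += v;
-                        count[c]++;
-                    }
-                }
-            }
-        }
-        double[] mean = new double[channels];
-        for (int c = 0; c < channels; c++) {
-            mean[c] = sum[c] / count[c];
-        }
-        return mean;
-    }
-}
-```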
-
-Data normalization example: [link](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/dataexamples/PreprocessNormalizerExample.java)
-
-**Available normalizers: DataSet / DataSetIterator**
-
-* **ImagePreProcessingScaler** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/dataset/api/preprocessor/ImagePreProcessingScaler.java)) - Applies min-max scaling to image activations. By default it maps 0-255 input to 0-1 output (this is configurable). Note that unlike the other normalizers here, this one does not rely on statistics (mean/min/max etc.) collected from the data, so the ```normalizer.fit(trainData)``` step is unnecessary (it is a no-op).
-* **NormalizerStandardize** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/dataset/api/preprocessor/NormalizerStandardize.java)) - normalizes each feature value independently (and optionally label values) to have 0 mean and a standard deviation of 1
-* **NormalizerMinMaxScaler** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/dataset/api/preprocessor/NormalizerMinMaxScaler.java)) - normalizes each feature value independently (and optionally label values) to lie between a minimum and maximum value (by default between 0 and 1)
-* **VGG16ImagePreProcessor** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/dataset/api/preprocessor/VGG16ImagePreProcessor.java)) - A preprocessor specifically for VGG16. It subtracts the mean RGB value, computed on the VGG16 training set, from each pixel, as described in the [VGG paper](https://arxiv.org/pdf/1409.1556.pdf).
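-
-Min-max scaling is usually computed as x' = lo + (x - min) * (hi - lo) / (max - min), with min and max collected per feature during fitting. The hypothetical `SimpleMinMaxScaler` below is a plain-Java sketch of that idea, not the ND4J implementation. Fitting on a feature whose training values span 0 to 255 reproduces the 0-255 to 0-1 mapping that ImagePreProcessingScaler applies by default without any fitting.
-
-```java
-import java.util.Arrays;
-
-// Hypothetical plain-Java sketch of per-feature min-max scaling to [lo, hi] (not the ND4J class).
-class SimpleMinMaxScaler {
-    private final double lo, hi;
-    private double[] min, max;
-
-    SimpleMinMaxScaler(double lo, double hi) {
-        this.lo = lo;
-        this.hi = hi;
-    }
-
-    // Collect per-feature min and max from the training rows only
-    void fit(double[][] train) {
-        int n = train[0].length;
-        min = new double[n];
-        max = new double[n];
-        Arrays.fill(min, Double.POSITIVE_INFINITY);
-        Arrays.fill(max, Double.NEGATIVE_INFINITY);
-        for (double[] row : train) {
-            for (int j = 0; j < n; j++) {
-                min[j] = Math.min(min[j], row[j]);
-                max[j] = Math.max(max[j], row[j]);
-            }
-        }
-    }
-
-    // x' = lo + (x - min) * (hi - lo) / (max - min); constant features map to lo
-    double[] transform(double[] row) {
-        double[] out = new double[row.length];
-        for (int j = 0; j < row.length; j++) {
-            double range = max[j] - min[j];
-            out[j] = range == 0 ? lo : lo + (row[j] - min[j]) * (hi - lo) / range;
-        }
-        return out;
-    }
-}
-```
-
-Values outside the training range (for example on the test set) fall outside [lo, hi]; this sketch does not clip them.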
-
-
-**Available normalizers: MultiDataSet / MultiDataSetIterator**
-
-* **ImageMultiPreProcessingScaler** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/dataset/api/preprocessor/ImageMultiPreProcessingScaler.java)) - A MultiDataSet/MultiDataSetIterator version of ImagePreProcessingScaler
-* **MultiNormalizerStandardize** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/dataset/api/preprocessor/MultiNormalizerStandardize.java)) - MultiDataSet/MultiDataSetIterator version of NormalizerStandardize
-* **MultiNormalizerMinMaxScaler** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/dataset/api/preprocessor/MultiNormalizerMinMaxScaler.java)) - MultiDataSet/MultiDataSetIterator version of NormalizerMinMaxScaler
-* **MultiNormalizerHybrid** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/dataset/api/preprocessor/MultiNormalizerHybrid.java)) - A MultiDataSet normalizer that can combine different normalization types (standardize, min/max etc) for different input/feature and output/label arrays.
-
-
-### <a name="transfer">Transfer Learning</a>
-
-Deeplearning4j has classes/utilities for performing transfer learning - i.e., taking an existing network, and modifying some of the layers (optionally freezing others so their parameters don't change). For example, an image classifier could be trained on ImageNet, then applied to a new/different dataset. Both MultiLayerNetwork and ComputationGraph can be used with transfer learning - frequently starting from a pre-trained model from the model zoo (see next section), though any MultiLayerNetwork/ComputationGraph can be used.
-
-Link: [Transfer learning examples](https://github.com/eclipse/deeplearning4j-examples/tree/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/transferlearning/vgg16)
-
-The main class for transfer learning is [TransferLearning](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/transferlearning/TransferLearning.java). This class has a builder pattern that can be used to add/remove layers, freeze layers, etc.
-[FineTuneConfiguration](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/transferlearning/FineTuneConfiguration.java) can be used here to specify the learning rate and other settings for the non-frozen layers.
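-
-To make "freezing" concrete, the plain-Java sketch below applies one gradient-descent step that skips frozen layers, so only the non-frozen layers (the ones a FineTuneConfiguration would configure) change. `FreezeSketch` and its `Layer` type are hypothetical; in DL4J you would use the TransferLearning builder rather than writing this yourself.
-
-```java
-import java.util.List;
-
-// Hypothetical sketch of what freezing means during fine-tuning: a plain SGD step
-// that leaves frozen layers untouched. These are not DL4J classes.
-class FreezeSketch {
-    static final class Layer {
-        final double[] params;
-        final boolean frozen;
-
-        Layer(double[] params, boolean frozen) {
-            this.params = params;
-            this.frozen = frozen;
-        }
-    }
-
-    // grads[i] holds the gradient for layers.get(i).params
-    static void sgdStep(List<Layer> layers, double[][] grads, double learningRate) {
-        for (int i = 0; i < layers.size(); i++) {
-            Layer layer = layers.get(i);
-            if (layer.frozen) {
-                continue; // frozen layers keep their pretrained parameters
-            }
-            for (int j = 0; j < layer.params.length; j++) {
-                layer.params[j] -= learningRate * grads[i][j];
-            }
-        }
-    }
-}
-```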
-
-
-### <a name="zoo">Trained Model Library - Model Zoo</a>
-
-Deeplearning4j provides a 'model zoo' - a set of pretrained models that can be downloaded and used either as-is (for image classification, for example) or often for transfer learning.
-
-Link: [Deeplearning4j Model Zoo](https://deeplearning4j.org/model-zoo)
-
-Models available in DL4J's model zoo:
-
-* **AlexNet** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/AlexNet.java))
-* **Darknet19** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/Darknet19.java))
-* **FaceNetNN4Small2** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/FaceNetNN4Small2.java))
-* **InceptionResNetV1** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/InceptionResNetV1.java))
-* **LeNet** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/LeNet.java))
-* **ResNet50** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/ResNet50.java))
-* **SimpleCNN** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/SimpleCNN.java))
-* **TextGenerationLSTM** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/TextGenerationLSTM.java))
-* **TinyYOLO** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/TinyYOLO.java))
-* **VGG16** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/VGG16.java))
-* **VGG19** - ([Source](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/VGG19.java))
-
-
-**Note**: Trained Keras models (not provided by DL4J) may also be imported, using Deeplearning4j's Keras model import functionality.
-
-## Cheat sheet code snippets
-
-The Eclipse Deeplearning4j libraries come with a lot of functionality, and we've put together this cheat sheet to help users assemble neural networks and use tensors faster.
-
-### Neural networks
-
-Code for configuring common parameters and layers for both `MultiLayerNetwork` and `ComputationGraph`. See [MultiLayerNetwork](/api/{{page.version}}/org/deeplearning4j/nn/multilayer/MultiLayerNetwork.html) and [ComputationGraph](/api/{{page.version}}/org/deeplearning4j/nn/graph/ComputationGraph.html) for full API.
-
-**Sequential networks**
-
-Sequential networks, where each layer feeds directly into the next, can usually be built with the `MultiLayerNetwork` class.
-
-```java
-MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
-    .seed(1234)
-    // parameters below are copied to every layer in the network
-    // for settings like dropOut() or activation(), consider setting them per layer instead
-    // only specify the parameters you need
-    .updater(new AdaGrad())
-    .activation(Activation.RELU)
-    .dropOut(0.8) // 0.8 = probability of retaining an activation
-    .l1(0.001)
-    .l2(1e-4)
-    .weightInit(WeightInit.XAVIER)
-    .cudnnAlgoMode(ConvolutionLayer.AlgoMode.PREFER_FASTEST)
-    .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
-    .gradientNormalizationThreshold(1e-3) // only used by the Clip* gradient normalization modes
-    .list()
-    // layers in the network, added sequentially
-    // parameters set per-layer override the parameters above
-    // these layers illustrate the syntax; in a real network each layer's nIn must match the previous layer's nOut
-    .layer(new DenseLayer.Builder().nIn(numInputs).nOut(numHiddenNodes)
-            .weightInit(WeightInit.XAVIER)
-            .build())
-    .layer(new ActivationLayer(Activation.RELU))
-    .layer(new ConvolutionLayer.Builder(1,1)
-            .nIn(1024)
-            .nOut(2048)
-            .stride(1,1)
-            .convolutionMode(ConvolutionMode.Same)
-            .weightInit(WeightInit.XAVIER)
-            .activation(Activation.IDENTITY)
-            .build())
-    .layer(new GravesLSTM.Builder()
-            .activation(Activation.TANH)
-            .nIn(inputNum)
-            .nOut(100)
-            .build())
-    .layer(new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD)
-            .weightInit(WeightInit.XAVIER)
-            .activation(Activation.SOFTMAX)
-            .nIn(numHiddenNodes).nOut(numOutputs).build())
-    .build();
-
-MultiLayerNetwork neuralNetwork = new MultiLayerNetwork(conf);
-neuralNetwork.init();
-```
-
-**Complex networks**
-
-Networks that have complex graphs and "branching" such as *Inception* need to use `ComputationGraph`.
-
-```java
-// seed, updater, cacheMode, workspaceMode, cudnnAlgoMode, inputShape and numClasses are assumed to be defined elsewhere
-ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
-    .seed(seed)
-    // parameters below are copied to every layer in the network
-    // for settings like dropOut() or activation(), consider setting them per layer instead
-    // only specify the parameters you need
-    .activation(Activation.IDENTITY)
-    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
-    .updater(updater)
-    .weightInit(WeightInit.RELU)
-    .l2(5e-5)
-    .miniBatch(true)
-    .cacheMode(cacheMode)
-    .trainingWorkspaceMode(workspaceMode)
-    .inferenceWorkspaceMode(workspaceMode)
-    .cudnnAlgoMode(cudnnAlgoMode)
-    .convolutionMode(ConvolutionMode.Same)
-    .graphBuilder()
-    // layers are connected by name: the last argument(s) of addLayer() give each layer's input(s)
-    // parameters set per-layer override the parameters above
-    .addInputs("input1")
-    .addLayer("stem-cnn1", new ConvolutionLayer.Builder(new int[] {7, 7}, new int[] {2, 2}, new int[] {3, 3})
-        .nIn(inputShape[0])
-        .nOut(64)
-        .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
-        .build(), "input1")
-    .addLayer("stem-batch1", new BatchNormalization.Builder(false)
-        .nIn(64)
-        .nOut(64)
-        .build(), "stem-cnn1")
-    .addLayer("stem-activation1", new ActivationLayer.Builder()
-        .activation(Activation.RELU)
-        .build(), "stem-batch1")
-    .addLayer("lossLayer", new CenterLossOutputLayer.Builder()
-        .lossFunction(LossFunctions.LossFunction.SQUARED_LOSS)
-        .activation(Activation.SOFTMAX).nOut(numClasses).lambda(1e-4).alpha(0.9)
-        .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer).build(),
-        "stem-activation1")
-    .setOutputs("lossLayer")
-    .setInputTypes(InputType.convolutional(224, 224, 3))
-    .build();
-
-ComputationGraph neuralNetwork = new ComputationGraph(conf);
-neuralNetwork.init();
-```
-
-
-### Training
-
-The code snippet below creates a basic pipeline that loads images from disk, applies random transformations to the training images, and fits them to a neural network. It uses early stopping to end training once an epoch or time limit is reached, keeping the best model found. You can adapt this pipeline for many different use cases.
-
-```java
-// rng (a java.util.Random), maxPathsPerLabel, splitTrainTest (e.g. 0.8), height, width,
-// channels, batchSize and neuralNetwork (a MultiLayerNetwork) are assumed to be defined elsewhere
-ParentPathLabelGenerator labelMaker = new ParentPathLabelGenerator();
-File mainPath = new File(System.getProperty("user.dir"), "dl4j-examples/src/main/resources/animals/");
-FileSplit fileSplit = new FileSplit(mainPath, NativeImageLoader.ALLOWED_FORMATS, rng);
-int numExamples = Math.toIntExact(fileSplit.length());
-int numLabels = fileSplit.getRootDir().listFiles(File::isDirectory).length; //This only works if your root is clean: only label subdirs.
-BalancedPathFilter pathFilter = new BalancedPathFilter(rng, labelMaker, numExamples, numLabels, maxPathsPerLabel);
-
-InputSplit[] inputSplit = fileSplit.sample(pathFilter, splitTrainTest, 1 - splitTrainTest);
-InputSplit trainData = inputSplit[0];
-InputSplit testData = inputSplit[1];
-
-boolean shuffle = false;
-ImageTransform flipTransform1 = new FlipImageTransform(rng);
-ImageTransform flipTransform2 = new FlipImageTransform(new Random(123));
-ImageTransform warpTransform = new WarpImageTransform(rng, 42);
-List<Pair<ImageTransform,Double>> pipeline = Arrays.asList(
-    new Pair<>(flipTransform1, 0.9),
-    new Pair<>(flipTransform2, 0.8),
-    new Pair<>(warpTransform, 0.5));
-
-ImageTransform transform = new PipelineImageTransform(pipeline, shuffle);
-DataNormalization scaler = new ImagePreProcessingScaler(0, 1);
-
-// training dataset: random transformations are applied to the training images only
-ImageRecordReader recordReaderTrain = new ImageRecordReader(height, width, channels, labelMaker);
-recordReaderTrain.initialize(trainData, transform);
-DataSetIterator trainingIterator = new RecordReaderDataSetIterator(recordReaderTrain, batchSize, 1, numLabels);
-trainingIterator.setPreProcessor(scaler);
-
-// testing dataset: no transformations, same scaling
-ImageRecordReader recordReaderTest = new ImageRecordReader(height, width, channels, labelMaker);
-recordReaderTest.initialize(testData, null);
-DataSetIterator testingIterator = new RecordReaderDataSetIterator(recordReaderTest, batchSize, 1, numLabels);
-testingIterator.setPreProcessor(scaler);
-
-// early stopping configuration, model saver, and trainer
-EarlyStoppingModelSaver<MultiLayerNetwork> saver = new LocalFileModelSaver(System.getProperty("user.dir"));
-EarlyStoppingConfiguration<MultiLayerNetwork> esConf = new EarlyStoppingConfiguration.Builder<MultiLayerNetwork>()
-    .epochTerminationConditions(new MaxEpochsTerminationCondition(50)) //Max of 50 epochs
-    .evaluateEveryNEpochs(1)
-    .iterationTerminationConditions(new MaxTimeIterationTerminationCondition(20, TimeUnit.MINUTES)) //Max of 20 minutes
-    .scoreCalculator(new DataSetLossCalculator(testingIterator, true))     //Calculate test set score
-    .modelSaver(saver)
-    .build();
-
-EarlyStoppingTrainer trainer = new EarlyStoppingTrainer(esConf, neuralNetwork, trainingIterator);
-
-// begin training; the result holds the best model found
-EarlyStoppingResult<MultiLayerNetwork> result = trainer.fit();
-MultiLayerNetwork bestModel = result.getBestModel();
-```
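-
-The early-stopping idea behind EarlyStoppingTrainer can be sketched in plain Java: score the model after each epoch, remember the best epoch, and stop when a limit is reached. `EarlyStoppingSketch` is hypothetical, and its `patience` rule (stop after that many epochs without improvement) is an extra assumption of this sketch; the configuration above only uses an epoch limit and a time limit.
-
-```java
-import java.util.function.IntToDoubleFunction;
-
-// Hypothetical sketch of early-stopping logic (not the DL4J implementation).
-class EarlyStoppingSketch {
-    // trainAndScore runs one training epoch and returns the held-out loss (lower is better).
-    // Returns the 0-based epoch with the lowest score, or -1 if no epoch was run.
-    static int bestEpoch(IntToDoubleFunction trainAndScore, int maxEpochs, int patience) {
-        double bestScore = Double.POSITIVE_INFINITY;
-        int best = -1;
-        for (int epoch = 0; epoch < maxEpochs; epoch++) {
-            double score = trainAndScore.applyAsDouble(epoch);
-            if (score < bestScore) {
-                bestScore = score;
-                best = epoch; // a real trainer would save the model here
-            } else if (epoch - best >= patience) {
-                break; // no improvement for `patience` epochs (assumed extra rule)
-            }
-        }
-        return best;
-    }
-}
-```
-
-With per-epoch scores 5, 3, 4, 4, 4, 1 and a patience of 2, the sketch stops after the fourth epoch and reports epoch 1 (0-based) as best; with a larger patience it reaches the last epoch and reports epoch 5.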
-
-### Complex Transformation
-
-DataVec comes with a portable `TransformProcess` class that allows for more complex data wrangling and data conversion. It works well with both 2D and sequence datasets.
-
-```java
-Schema schema = new Schema.Builder()
-    .addColumnsDouble("Sepal length", "Sepal width", "Petal length", "Petal width")
-    .addColumnCategorical("Species", "Iris-setosa", "Iris-versicolor", "Iris-virginica")
-    .build();
-
-TransformProcess tp = new TransformProcess.Builder(schema)
-    .categoricalToInteger("Species")
-    .build();
-
-// execute the transformation on Spark
-// parsedInputData is assumed to be a JavaRDD<List<Writable>> of already-parsed records
-JavaRDD<List<Writable>> processedData = SparkTransformExecutor.execute(parsedInputData, tp);
-```
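-
-Conceptually, `categoricalToInteger("Species")` replaces each category with an integer index. The plain-Java sketch below assumes the index is the category's position in the list of states declared in the schema (so "Iris-setosa" maps to 0, "Iris-versicolor" to 1 and "Iris-virginica" to 2); `CategoricalToIntegerSketch` is a hypothetical name, not DataVec code.
-
-```java
-import java.util.List;
-
-// Hypothetical sketch: map categorical values to their index in the declared state list.
-class CategoricalToIntegerSketch {
-    static int toInteger(String value, List<String> states) {
-        int index = states.indexOf(value);
-        if (index < 0) {
-            throw new IllegalArgumentException("Unknown category: " + value);
-        }
-        return index;
-    }
-
-    // Convert a whole column of categorical values
-    static int[] columnToIntegers(List<String> column, List<String> states) {
-        int[] out = new int[column.size()];
-        for (int i = 0; i < out.length; i++) {
-            out[i] = toInteger(column.get(i), states);
-        }
-        return out;
-    }
-}
-```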
-
-We recommend having a look at the [DataVec examples](https://github.com/eclipse/deeplearning4j-examples/tree/master/datavec-examples/src/main/java/org/datavec/transform) before creating more complex transformations.
-
-
-### Evaluation
-
-Both `MultiLayerNetwork` and `ComputationGraph` come with built-in `evaluate()` methods that take a dataset iterator and return evaluation results.
-
-```java
-// returns evaluation class with accuracy, precision, recall, and other class statistics
-Evaluation eval = neuralNetwork.evaluate(testIterator);
-System.out.println(eval.accuracy());
-System.out.println(eval.precision());
-System.out.println(eval.recall());
-
-// ROC for Area Under Curve on multi-class datasets (not binary classes)
-ROCMultiClass roc = new ROCMultiClass();
-neuralNetwork.doEvaluation(testIterator, roc); // fills in roc
-System.out.println(roc.calculateAverageAuc());
-System.out.println(roc.calculateAverageAucPR());
-```
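-
-To see what the headline numbers mean, the plain-Java sketch below computes accuracy and macro-averaged precision and recall (the per-class values averaged over classes) from a confusion matrix `cm[actual][predicted]`. `ConfusionMetrics` is hypothetical; DL4J's Evaluation class has its own averaging options and edge-case handling, so treat this as an illustration of the definitions only. Here an undefined per-class value (a class that is never predicted or never present) counts as 0.
-
-```java
-// Hypothetical sketch: metrics from a confusion matrix cm[actual][predicted] of counts.
-class ConfusionMetrics {
-    static double accuracy(int[][] cm) {
-        long correct = 0, total = 0;
-        for (int a = 0; a < cm.length; a++) {
-            for (int p = 0; p < cm.length; p++) {
-                total += cm[a][p];
-                if (a == p) correct += cm[a][p];
-            }
-        }
-        return (double) correct / total;
-    }
-
-    // precision for class c = correct predictions of c / all predictions of c
-    static double macroPrecision(int[][] cm) {
-        double sum = 0;
-        for (int c = 0; c < cm.length; c++) {
-            long predictedC = 0;
-            for (int a = 0; a < cm.length; a++) predictedC += cm[a][c];
-            sum += predictedC == 0 ? 0.0 : (double) cm[c][c] / predictedC;
-        }
-        return sum / cm.length;
-    }
-
-    // recall for class c = correct predictions of c / all actual examples of c
-    static double macroRecall(int[][] cm) {
-        double sum = 0;
-        for (int c = 0; c < cm.length; c++) {
-            long actualC = 0;
-            for (int p = 0; p < cm.length; p++) actualC += cm[c][p];
-            sum += actualC == 0 ? 0.0 : (double) cm[c][c] / actualC;
-        }
-        return sum / cm.length;
-    }
-}
-```
-
-For `cm = {{8, 2}, {1, 9}}` the accuracy is 17/20 = 0.85, the macro recall is (0.8 + 0.9) / 2 = 0.85, and the macro precision is (8/9 + 9/11) / 2, about 0.854.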
-
-For more advanced evaluation, the code snippet below can be adapted into training pipelines. Use it when the results of the built-in `evaluate()` method are hard to interpret, or when you need to inspect misclassified examples and their raw data.
-
-```java
-//Assumes: testIterator is a RecordReaderDataSetIterator over recordReader, with setCollectMetaData(true),
-//and testData is a DataSet obtained from it (e.g. testData = testIterator.next())
-List<RecordMetaData> testMetaData = testData.getExampleMetaData(RecordMetaData.class);
-
-//Evaluate the model on the test set
-Evaluation eval = new Evaluation(numClasses);
-INDArray output = neuralNetwork.output(testData.getFeatures());
-eval.eval(testData.getLabels(), output, testMetaData); //Note we are passing in the test set metadata here
-
-//Get a list of prediction errors, from the Evaluation object
-//Prediction errors like this are only available after calling iterator.setCollectMetaData(true)
-List<Prediction> predictionErrors = eval.getPredictionErrors();
-System.out.println("\n\n+++++ Prediction Errors +++++");
-for(Prediction p : predictionErrors){
-    System.out.println("Predicted class: " + p.getPredictedClass() + ", Actual class: " + p.getActualClass()
-        + "\t" + p.getRecordMetaData(RecordMetaData.class).getLocation());
-}
-
-//We can also load the misclassified examples and their raw data, using the metadata:
-List<RecordMetaData> predictionErrorMetaData = new ArrayList<>();
-for(Prediction p : predictionErrors){
-    predictionErrorMetaData.add(p.getRecordMetaData(RecordMetaData.class));
-}
-DataSet predictionErrorExamples = testIterator.loadFromMetaData(predictionErrorMetaData);
-//if you normalize your data, apply the same normalizer here, e.g. normalizer.transform(predictionErrorExamples);
-List<Record> predictionErrorRawData = recordReader.loadFromMetaData(predictionErrorMetaData);
-for(int i=0; i<predictionErrors.size(); i++ ){
-    Prediction p = predictionErrors.get(i);
-    RecordMetaData meta = p.getRecordMetaData(RecordMetaData.class);
-    INDArray features = predictionErrorExamples.getFeatures().getRow(i);
-    INDArray labels = predictionErrorExamples.getLabels().getRow(i);
-    List<Writable> rawData = predictionErrorRawData.get(i).getRecord();
-
-    INDArray networkPrediction = neuralNetwork.output(features);
-
-    System.out.println(meta.getLocation() + ": "
-        + "\tRaw Data: " + rawData
-        + "\tNormalized: " + features
-        + "\tLabels: " + labels
-        + "\tPredictions: " + networkPrediction);
-}
-
-//Some other useful evaluation methods:
-List<Prediction> list1 = eval.getPredictions(1,2);                  //Predictions: actual class 1, predicted class 2
-List<Prediction> list2 = eval.getPredictionByPredictedClass(2);     //All predictions for predicted class 2
-List<Prediction> list3 = eval.getPredictionsByActualClass(2);       //All predictions for actual class 2
-```
--- a/docs/deeplearning4j/templates/concepts.md
+++ b/docs/deeplearning4j/templates/concepts.md
@ -1,108 +0,0 @@
---
-title: Core Concepts in Deeplearning4j
-short_title: Core Concepts
-description: Introduction to core Deeplearning4j concepts.
-category: Get Started
-weight: 1
---
-
-## Overview
-
-Every machine-learning workflow consists of at least two parts. The first is loading your data and preparing it to be used for learning. We refer to this part as the ETL (extract, transform, load) process. [DataVec](./datavec-overview) is the library we built to make building data pipelines easier. The second part is the actual learning system itself. That is the algorithmic core of DL4J. 
-
-All deep learning is based on vectors and tensors, and DL4J relies on a tensor library called [ND4J](./nd4j-overview). It provides us with the ability to work with *n-dimensional arrays* (also called tensors). Thanks to its different backends, it even enables us to use both CPUs and GPUs.  
-
-## Preparing Data for Learning and Prediction
-
-Unlike other machine learning or deep learning frameworks, DL4J treats the tasks of loading data and training algorithms as separate processes. You don't just point the model at data saved somewhere on disk; you load the data using DataVec. This gives you a lot more flexibility, and retains the convenience of simple data loading.
-
-Before the algorithm can start learning, you have to prepare the data, and the same preparation is needed when you make predictions with an already trained model. Preparing data means loading it and putting it in the right shape and value range (e.g. normalization to zero mean and unit variance). Building these processes from scratch is error-prone, so use DataVec wherever possible.
-
-Deeplearning4j works with a lot of different data types, such as images, CSV and plain text, and, with its [Apache Camel](https://camel.apache.org/) [integration](https://github.com/eclipse/deeplearning4j/tree/master/datavec/datavec-camel), many other data sources.
-
-To use DataVec, you will need one of the implementations of the [RecordReader](/api/{{page.version}}/org/datavec/api/records/reader/RecordReader.html) interface along with the [RecordReaderDataSetIterator](/api/{{page.version}}/org/deeplearning4j/datasets/datavec/RecordReaderDataSetIterator.html).
-
-Once you have a [DataSetIterator](/api/{{page.version}}/org/nd4j/linalg/dataset/api/iterator/DataSetIterator.html), which is just a pattern that describes sequential access to data, you can use it to retrieve the data in a format suited for training a neural net model.
-
-### Normalizing Data
-
-Neural networks work best when the data they're fed is normalized, for example constrained to a range between -1 and 1. There are several reasons for that. One is that nets are trained using [gradient descent](https://en.wikipedia.org/wiki/Gradient_descent), and their activation functions usually have an active range somewhere between -1 and 1. Even when using an activation function that doesn't saturate quickly, it is still good practice to keep your values in this range to improve performance.
-
-Normalizing data is pretty easy in DL4J. Decide how you want to normalize your data, and set the corresponding [DataNormalization](./datavec-normalization) up as a preprocessor for your DataSetIterator.
-
-The `ImagePreProcessingScaler` is a good choice for image data. The `NormalizerMinMaxScaler` is a good choice if your input data has a uniform range along all dimensions, and `NormalizerStandardize` is what you would usually use in other cases.
-
-If you need other types of normalization, you are also free to implement the `DataNormalization` interface.
-
-If you use `NormalizerStandardize`, note that it depends on statistics (mean and standard deviation) that it extracts from the training data. You will have to save those statistics along with the model, so that the same normalization can be applied when you restore your model.
-
-## DataSets, INDArrays and Mini-Batches
-
-As the name suggests, a DataSetIterator returns [DataSet](/api/{{page.version}}/org/nd4j/linalg/dataset/DataSet.html) objects. DataSet objects are containers for the features and labels of your data. But they aren't constrained to holding just a single example at once. A DataSet can contain as many examples as needed.
-
-It does that by keeping the values in several instances of [INDArray](/api/{{page.version}}/org/nd4j/linalg/api/ndarray/INDArray.html): one for the features of your examples, one for the labels and two additional ones for masking, if you are using timeseries data (see [Using RNNs / Masking](./deeplearning4j-nn-recurrent) for more information).
-
-An INDArray is ND4J's n-dimensional array (tensor) type. For simple, non-sequential features, it is a matrix of size `Number of Examples x Number of Features`. Even with only a single example, it will have this shape.
-
-Why doesn't it contain all of the data examples at once? 
-
-This is another important concept for deep learning: mini-batching. In order to produce accurate results, a lot of real-world training data is often needed. Often that is more data than can fit in available memory, so storing it in a single `DataSet` sometimes isn't possible. But even if there is enough data storage, there is another important reason not to use all of your data at once. With mini-batches you can get more updates to your model in a single epoch.
-
-So why bother having more than one example in a DataSet? Since the model is trained using [gradient descent](https://en.wikipedia.org/wiki/Gradient_descent), it requires a good gradient to learn how to minimize error. Using only one example at a time will create a gradient that only takes errors produced with the current example into consideration. This would make the learning behavior erratic, slow down the learning, and may not even lead to a usable result.
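-
-The averaging that makes mini-batch gradients less erratic can be shown with a plain-Java sketch: one gradient-descent step for a one-parameter model y = w * x with squared-error loss, where the step uses the mean of the per-example gradients in the mini-batch. `MiniBatchSgd` is a hypothetical name; DL4J computes these gradients for you.
-
-```java
-// Hypothetical sketch: mini-batch gradient descent for y = w * x with loss L = (w*x - y)^2.
-class MiniBatchSgd {
-    // dL/dw for a single example
-    static double gradient(double w, double x, double y) {
-        return 2 * (w * x - y) * x;
-    }
-
-    // One update using the average gradient over the mini-batch
-    static double step(double w, double[] xs, double[] ys, double learningRate) {
-        double g = 0;
-        for (int i = 0; i < xs.length; i++) {
-            g += gradient(w, xs[i], ys[i]);
-        }
-        g /= xs.length; // averaging damps the influence of any single example
-        return w - learningRate * g;
-    }
-}
-```
-
-With the mini-batch x = {1, 2}, y = {2, 4} and a learning rate of 0.1, one step from w = 0 gives w = 1.0, and repeated steps approach the true value w = 2.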
-
-A mini-batch should be large enough to provide a representative sample of the real world (or at least your data). That means that it should always contain all of the classes that you want to predict and that the count of those classes should be distributed in approximately the same way as they are in your overall data.
-
-## Building a Neural Net Model
-
-DL4J gives data scientists and developers tools to build deep neural networks at a high level, using concepts like `layer`. It uses a builder pattern to build the neural net declaratively, as you can see in this (simplified) example:
-
-```java
-MultiLayerConfiguration conf =
-    new NeuralNetConfiguration.Builder()
-        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
-        .updater(new Nesterovs(learningRate, 0.9))
-        .list(
-            new DenseLayer.Builder().nIn(numInputs).nOut(numHiddenNodes).activation(Activation.RELU).build(),
-            new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD).activation(Activation.SOFTMAX).nIn(numHiddenNodes).nOut(numOutputs).build()
-        ).build();
-```
-
-If you are familiar with other deep learning frameworks, you will notice that this looks a bit like Keras.
-
-Unlike other frameworks, DL4J splits the optimization algorithm from the updater algorithm. This allows for flexibility as you seek a combination of optimizer and updater that works best for your data and problem.
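-
-To illustrate what an updater does on its own, here is a plain-Java sketch of Nesterov momentum, the kind of updater configured by `new Nesterovs(learningRate, 0.9)` above. It uses one common formulation of the rule; DL4J's internal implementation may differ in detail, and `NesterovSketch` is a hypothetical name. The updater only turns a gradient into a parameter change; computing the gradient is left to the caller.
-
-```java
-// Hypothetical sketch of a Nesterov-momentum updater in the common form:
-//   v_t    = mu * v_{t-1} - lr * g
-//   theta += -mu * v_{t-1} + (1 + mu) * v_t
-class NesterovSketch {
-    private final double learningRate;
-    private final double momentum;
-    private double[] velocity;
-
-    NesterovSketch(double learningRate, double momentum) {
-        this.learningRate = learningRate;
-        this.momentum = momentum;
-    }
-
-    // Updates params in place, given the gradient at the current params
-    void apply(double[] params, double[] gradient) {
-        if (velocity == null) {
-            velocity = new double[params.length];
-        }
-        for (int i = 0; i < params.length; i++) {
-            double vPrev = velocity[i];
-            velocity[i] = momentum * vPrev - learningRate * gradient[i];
-            params[i] += -momentum * vPrev + (1 + momentum) * velocity[i];
-        }
-    }
-}
-```
-
-Minimizing f(x) = x^2 from x = 5 with `new NesterovSketch(0.1, 0.9)` and gradient 2x, the first step moves x to 3.1, and repeated steps drive x toward 0.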
-
-Besides the [DenseLayer](/api/{{page.version}}/org/deeplearning4j/nn/conf/layers/DenseLayer.html)
-and [OutputLayer](/api/{{page.version}}/org/deeplearning4j/nn/conf/layers/OutputLayer.html)
-that you have seen in the example above, there are several [other layer types](/api/{{page.version}}/org/deeplearning4j/nn/conf/layers/package-summary.html), like `GravesLSTM`, `ConvolutionLayer`, `RBM`, `EmbeddingLayer`, etc. Using those layers you can define not only simple neural networks, but also [recurrent](./deeplearning4j-nn-recurrent) and [convolutional](./deeplearning4j-nn-convolutional) networks. 
-
-## Training a Model
-
-After configuring your neural network, you will have to train the model. The simplest case is to call the `.fit()` method on the model (for example, the `MultiLayerNetwork` created from your configuration) with your `DataSetIterator` as an argument. This will train the model on all of your data once. A single pass over the entire dataset is called an *epoch*. DL4J has several different methods for passing through the data more than once.
-
-The simplest way is to reset your `DataSetIterator` and call `fit` in a loop as many times as you want. This way you can train your model for as many epochs as you see fit.
-
-Another way is to use an [EarlyStoppingTrainer](/api/{{page.version}}/org/deeplearning4j/earlystopping/trainer/EarlyStoppingTrainer.html).
-You can configure this trainer to stop after a maximum number of epochs,
-after a maximum amount of time, or both. It evaluates the performance of your
-network after each epoch (or at whatever interval you have configured) and saves the best
-performing version for later use.
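-
-`EarlyStoppingTrainer` does this bookkeeping for you. The plain-Java sketch below only illustrates the underlying idea and is not the DL4J API. The hypothetical `bestEpoch` helper walks through per-epoch validation scores (lower is better) and remembers the best epoch. It stops once the score has not improved for `patience` consecutive epochs, or once `maxEpochs` is reached. The patience rule and the scores in `main` are assumptions made for illustration.
-
-```java
-/** Plain-Java sketch of early stopping (illustration only, not the DL4J API). */
-public class EarlyStoppingSketch {
-
-    /**
-     * Returns the index of the best (lowest-score) epoch seen before stopping.
-     * Stops after `maxEpochs` epochs, or once the score has not improved for
-     * `patience` consecutive epochs.
-     */
-    static int bestEpoch(double[] validationScores, int patience, int maxEpochs) {
-        int best = -1;
-        double bestScore = Double.POSITIVE_INFINITY;
-        int epochsWithoutImprovement = 0;
-        int limit = Math.min(maxEpochs, validationScores.length);
-        for (int epoch = 0; epoch < limit; epoch++) {
-            // A real loop would fit one epoch here and then score held-out data.
-            double score = validationScores[epoch];
-            if (score < bestScore) {
-                bestScore = score;
-                best = epoch;                  // a real trainer would save the model here
-                epochsWithoutImprovement = 0;
-            } else if (++epochsWithoutImprovement >= patience) {
-                break;                         // no improvement for `patience` epochs
-            }
-        }
-        return best;
-    }
-
-    public static void main(String[] args) {
-        double[] scores = {0.90, 0.60, 0.45, 0.47, 0.50, 0.52, 0.40};  // made-up scores
-        System.out.println(bestEpoch(scores, 3, 100));  // 2 (stops before reaching 0.40)
-        System.out.println(bestEpoch(scores, 10, 100)); // 6
-    }
-}
-```
-
-With a patience of 3, training stops three epochs after the best score of 0.45. The later 0.40 is never seen. This is the trade-off that the patience setting controls.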
-
-Also note that DL4J supports not only `MultiLayerNetwork`s, but also the more flexible [ComputationGraph](./deeplearning4j-nn-computationgraph).
-
-### Evaluating Model Performance
-
-As you train your model, you will want to test how well it performs. For that test, you need a dedicated data set that is not used for training and is used only for evaluating your model. This data should have the same distribution as the real-world data your model will make predictions about. You can't simply use your training data for evaluation, because machine learning methods are prone to overfitting: they get good at making predictions about the training set but perform poorly on new, unseen data.
-
-The [Evaluation](/api/{{page.version}}/org/deeplearning4j/eval/Evaluation.html)
-class is used for evaluation. Slightly different methods apply to evaluating normal feed-forward networks and recurrent networks. For more details on using it, take a look at the corresponding [examples](https://github.com/eclipse/deeplearning4j-examples).
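-
-DL4J's `Evaluation` class reports accuracy, precision, recall and F1, among other metrics. The plain-Java sketch below shows roughly what the first three mean by computing them from arrays of true and predicted class indices. The class and method names are hypothetical, and the arrays in `main` are made-up examples. In practice, let `Evaluation` do this work.
-
-```java
-/** Plain-Java illustration of common classification metrics (not the DL4J Evaluation API). */
-public class EvalSketch {
-
-    /** Fraction of examples whose predicted class equals the true class. */
-    static double accuracy(int[] actual, int[] predicted) {
-        int correct = 0;
-        for (int i = 0; i < actual.length; i++) {
-            if (actual[i] == predicted[i]) correct++;
-        }
-        return (double) correct / actual.length;
-    }
-
-    /** Precision for class c: of everything predicted as c, the fraction that really was c. */
-    static double precision(int[] actual, int[] predicted, int c) {
-        int tp = 0, fp = 0;
-        for (int i = 0; i < actual.length; i++) {
-            if (predicted[i] == c) {
-                if (actual[i] == c) tp++; else fp++;
-            }
-        }
-        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
-    }
-
-    /** Recall for class c: of everything that really was c, the fraction predicted as c. */
-    static double recall(int[] actual, int[] predicted, int c) {
-        int tp = 0, fn = 0;
-        for (int i = 0; i < actual.length; i++) {
-            if (actual[i] == c) {
-                if (predicted[i] == c) tp++; else fn++;
-            }
-        }
-        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
-    }
-
-    public static void main(String[] args) {
-        int[] actual    = {0, 0, 1, 1, 2, 2};     // made-up labels
-        int[] predicted = {0, 1, 1, 1, 2, 0};     // made-up predictions
-        System.out.println(accuracy(actual, predicted));     // 0.6666666666666666
-        System.out.println(precision(actual, predicted, 1)); // 0.6666666666666666
-        System.out.println(recall(actual, predicted, 1));    // 1.0
-    }
-}
-```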
-
-## Troubleshooting a Neural Net Model
-
-Building neural networks to solve problems is an empirical process. That is, it requires trial and error. So you will have to try different settings and architectures in order to find a neural net configuration that performs well.
-
-DL4J provides a listener facility to help you monitor your network's training. You can set up listeners for your model that will be called after each mini-batch is processed. One of the most commonly used listeners that DL4J ships out of the box is [ScoreIterationListener](/api/{{page.version}}/org/deeplearning4j/optimize/listeners/ScoreIterationListener.html). Check out all [Listeners](./deeplearning4j-nn-listeners) for more.
-
-While `ScoreIterationListener` simply prints the current score (loss) of your network, the DL4J training UI (fed by a `StatsListener`) provides a host of information that you can use to fine-tune your network configuration. See [Visualize, Monitor and Debug Network Learning](./deeplearning4j-nn-visualization) on how to interpret that data.
-
-See [Troubleshooting neural nets](./deeplearning4j-troubleshooting-training) for more information on how to improve results.
--- a/docs/deeplearning4j/templates/config-buildtools.md
+++ b/docs/deeplearning4j/templates/config-buildtools.md
@ -1,56 +0,0 @@
---
-title: Configuration for Gradle, SBT, and More
-short_title: SBT, Gradle, & Others
-description: Configure the build tools for Deeplearning4j.
-category: Configuration
-weight: 3
---
-
-## Configuring your build tool
-
-While we encourage Deeplearning4j, ND4J and DataVec users to employ Maven, it's worthwhile documenting how to configure build files for other tools, like Ivy, Gradle and SBT -- particularly since Google prefers Gradle over Maven for Android projects. 
-
-The instructions below apply to all DL4J and ND4J submodules, such as deeplearning4j-api, deeplearning4j-scaleout, and ND4J backends.
-
-## Gradle
-
-You can use Deeplearning4j with Gradle by adding the following to the dependencies block of your `build.gradle`:
-
-    implementation "org.deeplearning4j:deeplearning4j-core:{{ page.version }}"
-
-Add a backend by adding the following:
-
-    implementation "org.nd4j:nd4j-native-platform:{{ page.version }}"
-
-You can also swap the standard CPU implementation for [GPUs](./deeplearning4j-config-gpu-cpu).
-
-## SBT
-
-You can use Deeplearning4j with SBT by adding the following to your build.sbt:
-
-    libraryDependencies += "org.deeplearning4j" % "deeplearning4j-core" % "{{ page.version }}"
-
-Add a backend by adding the following:
-
-    libraryDependencies += "org.nd4j" % "nd4j-native-platform" % "{{ page.version }}"
-
-You can also swap the standard CPU implementation for [GPUs](./deeplearning4j-config-gpu-cpu).
-
-## Ivy
-
-You can use Deeplearning4j with Ivy by adding the following to your `ivy.xml`:
-
-    <dependency org="org.deeplearning4j" name="deeplearning4j-core" rev="{{ page.version }}" conf="build" />
-
-
-Add a backend by adding the following:
-
-    <dependency org="org.nd4j" name="nd4j-native-platform" rev="{{ page.version }}" conf="build" />
-
-You can also swap the standard CPU implementation for [GPUs](./deeplearning4j-config-gpu-cpu).
-
-## Leiningen
-
-Clojure programmers may want to use [Leiningen](https://github.com/technomancy/leiningen/) or [Boot](http://boot-clj.com/) to work with Maven. A [Leiningen tutorial is here](https://github.com/technomancy/leiningen/blob/master/doc/TUTORIAL.md).
-
-NOTE: You'll still need to download ND4J, DataVec and Deeplearning4j, or double-click on their respective JAR files downloaded by Maven / Ivy / Gradle, to install them in your Eclipse installation.
--- a/docs/deeplearning4j/templates/config-cudnn.md
+++ b/docs/deeplearning4j/templates/config-cudnn.md
@ -1,80 +0,0 @@
---
-title: Using Deeplearning4j with cuDNN
-short_title: cuDNN
-description: Using the NVIDIA cuDNN library with DL4J.
-category: Configuration
-weight: 3
---
-
-## Using Deeplearning4j with cuDNN
-
-Deeplearning4j supports CUDA but can be further accelerated with cuDNN. Most 2D CNN layers (such as ConvolutionLayer, SubsamplingLayer, etc.), as well as LSTM and BatchNormalization layers, support cuDNN.
-
-The only thing we need to do to have DL4J load cuDNN is to add a dependency on `deeplearning4j-cuda-10.0`, `deeplearning4j-cuda-10.1`, or `deeplearning4j-cuda-10.2`, matching your CUDA version. For example:
-
-```xml
-<dependency>
-	<groupId>org.deeplearning4j</groupId>
-	<artifactId>deeplearning4j-cuda-10.0</artifactId>
-	<version>{{page.version}}</version>
-</dependency>
-```
-
-or
-```xml
-<dependency>
-	<groupId>org.deeplearning4j</groupId>
-	<artifactId>deeplearning4j-cuda-10.1</artifactId>
-	<version>{{page.version}}</version>
-</dependency>
-```
-
-or
-```xml
-<dependency>
-	<groupId>org.deeplearning4j</groupId>
-	<artifactId>deeplearning4j-cuda-10.2</artifactId>
-	<version>{{page.version}}</version>
-</dependency>
-```
-
-
-The actual library for cuDNN is not bundled, so be sure to download and install the appropriate package for your platform from NVIDIA:
-
-* [NVIDIA cuDNN](https://developer.nvidia.com/cudnn)
-
-Note there are multiple combinations of cuDNN and CUDA supported. At this time the following combinations are supported by Deeplearning4j:
-<table style="width:60%">
-	<tr>
-		<th>CUDA Version</th>
-		<th>cuDNN Version</th>
-	</tr>
-	<tr><td>10.0</td><td>7.4</td></tr>
-	<tr><td>10.1</td><td>7.6</td></tr>
-	<tr><td>10.2</td><td>7.6</td></tr>
-</table>
-
-To install, simply extract the library to a directory found in the system path used by native libraries. The easiest way is to place it alongside other libraries from CUDA in the default directory (`/usr/local/cuda/lib64/` on Linux, `/usr/local/cuda/lib/` on Mac OS X, and `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\`, `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\`, or `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\` on Windows).
-
-Alternatively, in the case of CUDA 10.2, cuDNN comes bundled with the "redist" package of the [JavaCPP Presets for CUDA](https://github.com/bytedeco/javacpp-presets/tree/master/cuda). [After agreeing to the license](https://github.com/bytedeco/javacpp-presets/tree/master/cuda#license-agreements), we can add the following dependencies instead of installing CUDA and cuDNN:
-
-```xml
-<dependency>
-	<groupId>org.bytedeco</groupId>
-	<artifactId>cuda-platform-redist</artifactId>
-	<version>10.2-7.6-1.5.3</version>
-</dependency>
-```
-
-Also note that, by default, Deeplearning4j will use the fastest algorithms available according to cuDNN, but memory usage may be excessive, causing strange launch errors. When this happens, try to reduce memory usage by using the [`NO_WORKSPACE` mode settable via the network configuration](/api/{{page.version}}/org/deeplearning4j/nn/conf/layers/ConvolutionLayer.Builder.html#cudnnAlgoMode-org.deeplearning4j.nn.conf.layers.ConvolutionLayer.AlgoMode-), instead of the default of `ConvolutionLayer.AlgoMode.PREFER_FASTEST`, for example:
-
-```java
-    // for the whole network
-    new NeuralNetConfiguration.Builder()
-            .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
-            // ...
-    // or separately for each layer
-    new ConvolutionLayer.Builder(h, w)
-            .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
-            // ...
-
-```
--- a/docs/deeplearning4j/templates/config-gpu-cpu.md
+++ b/docs/deeplearning4j/templates/config-gpu-cpu.md
@ -1,59 +0,0 @@
---
-title: Deeplearning4j Hardware and CPU/GPU Setup
-short_title: GPU/CPU Setup
-description: Hardware setup for Eclipse Deeplearning4j, including GPUs and CUDA.
-category: Configuration
-weight: 1
---
-
-## ND4J backends for GPUs and CPUs
-
-You can choose GPUs or native CPUs for your backend linear algebra operations by changing the ND4J dependencies in your project's `pom.xml` file. Your selection will affect both ND4J and DL4J in your application.
-
-If you have CUDA v9.2+ installed and NVIDIA-compatible hardware, then your dependency declaration will look like:
-
-```xml
-<dependency>
- <groupId>org.nd4j</groupId>
- <artifactId>nd4j-cuda-{{ page.cudaVersion }}</artifactId>
- <version>{{ page.version }}</version>
-</dependency>
-```
-As of now, the `artifactId` for the CUDA versions can be one of `nd4j-cuda-9.0`, `nd4j-cuda-9.2` or `nd4j-cuda-10.0`.
-
-You can also find the available CUDA versions via [Maven Central search](https://search.maven.org/search?q=nd4j-cuda) or in the [Release Notes](https://deeplearning4j.org/release-notes.html).
-
-Otherwise you will need to use the native implementation of ND4J as a CPU backend:
-
-```xml
-<dependency>
- <groupId>org.nd4j</groupId>
- <artifactId>nd4j-native</artifactId>
- <version>{{ page.version }}</version>
-</dependency>
-```
-
-## System architectures
-
-If you are developing your project on multiple operating systems/system architectures, you can add `-platform` to the end of your `artifactId` which will download binaries for most major systems.
-
-```xml
-<dependency>
- ...
- <artifactId>nd4j-native-platform</artifactId>
- ...
-</dependency>
-```
-
-## Multiple GPUs
-
-If you have several GPUs, but your system is forcing you to use just one, you can call `CudaEnvironment.getInstance().getConfiguration().allowMultiGPU(true);` as the first line of your `main()` method.
-
-## CuDNN
-
-See our page on [CuDNN](./deeplearning4j-config-cudnn).
-
-
-## CUDA Installation
-
-Check the NVIDIA guides for instructions on setting up CUDA on the NVIDIA [website](http://docs.nvidia.com/cuda/).
--- a/docs/deeplearning4j/templates/config-maven.md
+++ b/docs/deeplearning4j/templates/config-maven.md
@ -1,38 +0,0 @@
---
-title: Configuration for Maven
-short_title: Maven
-description: Configure the Maven build tool for Deeplearning4j.
-category: Configuration
-weight: 2
---
-
-## Configuring the Maven build tool
-
-You can use Deeplearning4j with Maven by adding the following to your `pom.xml`:
-```xml
-<dependencies>
-  <dependency>
-      <groupId>org.deeplearning4j</groupId>
-      <artifactId>deeplearning4j-core</artifactId>
-      <version>{{ page.version }}</version>
-  </dependency>
-</dependencies>
-```
-
-The instructions below apply to all DL4J and ND4J submodules, such as `deeplearning4j-api`, `deeplearning4j-scaleout`, and ND4J backends.
-
-## Add a backend
-
-DL4J relies on ND4J for hardware-specific implementations and tensor operations. Add a backend by pasting the following snippet into your `pom.xml`:
-
-```xml
-<dependencies>
-  <dependency>
-      <groupId>org.nd4j</groupId>
-      <artifactId>nd4j-native-platform</artifactId>
-      <version>{{ page.version }}</version>
-  </dependency>
-</dependencies>
-```
-
-You can also swap the standard CPU implementation for [GPUs](./deeplearning4j-config-gpu-cpu).
--- a/docs/deeplearning4j/templates/config-memory.md
+++ b/docs/deeplearning4j/templates/config-memory.md
@ -1,101 +0,0 @@
---
-title: Memory management in DL4J and ND4J
-short_title: Memory Management
-description: Setting available Memory/RAM for a DL4J application
-category: Configuration
-weight: 1
---
-
-## Memory Management for ND4J/DL4J: How does it work?
-
-ND4J uses off-heap memory to store NDArrays, to provide better performance while working with NDArrays from native code such as BLAS and CUDA libraries.
-
-"Off-heap" means that the memory is allocated outside of the JVM (Java Virtual Machine) and hence isn't managed by the JVM's garbage collection (GC). On the Java/JVM side, we only hold pointers to the off-heap memory, which can be passed to the underlying C++ code via JNI for use in ND4J operations.
-
-To manage memory allocations, we use two approaches:
-
-* JVM Garbage Collector (GC) and WeakReference tracking
-* MemoryWorkspaces - see [Workspaces guide](https://deeplearning4j.org/workspaces) for details
-
-Despite the differences between these two approaches, the idea is the same: once an NDArray is no longer required on the Java side, the off-heap memory associated with it should be released so that it can be reused later. The difference between the GC and `MemoryWorkspaces` approaches is in when and how the memory is released.
-
-* For JVM/GC memory: whenever an INDArray is collected by the garbage collector, its off-heap memory will be deallocated, assuming it is not used elsewhere.
-* For `MemoryWorkspaces`: whenever an INDArray leaves the workspace scope (for example, when a layer has finished its forward pass/predictions), its memory may be reused without deallocation and reallocation. This results in better performance for cyclical workloads like neural network training and inference.
-
-## Configuring Memory Limits
-
-With DL4J/ND4J, there are two types of memory limits to be aware of and configure: The on-heap JVM memory limit, and the off-heap memory limit, where NDArrays live. Both limits are controlled via Java command-line arguments:
-
-* `-Xms` - this defines the initial size of the JVM heap at application start.
-
-* `-Xmx` - this allows you to specify the JVM heap memory limit (the maximum at any point). The heap only grows up to this amount, at the discretion of the JVM, if required.
-
-* `-Dorg.bytedeco.javacpp.maxbytes` - this allows you to specify the off-heap memory limit. This can also be a percentage, in which case it applies to maxMemory (the JVM's maximum heap size).
-
-* `-Dorg.bytedeco.javacpp.maxphysicalbytes` - this specifies the maximum bytes for the entire process. It is usually set to `maxbytes` plus `-Xmx` plus a bit extra, in case other libraries also require some off-heap memory. This can also be a percentage (>100%), in which case it applies to maxMemory. Unlike `maxbytes`, setting `maxphysicalbytes` is optional.
-
-Example: Configuring 1GB initial on-heap, 2GB max on-heap, 8GB off-heap, 10GB maximum for process:
-
-```shell
--Xms1G -Xmx2G -Dorg.bytedeco.javacpp.maxbytes=8G -Dorg.bytedeco.javacpp.maxphysicalbytes=10G
-```
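-
-The relationships between these limits come down to simple arithmetic. The plain-Java sketch below uses a hypothetical helper that is not part of ND4J or JavaCPP. It parses JVM-style sizes such as `2G` and applies the default described on this page: when `maxbytes` is not set, the `-Xmx` value is used as the off-heap limit. It then checks that `maxphysicalbytes` covers both heap and off-heap memory. For simplicity, it does not handle the percentage forms of `maxbytes` and `maxphysicalbytes`.
-
-```java
-import java.util.Locale;
-
-/** Hypothetical helper that sanity-checks heap/off-heap limits (illustration only). */
-public class MemoryBudget {
-
-    /** Parses JVM-style sizes such as "512m", "8G" or "1024" (plain bytes). */
-    static long parseSize(String s) {
-        String t = s.trim().toLowerCase(Locale.ROOT);
-        long mult = 1L;
-        char last = t.charAt(t.length() - 1);
-        if (last == 'k') mult = 1L << 10;
-        else if (last == 'm') mult = 1L << 20;
-        else if (last == 'g') mult = 1L << 30;
-        else if (last == 't') mult = 1L << 40;
-        if (mult != 1L) t = t.substring(0, t.length() - 1);   // strip the unit suffix
-        return Long.parseLong(t) * mult;
-    }
-
-    /** Off-heap limit: maxbytes if set, otherwise the heap limit (-Xmx). */
-    static long effectiveOffHeap(String xmx, String maxBytes) {
-        return maxBytes == null ? parseSize(xmx) : parseSize(maxBytes);
-    }
-
-    /** maxphysicalbytes should be at least heap + off-heap (ideally with some headroom). */
-    static boolean physicalLimitCoversBoth(String xmx, String maxBytes, String maxPhysical) {
-        return parseSize(maxPhysical) >= parseSize(xmx) + effectiveOffHeap(xmx, maxBytes);
-    }
-
-    public static void main(String[] args) {
-        // The example flags above: -Xmx2G, maxbytes=8G, maxphysicalbytes=10G
-        System.out.println(physicalLimitCoversBoth("2G", "8G", "10G")); // true
-        // No maxbytes set: -Xmx8G also allows 8GB off-heap
-        System.out.println(effectiveOffHeap("8G", null) / (1L << 30)); // 8
-    }
-}
-```
-
-The example flags leave no headroom, because 2GB of heap plus 8GB of off-heap memory equals the 10GB process limit exactly. If other native libraries need memory too, raise `maxphysicalbytes` slightly.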
-
-## Gotchas: A few things to watch out for
-
-* With GPU systems, the maxbytes and maxphysicalbytes settings currently also effectively define the memory limit for the GPU, since the off-heap memory is mapped (via NDArrays) to the GPU. Read more about this in the GPU section below.
-
-* For many applications, you want less RAM to be used in JVM heap, and more RAM to be used in off-heap, since all NDArrays are stored there. If you allocate too much to the JVM heap, there will not be enough memory left for the off-heap memory.
-
-* If you get a "RuntimeException: Can't allocate [HOST] memory: xxx; threadId: yyy", you have run out of off-heap memory. In most cases you should use a WorkspaceConfiguration to handle your NDArray allocations, particularly in training or evaluation/inference loops. If you do not, the NDArrays and their off-heap (and GPU) resources are reclaimed by the JVM GC, which might introduce severe latency and possible out-of-memory situations.
-
-* If you don't specify a JVM heap limit, the JVM will use 1/4 of your total system RAM as the limit by default.
-
-* If you don't specify an off-heap memory limit, the JVM heap limit (`-Xmx`) will be used by default. For example, `-Xmx8G` means that 8GB can be used by the JVM heap, and an additional 8GB can be used by ND4J off-heap.
-
-* In limited-memory environments, it's usually a bad idea to use a high `-Xmx` value together with the `-Xms` option, because doing so won't leave enough off-heap memory. Consider a 16GB system in which you set `-Xms14G`: 14GB of 16GB would be allocated to the JVM, leaving only 2GB for the off-heap memory, the OS and all other programs.
-
-## Memory-mapped files
-
-ND4J supports the use of a memory-mapped file instead of RAM when using the `nd4j-native` backend. On one hand, it's slower than RAM; on the other hand, it allows you to allocate memory chunks that would otherwise not fit in RAM.
-
-Here's sample code:
-
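-```java
-import org.nd4j.linalg.api.memory.MemoryWorkspace;
-import org.nd4j.linalg.api.memory.conf.WorkspaceConfiguration;
-import org.nd4j.linalg.api.memory.enums.LocationPolicy;
-import org.nd4j.linalg.api.ndarray.INDArray;
-import org.nd4j.linalg.factory.Nd4j;
-
-WorkspaceConfiguration mmap = WorkspaceConfiguration.builder()
-                .initialSize(1000000000)                // ~1GB, backed by a temporary file
-                .policyLocation(LocationPolicy.MMAP)    // memory-map the file instead of using RAM
-                .build();
-
-try (MemoryWorkspace ws = Nd4j.getWorkspaceManager().getAndActivateWorkspace(mmap, "M2")) {
-    INDArray x = Nd4j.create(10000);                    // allocated inside the mmap'ed workspace
-}
-```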
-In this case, a 1GB temporary file will be created and mmap'ed, and NDArray `x` will be created in that space. This option is mostly useful when you need NDArrays that can't fit into your RAM.
-
-## GPUs
-
-When using GPUs, oftentimes your CPU RAM will be greater than GPU RAM. When GPU RAM is less than CPU RAM, you need to monitor how much RAM is being used off-heap. You can check this based on the JavaCPP options specified above.
-
-We allocate memory on the GPU equivalent to the amount of off-heap memory you specify. We don't use any more of your GPU than that. You are also allowed to specify an off-heap limit greater than your GPU's memory (that's not encouraged, but it's possible). If you do so, your GPU will run out of RAM when trying to run jobs.
-
-We also allocate off-heap memory in CPU RAM. This enables efficient communication between the CPU and GPU, and lets the CPU access data from an NDArray without having to fetch it from the GPU each time.
-
-If JavaCPP or your GPU throw an out-of-memory error (OOM), or even if your compute slows down due to GPU memory being limited, then you may want to either decrease batch size or increase the amount of off-heap memory that JavaCPP is allowed to allocate, if that's possible.
-
-Try to run with off-heap memory equal to your GPU's RAM. Also, always remember to set a small JVM heap size using the `-Xmx` option.
-
-Note that if your GPU has less than 2GB of RAM, it's probably not usable for deep learning, and you should consider using your CPU instead. Typical deep learning workloads need 4GB of GPU RAM *at minimum*, and even that is small. 8GB of GPU RAM is recommended for deep learning workloads.
-
-It is possible to use HOST-only memory with a CUDA backend. That can be done using workspaces.
-
-Example:
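-```java
-import org.nd4j.linalg.api.memory.conf.WorkspaceConfiguration;
-import org.nd4j.linalg.api.memory.enums.AllocationPolicy;
-import org.nd4j.linalg.api.memory.enums.LearningPolicy;
-import org.nd4j.linalg.api.memory.enums.MirroringPolicy;
-import org.nd4j.linalg.api.memory.enums.SpillPolicy;
-
-WorkspaceConfiguration basicConfig = WorkspaceConfiguration.builder()
-    .policyAllocation(AllocationPolicy.STRICT)
-    .policyLearning(LearningPolicy.FIRST_LOOP)
-    .policyMirroring(MirroringPolicy.HOST_ONLY) // keep arrays in host (CPU) memory only
-    .policySpill(SpillPolicy.EXTERNAL)
-    .build();
-```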
-
-Using HOST-only arrays directly is not recommended, since they dramatically reduce performance. However, they can be useful as an in-memory cache, paired with the `INDArray.unsafeDuplication()` method.
--- a/docs/deeplearning4j/templates/config-performance-debugging.md
+++ b/docs/deeplearning4j/templates/config-performance-debugging.md
@ -1,445 +0,0 @@
---
-title: Deeplearning4j and ND4J - Debugging Performance Issues
-short_title: Performance Issues Debugging
-description: How to debug performance issues in Deeplearning4j and ND4J
-category: Configuration
-weight: 11
---
-
-# DL4J and ND4J: How to Debug Performance Issues
-
-This page is a how-to guide for debugging performance issues encountered when training neural networks with Deeplearning4j.
-Much of the information also applies to debugging performance issues encountered when using ND4J.
-
-Deeplearning4j and ND4J provide excellent performance in most cases, using optimized C++ code for all numerical operations as well as high-performance libraries such as NVIDIA cuDNN and Intel MKL. However, bottlenecks or misconfiguration can sometimes limit performance to well below the maximum. This page is intended to help users identify the cause of poor performance and provide steps to fix these issues.
-
-Performance issues may include:
-1. Poor CPU/GPU utilization
-2. Slower than expected training or operation execution
-
-To start, here's a summary of some possible causes of performance issues:
-1. Wrong ND4J backend is used (for example, CPU backend when GPU backend is expected)
-2. Not using cuDNN when using CUDA GPUs
-3. ETL (data loading) bottlenecks
-4. Garbage collection overheads
-5. Small batch sizes
-6. Multi-threaded use of MultiLayerNetwork/ComputationGraph for inference (not thread safe)
-7. Double precision floating point data type used when single precision should be used
-8. Not using workspaces for memory management (enabled by default)
-9. Poorly configured network
-10. Layer or operation is CPU-only
-11. CPU: Lack of hardware support for modern instruction set extensions such as AVX
-12. Other processes using CPU or GPU resources
-13. CPU: Lack of configuration of OMP_NUM_THREADS when using many models/threads simultaneously
-
-Finally, this page has a short section on [Debugging Performance Issues with JVM Profiling](#profiling).
-
-## Step 1: Check if correct backend is used
-
-ND4J (and by extension, Deeplearning4j) can perform computation on either the CPU or GPU.
-The device used for computation is determined by your project dependencies - you include ```nd4j-native-platform``` to use CPUs for computation or ```nd4j-cuda-x.x-platform``` to use GPUs for computation (where ```x.x``` is your CUDA version - such as 9.2, 10.0 etc).
-
-It is straightforward to check which backend is used. ND4J will log the backend upon initialization.
-
-For CPU execution, you will expect output that looks something like:
-```
-o.n.l.f.Nd4jBackend - Loaded [CpuBackend] backend
-o.n.n.NativeOpsHolder - Number of threads used for NativeOps: 8
-o.n.n.Nd4jBlas - Number of threads used for BLAS: 8
-o.n.l.a.o.e.DefaultOpExecutioner - Backend used: [CPU]; OS: [Windows 10]
-o.n.l.a.o.e.DefaultOpExecutioner - Cores: [16]; Memory: [7.1GB];
-o.n.l.a.o.e.DefaultOpExecutioner - Blas vendor: [MKL]
-```
-
-For CUDA execution, you would expect the output to look something like:
-```
-13:08:09,042 INFO  ~ Loaded [JCublasBackend] backend
-13:08:13,061 INFO  ~ Number of threads used for NativeOps: 32
-13:08:14,265 INFO  ~ Number of threads used for BLAS: 0
-13:08:14,274 INFO  ~ Backend used: [CUDA]; OS: [Windows 10]
-13:08:14,274 INFO  ~ Cores: [16]; Memory: [7.1GB];
-13:08:14,274 INFO  ~ Blas vendor: [CUBLAS]
-13:08:14,274 INFO  ~ Device Name: [TITAN X (Pascal)]; CC: [6.1]; Total/free memory: [12884901888]
-```
-
-Pay attention to the ```Loaded [X] backend``` and ```Backend used: [X]``` messages to confirm that the correct backend is used.
-If the incorrect backend is being used, check your program dependencies to ensure the correct backend has been included.
-
-
-## Step 2: Check for cuDNN
-
-If you are using CPUs only (nd4j-native backend) then you can skip to step 3 as cuDNN only applies when using NVIDIA GPUs (```nd4j-cuda-x.x-platform``` dependency).
-
-cuDNN is NVIDIA's library for accelerating neural network training on NVIDIA GPUs.
-Deeplearning4j can make use of cuDNN to accelerate a number of layers - including ConvolutionLayer, SubsamplingLayer, BatchNormalization, Dropout, LocalResponseNormalization and LSTM. When training on GPUs, cuDNN should always be used if possible as it is usually much faster than the built-in layer implementations.
-
-Instructions for configuring cuDNN can be found [here](https://deeplearning4j.org/docs/latest/deeplearning4j-config-cudnn).
-In summary, include the ```deeplearning4j-cuda-x.x``` dependency (where ```x.x``` is your CUDA version - such as 9.2 or 10.0). The network configuration does not need to change to utilize cuDNN - cuDNN simply needs to be available along with the deeplearning4j-cuda module.
-
-
-**How to determine if cuDNN is used or not**
-
-Not all DL4J layer types are supported in cuDNN. DL4J layers with cuDNN support include ConvolutionLayer, SubsamplingLayer, BatchNormalization, Dropout, LocalResponseNormalization and LSTM.
-
-To check if cuDNN is being used, the simplest approach is to look at the log output when running inference or training.
-If cuDNN is NOT available when you are using a layer that supports it, you will see a message such as:
-```
-o.d.n.l.c.ConvolutionLayer - cuDNN not found: use cuDNN for better GPU performance by including the deeplearning4j-cuda module. For more information, please refer to: https://deeplearning4j.org/docs/latest/deeplearning4j-config-cudnn
-java.lang.ClassNotFoundException: org.deeplearning4j.nn.layers.convolution.CudnnConvolutionHelper
-	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
-	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
-	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
-	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
-	at java.lang.Class.forName0(Native Method)
-```
-
-If cuDNN is available and was loaded successfully, no message will be logged.
-
-Alternatively, you can confirm that cuDNN is used by using the following code:
-```java
-MultiLayerNetwork net = ...
-LayerHelper h = net.getLayer(0).getHelper();    //Index 0: assume layer 0 is a ConvolutionLayer in this example
-System.out.println("Layer helper: " + (h == null ? null : h.getClass().getName()));
-```
-Note that you will need to do at least one forward pass or fit call to initialize the cuDNN layer helper.
-
-If cuDNN is available and was loaded successfully, you will see the following printed:
-```
-Layer helper: org.deeplearning4j.nn.layers.convolution.CudnnConvolutionHelper
-```
-whereas if cuDNN is not available or could not be loaded successfully (a warning or error will also be logged), you will see:
-```
-Layer helper: null
-```
-
-
-
-## Step 3: Check for ETL (Data Loading) Bottlenecks
-
-Neural network training requires data to be in memory before training can proceed. If the data is not loaded fast enough, the network will have to wait until data is available.
-DL4J uses asynchronous prefetch of data to improve performance by default. Under normal circumstances, this asynchronous prefetching means the network should never be waiting around for data (except on the very first iteration) - the next minibatch is loaded in another thread while training is proceeding in the main thread.
-
-However, when data loading takes longer than the iteration time, data can be a bottleneck. For example, if a network takes 100ms to perform fitting on a single minibatch, but data loading takes 200ms, then we have a bottleneck: the network will have to wait 100ms per iteration (200ms loading - 100ms loading in parallel with training) before continuing the next iteration.
-Conversely, if the network fit operation took 100ms and data loading took 50ms, then no data loading bottleneck will occur, as the 50ms loading time can be completed asynchronously within one iteration.
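-
-The two scenarios above follow from a simple model. With asynchronous prefetch, the next minibatch loads while the current one trains, so each iteration waits for `max(0, load - fit)` and effectively takes `max(fit, load)`. The first iteration is ignored. The plain-Java sketch below encodes that back-of-the-envelope model. The class and method names are hypothetical, and real pipelines add variance that this model ignores.
-
-```java
-/** Back-of-the-envelope model of asynchronous prefetch (illustration only). */
-public class EtlBottleneck {
-
-    /** Time per iteration the network spends waiting for data. */
-    static long waitPerIterationMs(long fitMs, long loadMs) {
-        return Math.max(0, loadMs - fitMs);   // loading overlaps with fitting
-    }
-
-    /** Effective time per iteration once loading and fitting overlap. */
-    static long effectiveIterationMs(long fitMs, long loadMs) {
-        return Math.max(fitMs, loadMs);
-    }
-
-    public static void main(String[] args) {
-        // 100ms fit, 200ms load: loading is the bottleneck
-        System.out.println(waitPerIterationMs(100, 200));   // 100
-        System.out.println(effectiveIterationMs(100, 200)); // 200
-        // 100ms fit, 50ms load: loading is hidden behind training
-        System.out.println(waitPerIterationMs(100, 50));    // 0
-    }
-}
-```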
-
-**How to check for ETL / data loading bottlenecks**
-
-The way to identify ETL bottlenecks is simple: add a `PerformanceListener` to your network, and train as normal.
-For example:
-```java
-MultiLayerNetwork net = ...
-net.setListeners(new PerformanceListener(1));       //Logs ETL and iteration speed on each iteration
-```
-When training, you will see output such as:
-```
-o.d.o.l.PerformanceListener - ETL: 0 ms; iteration 16; iteration time: 65 ms; samples/sec: 492.308; batches/sec: 15.384; 
-```
-The above output shows that there is no ETL bottleneck (i.e., ```ETL: 0 ms```). However, if the ETL time is consistently greater than 0 (after the first iteration), an ETL bottleneck is present.
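-
-If you capture training logs to a file, the check above can be scripted. The plain-Java sketch below uses a hypothetical helper whose log format is assumed from the sample `PerformanceListener` output on this page. It returns the iterations that report a non-zero ETL time and skips the first iteration it sees, since nothing has been prefetched yet at that point. A consistent pattern of non-zero values points to an ETL bottleneck. The log lines in `main` are made up.
-
-```java
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.List;
-import java.util.regex.Matcher;
-import java.util.regex.Pattern;
-
-/** Hypothetical helper: finds iterations whose PerformanceListener line reports ETL > 0 ms. */
-public class EtlLogScan {
-
-    // Format assumed from the sample PerformanceListener output on this page
-    private static final Pattern ETL = Pattern.compile("ETL: (\\d+) ms; iteration (\\d+);");
-
-    static List<Integer> iterationsWithEtlWait(List<String> logLines) {
-        List<Integer> result = new ArrayList<>();
-        boolean first = true;
-        for (String line : logLines) {
-            Matcher m = ETL.matcher(line);
-            if (!m.find()) {
-                continue;                      // not a PerformanceListener line
-            }
-            if (first) {                       // the first iteration has nothing prefetched yet
-                first = false;
-                continue;
-            }
-            if (Long.parseLong(m.group(1)) > 0) {
-                result.add(Integer.parseInt(m.group(2)));
-            }
-        }
-        return result;
-    }
-
-    public static void main(String[] args) {
-        // Made-up log lines for illustration
-        List<String> lines = Arrays.asList(
-                "o.d.o.l.PerformanceListener - ETL: 120 ms; iteration 0; iteration time: 65 ms;",
-                "o.d.o.l.PerformanceListener - ETL: 0 ms; iteration 1; iteration time: 65 ms;",
-                "some unrelated log line",
-                "o.d.o.l.PerformanceListener - ETL: 40 ms; iteration 2; iteration time: 65 ms;");
-        System.out.println(iterationsWithEtlWait(lines)); // [2]
-    }
-}
-```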
-
-**How to identify the cause of an ETL bottleneck**
-
-There are a number of possible causes of ETL bottlenecks. These include (but are not limited to):
-* Slow hard drives
-* Network latency or throughput issues (when reading from remote or network storage)
-* Computationally intensive or inefficient ETL (especially for custom ETL pipelines)
-
-One useful way to get more information is to perform profiling, as described in the [profiling section](#profiling) later in this page.
-For custom ETL pipelines, adding logging for the various stages can help. Finally, another approach is to use a process of elimination: for example, measuring the latency and throughput of reading raw files from disk or from remote storage vs. measuring the time to actually process the data from its raw format.
-
-## Step 4: Check for Garbage Collection Overhead
-
-Java uses garbage collection for management of on-heap memory (see [this link](https://stackify.com/what-is-java-garbage-collection/) for an explanation).
-Note that DL4J and ND4J use off-heap memory for storage of all INDArrays (see the [memory page](https://deeplearning4j.org/docs/latest/deeplearning4j-config-memory) for details). 
-
-Even though DL4J/ND4J array memory is off-heap, garbage collection can still cause performance issues.
-
-In summary:
-* Garbage collection will sometimes (temporarily and briefly) pause/stop application execution ("stop the world")
-* These GC pauses slow down program execution
-* The overall performance impact of GC pauses depends on both the frequency of GC pauses, and the duration of GC pauses
-* The frequency is controllable (in part) by ND4J, using ```Nd4j.getMemoryManager().setAutoGcWindow(10000);``` and ```Nd4j.getMemoryManager().togglePeriodicGc(false);```
-* Not every GC event is caused by or controlled by the above ND4J configuration.
-
-In our experience, garbage collection time depends strongly on the number of objects in the JVM heap memory.
-As a rough guide:
-* Less than 100,000 objects in heap memory: short GC events (usually not a performance problem)
-* 100,000-500,000 objects: GC overhead becomes noticeable, often in the 50-250ms range per full GC event
-* 500,000 or more objects: GC can be a bottleneck if performed frequently. Performance may still be good if GC events are infrequent (for example, no more often than once every 10 seconds).
-* 10 million or more objects: GC is a major bottleneck even if infrequently called, with each full GC taking multiple seconds
-
-**How to configure ND4J garbage collection settings**
-
-In simple terms, there are two settings of note:
-```java
-Nd4j.getMemoryManager().setAutoGcWindow(10000);             //Set to 10 seconds (10000ms) between System.gc() calls
-Nd4j.getMemoryManager().togglePeriodicGc(false);            //Disable periodic GC calls
-```
-
-If you suspect garbage collection overhead is having an impact on performance, try changing these settings.
-The main risk of reducing the frequency of periodic GC, or disabling it entirely, arises when you are not using [workspaces](https://deeplearning4j.org/docs/latest/deeplearning4j-config-workspaces): without workspaces, off-heap memory is only released once the garbage collector collects the corresponding arrays. Note that workspaces are enabled by default for all neural networks in Deeplearning4j.
-
-
-Side note: if you are using DL4J for training on Spark, setting these values on the master/driver will not change the settings on the workers. Instead, see [this guide](https://deeplearning4j.org/docs/latest/deeplearning4j-scaleout-howto#gc).
-
-**How to determine GC impact using PerformanceListener**
-
-*NOTE: this feature was added after 1.0.0-beta3 and will be available in future releases*
-To determine the impact of garbage collection using PerformanceListener, you can use the following:
-
-```
-int listenerFrequency = 1;
-boolean reportScore = true;
-boolean reportGC = true;
-net.setListeners(new PerformanceListener(listenerFrequency, reportScore, reportGC));
-```
-
-This will report GC activity:
-```
-o.d.o.l.PerformanceListener - ETL: 0 ms; iteration 30; iteration time: 17 ms; samples/sec: 588.235; batches/sec: 58.824; score: 0.7229335801186025; GC: [PS Scavenge: 2 (1ms)], [PS MarkSweep: 2 (24ms)];
-```
-The garbage collection activity is reported for all available garbage collectors. Here, ```GC: [PS Scavenge: 2 (1ms)], [PS MarkSweep: 2 (24ms)]``` means that each of the two collectors (PS Scavenge and PS MarkSweep) ran 2 times since the last PerformanceListener report, taking 1ms and 24ms in total, respectively.
-
-Keep in mind: PerformanceListener reports GC events every N iterations (as configured by the user). Thus, if PerformanceListener is configured to report statistics every 10 iterations, the garbage collection stats would be for the period of time corresponding to the last 10 iterations.
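If you want similar per-interval GC numbers outside of PerformanceListener (for example, around a block of inference code), the standard `java.lang.management` API exposes cumulative collection counts and times for each collector. The following is a minimal sketch that takes two snapshots and prints the difference; `GcActivity` and its methods are hypothetical names, and the output format simply mimics the PerformanceListener line above.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of DL4J: reports GC activity between two snapshots,
// in a format similar to the PerformanceListener output.
class GcActivity {

    // Cumulative {collection count, collection time in ms} per collector since JVM start.
    // Note: the MXBean may return -1 for either value if it is undefined for a collector.
    static Map<String, long[]> snapshot() {
        Map<String, long[]> snap = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            snap.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return snap;
    }

    // Formats the per-collector difference between two snapshots as "GC: [name: count (Xms)], ...;"
    static String describeDelta(Map<String, long[]> before, Map<String, long[]> after) {
        StringBuilder sb = new StringBuilder("GC: ");
        boolean first = true;
        for (Map.Entry<String, long[]> e : after.entrySet()) {
            long[] prev = before.getOrDefault(e.getKey(), new long[] {0, 0});
            long count = e.getValue()[0] - prev[0];
            long timeMs = e.getValue()[1] - prev[1];
            if (!first) {
                sb.append(", ");
            }
            sb.append('[').append(e.getKey()).append(": ").append(count)
              .append(" (").append(timeMs).append("ms)]");
            first = false;
        }
        return sb.append(';').toString();
    }

    public static void main(String[] args) {
        Map<String, long[]> before = snapshot();
        // Allocate short-lived garbage so the collectors have something to do
        long checksum = 0;
        for (int i = 0; i < 200_000; i++) {
            checksum += new int[16].length;
        }
        System.out.println(describeDelta(before, snapshot()) + " (checksum " + checksum + ")");
    }
}
```

As with PerformanceListener, the numbers only cover the interval between the two snapshots, so take snapshots around exactly the code you want to measure.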
-
-**How to determine GC impact using ```-verbose:gc```**
-
-Another useful tool is the set of ```-verbose:gc```, ```-XX:+PrintGCDetails``` and ```-XX:+PrintGCTimeStamps``` command line options. These flags apply to Java 8; on Java 9 and later, detailed GC logging is configured through unified logging instead (for example, ```-Xlog:gc*```).
-For more details, see [Oracle Command Line Options](https://www.oracle.com/technetwork/java/javase/clopts-139448.html#gbmpt) and [Oracle GC Portal Documentation](https://www.oracle.com/technetwork/articles/javase/gcportal-136937.html).
-
-These options can be passed to the JVM on launch (when using ```java -jar``` or ```java -cp```) or added to IDE launch options (for example, in IntelliJ they should be placed in the "VM Options" field in Run/Debug Configurations; see [Setting Configuration Options](https://www.jetbrains.com/help/idea/setting-configuration-options.html)).
-
-When these options are enabled, you will have information reported on each GC event, such as:
-```
-5.938: [GC (System.gc()) [PSYoungGen: 5578K->96K(153088K)] 9499K->4016K(502784K), 0.0006252 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
-5.939: [Full GC (System.gc()) [PSYoungGen: 96K->0K(153088K)] [ParOldGen: 3920K->3911K(349696K)] 4016K->3911K(502784K), [Metaspace: 22598K->22598K(1069056K)], 0.0117132 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
-```
-
-This information can be used to determine the frequency, cause (System.gc() calls, allocation failure, etc) and duration of GC events.
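To extract frequency, cause and duration from a long log, a few lines of parsing are usually enough. The following is a minimal sketch that assumes the Java 8 `-XX:+PrintGCDetails` line format of the sample above (other JVM versions and collectors log differently); `GcLogLine` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for Java 8 style GC log lines such as the sample above.
final class GcLogLine {
    // "<uptime secs>: [GC (<cause>) [" or "<uptime secs>: [Full GC (<cause>) ["
    private static final Pattern HEADER =
            Pattern.compile("^(\\d+\\.\\d+): \\[(Full GC|GC) \\((.+?)\\) \\[");
    // The first ", <secs> secs]" on the line is the total pause time of the event
    private static final Pattern PAUSE = Pattern.compile(", (\\d+\\.\\d+) secs\\]");

    final double uptimeSecs;
    final boolean fullGc;
    final String cause;
    final double pauseSecs;

    private GcLogLine(double uptimeSecs, boolean fullGc, String cause, double pauseSecs) {
        this.uptimeSecs = uptimeSecs;
        this.fullGc = fullGc;
        this.cause = cause;
        this.pauseSecs = pauseSecs;
    }

    // Returns null if the line does not look like a GC event in the assumed format
    static GcLogLine parse(String line) {
        Matcher h = HEADER.matcher(line);
        Matcher p = PAUSE.matcher(line);
        if (!h.find() || !p.find()) {
            return null;
        }
        return new GcLogLine(Double.parseDouble(h.group(1)), h.group(2).equals("Full GC"),
                h.group(3), Double.parseDouble(p.group(1)));
    }

    public static void main(String[] args) {
        String[] log = {
            "5.938: [GC (System.gc()) [PSYoungGen: 5578K->96K(153088K)] 9499K->4016K(502784K), 0.0006252 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]",
            "5.939: [Full GC (System.gc()) [PSYoungGen: 96K->0K(153088K)] [ParOldGen: 3920K->3911K(349696K)] 4016K->3911K(502784K), [Metaspace: 22598K->22598K(1069056K)], 0.0117132 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]"
        };
        double totalPause = 0;
        for (String line : log) {
            GcLogLine e = parse(line);
            if (e != null) {
                totalPause += e.pauseSecs;
                System.out.println(e.uptimeSecs + "s " + (e.fullGc ? "Full GC" : "GC")
                        + " cause=" + e.cause + " pause=" + e.pauseSecs + "s");
            }
        }
        System.out.println("Total pause: " + totalPause + "s");
    }
}
```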
-
-
-**How to determine GC impact using a profiler**
-
-An alternative approach is to use a profiler to collect garbage collection information.
-
-For example, [YourKit Java Profiler](https://www.yourkit.com) can be used to determine both the frequency and duration of garbage collection - see [Garbage collection telemetry](https://www.yourkit.com/docs/java/help/garbage_collection.jsp) for more details.
-
-[Other tools](https://www.cubrid.org/blog/how-to-monitor-java-garbage-collection/), such as VisualVM, can also be used to monitor GC activity.
-
-
-**How to determine number (and type) of JVM heap objects using memory dumps**
-
-If you determine that garbage collection is a problem, and suspect that this is due to the number of objects in memory, you can perform a heap dump.
-
-To perform a heap dump:
-* Step 1: Run your program
-* Step 2: While running, determine the process ID
-    - One approach is to use jps:
-        - For basic details, run ```jps``` on the command line. If jps is not on the system PATH, it can be found (on Windows) at ```C:\Program Files\Java\jdk<VERSION>\bin\jps.exe```
-        - For more details on each process, run ```jps -lv``` instead
-    - Alternatively, you can use the ```top``` command on Linux or Task Manager (Windows) to find the PID (on Windows, the PID column may not be enabled by default)
-* Step 3: Create a heap dump using ```jmap -dump:format=b,file=file_name.hprof 123``` where ```123``` is the process id (PID) to create the heap dump for
-
-A number of alternatives for generating heap dumps can be found [here](https://www.yourkit.com/docs/java/help/hprof_snapshots.jsp).
-
-After a memory dump has been collected, it can be opened in tools such as YourKit profiler and VisualVM to determine the number, type and size of objects.
-With this information, you should be able to pinpoint the cause of the large number of objects and make changes to your code to reduce or eliminate the objects that are causing the garbage collection overhead.
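Heap dumps can also be triggered from inside the application, which is handy when the moment of interest is easier to detect in code than from the command line. The following is a minimal sketch using the HotSpot-specific `HotSpotDiagnosticMXBean`, so it assumes a HotSpot-based JVM (such as OpenJDK or Oracle JDK); `HeapDumps` and its methods are hypothetical names, and the PID helper relies on a common but unspecified runtime-name format.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; requires a HotSpot-based JVM.
final class HeapDumps {

    // Best-effort PID: on HotSpot the runtime name is usually "pid@hostname",
    // but this format is not guaranteed by the Java specification.
    static String pidHint() {
        String name = ManagementFactory.getRuntimeMXBean().getName();
        int at = name.indexOf('@');
        return at > 0 ? name.substring(0, at) : name;
    }

    // Writes an .hprof heap dump of the current JVM into a new temporary directory.
    // liveObjectsOnly = true restricts the dump to objects that are reachable from others.
    static Path dumpToTempFile(boolean liveObjectsOnly) {
        try {
            Path file = Files.createTempDirectory("heapdump").resolve("file_name.hprof");
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(file.toString(), liveObjectsOnly); // fails if the file already exists
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("PID (hint): " + pidHint());
        Path dump = dumpToTempFile(true);
        System.out.println("Heap dump written to " + dump + " (" + Files.size(dump) + " bytes)");
    }
}
```

The resulting `.hprof` file can be opened in YourKit or VisualVM exactly like one produced by `jmap`.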
-
-## Step 5: Check Minibatch Size
-
-Another common cause of performance issues is a poorly chosen minibatch size.
-A minibatch is a set of examples processed together in a single step of inference or training. Minibatch sizes of 32 to 128 are common, though smaller and larger sizes are sometimes used.
-
-In summary:
-* If the minibatch size is too small (for example, training or inference with 1 example at a time), expect poor hardware utilization and lower overall throughput
-* If minibatch size is too large
-    - Hardware utilization will usually be good
-    - Iteration times will slow down
-    - Memory utilization may be too high (leading to out-of-memory errors)
-
-For inference, avoid using minibatch size of 1, as throughput will suffer. Unless there are strict latency requirements, you should use larger minibatch sizes as this will give you the best hardware utilization and hence throughput, and is especially important for GPUs.
-
-For training, you should never use a minibatch size of 1, as overall performance and hardware utilization will be reduced. Network convergence may also suffer. Start with a minibatch size of 32-128, if memory allows.
-
-For serving predictions in multi-threaded applications (such as a web server), [ParallelInference](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-scaleout/deeplearning4j-scaleout-parallelwrapper/src/main/java/org/deeplearning4j/parallelism/ParallelInference.java) should be used.
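To make the effect of minibatch size concrete, the number of steps per pass over the data is the number of examples divided by the minibatch size, rounded up, and each step carries its own fixed overhead. The following is a minimal sketch that partitions a list of examples into minibatches; `Minibatches` is a hypothetical plain-Java helper, not a DL4J API (in DL4J, the minibatch size is normally chosen when constructing the `DataSetIterator`).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits examples into consecutive minibatches.
final class Minibatches {

    // Minibatches of at most batchSize examples; the last one may be smaller
    static <T> List<List<T>> partition(List<T> examples, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < examples.size(); start += batchSize) {
            int end = Math.min(start + batchSize, examples.size());
            batches.add(new ArrayList<>(examples.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> examples = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            examples.add(i);
        }
        for (int batchSize : new int[] {1, 32, 128}) {
            System.out.println("batch size " + batchSize + " -> "
                    + partition(examples, batchSize).size() + " steps");
        }
    }
}
```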
-
-
-## Step 6: Ensure you are not using a single MultiLayerNetwork/ComputationGraph for inference from multiple threads
-
-MultiLayerNetwork and ComputationGraph are not considered thread-safe, and should not be used from multiple threads.
-That said, most operations such as fit, output, etc. use synchronized blocks. While these synchronized methods should prevent hard-to-understand exceptions (race conditions due to concurrent use), they limit throughput to a single thread (though native operations will still be parallelized as normal).
-In summary, using a single network from multiple threads should be avoided, as it is not thread-safe and can be a performance bottleneck.
-
-
-For inference from multiple threads, you should use one model per thread (as this avoids locks) or for serving predictions in multi-threaded applications (such as a web server), use [ParallelInference](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-scaleout/deeplearning4j-scaleout-parallelwrapper/src/main/java/org/deeplearning4j/parallelism/ParallelInference.java).
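One way to follow the one-model-per-thread approach is a `ThreadLocal` that lazily creates a model the first time each thread asks for one. The following is a minimal generic sketch in plain Java; `PerThreadModel` is a hypothetical name, and the factory is an assumption (with DL4J it might, for example, return `net.clone()` of a trained network).

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical holder: each thread lazily gets its own model instance, so threads never
// contend for a shared (synchronized) network.
final class PerThreadModel<M> {
    private final ThreadLocal<M> local;

    PerThreadModel(Supplier<M> factory) {
        this.local = ThreadLocal.withInitial(factory);
    }

    // Returns this thread's instance, creating it on first use
    M get() {
        return local.get();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger created = new AtomicInteger();
        // Stand-in factory; with DL4J this might return net.clone() of a trained network (assumption)
        PerThreadModel<Object> models = new PerThreadModel<>(() -> {
            created.incrementAndGet();
            return new Object();
        });
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                Object model = models.get(); // same instance on every call within this thread
                if (model != models.get()) {
                    throw new IllegalStateException("expected one instance per thread");
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("Model instances created: " + created.get());
    }
}
```

Keep in mind that every thread holds a full copy of the network parameters, so memory use grows with the number of threads.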
-
-## Step 7: Check Data Types
-
-As of 1.0.0-beta3 and earlier, ND4J has a global datatype setting that determines the datatype of all arrays.
-The default value is 32-bit floating point. The data type can be set using ```Nd4j.setDataType(DataBuffer.Type.FLOAT);``` for example.
-
-For best performance, this value should be left at its default. If 64-bit floating point precision (double precision) is used instead, performance can be significantly reduced, especially on GPUs: most consumer NVIDIA GPUs have very poor double precision performance (and often poor half precision/FP16 performance as well). On Tesla series cards, double precision performance is usually much better than on consumer (GeForce) cards, though it is still usually half or less of the single precision performance.
-Wikipedia has a summary of the single and double precision performance of NVIDIA GPUs [here](https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units).
-
-Performance on CPUs can also be reduced for double precision due to the higher memory bandwidth requirements compared to float precision.
-
-You can check the data type setting using:
-```
-System.out.println("ND4J Data Type Setting: " + Nd4j.dataType());
-```
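The memory side of the precision trade-off is simple arithmetic: a FLOAT element takes 4 bytes and a DOUBLE element takes 8, so every array (and every pass over it) doubles in size. The following is a minimal sketch that computes the footprint of one example input shape; the shape and the `ArrayFootprint` name are illustrative assumptions only.

```java
// Hypothetical helper: memory footprint of a dense array for a given element size.
final class ArrayFootprint {

    // Number of elements times bytes per element; multiplyExact throws on overflow
    static long bytes(long[] shape, int bytesPerElement) {
        long elements = 1;
        for (long dim : shape) {
            elements = Math.multiplyExact(elements, dim);
        }
        return Math.multiplyExact(elements, (long) bytesPerElement);
    }

    public static void main(String[] args) {
        long[] shape = {128, 3, 224, 224}; // e.g. a minibatch of 128 RGB images, 224x224 pixels
        System.out.println("FLOAT:  " + bytes(shape, Float.BYTES) + " bytes");
        System.out.println("DOUBLE: " + bytes(shape, Double.BYTES) + " bytes");
    }
}
```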
-
-## Step 8: Check workspace configuration for memory management (enabled by default)
-
-For details on workspaces, see the [workspaces page](https://deeplearning4j.org/docs/latest/deeplearning4j-config-workspaces).
-
-In summary, workspaces are enabled by default for all Deeplearning4j networks, and enabling them improves performance and reduces memory requirements.
-There are very few reasons to disable workspaces.
-
-You can check that workspaces are enabled for your MultiLayerNetwork using:
-```
-System.out.println("Training workspace config: " + net.getLayerWiseConfigurations().getTrainingWorkspaceMode());
-System.out.println("Inference workspace config: " + net.getLayerWiseConfigurations().getInferenceWorkspaceMode());
-```
-or for a ComputationGraph using:
-```
-System.out.println("Training workspace config: " + cg.getConfiguration().getTrainingWorkspaceMode());
-System.out.println("Inference workspace config: " + cg.getConfiguration().getInferenceWorkspaceMode());
-```
-
-You want to see ```ENABLED``` for both training and inference.
-To change the workspace configuration, use the setter methods, for example: ```net.getLayerWiseConfigurations().setTrainingWorkspaceMode(WorkspaceMode.ENABLED);```
-
-
-## Step 9: Check for a badly configured network or network with layer bottlenecks
-
-Another possible cause (especially for newer users) is a poorly designed network.
-A network may be poorly designed if:
-* It has too many layers. A rough guideline:
-    - More than about 100 layers for a CNN may be too many
-    - More than about 10 layers for an RNN/LSTM network may be too many
-    - More than about 20 feed-forward layers may be too many for an MLP
-* The input/activations are too large
-    - For CNNs, inputs in the range of 224x224 (for image classification) to 600x600 (for object detection and segmentation) are used. Large image sizes (such as 500x500) are computationally demanding, and much larger than this should be considered too large in most cases.
-    - For RNNs, the sequence length matters. If you are using sequences longer than a few hundred steps, you should use [truncated backpropagation through time](https://deeplearning4j.org/docs/latest/deeplearning4j-nn-recurrent#tbptt) if possible.
-* The output number of classes is too large
-    - Classification with more than about 10,000 classes can become a performance bottleneck with standard softmax output layers
-* The layers are too large
-    - For CNNs, most layers have kernel sizes in the range 2x2 to 7x7, with channels equal to 32 to 1024 (with larger number of channels appearing later in the network). Much larger than this may cause a performance bottleneck.
-    - For MLPs, most layers have at most 2048 units/neurons (often much smaller). Much larger than this may be too large.
-    - For RNNs such as LSTMs, layers typically have 128 to 512 units, though the largest RNNs may use around 1024 units per layer.
-* The network has too many parameters
-    - This is usually a consequence of the other issues already mentioned - too many layers, too large input, too many output classes
-    - For comparison, fewer than 1 million parameters would be considered small, and more than about 100 million parameters would be considered very large.
-    - You can check the number of parameters using ```MultiLayerNetwork/ComputationGraph.numParams()``` or ```MultiLayerNetwork/ComputationGraph.summary()```
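For a quick sanity check against `numParams()`, the parameter counts of common layers can be estimated by hand. The following is a minimal sketch that assumes dense and 2D convolution layers with biases (other layer types, such as LSTMs or batch normalization, use different formulas); `ParamCount` is a hypothetical helper.

```java
// Hypothetical back-of-the-envelope parameter counter (assumes every layer has biases).
final class ParamCount {

    // Dense (fully connected) layer: nIn * nOut weights plus one bias per output unit
    static long dense(long nIn, long nOut) {
        return nIn * nOut + nOut;
    }

    // 2D convolution: kernelH * kernelW * inChannels * outChannels weights plus one bias per output channel
    static long conv2d(long kernelH, long kernelW, long inChannels, long outChannels) {
        return kernelH * kernelW * inChannels * outChannels + outChannels;
    }

    // MLP built from consecutive dense layers, e.g. layer sizes {784, 1000, 10}
    static long mlp(long... layerSizes) {
        long total = 0;
        for (int i = 1; i < layerSizes.length; i++) {
            total += dense(layerSizes[i - 1], layerSizes[i]);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("MLP 784-1000-10: " + mlp(784, 1000, 10) + " parameters");
        System.out.println("3x3 conv, 64 -> 128 channels: " + conv2d(3, 3, 64, 128) + " parameters");
    }
}
```

For example, a 784-1000-10 MLP comes to 795,010 parameters, which is still "small" by the guideline above.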
-
-Note that these are guidelines only, and some reasonable networks may exceed the numbers specified here. Some networks can become very large, such as those commonly used for ImageNet classification or object detection. However, in these cases, the network is usually carefully designed to provide a good tradeoff between accuracy and computation time.
-
-If your network architecture is significantly outside of the guidelines specified here, you may want to reconsider the design to improve performance.
-
-
-## Step 10: Check for CPU-only ops (when using GPUs)
-
-If you are using CPUs only (nd4j-native backend), you can skip this step, as it only applies when using the GPU (nd4j-cuda) backend.
-
-As of 1.0.0-beta3, a handful of recently added operations do not yet have GPU implementations. Thus, when these layers are used in a network, they will execute on CPU only, irrespective of the ND4J backend used. GPU support for these layers will be added in an upcoming release.
-
-The layers without GPU support as of 1.0.0-beta3 include:
-* Convolution3D
-* Upsampling1D/2D/3D
-* Deconvolution2D
-* LocallyConnected1D/2D
-* SpaceToBatch
-* SpaceToDepth
-
-Unfortunately, there is no workaround or fix for now, until these operations have GPU implementations completed.
-
-
-## Step 11: Check CPU support for hardware extensions (AVX etc)
-
-If you are running on a GPU, this section does not apply.
-
-When running on older CPUs or those that lack modern AVX extensions such as AVX2 and AVX512, performance will be reduced compared to running on CPUs with these features.
-Though there is not much you can do about the lack of such features, it is worth knowing about if you are comparing performance between different CPU models.
-
-In summary, CPU models with AVX2 support will perform better than those without it; similarly, AVX512 is an improvement over AVX2.
-
-For more details on AVX, see the [Wikipedia AVX article](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions)
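On Linux, you can check which of these extensions a CPU reports by looking at the `flags` line of `/proc/cpuinfo`. The following is a minimal sketch in Java; it is Linux/x86-specific, `CpuFlags` is a hypothetical name, and it only reports what the CPU advertises (`avx512f` is the AVX-512 foundation flag), not which instructions ND4J actually uses.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: reads AVX-related flags from /proc/cpuinfo-style text (Linux, x86 only).
final class CpuFlags {

    // Returns which of avx, avx2 and avx512f appear in the first "flags" line
    static Set<String> avxSupport(String cpuinfo) {
        Set<String> found = new LinkedHashSet<>();
        for (String line : cpuinfo.split("\n")) {
            if (line.startsWith("flags")) {
                String[] tokens = line.substring(line.indexOf(':') + 1).trim().split("\\s+");
                Set<String> flags = new HashSet<>(Arrays.asList(tokens));
                for (String ext : new String[] {"avx", "avx2", "avx512f"}) {
                    if (flags.contains(ext)) {
                        found.add(ext);
                    }
                }
                break; // cores usually report identical flags, so the first entry is enough
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        Path cpuinfo = Paths.get("/proc/cpuinfo");
        if (!Files.exists(cpuinfo)) {
            System.out.println("/proc/cpuinfo not found (this sketch is Linux-only)");
            return;
        }
        String text = new String(Files.readAllBytes(cpuinfo), StandardCharsets.UTF_8);
        System.out.println("AVX extensions reported: " + avxSupport(text));
    }
}
```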
-
-
-## Step 12: Check other processes using CPU or GPU resources
-
-Another obvious cause of performance issues is other processes using CPU or GPU resources.
-
-For CPU, it is straightforward to check whether other processes are using resources, using tools such as ```top``` (on Linux) or Task Manager (on Windows).
-
-For NVIDIA CUDA GPUs, nvidia-smi can be used. nvidia-smi is usually installed with the NVIDIA display drivers, and (when run) shows the overall GPU and memory utilization, as well as the GPU utilization of programs running on the system.
-
-On Linux, this is usually on the system path by default.
-On Windows, it may be found at ```C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi```
-
-
-## Step 13: Check OMP_NUM_THREADS when performing concurrent inference using CPU in multiple threads simultaneously
-
-If you are using GPUs (nd4j-cuda backend), you can skip this section.
-
-One issue to be aware of when running multiple DL4J networks (or ND4J operations generally) concurrently in multiple threads is the OpenMP number of threads setting.
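-In summary, in ND4J we use OpenMP parallelism at the C++ level to increase operation performance. By default, ND4J will use a number of OpenMP threads equal to the number of physical CPU cores (*not logical cores*), as this gives optimal performance when one operation runs at a time. When multiple threads run operations concurrently, however, each operation may try to use that many threads, oversubscribing the CPU.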
-
-This also applies if the CPU resources are shared with other computationally demanding processes.
-
-In either case, you may see better overall throughput by reducing the number of OpenMP threads by setting the OMP_NUM_THREADS environment variable - see [ND4JEnvironmentVars](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-common/src/main/java/org/nd4j/config/ND4JEnvironmentVars.java) for details.
-
-One reason that reducing OMP_NUM_THREADS can improve overall performance is reduced [cache thrashing](https://en.wikipedia.org/wiki/Thrashing_(computer_science)).
-
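As a rough starting point, you can divide the physical cores between the operations that run concurrently. Because a running JVM cannot change its own environment variables, OMP_NUM_THREADS has to be set before the process starts, for example in the shell or through `ProcessBuilder` when launching a child process. The following is a minimal sketch under stated assumptions: the even split is only a heuristic, `OmpThreads` and `my-inference-app.jar` are hypothetical, and you must supply the physical core count yourself (`Runtime.availableProcessors()` reports logical cores).

```java
// Hypothetical helpers for choosing and applying an OMP_NUM_THREADS value.
final class OmpThreads {

    // Heuristic: split the physical cores evenly between concurrently running workers,
    // giving each worker at least one OpenMP thread.
    static int perWorker(int physicalCores, int concurrentWorkers) {
        if (physicalCores < 1 || concurrentWorkers < 1) {
            throw new IllegalArgumentException("core and worker counts must be positive");
        }
        return Math.max(1, physicalCores / concurrentWorkers);
    }

    // A running JVM cannot change its own environment, so the variable is set on the
    // environment of a child process before it is started.
    static ProcessBuilder withOmpThreads(ProcessBuilder builder, int threads) {
        builder.environment().put("OMP_NUM_THREADS", Integer.toString(threads));
        return builder;
    }

    public static void main(String[] args) {
        int physicalCores = 8;      // assumption: supply your machine's physical core count
        int concurrentWorkers = 4;  // e.g. four threads in the launched JVM running inference concurrently
        int threads = perWorker(physicalCores, concurrentWorkers);
        ProcessBuilder builder = withOmpThreads(
                new ProcessBuilder("java", "-jar", "my-inference-app.jar"), threads); // hypothetical jar
        System.out.println("OMP_NUM_THREADS=" + builder.environment().get("OMP_NUM_THREADS"));
        // builder.start() would launch the child process with this setting
    }
}
```

Treat the computed value as a starting point and measure throughput with a few nearby values.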
-
-# <a name="profiling">Debugging Performance Issues with JVM Profiling</a>
-
-Profiling is a process whereby you can trace how long each method in your code takes to execute, to identify and debug performance bottlenecks.
-
-A full guide to profiling is beyond the scope of this page, but the summary is that you can trace how long each method takes to execute (and where it is being called from) using a profiling tool. This information can then be used to identify bottlenecks (and their causes) in your program.
-
-
-## How to Perform Profiling
-
-Multiple options are available for performing profiling locally.
-We suggest using either [YourKit Java Profiler](https://www.yourkit.com/java/profiler/features/) or [VisualVM](https://visualvm.github.io/) for profiling.
-
-The YourKit profiling documentation is quite good. To perform profiling with YourKit:
-* Install and start YourKit Profiler
-* Start your application with the profiler enabled. For details, see [Running applications with the profiler](https://www.yourkit.com/docs/java/help/running_with_profiler.jsp) and [Local profiling](https://www.yourkit.com/docs/java/help/local_profiling.jsp)
-    - Note that IDE integrations are available - see [IDE integration](https://www.yourkit.com/docs/java/help/ide_integration.jsp)
-* Collect a snapshot and analyze
-
-Note that YourKit provides multiple different types of profiling: Sampling, tracing, and call counting.
-Each type of profiling has different pros and cons, such as accuracy vs. overhead. For more details, see [Sampling, tracing, call counting](https://www.yourkit.com/docs/java/help/cpu_intro.jsp)
-
-VisualVM also supports profiling - see the Profiling Applications section of the [VisualVM documentation](https://visualvm.github.io/documentation.html) for more details.
-
-## Profiling on Spark
-
-When debugging performance issues for Spark training or inference jobs, it is often useful to perform profiling there as well.
-
-One approach that we have used internally is to combine manual profiling settings (```-agentpath``` JVM argument) with spark-submit arguments for YourKit profiler.
-
-To perform profiling in this manner, 5 steps are required:
-1. Download YourKit profiler to a location on each worker (must be the same location on each worker) and (optionally) the driver
-2. [Optional] Copy the profiling configuration onto each worker (must be the same location on each worker)
-3. Create a local output directory for storing the profiling result files on each worker
-4. Launch the Spark job with the appropriate configuration (see example below)
-5. The snapshots will be saved when the Spark job completes (or is cancelled) to the specified directories.
-
-For example, to perform tracing on both the driver and the workers:
-```
-spark-submit \
-    --conf 'spark.executor.extraJavaOptions=-agentpath:/home/user/YourKit-JavaProfiler-2018.04/bin/linux-x86-64/libyjpagent.so=tracing,port=10001,dir=/home/user/yourkit_snapshots/executor/,tracing_settings_path=/home/user/yourkitconf.txt' \
-    --conf 'spark.driver.extraJavaOptions=-agentpath:/home/user/YourKit-JavaProfiler-2018.04/bin/linux-x86-64/libyjpagent.so=tracing,port=10001,dir=/home/user/yourkit_snapshots/driver/,tracing_settings_path=/home/user/yourkitconf.txt' \
-    <other spark submit arguments>
-```
-
-The configuration (tracing_settings_path) is optional. A sample tracing settings file is provided below:
-```
-walltime=*
-adaptive=true
-adaptive_min_method_invocation_count=1000
-adaptive_max_average_method_time_ns=100000
-```
-
--- a/docs/deeplearning4j/templates/config-snapshots.md
+++ b/docs/deeplearning4j/templates/config-snapshots.md
@ -1,123 +0,0 @@
---
-title: Snapshots and daily builds
-short_title: Snapshots
-description: Using daily builds for access to latest Eclipse Deeplearning4j features.
-category: Configuration
-weight: 10
---
-
-## Contents
-
-* [Introduction to Snapshots](#Introduction)
-* [Setup Instructions](#Setup_Instructions)
-* [Limitations](#Limitations)
-* [Useful Maven Commands for Snapshots](#mavencommands)
-* [Note to Gradle Users](#Note_to_gradle_users)
-
-## <a name="Introduction">Overview/Introduction</a>
-
-We provide automated daily builds of repositories such as ND4J, DataVec, DeepLearning4j, RL4J etc. This means that all the newest functionality and most recent bug fixes are released daily.
-
-Snapshots work like any other Maven dependency. The only difference is that they are served from a custom repository rather than from Maven Central.
-
-**Due to ongoing development, snapshots should be considered less stable than releases: breaking changes or bugs can in principle be introduced at any point during the course of normal development. Typically, releases (not snapshots) should be used when possible, unless a bug fix or new feature is required.**
-
-## <a name="Setup_Instructions">Setup Instructions</a>
-
-**Step 1:**
-To use snapshots in your project, you should add snapshot repository information like this to your `pom.xml` file:
-
-```
-<repositories>
-    <repository>
-        <id>snapshots-repo</id>
-        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
-        <releases>
-            <enabled>false</enabled>
-        </releases>
-        <snapshots>
-            <enabled>true</enabled>
-            <updatePolicy>daily</updatePolicy>  <!-- Optional, update daily -->
-        </snapshots>
-    </repository>
-</repositories>
-```
-
-**Step 2:**
-Make sure to specify the snapshot version. We follow a simple rule: If the latest stable release version is `A.B.C`, the snapshot version will be `A.B.(C+1)-SNAPSHOT`. The current snapshot version is `1.0.0-SNAPSHOT`.
-For more details on the repositories section of the pom.xml file, see [Maven documentation](https://maven.apache.org/settings.html#Repositories)
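The version rule above can be written as a small function. This is a minimal sketch with a hypothetical `SnapshotVersion` helper that covers only plain `A.B.C` releases; pre-release versions such as `1.0.0-beta2` (whose snapshot is `1.0.0-SNAPSHOT`, as in the properties example that follows) fall outside the simple rule, so the sketch rejects them instead of guessing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper implementing the rule: release A.B.C -> snapshot A.B.(C+1)-SNAPSHOT
final class SnapshotVersion {
    private static final Pattern RELEASE = Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)");

    static String snapshotFor(String release) {
        Matcher m = RELEASE.matcher(release);
        if (!m.matches()) {
            // Pre-release versions such as "1.0.0-beta2" are outside the simple rule
            throw new IllegalArgumentException("Not a plain A.B.C release: " + release);
        }
        int patch = Integer.parseInt(m.group(3));
        return m.group(1) + "." + m.group(2) + "." + (patch + 1) + "-SNAPSHOT";
    }

    public static void main(String[] args) {
        System.out.println(snapshotFor("0.9.1"));
    }
}
```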
-
-If you use version properties (as the DL4J examples do), change them from:
-```
-<dl4j.version>1.0.0-beta2</dl4j.version>
-<nd4j.version>1.0.0-beta2</nd4j.version>
-```
-To version:
-```
-<dl4j.version>1.0.0-SNAPSHOT</dl4j.version>
-<nd4j.version>1.0.0-SNAPSHOT</nd4j.version>
-```
-
-**Sample pom.xml using Snapshots**
-
-A sample pom.xml is provided here: [sample pom.xml using snapshots](https://gist.github.com/AlexDBlack/28b0c9a72bce562c8782be326a6e2aaa)
-This has been taken from the DL4J standalone sample project and modified using step 1 and 2 above. The original (using the last release) can be found [here](https://github.com/eclipse/deeplearning4j-examples/blob/master/standalone-sample-project/pom.xml)
-
-
-## <a name="Limitations">Limitations</a>
-
-Both `-platform` (all operating systems) and single OS (non-platform) snapshot dependencies are released.
-Due to the multi-platform build nature of snapshots, it is possible (though rare) for the `-platform` artifacts to temporarily get out of sync, which can cause build issues.
-
-If you are building and deploying on just one platform, it is safer to use the non-platform artifacts, such as:
-```
-        <dependency>
-            <groupId>org.nd4j</groupId>
-            <artifactId>nd4j-native</artifactId>
-            <version>${nd4j.version}</version>
-        </dependency>
-```
-
-    
-## <a name="mavencommands">Useful Maven Commands for Snapshots</a>
-
-Two options that might be useful when using snapshot dependencies in Maven are as follows:
-1. ```-U``` - for example, in ```mvn package -U```. This ```-U``` option forces Maven to check for (and, if necessary, download) new snapshot releases. This can be useful if you need to be sure you have the absolute latest snapshot release.
-2. ```-nsu``` - for example, in ```mvn package -nsu```. This ```-nsu``` option stops Maven from checking for snapshot releases. Note, however, that your build will only succeed with this option if the required snapshot dependencies have already been downloaded into your local Maven cache (.m2 directory).
-
-An alternative approach to (1) is to set ```<updatePolicy>always</updatePolicy>``` in the ```<repositories>``` section found earlier in this page.
-An alternative approach to (2) is to set ```<updatePolicy>never</updatePolicy>``` in the ```<repositories>``` section found earlier in this page.
-
-## <a name="Note_to_gradle_users">Note to Gradle users</a>
-
-Snapshots will not work with Gradle. You must use Maven to download the files. After that, you may try using your local Maven repository with `mavenLocal()`.
-
-A bare minimum file like this:
-
-```Gradle
-version '1.0-SNAPSHOT'
- 
-apply plugin: 'java'
- 
-sourceCompatibility = 1.8
- 
-repositories {
-    maven { url "https://oss.sonatype.org/content/repositories/snapshots" }
-    mavenCentral()
-}
- 
-dependencies {
-    compile group: 'org.deeplearning4j', name: 'deeplearning4j-core', version: '1.0.0-SNAPSHOT'
-    compile group: 'org.deeplearning4j', name: 'deeplearning4j-modelimport', version: '1.0.0-SNAPSHOT'
-    compile "org.nd4j:nd4j-native:1.0.0-SNAPSHOT"
-    // Use windows-x86_64 or linux-x86_64 if you are not on macos
-    compile "org.nd4j:nd4j-native:1.0.0-SNAPSHOT:macosx-x86_64"
-    testCompile group: 'junit', name: 'junit', version: '4.12'
- 
-}
-```
-
-should work in theory, but it does not. This is due to [a bug in Gradle](https://github.com/gradle/gradle/issues/2882). Gradle with snapshots *and* Maven classifiers appears to be a problem.
-
-Of note: when using the nd4j-native backend with Gradle (and SBT, but not Maven), you need to add openblas as a dependency. We do this for you in the -platform pom. Reference the -platform pom [here](https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-backend-impls/nd4j-native-platform/pom.xml#L19) to double check your dependencies. Note that the versions there are specified as properties; see the ```<properties>``` section of the pom for the current versions of the openblas and javacpp presets required to run nd4j-native.
--- a/docs/deeplearning4j/templates/config-workspaces.md
+++ b/docs/deeplearning4j/templates/config-workspaces.md
@ -1,132 +0,0 @@
---
-title: Workspaces for Memory Management
-short_title: Memory Workspaces
-description: Workspaces are an efficient model for memory paging in DL4J.
-category: Configuration
-weight: 10
---
-
-## What are workspaces?
-
-ND4J offers an additional memory-management model: workspaces. Workspaces allow you to reuse memory for cyclic workloads without relying on the JVM garbage collector to track off-heap memory. In other words, at the end of the workspace loop, the memory contents of all `INDArray`s are invalidated. Workspaces are integrated into DL4J for training and inference.
-
-The basic idea is simple: You can do what you need within a workspace (or spaces), and if you want to get an INDArray out of it (i.e. to move result out of the workspace), you just call `INDArray.detach()` and you'll get an independent `INDArray` copy.
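The workspace idea can be illustrated with a toy model: a single preallocated buffer, allocations that simply advance an offset, and a reset at the end of each loop iteration that makes all earlier allocations stale. This is a minimal sketch and NOT how ND4J implements workspaces; `ToyWorkspace`, `View` and its `detach()` are hypothetical and only mirror the semantics described above.

```java
import java.util.Arrays;

// Toy model of a workspace (NOT ND4J's implementation): one preallocated buffer,
// bump-pointer allocation, and a reset at the end of each loop iteration.
final class ToyWorkspace {
    private final float[] buffer;
    private int used;

    ToyWorkspace(int capacity) {
        this.buffer = new float[capacity];
    }

    // A window into the shared buffer; its contents are only valid until the next reset()
    final class View {
        private final int offset;
        private final int length;

        private View(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }

        float get(int i) {
            return buffer[offset + i];
        }

        void set(int i, float value) {
            buffer[offset + i] = value;
        }

        // Copies the data out of the workspace, similar in spirit to INDArray.detach()
        float[] detach() {
            return Arrays.copyOfRange(buffer, offset, offset + length);
        }
    }

    View allocate(int length) {
        if (used + length > buffer.length) {
            throw new IllegalStateException("workspace capacity exceeded");
        }
        View view = new View(used, length);
        used += length;
        return view;
    }

    // End of a loop iteration: memory is reused and all earlier views become stale
    void reset() {
        used = 0;
    }

    public static void main(String[] args) {
        ToyWorkspace ws = new ToyWorkspace(1024);
        float[] kept = null;
        for (int iter = 0; iter < 3; iter++) {
            View activations = ws.allocate(4);
            for (int i = 0; i < 4; i++) {
                activations.set(i, iter * 10 + i);
            }
            if (iter == 0) {
                kept = activations.detach(); // an independent copy survives later iterations
            }
            ws.reset();
        }
        System.out.println("Detached copy from iteration 0: " + Arrays.toString(kept));
    }
}
```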
-
-## Neural Networks
-
-For DL4J users, workspaces provide better performance out of the box, and are enabled by default from 1.0.0-alpha onwards.
-Thus for most users, no explicit workspaces configuration is required.
-
-To benefit from workspaces, they need to be enabled. You can configure the workspace mode using:
-
- `.trainingWorkspaceMode(WorkspaceMode.SEPARATE)` and/or `.inferenceWorkspaceMode(WorkspaceMode.SINGLE)` in your neural network configuration. 
-
-The difference between **SEPARATE** and **SINGLE** workspaces is a tradeoff between the performance & memory footprint:
-
-* **SEPARATE** is slightly slower, but uses less memory.
-* **SINGLE** is slightly faster, but uses more memory.
-
-That said, it’s fine to use different modes for training and inference (e.g. use SEPARATE for training and SINGLE for inference, since inference only involves a feed-forward pass without backpropagation or updaters).
-
-With workspaces enabled, all memory used during training will be reusable and tracked without the JVM GC interference.
-The only exception is the `output()` method, which uses workspaces (if enabled) internally for the feed-forward loop and then detaches the resulting `INDArray` from the workspaces, providing you with an independent `INDArray` that will be handled by the JVM GC.
-
-***Please note***: After the 1.0.0-alpha release, workspaces in DL4J were refactored - SEPARATE/SINGLE modes have been deprecated, and users should use ENABLED instead.
-
-## Garbage Collector
-
-If your training process uses workspaces, we recommend that you disable (or reduce the frequency of) periodic GC calls. That can be done like so:
-
-```java
-// this will limit frequency of gc calls to 5000 milliseconds
-Nd4j.getMemoryManager().setAutoGcWindow(5000);
-
-// OR you could totally disable it
-Nd4j.getMemoryManager().togglePeriodicGc(false);
-```
-
-Put that somewhere before your `model.fit(...)` call.
-
-## ParallelWrapper & ParallelInference
-
-For `ParallelWrapper`, the workspace-mode configuration option was also added. As such, each of the trainer threads will use a separate workspace attached to the designated device.
-
-
-```java
-ParallelWrapper wrapper = new ParallelWrapper.Builder(model)
-      // DataSets prefetching options. Buffer size per worker.
-      .prefetchBuffer(8)
-
-      // set number of workers equal to number of GPUs.
-      .workers(2)
-
-      // rare averaging improves performance but might reduce model accuracy
-      .averagingFrequency(5)
-
-      // if set to TRUE, on every averaging model score will be reported
-      .reportScoreAfterAveraging(false)
-
-      // 3 options here: NONE, SINGLE, SEPARATE
-      .workspaceMode(WorkspaceMode.SINGLE)
-
-      .build();
-```
-
-## Iterators
-
-We provide asynchronous prefetch iterators, `AsyncDataSetIterator` and `AsyncMultiDataSetIterator`, which are usually used internally. 
-
-These iterators optionally use a special, cyclic workspace mode to obtain a smaller memory footprint. The size of the workspace, in this case, will be determined by the memory requirements of the first `DataSet` coming out of the underlying iterator, whereas the buffer size is defined by the user. The workspace will be adjusted if memory requirements change over time (e.g. if you’re using variable-length time series).
-
-***Caution***: If you’re using a custom iterator or the `RecordReader`, please make sure you’re not initializing something huge within the first `next()` call. Do that in your constructor to avoid undesired workspace growth.
-
-***Caution***: When `AsyncDataSetIterator` is used, each `DataSet` should be consumed before calling `next()` again. Do not store `DataSet`s in any way without calling `detach()` first; otherwise, the memory used by the `INDArrays` within the `DataSet` will eventually be overwritten by `AsyncDataSetIterator`.
-
-If for some reason you don’t want your iterator to be wrapped into an asynchronous prefetch (e.g. for debugging purposes), special wrappers are provided: `AsyncShieldDataSetIterator` and `AsyncShieldMultiDataSetIterator`. Basically, those are just thin wrappers that prevent prefetch.
-
-## Evaluation
-
-Usually, evaluation assumes use of the `model.output()` method, which essentially returns an `INDArray` detached from the workspace. In the case of regular evaluations during training, it might be better to use the built-in methods for evaluation. For example:
-
-```
-Evaluation eval = new Evaluation(outputNum);
-ROC roceval = new ROC(outputNum);
-model.doEvaluation(iteratorTest, eval, roceval);
-```
-
-This piece of code will run a single cycle over `iteratorTest` and update both `IEvaluation` implementations (or fewer or more, as your needs require) without any additional `INDArray` allocation.
-
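The benefit of `doEvaluation` comes from accumulating statistics batch by batch instead of keeping every output array around. The following is a minimal sketch of the same idea for plain accuracy; `StreamingAccuracy` is a hypothetical class, not DL4J's `Evaluation`, which tracks considerably more (for example, a full confusion matrix).

```java
// Hypothetical streaming accuracy counter: only two counters are kept, no outputs are stored.
final class StreamingAccuracy {
    private long correct;
    private long total;

    // predictions[i] holds the class scores for example i; labels[i] is its true class index
    void update(float[][] predictions, int[] labels) {
        if (predictions.length != labels.length) {
            throw new IllegalArgumentException("predictions and labels differ in length");
        }
        for (int i = 0; i < predictions.length; i++) {
            int best = 0; // argmax over class scores
            for (int c = 1; c < predictions[i].length; c++) {
                if (predictions[i][c] > predictions[i][best]) {
                    best = c;
                }
            }
            if (best == labels[i]) {
                correct++;
            }
            total++;
        }
    }

    double accuracy() {
        return total == 0 ? Double.NaN : (double) correct / total;
    }

    public static void main(String[] args) {
        StreamingAccuracy acc = new StreamingAccuracy();
        // Two minibatches, processed one at a time as an iterator would deliver them
        acc.update(new float[][] {{0.9f, 0.1f}, {0.2f, 0.8f}}, new int[] {0, 0});
        acc.update(new float[][] {{0.3f, 0.7f}}, new int[] {1});
        System.out.println("Accuracy: " + acc.accuracy());
    }
}
```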
-## Workspace Destruction
-
-In some situations, for example when you are short on RAM, you might want to release all workspaces created outside of your control, e.g. during evaluation or training.
-
-That could be done like so: `Nd4j.getWorkspaceManager().destroyAllWorkspacesForCurrentThread();`
-
-This method will destroy all workspaces that were created within the calling thread. If you've created workspaces in some external threads on your own, you can use the same method in that thread, after the workspaces are no longer needed.
-
-## Workspace Exceptions
-
-If workspaces are used incorrectly (for example, due to a bug in a custom layer or data pipeline), you may see an error message such as:
-```
-org.nd4j.linalg.exception.ND4JIllegalStateException: Op [set] Y argument uses leaked workspace pointer from workspace [LOOP_EXTERNAL]
-For more details, see the ND4J User Guide: nd4j.org/userguide#workspaces-panic
-```
-
-
-## DL4J's LayerWorkspaceMgr
-
-DL4J's Layer API includes the concept of a "layer workspace manager".
-
-The idea with this class is that it allows us to easily and precisely control the location of a given array, given different possible configurations for the workspaces.
-For example, the activations out of a layer may be placed in one workspace during inference, and another during training; this is for performance reasons.
-However, with the LayerWorkspaceMgr design, implementers of layers don't need to worry about this.
-
-What does this mean in practice? Usually it's quite simple...
-* When returning activations (the `activate(boolean training, LayerWorkspaceMgr workspaceMgr)` method), make sure the returned array is defined in `ArrayType.ACTIVATIONS` (i.e., use `LayerWorkspaceMgr.create(ArrayType.ACTIVATIONS, ...)` or similar).
-* When returning activation gradients (`backpropGradient(INDArray epsilon, LayerWorkspaceMgr workspaceMgr)`), similarly return an array defined in `ArrayType.ACTIVATION_GRAD`.
-
-You can also leverage an array defined in any workspace to the appropriate workspace using, for example, `LayerWorkspaceMgr.leverageTo(ArrayType.ACTIVATIONS, myArray)`.
-
-
-Note that if you are *not* implementing a custom layer (and instead just want to perform forward pass for a layer outside of a MultiLayerNetwork/ComputationGraph) you can use `LayerWorkspaceMgr.noWorkspaces()`.
-
--- a/docs/deeplearning4j/templates/contribute.md
+++ b/docs/deeplearning4j/templates/contribute.md
@ -1,63 +0,0 @@
---
-title: Contributor's Guide
-short_title: Contribute
-description: How to contribute to the Eclipse Deeplearning4j source code.
-category: Get Started
-weight: 10
---
-
-## Prerequisites
-
-Before contributing, make sure you know the structure of all of the Eclipse Deeplearning4j libraries. As of early 2018, all libraries live in the Deeplearning4j [monorepo](https://github.com/eclipse/deeplearning4j). These include:
-
-* DeepLearning4J: Contains all of the code for learning neural networks, both on a single machine and distributed.
-* ND4J: “N-Dimensional Arrays for Java”. ND4J is the mathematical backend upon which DL4J is built. All of DL4J’s neural networks are built using the operations (matrix multiplications, vector operations, etc.) in ND4J. ND4J is how DL4J supports both CPU and GPU training of networks, without any changes to the networks themselves. Without ND4J, there would be no DL4J.
-* DataVec: DataVec handles the data import and conversion side of the pipeline. If you want to import images, video, audio or simply CSV data into DL4J, you probably want to use DataVec to do this.
-* Arbiter: Arbiter is a package for (amongst other things) hyperparameter optimization of neural networks. Hyperparameter optimization refers to the process of automating the selection of network hyperparameters (learning rate, number of layers, etc.) in order to obtain good performance.
-
-We also have an extensive examples repository at [dl4j-examples](https://github.com/eclipse/deeplearning4j-examples).
-
-
-## Ways to contribute
-
-There are numerous ways to contribute to DeepLearning4J (and related projects), depending on your interests and experience. Here are some ideas:
-
-* Add new types of neural network layers (for example: different types of RNNs, locally connected networks, etc.)
-* Add a new training feature
-* Bug fixes
-* DL4J examples: Is there an application or network architecture that we don’t have examples for?
-* Testing performance and identifying bottlenecks or areas to improve
-* Improve website documentation (or write tutorials, etc.)
-* Improve the JavaDocs
-
-
-There are a number of different ways to find things to work on. These include:
-
-* Looking at the issue trackers:
-    * https://github.com/eclipse/deeplearning4j/issues
-    * https://github.com/eclipse/deeplearning4j-examples/issues
-* Reviewing our Roadmap
-* Talking to the developers on Gitter, especially our early adopters channel
-* Reviewing recent papers and blog posts on training features, network architectures and applications
-* Reviewing the website and examples: what seems missing, incomplete, or would simply be useful (or cool) to have?
-
-## General guidelines
-
-Before you dive in, there are a few things you need to know, in particular the tools we use:
-
-* Maven: a dependency management and build tool, used for all of our projects. See this for details on Maven.
-* Git: the version control system we use.
-* Project Lombok: a code generation/annotation tool that aims to reduce the amount of ‘boilerplate’ code (i.e., standard repeated code) needed in Java. To work with the source, you’ll need to install the Project Lombok plugin for your IDE.
-* VisualVM: a profiling tool, most useful for identifying performance issues and bottlenecks.
-* IntelliJ IDEA: our IDE of choice, though you may of course use alternatives such as Eclipse and NetBeans. You may find it easier to use the same IDE as the developers in case you run into any issues, but this is up to you.
-
-Things to keep in mind:
-
-* Code should be Java 7 compliant.
-* If you are adding a new method or class, add JavaDocs.
-* You are welcome to add an author tag for significant additions of functionality. This can also help future contributors, in case they need to ask questions of the original author. If multiple authors are present for a class, provide details on who did what (“original implementation”, “added feature x”, etc.).
-* Provide informative comments throughout your code. This helps to keep all code maintainable.
-* Any new functionality should include unit tests (using JUnit) to test your code. This should include edge cases.
-* If you add a new layer type, you must include numerical gradient checks, as per these unit tests. These are necessary to confirm that the calculated gradients are correct.
-* If you are adding significant new functionality, consider also updating the relevant section(s) of the website and providing an example. After all, functionality that nobody knows about (or nobody knows how to use) isn’t that helpful. Adding documentation is definitely encouraged when appropriate, but not strictly required.
-* If you are unsure about something, ask us on Gitter!
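-
-To illustrate what a numerical gradient check does, the plain-Java sketch below compares an analytic gradient against central finite differences, `(f(w + eps) - f(w - eps)) / (2 * eps)`, one parameter at a time. The toy function, `eps` and the tolerance are assumptions chosen for illustration; for real layers, DL4J's own gradient check tests (which use its `GradientCheckUtil` class) remain the reference.
-
-```java
-public class GradientCheckSketch {
-    // Toy function f(w) = sum_i w_i^2 * x_i with fixed inputs x.
-    static double f(double[] w, double[] x) {
-        double s = 0;
-        for (int i = 0; i < w.length; i++) s += w[i] * w[i] * x[i];
-        return s;
-    }
-
-    // Analytic gradient: df/dw_i = 2 * w_i * x_i.
-    static double[] analyticGrad(double[] w, double[] x) {
-        double[] g = new double[w.length];
-        for (int i = 0; i < w.length; i++) g[i] = 2 * w[i] * x[i];
-        return g;
-    }
-
-    // Largest relative error |num - ana| / (|num| + |ana|) over all parameters,
-    // where num is the central-difference estimate for that parameter.
-    static double maxRelError(double[] w, double[] x, double eps) {
-        double[] analytic = analyticGrad(w, x);
-        double worst = 0;
-        for (int i = 0; i < w.length; i++) {
-            double orig = w[i];
-            w[i] = orig + eps;
-            double plus = f(w, x);
-            w[i] = orig - eps;
-            double minus = f(w, x);
-            w[i] = orig; // restore the parameter before moving on
-            double numeric = (plus - minus) / (2 * eps);
-            double denom = Math.max(Math.abs(numeric) + Math.abs(analytic[i]), 1e-12);
-            worst = Math.max(worst, Math.abs(numeric - analytic[i]) / denom);
-        }
-        return worst;
-    }
-
-    public static void main(String[] args) {
-        double[] w = {0.5, -1.2, 2.0};
-        double[] x = {1.0, 3.0, -0.5};
-        double err = maxRelError(w, x, 1e-6);
-        System.out.println(err < 1e-6 ? "gradient check passed" : "gradient check FAILED: " + err);
-    }
-}
-```
-
-Central differences are used here because their truncation error shrinks with the square of `eps`, whereas a one-sided difference only shrinks linearly.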
--- a/docs/deeplearning4j/templates/examples-tour.md
+++ b/docs/deeplearning4j/templates/examples-tour.md
@ -1,285 +0,0 @@
---
-title: Tour of Eclipse Deeplearning4j Examples
-short_title: Examples Tour
-description: Brief tour of available examples in DL4J.
-category: Get Started
-weight: 10
---
-
-## Survey of DeepLearning4j Examples
-
-Deeplearning4j's GitHub repository has many examples to cover its functionality. The [Quick Start Guide](./deeplearning4j-quickstart) shows you how to set up IntelliJ and clone the repository. This page provides an overview of some of those examples.
-
-## DataVec examples
-
-Most of the examples make use of DataVec, a toolkit for preprocessing and cleaning data through normalization, standardization, search and replace, column shuffles and vectorization. Reading raw data and transforming it into a DataSet object for your neural network is often the first step toward training that network. If you're unfamiliar with DataVec, here is a description and some links to useful examples.
-
-### IrisAnalysis.java
-
-This example takes the canonical Iris dataset of the flower species of the same name, whose relevant measurements are sepal length, sepal width, petal length and petal width. It builds a Spark RDD from the relatively small dataset and runs an analysis against it. 
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/datavec-examples/src/main/java/org/datavec/transform/analysis/IrisAnalysis.java)
-
-### BasicDataVecExample.java
-
-This example loads data into a Spark RDD. All DataVec transform operations use Spark RDDs. Here, we use DataVec to filter data, apply time transformations and remove columns.
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/datavec-examples/src/main/java/org/datavec/transform/basic/BasicDataVecExample.java)
-
-### PrintSchemasAtEachStep.java
-
-This example shows the schema-printing tools, which are useful for visualizing the data and checking that the transform code behaves as expected.
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/datavec-examples/src/main/java/org/datavec/transform/debugging/PrintSchemasAtEachStep.java)
-
-### JoinExample.java
-
-You may need to join datasets before passing to a neural network. You can do that in DataVec, and this example shows you how. 
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/datavec-examples/src/main/java/org/datavec/transform/join/JoinExample.java)
-
-### LogDataExample.java
-
-This is an example of parsing log data using DataVec. The obvious use cases are cybersecurity and customer relationship management. 
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/datavec-examples/src/main/java/org/datavec/transform/logdata/LogDataExample.java)
-
-### MnistImagePipelineExample.java
-
-This example is from the video below, which demonstrates the `ParentPathLabelGenerator` and the `ImagePreProcessingScaler`.
-
-<iframe width="560" height="315" src="http://www.youtube.com/embed/GLC8CIoHDnI" frameborder="0" allowfullscreen></iframe>
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/dataExamples/MnistImagePipelineExample.java)
-
-### PreprocessNormalizerExample.java
-
-This example demonstrates preprocessing features available in DataVec.
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/dataexamples/PreprocessNormalizerExample.java)
-
-### CSVExampleEvaluationMetaData.java
-
-Metadata tracking (i.e., seeing where the data for each example comes from) is useful when tracking down malformed data that causes errors and other issues. This example demonstrates the functionality in the `RecordMetaData` class.
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/dataexamples/CSVExampleEvaluationMetaData.java)
-
---
-
-## DeepLearning4J Examples
-
-To build a neural net, you will use either `MultiLayerNetwork` or `ComputationGraph`. Both options work using a Builder interface. A few highlights from the examples are described below. 
-
-### MNIST dataset of handwritten digits
-
-MNIST is the "Hello World" of deep learning: simple, straightforward, and focused on image recognition, a task that neural networks do well.
-
-### MLPMnistSingleLayerExample.java
-
-This is a Single Layer Perceptron for recognizing digits. Note that this pulls the images from a binary package containing the dataset, a rather special case for data ingestion.
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/feedforward/mnist/MLPMnistSingleLayerExample.java)
-
-### MLPMnistTwoLayerExample.java
-
-A two-layer perceptron for MNIST, showing there is more than one useful network for a given dataset. 
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/feedforward/mnist/MLPMnistTwoLayerExample.java)
-
-### Feedforward Examples
-
-Data flows through feed-forward neural networks in a single pass from input via hidden layers to output.
-
-These networks can be used for a wide range of tasks depending on how they are configured. Along with image classification over MNIST data, this directory has examples demonstrating regression, classification, and anomaly detection.
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/tree/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/feedforward)
-
-### Convolutional Neural Networks
-
-Convolutional Neural Networks are mainly used for image recognition, although they apply to sound and text as well. 
-
-### AnimalsClassification.java
-
-This example can be run using either LeNet or AlexNet. 
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/convolution/AnimalsClassification.java)
-
---
-
-## Saving and Loading Models
-
-Training a network over a large volume of training data takes time. Fortunately, you can save a trained model and
-load the model for later training or inference.
-
-### SaveLoadComputationGraph.java
-
-This demonstrates saving and loading a network built using the `ComputationGraph` class.
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/misc/modelsaving/SaveLoadComputationGraph.java)
-
-### SaveLoadMultiLayerNetwork.java
-
-Demonstrates saving and loading a Neural Network built with the class MultiLayerNetwork.
-
-### Saving/loading a trained model and passing it new input 
-
-Our video series shows code that includes saving and loading models, as well as inference. 
-
-[Our YouTube channel](https://www.youtube.com/channel/UCa-HKBJwkfzs4AgZtdUuBXQ)
-
---
-
-## Custom Loss Functions and Layers
-
-Do you need to add a Loss Function that is not available or prebuilt yet? Check out these examples.
-
-### CustomLossExample.java
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/misc/lossfunctions/CustomLossExample.java)
-
-### CustomLossL1L2.java
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/misc/lossfunctions/CustomLossL1L2.java)
-
-### Custom Layer
-
-Do you need to add a layer with features that aren't available in DeepLearning4J core? This example shows where to begin.
-
-### CustomLayerExample.java
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/misc/customlayers/CustomLayerExample.java)
-
---
-
-## Natural Language Processing
-
-Neural Networks for NLP? We have those, too.
-
-### GloVe 
-
-Global Vectors for Word Representation are useful for detecting relationships between words. 
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/nlp/glove/GloVeExample.java)
-
-### Paragraph Vectors
-
-A vector representation of variable-length pieces of text, such as sentences, paragraphs and documents. Described [here](https://cs.stanford.edu/~quocle/paragraph_vector.pdf).
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/nlp/paragraphvectors/ParagraphVectorsClassifierExample.java)
-
-### Sequence Vectors
-
-One way to represent sentences is as a sequence of words. 
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/nlp/sequencevectors/SequenceVectorsTextExample.java)
-
-### Word2Vec
-
-Described [here](https://deeplearning4j.org/word2vec.html)
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/nlp/word2vec/Word2VecRawTextExample.java)
-
---
-
-## Data Visualization
-
-t-Distributed Stochastic Neighbor Embedding (t-SNE) is useful for data visualization. We include an example in the NLP section since word similarity visualization is a common use. 
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/nlp/tsne/TSNEStandardExample.java)
-
---
-
-## Recurrent Neural Networks
-
-Recurrent Neural Networks are useful for processing time series data or other sequentially fed data like video. 
-
-The examples folder for Recurrent Neural Networks has the following:
-
-### BasicRNNExample.java
-
-An RNN learns a string of characters.
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/recurrent/basic/BasicRNNExample.java)
-
-### GravesLSTMCharModellingExample.java
-
-Takes the complete works of Shakespeare as a sequence of characters and trains a neural net to generate "Shakespeare" one character at a time.
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/recurrent/character/GravesLSTMCharModellingExample.java)
-
-### SingleTimestepRegressionExample.java
-
-Regression with an LSTM (Long Short Term Memory) Recurrent Neural Network. 
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/recurrent/regression/SingleTimestepRegressionExample.java)
-
-### AdditionRNN.java
-
-This example trains a neural network to do addition. 
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/recurrent/seq2seq/AdditionRNN.java)
-
-### RegressionMathFunctions.java
-
-This example trains a neural network to perform various math operations. 
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/feedforward/regression/RegressionMathFunctions.java)
-
-### UCISequenceClassificationExample.java
-
-Uses a publicly available time series dataset with six classes (cyclic, up-trending, etc.) and trains an RNN to classify each series.
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/recurrent/seqclassification/UCISequenceClassificationExample.java)
-
-### VideoClassificationExample.java
-
-How do autonomous vehicles distinguish between a pedestrian, a stop sign and a green light? A complex neural net using convolutional and recurrent layers is trained on a set of training videos. The trained network is then fed live onboard video, and decisions based on its object detection determine the vehicle's actions.
-
-This example is similar, but simplified. It combines convolutional, max pooling, dense (feed forward) and recurrent (LSTM) layers to classify frames in a video. 
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/recurrent/video/VideoClassificationExample.java)
-
-### SentimentExampleIterator.java
-
-This sentiment analysis example classifies sentiment as positive or negative using word vectors and a Recurrent Neural Network. 
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/recurrent/word2vecsentiment/Word2VecSentimentRNN.java)
-
---
-
-## Distributed Training on Spark
-
-DeepLearning4j supports using a Spark Cluster for network training. Here are the examples. 
-
-### MnistMLPExample.java
-
-This is an example of a Multi-Layer Perceptron training on the MNIST data set of handwritten digits.
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-spark-examples/dl4j-spark/src/main/java/org/deeplearning4j/mlp/MnistMLPExample.java)
-
-### SparkLSTMCharacterExample.java
-
-An LSTM recurrent network trained on Spark.
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-spark-examples/dl4j-spark/src/main/java/org/deeplearning4j/rnn/SparkLSTMCharacterExample.java)
-
---
-
-## ND4J Examples
-
-ND4J is a tensor processing library; it can be thought of as NumPy for the JVM. Neural networks work by processing and updating multidimensional arrays of numeric values. In a typical neural network application, you use DataVec (through classes such as `RecordReader`) to ingest the data and convert it to numeric form. To pass that data into a neural network, you typically use `RecordReaderDataSetIterator`, which returns `DataSet` objects. A `DataSet` consists of an NDArray of the input features and an NDArray of the labels.
-
-The learning algorithms and loss functions are executed as ND4J operations. 
-
-### Basic ND4J examples
-
-This is a directory with examples for creating and manipulating NDArrays.
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/tree/master/nd4j-examples/src/main/java/org/nd4j/examples)
-
---
-
-## Reinforcement Learning Examples
-
-Deep learning algorithms have learned to play Space Invaders and Doom using reinforcement learning. DeepLearning4J/RL4J examples of Reinforcement Learning are available here: 
-
-[Show me the code](https://github.com/eclipse/deeplearning4j-examples/tree/master/rl4j-examples)
--- a/docs/deeplearning4j/templates/quickstart.md
+++ b/docs/deeplearning4j/templates/quickstart.md
@ -1,251 +0,0 @@
---
-title: Deeplearning4j Quickstart
-short_title: Quickstart
-description: Quickstart for Java using Maven
-category: Get Started
-weight: 1
---
-
-## Get started
-
-This is everything you need to run DL4J examples and begin your own projects.
-
-We recommend that you join our [Gitter Live Chat](https://gitter.im/deeplearning4j/deeplearning4j). Gitter is where you can request help and give feedback, but please do use this guide before asking questions we've answered below. If you are new to deep learning, we've included [a road map for beginners](./deeplearning4j-beginners) with links to courses, readings and other resources.
-
-### A Taste of Code
-
-Deeplearning4j is a domain-specific language to configure deep neural networks, which are made of multiple layers. Everything starts with a `MultiLayerConfiguration`, which organizes those layers and their hyperparameters.
-
-Hyperparameters are variables that determine how a neural network learns. They include how many times to update the weights of the model, how to initialize those weights, which activation function to attach to the nodes, which optimization algorithm to use, and how fast the model should learn. This is what one configuration would look like:
-
-```java
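-import org.deeplearning4j.nn.api.OptimizationAlgorithm;
-import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
-import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
-import org.deeplearning4j.nn.weights.WeightInit;
-import org.nd4j.linalg.activations.Activation;
-import org.nd4j.linalg.learning.config.Sgd;
-
-// Global hyperparameters apply to every layer unless a layer overrides them
-MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
-    .weightInit(WeightInit.XAVIER)
-    .activation(Activation.RELU)
-    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
-    .updater(new Sgd(0.05))
-    // ... other hyperparameters
-    .list()
-    // ... layers, added with .layer(...)
-    .build();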
-```
-
-With Deeplearning4j, you add a layer by calling `layer` on the builder returned by `.list()`, specifying its place in the order of layers (the layer at index 0 below is the first layer and receives the input), the number of input and output nodes (`nIn` and `nOut`), and its type, here a `DenseLayer`.
-
-```java
-        .layer(0, new DenseLayer.Builder().nIn(784).nOut(250)
-                .build())
-```
-
-Once you've configured your net, you train the model with `model.fit`.
-
-## Prerequisites
-
-* [Java (developer version)](#Java) 1.7 or later (**Only 64-Bit versions supported**)
-* [Apache Maven](#Maven) (automated build and dependency manager)
-* [IntelliJ IDEA](#IntelliJ) or Eclipse
-* [Git](#Git)
-
-You should have these installed to use this QuickStart guide. DL4J targets professional Java developers who are familiar with production deployments, IDEs and automated build tools. Working with DL4J will be easiest if you already have experience with these.
-
-If you are new to Java or unfamiliar with these tools, read the details below for help with installation and setup. Otherwise, **skip to <a href="#examples">DL4J Examples</a>**.
-
-#### <a name="Java">Java</a>
-
-If you don't have Java 1.7 or later, download the current [Java Development Kit (JDK) here](http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html). To check if you have a compatible version of Java installed, use the following command:
-
-```shell
-java -version
-```
-
-Please make sure you have a 64-Bit version of java installed, as you will see an error telling you `no jnind4j in java.library.path` if you decide to try to use a 32-Bit version instead. Make sure the JAVA_HOME environment variable is set.
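-
-If you prefer to check the JVM's bitness from code rather than from the `java -version` output, the sketch below reads two standard system properties. Note that `sun.arch.data.model` is specific to HotSpot-based JVMs and may be absent elsewhere, so the fallback to `os.arch` is a heuristic assumption rather than a guarantee.
-
-```java
-public class JvmBitnessCheck {
-    // Decides whether the JVM is 64-bit from "sun.arch.data.model"
-    // (HotSpot-specific, may be null), falling back to "os.arch".
-    static boolean is64Bit(String dataModel, String osArch) {
-        if (dataModel != null) {
-            return dataModel.equals("64");
-        }
-        return osArch != null && osArch.contains("64");
-    }
-
-    public static void main(String[] args) {
-        String dataModel = System.getProperty("sun.arch.data.model");
-        String osArch = System.getProperty("os.arch");
-        System.out.println("os.arch=" + osArch + ", 64-bit JVM: " + is64Bit(dataModel, osArch));
-    }
-}
-```
-
-Run it with the same JDK your IDE project uses, since that is the JVM into which the native ND4J libraries will be loaded.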
-
-#### <a name="Maven">Apache Maven</a>
-
-Maven is a dependency management and automated build tool for Java projects. It works well with IDEs such as IntelliJ and lets you install DL4J project libraries easily. [Install or update Maven](https://maven.apache.org/download.cgi) to the latest release following [their instructions](https://maven.apache.org/install.html) for your system. To check which version of Maven you have installed, enter the following:
-
-```shell
-mvn --version
-```
-
-If you are working on a Mac, you can simply enter the following into the command line:
-
-```shell
-brew install maven
-```
-
-Maven is widely used among Java developers and it's pretty much mandatory for working with DL4J. If you come from a different background, and Maven is new to you, check out [Apache's Maven overview](http://maven.apache.org/what-is-maven.html) and our [introduction to Maven for non-Java programmers](./deeplearning4j-config-maven), which includes some additional troubleshooting tips. [Other build tools](./deeplearning4j-config-buildtools) such as Ivy and Gradle can also work, but we support Maven best.
-
-* [Paul Dubs' guide to maven](http://www.dubs.tech/guides/maven-essentials/)
-
-* [Maven In Five Minutes](http://maven.apache.org/guides/getting-started/maven-in-five-minutes.html)
-
-#### <a name="IntelliJ">IntelliJ IDEA</a>
-
-An Integrated Development Environment ([IDE](http://encyclopedia.thefreedictionary.com/integrated+development+environment)) allows you to work with our API and configure neural networks in a few steps. We strongly recommend using [IntelliJ](https://www.jetbrains.com/idea/download/), which communicates with Maven to handle dependencies. The [community edition of IntelliJ](https://www.jetbrains.com/idea/download/) is free.
-
-There are other popular IDEs such as [Eclipse](http://books.sonatype.com/m2eclipse-book/reference/creating-sect-importing-projects.html) and [Netbeans](http://wiki.netbeans.org/MavenBestPractices). However, IntelliJ is preferred, and using it will make finding help on [Gitter Live Chat](https://gitter.im/deeplearning4j/deeplearning4j) easier if you need it.
-
-#### <a name="Git">Git</a>
-
-Install the [latest version of Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git). If you already have Git, you can fetch the latest Git source code using Git itself (you will then need to build and install it):
-
-```shell
-$ git clone git://git.kernel.org/pub/scm/git/git.git
-```
-
-Upgrading to macOS Mojave can break Git, producing the following error message:
-
-```
-xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
-```
-
-This can be fixed by reinstalling the Xcode command-line tools:
-
-```shell
-xcode-select --install
-```
-
-## <a name="examples">DL4J Examples in a Few Easy Steps</a>
-
-1. Use the command line to enter the following:
-
-```shell
-$ git clone https://github.com/eclipse/deeplearning4j-examples.git
-$ cd deeplearning4j-examples/dl4j-examples/
-$ mvn clean install
-```
-
-2. Open IntelliJ and choose Import Project. Then select the main 'dl4j-examples' directory. (Note: the example in the illustration below refers to an outdated repository named dl4j-0.4-examples. However, the repository that you will download and install will be called dl4j-examples).
-
-![select directory](/images/guide/Install_IntJ_1.png)
-
-3. Choose 'Import project from external model' and ensure that Maven is selected.
-![import project](/images/guide/Install_IntJ_2.png)
-
-4. Continue through the wizard's options. Select the SDK that begins with `jdk`. (You may need to click on a plus sign to see your options...) Then click Finish. Wait a moment for IntelliJ to download all the dependencies. You'll see the horizontal bar working on the lower right.
-
-5. Pick an example from the file tree on the left.
-![run IntelliJ example](/images/guide/Install_IntJ_3.png)
-Right-click the file to run.
-
-## Using DL4J In Your Own Projects: Configuring the POM.xml File
-
-To run DL4J in your own projects, we highly recommend using Maven for Java users, or a tool such as SBT for [Scala](https://github.com/SkymindIO/SKIL_Examples/blob/master/skil_example_notebooks/scala/uci_quickstart_notebook.scala). The basic set of dependencies and their versions are shown below. This includes:
-
-* `deeplearning4j-core`, which contains the neural network implementations
-* `nd4j-native-platform`, the CPU version of the ND4J library that powers DL4J
-* `datavec-api`, the API of DataVec, our library for vectorizing and loading data
-
-Every Maven project has a POM file. Here is [how the POM file should appear](https://github.com/eclipse/deeplearning4j-examples/blob/master/pom.xml) when you run your examples.
-
-Within IntelliJ, you will need to choose the first Deeplearning4j example you're going to run. We suggest `MLPClassifierLinear`, as you will almost immediately see the network classify two groups of data in our UI. The file on [Github can be found here](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/feedforward/classification/MLPClassifierLinear.java).
-
-To run the example, right-click it and select the green button in the drop-down menu. You will see, in IntelliJ's bottom window, a series of scores. The rightmost number is the error score for the network's classifications. If your network is learning, then that number will decrease over time with each batch it processes. At the end, this window will tell you how accurate your neural-network model has become:
-
-![mlp classifier results](/images/guide/mlp_classifier_results.png)
-
-In another window, a graph will appear, showing you how the multilayer perceptron (MLP) has classified the data in the example. It will look like this:
-
-![mlp classifier viz](/images/guide/mlp_classifier_viz.png)
-
-Congratulations! You just trained your first neural network with Deeplearning4j.
-
-## Next Steps
-
-1. Join us on Gitter. We have three big community channels.
-  * [DL4J Live Chat](https://gitter.im/deeplearning4j/deeplearning4j) is the main channel for all things DL4J. Most people hang out here.
-  * [Tuning Help](https://gitter.im/deeplearning4j/deeplearning4j/tuninghelp) is for people just getting started with neural networks. Beginners please visit us here!
-  * [Early Adopters](https://gitter.im/deeplearning4j/deeplearning4j/earlyadopters) is for those who are helping us vet and improve the next release. WARNING: This is for more experienced folks.
-2. Read the [introduction to deep neural networks](https://skymind.ai/wiki/neural-network).
-3. Check out the more detailed [Comprehensive Setup Guide](./deeplearning4j-quickstart).
-4. Browse the [DL4J documentation](./).
-5. **Python folks**: If you plan to run benchmarks on Deeplearning4j comparing it to well-known Python framework [x], please read [these instructions](./deeplearning4j-benchmark) on how to optimize heap space, garbage collection and ETL on the JVM. By following them, you will see at least a *10x speedup in training time*.
-
-### Additional links
-
-* [Deeplearning4j artifacts on Maven Central](http://search.maven.org/#search%7Cga%7C1%7Cdeeplearning4j)
-* [ND4J artifacts on Maven Central](http://search.maven.org/#search%7Cga%7C1%7Cnd4j)
-* [Datavec artifacts on Maven Central](http://search.maven.org/#search%7Cga%7C1%7Cdatavec)
-* [Scala code for UCI notebook](https://github.com/SkymindIO/SKIL_Examples/blob/master/skil_example_notebooks/scala/uci_quickstart_notebook.scala)
-
-### Troubleshooting
-
-**Q:** I'm using 64-bit Java on Windows and still get the `no jnind4j in java.library.path` error.
-
-**A:** You may have incompatible DLLs on your PATH. To tell DL4J to ignore them, add the following VM option (in IntelliJ: Run -> Edit Configurations -> VM Options):
-
-```
--Djava.library.path=""
-```
-
-**Q:** **SPARK ISSUES** I am running the examples and having issues with the Spark-based examples, such as distributed training or DataVec transform options.
-
-**A:** You may be missing some dependencies that Spark requires. See this [Stack Overflow discussion](https://stackoverflow.com/a/38735202/3892515) for potential dependency issues. Windows users may also need `winutils.exe` from Hadoop.
-
-Download `winutils.exe` from https://github.com/steveloughran/winutils and place it in a `bin` folder inside a Hadoop directory (for example, `C:\hadoop\bin\winutils.exe`), then set the `HADOOP_HOME` environment variable to that directory (`C:\hadoop`). If `HADOOP_HOME` is not set, Hadoop looks for `null/bin/winutils.exe`, which is why that path can appear in error messages.
-
-### Troubleshooting: Debugging UnsatisfiedLinkError on Windows
-
-Windows users might be seeing something like:
-
-```
-Exception in thread "main" java.lang.ExceptionInInitializerError
-at org.deeplearning4j.nn.conf.NeuralNetConfiguration$Builder.seed(NeuralNetConfiguration.java:624)
-at org.deeplearning4j.examples.feedforward.anomalydetection.MNISTAnomalyExample.main(MNISTAnomalyExample.java:46)
-Caused by: java.lang.RuntimeException: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html
-at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5556)
-at org.nd4j.linalg.factory.Nd4j.<clinit>(Nd4j.java:189)
-... 2 more
-Caused by: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html
-at org.nd4j.linalg.factory.Nd4jBackend.load(Nd4jBackend.java:259)
-at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5553)
-... 3 more
-```
-
-If that is the issue, see [this page](https://github.com/bytedeco/javacpp-presets/wiki/Debugging-UnsatisfiedLinkError-on-Windows#using-dependency-walker). When following its instructions, substitute "Nd4jCpu" for the library name used in its examples.
-
-### Eclipse setup without Maven
-
-We recommend and use Maven and IntelliJ. If you prefer Eclipse and would rather not use Maven, this [blog post](http://electronsfree.blogspot.com/2016/10/how-to-setup-dl4j-project-with-eclipse.html) walks you through an Eclipse configuration.
-
-## Quickstart template
-
-Now that you've learned how to run the different examples, we've made a template available for you that has a basic EMNIST trainer with early stopping and evaluation code.
-
-The Quickstart template is available at [https://github.com/deeplearning4j/dl4j-quickstart](https://github.com/deeplearning4j/dl4j-quickstart).
-
-To use the template:
-
-1. Clone the repository to your local machine: `git clone https://github.com/deeplearning4j/dl4j-quickstart.git`
-2. Import the `dl4j-quickstart` main folder into IntelliJ.
-3. Start coding!
-
-## More about Eclipse Deeplearning4j
-
-Deeplearning4j is a framework that lets you pick and choose the components you need, with everything available from the start. We're not TensorFlow (a low-level numerical computing library with automatic differentiation) or PyTorch. Deeplearning4j has several subprojects that make it easy-ish to build end-to-end applications.
-
-If you'd like to deploy models to production, you might like our [model import from Keras](./keras-import-get-started).
-
-Deeplearning4j has several submodules. These range from a visualization UI to distributed training on Spark. For an overview of these modules, please look at the [**Deeplearning4j examples on GitHub**](https://github.com/eclipse/deeplearning4j-examples).
-
-To get started with a simple desktop app, you need two things: An [nd4j backend](http://nd4j.org/backend.html) and `deeplearning4j-core`. For more code, see the [simpler examples submodule](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/pom.xml#L64).
-
-If you want a flexible deep-learning API, there are two ways to go: you can use ND4J standalone (see our [ND4J examples](https://github.com/eclipse/deeplearning4j-examples/tree/master/nd4j-examples)), or you can use the [computation graph API](http://deeplearning4j.org/compgraph).
-
-If you want distributed training on Spark, see our [Spark page](http://deeplearning4j.org/spark).
-Keep in mind that we cannot set up Spark for you. Setting up distributed Spark and GPUs is largely up to you; Deeplearning4j simply deploys as a JAR file on an existing Spark cluster.
-
-If you want Spark with GPUs, we recommend [Spark with Mesos](https://spark.apache.org/docs/latest/running-on-mesos.html).
-
-If you want to deploy on mobile, you can see our [Android page](./deeplearning4j-android).
-
-We deploy natively optimized code for various hardware architectures. Like everybody else, we implement the underlying loops in C++.
-For details, please see our [C++ framework libnd4j](https://github.com/eclipse/deeplearning4j/tree/master/libnd4j).
-
-Deeplearning4j has two other notable components:
-
-* [Arbiter: hyperparameter optimization and model evaluation](./arbiter-overview)
-* [DataVec: built-in ETL for machine-learning data pipelines](./datavec-overview)
-
-Deeplearning4j is meant to be an end-to-end platform for building real applications, not just a tensor library with automatic differentiation. If you want a tensor library with autodiff, please see ND4J and [SameDiff](https://github.com/eclipse/deeplearning4j/tree/master/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/autodiff). SameDiff is still in alpha, but if you want to contribute, please join our [live chat on Gitter](https://gitter.im/deeplearning4j/deeplearning4j).
-
-Lastly, if you are benchmarking Deeplearning4j, please consider coming to our live chat and asking for tips. Deeplearning4j has [all the knobs](./deeplearning4j-config-gpu-cpu), but some may not work exactly as they do in Python frameworks. For some applications you have to build Deeplearning4j from source.
--- a/docs/deeplearning4j/templates/troubleshooting-training.md
+++ b/docs/deeplearning4j/templates/troubleshooting-training.md
@ -1,140 +0,0 @@
---
-title: Troubleshooting
-short_title: Troubleshooting
-description: Understanding common errors like NaNs and tuning hyperparameters.
-category: Tuning & Training
-weight: 0
---
-
-## Troubleshooting Neural Net Training
-
-Neural networks can be difficult to tune. If the network hyperparameters are poorly chosen, the network may learn slowly, or perhaps not at all. This page aims to provide some baseline steps you should take when tuning your network.
-
-Many of these tips have already been discussed in the academic literature. Our purpose is to consolidate them in one site and express them as clearly as possible.
-
-## Contents
-
-* <a href="#normalization">Data Normalization</a>
-* <a href="#weight">Weight Initialization</a>
-* <a href="#epochs">Epochs and Iterations</a>
-* <a href="#lrate">Learning Rate</a>
-* <a href="#activation">Activation Function</a>
-* <a href="#loss">Loss Function</a>
-* <a href="#regularization">Regularization</a>
-* <a href="#minibatch">Minibatch Size</a>
-* <a href="#updater">Updater and Optimization Algorithm</a>
-* <a href="#normalization">Gradient Normalization</a>
-* <a href="#rnn">Recurrent Neural Networks</a>
-* <a href="#dbn">Deep Belief Network</a>
-* <a href="#rbm">Restricted Boltzmann Machines</a>
-* <a href="#NaN">NaN, Not a Number issues</a>
-
-
-## <a name="normalization">Data Normalization</a>
-
-What is the distribution of your data? Are you scaling it properly? As a general rule:
-
-* For continuous values: you want these to be in the range of -1 to 1 or 0 to 1, or distributed normally with mean 0 and standard deviation 1. This does not have to be exact, but keeping your inputs approximately in this range can help during training. Scale down large inputs, and scale up small inputs.
-* For discrete classes (and for the output of classification problems), generally use a one-hot representation. That is, if you have 3 classes, each example is represented as [1,0,0], [0,1,0] or [0,0,1] for the first, second and third class respectively.
-
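-Note that it's very important to use exactly the same normalization for both the training data and the test data: compute the normalization statistics (such as the mean and standard deviation) on the training data, then apply those same statistics to the test data.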
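-
-The rules above can be illustrated with a small Python sketch that uses only the standard library. It is not DL4J's API (DL4J users would normally use an ND4J normalizer such as `NormalizerStandardize`), and `fit_standardizer`, `apply_standardizer` and `one_hot` are hypothetical helper names. The key point is that the mean and standard deviation are computed on the training data only and then reused unchanged for the test data.
-
-```python
-import math
-
-def fit_standardizer(column):
-    """Compute mean and standard deviation from TRAINING values only."""
-    mean = sum(column) / len(column)
-    var = sum((x - mean) ** 2 for x in column) / len(column)
-    std = math.sqrt(var) or 1.0  # guard against a constant column
-    return mean, std
-
-def apply_standardizer(column, mean, std):
-    """Scale values using statistics fitted on the training set."""
-    return [(x - mean) / std for x in column]
-
-def one_hot(label, num_classes):
-    """Encode an integer class label as a one-hot vector."""
-    vec = [0] * num_classes
-    vec[label] = 1
-    return vec
-
-train = [10.0, 20.0, 30.0, 40.0]
-test = [25.0, 50.0]
-
-mean, std = fit_standardizer(train)                 # fitted on training data only
-train_scaled = apply_standardizer(train, mean, std)
-test_scaled = apply_standardizer(test, mean, std)   # same mean/std reused
-print(test_scaled)
-print(one_hot(1, 3))  # [0, 1, 0]
-```
-
-Fitting a second standardizer on the test set would silently shift the inputs the network sees at evaluation time, which is exactly the mismatch the note above warns about.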
-
-## <a name="weight">Weight Initialization</a>
-
-Deeplearning4j supports several kinds of weight initialization, which you set using the `.weightInit(WeightInit)` method in your configuration.
-
-You need to make sure your weights are neither too big nor too small. Xavier weight initialization (`WeightInit.XAVIER`) is usually a good choice for this. For networks with rectified linear (relu) or leaky relu activations, RELU weight initialization (`WeightInit.RELU`) is a sensible choice.
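-
-To see why these schemes keep weights in a sensible range, here is a Python sketch of the commonly cited Gaussian variants: Xavier (Glorot) initialization with variance 2 / (fan_in + fan_out), and He initialization, the usual choice for ReLU, with variance 2 / fan_in. These are the textbook formulas, assumed here for illustration; check the DL4J `WeightInit` documentation for the exact distribution each option uses. The helper names are hypothetical.
-
-```python
-import math
-import random
-
-def xavier_std(fan_in, fan_out):
-    """Standard deviation for Gaussian Xavier/Glorot initialization."""
-    return math.sqrt(2.0 / (fan_in + fan_out))
-
-def he_std(fan_in):
-    """Standard deviation for Gaussian He (ReLU) initialization."""
-    return math.sqrt(2.0 / fan_in)
-
-def init_weights(fan_in, fan_out, std, seed=0):
-    """Draw a fan_in x fan_out weight matrix from N(0, std^2)."""
-    rng = random.Random(seed)
-    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]
-
-# Example: a 784 -> 500 layer, as for flattened 28x28 MNIST images
-std = he_std(784)
-w = init_weights(784, 500, std)
-flat = [x for row in w for x in row]
-sample_std = math.sqrt(sum(x * x for x in flat) / len(flat))
-print(round(std, 4), round(sample_std, 4))  # the two values should nearly match
-```
-
-Both formulas shrink the weights as the layer gets wider, which keeps the variance of each layer's output roughly constant instead of letting it explode or vanish with depth.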
-
-## <a name="epochs">Number of Epochs and Number of Iterations</a>
-
-An epoch is one full pass over the data set. An iteration is one parameter update, usually computed from one minibatch.
-
-Too few epochs don't give your network enough time to learn good parameters; too many and you might overfit the training data. One way to choose the number of epochs is to use early stopping. [Early stopping](http://deeplearning4j.org/earlystopping) can also help to prevent the neural network from overfitting (i.e., can help the net generalize better to unseen data).
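-
-The core of early stopping fits in a few lines. This Python sketch tracks the validation loss after each epoch, remembers the best epoch, and stops once the loss has not improved for `patience` consecutive epochs. The function name and the `patience` parameter are illustrative assumptions, not DL4J's early stopping API.
-
-```python
-def early_stopping_epoch(val_losses, patience=3):
-    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses.
-
-    Training stops after `patience` epochs without improvement on the best loss.
-    """
-    best_loss = float("inf")
-    best_epoch = -1
-    epochs_without_improvement = 0
-    for epoch, loss in enumerate(val_losses):
-        if loss < best_loss:
-            best_loss = loss
-            best_epoch = epoch
-            epochs_without_improvement = 0
-        else:
-            epochs_without_improvement += 1
-            if epochs_without_improvement >= patience:
-                return best_epoch, epoch  # stop here; keep the model from best_epoch
-    return best_epoch, len(val_losses) - 1
-
-# Validation loss falls, then rises as the network starts to overfit
-losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.44, 0.47, 0.50]
-print(early_stopping_epoch(losses, patience=3))  # (3, 6)
-```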
-
-## <a name="lrate">Learning Rate</a>
-
-The learning rate is one of the most important hyperparameters, if not the most important. If it is too large or too small, your network may learn very poorly, very slowly, or not at all. Typical values for the learning rate are in the range of 0.1 to 1e-6, though the optimal learning rate is usually specific to the data and the network architecture. Some simple advice is to start by trying three different learning rates (1e-1, 1e-3, and 1e-6) to get a rough idea of what it should be, before tuning it further. Ideally, run models with different learning rates simultaneously to save time.
-
-The usual approach to selecting an appropriate learning rate is to use [DL4J's visualization interface](http://deeplearning4j.org/visualization) to visualize the progress of training. You want to pay attention to both the loss over time, and the ratio of update magnitudes to parameter magnitudes (a ratio of approximately 1:1000 is a good place to start). For more information on tuning the learning rate, see [this link](http://cs231n.github.io/neural-networks-3/#baby).
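-
-The update-to-parameter ratio can also be computed by hand. This Python sketch divides the mean absolute update by the mean absolute parameter value for one layer and reports the base-10 logarithm, so a value near -3 corresponds to the suggested 1:1000 ratio. Summarizing with mean magnitudes is an assumption made for illustration; the DL4J UI may aggregate differently, and `update_param_ratio` is a hypothetical name.
-
-```python
-import math
-
-def update_param_ratio(params, updates):
-    """Ratio of mean |update| to mean |parameter| for one layer."""
-    mean_update = sum(abs(u) for u in updates) / len(updates)
-    mean_param = sum(abs(p) for p in params) / len(params)
-    return mean_update / mean_param
-
-params = [0.5, -0.2, 0.3, -0.4]
-# For plain SGD, update = learning_rate * gradient for one iteration
-updates = [0.0005, -0.0002, 0.0003, -0.0004]
-
-ratio = update_param_ratio(params, updates)
-print(ratio, math.log10(ratio))  # ratio close to 0.001, log10 close to -3
-```
-
-As a rough heuristic, a log ratio well above -3 suggests the learning rate may be too high, and one well below it suggests the rate may be too low.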
-
-For training neural networks in a distributed manner, you may need a different (frequently higher) learning rate compared to training the same network on a single machine.
-
-### Policies and Scheduling
-
-You can optionally define a learning rate policy for your neural network. A policy changes the learning rate over time; lowering it later in training lets the optimizer take smaller steps and settle closer to a minimum. A common policy is a schedule, which sets the learning rate according to the iteration or epoch count. See the [LeNet example](https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/convolution/LenetMnistExample.java) for a learning rate schedule used in practice.
-
-Note that using multiple GPUs affects your schedule. For example, with two GPUs you need to divide the iterations in your schedule by 2, since the throughput of your training process doubles and the learning rate schedule only applies to the local GPU.
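-
-A step schedule maps iteration counts to learning rates. The Python 3 sketch below (hypothetical helper names, not DL4J's schedule classes) looks up the rate for a given iteration and applies the multi-GPU adjustment described above by dividing every iteration boundary by the number of GPUs.
-
-```python
-def scaled_schedule(schedule, num_gpus):
-    """Divide each iteration boundary by the number of GPUs (integer division)."""
-    return {iteration // num_gpus: lr for iteration, lr in schedule.items()}
-
-def learning_rate_at(schedule, iteration):
-    """Return the rate for the latest boundary <= iteration."""
-    current = None
-    for boundary in sorted(schedule):
-        if boundary <= iteration:
-            current = schedule[boundary]
-    return current
-
-# Single-GPU schedule: 0.01 from iteration 0, 0.005 from 1000, 0.001 from 3000
-schedule = {0: 0.01, 1000: 0.005, 3000: 0.001}
-two_gpu = scaled_schedule(schedule, num_gpus=2)
-
-print(two_gpu)                           # {0: 0.01, 500: 0.005, 1500: 0.001}
-print(learning_rate_at(schedule, 1600))  # 0.005
-print(learning_rate_at(two_gpu, 1600))   # 0.001
-```
-
-With two GPUs each worker sees half as many local iterations for the same amount of data, so the scaled boundaries make the rate drop at the same point in overall training.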
-
-## <a name="activation">Activation Function</a>
-
-There are two aspects to be aware of, with regard to the choice of activation function.
-
-First, the activation function of the hidden (non-output) layers. As a general rule, 'relu' or 'leakyrelu' activations are good choices here. Some other activation functions (tanh, sigmoid, etc.) are more prone to the vanishing gradient problem, which can make learning much harder in deep neural networks. However, the tanh activation function is still commonly used for LSTM layers.
-
-Second, regarding the activation function for the output layer: this is usually application specific. For classification problems, you generally want to use the softmax activation function, combined with the negative log likelihood / MCXENT (multi-class cross entropy). The softmax activation function gives you a probability distribution over classes (i.e., outputs sum to 1.0). For regression problems, the "identity" activation function is frequently a good choice, in conjunction with the MSE (mean squared error) loss function.
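-
-For classification, the softmax/negative log likelihood pairing can be written out directly. This Python sketch uses plain lists rather than any DL4J class. It subtracts the largest logit before exponentiating, a standard trick for numerical stability, and shows that the outputs sum to 1.0.
-
-```python
-import math
-
-def softmax(logits):
-    """Convert raw scores into a probability distribution."""
-    m = max(logits)                      # subtract max for numerical stability
-    exps = [math.exp(z - m) for z in logits]
-    total = sum(exps)
-    return [e / total for e in exps]
-
-def negative_log_likelihood(probs, true_class):
-    """Loss for one example whose label is the integer true_class."""
-    return -math.log(probs[true_class])
-
-probs = softmax([2.0, 1.0, 0.1])
-print(probs, sum(probs))                 # probabilities summing to (about) 1.0
-print(negative_log_likelihood(probs, 0))
-```
-
-The loss is small when the network assigns high probability to the correct class and grows without bound as that probability approaches zero.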
-
-## <a name="loss">Loss Function</a>
-
-Loss functions can be used either in pretraining, to learn better weights, or on the output layer, to train the network for its actual task (for example, classification).
-
-Your net's purpose will determine the loss function you use. For pretraining, choose reconstruction entropy. For classification, use multiclass cross entropy.
-
-## <a name="regularization">Regularization</a>
-
-Regularization methods can help to avoid overfitting during training. Overfitting occurs when the network predicts the training set very well, but makes poor predictions on data the network has never seen. One way to think about overfitting is that the network memorizes the training data (instead of learning the general relationships in it).
-
-Common types of regularization include:
-
-* L1 and L2 regularization penalize large network weights, which keeps weights from becoming too large. Some level of L2 regularization is commonly used in practice. However, if the L1 or L2 regularization coefficients are too high, they may over-penalize the network and stop it from learning. Common values for L2 regularization are 1e-3 to 1e-6.
-* [Dropout](./glossary.html#dropout) is a frequently used regularization method that can be very effective. Dropout is most commonly used with a dropout rate of 0.5.
-* Dropconnect (conceptually similar to dropout, but used much less frequently)
-* Restricting the overall network size (i.e., limiting the number of layers and the size of each layer)
-* [Early stopping](http://deeplearning4j.org/earlystopping)
-
-To use L1/L2/dropout regularization, use `.regularization(true)` followed by `.l1(x)`, `.l2(y)`, and `.dropout(z)` respectively. Note that `z` in `dropout(z)` is the probability of retaining an activation.
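-
-Because `z` is a retain probability rather than a drop probability, it is easy to get backwards. This Python sketch shows one common formulation, inverted dropout: during training, each activation is kept with probability `retain_prob` and scaled by `1 / retain_prob` so that expected activations match test time, when dropout is switched off. It is a generic illustration, not DL4J's implementation.
-
-```python
-import random
-
-def dropout(activations, retain_prob, training=True, rng=random):
-    """Inverted dropout: keep each value with probability retain_prob."""
-    if not training or retain_prob >= 1.0:
-        return list(activations)  # no dropout at test time
-    return [a / retain_prob if rng.random() < retain_prob else 0.0
-            for a in activations]
-
-rng = random.Random(42)
-acts = [1.0] * 10000
-dropped = dropout(acts, retain_prob=0.5, rng=rng)
-kept = sum(1 for a in dropped if a != 0.0)
-print(kept / len(acts))              # roughly half of the activations kept
-print(sum(dropped) / len(dropped))   # mean stays close to 1.0
-```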
-
-## <a name="minibatch">Minibatch Size</a>
-
-A minibatch refers to the number of examples used at a time, when computing gradients and parameter updates. In practice (for all but the smallest data sets), it is standard to break your data set up into a number of minibatches.
-
-The ideal minibatch size will vary. For example, a minibatch size of 10 is frequently too small for GPUs, but can work on CPUs. A minibatch size of 1 will allow a network to train, but will not reap the benefits of parallelism. 32 may be a sensible starting point to try, with minibatches in the range of 16-128 (sometimes smaller or larger, depending on the application and type of network) being common.
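-
-Splitting a data set into minibatches also determines how many iterations make up one epoch: with `n` examples and batch size `b`, there are ceil(n / b) iterations, and the last batch may be smaller. A minimal Python 3 sketch, using a hypothetical `minibatches` helper that shuffles the data once per epoch:
-
-```python
-import math
-import random
-
-def minibatches(examples, batch_size, seed=None):
-    """Yield shuffled minibatches; the final batch may be smaller."""
-    order = list(range(len(examples)))
-    random.Random(seed).shuffle(order)
-    for start in range(0, len(order), batch_size):
-        yield [examples[i] for i in order[start:start + batch_size]]
-
-data = list(range(1000))
-batches = list(minibatches(data, batch_size=32, seed=1))
-print(len(batches), math.ceil(len(data) / 32))  # 32 32
-print(len(batches[-1]))                         # 8
-```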
-
-## <a name="updater">Updater and Optimization Algorithm</a>
-
-In DL4J, the term 'updater' refers to training mechanisms such as momentum, RMSProp, AdaGrad, and others. Using one of these methods can result in much faster network training compared to 'vanilla' stochastic gradient descent. You can set the updater using the `.updater(Updater)` configuration option.
-
-The optimization algorithm is how updates are made, given the gradient. The simplest (and most commonly used) method is stochastic gradient descent (SGD); however, DL4J also provides SGD with line search, conjugate gradient and LBFGS optimization algorithms. These latter algorithms are more powerful than SGD, but considerably more costly per parameter update due to their line search component, and are not used as much in practice. Note that you can in principle combine any updater with any optimization algorithm.
-
-A good default choice in most cases is the stochastic gradient descent optimization algorithm combined with one of the momentum/RMSProp/AdaGrad updaters, with momentum frequently being used in practice. Note that for momentum, the updater is called NESTEROVS (a reference to the Nesterov variant of momentum), and the momentum rate can be set with the `.momentum(double)` option.
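-
-The Nesterov momentum update can be sketched in Python on a one-dimensional quadratic, f(x) = x^2 / 2, whose gradient is simply x. The formulation below is the widely used one from the CS231n notes linked in the learning rate section; it illustrates the technique and is not DL4J's NESTEROVS source. `nesterov_step` and `grad_f` are hypothetical names.
-
-```python
-def nesterov_step(x, v, grad, learning_rate=0.1, momentum=0.9):
-    """One Nesterov momentum update; returns the new (x, v)."""
-    v_prev = v
-    v = momentum * v - learning_rate * grad
-    x = x - momentum * v_prev + (1 + momentum) * v
-    return x, v
-
-def grad_f(x):
-    return x  # gradient of f(x) = x^2 / 2
-
-x, v = 5.0, 0.0
-for _ in range(300):
-    x, v = nesterov_step(x, v, grad_f(x))
-print(x)  # very close to the minimum at 0
-```
-
-The velocity `v` accumulates past gradients, and the `-momentum * v_prev + (1 + momentum) * v` correction is what distinguishes the Nesterov variant from classical momentum.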
-
-## <a name="normalization">Gradient Normalization</a>
-
-When training a neural network, it can sometimes help to apply gradient normalization, to keep the gradients from becoming too large (the so-called exploding gradient problem, common in recurrent neural networks) or too small. This can be applied using the `.gradientNormalization(GradientNormalization)` and `.gradientNormalizationThreshold(double)` methods. For the available gradient normalization strategies, see [GradientNormalization.java](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/GradientNormalization.java). The corresponding tests are [here](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-core/src/test/java/org/deeplearning4j/nn/updater/TestGradientNormalization.java).
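-
-As an example of what a threshold-based strategy does, here is a Python sketch of clipping by L2 norm: if the gradient's L2 norm exceeds the threshold, the whole gradient is rescaled so its norm equals the threshold, preserving its direction. This is similar in spirit to the `ClipL2PerLayer` option of `GradientNormalization`, but it is a generic illustration, not DL4J's code.
-
-```python
-import math
-
-def clip_by_l2_norm(gradient, threshold):
-    """Rescale gradient so its L2 norm is at most threshold."""
-    norm = math.sqrt(sum(g * g for g in gradient))
-    if norm <= threshold or norm == 0.0:
-        return list(gradient)
-    scale = threshold / norm
-    return [g * scale for g in gradient]
-
-grad = [3.0, 4.0]                    # L2 norm 5.0: an "exploding" gradient
-clipped = clip_by_l2_norm(grad, threshold=1.0)
-print(clipped)                       # [0.6, 0.8], up to floating-point rounding
-```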
-
-## <a name="rnn">Recurrent Neural Networks: Truncated Backpropagation through Time</a>
-
-When training recurrent networks on long time series, it is generally advisable to use truncated backpropagation through time. With 'standard' backpropagation through time (the default in DL4J), the cost per parameter update can become prohibitive. For more details, see [this page](http://deeplearning4j.org/usingrnns) and [this glossary entry](./glossary.html#backprop).
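-
-The idea behind truncated BPTT can be seen in how a long sequence is split. Instead of backpropagating through all T time steps at once, the sequence is processed in consecutive segments of length k, with one parameter update per segment and the recurrent state typically carried forward between segments. This Python sketch only computes the segment boundaries; `tbptt_segments` is a hypothetical helper, and `k` plays the role of the truncation length.
-
-```python
-def tbptt_segments(sequence_length, k):
-    """Return (start, end) time-step ranges of length k; the last may be shorter."""
-    return [(start, min(start + k, sequence_length))
-            for start in range(0, sequence_length, k)]
-
-# A 1,000-step time series with a truncation length of 100
-segments = tbptt_segments(1000, 100)
-print(len(segments), segments[0], segments[-1])  # 10 (0, 100) (900, 1000)
-```
-
-Gradients flow back at most `k` steps within each segment, so the cost of each update depends on `k` rather than on the full sequence length.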
-
-## <a name="dbn">Visible/Hidden Unit</a>
-
-When using a deep-belief network, pay close attention to the unit types. An RBM (the component of the DBN used for feature extraction) is stochastic and samples from different probability distributions depending on the visible and hidden unit types you specify.
-
-See Geoff Hinton's definitive work, [A Practical Guide to Training Restricted Boltzmann Machines](https://www.cs.toronto.edu/~hinton/absps/guideTR.pdf), for a list of all of the different probability distributions.
-
-## <a name="rbm">Restricted Boltzmann Machines (RBMs)</a>
-
-When creating hidden layers for autoencoders that perform compression, give them fewer neurons than your input layer. If the number of hidden-layer nodes is too close to the number of input nodes, you risk reconstructing the identity function. Too many hidden-layer neurons increase the likelihood of noise and overfitting. For an input layer of 784, you might choose an initial hidden layer of 500, and a second hidden layer of 250. No hidden layer should have fewer than a quarter of the input layer’s nodes. The output layer simply has one node per label.
-
-Larger datasets require more hidden layers. Facebook’s DeepFace uses nine layers on what we can only presume is an immense corpus. Many smaller datasets might only require three or four hidden layers, with their accuracy decreasing beyond that depth. As a rule, larger data sets contain more variation, which requires more features/neurons for the net to obtain accurate results. Shallow nets with a single hidden layer are often called multilayer perceptrons.
-
-Large datasets require that you pretrain your RBM several times. Only with multiple pretrainings will the algorithm learn to correctly weight features in the context of the dataset. That said, you can run the data in parallel or through a cluster to speed up the pretraining.
-
-## <a name="NaN">NaN, Not a Number Errors</a>
-
-Q. Why is my neural network producing NaN values?
-
-A. Backpropagation involves multiplying many small gradients. Because real numbers are represented with limited precision, values very close to zero cannot be represented and are rounded to zero; this is called arithmetic underflow. Once a value has underflowed to zero, later operations on it (such as dividing by it or taking its logarithm) can produce infinities, and further arithmetic on those (for example, 0 * inf or inf - inf) produces NaN. If your neural network is producing NaNs, the solution is to retune your network to avoid very small gradients. This is more likely to be an issue with deeper neural networks.
-
-You can try using the double data type, but it is usually recommended to retune the network first.
-
-Following the basic tuning tips above and monitoring the results is the way to ensure NaNs no longer show up.
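-
-The underflow-to-NaN chain can be reproduced with ordinary floating-point numbers. In this Python 3 sketch (Python floats are 64-bit doubles; single precision underflows much sooner), a product of small factors becomes exactly zero, the zero then turns into NaN, and a hypothetical `has_nan` helper shows the kind of cheap check you might run on losses or gradients while monitoring training.
-
-```python
-import math
-
-# Multiplying many small factors eventually underflows to exactly 0.0.
-product = 1.0
-for _ in range(400):
-    product *= 1e-3
-print(product)  # 0.0
-
-# In IEEE 754 arithmetic (C, Java, NumPy), log(0) is -infinity;
-# Python's math.log raises ValueError instead, so use -inf directly.
-log_of_zero = -math.inf
-nan_value = log_of_zero * product        # -inf * 0.0 -> nan
-print(nan_value, math.isnan(nan_value))  # nan True
-
-def has_nan(values):
-    """Return True if any value is NaN."""
-    return any(math.isnan(v) for v in values)
-
-print(has_nan([0.1, nan_value, 0.3]))    # True
-```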
--- a/docs/doc_generator.py
+++ b/docs/doc_generator.py
@ -1,292 +0,0 @@
-# -*- coding: utf-8 -*-
-
-################################################################################
-# Copyright (c) 2015-2019 Skymind, Inc.
-#
-# This program and the accompanying materials are made available under the
-# terms of the Apache License, Version 2.0 which is available at
-# https://www.apache.org/licenses/LICENSE-2.0.
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
-# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
-# License for the specific language governing permissions and limitations
-# under the License.
-#
-# SPDX-License-Identifier: Apache-2.0
-################################################################################
-
-import abc
-import re
-import os
-import shutil
-import json
-import sys
-
-
-"""Abstract base class for document generators. Implementations for various programming languages
-need to implement the following six methods:
-
- process_main_docstring
- process_docstring
- render
- get_main_doc_string
- get_constructor_data
- get_public_method_data
-"""
-class BaseDocumentationGenerator:
-
-    __metaclass__ = abc.ABCMeta
-
-    def __init__(self, args):
-        reload(sys)
-        sys.setdefaultencoding('utf8')
-
-        self.out_language = args.out_language
-        self.template_dir = args.templates if self.out_language == 'en' else args.templates + '_' + self.out_language
-        self.project_name = args.project + '/'
-        self.validate_templates()
-
-        self.target_dir = args.sources if self.out_language == 'en' else args.sources + '_' + self.out_language
-        self.language = args.language
-        self.docs_root = args.docs_root
-        self.source_code_path = args.code
-        self.github_root = ('https://github.com/deeplearning4j/deeplearning4j/tree/master/'
-                            + self.source_code_path[3:])
-
-        with open(self.project_name + 'pages.json', 'r') as f:
-            json_pages = f.read()
-        site = json.loads(json_pages)
-        self.pages = site.get('pages', [])
-        self.indices = site.get('indices', [])
-        self.excludes = site.get('excludes', [])
-
-    """Process top class docstring
-    """
-    @abc.abstractmethod
-    def process_main_docstring(self, doc_string):
-        raise NotImplementedError
-
-    """Process method and other docstrings
-    """
-    @abc.abstractmethod
-    def process_docstring(self, doc_string):
-        raise NotImplementedError
-
-    """Takes unformatted signatures and doc strings and returns a properly
-    rendered piece that fits into our markdown layout.
-    """
-    @abc.abstractmethod
-    def render(self, signature, doc_string, class_name, is_method):
-        raise NotImplementedError
-
-
-    """Returns main doc string of class/object in question.
-    """
-    @abc.abstractmethod
-    def get_main_doc_string(self, class_string, class_name):
-        raise NotImplementedError
-
-
-    """Returns doc string and signature data for constructors.
-    """
-    @abc.abstractmethod
-    def get_constructor_data(self, class_string, class_name, use_constructors):
-        raise NotImplementedError
-
-
-    """Returns doc string and signature data for methods
-    in the public API of an object
-    """
-    @abc.abstractmethod
-    def get_public_method_data(self, class_string, includes, excludes):
-        raise NotImplementedError
-
-
-    """Validate language templates
-    """
-    def validate_templates(self):
-        assert os.path.exists(self.project_name + self.template_dir), \
-            'No template folder for language ' + self.out_language
-        # TODO: check if folder structure for 'templates' and 'templates_XX' aligns
-        # TODO: do additional sanity checks to assure different languages are in sync
-
-    """Generate links within documentation.
-    """
-    def class_to_docs_link(self, module_name, class_name):
-        return self.docs_root + module_name.replace('.', '/') + '#' + class_name
-
-    """Generate links to source code.
-    """
-    def class_to_source_link(self, module_name, cls_name):
-        return '[[source]](' + self.github_root + module_name + '/' + cls_name + '.' + self.language + ')'
-
-    """Returns code string as markdown snippet of the respective language.
-    """
-    def to_code_snippet(self, code):
-        return '```' + self.language + '\n' + code + '\n```\n'
-
-    """Returns source code of a class in a module as string.
-    """
-    def inspect_class_string(self, module, cls):
-        return self.read_file(self.source_code_path + module + '/' + cls)
-
-    """Searches for file names within a module to generate an index. The result
-    of this is used to create index.md files for each module in question so as
-    to easily navigate documentation.
-    """
-    def read_index_data(self, data):
-        module_index = data.get('module_index', "")
-        modules = os.listdir(self.project_name + self.target_dir + '/' + module_index)
-        modules = [mod.replace('.md', '') for mod in modules if mod != 'index.md']
-        index_string = ''.join('- [{}](./{})\n'.format(mod.title().replace('-', ' '), mod) for mod in modules if mod)
-        print(index_string)
-        return ['', index_string]
-
-
-    """Grabs page data for each class and allows for iteration in modules and specific classes.
-    """
-    def organize_page_data(self, module, cls, tag, use_constructors, includes, excludes):
-        class_string = self.inspect_class_string(module, cls)
-        class_string = self.get_tag_data(class_string, tag)
-        class_string = class_string.replace('<p>', '').replace('</p>', '')
-        class_name = cls.replace('.' + self.language, '')
-        doc_string, class_string = self.get_main_doc_string(class_string, class_name)
-        constructors, class_string = self.get_constructor_data(class_string, class_name, use_constructors)
-        methods = self.get_public_method_data(class_string, includes, excludes)
-        return module, class_name, doc_string, constructors, methods
-
-
-    """Main workhorse of this script. Inspects source files per class or module and reads
-            - class names
-            - doc strings of classes / objects
-            - doc strings and signatures of methods
-            - doc strings and signatures of constructors
-    Values are returned as nested list, picked up in the main program to write documentation blocks.
-    """
-    def read_page_data(self, data):
-        if data.get('module_index', ""):  # indices are created after pages
-            return []
-        page_data = []
-        classes = []
-
-        includes = data.get('include', [])
-        excludes = data.get('exclude', [])
-
-        use_constructors = data.get('constructors', True)
-        tag = data.get('autogen_tag', '')
-
-        modules = data.get('module', "")
-        if modules:
-            for module in modules:
-                module_files = os.listdir(self.source_code_path + module)
-                print(module_files)
-                for cls in module_files:
-                    if '.' in cls:
-                        module, class_name, doc_string, constructors, methods = self.organize_page_data(module, cls, tag, use_constructors, includes, excludes)
-                        page_data.append([module, class_name, doc_string, constructors, methods])
-
-
-        class_files = data.get('class', "")
-        if class_files:
-            for cls in class_files:
-                classes.append(cls)
-
-        for cls in sorted(classes):
-            module = ""
-            module, class_name, doc_string, constructors, methods = self.organize_page_data(module, cls, tag, use_constructors, includes, excludes)
-            page_data.append([module, class_name, doc_string, constructors, methods])
-
-        return page_data
-
-    """If a tag is present in a source code string, extract everything between
-    tag::<tag>::start and tag::<tag>::end.
-    """
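-    def get_tag_data(self, class_string, tag):
-        if not tag:
-            return class_string
-        start_tag = 'tag::' + tag + '::start'
-        end_tag = 'tag::' + tag + '::end'
-        has_start = start_tag in class_string
-        has_end = end_tag in class_string
-        if has_start and not has_end:
-            print("Warning: Start tag, but no end tag found for tag: ", tag)
-            return class_string
-        if has_end and not has_start:
-            print("Warning: End tag, but no start tag found for tag: ", tag)
-            return class_string
-        if not has_start:
-            # Neither tag is present: fall back to the full source string.
-            return class_string
-        # Plain substring search avoids treating the tag name as a regex.
-        start = class_string.find(start_tag) + len(start_tag)
-        end = class_string.find(end_tag, start)
-        if end == -1:
-            print("Warning: End tag appears before start tag for tag: ", tag)
-            return class_string
-        return class_string[start:end]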
-
-    """Before generating new docs into target folder, clean up old files. 
-    """
-    def clean_target(self):
-        if os.path.exists(self.project_name + self.target_dir):
-            shutil.rmtree(self.project_name + self.target_dir)
-
-        for subdir, dirs, file_names in os.walk(self.project_name + self.template_dir):
-            for file_name in file_names:
-                new_subdir = subdir.replace(self.project_name + self.template_dir, self.project_name + self.target_dir)
-                if not os.path.exists(new_subdir):
-                    os.makedirs(new_subdir)
-                if file_name[-3:] == '.md':
-                    file_path = os.path.join(subdir, file_name)
-                    new_file_path = self.project_name + self.target_dir + '/' + self.project_name.replace('/','') + '-' + file_name
-                    # print(new_file_path)
-                    shutil.copy(file_path, new_file_path)
-
-
-    """Given a file path, read content and return string value.
-    """
-    def read_file(self, path):
-        with open(path) as f:
-            return f.read()
-
-
-    """Create main index.md page for a project by parsing README.md
-    and appending it to the template version of index.md
-    """
-    def create_index_page(self):
-        readme = self.read_file(self.project_name + 'README.md')
-        index = self.read_file(self.project_name + self.template_dir + '/index.md')
-        # if readme has a '##' tag, append it to index
-        index = index.replace('{{autogenerated}}', readme[readme.find('##'):])
-        with open(self.project_name + self.target_dir + '/index.md', 'w') as f:
-            f.write(index)
-
-
-    """Write blocks of content (arrays of strings) as markdown to
-    the file name provided in page_data.
-    """
-    def write_content(self, blocks, page_data):
-        #assert blocks, 'No content for page ' + page_data['page'] # unsure if necessary
-
-        markdown = '\n\n\n'.join(blocks)
-        exp_name = self.project_name.replace('/','') + '-' + page_data['page']
-        path = os.path.join(self.project_name + self.target_dir, exp_name)
-
-        if os.path.exists(path):
-            template = self.read_file(path)
-            #assert '{{autogenerated}}' in template, 'Template found for {} but missing {{autogenerated}} tag.'.format(path) # unsure if needed
-            markdown = template.replace('{{autogenerated}}', markdown)
-        print('Auto-generating docs for {}'.format(path))
-        subdir = os.path.dirname(path)
-        if not os.path.exists(subdir):
-            os.makedirs(subdir)
-        with open(path, 'w') as f:
-            f.write(markdown)
-
-
-    """Prepend headers for jekyll, i.e. provide "default" layout and a
-    title for the post.
-    """
-    def prepend_headers(self):
-        for subdir, dirs, file_names in os.walk(self.project_name + self.target_dir):
-            for file_name in file_names:
-                if file_name[-3:] == '.md':
-                    file_path = os.path.join(subdir, file_name)
-                    header = '---\ntitle: {}\n---\n'.format(file_name.replace('.md', ''))
-                    with open(file_path, 'r+') as f:
-                        content = f.read()
-                        f.seek(0, 0)
-                        if not content.startswith('---'):
-                            f.write(header.rstrip('\r\n') + '\n' + content)
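The `{{autogenerated}}` substitution in `write_content` and the front-matter check in `prepend_headers` can be sketched as two pure functions. This is a minimal Python 3 sketch, not the removed implementation: the names `fill_template` and `prepend_front_matter` are hypothetical, and file I/O is left out so the string handling is easy to check.

```python
PLACEHOLDER = '{{autogenerated}}'


def fill_template(template, blocks):
    """Join rendered blocks the way write_content does and substitute them
    into the template's placeholder; with no template, return the blocks alone."""
    markdown = '\n\n\n'.join(blocks)
    if template is None:
        return markdown
    return template.replace(PLACEHOLDER, markdown)


def prepend_front_matter(content, file_name):
    """Add a Jekyll front-matter header unless the content already starts with one."""
    if content.startswith('---'):
        return content
    title = file_name.replace('.md', '')
    return '---\ntitle: {}\n---\n'.format(title) + content


# Hypothetical example: a core-layers template with two rendered class blocks.
page = fill_template('## Keras core layers\n\n' + PLACEHOLDER, ['### Dense', '### Dropout'])
print(prepend_front_matter(page, 'keras-import-layers-core.md'))
```

Keeping both steps free of file access makes it possible to test them without a checked-out `doc_sources` folder.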
--- a/docs/gen_all_docs.sh
+++ b/docs/gen_all_docs.sh
@ -1,92 +0,0 @@
-#!/usr/bin/env bash
-set -eu
-
-################################################################################
-# Copyright (c) 2015-2018 Skymind, Inc.
-#
-# This program and the accompanying materials are made available under the
-# terms of the Apache License, Version 2.0 which is available at
-# https://www.apache.org/licenses/LICENSE-2.0.
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
-# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
-# License for the specific language governing permissions and limitations
-# under the License.
-#
-# SPDX-License-Identifier: Apache-2.0
-################################################################################
-
-
-python generate_docs.py \
-    --project deeplearning4j \
-    --language java \
-    --code ../deeplearning4j \
-    --out_language en
-
-python generate_docs.py \
-    --project deeplearning4j-nn \
-    --language java \
-    --code ../deeplearning4j \
-    --out_language en
-
-python generate_docs.py \
-    --project deeplearning4j-nlp \
-    --language java \
-    --code ../deeplearning4j \
-    --out_language en
-
-python generate_docs.py \
-    --project deeplearning4j-scaleout \
-    --language java \
-    --code ../deeplearning4j \
-    --out_language en
-
-python generate_docs.py \
-    --project deeplearning4j-zoo \
-    --language java \
-    --code ../deeplearning4j \
-    --out_language en
-
-python generate_docs.py \
-    --project datavec \
-    --language java \
-    --code ../datavec \
-    --out_language en
-
-python generate_docs.py \
-    --project nd4j \
-    --language java \
-    --code ../nd4j \
-    --out_language en
-
-python generate_docs.py \
-    --project nd4j-nn \
-    --language java \
-    --code ../nd4j \
-    --out_language en
-
-python generate_docs.py \
-    --project arbiter \
-    --language java \
-    --code ../arbiter \
-    --out_language en
-
-python generate_docs.py \
-    --project keras-import \
-    --language java \
-    --code ../deeplearning4j/deeplearning4j-modelimport/src/main/java/org/deeplearning4j/nn/modelimport/keras/ \
-    --docs_root deeplearning4j.org/keras \
-    --out_language en
-
-# python generate_docs.py \
-#     --project scalnet \
-#     --language scala \
-#     --code ../scalnet/src/main/scala/org/deeplearning4j/scalnet/ \
-#     --docs_root deeplearning4j.org/scalnet
-
-# python generate_docs.py \
-#     --project samediff \
-#     --language java \
-#     --code ../nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/autodiff/ \
-#     --docs_root deeplearning4j.org/samediff
--- a/docs/generate_docs.py
+++ b/docs/generate_docs.py
@ -1,93 +0,0 @@
-# -*- coding: utf-8 -*-
-
-################################################################################
-# Copyright (c) 2015-2018 Skymind, Inc.
-#
-# This program and the accompanying materials are made available under the
-# terms of the Apache License, Version 2.0 which is available at
-# https://www.apache.org/licenses/LICENSE-2.0.
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
-# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
-# License for the specific language governing permissions and limitations
-# under the License.
-#
-# SPDX-License-Identifier: Apache-2.0
-################################################################################
-
-import argparse
-from java_doc import JavaDocumentationGenerator
-from python_doc import PythonDocumentationGenerator
-from scala_doc import ScalaDocumentationGenerator
-
-SUPPORTED_LANGUAGES = ["java", "scala", "python"]
-
-if __name__ == '__main__':
-
-    parser = argparse.ArgumentParser()
-    parser.add_argument('--project', '-p', type=str, required=True)  # e.g. keras-import
-    parser.add_argument('--code', '-c', type=str, required=True)  # relative path to source code for this project
-
-    parser.add_argument('--language', '-l', type=str, required=False, default='java')
-    parser.add_argument('--docs_root', '-d', type=str, required=False, default='http://deeplearning4j.org')
-    parser.add_argument('--templates', '-t', type=str, required=False, default='templates')
-    parser.add_argument('--sources', '-s', type=str, required=False, default='doc_sources')
-    parser.add_argument('--out_language', '-o', type=str, required=False, default='en')
-
-    args = parser.parse_args()
-
-    language = args.language
-    if language not in SUPPORTED_LANGUAGES:
-        raise ValueError("provided language not supported: {}".format(language))
-
-    if language == "python":
-        doc_generator = PythonDocumentationGenerator(args)
-    elif language == "scala":
-        doc_generator = ScalaDocumentationGenerator(args)
-    else:
-        doc_generator = JavaDocumentationGenerator(args)
-
-    doc_generator.clean_target()
-    #doc_generator.create_index_page() # not necessary for now
-
-    for page_data in doc_generator.pages:
-        data = doc_generator.read_page_data(page_data)
-        blocks = []
-        for module_name, class_name, doc_string, constructors, methods in data:
-            class_string = doc_generator.inspect_class_string(module_name, class_name + '.' + doc_generator.language)
-            # skip class if it contains any exclude keywords
-            if not any(ex in class_string for ex in doc_generator.excludes):
-                sub_blocks = []
-                link = doc_generator.class_to_source_link(module_name, class_name)
-                try:
-                    class_name = class_name.rsplit('/',1)[1]
-                except IndexError:
-                    print('Skipping split on '+class_name)
-                # if module_name:
-                #     sub_blocks.append('### {}'.format(module_name))
-                #     sub_blocks.append('<span style="float:right;"> {} </span>\n'.format(link))
-
-                if doc_string:
-                    sub_blocks.append('\n---\n')
-                    sub_blocks.append('### {}'.format(class_name))
-                    sub_blocks.append('<span style="float:right;"> {} </span>\n'.format(link))
-                    sub_blocks.append(doc_string)
-
-                if constructors:
-                    sub_blocks.append("".join([doc_generator.render(cs, cd, class_name, False) for (cs, cd) in constructors]))
-
-                if methods:
-                    # sub_blocks.append('<button class="btn btn-primary" type="button" data-toggle="collapse" data-target="#'+class_name+'" aria-expanded="false" aria-controls="'+class_name+'">Show methods</button>')
-                    # sub_blocks.append('<div class="collapse" id="'+class_name+'"><div class="card card-body">\n')
-                    sub_blocks.append("".join([doc_generator.render(ms, md, class_name, True) for (ms, md) in methods]))
-                    # sub_blocks.append('</div></div>')
-                blocks.append('\n'.join(sub_blocks))
-
-        doc_generator.write_content(blocks, page_data)
-
-    for index_data in doc_generator.indices:
-        index = doc_generator.read_index_data(index_data)
-        doc_generator.write_content(index, index_data)
-
-    doc_generator.prepend_headers()
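The per-class header assembled in the main loop of `generate_docs.py` reduces to the sketch below. It is a hypothetical helper (`render_class_block` does not exist in the original), and the `link` argument stands in for whatever `class_to_source_link` returns. Taking `rsplit('/', 1)[-1]` keeps the last path segment whether or not the name contains a slash, which avoids the try/except around the split.

```python
def render_class_block(class_name, link, doc_string):
    """Build the per-class markdown header as the main loop does:
    a horizontal rule, an h3 title, a right-floated source link, then the doc string."""
    # Module paths such as 'layers/core/KerasDense' keep only the last segment.
    short_name = class_name.rsplit('/', 1)[-1]
    sub_blocks = []
    if doc_string:
        sub_blocks.append('\n---\n')
        sub_blocks.append('### {}'.format(short_name))
        sub_blocks.append('<span style="float:right;"> {} </span>\n'.format(link))
        sub_blocks.append(doc_string)
    return '\n'.join(sub_blocks)


# '[[source]](x)' is a placeholder link, not real generator output.
print(render_class_block('layers/core/KerasDense', '[[source]](x)', 'Imports a Dense layer.'))
```

As in the original loop, a class without a doc string contributes no header at all.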
--- a/docs/java_doc.py
+++ b/docs/java_doc.py
@ -1,130 +0,0 @@
-# -*- coding: utf-8 -*-
-
-################################################################################
-# Copyright (c) 2015-2018 Skymind, Inc.
-#
-# This program and the accompanying materials are made available under the
-# terms of the Apache License, Version 2.0 which is available at
-# https://www.apache.org/licenses/LICENSE-2.0.
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
-# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
-# License for the specific language governing permissions and limitations
-# under the License.
-#
-# SPDX-License-Identifier: Apache-2.0
-################################################################################
-
-import re
-import sys
-from doc_generator import BaseDocumentationGenerator
-
-
-class JavaDocumentationGenerator(BaseDocumentationGenerator):
-
-    def __init__(self, args):
-        reload(sys)
-        sys.setdefaultencoding('utf8')
-
-        super(JavaDocumentationGenerator, self).__init__(args)
-
-    '''Class doc strings (in Java/Scala) need to be stripped of all '*' values.
-    Lines containing '@' tags (e.g. '@author', '@param') are dropped.
-
-    TODO: can be vastly improved.
-    '''
-    def process_main_docstring(self, doc_string):
-        lines = doc_string.split('\n')
-        doc = [line.replace('*', '').lstrip(' ').rstrip('/') for line in lines[1:-1] if '@' not in line]
-        return '\n'.join(doc)
-
-
-    '''Doc strings (in Java/Scala) need to be stripped of all '*' values.
-    Convert '@param' to '- param'. TODO can be vastly improved.
-    '''
-    def process_docstring(self, doc_string):
-        lines = doc_string.split('\n')
-        doc = [line.replace('*', '').lstrip(' ').replace('@', '- ') for line in lines]
-        return '\n'.join(doc)
-
-
-    '''Takes unformatted signatures and doc strings and returns a properly
-    rendered piece that fits into our markdown layout.
-    '''
-    def render(self, signature, doc_string, class_name, is_method):
-        if is_method:  # Method name from signature
-            method_regex = r'public (?:static )?[a-zA-Z0-9]* ([a-zA-Z0-9]*)\('
-            name = re.findall(method_regex, signature)[0]
-        else:  # Constructor takes class name
-            name = class_name
-        sub_blocks = ['##### {} \n{}'.format(name, self.to_code_snippet(signature))]
-        if doc_string:
-            sub_blocks.append(doc_string + '\n')
-        return '\n\n'.join(sub_blocks)
-
-
-    '''Returns main doc string of class/object in question.
-    '''
-    def get_main_doc_string(self, class_string, class_name):
-        print(class_name)
-        doc_regex = r'\/\*\*\n([\S\s]*?.*)\*\/\n'  # match "/** ... */" at the top
-        doc_string = re.search(doc_regex, class_string)
-        doc_match = doc_string.group() if doc_string else ''
-        doc = self.process_main_docstring(doc_match)
-        if not doc_string:
-            print('Warning, no doc string found for class {}'.format(class_name))
-        doc_index = 0 if not doc_match else doc_string.end()
-        return doc, class_string[doc_index:]
-
-
-    '''Returns doc string and signature data for constructors.
-    '''
-    def get_constructor_data(self, class_string, class_name, use_constructor):
-        constructors = []
-        if 'public ' + class_name in class_string and use_constructor:
-            doc_regex = r'\/\*\*\n([\S\s]*?.*)\*\/\n[\S\s]*?(public ' \
-                        + class_name + r'.[\S\s]*?){'
-            result = re.search(doc_regex, class_string)
-            if result:
-                doc_string, signature = result.groups()
-                doc = self.process_docstring(doc_string)
-                class_string = class_string[result.end():]
-                constructors.append((signature, doc))
-            else:
-                print("Warning, no doc string found for constructor {}".format(class_name))
-        return constructors, class_string
-
-
-    '''Returns doc string and signature data for methods
-    in the public API of an object
-    '''
-    def get_public_method_data(self, class_string, includes, excludes):
-        method_regex = r'public (?:static )?[a-zA-Z0-9]* ([a-zA-Z0-9]*)\('
-
-        # Either use all methods or use include methods that can be found
-        method_strings = re.findall(method_regex, class_string)
-        if includes:
-            method_strings = [i for i in includes if i in method_strings]
-
-        # Exclude all 'exclude' methods
-        method_strings = [m for m in method_strings if m not in excludes]
-
-        methods = []
-        for method in method_strings:
-            # print("Processing doc string for method {}".format(method))
-            doc_regex = r'\/\*\*\n([\S\s]*?.*)\*\/\n[\S\s]*?' + \
-                        r'(public (?:static )?[a-zA-Z0-9]* ' + method + r'[\S\s]*?){'
-            # TODO: this will sometimes run forever. fix regex
-            result = re.search(doc_regex, class_string)
-            if result:
-                doc_string, signature = result.groups()
-                doc = self.process_docstring(doc_string)
-                class_string = class_string[result.end():]
-                methods.append((signature, doc))
-            else:
-                print("Warning, no doc string found for method {}".format(method))
-        return methods
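How `java_doc.py` finds public methods and cleans their doc comments can be reproduced on a small Java snippet. The regex is copied from `get_public_method_data`, `process_docstring` mirrors the method of the same name, and the Java source is a made-up example.

```python
import re

# Same pattern java_doc.py uses to find public (optionally static) method names.
METHOD_REGEX = r'public (?:static )?[a-zA-Z0-9]* ([a-zA-Z0-9]*)\('


def process_docstring(doc_string):
    """Strip '*' characters and leading spaces, then turn '@param' tags
    into '- param' list items, as JavaDocumentationGenerator.process_docstring does."""
    lines = doc_string.split('\n')
    return '\n'.join(line.replace('*', '').lstrip(' ').replace('@', '- ') for line in lines)


java_source = '''
/**
 * Adds two numbers.
 * @param a first operand
 */
public static int add(int a, int b) { return a + b; }

public List<String> names() { return null; }
'''

print(re.findall(METHOD_REGEX, java_source))
print(process_docstring(' * Adds two numbers.\n * @param a first operand'))
```

Only `add` is found: `[a-zA-Z0-9]*` cannot match a generic return type such as `List<String>`, which is one of the limitations behind the "can be vastly improved" TODOs.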
--- a/docs/keras-import/README.md
+++ b/docs/keras-import/README.md
@ -1,10 +0,0 @@
-# DL4J Keras model-import documentation
-
-To generate docs into the `keras-import/doc_sources` folder, run:
-
-```
-python generate_docs.py \
-    --project keras-import \
-    --code ../deeplearning4j/deeplearning4j-modelimport/src/main/java/org/deeplearning4j/nn/modelimport/keras/
-
-```
--- a/docs/keras-import/mkdocs.yml
+++ b/docs/keras-import/mkdocs.yml
@ -1,37 +0,0 @@
-site_name: DL4J Keras model-import documentation
-theme: readthedocs
-docs_dir: doc_sources
-repo_url: https://github.com/deeplearning4j/deeplearning4j
-site_url: http://deeplearning4j.org
-site_description: 'DL4J Keras model-import documentation'
-
-dev_addr: '0.0.0.0:8000'
-
-pages:
- Home: index.md
- Overview of supported features: supported-features.md
- Getting started:
-  - Guide to KerasSequentialModel: getting-started/keras-sequential-guide.md
-  - Guide to KerasModel: getting-started/keras-model-guide.md
- Models:
-  - KerasSequentialModel: models/sequential.md
-  - KerasModel (functional API): models/model.md
- Layers:
-  - About Keras import layers: layers/about-importing-layers.md
-  - Core Layers: layers/core.md
-  - Convolutional Layers: layers/convolutional.md
-  - Pooling Layers: layers/pooling.md
-  - Recurrent Layers: layers/recurrent.md
-  - Embedding Layers: layers/embeddings.md
-  - Advanced Activations Layers: layers/advanced-activations.md
-  - Normalization Layers: layers/normalization.md
-  - Noise layers: layers/noise.md
-  - Layer wrappers: layers/wrappers.md
-  - Writing custom import layers: layers/writing-custom-import-layers.md
- Losses: losses.md
- Optimizers: optimizers.md
- Activations: activations.md
- Backend: backend.md
- Initializers: initializers.md
- Regularizers: regularizers.md
- Constraints: constraints.md
--- a/docs/keras-import/pages.json
+++ b/docs/keras-import/pages.json
@ -1,68 +0,0 @@
-{
-  "excludes": [
-    "abstract"
-  ],
-  "indices": [
-  ],
-  "pages": [
-    {
-      "page": "model-import.md",
-      "class": [
-        "KerasModelImport.java"
-      ]
-    },
-    {
-      "page": "model-sequential.md",
-      "class": [
-        "KerasSequentialModel.java"
-      ]
-    },
-    {
-      "page": "model-functional.md",
-      "class": [
-        "KerasModel.java"
-      ]
-    },
-    {
-      "page": "layers-core.md",
-      "module": ["layers/core"]
-    },
-    {
-      "page": "layers-convolutional.md",
-      "module": ["layers/convolutional"]
-    },
-    {
-      "page": "layers-pooling.md",
-      "module": ["layers/pooling"]
-    },
-    {
-      "page": "layers-local.md",
-      "module": ["layers/local"]
-    },
-    {
-      "page": "layers-recurrent.md",
-      "module": ["layers/recurrent"]
-    },
-    {
-      "page": "layers-embeddings.md",
-      "module": ["layers/embeddings"]
-    },
-    {
-      "page": "layers-normalization.md",
-      "module": ["layers/normalization"]
-    },
-    {
-      "page": "layers-advanced-activations.md",
-      "module": ["layers/advanced/activations"]
-    },
-    {
-      "page": "layers-noise.md",
-      "module": ["layers/noise"]
-    },
-    {
-      "page": "layers-wrappers.md",
-      "module": ["layers/wrappers"]
-    }
-  ]
-}
-
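Each `pages.json` entry names an output page plus either individual `class` files or whole `module` directories, while `excludes` lists keywords that cause a class to be skipped when they appear in its source. The sketch below only illustrates that shape: the real parsing presumably lives in `BaseDocumentationGenerator.read_page_data` in `doc_generator.py`, which is not shown here, and `describe_pages` is a hypothetical name.

```python
import json

# A trimmed copy of the keras-import pages.json above.
PAGES_JSON = '''
{
  "excludes": ["abstract"],
  "indices": [],
  "pages": [
    {"page": "model-import.md", "class": ["KerasModelImport.java"]},
    {"page": "layers-core.md", "module": ["layers/core"]}
  ]
}
'''


def describe_pages(raw):
    """Return (page, kind, sources) tuples, where kind says whether the page
    lists individual class files or whole module directories."""
    config = json.loads(raw)
    result = []
    for page in config['pages']:
        if 'class' in page:
            result.append((page['page'], 'class', page['class']))
        else:
            result.append((page['page'], 'module', page.get('module', [])))
    return result


for entry in describe_pages(PAGES_JSON):
    print(entry)
```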
--- a/docs/keras-import/templates/activations.md
+++ b/docs/keras-import/templates/activations.md
@ -1,24 +0,0 @@
---
-title: Keras Activations
-short_title: Activations
-description: Supported Keras activations.
-category: Keras Import
-weight: 4
---
-
-## Available activations
-
-We support all [Keras activation functions](https://keras.io/activations), namely:
-
-* <i class="fa fa-check-square-o"></i> softmax
-* <i class="fa fa-check-square-o"></i> elu
-* <i class="fa fa-check-square-o"></i> selu
-* <i class="fa fa-check-square-o"></i> softplus
-* <i class="fa fa-check-square-o"></i> softsign
-* <i class="fa fa-check-square-o"></i> relu
-* <i class="fa fa-check-square-o"></i> tanh
-* <i class="fa fa-check-square-o"></i> sigmoid
-* <i class="fa fa-check-square-o"></i> hard_sigmoid
-* <i class="fa fa-check-square-o"></i> linear
-
-The mapping of Keras to DL4J activation functions is defined in [KerasActivationUtils](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-modelimport/src/main/java/org/deeplearning4j/nn/modelimport/keras/utils/KerasActivationUtils.java).
--- a/docs/keras-import/templates/backend.md
+++ b/docs/keras-import/templates/backend.md
@ -1,12 +0,0 @@
---
-title: Keras Backends
-short_title: Backends
-description: Supported Keras backends.
-category: Keras Import
-weight: 4
---
-
-## Supported Keras backends
-
-DL4J Keras model import is backend-agnostic: whichever backend you choose (TensorFlow, Theano, or CNTK), your models
-can be imported into DL4J.
--- a/docs/keras-import/templates/constraints.md
+++ b/docs/keras-import/templates/constraints.md
@ -1,18 +0,0 @@
---
-title: Keras Constraints
-short_title: Constraints
-description: Supported Keras constraints.
-category: Keras Import
-weight: 4
---
-
-## Supported constraints
-
-All [Keras constraints](https://keras.io/constraints) are supported:
-
-* <i class="fa fa-check-square-o"></i> max_norm
-* <i class="fa fa-check-square-o"></i> non_neg
-* <i class="fa fa-check-square-o"></i> unit_norm
-* <i class="fa fa-check-square-o"></i> min_max_norm
-
-Mapping Keras to DL4J constraints happens in [KerasConstraintUtils](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-modelimport/src/main/java/org/deeplearning4j/nn/modelimport/keras/utils/KerasConstraintUtils.java).
--- a/docs/keras-import/templates/get-started.md
+++ b/docs/keras-import/templates/get-started.md
@ -1,19 +0,0 @@
---
-title: Keras Model Import Get Started
-short_title: Get Started
-description: Getting started with model import.
-category: Keras Import
-weight: 1
---
-
-## Getting started with Keras model import
-
-Below is a [video tutorial](https://www.youtube.com/embed/bI1aR1Tj2DM) demonstrating
-working code that loads a Keras model into Deeplearning4j and validates the imported network.
-Instructor Tom Hanlon walks through a simple classifier for the Iris data set, built
-in Keras with a Theano backend, then exported and loaded into Deeplearning4j:
-
-<iframe width="560" height="315" src="https://www.youtube.com/embed/bI1aR1Tj2DM" frameborder="0" allowfullscreen></iframe>
-
-If you have trouble viewing the embedded video, you can
-[watch it on YouTube](https://www.youtube.com/embed/bI1aR1Tj2DM).
--- a/docs/keras-import/templates/initializers.md
+++ b/docs/keras-import/templates/initializers.md
@ -1,29 +0,0 @@
---
-title: Keras Initializers
-short_title: Initializers
-description: Supported Keras weight initializers.
-category: Keras Import
-weight: 4
---
-
-## Supported initializers
-
-DL4J supports all available [Keras initializers](https://keras.io/initializers), namely:
-
-* <i class="fa fa-check-square-o"></i> Zeros
-* <i class="fa fa-check-square-o"></i> Ones
-* <i class="fa fa-check-square-o"></i> Constant
-* <i class="fa fa-check-square-o"></i> RandomNormal
-* <i class="fa fa-check-square-o"></i> RandomUniform
-* <i class="fa fa-check-square-o"></i> TruncatedNormal
-* <i class="fa fa-check-square-o"></i> VarianceScaling
-* <i class="fa fa-check-square-o"></i> Orthogonal
-* <i class="fa fa-check-square-o"></i> Identity
-* <i class="fa fa-check-square-o"></i> lecun_uniform
-* <i class="fa fa-check-square-o"></i> lecun_normal
-* <i class="fa fa-check-square-o"></i> glorot_normal
-* <i class="fa fa-check-square-o"></i> glorot_uniform
-* <i class="fa fa-check-square-o"></i> he_normal
-* <i class="fa fa-check-square-o"></i> he_uniform
-
-The mapping of Keras to DL4J initializers can be found in [KerasInitilizationUtils](https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-modelimport/src/main/java/org/deeplearning4j/nn/modelimport/keras/utils/KerasInitilizationUtils.java).
--- a/docs/keras-import/templates/layers-advanced-activations.md
+++ b/docs/keras-import/templates/layers-advanced-activations.md
@ -1,11 +0,0 @@
---
-title: Keras Advanced Activations
-short_title: Advanced Activations
-description: Supported Keras advanced layer activations.
-category: Keras Import
-weight: 4
---
-
-## Keras advanced activations
-
-{{autogenerated}}
--- a/docs/keras-import/templates/layers-convolutional.md
+++ b/docs/keras-import/templates/layers-convolutional.md
@ -1,11 +0,0 @@
---
-title: Keras Import Convolutional Layers
-short_title: Convolutional Layers
-description: Supported Keras convolutional layers.
-category: Keras Import
-weight: 4
---
-
-## Keras convolutional layers
-
-{{autogenerated}}
--- a/docs/keras-import/templates/layers-core.md
+++ b/docs/keras-import/templates/layers-core.md
@ -1,11 +0,0 @@
---
-title: Keras Import Core Layers
-short_title: Core Layers
-description: Supported Keras core layers.
-category: Keras Import
-weight: 4
---
-
-## Keras core layers
-
-{{autogenerated}}