History

* Added declarations for decode/encode_bitmap ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added implementation for bitmap encoding/decoding ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added helpers for encode/decode bitmap ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored encodingBitmap helper.

Signed-off-by: shugeo <sgazeos@gmail.com>

* threshold encode/decode skeleton

* helper skeleton

* minor import fix

* encoder shape fn & op impl

* thresholdEncode cpu impl

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* thresholdDecode cpu impl

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* Only cosmetical changes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* placeholder

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* Added cuda implementation for bitmap decode helper.

Signed-off-by: shugeo <sgazeos@gmail.com>

* cuda thresholdEstimate

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* cuda thresholdDecode

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* next step

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* - nano cmakelist update (get rid of Clion section)
- fixed forgotten throw in AtomicTests

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* thesholdEncode cuda impl

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* Added tests for bitmap encoding/decoding ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed tests for encode/decode bitmaps.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored decode/encode helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed crashes with bitmap decode/encode helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* bitmap encode/decode CPU

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* bitmap encode/decode CUDA

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* C API removed for threshold/bitmap encode

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* EncodeBitmap/DecodeBitmap Java side

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* EncodeThreshold/DecodeThreshold Java side

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* EncodeThreshold/DecodeThreshold Java side

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* few more tests for threshold encoding

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* minor test tweak

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* two special tests

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* encodeBitmap CPU fix

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* parallel_long/parallel_double proper spans fix

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* encodeThreshold CUDA fix

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* nano fix

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* grid tweaks

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* RTX adaptation for thresholdEncode

Signed-off-by: raver119 <raver119@gmail.com>

* don't allow threshold encoding for length < 2

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* get rid of NDArrayCompressor in EncodingHandler

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* one more minor update of EncodingHandler

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* one more minor tweak of EncodingHandler

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* - matmul allows integer data types use
- EncodingHandler boundary default value
- few tests for integer matmul

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* minor fix of CUDA bitmap encode

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* boundary changed to integer everywhere

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* boundary changed to integer everywhere

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* re-enable CUDA deallocator

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* threshold encoder fix for systems without omp

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* - encode_threshold now requires non-negative boundary
- minor tweak in EncodingHandler

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* restore parallelism in decode_bitmap

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* fall back to omp for encode_bitmap cpu

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* single time casts

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* - additional test for encode_threshold
- sync buffers to device before calling for shape function

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

Co-authored-by: shugeo <sgazeos@gmail.com>

2020-05-08 20:59:39 +03:00

auto_vectorization

auto-vectorization check for gcc (#172 )

2020-01-28 19:00:12 +03:00

blas

compression ops (#436 )

2020-05-08 20:59:39 +03:00

cmake

Bugfix failing builds (#341 )

2020-03-24 12:55:47 +11:00

include

compression ops (#436 )

2020-05-08 20:59:39 +03:00

minifier

libnd4j polishing (#273 )

2020-03-02 12:49:41 +03:00

msi

Eclipse Migration Initial Commit

2019-06-06 15:21:15 +03:00

packages

Eclipse Migration Initial Commit

2019-06-06 15:21:15 +03:00

profile

Eclipse Migration Initial Commit

2019-06-06 15:21:15 +03:00

server

libnd4j polishing (#273 )

2020-03-02 12:49:41 +03:00

tests_cpu

compression ops (#436 )

2020-05-08 20:59:39 +03:00

.gitignore

Eclipse Migration Initial Commit

2019-06-06 15:21:15 +03:00

AddingNewOps.md

Eclipse Migration Initial Commit

2019-06-06 15:21:15 +03:00

assembly-cuda.xml

Eclipse Migration Initial Commit

2019-06-06 15:21:15 +03:00

assembly.xml

Eclipse Migration Initial Commit

2019-06-06 15:21:15 +03:00

buildnativeoperations.sh

Bugfix failing builds (#341 )

2020-03-24 12:55:47 +11:00

cibuild.sh

Eclipse Migration Initial Commit

2019-06-06 15:21:15 +03:00

CMakeLists.txt

compression ops (#436 )

2020-05-08 20:59:39 +03:00

CMakeLists.txt.cpu_features.in

Platform helpers (#8216 )

2019-09-11 21:50:28 +03:00

CMakeLists.txt.in

roll back flatbuffers version

2020-01-31 15:57:55 +03:00

CMakeLists.txt.mkldnn.in

mkldnn version upgrade: v1.4

2020-04-20 08:19:03 +03:00

CMakeSettings.json

libnd4j polishing (#273 )

2020-03-02 12:49:41 +03:00

development.md

Eclipse Migration Initial Commit

2019-06-06 15:21:15 +03:00

flatproto.txt

Eclipse Migration Initial Commit

2019-06-06 15:21:15 +03:00

iOS.md

Eclipse Migration Initial Commit

2019-06-06 15:21:15 +03:00

LICENSE

Eclipse Migration Initial Commit

2019-06-06 15:21:15 +03:00

linuxOnPower.md

Update links to eclipse repos (#252 )

2019-09-10 19:09:46 +10:00

macOSx10 (CPU only).md

Update links to eclipse repos (#252 )

2019-09-10 19:09:46 +10:00

pom.xml

Bugfix failing builds (#341 )

2020-03-24 12:55:47 +11:00

proto.sh

Eclipse Migration Initial Commit

2019-06-06 15:21:15 +03:00

RaspberryPi.md

Update links to eclipse repos (#252 )

2019-09-10 19:09:46 +10:00

README.md

auto-vectorization check for gcc (#172 )

2020-01-28 19:00:12 +03:00

setuposx.sh

Eclipse Migration Initial Commit

2019-06-06 15:21:15 +03:00

UnderstandingGraph.md

Update links to eclipse repos (#252 )

2019-09-10 19:09:46 +10:00

windows.md

windows build instructions minor update

2020-04-30 18:13:09 +03:00

README.md

LibND4J

Native operations for nd4j. Build using cmake

Prerequisites

GCC 4.9+
CUDA 8.0 or 9.0 (if desired)
CMake 3.8 (as of Nov 2017, in near future will require 3.9)

Additional build arguments

There's few additional arguments for buildnativeoperations.sh script you could use:

 -a XXXXXXXX// shortcut for -march/-mtune, i.e. -a native
 -b release OR -b debug // enables/desables debug builds. release is considered by default
 -j XX // this argument defines how many threads will be used to binaries on your box. i.e. -j 8 
 -cc XX// CUDA-only argument, builds only binaries for target GPU architecture. use this for fast builds
 --check-vectorization  auto-vectorization report for developers. (Currently, only GCC is supported)

More about AutoVectorization report

You can find the compute capability for your card on the NVIDIA website here.

For example, a GTX 1080 has compute capability 6.1, for which you would use -cc 61 (note no decimal point).

OS Specific Requirements

Android

Download the NDK, extract it somewhere, and execute the following commands, replacing android-xxx with either android-arm or android-x86:

git clone https://github.com/deeplearning4j/libnd4j
git clone https://github.com/deeplearning4j/nd4j
export ANDROID_NDK=/path/to/android-ndk/
cd libnd4j
bash buildnativeoperations.sh -platform android-xxx
cd ../nd4j
mvn clean install -Djavacpp.platform=android-xxx -DskipTests -pl '!:nd4j-cuda-9.0,!:nd4j-cuda-9.0-platform,!:nd4j-tests'

OSX

Run ./setuposx.sh (Please ensure you have brew installed)

See macOSx10 CPU only.md

Linux

Depends on the distro - ask in the earlyadopters channel for specifics on distro

Ubuntu Linux 15.10

wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda-repo-ubuntu1504-7-5-local_7.5-18_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1504-7-5-local_7.5-18_amd64.deb
sudo apt-get update
sudo apt-get install cuda
sudo apt-get install cmake
sudo apt-get install gcc-4.9
sudo apt-get install g++-4.9
sudo apt-get install git
git clone https://github.com/deeplearning4j/libnd4j
cd libnd4j/
export LIBND4J_HOME=~/libnd4j/
sudo rm /usr/bin/gcc
sudo rm /usr/bin/g++
sudo ln -s /usr/bin/gcc-4.9 /usr/bin/gcc
sudo ln -s /usr/bin/g++-4.9 /usr/bin/g++
./buildnativeoperations.sh
./buildnativeoperations.sh -c cuda -сс YOUR_DEVICE_ARCH

Ubuntu Linux 16.04

sudo apt install cmake
sudo apt install nvidia-cuda-dev nvidia-cuda-toolkit nvidia-361
export TRICK_NVCC=YES
./buildnativeoperations.sh
./buildnativeoperations.sh -c cuda -сс YOUR_DEVICE_ARCH

The standard development headers are needed.

CentOS 6

yum install centos-release-scl-rh epel-release
yum install devtoolset-3-toolchain maven30 cmake3 git
scl enable devtoolset-3 maven30 bash
./buildnativeoperations.sh
./buildnativeoperations.sh -c cuda -сс YOUR_DEVICE_ARCH

Windows

See Windows.md

Setup for All OS

Set a LIBND4J_HOME as an environment variable to the libnd4j folder you've obtained from GIT
- Note: this is required for building nd4j as well.

Setup cpu followed by gpu, run the following on the command line:

For standard builds:

./buildnativeoperations.sh
./buildnativeoperations.sh -c cuda -сс YOUR_DEVICE_ARCH

For Debug builds:

./buildnativeoperations.sh blas -b debug
./buildnativeoperations.sh blas -c cuda -сс YOUR_DEVICE_ARCH -b debug

For release builds (default):

./buildnativeoperations.sh
./buildnativeoperations.sh -c cuda -сс YOUR_DEVICE_ARCH

OpenMP support

OpenMP 4.0+ should be used to compile libnd4j. However, this shouldn't be any trouble, since OpenMP 4 was released in 2015 and should be available on all major platforms.

Linking with MKL

We can link with MKL either at build time, or at runtime with binaries initially linked with another BLAS implementation such as OpenBLAS. In either case, simply add the path containing libmkl_rt.so (or mkl_rt.dll on Windows), say /path/to/intel64/lib/, to the LD_LIBRARY_PATH environment variable on Linux (or PATH on Windows), and build or run your Java application as usual. If you get an error message like undefined symbol: omp_get_num_procs, it probably means that libiomp5.so, libiomp5.dylib, or libiomp5md.dll is not present on your system. In that case though, it is still possible to use the GNU version of OpenMP by setting these environment variables on Linux, for example:

export MKL_THREADING_LAYER=GNU
export LD_PRELOAD=/usr/lib64/libgomp.so.1

##Troubleshooting MKL

Sometimes the above steps might not be all you need to do. Another additional step might be the need to add:

export LD_LIBRARY_PATH=/opt/intel/lib/intel64/:/opt/intel/mkl/lib/intel64

This ensures that mkl will be found first and liked to.

Packaging

If on Ubuntu (14.04 or above) or CentOS (6 or above), this repository is also set to create packages for your distribution. Let's assume you have built:

for the cpu, your command-line was ./buildnativeoperations.sh ...:

cd blasbuild/cpu
make package

for the gpu, your command-line was ./buildnativeoperations.sh -c cuda ...:

cd blasbuild/cuda
make package

Uploading package to Bintray

The package upload script is in packaging. The upload command for an rpm built for cpu is:

./packages/push_to_bintray.sh myAPIUser myAPIKey deeplearning4j blasbuild/cpu/libnd4j-0.8.0.fc7.3.1611.x86_64.rpm https://github.com/deeplearning4j

The upload command for a deb package built for cuda is:

./packages/push_to_bintray.sh myAPIUser myAPIKey deeplearning4j blasbuild/cuda/libnd4j-0.8.0.fc7.3.1611.x86_64.deb https://github.com/deeplearning4j

Running tests

Tests are written with gtest, run using cmake. Tests are currently under tests_cpu/

There are 2 directories for running tests:

1. libnd4j_tests: These are older legacy ops tests.
2. layers_tests: This covers the newer graph operations and ops associated with samediff.

For running the tests, we currently use cmake or CLion to run the tests.

To run tests using CUDA backend it's pretty much similar process:

1. ./buildnativeoperations.h -c cuda -cc <YOUR_ARCH> -b debug -t -j <NUMBER_OF_CORES>
2. ./blasbuild/cuda/tests_cpu/layers_tests/runtests (.exe on Windows)

README.md Unescape Escape

LibND4J

Prerequisites

Additional build arguments

OS Specific Requirements

Android

OSX

Linux

Ubuntu Linux 15.10

Ubuntu Linux 16.04

CentOS 6

Windows

Setup for All OS

OpenMP support

Linking with MKL

Packaging

Uploading package to Bintray

Running tests

README.md