LibND4J
Native operations for nd4j. Built using CMake.
Prerequisites
- GCC 4.9+
- CUDA 8.0 or 9.0 (if desired)
- CMake 3.8 (as of Nov 2017; 3.9 will be required in the near future)
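A quick way to check the GCC and CMake prerequisites before building is sketched below. It is only an illustration, not part of the repository: version_ge and report are hypothetical helper names, and the sketch assumes sort -V is available (GNU coreutils) and that gcc and cmake, if installed, are on PATH. The CUDA toolkit is not checked.

```shell
# version_ge A B: succeeds when version A >= version B.
# Relies on "sort -V" (version sort), which is an assumption about the system.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# report NAME FOUND_VERSION MINIMUM_VERSION
report() {
    if version_ge "$2" "$3"; then
        echo "$1 $2: ok (need $3+)"
    else
        echo "$1 $2: too old (need $3+)"
    fi
}

if command -v gcc >/dev/null 2>&1; then
    report gcc "$(gcc -dumpversion)" 4.9
else
    echo "gcc: not found"
fi

if command -v cmake >/dev/null 2>&1; then
    # The first line of "cmake --version" looks like "cmake version 3.x.y".
    report cmake "$(cmake --version | head -n 1 | awk '{print $3}')" 3.8
else
    echo "cmake: not found"
fi
```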
Additional build arguments
There are a few additional arguments you can pass to the buildnativeoperations.sh script:
-a XXXXXXXX // shortcut for -march/-mtune, e.g. -a native
-b release OR -b debug // enables/disables debug builds; release is the default
-j XX // number of threads used to build the binaries on your machine, e.g. -j 8
-cc XX // CUDA-only argument; builds binaries only for the target GPU architecture. Use this for faster builds
You can find the compute capability for your card on the NVIDIA website. For example, a GTX 1080 has compute capability 6.1, for which you would use -cc 61 (note: no decimal point).
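The conversion from a compute capability to the -cc value can be sketched as a small shell helper. The function name cc_arg is hypothetical and only illustrates dropping the decimal point; the command is echoed rather than executed.

```shell
# cc_arg CAPABILITY: print the value to pass to -cc for a compute
# capability such as "6.1" by removing the decimal point.
cc_arg() {
    printf '%s\n' "$1" | tr -d '.'
}

# Example for a GTX 1080 (compute capability 6.1): show the resulting command.
echo "./buildnativeoperations.sh -c cuda -cc $(cc_arg 6.1)"
```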
OS Specific Requirements
Android
Download the NDK, extract it somewhere, and execute the following commands, replacing android-xxx with either android-arm or android-x86:
git clone https://github.com/deeplearning4j/libnd4j
git clone https://github.com/deeplearning4j/nd4j
export ANDROID_NDK=/path/to/android-ndk/
cd libnd4j
bash buildnativeoperations.sh -platform android-xxx
cd ../nd4j
mvn clean install -Djavacpp.platform=android-xxx -DskipTests -pl '!:nd4j-cuda-9.0,!:nd4j-cuda-9.0-platform,!:nd4j-tests'
OSX
Run ./setuposx.sh (make sure brew is installed first).
Linux
Requirements depend on the distro. Ask in the earlyadopters channel for specifics about yours.
Ubuntu Linux 15.10
wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda-repo-ubuntu1504-7-5-local_7.5-18_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1504-7-5-local_7.5-18_amd64.deb
sudo apt-get update
sudo apt-get install cuda
sudo apt-get install cmake
sudo apt-get install gcc-4.9
sudo apt-get install g++-4.9
sudo apt-get install git
git clone https://github.com/deeplearning4j/libnd4j
cd libnd4j/
export LIBND4J_HOME=~/libnd4j/
sudo rm /usr/bin/gcc
sudo rm /usr/bin/g++
sudo ln -s /usr/bin/gcc-4.9 /usr/bin/gcc
sudo ln -s /usr/bin/g++-4.9 /usr/bin/g++
./buildnativeoperations.sh
./buildnativeoperations.sh -c cuda -cc YOUR_DEVICE_ARCH
Ubuntu Linux 16.04
sudo apt install cmake
sudo apt install nvidia-cuda-dev nvidia-cuda-toolkit nvidia-361
export TRICK_NVCC=YES
./buildnativeoperations.sh
./buildnativeoperations.sh -c cuda -cc YOUR_DEVICE_ARCH
The standard development headers are needed.
CentOS 6
yum install centos-release-scl-rh epel-release
yum install devtoolset-3-toolchain maven30 cmake3 git
scl enable devtoolset-3 maven30 bash
./buildnativeoperations.sh
./buildnativeoperations.sh -c cuda -cc YOUR_DEVICE_ARCH
Windows
See windows.md
Setup for All OS
- Set LIBND4J_HOME as an environment variable pointing to the libnd4j folder you obtained from Git.
  - Note: this is required for building nd4j as well.
- Set up the CPU backend followed by the GPU backend by running the following on the command line:
  - For standard builds:
    ./buildnativeoperations.sh
    ./buildnativeoperations.sh -c cuda -cc YOUR_DEVICE_ARCH
  - For debug builds:
    ./buildnativeoperations.sh blas -b debug
    ./buildnativeoperations.sh blas -c cuda -cc YOUR_DEVICE_ARCH -b debug
  - For release builds (default):
    ./buildnativeoperations.sh
    ./buildnativeoperations.sh -c cuda -cc YOUR_DEVICE_ARCH
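Before building, it can help to confirm that LIBND4J_HOME is set and points at a checkout. The following is a minimal sketch; check_libnd4j_home is a hypothetical helper, and it assumes that the presence of buildnativeoperations.sh is a good enough sign of a libnd4j folder.

```shell
# check_libnd4j_home: succeed only if LIBND4J_HOME is set and the directory
# contains buildnativeoperations.sh (a heuristic, not an official check).
check_libnd4j_home() {
    if [ -z "${LIBND4J_HOME:-}" ]; then
        echo "LIBND4J_HOME is not set" >&2
        return 1
    fi
    if [ ! -f "$LIBND4J_HOME/buildnativeoperations.sh" ]; then
        echo "LIBND4J_HOME=$LIBND4J_HOME does not look like a libnd4j checkout" >&2
        return 1
    fi
    return 0
}

if check_libnd4j_home; then
    echo "LIBND4J_HOME looks fine: $LIBND4J_HOME"
fi
```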
OpenMP support
OpenMP 4.0+ is required to compile libnd4j. This should not cause any trouble, since OpenMP 4.0 was released in 2013 and is available on all major platforms.
Linking with MKL
We can link with MKL either at build time, or at runtime with binaries initially linked with another BLAS implementation such as OpenBLAS. In either case, simply add the path containing libmkl_rt.so (or mkl_rt.dll on Windows), say /path/to/intel64/lib/, to the LD_LIBRARY_PATH environment variable on Linux (or PATH on Windows), and build or run your Java application as usual. If you get an error message like undefined symbol: omp_get_num_procs, it probably means that libiomp5.so, libiomp5.dylib, or libiomp5md.dll is not present on your system. In that case though, it is still possible to use the GNU version of OpenMP by setting these environment variables on Linux, for example:
export MKL_THREADING_LAYER=GNU
export LD_PRELOAD=/usr/lib64/libgomp.so.1
Troubleshooting MKL
Sometimes the steps above are not sufficient. You may also need to add the Intel library directories to LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=/opt/intel/lib/intel64/:/opt/intel/mkl/lib/intel64
This ensures that MKL is found first and linked against.
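To see which directory on LD_LIBRARY_PATH would supply libmkl_rt.so first, a small search helper can be used. This is a sketch only: find_lib_in_path is a hypothetical name, it simply walks the colon-separated list in order, and it does not reproduce everything the dynamic linker does (for example the ld.so cache or rpath entries).

```shell
# find_lib_in_path LIBNAME PATHLIST: print the first PATHLIST entry that
# contains LIBNAME. The body runs in a subshell so the IFS and globbing
# changes do not leak into the caller.
find_lib_in_path() (
    set -f          # do not glob-expand directory names
    IFS=:
    for dir in $2; do
        if [ -n "$dir" ] && [ -e "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
            exit 0
        fi
    done
    exit 1
)

find_lib_in_path libmkl_rt.so "${LD_LIBRARY_PATH:-}" \
    || echo "libmkl_rt.so not found on LD_LIBRARY_PATH"
```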
Packaging
If you are on Ubuntu (14.04 or above) or CentOS (6 or above), this repository is also set up to create packages for your distribution. Let's assume you have built:
- for the cpu, your command line was ./buildnativeoperations.sh ...:
  cd blasbuild/cpu
  make package
- for the gpu, your command line was ./buildnativeoperations.sh -c cuda ...:
  cd blasbuild/cuda
  make package
Uploading package to Bintray
The package upload script is in packages. The upload command for an rpm built for cpu is:
./packages/push_to_bintray.sh myAPIUser myAPIKey deeplearning4j blasbuild/cpu/libnd4j-0.8.0.fc7.3.1611.x86_64.rpm https://github.com/deeplearning4j
The upload command for a deb package built for cuda is:
./packages/push_to_bintray.sh myAPIUser myAPIKey deeplearning4j blasbuild/cuda/libnd4j-0.8.0.fc7.3.1611.x86_64.deb https://github.com/deeplearning4j
Running tests
Tests are written with gtest and run using cmake. They currently live under tests_cpu/.
There are 2 directories for running tests:
1. libnd4j_tests: These are older legacy ops tests.
2. layers_tests: This covers the newer graph operations and ops associated with samediff.
We currently run the tests with cmake or CLion.
Running the tests with the CUDA backend is a similar process:
1. ./buildnativeoperations.sh -c cuda -cc <YOUR_ARCH> -b debug -t -j <NUMBER_OF_CORES>
2. ./blasbuild/cuda/tests_cpu/layers_tests/runtests (.exe on Windows)
2. ./blasbuild/cuda/tests_cpu/layers_tests/runtests (.exe on Windows)
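Since the tests are written with gtest, the runtests binary should accept the standard Google Test command-line flags, such as --gtest_filter for running a subset of tests. The sketch below rests on that assumption. The runtests_path helper is hypothetical, the cpu variant of the path is assumed by analogy with the cuda one, and the filter pattern is purely illustrative.

```shell
# runtests_path BACKEND: print the expected location of the layers_tests
# binary for a backend ("cuda" is taken from the steps above; "cpu" is an
# assumption by analogy).
runtests_path() {
    printf 'blasbuild/%s/tests_cpu/layers_tests/runtests\n' "$1"
}

bin="./$(runtests_path cuda)"
if [ -x "$bin" ]; then
    # --gtest_filter is a standard Google Test flag; '*Resize*' is only an
    # example pattern and may not match any test in this repository.
    "$bin" --gtest_filter='*Resize*'
else
    echo "test binary not found: $bin (run the build step first)"
fi
```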