Yurii Shyrma
f7a9190407
profiling of concat op (both cuda and cpu) ( #151 )
...
* - profiling of concat op (both cuda and cpu)
Signed-off-by: Yurii <iuriish@yahoo.com>
* better comparison for large concat
Signed-off-by: raver119 <raver119@gmail.com>
* - further improving of concat op
Signed-off-by: Yurii <iuriish@yahoo.com>
* some loggin
Signed-off-by: raver119 <raver119@gmail.com>
* - add possibility to verify presence of trailing unities in shape and set strides/ews correspondingly
- restrict second simple case in concat op to c order only
Signed-off-by: Yurii <iuriish@yahoo.com>
* - move concat op to specials_single.cpp file
Signed-off-by: Yurii <iuriish@yahoo.com>
* - get rid of second concat op declaration in transforms.cpp file
Signed-off-by: Yurii <iuriish@yahoo.com>
Co-authored-by: raver119 <raver119@gmail.com>
2020-02-20 21:19:01 +03:00
raver119
9e3c1b02b1
Perf improvements ( #242 )
...
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* better ExpandDims impl
Signed-off-by: raver119 <raver119@gmail.com>
* better Squeeze impl
Signed-off-by: raver119 <raver119@gmail.com>
* better Softmax impl
Signed-off-by: raver119 <raver119@gmail.com>
* one test disabled
Signed-off-by: raver119 <raver119@gmail.com>
* more accurate impl
Signed-off-by: raver119 <raver119@gmail.com>
* - GraphProfiler now prints full shapeInfo instead of shape
- softmax typo fix
Signed-off-by: raver119 <raver119@gmail.com>
2020-02-14 16:20:31 +03:00
raver119
1dfac9a736
DataBuffer.write() tweak ( #221 )
...
* special workaround methods for DataBuffer.write
Signed-off-by: raver119 <raver119@gmail.com>
* one test removed
Signed-off-by: raver119 <raver119@gmail.com>
* more of unsynced
Signed-off-by: raver119 <raver119@gmail.com>
* missing asLong for BaseCudaDataBuffer
Signed-off-by: raver119 <raver119@gmail.com>
2020-02-07 18:16:11 +03:00
raver119
a0da5a9e47
Events removed from Java ( #219 )
...
* replace mutex with lock_guards
Signed-off-by: raver119 <raver119@gmail.com>
* Events ditched from Java CUDA logic
Signed-off-by: raver119 <raver119@gmail.com>
2020-02-07 12:34:55 +03:00
Alex Black
569a46f87d
Fixes ( #213 )
...
* Increase timeouts for 2 tests occasionally failing on CI
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Explicitly set character encoding via argline for maven surefire tests
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* CUDA gradient check timeout fix + simple rnn masking fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2020-02-05 17:07:36 +11:00
raver119
5d28e6143d
OpContext handling ( #214 )
...
* nano tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* OpContext tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* OpContext deallocators
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of few mkldnn safety checks
Signed-off-by: raver119 <raver119@gmail.com>
* databuffer setSpecial fix
Signed-off-by: raver119 <raver119@gmail.com>
2020-02-05 07:27:24 +03:00
raver119
81efa5c3b6
[WIP] one small fix ( #207 )
...
* one small fix
Signed-off-by: raver119 <raver119@gmail.com>
* assert added
Signed-off-by: raver119 <raver119@gmail.com>
2020-02-02 19:17:26 +03:00
raver119
5d98cfcf47
Configurable DataType for ops ( #201 )
...
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* - one more test for OneHot with dtype
- one more signature in Nd4j
Signed-off-by: raver119 <raver119@gmail.com>
* ones_as/zeros_as now accept dtype
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* - more updates for configurable data types
- ones_as/zeros_as java side + tests
Signed-off-by: raver119 <raver119@gmail.com>
* few c++ tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* few more changes around DArgs
Signed-off-by: raver119 <raver119@gmail.com>
2020-01-30 18:46:12 +03:00
raver119
ba961c7601
DataTypes & FlatBuffers ( #197 )
...
* flatbuffers version upgrade
Signed-off-by: raver119 <raver119@gmail.com>
* flatbuffers version upgrade java side
Signed-off-by: raver119 <raver119@gmail.com>
* flatbuffers dependency version upgrade java side
Signed-off-by: raver119 <raver119@gmail.com>
* MKLDNN version upgrade
Signed-off-by: raver119 <raver119@gmail.com>
* DArgs first pass
Signed-off-by: raver119 <raver119@gmail.com>
* signatures first pass
Signed-off-by: raver119 <raver119@gmail.com>
* signatures second pass
Signed-off-by: raver119 <raver119@gmail.com>
* signatures third pass
Signed-off-by: raver119 <raver119@gmail.com>
* signatures third pass
Signed-off-by: raver119 <raver119@gmail.com>
* signatures fourth pass
Signed-off-by: raver119 <raver119@gmail.com>
* signatures fifth pass
Signed-off-by: raver119 <raver119@gmail.com>
* flatbuffers UI version upgrade java side
Signed-off-by: raver119 <raver119@gmail.com>
* flatbuffers ui update
Signed-off-by: raver119 <raver119@gmail.com>
* flatbuffers downgrade
Signed-off-by: raver119 <raver119@gmail.com>
* flatbuffers downgrade java side
Signed-off-by: raver119 <raver119@gmail.com>
2020-01-30 10:07:24 +03:00
raver119
9f719488b9
CUDA sync tweaks ( #194 )
...
* ThreadLocal cache for CudaContext
Signed-off-by: raver119 <raver119@gmail.com>
* temp commit
Signed-off-by: raver119 <raver119@gmail.com>
* remove unwanted synchronization
Signed-off-by: raver119 <raver119@gmail.com>
2020-01-28 10:55:06 +03:00
raver119
7ef0ef907e
Packages fix ( #193 )
...
* packages fix
Signed-off-by: raver119 <raver119@gmail.com>
* few imports fixed
Signed-off-by: raver119 <raver119@gmail.com>
* few imports fixed
Signed-off-by: raver119 <raver119@gmail.com>
2020-01-27 23:04:21 +03:00
raver119
531a72fabd
execution mode ( #183 )
...
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* execution mode java side
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* move exec mode to ContextPrototype
Signed-off-by: raver119 <raver119@gmail.com>
* copyrights
Signed-off-by: raver119 <raver119@gmail.com>
2020-01-27 10:00:07 +03:00
raver119
5d69069177
[WIP] Memory limits ( #167 )
...
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* one more initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* additional initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* subsequent initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit testing
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit per device
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit per group
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit for cuda
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit for cuda + few missed lines
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit for cuda + missed includes
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit for cuda + one more missed include
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit shouldn't count host mem as dev0 in cuda
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit that tracks HOST group limits for CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit with some Environment changes
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit with more Environment changes
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit with maxMasterThreads fix
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit with maxMasterThreads fix
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit without maxMasterThreads exception
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit without Nd4jULong in Environment
Signed-off-by: raver119 <raver119@gmail.com>
* add sleep and more iterations for OOM cases
Signed-off-by: raver119 <raver119@gmail.com>
* limits propagation from java side
Signed-off-by: raver119 <raver119@gmail.com>
* - consume ErrorCode every time
- one test for memory limits
Signed-off-by: raver119 <raver119@gmail.com>
* unordered_map
Signed-off-by: raver119 <raver119@gmail.com>
* unordered_map
Signed-off-by: raver119 <raver119@gmail.com>
* unordered_map
Signed-off-by: raver119 <raver119@gmail.com>
* RSub op mapping fixed
Signed-off-by: raver119 <raver119@gmail.com>
* typo fixed
Signed-off-by: raver119 <raver119@gmail.com>
* one bad test fixed
Signed-off-by: raver119 <raver119@gmail.com>
2020-01-24 10:11:09 +03:00
raver119
256c9d20b0
alloc check for RNG ( #179 )
...
* missing alloc validation in RandomGenerator for CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* set error message if rng alloc failed
Signed-off-by: raver119 <raver119@gmail.com>
* check for error code during RNG creation in java
Signed-off-by: raver119 <raver119@gmail.com>
2020-01-23 09:51:02 +03:00
raver119
25db3a44f1
[WIP] few fixes for tests ( #177 )
...
* nd4j-aeron profiles
Signed-off-by: raver119 <raver119@gmail.com>
* nd4j-aeron profiles
Signed-off-by: raver119 <raver119@gmail.com>
* skip one long test
Signed-off-by: raver119 <raver119@gmail.com>
* skip one long test
Signed-off-by: raver119 <raver119@gmail.com>
* kryo profile
Signed-off-by: raver119 <raver119@gmail.com>
* few more profiles
Signed-off-by: raver119 <raver119@gmail.com>
* few more profiles
Signed-off-by: raver119 <raver119@gmail.com>
* few more profiles
Signed-off-by: raver119 <raver119@gmail.com>
2020-01-22 16:12:30 +03:00
Alex Black
a25bb6a11c
Unit/integration test split + test speedup ( #166 )
...
* Add maven profile + base tests methods for integration tests
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Switch from system property to environment variable; seems more reliable in intellij
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Add nd4j-common-tests module, and common base test; cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Ensure all ND4J tests extend BaseND4JTest
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Test spam reduction, import fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Add test logging to nd4j-aeron
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix unintended change
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Reduce sprint test log spam
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More test spam cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Significantly speed up TSNE tests
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* W2V iterator test unit/integration split
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More NLP test speedups
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Avoid debug/verbose mode leaking between tests
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* test tweak
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Arbiter extends base DL4J test
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Arbiter test speedup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* nlp-uima test speedup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More test speedups
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix ND4J base test
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Few small ND4J test speed improvements
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* DL4J tests speedup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More tweaks
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Even more test speedups
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More tweaks
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Various test fixes
Signed-off-by: Alex Black <blacka101@gmail.com>
* More test fixes
Signed-off-by: Alex Black <blacka101@gmail.com>
* Add ability to specify number of threads for C++ ops in BaseDL4JTest and BaseND4JTest
Signed-off-by: Alex Black <blacka101@gmail.com>
* nd4j-aeron test profile fix for CUDA
Signed-off-by: Alex Black <blacka101@gmail.com>
2020-01-22 22:27:01 +11:00
raver119
7783012f39
cuDNN integration ( #150 )
...
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* one file
Signed-off-by: raver119 <raver119@gmail.com>
* few more includes
Signed-off-by: raver119 <raver119@gmail.com>
* m?
Signed-off-by: raver119 <raver119@gmail.com>
* const
Signed-off-by: raver119 <raver119@gmail.com>
* cudnn linkage in tests
Signed-off-by: raver119 <raver119@gmail.com>
* culibos
Signed-off-by: raver119 <raver119@gmail.com>
* static reminder
Signed-off-by: raver119 <raver119@gmail.com>
* platform engine tag
Signed-off-by: raver119 <raver119@gmail.com>
* HAVE_CUDNN moved to config.h.in
Signed-off-by: raver119 <raver119@gmail.com>
* include
Signed-off-by: raver119 <raver119@gmail.com>
* include
Signed-off-by: raver119 <raver119@gmail.com>
* skip cudnn handle creation if there's not cudnn
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* target device in context
Signed-off-by: raver119 <raver119@gmail.com>
* platform engines
Signed-off-by: raver119 <raver119@gmail.com>
* platform engines
Signed-off-by: raver119 <raver119@gmail.com>
* allow multiple -h args
Signed-off-by: raver119 <raver119@gmail.com>
* allow multiple -h args
Signed-off-by: raver119 <raver119@gmail.com>
* move mkldnn out of CPU block
Signed-off-by: raver119 <raver119@gmail.com>
* link to mkldnn on cuda
Signed-off-by: raver119 <raver119@gmail.com>
* less prints
Signed-off-by: raver119 <raver119@gmail.com>
* minor tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* next step
Signed-off-by: raver119 <raver119@gmail.com>
* conv2d NCHW draft
Signed-off-by: raver119 <raver119@gmail.com>
* conv2d biasAdd
Signed-off-by: raver119 <raver119@gmail.com>
* test for MKL/CUDNN combined use
Signed-off-by: raver119 <raver119@gmail.com>
* - provide additional code for conv2d ff based on cudnn api, not tested yet
Signed-off-by: Yurii <iuriish@yahoo.com>
* - further work on conv2d helper based on using cudnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - fixing several cuda bugs which appeared after cudnn lib had been started to use
Signed-off-by: Yurii <iuriish@yahoo.com>
* - implementation of conv2d backprop op based on cudnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - implementaion of conv3d and conv3d_bp ops based on cudnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - bugs fixing in conv3d/conv3d_bp ops (cudnn in use)
Signed-off-by: Yurii <iuriish@yahoo.com>
* - implementation of depthwiseConv2d (ff/bp) op based on cudnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - implementation of batchnorm ff op based on cudnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - disable cudnn batchnorm temporary
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add minor change in cmake
Signed-off-by: Yurii <iuriish@yahoo.com>
* engine for depthwise mkldnn
Signed-off-by: raver119 <raver119@gmail.com>
* couple of includes
Signed-off-by: raver119 <raver119@gmail.com>
* - provide permutation to cudnn batchnorm ff when format is NHWC
Signed-off-by: Yurii <iuriish@yahoo.com>
* lgamma fix
Signed-off-by: raver119 <raver119@gmail.com>
* - eliminate memory leak in two tests
Signed-off-by: Yurii <iuriish@yahoo.com>
Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>
2020-01-20 21:32:46 +03:00
raver119
29e8e09db6
String changes ( #3 )
...
* initial commit
* additional data types & tensor type
Signed-off-by: raver119 <raver119@gmail.com>
* next step
Signed-off-by: raver119 <raver119@gmail.com>
* missing include
* sparse_to_dense
Signed-off-by: raver119 <raver119@gmail.com>
* few more tests files
Signed-off-by: raver119 <raver119@gmail.com>
* draft
Signed-off-by: raver119 <raver119@gmail.com>
* numeric sparse_to_dense
Signed-off-by: raver119 <raver119@gmail.com>
* comment
Signed-off-by: raver119 <raver119@gmail.com>
* string sparse_to_dense version
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA DataBuffer expand
Signed-off-by: raver119 <raver119@gmail.com>
* few tweaks for CUDA build
Signed-off-by: raver119 <raver119@gmail.com>
* shape fn for string_split
Signed-off-by: raver119 <raver119@gmail.com>
* one more comment
Signed-off-by: raver119 <raver119@gmail.com>
* string_split indices
Signed-off-by: raver119 <raver119@gmail.com>
* next step
Signed-off-by: raver119 <raver119@gmail.com>
* test passes
Signed-off-by: raver119 <raver119@gmail.com>
* few rearrangements for databuffer implementations
Signed-off-by: raver119 <raver119@gmail.com>
* DataBuffer: move inline methods to common implementations
Signed-off-by: raver119 <raver119@gmail.com>
* add native DataBuffer to Nd4j presets
Signed-off-by: raver119 <raver119@gmail.com>
* DataBuffer creation
Signed-off-by: raver119 <raver119@gmail.com>
* use DataBuffer for allocation
Signed-off-by: raver119 <raver119@gmail.com>
* cpu databuffer as deallocatable
Signed-off-by: raver119 <raver119@gmail.com>
* DataBuffer setters for bufers
Signed-off-by: raver119 <raver119@gmail.com>
* couple of wrappers
Signed-off-by: raver119 <raver119@gmail.com>
* DataBuffers being passed around
Signed-off-by: raver119 <raver119@gmail.com>
* Bunch of ByteBuffer-related signatures gone
Signed-off-by: raver119 <raver119@gmail.com>
* - few more Nd4j signatures removed
- minor fix for bfloat16
Signed-off-by: raver119 <raver119@gmail.com>
* nullptr pointer is still a pointer, but 0 as address :)
Signed-off-by: raver119 <raver119@gmail.com>
* one special test
Signed-off-by: raver119 <raver119@gmail.com>
* empty string array init
Signed-off-by: raver119 <raver119@gmail.com>
* one more test in cpp
Signed-off-by: raver119 <raver119@gmail.com>
* memcpy instead of databuffer swap
Signed-off-by: raver119 <raver119@gmail.com>
* special InteropDataBuffer for front-end languages
Signed-off-by: raver119 <raver119@gmail.com>
* few tweaks for java
Signed-off-by: raver119 <raver119@gmail.com>
* pointer/indexer actualization
Signed-off-by: raver119 <raver119@gmail.com>
* CustomOp returns list for inputArumgents and outputArguments instead of array
Signed-off-by: raver119 <raver119@gmail.com>
* redundant call
Signed-off-by: raver119 <raver119@gmail.com>
* print_variable op
Signed-off-by: raver119 <raver119@gmail.com>
* - view handling (but wrong one)
- print_variable java wrapper
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* - empty arrays handling
Signed-off-by: raver119 <raver119@gmail.com>
* - deserialization works now
Signed-off-by: raver119 <raver119@gmail.com>
* minor fix
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* one more fix
Signed-off-by: raver119 <raver119@gmail.com>
* initial cuda commit
Signed-off-by: raver119 <raver119@gmail.com>
* print_variable message validation
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA views
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA special buffer size
Signed-off-by: raver119 <raver119@gmail.com>
* minor update to match master changes
Signed-off-by: raver119 <raver119@gmail.com>
* - consider arrays always actual on device for CUDA
- additional PrintVariable constructor
- CudaUtf8Buffer now allocates host buffer by default
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* - print_variable now allows print from device
Signed-off-by: raver119 <raver119@gmail.com>
* InteropDataBuffer data type fix
Signed-off-by: raver119 <raver119@gmail.com>
* ...
Signed-off-by: raver119 <raver119@gmail.com>
* disable some debug messages
Signed-off-by: raver119 <raver119@gmail.com>
* master pulled in
Signed-off-by: raver119 <raver119@gmail.com>
* couple of new methods for DataBuffer interop
Signed-off-by: raver119 <raver119@gmail.com>
* java side
Signed-off-by: raver119 <raver119@gmail.com>
* offsetted constructor
Signed-off-by: raver119 <raver119@gmail.com>
* new CUDA deallocator
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA backend torn apart
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA backend torn apart 2
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA backend torn apart 3
Signed-off-by: raver119 <raver119@gmail.com>
* - few new tests
- few new methods for DataBuffer management
Signed-off-by: raver119 <raver119@gmail.com>
* few more tests + few more tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* two failing tests
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* two failing tests pass
Signed-off-by: raver119 <raver119@gmail.com>
* now we pass DataBuffer to legacy ops too
Signed-off-by: raver119 <raver119@gmail.com>
* Native DataBuffer for legacy ops, Java side
Signed-off-by: raver119 <raver119@gmail.com>
* CPU java side update
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA java side update
Signed-off-by: raver119 <raver119@gmail.com>
* no more prepare/register action on java side
Signed-off-by: raver119 <raver119@gmail.com>
* NDArray::prepare/register use now accepts vectors
Signed-off-by: raver119 <raver119@gmail.com>
* InteropDataBuffer now has few more convenience methods
Signed-off-by: raver119 <raver119@gmail.com>
* java bindings update
Signed-off-by: raver119 <raver119@gmail.com>
* tick device in NativeOps
Signed-off-by: raver119 <raver119@gmail.com>
* Corrected usage of OpaqueBuffer for tests.
* Corrected usage of OpaqueBuffer for java tests.
* NativeOpsTests fixes.
* print_variable now returns scalar
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* compat_string_split fix for CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* - CUDA execScalar fix
- CUDA lazyAllocateHostPointer now checks java indexer/pointer instead of native pointer
Signed-off-by: raver119 <raver119@gmail.com>
* legacy ops DataBuffer migration prototype
Signed-off-by: raver119 <raver119@gmail.com>
* ignore device shapeinfo coming from java
Signed-off-by: raver119 <raver119@gmail.com>
* minor fix
Signed-off-by: raver119 <raver119@gmail.com>
* minor transformAny fix
Signed-off-by: raver119 <raver119@gmail.com>
* minor tweak for lazy host allocation
Signed-off-by: raver119 <raver119@gmail.com>
* - DataBuffer::memcpy method
- bitcast now uses memcpy
Signed-off-by: raver119 <raver119@gmail.com>
* - IndexReduce CUDA dimension buffer fix
Signed-off-by: raver119 <raver119@gmail.com>
* views for CPU and CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* less spam
Signed-off-by: raver119 <raver119@gmail.com>
* optional memory init
Signed-off-by: raver119 <raver119@gmail.com>
* async memset
Signed-off-by: raver119 <raver119@gmail.com>
* - SummaryStats CUDA fix
- DataBuffer.sameUnderlyingData() impl
- execBroadcast fix
Signed-off-by: raver119 <raver119@gmail.com>
* - reduce3All fix
switch to CUDA 10 temporarily
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA version
Signed-off-by: raver119 <raver119@gmail.com>
* proper memory deallocator registration
Signed-off-by: raver119 <raver119@gmail.com>
* HOST_ONLY workspace allocation
Signed-off-by: raver119 <raver119@gmail.com>
* temp commit
Signed-off-by: raver119 <raver119@gmail.com>
* few conflicts resolved
Signed-off-by: raver119 <raver119@gmail.com>
* few minor fixes
Signed-off-by: raver119 <raver119@gmail.com>
* one more minor fix
Signed-off-by: raver119 <raver119@gmail.com>
* NDArray permute should operate on JVM primitives
Signed-off-by: raver119 <raver119@gmail.com>
* - create InteropDataBuffer for shapes as well
- update pointers after view creation in Java
Signed-off-by: raver119 <raver119@gmail.com>
* - addressPointer temporary moved to C++
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA: don't account offset twice
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA: DataBuffer pointer constructor updated
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA NDArray.unsafeDuplication() simplified
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA minor workspace-related fixes
Signed-off-by: raver119 <raver119@gmail.com>
* CPU DataBuffer.reallocate()
Signed-off-by: raver119 <raver119@gmail.com>
* print_affinity op
Signed-off-by: raver119 <raver119@gmail.com>
* print_affinity java side
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA more tweaks for data locality
Signed-off-by: raver119 <raver119@gmail.com>
* - compat_string_split tweak
- CudaUtf8Buffer update
Signed-off-by: raver119 <raver119@gmail.com>
* INDArray.close() mechanic restored
Signed-off-by: raver119 <raver119@gmail.com>
* one more test fixed
Signed-off-by: raver119 <raver119@gmail.com>
* - CUDA DataBuffer.reallocate() updated
- cudaMemcpy (synchronous) restored
Signed-off-by: raver119 <raver119@gmail.com>
* one last fix
Signed-off-by: raver119 <raver119@gmail.com>
* bad import removed
Signed-off-by: raver119 <raver119@gmail.com>
* another small fix
Signed-off-by: raver119 <raver119@gmail.com>
* one special test
Signed-off-by: raver119 <raver119@gmail.com>
* fix bad databuffer size
Signed-off-by: raver119 <raver119@gmail.com>
* release primaryBuffer on replace
Signed-off-by: raver119 <raver119@gmail.com>
* higher timeout
Signed-off-by: raver119 <raver119@gmail.com>
* disable timeouts
Signed-off-by: raver119 <raver119@gmail.com>
* dbCreateView now validates offset and length of a view
Signed-off-by: raver119 <raver119@gmail.com>
* additional validation for dbExpand
Signed-off-by: raver119 <raver119@gmail.com>
* restore timeout back again
Signed-off-by: raver119 <raver119@gmail.com>
* smaller distribution for rng test to prevent timeouts
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA DataBuffer::memcpy now copies to device all the time
Signed-off-by: raver119 <raver119@gmail.com>
* OpaqueDataBuffer now contains all required methods for interop
Signed-off-by: raver119 <raver119@gmail.com>
* some javadoc
Signed-off-by: raver119 <raver119@gmail.com>
* GC on failed allocations
Signed-off-by: raver119 <raver119@gmail.com>
* minoe memcpu tweak
Signed-off-by: raver119 <raver119@gmail.com>
* one more bitcast test
Signed-off-by: raver119 <raver119@gmail.com>
* - NDArray::deviceId() propagation
- special multi-threaded test for data locality checks
Signed-off-by: raver119 <raver119@gmail.com>
* DataBuffer additional syncStream
Signed-off-by: raver119 <raver119@gmail.com>
* DataBuffer additional syncStream
Signed-off-by: raver119 <raver119@gmail.com>
* one ignored test
Signed-off-by: raver119 <raver119@gmail.com>
* skip host alloc for empty arrays
Signed-off-by: raver119 <raver119@gmail.com>
* ByteBuffer support is back
Signed-off-by: raver119 <raver119@gmail.com>
* DataBuffer::memcpy minor fix
Signed-off-by: raver119 <raver119@gmail.com>
* few minor prelu/bp tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* nullify-related fixes
Signed-off-by: raver119 <raver119@gmail.com>
* PReLU fixes (#157 )
Signed-off-by: Alex Black <blacka101@gmail.com>
* Build fixed
* Fix tests
* one more ByteBuffer signature restored
Signed-off-by: raver119 <raver119@gmail.com>
* nd4j-jdbc-hsql profiles fix
Signed-off-by: raver119 <raver119@gmail.com>
* nd4j-jdbc-hsql profiles fix
Signed-off-by: raver119 <raver119@gmail.com>
* PReLU weight init fix
Signed-off-by: Alex Black <blacka101@gmail.com>
* Small PReLU fix
Signed-off-by: Alex Black <blacka101@gmail.com>
* - INDArray.migrate() reactivated
- DataBuffer::setDeviceId(...) added
- InteropDataBuffer Z syncToDevice added for views
Signed-off-by: raver119 <raver119@gmail.com>
* missed file
Signed-off-by: raver119 <raver119@gmail.com>
* Small tweak
Signed-off-by: Alex Black <blacka101@gmail.com>
* cuda 10.2
Signed-off-by: raver119 <raver119@gmail.com>
* minor fix
Signed-off-by: raver119 <raver119@gmail.com>
Co-authored-by: shugeo <sgazeos@gmail.com>
Co-authored-by: Alex Black <blacka101@gmail.com>
Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
2020-01-04 13:27:50 +03:00
raver119
451d9d57fd
shape function override ( #161 )
...
Signed-off-by: raver119 <raver119@gmail.com>
2020-01-04 09:06:44 +03:00
Alex Black
29104083cc
Various fixes ( #143 )
...
* #8568 ArrayUtil optimization
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #6171 Keras ReLU and ELU support
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Keras softmax layer import
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8549 Webjars dependency management
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix for TF import names ':0' suffix issue / NPE
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* BiasAdd: fix default data format for TF import
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Update zoo test ignores
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8509 SameDiff Listener API - provide frame + iteration
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8520 ND4J Environment
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Deconv3d
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Deconv3d fixes + gradient check
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Conv3d fixes + deconv3d DType test
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix issue with deconv3d gradinet check weight init
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8579 Fix BaseCudaDataBuffer constructor fix for UINT16
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* DataType.isNumerical() returns false for BOOL type
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8504 Reduce Spark log spam for tests
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Clean up DL4J gradient check test spam
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More Gradient check spam reduction
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* SameDiff test spam reduction
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fixes for FlatBuffers mapping
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* SameDiff log spam cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Tests should extend BaseNd4jTest
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Remove debug line in c++ op
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* ND4J test spam cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* DL4J test spam reduction
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More Dl4J and datavec test spam cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix for bad conv3d test
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Additional test
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Embedding layers: don't inherit global default activation function
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Trigger CI
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Consolidate all BaseDL4JTest classes to single class used everywhere; make timeout configurable per class
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Test fixes and timeout increases
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Timeouts and PReLU fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Restore libnd4j build threads arg for CUDA build
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Increase timeouts on a few tests to avoid spurious failures on some CI machines
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More timeout fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More test timeout fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Tweak timeout for one more test
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Final tweaks
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* One more ignore
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2020-01-04 13:45:07 +11:00
Alex Black
ce02b6fae7
Small fixes ( #140 )
...
* Allow scalar op result array auto allocation
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Don't swallow underlying exception for calculateOutputShape execution failures
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Ignore for known keras failure
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-12-21 17:00:46 +11:00
raver119
a5f5ac72b1
reduce bool changes ( #118 )
...
* reduce bool changes
Signed-off-by: raver119 <raver119@gmail.com>
* reduce bool tweaks
Signed-off-by: raver119 <raver119@gmail.com>
2019-12-09 20:08:59 +03:00
shugeo
e09a785232
Shugeo resize fix5 ( #102 )
...
* Refactored resize images ops to use TF-like bool args as input.
* Refactored helpers for cpu implementation of resize_bilinear and resize_nearest_neighbor ops.
* Refactored cuda implementation for image.resize_bilinear and image.resize_nearest_neighbor ops helpers.
* Refactored nearest_neighbor resize op.
* Added a pair of tests for special case of resize_bilinear algorithm.
* Fixed issue with resize_bilinear op.
* Refactored cpu implementation for helpers with resize_nearest_neighbor op.
* Final fixed for resize ops to conform TF v.1.5
* Refactored cuda helpers for resize_neares_neighbor op.
* Fixed resize_bilinear to accept proper data.
* Fixed issue with non-float input for resize_bilinear op.
* Refactored cuda helper for resize_bilinear to proper process non-float inputs.
* Added tests for resize_bilinear to int inputs.
* Fixed ResizeBilinear wrapper
* Tests fixed
* Fixed float and bool constant to avoid overflow for some kind of compilers.
* Corrected float constants with float data type.
* Added f suffix for float constants.
* Corrected float constant to avoid overflow with initializing lists.
* Corrected float initializing list with float input.
* Corrected bool constant with initalizing list.
* Corrected float and bool values with initializing lists.
* Fixed wrong constant.
* Fixed issue with 1x1 input picture for resize.
* ResizeBilinear default values on import fix
Signed-off-by: raver119 <raver119@gmail.com>
2019-12-05 22:05:33 +03:00
Fariz Rahman
0d14032d26
TF Updates ( #87 )
...
* tf updates
* pom
* copyright
* graphrunner tests
* gpu test
* getSessionOptionsConfigProto
* dtype fix
* Small fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* cast graphs
* savemodel test fix
* testresource instead of local
* Logging level
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* gson dependency issue fix; fix GraphRunnerTest for no session options config case
Signed-off-by: Alex Black <blacka101@gmail.com>
* Final tweaks
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* few minor fixes
Signed-off-by: raver119 <raver119@gmail.com>
* one more fix
Signed-off-by: raver119 <raver119@gmail.com>
* Tweak configuration for GraphRunnerTest
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* nd4j align config
* tf warmup
2019-12-04 17:11:03 +11:00
Samuel Audet
5e07998e59
Add support for CUDA 10.2 ( #89 )
2019-11-29 16:31:03 +11:00
shugeo
4187190609
Shugeo release fix2 ( #70 )
...
* Corrected input checking and tests for bitcast op.
* Fixed an issue with non_max_suppression form generation and processing with score threshold given.
* Fixed bilinear resize kernel and tests.
* push for Serhii
Signed-off-by: raver119 <raver119@gmail.com>
* Added test for nearest_neighbor resize with int input.
* Added data type check for input/output match.
* Eliminate error in macros.
* Improved output message for type checking.
* Fixed input/output types for op.
* Eliminated waste logging.
* Refactored resize_bilinear helper for multithreading for cpu platform.
* Cosmetic changes only.
* Fixed error for string substitution.
* Skip test for cbow_batch with cuda.
* fix for resizeNearestNeighbor output dtype
Signed-off-by: raver119 <raver119@gmail.com>
* Refactored non_max_suppression helper.
* Refactored shape generation and input handling.
* Added additional test.
2019-11-22 22:42:44 +03:00
Samuel Audet
ff73e6da3f
ND4J: Fix OpenBLAS loading for nd4j-native ( #64 )
...
* ND4J: Fix OpenBLAS loading for nd4j-native and remove bundling of OpenMP
Signed-off-by: Samuel Audet <samuel.audet@gmail.com>
* Bundle back libgomp.so.1 for Linux
Signed-off-by: Samuel Audet <samuel.audet@gmail.com>
* Readd preload directories for ARM
Signed-off-by: Samuel Audet <samuel.audet@gmail.com>
* Add back preloads for GCC on Windows
Signed-off-by: Samuel Audet <samuel.audet@gmail.com>
* Add explicit preloadpaths for ARM and POWER to bundle correct library
Signed-off-by: Samuel Audet <samuel.audet@gmail.com>
2019-11-21 15:54:41 +03:00
raver119
064a56ccf1
Few fixes ( #66 )
...
* skip legacy transforms execution in case of empty input arrays
Signed-off-by: raver119 <raver119@gmail.com>
* - BroadcastBool ops now accept extraParams to make MatchCondition possible
- TrueBroadcastHelper now uses samediff::threads
Signed-off-by: raver119 <raver119@gmail.com>
* java side
Signed-off-by: raver119 <raver119@gmail.com>
* trigger jenkins
Signed-off-by: raver119 <raver119@gmail.com>
* update LessThanOrEqual opNum mapping
Signed-off-by: raver119 <raver119@gmail.com>
* update LessThanOrEqual opNum mapping
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-21 15:43:03 +03:00
raver119
83cb0d9329
[WIP] Create and small fix ( #67 )
...
* - create op
- skip exec for empty inputs for non_max_suppression
- EmptyHandling idea
Signed-off-by: raver119 <raver119@gmail.com>
* Create op and mapping for it
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-21 13:31:20 +03:00
Yurii Shyrma
66b84b38cf
Shyrma mmul ( #58 )
...
* - get rid of some copy procedures in mmulHelper ops
Signed-off-by: Yurii <iuriish@yahoo.com>
* - further work on embedding cuda api for batched gemm (cublasGemmBatchedEx) in our mmulHelper class
Signed-off-by: Yurii <iuriish@yahoo.com>
* - further work on cuda batched gamm api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - write own cuda kernel performing batched gemm
Signed-off-by: Yurii <iuriish@yahoo.com>
* missing include in MmulHelper
Signed-off-by: raver119 <raver119@gmail.com>
* - forgot to keep in code previous correct kernels for mmulNxN, since it may happen that new onw will fail for some reason in future
Signed-off-by: Yurii <iuriish@yahoo.com>
* disable old tensordot
Signed-off-by: raver119 <raver119@gmail.com>
* - rewrite cuda kernels for usualGemm and usualGemv
Signed-off-by: Yurii <iuriish@yahoo.com>
* - profiling mmul helpers
Signed-off-by: Yurii <iuriish@yahoo.com>
* - prints to check shapes were added
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct type of output array Cin mmulNxN
Signed-off-by: Yurii <iuriish@yahoo.com>
* - take into account possible nans in C array
Signed-off-by: Yurii <iuriish@yahoo.com>
* slightly change numThreads message
Signed-off-by: raver119 <raver119@gmail.com>
* - make corrections in accordance to given notes in pr review
Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-19 15:39:36 +02:00
raver119
1780dcc883
[WIP] Small fixes here and there ( #50 )
...
* one range test
Signed-off-by: raver119 <raver119@gmail.com>
* few Context convenience singatures
Signed-off-by: raver119 <raver119@gmail.com>
* one more range test
Signed-off-by: raver119 <raver119@gmail.com>
* "range" "fix"
Signed-off-by: raver119 <raver119@gmail.com>
* adjuct_contrast_v2 now allows scale factor to be provided via input_variable
Signed-off-by: raver119 <raver119@gmail.com>
* adjust_contrast now allows scale factor as variable too
Signed-off-by: raver119 <raver119@gmail.com>
* bitcast shape tests
Signed-off-by: raver119 <raver119@gmail.com>
* BitCast import dtype added
Signed-off-by: raver119 <raver119@gmail.com>
* few more BitCast signatures
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-15 17:04:29 +03:00
raver119
1eb3de90d7
[WIP] Platform helpers switches ( #44 )
...
* - platform helpers can be disabled on per-op basis now via Context::allowHelpers
- java has access to it as well
Signed-off-by: raver119 <raver119@gmail.com>
* global platform-helpers trigger
Signed-off-by: raver119 <raver119@gmail.com>
* few signatures renamed
Signed-off-by: raver119 <raver119@gmail.com>
* - few new env variables to follow
- maxThreads/masterThreads differentiation
Signed-off-by: raver119 <raver119@gmail.com>
* Javadoc update
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-14 14:35:02 +03:00
raver119
48df1acdfb
[WIP] ThreadPool ( #8 )
...
This PR removes OpenMP use in 95% of cases
2019-11-13 17:04:59 +03:00
Samuel Audet
73b5a508fc
Update dependencies to just released JavaCPP and JavaCV 1.5.2
...
Signed-off-by: Samuel Audet <samuel.audet@gmail.com>
2019-11-07 17:57:34 +09:00
Alex Black
d82877b18b
Various SameDiff fixes ( #21 )
...
* MKLDNN LSTM forward implementation (disabled pending #8331 )
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8318 add SameDiff.calculateGradientsAndOutputs
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Disable mkldnn backprop for now - pending fix, issue #8335
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8337 Fix CudaExecutioner unnecessary result array allocation/replacement
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small FlatBuffers serde fix, UInt8
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8135 ImagePreProcessingScaler - add segmentation support
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8319 Ensure listeners are called when they are supposed to be called
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8214 UNet (non-pretrained) last conv layer kernal size fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-11-02 11:25:53 +11:00
Alexander Stoyakin
b816845797
Fixing nd4j-cuda build ( #20 )
...
* Roll back recent fix to restore build.
* Fix compilation.
* presets updated
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-01 15:59:29 +02:00
Alexander Stoyakin
45a40c8a89
DL4J/ND4J: Do pass on integer casts ( #15 )
...
* Int cast fixes.
* Revert "Int cast fixes."
This reverts commit aa36e8ca
* Int casts
* Int cast
* Int casts
* Get rid of int casts. Dropping deprecated aggregate ops.
* java scatterUpdate changes
Signed-off-by: raver119 <raver119@gmail.com>
* c++ scatterUpdate changes
Signed-off-by: raver119 <raver119@gmail.com>
* Remove aggregated ops.
* Restored test
* Tests restored.
* Minor fixes
2019-10-31 11:23:09 +02:00
Alex Black
d333d29099
SameDiff cleanup and fixes ( #12 )
...
* #8160 Remove resolvePrepertiesFromSameDiffBeforeExecution
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* SameDiff API cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More SameDiff cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8248 Switch SameDiff variable init from lazy to creation time for more predictable behaviour
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8252 TanhDerivative javadoc
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8225 Deconvolution2D input validation
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8265 Switch SameDiff.outputs() to user settable, instead of unreliable 'best guess'
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8224 SameDiff.zero and .one create constants, not variables
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More cleanup and fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small test fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* DL4J SameDiff fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Re-add hack for Deconvolution2DLayer until #8315 is resolved
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8270 Move CUDA device/version logging to Java; can be disabled via existing org.nd4j.log.initialization system property
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* All ND4J init logging checks system property
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small tweak
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Remove redundant device logging
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* One more fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* UX improvements
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Deconv fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Add deconv tests
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Remove debug code
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-10-26 12:38:08 +11:00
Alexander Stoyakin
f31661e13b
Merge pull request #7 from KonduitAI/asto_nd4s_10172019
...
KDTree optimization
2019-10-23 12:11:25 +03:00
Robert Altena
83d958d536
Sparse matrix refactoring. ( #8238 )
...
* remove sparse method from INDArray.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* remove gemm
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* remove useage of n4j.sparseFactory
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Nd4j.sparseFactory removed.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* sparseNDArray deleted.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* iremove more sparse calls and constants.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* remove SparseBlasWrapper.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* delete BaseSparseBlaswrapper.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* remove 3 sparse factory classes.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* delete SparseCPULevel.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* deletes JcusparseLevel, CUDASparselevel.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* delete nativeCPU sparse classes.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* removes sparse methods from NDArrayFactory.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* more deletes.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* delete (ignored) tests. BaseSparseNDArray.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* deletes ISparseNDArray.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* remove sparse methods from indArray.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* deletes sparse classes.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
2019-09-17 22:56:29 +03:00
AlexDBlack
a66e03355e
Merge remote-tracking branch 'fork/master'
2019-09-12 12:20:57 +10:00
raver119
98e2814879
Platform helpers ( #8216 )
...
* platform helpers draft
Signed-off-by: raver119 <raver119@gmail.com>
* typo
Signed-off-by: raver119 <raver119@gmail.com>
* disable platform cmake
Signed-off-by: raver119 <raver119@gmail.com>
* another draft
Signed-off-by: raver119 <raver119@gmail.com>
* mkldnn convolution refactored
Signed-off-by: raver119 <raver119@gmail.com>
* minor tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* one more safety check
Signed-off-by: raver119 <raver119@gmail.com>
* prototype works
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* force static library mode for mkldnn
Signed-off-by: raver119 <raver119@gmail.com>
* - ismax fix
- experimental arg fix
- don't enforce openblas on Apple hardware
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of small fixes
Signed-off-by: raver119@gmail.com <raver119@gmail.com>
* declare concurrent
Signed-off-by: raver119@gmail.com <raver119@gmail.com>
* - MKLDNN version upgrade to 1.0.2
- avgpool2d/maxpool2d APIs update
Signed-off-by: raver119 <raver119@gmail.com>
* - avgpool2d_bp/maxpool2d_bp APIs update
Signed-off-by: raver119 <raver119@gmail.com>
* - conv2d/batchnorm APIs update
Signed-off-by: raver119 <raver119@gmail.com>
* - lrn/conv2d_bp/conv3d/conv3d_bp APIs update
Signed-off-by: raver119 <raver119@gmail.com>
* all ops converted to MKLDNN 1.x
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* namespace for platform helpers
Signed-off-by: raver119 <raver119@gmail.com>
* make sure platform helpers aren't opimized out
Signed-off-by: raver119 <raver119@gmail.com>
* build cpu_features on x86 systems
Signed-off-by: raver119 <raver119@gmail.com>
* build cpu_features on x86 systems
Signed-off-by: raver119 <raver119@gmail.com>
* more of cpu_features
Signed-off-by: raver119 <raver119@gmail.com>
* - mkldnn removed from java
- cpu_features checks in CpuNDArrayFactory
Signed-off-by: raver119 <raver119@gmail.com>
* F16C definition renamed
Signed-off-by: raver119 <raver119@gmail.com>
* some mkldnn rearrangements
Signed-off-by: raver119 <raver119@gmail.com>
* check supported instructions before doing anything
Signed-off-by: raver119 <raver119@gmail.com>
* typo
Signed-off-by: raver119 <raver119@gmail.com>
* missied impl
Signed-off-by: raver119 <raver119@gmail.com>
* BUILD_PIC option
Signed-off-by: raver119 <raver119@gmail.com>
* conv2d fix
Signed-off-by: raver119 <raver119@gmail.com>
* avgpool3d fix
Signed-off-by: raver119 <raver119@gmail.com>
* avgpool3d_bp fix
Signed-off-by: raver119 <raver119@gmail.com>
* avgpool2d_bp leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* avgpool3d_bp leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* maxpool bp leaks fixed
Signed-off-by: raver119 <raver119@gmail.com>
* printf removed
Signed-off-by: raver119 <raver119@gmail.com>
* batchnorm fix
Signed-off-by: raver119 <raver119@gmail.com>
* AVX warning/error polishing
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More polish
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Polish
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* remove previous MKL-DNN support layer
Signed-off-by: raver119 <raver119@gmail.com>
* avx2 tweak
Signed-off-by: raver119 <raver119@gmail.com>
* allow static for apple
Signed-off-by: raver119@gmail.com <raver119@gmail.com>
* exclude mkldnn in one more place
Signed-off-by: raver119 <raver119@gmail.com>
* exclude mkldnn in one more place
Signed-off-by: raver119 <raver119@gmail.com>
* restore OPENBLAS_PATH use
Signed-off-by: raver119 <raver119@gmail.com>
* add runtime check for avx/avx2 support
Signed-off-by: raver119 <raver119@gmail.com>
* convolution_auto
Signed-off-by: raver119 <raver119@gmail.com>
* Add logic for helper argument
* minor test fix
Signed-off-by: raver119 <raver119@gmail.com>
* few tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* few tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* skip OpTracker props for non-x86 builds
Signed-off-by: raver119 <raver119@gmail.com>
* linux arm isn't x86 :)
Signed-off-by: raver119 <raver119@gmail.com>
* avx-512
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA presets fix
Signed-off-by: raver119 <raver119@gmail.com>
* BUILD_PIC
Signed-off-by: raver119 <raver119@gmail.com>
* prefetchw for avx2
Signed-off-by: raver119 <raver119@gmail.com>
* BUILD_PIC again
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-11 21:50:28 +03:00
Robert Altena
c99f980513
INDArray javadoc ( #246 )
...
* javadoc
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* javadoc
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* javadoc
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* review fixes.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
2019-09-09 13:09:31 +10:00
AlexDBlack
a76a44e198
Merge remote-tracking branch 'fork/master'
2019-09-05 22:04:25 +10:00
Robert Altena
f25e3e71e5
remove lengthLong ( #236 )
...
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
2019-09-05 11:19:38 +10:00
AlexDBlack
b7226bdd7a
Merge
...
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-09-05 00:54:11 +10:00
raver119
a90c7dd995
[WIP] Last set of changes ( #234 )
...
* mmul op instead of cublasSgemm
Signed-off-by: raver119 <raver119@gmail.com>
* transB
Signed-off-by: raver119 <raver119@gmail.com>
* jcpp handles
Signed-off-by: raver119 <raver119@gmail.com>
* bitwise and/or/xor
Signed-off-by: raver119 <raver119@gmail.com>
* bitwise and/or/xor mapping
Signed-off-by: raver119 <raver119@gmail.com>
* cuda/cublas version check
Signed-off-by: raver119 <raver119@gmail.com>
* add expected version
Signed-off-by: raver119 <raver119@gmail.com>
* cuda/cublas version check in java
Signed-off-by: raver119 <raver119@gmail.com>
* one more error check
Signed-off-by: raver119 <raver119@gmail.com>
* build fix
Signed-off-by: raver119 <raver119@gmail.com>
* build fix
Signed-off-by: raver119 <raver119@gmail.com>
* build fix
Signed-off-by: raver119 <raver119@gmail.com>
* one more fix
Signed-off-by: raver119 <raver119@gmail.com>
* skip CUDA version check for now
Signed-off-by: raver119 <raver119@gmail.com>
* better wording
Signed-off-by: raver119 <raver119@gmail.com>
* few more tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* few more tweaks
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-04 14:41:08 +03:00
raver119
7abc574eeb
Snapshot update ( #8194 )
...
* fix double consumption of rng on cpu
Signed-off-by: raver119 <raver119@gmail.com>
* Shyrma docs (#222 )
* - documenting and profiling matrix_set_diag cuda kernel
Signed-off-by: Yurii <yurii@skymind.io>
* - correct formula of pnorm pooling in cuda 2d/3d kernels
- remove helper matrix_diag which duplicates work of helper matrix_set_diag
Signed-off-by: Yurii <yurii@skymind.io>
* cublasHandle sharing + lock
Signed-off-by: raver119 <raver119@gmail.com>
* cublasHandle sharing + lock
Signed-off-by: raver119 <raver119@gmail.com>
* Documentation from serialization/deserialization in NLP (#221 )
* refactoring
Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
* Javadocs
Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
* Javadoc fixed
Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
* Cleanup
Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
* dedicated lock for getCudaCublasHandle
Signed-off-by: raver119 <raver119@gmail.com>
* Small fixes (#223 )
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* ELU DL4J fixes (#224 )
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* javadoc (#225 )
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Small test compilation fix (#226 )
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8182 remove spark version suffix (#227 )
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* [WIP] Thread safety (#229 )
* sync after cublas*gemm
Signed-off-by: raver119 <raver119@gmail.com>
* mutex for CublasHelper
Signed-off-by: raver119 <raver119@gmail.com>
* don't store cublasHandle in LaunchContext, it's per-device anyway
Signed-off-by: raver119 <raver119@gmail.com>
* some printout
Signed-off-by: raver119 <raver119@gmail.com>
* check for field instead
Signed-off-by: raver119 <raver119@gmail.com>
* pew-pew
Signed-off-by: raver119 <raver119@gmail.com>
* don't release ContextBuffers until device changed
Signed-off-by: raver119 <raver119@gmail.com>
* small tweak
Signed-off-by: raver119 <raver119@gmail.com>
* some logging in sgemm
Signed-off-by: raver119 <raver119@gmail.com>
* stream sync
Signed-off-by: raver119 <raver119@gmail.com>
* some more logging
Signed-off-by: raver119 <raver119@gmail.com>
* some more error checks
Signed-off-by: raver119 <raver119@gmail.com>
* one fancy test
Signed-off-by: raver119 <raver119@gmail.com>
* one fancy test
Signed-off-by: raver119 <raver119@gmail.com>
* minor AffinityManager fix
Signed-off-by: raver119 <raver119@gmail.com>
* cudaEvent error logging improvement
Signed-off-by: raver119 <raver119@gmail.com>
* ConstantHelper thread safety
Signed-off-by: raver119 <raver119@gmail.com>
* - minor corrections in ConstantTadHelper
Signed-off-by: Yurii <yurii@skymind.io>
* ConstantShapeHelper thread safety
Signed-off-by: raver119 <raver119@gmail.com>
* ConstantTadHelper.cu updated
Signed-off-by: raver119 <raver119@gmail.com>
* logging off
Signed-off-by: raver119 <raver119@gmail.com>
* logging off
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-03 22:02:02 +03:00
raver119
dddc8a1143
[WIP] Thread safety ( #229 )
...
* sync after cublas*gemm
Signed-off-by: raver119 <raver119@gmail.com>
* mutex for CublasHelper
Signed-off-by: raver119 <raver119@gmail.com>
* don't store cublasHandle in LaunchContext, it's per-device anyway
Signed-off-by: raver119 <raver119@gmail.com>
* some printout
Signed-off-by: raver119 <raver119@gmail.com>
* check for field instead
Signed-off-by: raver119 <raver119@gmail.com>
* pew-pew
Signed-off-by: raver119 <raver119@gmail.com>
* don't release ContextBuffers until device changed
Signed-off-by: raver119 <raver119@gmail.com>
* small tweak
Signed-off-by: raver119 <raver119@gmail.com>
* some logging in sgemm
Signed-off-by: raver119 <raver119@gmail.com>
* stream sync
Signed-off-by: raver119 <raver119@gmail.com>
* some more logging
Signed-off-by: raver119 <raver119@gmail.com>
* some more error checks
Signed-off-by: raver119 <raver119@gmail.com>
* one fancy test
Signed-off-by: raver119 <raver119@gmail.com>
* one fancy test
Signed-off-by: raver119 <raver119@gmail.com>
* minor AffinityManager fix
Signed-off-by: raver119 <raver119@gmail.com>
* cudaEvent error logging improvement
Signed-off-by: raver119 <raver119@gmail.com>
* ConstantHelper thread safety
Signed-off-by: raver119 <raver119@gmail.com>
* - minor corrections in ConstantTadHelper
Signed-off-by: Yurii <yurii@skymind.io>
* ConstantShapeHelper thread safety
Signed-off-by: raver119 <raver119@gmail.com>
* ConstantTadHelper.cu updated
Signed-off-by: raver119 <raver119@gmail.com>
* logging off
Signed-off-by: raver119 <raver119@gmail.com>
* logging off
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-03 22:00:38 +03:00
raver119
d3253aff3f
dedicated lock for getCudaCublasHandle
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-02 20:01:13 +03:00