raver119
9e3c1b02b1
Perf improvements ( #242 )
...
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* better ExpandDims impl
Signed-off-by: raver119 <raver119@gmail.com>
* better Squeeze impl
Signed-off-by: raver119 <raver119@gmail.com>
* better Softmax impl
Signed-off-by: raver119 <raver119@gmail.com>
* one test disabled
Signed-off-by: raver119 <raver119@gmail.com>
* more accurate impl
Signed-off-by: raver119 <raver119@gmail.com>
* - GraphProfiler now prints full shapeInfo instead of shape
- softmax typo fix
Signed-off-by: raver119 <raver119@gmail.com>
2020-02-14 16:20:31 +03:00
Yurii Shyrma
fe47f52896
Oleh tenzor mmul ( #231 )
...
* Libnd4j: TensorMMul backprop op #8174 , raw implementation
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 merge master and some corrections
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 algorithm update, need testing, sync with master
* Libnd4j: TensorMMul backprop op #8174 fixed incorrect B axes calculation
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 optimize axes identification and fix bug of indeces overlapping, added first test. need testing with different shapes
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 some fixes and improvements need more testing
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 fixed order of matrix multiply
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 fixed issue of incorrect axes definition, add tests based on TF, need additional testing for case dLdC not equal 1
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 fixed scalar case add test
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 fixed bp algorithm, axes definition, need some mode testing with different orders combination f,c; c,f f,f and add some checks for inputs
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 some checks and corrections added tests, exists the problem with different input orders support A-f B-c and A-f B-f
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 sync master
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* - correct bug in MmulHelper::tensorDot(a, b, c, axes_a, axes_b,permutForC)
Signed-off-by: Yurii <iuriish@yahoo.com>
* Libnd4j: TensorMMul backprop op #8174 code clean up and refactoring
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* - add check for linspase ordered permutations in ShapeUtils::evalShapeForTensorDot
Signed-off-by: Yurii <iuriish@yahoo.com>
* - provide additional code in shape::reshape stuff in order to reduce amount of allocation/copy operations during reshaping procedure
Signed-off-by: Yurii <iuriish@yahoo.com>
* - further work on problem of wrong shape evaluation during permute/reshape procedures
Signed-off-by: Yurii <iuriish@yahoo.com>
* - still looking for bug reason in reshape/permute stuff
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct bug in transform cuda native ops
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct bug in NDArray::assign
Signed-off-by: Yurii <iuriish@yahoo.com>
* - remove old shape::reshape stuff
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add possibility to disable copy of old buffer to new buffer during reshape operation in NDArray class
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct bug in tensorDot which had to do with wrong pointers assigments
Signed-off-by: Yurii <iuriish@yahoo.com>
Co-authored-by: Oleh <oleg.semeniv@gmail.com>
2020-02-13 20:33:54 +03:00
raver119
237c137166
few more smaller compilation units ( #226 )
...
Signed-off-by: raver119 <raver119@gmail.com>
2020-02-10 10:57:18 +03:00
raver119
8a0d5e3b97
Compilation units ( #224 )
...
* - TrueBroadcastHelper split into multiple compilation units
- legacy gemm.cpp disabled
Signed-off-by: raver119 <raver119@gmail.com>
* - IndexReduce int32/int64 split into multiple compilation units
Signed-off-by: raver119 <raver119@gmail.com>
* - Reduce3 ops split into multiple compilation units
Signed-off-by: raver119 <raver119@gmail.com>
2020-02-09 19:48:32 +03:00
Abdelrauf
bead656feb
Initial performance improvement for Bias Add and etc #8556 ( #217 )
...
* Initial performance improvement for Bias Add, loop coords helpers and increment aligned parallel threading
Signed-off-by: AbdelRauf <rauf@konduit.ai>
* One more test for Rauf
Signed-off-by: raver119 <raver119@gmail.com>
* disable couple of perf tests
Signed-off-by: raver119 <raver119@gmail.com>
Co-authored-by: raver119 <raver119@gmail.com>
2020-02-08 15:31:30 +03:00
raver119
a0da5a9e47
Events removed from Java ( #219 )
...
* replace mutex with lock_guards
Signed-off-by: raver119 <raver119@gmail.com>
* Events ditched from Java CUDA logic
Signed-off-by: raver119 <raver119@gmail.com>
2020-02-07 12:34:55 +03:00
Oleh
d52e67209e
Oleh convert ( #200 )
...
* StringUtils for utf convertor raw implementation of all possible combinations, need to be add counter of bytes per symbol for any type and add api to call convertors and store data
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* StringUtils for utf convertor more corrections to support convertors
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* StringUtils for utf convertor some corrections and bug fixes, need review to discuss how to add multi-threading
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* StringUtils for utf convertor #8613 some corrections to move to multi-threading, add one test need discussion data inputs/outputs array presentation, need discussion the way of multi-threading
* StringUtils for utf convertor #8613 tests added some corrections to optimize build
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* StringUtils for utf convertor #8613 some corrections and code clean up
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* StringUtils for utf convertor #8613 code clean up and optimize usage, need update ndarray factory before replace std usage
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* StringUtils for utf convertor #8613 some staff to integrate converters into NDArrayFactory, update tests and add some functionality
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* StringUtils for utf convertor #8613 minor corrections and bug fix before discussion
* StringUtils for utf convertor #8613 some fixes and tets
* StringUtils for utf convertor #8613 some more staff to support different unicode
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* StringUtils for utf convertor #8613 fix linking bug
* StringUtils for utf convertor #8613 corrected several tests as defaults for string ndarray changed
* StringUtils for utf convertor #8613 replace some incorrect implementation, revert some test changes, need sync before testing
* StringUtils for utf convertor #8613 fixed several thing that were badly implemented yesterday, need optimization, testing (before testing have to be add support of u32 and u16 buffer visualization)
* StringUtils for utf convertor #8613 fixed to support u16 and u32, and convertor in ndarray, fix buffer print, etc
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* StringUtils for utf convertor #8613 merge master and sync with server
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* StringUtils for utf convertor #8613 some correction for string cast, need print check only asci support
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* StringUtils for utf convertor #8613 merge master, remove copies and add cast, need test, refactoring according review and clean up
* StringUtils for utf convertor #8613 fixed cast and copy issues
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* StringUtils for utf convertor #8613 fixed cuda and update tests
* StringUtils for utf convertor #8613 integration into NdArray, fix several tests for build pass, refactoring, etc
* - avoid ambiguity of NDArray ctrs overloading in some tests
Signed-off-by: Yurii <iuriish@yahoo.com>
* StringUtils for utf convertor #8613 NDArray string constructors added, updated NDArrayFactory, refactoring unicode and tests, etc
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* StringUtils for utf convertor #8613 fixed cuda build and test, refactoring and void* added to some functions
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* StringUtils for utf convertor #8613 void* integration, removed copy operation, refactoring, added tests for NDArray string constructors, etc
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* StringUtils for utf convertor #8613 several more fixes, improvements and updates
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* StringUtils for utf convertor #8613 master merge, code clean up and optimization before review
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* StringUtils for utf convertor #8613 minor fixes string element size define
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* StringUtils for utf convertor #8613 revert last changes as mistake
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* StringUtils for utf convertor #8613 fixed NDArray constructor build problem, remove order from string factory, fixed order use for factory via project, added catch of incorrect sync in cast of arrays to data types, fixed e method for strings, etc
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* StringUtils for utf convertor #8613 added javacpp hack, added multi-threading, minor corrections in license agreement
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* StringUtils for utf convertor #8613 windows builds fix, as "sting" is not treated as utf8
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>
2020-01-31 16:30:49 +03:00
raver119
ba961c7601
DataTypes & FlatBuffers ( #197 )
...
* flatbuffers version upgrade
Signed-off-by: raver119 <raver119@gmail.com>
* flatbuffers version upgrade java side
Signed-off-by: raver119 <raver119@gmail.com>
* flatbuffers dependency version upgrade java side
Signed-off-by: raver119 <raver119@gmail.com>
* MKLDNN version upgrade
Signed-off-by: raver119 <raver119@gmail.com>
* DArgs first pass
Signed-off-by: raver119 <raver119@gmail.com>
* signatures first pass
Signed-off-by: raver119 <raver119@gmail.com>
* signatures second pass
Signed-off-by: raver119 <raver119@gmail.com>
* signatures third pass
Signed-off-by: raver119 <raver119@gmail.com>
* signatures third pass
Signed-off-by: raver119 <raver119@gmail.com>
* signatures fourth pass
Signed-off-by: raver119 <raver119@gmail.com>
* signatures fifth pass
Signed-off-by: raver119 <raver119@gmail.com>
* flatbuffers UI version upgrade java side
Signed-off-by: raver119 <raver119@gmail.com>
* flatbuffers ui update
Signed-off-by: raver119 <raver119@gmail.com>
* flatbuffers downgrade
Signed-off-by: raver119 <raver119@gmail.com>
* flatbuffers downgrade java side
Signed-off-by: raver119 <raver119@gmail.com>
2020-01-30 10:07:24 +03:00
raver119
7783012f39
cuDNN integration ( #150 )
...
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* one file
Signed-off-by: raver119 <raver119@gmail.com>
* few more includes
Signed-off-by: raver119 <raver119@gmail.com>
* m?
Signed-off-by: raver119 <raver119@gmail.com>
* const
Signed-off-by: raver119 <raver119@gmail.com>
* cudnn linkage in tests
Signed-off-by: raver119 <raver119@gmail.com>
* culibos
Signed-off-by: raver119 <raver119@gmail.com>
* static reminder
Signed-off-by: raver119 <raver119@gmail.com>
* platform engine tag
Signed-off-by: raver119 <raver119@gmail.com>
* HAVE_CUDNN moved to config.h.in
Signed-off-by: raver119 <raver119@gmail.com>
* include
Signed-off-by: raver119 <raver119@gmail.com>
* include
Signed-off-by: raver119 <raver119@gmail.com>
* skip cudnn handle creation if there's not cudnn
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* target device in context
Signed-off-by: raver119 <raver119@gmail.com>
* platform engines
Signed-off-by: raver119 <raver119@gmail.com>
* platform engines
Signed-off-by: raver119 <raver119@gmail.com>
* allow multiple -h args
Signed-off-by: raver119 <raver119@gmail.com>
* allow multiple -h args
Signed-off-by: raver119 <raver119@gmail.com>
* move mkldnn out of CPU block
Signed-off-by: raver119 <raver119@gmail.com>
* link to mkldnn on cuda
Signed-off-by: raver119 <raver119@gmail.com>
* less prints
Signed-off-by: raver119 <raver119@gmail.com>
* minor tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* next step
Signed-off-by: raver119 <raver119@gmail.com>
* conv2d NCHW draft
Signed-off-by: raver119 <raver119@gmail.com>
* conv2d biasAdd
Signed-off-by: raver119 <raver119@gmail.com>
* test for MKL/CUDNN combined use
Signed-off-by: raver119 <raver119@gmail.com>
* - provide additional code for conv2d ff based on cudnn api, not tested yet
Signed-off-by: Yurii <iuriish@yahoo.com>
* - further work on conv2d helper based on using cudnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - fixing several cuda bugs which appeared after cudnn lib had been started to use
Signed-off-by: Yurii <iuriish@yahoo.com>
* - implementation of conv2d backprop op based on cudnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - implementaion of conv3d and conv3d_bp ops based on cudnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - bugs fixing in conv3d/conv3d_bp ops (cudnn in use)
Signed-off-by: Yurii <iuriish@yahoo.com>
* - implementation of depthwiseConv2d (ff/bp) op based on cudnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - implementation of batchnorm ff op based on cudnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - disable cudnn batchnorm temporary
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add minor change in cmake
Signed-off-by: Yurii <iuriish@yahoo.com>
* engine for depthwise mkldnn
Signed-off-by: raver119 <raver119@gmail.com>
* couple of includes
Signed-off-by: raver119 <raver119@gmail.com>
* - provide permutation to cudnn batchnorm ff when format is NHWC
Signed-off-by: Yurii <iuriish@yahoo.com>
* lgamma fix
Signed-off-by: raver119 <raver119@gmail.com>
* - eliminate memory leak in two tests
Signed-off-by: Yurii <iuriish@yahoo.com>
Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>
2020-01-20 21:32:46 +03:00
raver119
29e8e09db6
String changes ( #3 )
...
* initial commit
* additional data types & tensor type
Signed-off-by: raver119 <raver119@gmail.com>
* next step
Signed-off-by: raver119 <raver119@gmail.com>
* missing include
* sparse_to_dense
Signed-off-by: raver119 <raver119@gmail.com>
* few more tests files
Signed-off-by: raver119 <raver119@gmail.com>
* draft
Signed-off-by: raver119 <raver119@gmail.com>
* numeric sparse_to_dense
Signed-off-by: raver119 <raver119@gmail.com>
* comment
Signed-off-by: raver119 <raver119@gmail.com>
* string sparse_to_dense version
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA DataBuffer expand
Signed-off-by: raver119 <raver119@gmail.com>
* few tweaks for CUDA build
Signed-off-by: raver119 <raver119@gmail.com>
* shape fn for string_split
Signed-off-by: raver119 <raver119@gmail.com>
* one more comment
Signed-off-by: raver119 <raver119@gmail.com>
* string_split indices
Signed-off-by: raver119 <raver119@gmail.com>
* next step
Signed-off-by: raver119 <raver119@gmail.com>
* test passes
Signed-off-by: raver119 <raver119@gmail.com>
* few rearrangements for databuffer implementations
Signed-off-by: raver119 <raver119@gmail.com>
* DataBuffer: move inline methods to common implementations
Signed-off-by: raver119 <raver119@gmail.com>
* add native DataBuffer to Nd4j presets
Signed-off-by: raver119 <raver119@gmail.com>
* DataBuffer creation
Signed-off-by: raver119 <raver119@gmail.com>
* use DataBuffer for allocation
Signed-off-by: raver119 <raver119@gmail.com>
* cpu databuffer as deallocatable
Signed-off-by: raver119 <raver119@gmail.com>
* DataBuffer setters for bufers
Signed-off-by: raver119 <raver119@gmail.com>
* couple of wrappers
Signed-off-by: raver119 <raver119@gmail.com>
* DataBuffers being passed around
Signed-off-by: raver119 <raver119@gmail.com>
* Bunch of ByteBuffer-related signatures gone
Signed-off-by: raver119 <raver119@gmail.com>
* - few more Nd4j signatures removed
- minor fix for bfloat16
Signed-off-by: raver119 <raver119@gmail.com>
* nullptr pointer is still a pointer, but 0 as address :)
Signed-off-by: raver119 <raver119@gmail.com>
* one special test
Signed-off-by: raver119 <raver119@gmail.com>
* empty string array init
Signed-off-by: raver119 <raver119@gmail.com>
* one more test in cpp
Signed-off-by: raver119 <raver119@gmail.com>
* memcpy instead of databuffer swap
Signed-off-by: raver119 <raver119@gmail.com>
* special InteropDataBuffer for front-end languages
Signed-off-by: raver119 <raver119@gmail.com>
* few tweaks for java
Signed-off-by: raver119 <raver119@gmail.com>
* pointer/indexer actualization
Signed-off-by: raver119 <raver119@gmail.com>
* CustomOp returns list for inputArumgents and outputArguments instead of array
Signed-off-by: raver119 <raver119@gmail.com>
* redundant call
Signed-off-by: raver119 <raver119@gmail.com>
* print_variable op
Signed-off-by: raver119 <raver119@gmail.com>
* - view handling (but wrong one)
- print_variable java wrapper
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* - empty arrays handling
Signed-off-by: raver119 <raver119@gmail.com>
* - deserialization works now
Signed-off-by: raver119 <raver119@gmail.com>
* minor fix
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* one more fix
Signed-off-by: raver119 <raver119@gmail.com>
* initial cuda commit
Signed-off-by: raver119 <raver119@gmail.com>
* print_variable message validation
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA views
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA special buffer size
Signed-off-by: raver119 <raver119@gmail.com>
* minor update to match master changes
Signed-off-by: raver119 <raver119@gmail.com>
* - consider arrays always actual on device for CUDA
- additional PrintVariable constructor
- CudaUtf8Buffer now allocates host buffer by default
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* - print_variable now allows print from device
Signed-off-by: raver119 <raver119@gmail.com>
* InteropDataBuffer data type fix
Signed-off-by: raver119 <raver119@gmail.com>
* ...
Signed-off-by: raver119 <raver119@gmail.com>
* disable some debug messages
Signed-off-by: raver119 <raver119@gmail.com>
* master pulled in
Signed-off-by: raver119 <raver119@gmail.com>
* couple of new methods for DataBuffer interop
Signed-off-by: raver119 <raver119@gmail.com>
* java side
Signed-off-by: raver119 <raver119@gmail.com>
* offsetted constructor
Signed-off-by: raver119 <raver119@gmail.com>
* new CUDA deallocator
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA backend torn apart
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA backend torn apart 2
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA backend torn apart 3
Signed-off-by: raver119 <raver119@gmail.com>
* - few new tests
- few new methods for DataBuffer management
Signed-off-by: raver119 <raver119@gmail.com>
* few more tests + few more tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* two failing tests
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* two failing tests pass
Signed-off-by: raver119 <raver119@gmail.com>
* now we pass DataBuffer to legacy ops too
Signed-off-by: raver119 <raver119@gmail.com>
* Native DataBuffer for legacy ops, Java side
Signed-off-by: raver119 <raver119@gmail.com>
* CPU java side update
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA java side update
Signed-off-by: raver119 <raver119@gmail.com>
* no more prepare/register action on java side
Signed-off-by: raver119 <raver119@gmail.com>
* NDArray::prepare/register use now accepts vectors
Signed-off-by: raver119 <raver119@gmail.com>
* InteropDataBuffer now has few more convenience methods
Signed-off-by: raver119 <raver119@gmail.com>
* java bindings update
Signed-off-by: raver119 <raver119@gmail.com>
* tick device in NativeOps
Signed-off-by: raver119 <raver119@gmail.com>
* Corrected usage of OpaqueBuffer for tests.
* Corrected usage of OpaqueBuffer for java tests.
* NativeOpsTests fixes.
* print_variable now returns scalar
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* compat_string_split fix for CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* - CUDA execScalar fix
- CUDA lazyAllocateHostPointer now checks java indexer/pointer instead of native pointer
Signed-off-by: raver119 <raver119@gmail.com>
* legacy ops DataBuffer migration prototype
Signed-off-by: raver119 <raver119@gmail.com>
* ignore device shapeinfo coming from java
Signed-off-by: raver119 <raver119@gmail.com>
* minor fix
Signed-off-by: raver119 <raver119@gmail.com>
* minor transformAny fix
Signed-off-by: raver119 <raver119@gmail.com>
* minor tweak for lazy host allocation
Signed-off-by: raver119 <raver119@gmail.com>
* - DataBuffer::memcpy method
- bitcast now uses memcpy
Signed-off-by: raver119 <raver119@gmail.com>
* - IndexReduce CUDA dimension buffer fix
Signed-off-by: raver119 <raver119@gmail.com>
* views for CPU and CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* less spam
Signed-off-by: raver119 <raver119@gmail.com>
* optional memory init
Signed-off-by: raver119 <raver119@gmail.com>
* async memset
Signed-off-by: raver119 <raver119@gmail.com>
* - SummaryStats CUDA fix
- DataBuffer.sameUnderlyingData() impl
- execBroadcast fix
Signed-off-by: raver119 <raver119@gmail.com>
* - reduce3All fix
switch to CUDA 10 temporarily
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA version
Signed-off-by: raver119 <raver119@gmail.com>
* proper memory deallocator registration
Signed-off-by: raver119 <raver119@gmail.com>
* HOST_ONLY workspace allocation
Signed-off-by: raver119 <raver119@gmail.com>
* temp commit
Signed-off-by: raver119 <raver119@gmail.com>
* few conflicts resolved
Signed-off-by: raver119 <raver119@gmail.com>
* few minor fixes
Signed-off-by: raver119 <raver119@gmail.com>
* one more minor fix
Signed-off-by: raver119 <raver119@gmail.com>
* NDArray permute should operate on JVM primitives
Signed-off-by: raver119 <raver119@gmail.com>
* - create InteropDataBuffer for shapes as well
- update pointers after view creation in Java
Signed-off-by: raver119 <raver119@gmail.com>
* - addressPointer temporary moved to C++
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA: don't account offset twice
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA: DataBuffer pointer constructor updated
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA NDArray.unsafeDuplication() simplified
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA minor workspace-related fixes
Signed-off-by: raver119 <raver119@gmail.com>
* CPU DataBuffer.reallocate()
Signed-off-by: raver119 <raver119@gmail.com>
* print_affinity op
Signed-off-by: raver119 <raver119@gmail.com>
* print_affinity java side
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA more tweaks for data locality
Signed-off-by: raver119 <raver119@gmail.com>
* - compat_string_split tweak
- CudaUtf8Buffer update
Signed-off-by: raver119 <raver119@gmail.com>
* INDArray.close() mechanic restored
Signed-off-by: raver119 <raver119@gmail.com>
* one more test fixed
Signed-off-by: raver119 <raver119@gmail.com>
* - CUDA DataBuffer.reallocate() updated
- cudaMemcpy (synchronous) restored
Signed-off-by: raver119 <raver119@gmail.com>
* one last fix
Signed-off-by: raver119 <raver119@gmail.com>
* bad import removed
Signed-off-by: raver119 <raver119@gmail.com>
* another small fix
Signed-off-by: raver119 <raver119@gmail.com>
* one special test
Signed-off-by: raver119 <raver119@gmail.com>
* fix bad databuffer size
Signed-off-by: raver119 <raver119@gmail.com>
* release primaryBuffer on replace
Signed-off-by: raver119 <raver119@gmail.com>
* higher timeout
Signed-off-by: raver119 <raver119@gmail.com>
* disable timeouts
Signed-off-by: raver119 <raver119@gmail.com>
* dbCreateView now validates offset and length of a view
Signed-off-by: raver119 <raver119@gmail.com>
* additional validation for dbExpand
Signed-off-by: raver119 <raver119@gmail.com>
* restore timeout back again
Signed-off-by: raver119 <raver119@gmail.com>
* smaller distribution for rng test to prevent timeouts
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA DataBuffer::memcpy now copies to device all the time
Signed-off-by: raver119 <raver119@gmail.com>
* OpaqueDataBuffer now contains all required methods for interop
Signed-off-by: raver119 <raver119@gmail.com>
* some javadoc
Signed-off-by: raver119 <raver119@gmail.com>
* GC on failed allocations
Signed-off-by: raver119 <raver119@gmail.com>
* minoe memcpu tweak
Signed-off-by: raver119 <raver119@gmail.com>
* one more bitcast test
Signed-off-by: raver119 <raver119@gmail.com>
* - NDArray::deviceId() propagation
- special multi-threaded test for data locality checks
Signed-off-by: raver119 <raver119@gmail.com>
* DataBuffer additional syncStream
Signed-off-by: raver119 <raver119@gmail.com>
* DataBuffer additional syncStream
Signed-off-by: raver119 <raver119@gmail.com>
* one ignored test
Signed-off-by: raver119 <raver119@gmail.com>
* skip host alloc for empty arrays
Signed-off-by: raver119 <raver119@gmail.com>
* ByteBuffer support is back
Signed-off-by: raver119 <raver119@gmail.com>
* DataBuffer::memcpy minor fix
Signed-off-by: raver119 <raver119@gmail.com>
* few minor prelu/bp tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* nullify-related fixes
Signed-off-by: raver119 <raver119@gmail.com>
* PReLU fixes (#157 )
Signed-off-by: Alex Black <blacka101@gmail.com>
* Build fixed
* Fix tests
* one more ByteBuffer signature restored
Signed-off-by: raver119 <raver119@gmail.com>
* nd4j-jdbc-hsql profiles fix
Signed-off-by: raver119 <raver119@gmail.com>
* nd4j-jdbc-hsql profiles fix
Signed-off-by: raver119 <raver119@gmail.com>
* PReLU weight init fix
Signed-off-by: Alex Black <blacka101@gmail.com>
* Small PReLU fix
Signed-off-by: Alex Black <blacka101@gmail.com>
* - INDArray.migrate() reactivated
- DataBuffer::setDeviceId(...) added
- InteropDataBuffer Z syncToDevice added for views
Signed-off-by: raver119 <raver119@gmail.com>
* missed file
Signed-off-by: raver119 <raver119@gmail.com>
* Small tweak
Signed-off-by: Alex Black <blacka101@gmail.com>
* cuda 10.2
Signed-off-by: raver119 <raver119@gmail.com>
* minor fix
Signed-off-by: raver119 <raver119@gmail.com>
Co-authored-by: shugeo <sgazeos@gmail.com>
Co-authored-by: Alex Black <blacka101@gmail.com>
Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
2020-01-04 13:27:50 +03:00
Alexander Stoyakin
010744ef9c
Lu wrapper and tests fixes ( #144 )
...
* Tests fixed
* Lu added
* Test fixed
* Default timeout
* Tests timeouts fixed.
* TF import fix
* Timeouts added
* Timeout fixed.
* Test corrected
* rgb and yiq conversion ops added
* Converter ops added
* Header
* Yuv converters
* API added
* Empty test for matmul
* Explanation
* skip gemm/gemv on empty inputs
Signed-off-by: raver119 <raver119@gmail.com>
* Test added
* Correct test
* one more empty pass-through for mmul
Signed-off-by: raver119 <raver119@gmail.com>
* Cleanup
* Test added
* Test fixed
* Added missing mapping
* Added missing mapping
Co-authored-by: raver119 <raver119@gmail.com>
2019-12-30 15:06:12 +03:00
Yurii Shyrma
5d9b2a16e5
Shyrma temp ( #131 )
...
* - specifying template instantiation for certain types in float16 and bloat16
Signed-off-by: Yurii <iuriish@yahoo.com>
* - polishing bfloat16 and float16 member functions template specialization
Signed-off-by: Yurii <iuriish@yahoo.com>
* - rewrite and overload array +-*/ scalar and scalar +-*/ arr in NDAray class
Signed-off-by: Yurii <iuriish@yahoo.com>
* - make corrections which have to do with and rvalue lvalue conversions
Signed-off-by: Yurii <iuriish@yahoo.com>
* - provide move semantic in NDArray operators array +-/* array
Signed-off-by: Yurii <iuriish@yahoo.com>
* float16/bfloat16 tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* one more tweak
Signed-off-by: raver119 <raver119@gmail.com>
* - make float16 and bfloat16 to compile successfully on cuda
Signed-off-by: Yurii <iuriish@yahoo.com>
* - do not use resources of view-like arrays when move semantics is applied
Signed-off-by: Yurii <iuriish@yahoo.com>
* - get rid of pointers in signatures NDArray methods 1
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correction of signature of NDArray::dup method
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correction of signature of NDArray::reduceAlongDimension method
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::applyIndexReduce and applyTrueBroadcast methods
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::applyReduce3 and varianceAlongDimension methods
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::tensorsAlongDimension and diagonal methods
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::allTensorsAlongDimension
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::reduceAlongDimension 2
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::applyTransform 2
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::applyPairwiseTransform 2
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::applyBroadcast 2
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::applyTrueBroadcast 2
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::applyScalar and applyScalarArr
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::lambda methods
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::reduce3 methods 2
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of following NDArray methods: add/sub/mul/div row/column and fillAsTriangular
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::tileToShape methods
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::isShapeSameStrict method
Signed-off-by: Yurii <iuriish@yahoo.com>
* minor corrections in tests
Signed-off-by: Yurii <iuriish@yahoo.com>
* - replace reduce op in batchnorm mkldnn
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add explicit templates instantiations for operator+(NDArray&&. const scalar)
Signed-off-by: Yurii <iuriish@yahoo.com>
* - corrections of casts in float16/bfloat16
Signed-off-by: Yurii <iuriish@yahoo.com>
* - provide move semantics in following NDArray methods: transform, applyTrueBroadcast, transpose, reshape, permute
Signed-off-by: Yurii <iuriish@yahoo.com>
* - get rid of input array A duplicate in svd cuda op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - avoid available bug in svd cuda API
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add temporary global memory buffer in svd cuda when calcUV = false and m != n
Signed-off-by: Yurii <iuriish@yahoo.com>
* - remove test with blfoat16 type for betainC
Signed-off-by: Yurii <iuriish@yahoo.com>
* - resolve conflicts after master has been merged in
Signed-off-by: Yurii <iuriish@yahoo.com>
* - changed type of affected input array in fused_batch_norm
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add several explicit type castings
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add ND4J_EXPORT to operators
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add explicit template types in instantiations of template arithm operators of NDArray class
Signed-off-by: Yurii <iuriish@yahoo.com>
* - one more test fix
Signed-off-by: Yurii <iuriish@yahoo.com>
Co-authored-by: raver119 <raver119@gmail.com>
2019-12-20 22:35:39 +03:00
raver119
bd4f77c652
Compilation units ( #124 )
...
* IndexReduce and Reduce3 split into few units
Signed-off-by: raver119 <raver119@gmail.com>
* IndexReductionLoops split as well
Signed-off-by: raver119 <raver119@gmail.com>
* reduce_float split as well
Signed-off-by: raver119 <raver119@gmail.com>
2019-12-14 21:59:37 +02:00
Yurii Shyrma
425c747330
- permute threadsPerBlock and blocksPerGrid in signature of launching of cuda kernel for trueBroadcast op ( #120 )
...
Signed-off-by: Yurii <iuriish@yahoo.com>
2019-12-09 20:08:36 +03:00
raver119
ee5d25caa9
cuda broadcast exec fix
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-12-09 11:17:16 +03:00
raver119
cea68c18f1
cuda broadcast fix
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-12-09 09:27:50 +03:00
raver119
ae7933a428
cpu truebroadcast fix
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-12-09 08:01:12 +03:00
raver119
25b3cd9b80
[WIP] CUDA tests ( #95 )
...
* one more CI test
Signed-off-by: raver119 <raver119@gmail.com>
* export additional symbols
Signed-off-by: raver119 <raver119@gmail.com>
* few more tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* one more tweak for linux
Signed-off-by: raver119 <raver119@gmail.com>
* fix dtype in few tests
Signed-off-by: raver119 <raver119@gmail.com>
* missing sync and memset in couple of tests
Signed-off-by: raver119 <raver119@gmail.com>
* copy step for libnd4j cuda
Signed-off-by: raver119 <raver119@gmail.com>
* no-op on empty for adjust hue/contrast/saturation
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA_VERBOSE Off
Signed-off-by: raver119 <raver119@gmail.com>
* BroadcastBool fix + few tests
Signed-off-by: raver119 <raver119@gmail.com>
* trigger jenkins
Signed-off-by: raver119 <raver119@gmail.com>
* trigger jenkins
Signed-off-by: raver119 <raver119@gmail.com>
* - ignore couple of warnings
- remove redundant compiler options
Signed-off-by: raver119 <raver119@gmail.com>
2019-12-02 21:37:21 +03:00
Yurii Shyrma
a8dd6713aa
Shyrma scatter ( #84 )
...
* - improve performance of scatter (no lock) ops for 1D case
Signed-off-by: Yurii <iuriish@yahoo.com>
* - improve scatter lock op performance for 1D case
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add kernel for verification of input indices-array elements in scatter and scatter_nd ops
Signed-off-by: Yurii <iuriish@yahoo.com>
* - provide fast indices checking on cpu side for scatter and gather osp
Signed-off-by: Yurii <iuriish@yahoo.com>
* - apply corrections requested by pr reviewer
Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-26 20:29:09 +03:00
Yurii Shyrma
7a90a31cfb
Shyrma deconv3 ( #69 )
...
* - profiling cuda kernels for vol2col and im2col
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct addBias helper
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct mkl dilation formula and switch off mkl api for dilation deconvolutions
Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-21 21:17:30 +02:00
raver119
064a56ccf1
Few fixes ( #66 )
...
* skip legacy transforms execution in case of empty input arrays
Signed-off-by: raver119 <raver119@gmail.com>
* - BroadcastBool ops now accept extraParams to make MatchCondition possible
- TrueBroadcastHelper now uses samediff::threads
Signed-off-by: raver119 <raver119@gmail.com>
* java side
Signed-off-by: raver119 <raver119@gmail.com>
* trigger jenkins
Signed-off-by: raver119 <raver119@gmail.com>
* update LessThanOrEqual opNum mapping
Signed-off-by: raver119 <raver119@gmail.com>
* update LessThanOrEqual opNum mapping
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-21 15:43:03 +03:00
Yurii Shyrma
66b84b38cf
Shyrma mmul ( #58 )
...
* - get rid of some copy procedures in mmulHelper ops
Signed-off-by: Yurii <iuriish@yahoo.com>
* - further work on embedding cuda api for batched gemm (cublasGemmBatchedEx) in our mmulHelper class
Signed-off-by: Yurii <iuriish@yahoo.com>
* - further work on cuda batched gamm api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - write own cuda kernel performing batched gemm
Signed-off-by: Yurii <iuriish@yahoo.com>
* missing include in MmulHelper
Signed-off-by: raver119 <raver119@gmail.com>
* - forgot to keep in code previous correct kernels for mmulNxN, since it may happen that new onw will fail for some reason in future
Signed-off-by: Yurii <iuriish@yahoo.com>
* disable old tensordot
Signed-off-by: raver119 <raver119@gmail.com>
* - rewrite cuda kernels for usualGemm and usualGemv
Signed-off-by: Yurii <iuriish@yahoo.com>
* - profiling mmul helpers
Signed-off-by: Yurii <iuriish@yahoo.com>
* - prints to check shapes were added
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct type of output array Cin mmulNxN
Signed-off-by: Yurii <iuriish@yahoo.com>
* - take into account possible nans in C array
Signed-off-by: Yurii <iuriish@yahoo.com>
* slightly change numThreads message
Signed-off-by: raver119 <raver119@gmail.com>
* - make corrections in accordance to given notes in pr review
Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-19 15:39:36 +02:00
Alex Black
47d19908f4
Various fixes ( #43 )
...
* #8172 Enable DL4J MKLDNN batch norm backward pass
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8382 INDArray.toString() rank 1 brackets / ambiguity fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8308 Fix handful of broken links (inc. some in errors)
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Unused dependencies, round 1
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Unused dependencies, round 2
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Unused dependencies, round 3
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Uniform distribution TF import fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-11-14 19:38:20 +11:00
raver119
48df1acdfb
[WIP] ThreadPool ( #8 )
...
This PR removes OpenMP use in 95% of cases
2019-11-13 17:04:59 +03:00
raver119
f05c6ee139
INLINE_LOOPS for windows
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-12 15:12:31 +03:00
Alex Black
18c01f5bdc
Add SameDiff memory reuse memory manager (array cache) ( #39 )
...
* Attention op comments
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* ArrayCacheMemoryMgr - first pass
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Tweak array cache for use with SameDiff identity arrays
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* ArrayCacheMemoryMgr javadoc and properly get max memory
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* LRU cache policy + add tests
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Resize arrays internally if required for ArrayCacheMemoryMgr
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Test improvement
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small polish
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-11-12 21:15:44 +11:00
shugeo
08853c7829
Shugeo random uniform int ( #30 )
...
* Corrected randomuniform declaration.
* Refactored uniform distribution for both cuda and cpu platforms.
* Refactored uniform distribution and tests.
* Fixed type usage with indices.
* Refactored uniform distribution implementation and tests to full conform with TF implementation.
* Refactored gamma function to use type util method.
* Copyright changes and fixes with ConstantHelper.
* Added error checking on allocate cuda device memory and operations.
2019-11-06 12:49:27 +02:00
Yurii Shyrma
871f3bb3e6
- add additional condition in svd helper to take into account rounding errors ( #31 )
...
Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-05 17:16:17 +02:00
Yurii Shyrma
0cdb5750e0
Shyrma concat ( #24 )
...
* - provide possibility to pass axis as last input array in concat op
- corrcect sumation in bias_add_bp op for NHWC case
Signed-off-by: Yurii <iuriish@yahoo.com>
* - write code for deconv2d op based on mkl dnn api
* no unsafe math
Signed-off-by: raver119 <raver119@gmail.com>
* no unsafe math
Signed-off-by: raver119 <raver119@gmail.com>
* - get rid of e<> and p<> methods in svd helper
Signed-off-by: Yurii <iuriish@yahoo.com>
* - provide mkl api support for deconvolution 3d
Signed-off-by: Yurii <iuriish@yahoo.com>
* - write deconv2d_bp based on mkl api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - write deconv3d_bp based on mkl api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - testing and fixing deconv based on mkl api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - remove dilation form conv2d/3d mkl
Signed-off-by: Yurii <iuriish@yahoo.com>
* - minor changes
Signed-off-by: Yurii <iuriish@yahoo.com>
* - further corrections of deconv ops based on mkl dnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - provide deconv2d_tf based on mkl dnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add minor corrections required by reviewer
Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-03 12:37:19 +02:00
Yurii Shyrma
029a69a835
Shyrma bn mkl bp ( #14 )
...
* - write code for new batchnorm backprop
Signed-off-by: Yurii <iuriish@yahoo.com>
* - testing batchnorm backprop
Signed-off-by: Yurii <iuriish@yahoo.com>
* - write code for batchnorm backprop based on mkl dnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - testing and fixing bugs in batchnorm_bp mkl dnn
Signed-off-by: Yurii <iuriish@yahoo.com>
* - made corrections required by reviewer
Signed-off-by: Yurii <iuriish@yahoo.com>
* - change name in java wrapper for batchnorm op
Signed-off-by: Yurii <iuriish@yahoo.com>
2019-10-26 14:14:21 +03:00
Yurii
70bd925abd
- write 2 versions of new lstmLayer: one is based on own code, second uses mkl dnn api
2019-10-17 20:44:52 +03:00
raver119
44a8d19ac6
[WIP] Broadcast changes ( #8257 )
...
* - provide correct call NDArray::applyBroadcast inside of NDArray::applyTrueBroadcast
Signed-off-by: Yurii <yurii@skymind.io>
* - provide new trueBroadcast helper
Signed-off-by: Yurii <yurii@skymind.io>
* example for yurii
Signed-off-by: raver119 <raver119@gmail.com>
* - provide new trueBroadcast helper for cpu
Signed-off-by: Yurii <yurii@skymind.io>
* - start working on new trueBroadcat helper for cuda
Signed-off-by: Yurii <yurii@skymind.io>
* - further work on trueBroadcast for cuda
Signed-off-by: Yurii <yurii@skymind.io>
* - fix bugs in cuda helper trueBroadcast
Signed-off-by: Yurii <yurii@skymind.io>
2019-10-01 09:10:19 +03:00
AlexDBlack
a66e03355e
Merge remote-tracking branch 'fork/master'
2019-09-12 12:20:57 +10:00
raver119
98e2814879
Platform helpers ( #8216 )
...
* platform helpers draft
Signed-off-by: raver119 <raver119@gmail.com>
* typo
Signed-off-by: raver119 <raver119@gmail.com>
* disable platform cmake
Signed-off-by: raver119 <raver119@gmail.com>
* another draft
Signed-off-by: raver119 <raver119@gmail.com>
* mkldnn convolution refactored
Signed-off-by: raver119 <raver119@gmail.com>
* minor tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* one more safety check
Signed-off-by: raver119 <raver119@gmail.com>
* prototype works
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* force static library mode for mkldnn
Signed-off-by: raver119 <raver119@gmail.com>
* - ismax fix
- experimental arg fix
- don't enforce openblas on Apple hardware
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of small fixes
Signed-off-by: raver119@gmail.com <raver119@gmail.com>
* declare concurrent
Signed-off-by: raver119@gmail.com <raver119@gmail.com>
* - MKLDNN version upgrade to 1.0.2
- avgpool2d/maxpool2d APIs update
Signed-off-by: raver119 <raver119@gmail.com>
* - avgpool2d_bp/maxpool2d_bp APIs update
Signed-off-by: raver119 <raver119@gmail.com>
* - conv2d/batchnorm APIs update
Signed-off-by: raver119 <raver119@gmail.com>
* - lrn/conv2d_bp/conv3d/conv3d_bp APIs update
Signed-off-by: raver119 <raver119@gmail.com>
* all ops converted to MKLDNN 1.x
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* namespace for platform helpers
Signed-off-by: raver119 <raver119@gmail.com>
* make sure platform helpers aren't opimized out
Signed-off-by: raver119 <raver119@gmail.com>
* build cpu_features on x86 systems
Signed-off-by: raver119 <raver119@gmail.com>
* build cpu_features on x86 systems
Signed-off-by: raver119 <raver119@gmail.com>
* more of cpu_features
Signed-off-by: raver119 <raver119@gmail.com>
* - mkldnn removed from java
- cpu_features checks in CpuNDArrayFactory
Signed-off-by: raver119 <raver119@gmail.com>
* F16C definition renamed
Signed-off-by: raver119 <raver119@gmail.com>
* some mkldnn rearrangements
Signed-off-by: raver119 <raver119@gmail.com>
* check supported instructions before doing anything
Signed-off-by: raver119 <raver119@gmail.com>
* typo
Signed-off-by: raver119 <raver119@gmail.com>
* missied impl
Signed-off-by: raver119 <raver119@gmail.com>
* BUILD_PIC option
Signed-off-by: raver119 <raver119@gmail.com>
* conv2d fix
Signed-off-by: raver119 <raver119@gmail.com>
* avgpool3d fix
Signed-off-by: raver119 <raver119@gmail.com>
* avgpool3d_bp fix
Signed-off-by: raver119 <raver119@gmail.com>
* avgpool2d_bp leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* avgpool3d_bp leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* maxpool bp leaks fixed
Signed-off-by: raver119 <raver119@gmail.com>
* printf removed
Signed-off-by: raver119 <raver119@gmail.com>
* batchnorm fix
Signed-off-by: raver119 <raver119@gmail.com>
* AVX warning/error polishing
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More polish
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Polish
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* remove previous MKL-DNN support layer
Signed-off-by: raver119 <raver119@gmail.com>
* avx2 tweak
Signed-off-by: raver119 <raver119@gmail.com>
* allow static for apple
Signed-off-by: raver119@gmail.com <raver119@gmail.com>
* exclude mkldnn in one more place
Signed-off-by: raver119 <raver119@gmail.com>
* exclude mkldnn in one more place
Signed-off-by: raver119 <raver119@gmail.com>
* restore OPENBLAS_PATH use
Signed-off-by: raver119 <raver119@gmail.com>
* add runtime check for avx/avx2 support
Signed-off-by: raver119 <raver119@gmail.com>
* convolution_auto
Signed-off-by: raver119 <raver119@gmail.com>
* Add logic for helper argument
* minor test fix
Signed-off-by: raver119 <raver119@gmail.com>
* few tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* few tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* skip OpTracker props for non-x86 builds
Signed-off-by: raver119 <raver119@gmail.com>
* linux arm isn't x86 :)
Signed-off-by: raver119 <raver119@gmail.com>
* avx-512
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA presets fix
Signed-off-by: raver119 <raver119@gmail.com>
* BUILD_PIC
Signed-off-by: raver119 <raver119@gmail.com>
* prefetchw for avx2
Signed-off-by: raver119 <raver119@gmail.com>
* BUILD_PIC again
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-11 21:50:28 +03:00
raver119
589401477d
[WIP] bunch of improvements ( #257 )
...
* - profiling bias_add op
- add some docementation
Signed-off-by: Yurii <yurii@skymind.io>
* - minor change
Signed-off-by: Yurii <yurii@skymind.io>
* - provide addBias cuda kernel
Signed-off-by: Yurii <yurii@skymind.io>
* - improve shape::getIndexOfffset and change its signature
Signed-off-by: Yurii <yurii@skymind.io>
* - same as previous
Signed-off-by: Yurii <yurii@skymind.io>
* - improve and change signature in some shape:: stuff which has to do with calculation of offsets for array elements
Signed-off-by: Yurii <yurii@skymind.io>
* - minor changes in flatten
Signed-off-by: Yurii <shyrma@skymind.io>
* - add function shape::getIndexOffsetOrdered
Signed-off-by: Yurii <shyrma@skymind.io>
* - correct shape::getIndexOffsetOrdered()
Signed-off-by: Yurii <shyrma@skymind.io>
* - move getIndexOffsetOrdered to flatten.h header in order to isolate this function
Signed-off-by: Yurii <shyrma@skymind.io>
2019-09-11 20:12:09 +03:00
raver119
7abc574eeb
Snapshot update ( #8194 )
...
* fix double consumption of rng on cpu
Signed-off-by: raver119 <raver119@gmail.com>
* Shyrma docs (#222 )
* - documenting and profiling matrix_set_diag cuda kernel
Signed-off-by: Yurii <yurii@skymind.io>
* - correct formula of pnorm pooling in cuda 2d/3d kernels
- remove helper matrix_diag which duplicates work of helper matrix_set_diag
Signed-off-by: Yurii <yurii@skymind.io>
* cublasHandle sharing + lock
Signed-off-by: raver119 <raver119@gmail.com>
* cublasHandle sharing + lock
Signed-off-by: raver119 <raver119@gmail.com>
* Documentation from serialization/deserialization in NLP (#221 )
* refactoring
Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
* Javadocs
Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
* Javadoc fixed
Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
* Cleanup
Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
* dedicated lock for getCudaCublasHandle
Signed-off-by: raver119 <raver119@gmail.com>
* Small fixes (#223 )
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* ELU DL4J fixes (#224 )
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* javadoc (#225 )
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Small test compilation fix (#226 )
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8182 remove spark version suffix (#227 )
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* [WIP] Thread safety (#229 )
* sync after cublas*gemm
Signed-off-by: raver119 <raver119@gmail.com>
* mutex for CublasHelper
Signed-off-by: raver119 <raver119@gmail.com>
* don't store cublasHandle in LaunchContext, it's per-device anyway
Signed-off-by: raver119 <raver119@gmail.com>
* some printout
Signed-off-by: raver119 <raver119@gmail.com>
* check for field instead
Signed-off-by: raver119 <raver119@gmail.com>
* pew-pew
Signed-off-by: raver119 <raver119@gmail.com>
* don't release ContextBuffers until device changed
Signed-off-by: raver119 <raver119@gmail.com>
* small tweak
Signed-off-by: raver119 <raver119@gmail.com>
* some logging in sgemm
Signed-off-by: raver119 <raver119@gmail.com>
* stream sync
Signed-off-by: raver119 <raver119@gmail.com>
* some more logging
Signed-off-by: raver119 <raver119@gmail.com>
* some more error checks
Signed-off-by: raver119 <raver119@gmail.com>
* one fancy test
Signed-off-by: raver119 <raver119@gmail.com>
* one fancy test
Signed-off-by: raver119 <raver119@gmail.com>
* minor AffinityManager fix
Signed-off-by: raver119 <raver119@gmail.com>
* cudaEvent error logging improvement
Signed-off-by: raver119 <raver119@gmail.com>
* ConstantHelper thread safety
Signed-off-by: raver119 <raver119@gmail.com>
* - minor corrections in ConstantTadHelper
Signed-off-by: Yurii <yurii@skymind.io>
* ConstantShapeHelper thread safety
Signed-off-by: raver119 <raver119@gmail.com>
* ConstantTadHelper.cu updated
Signed-off-by: raver119 <raver119@gmail.com>
* logging off
Signed-off-by: raver119 <raver119@gmail.com>
* logging off
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-03 22:02:02 +03:00
raver119
dddc8a1143
[WIP] Thread safety ( #229 )
...
* sync after cublas*gemm
Signed-off-by: raver119 <raver119@gmail.com>
* mutex for CublasHelper
Signed-off-by: raver119 <raver119@gmail.com>
* don't store cublasHandle in LaunchContext, it's per-device anyway
Signed-off-by: raver119 <raver119@gmail.com>
* some printout
Signed-off-by: raver119 <raver119@gmail.com>
* check for field instead
Signed-off-by: raver119 <raver119@gmail.com>
* pew-pew
Signed-off-by: raver119 <raver119@gmail.com>
* don't release ContextBuffers until device changed
Signed-off-by: raver119 <raver119@gmail.com>
* small tweak
Signed-off-by: raver119 <raver119@gmail.com>
* some logging in sgemm
Signed-off-by: raver119 <raver119@gmail.com>
* stream sync
Signed-off-by: raver119 <raver119@gmail.com>
* some more logging
Signed-off-by: raver119 <raver119@gmail.com>
* some more error checks
Signed-off-by: raver119 <raver119@gmail.com>
* one fancy test
Signed-off-by: raver119 <raver119@gmail.com>
* one fancy test
Signed-off-by: raver119 <raver119@gmail.com>
* minor AffinityManager fix
Signed-off-by: raver119 <raver119@gmail.com>
* cudaEvent error logging improvement
Signed-off-by: raver119 <raver119@gmail.com>
* ConstantHelper thread safety
Signed-off-by: raver119 <raver119@gmail.com>
* - minor corrections in ConstantTadHelper
Signed-off-by: Yurii <yurii@skymind.io>
* ConstantShapeHelper thread safety
Signed-off-by: raver119 <raver119@gmail.com>
* ConstantTadHelper.cu updated
Signed-off-by: raver119 <raver119@gmail.com>
* logging off
Signed-off-by: raver119 <raver119@gmail.com>
* logging off
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-03 22:00:38 +03:00
Yurii Shyrma
70af8c2afc
Shyrma svd ( #191 )
...
* - add one additional test for svd
* - provide float argument in eye op to be a type of output array
Signed-off-by: Yurii <yurii@skymind.io>
* - add cuda capability check to mmulHelper
Signed-off-by: Yurii <yurii@skymind.io>
* - make use another method for divice id evaluation
Signed-off-by: Yurii <yurii@skymind.io>
* Eye data type as T argument
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-28 18:27:08 +03:00
raver119
7f0c660d8b
[WIP] HGemm ( #181 )
...
* skip string arrays for device validation
Signed-off-by: raver119 <raver119@gmail.com>
* confusion_matrix fix
Signed-off-by: raver119 <raver119@gmail.com>
* exclude cublasHGemm from archs < 530
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-27 15:05:43 +03:00
raver119
df84bc7255
[WIP] More tweaks ( #173 )
...
* CUDA empty reduction
Signed-off-by: raver119 <raver119@gmail.com>
* - listdiff synchronization fix for CUDA
- listdiff test
Signed-off-by: raver119 <raver119@gmail.com>
* - IndexReduce ops now allow INDEXING_TYPES output
- topK op accepts only INDEXING_TYPES as output
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-27 10:37:10 +03:00
raver119
25e5c23eae
[WIP] Error handling ( #169 )
...
* CUDA reverse rewrite + couple of tests
Signed-off-by: raver119 <raver119@gmail.com>
* don't throw exception on invalid pointer
Signed-off-by: raver119 <raver119@gmail.com>
* data types validation for fastpath exec mode + 2 tests
Signed-off-by: raver119 <raver119@gmail.com>
* data types validation for fastpath exec mode + 2 tests
Signed-off-by: raver119 <raver119@gmail.com>
* ismax allowed dtypes tweak
Signed-off-by: raver119 <raver119@gmail.com>
* lastErrorCode + lastErrorMessage for native exceptions handling
Signed-off-by: raver119 <raver119@gmail.com>
* exportable ErrorReference
Signed-off-by: raver119 <raver119@gmail.com>
* check error codes in java
Signed-off-by: raver119 <raver119@gmail.com>
* - consume lastErrorCode
- fast_in dtype validation fix
Signed-off-by: raver119 <raver119@gmail.com>
* - sg/cb allowed output type change
- minor logging fix for data type validation
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-26 19:57:51 +03:00
raver119
f03b0ee78f
[WIP] more fixes ( #159 )
...
* Added test for MatrixInverse with double input. Fixed matrixDeterminantKernel.
* Fixed kernels to avoid waste templating.
* Fixed logDeterminant kernel.
* Refactored type check for lup'
* - decrease blockDim value for zeta op
Signed-off-by: Yurii <yurii@skymind.io>
* Added print for compound matrix with CUDA.
* Refactored upper matrix invertion kernels.
* - provide move constructor and move assignment operator for OpArgsHoder class
Signed-off-by: Yurii <yurii@skymind.io>
* Refactored usage of launch context.
* - add test for mergemax
Signed-off-by: Yurii <yurii@skymind.io>
* get rid of AveragingArrayProxy
Signed-off-by: raver119 <raver119@gmail.com>
* Refactoring of LUP inversion.
* Added prints for invertion.
* - add OpArgsHolder copy constructor and assignment operator
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for lower inversion
* - fix bug in upsampling2d/3d_bp op
Signed-off-by: Yurii <yurii@skymind.io>
* Added expensive printfs to kernel.
* Refactored expensive kernel prints.
* Refactored expensive printfs
* - remove nullify
Signed-off-by: Yurii <yurii@skymind.io>
* Eliminated waste prints with tests.
* upsampling2d_bp test
Signed-off-by: raver119 <raver119@gmail.com>
* test updated
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-23 19:20:50 +03:00
raver119
e604ffe0d2
[WIP] repeat op ( #143 )
...
* - write new repeat helper (cpu)
Signed-off-by: Yurii <yurii@skymind.io>
* - update NDArray::cpu
Signed-off-by: Yurii <yurii@skymind.io>
* - update NDArray::repeat cuda
Signed-off-by: Yurii <yurii@skymind.io>
2019-08-21 21:10:29 +03:00
raver119
3cf72e5e30
[WIP] More fixes ( #142 )
...
* atomicAdd cc 70+
Signed-off-by: raver119 <raver119@gmail.com>
* additional 8 bytes alocation
Signed-off-by: raver119 <raver119@gmail.com>
* missed include 2019
Signed-off-by: raver119 <raver119@gmail.com>
* less spam
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-21 20:18:29 +03:00
raver119
269d508ba5
[WIP] cross-device migrations ( #134 )
...
* two more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA device afinity tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* minor tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* prepareAction/registerAction for CustomOps
Signed-off-by: raver119 <raver119@gmail.com>
* lazy allocate host bufer before relocation
Signed-off-by: raver119 <raver119@gmail.com>
* one special test for migration in cpp
Signed-off-by: raver119 <raver119@gmail.com>
* tests update for msvc
Signed-off-by: raver119 <raver119@gmail.com>
* logging
Signed-off-by: raver119 <raver119@gmail.com>
* stick to old col2im impl
Signed-off-by: raver119 <raver119@gmail.com>
* cudaStreams reorganization
Signed-off-by: raver119 <raver119@gmail.com>
* buffer size fix
Signed-off-by: raver119 <raver119@gmail.com>
* c++ data migration
Signed-off-by: raver119 <raver119@gmail.com>
* fix CropAndResize test
Signed-off-by: raver119 <raver119@gmail.com>
* - minor improvment
Signed-off-by: Yurii <yurii@skymind.io>
2019-08-20 18:52:41 +03:00
raver119
6264530dd8
[WIP] bitwise ops ( #115 )
...
* - cyclic_shift_bits + test
- shift_bits + test
Signed-off-by: raver119 <raver119@gmail.com>
* OMP_IF replacement
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-15 11:49:50 +03:00
raver119
53ca9a76e8
[WIP] multi-device support ( #80 )
...
* fix pad javadoc and @see links. (#72 )
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* [WIP] More fixes (#73 )
* special tests for ConstantTadHelper/ConstantShapeHelper
Signed-off-by: raver119 <raver119@gmail.com>
* release methods for data buffers
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary buffer Java side
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary buffer Java side
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary TadPack C++/Java side (#74 )
Signed-off-by: raver119 <raver119@gmail.com>
* Zoo model TF import test updates (#75 )
* argLine fix, update compression_gru comment
* updated comment for xception
* undid but commented argLine change
* updated xlnet comment
* copyright headers
* - new NDArray methods like()/ulike() (#77 )
- fix for depthwise_conv2d_bp + special test
Signed-off-by: raver119 <raver119@gmail.com>
* upsampling2d fix CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* DL4J trace logging (#79 )
* MLN/CG trace logging for debugging
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Tiny tweak
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* strided_slice_bp shape fn leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* SameDiff fixes and naming (#78 )
* remove SDVariable inplace methods
* import methods
* npe fix in OpVal
* removed SameDiff inplace ops from tests
* Naming updates, moved to centralized methods in SameDiff, should use op_#:# for everything
* quick fixes
* javadoc
* SDVariable eval with placeholders
* use regex match
* better matching
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* fix javadoc. (#76 )
* fix javadoc.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* replace most @see with @link s.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* 4 additional tests
Signed-off-by: raver119 <raver119@gmail.com>
* launch context reorganization
Signed-off-by: raver119 <raver119@gmail.com>
* LaunchContext reorganization
Signed-off-by: raver119 <raver119@gmail.com>
* per-device LaunchContext
Signed-off-by: raver119 <raver119@gmail.com>
* Various DL4J/ND4J fixes (#81 )
* #7954 Force refresh of UI when switching tabs on overview page
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8017 Concurrent modification exception (synchronize) fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8033 Don't initialize updater in middle of writing memory crash dump
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8208 Fix shape checks for ND4J int[] creator methods
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #6385 #7992 Keras import naming fixes + cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8016 Upsampling3D - add NDHWC format support
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* Refactor NativeOps.h to export C functions
* Actually export functions from NativeOps.h
* Adapt the Java wrappers in ND4J generated with JavaCPP
* Create C wrappers for some of the C++ classes currently used by ND4J
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* remove duplicate code in createBufferDetached. (#83 )
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Keras model import - updater lr fix (#84 )
* Keras model import - updater lr fix
Signed-off-by: eraly <susan.eraly@gmail.com>
* Keras model import - updater lr fix, cleanup
Signed-off-by: eraly <susan.eraly@gmail.com>
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* Fix functions of OpaqueVariablesSet
* thread-local buffers/affinity
Signed-off-by: raver119 <raver119@gmail.com>
* thread safety for LaunchContext
Signed-off-by: raver119 <raver119@gmail.com>
* more of thread safety
Signed-off-by: raver119 <raver119@gmail.com>
* one more multi threaded test
Signed-off-by: raver119 <raver119@gmail.com>
* SameDiff Convolution Config validation, better output methods (#82 )
* Conv Config validation & tests
Signed-off-by: Ryan Nett <rnett@skymind.io>
* stackOutputs utility method
Signed-off-by: Ryan Nett <rnett@skymind.io>
* use constructor for validation, support negative kernel sizes (infered from weights)
Signed-off-by: Ryan Nett <rnett@skymind.io>
* better output methods
Signed-off-by: Ryan Nett <rnett@skymind.io>
* move output to be with fit and evaluate
Signed-off-by: Ryan Nett <rnett@skymind.io>
* fixes
Signed-off-by: Ryan Nett <rnett@skymind.io>
* more fixes
Signed-off-by: Ryan Nett <rnett@skymind.io>
* refactor duplicate code from pad methods. (#86 )
* refactor duplicate code from pad methods.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* replace switch with if.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Various ND4J/DL4J fixes and improvements (#87 )
* Reshape and reallocate - small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Reshape and reallocate - small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #6488 ElementWiseVertex broadcast support
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Constructors and broadcast supported it Transforms.max/min
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8054 ElementWiseVertex now supports broadcast inputs
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8057 Nd4j.create overload dtype fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #7551 ND4J Shape validation fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* [WIP] Numpy boolean import (#91 )
* numpy bool type
Signed-off-by: raver119 <raver119@gmail.com>
* numpy bool java side
Signed-off-by: raver119 <raver119@gmail.com>
* remove create method with unused parameter. (#89 )
* remove create method with unused parameter.
* removed more unused methods.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* removing more unused code.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* last removal of unused code.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* remove createSparse methods. (#92 )
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Various ND4J/DL4J fixes (#90 )
* Deprecate Old*Op instances
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8063 #8054 Broadcast exceptions + cleanup inplace ops
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Remove bad test condition
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #7993 Fix shape function issue in crop_and_resize op
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* DL4J SameDiff lambda layer fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8029 Fix for pnorm backprop math
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8038 Fix Op profiler NaN/Inf triggering + add tests (#93 )
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* createUninitializedDetached refactoring. (#94 )
* wip
* update interface, add null implementations.
* Breaking one test in a weird way.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* createUninitializedDetached refactored.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* cuda build fix for issues introduced by recent refactoring
Signed-off-by: raver119 <raver119@gmail.com>
* [WIP] More of CUDA (#95 )
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* Implementation of hashcode cuda helper. Working edition.
* Fixed parallel test input arangements.
* Fixed tests for hashcode op.
* Fixed shape calculation for image:crop_and_resize op and test.
* NativeOps tests. Initial test suite.
* Added tests for indexReduce methods.
* Added test on execBroadcast with NDArray as dimensions.
* Added test on execBroadcastBool with NDArray as dimensions.
* Added tests on execPairwiseTransform and execPairwiseTransofrmBool.
* Added tests for execReduce with scalar results.
* Added reduce tests for non-empty dims array.
* Added tests for reduce3.
* Added tests for execScalar.
* Added tests for execSummaryStats.
* - provide cpu/cuda code for batch_to_space
- testing it
Signed-off-by: Yurii <yurii@skymind.io>
* - remove old test for batch_to_space (had wrong format and numbers were not checked)
Signed-off-by: Yurii <yurii@skymind.io>
* Fixed complilation errors with test.
* Added test for execTransformFloat.
* Added test for execTransformSame.
* Added test for execTransformBool.
* Added test for execTransformStrict.
* Added tests for execScalar/execScalarBool with TADs.
* Added test for flatten.
* - provide cpu/cuda code for space_to_Batch operaion
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for concat.
* comment unnecessary stuff in s_t_b
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for specialConcat.
* Added tests for memcpy/set routines.
* Fixed pullRow cuda test.
* Added pullRow test.
* Added average test.
* - correct typo in NDArray::applyPairwiseTransform(nd4j::pairwise::BoolOps op...)
Signed-off-by: Yurii <yurii@skymind.io>
* - debugging and fixing cuda tests in JavaInteropTests file
Signed-off-by: Yurii <yurii@skymind.io>
* - correct some tests
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for shuffle.
* Fixed ops declarations.
* Restored omp and added shuffle test.
* Added convertTypes test.
* Added tests for execRandom. Eliminated usage of RandomBuffer with NativeOps.
* Added sort tests.
* Added tests for execCustomOp.
* - further debuging and fixing tests terminated with crash
Signed-off-by: Yurii <yurii@skymind.io>
* Added tests for calculateOutputShapes.
* Addded Benchmarks test.
* Commented benchmark tests.
* change assertion
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for apply_sgd op. Added cpu helper for that op.
* Implement cuda helper for aplly_sgd op. Fixed tests for NativeOps.
* Added test for assign broadcastable.
* Added tests for assign_bp op.
* Added tests for axpy op.
* - assign/execScalar/execTransformAny signature change
- minor test fix
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed axpy op.
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* - fix tests for nativeOps::concat
Signed-off-by: Yurii <yurii@skymind.io>
* sequential transform/scalar
Signed-off-by: raver119 <raver119@gmail.com>
* allow nested parallelism
Signed-off-by: raver119 <raver119@gmail.com>
* assign_bp leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* block setRNG fix
Signed-off-by: raver119 <raver119@gmail.com>
* enable parallelism by default
Signed-off-by: raver119 <raver119@gmail.com>
* enable nested parallelism by default
Signed-off-by: raver119 <raver119@gmail.com>
* Added cuda implementation for row_count helper.
* Added implementation for tnse gains op helper.
* - take into account possible situations when input arrays are empty in reduce_ cuda stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Implemented tsne/edge_forces op cuda-based helper. Parallelized cpu-based helper for edge_forces.
* Added kernel for tsne/symmetrized op heleper.
* Implementation of tsne/symmetrized op cuda helper. Working edition.
* Eliminated waste printfs.
* Added test for broadcastgradientargs op.
* host-only fallback for empty reduce float
Signed-off-by: raver119 <raver119@gmail.com>
* - some tests fixes
Signed-off-by: Yurii <yurii@skymind.io>
* - correct the rest of reduce_ stuff
Signed-off-by: Yurii <yurii@skymind.io>
* - further correction of reduce_ stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for Cbow op. Also added cuda implementation for cbow helpers.
* - improve code of stack operation for scalar case
Signed-off-by: Yurii <yurii@skymind.io>
* - provide cuda kernel for gatherND operation
Signed-off-by: Yurii <yurii@skymind.io>
* Implementation of cbow helpers with cuda kernels.
* minor tests tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* minor tests tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* - further correction of cuda stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Implementatation of cbow op helper with cuda kernels. Working edition.
* Skip random testing for cudablas case.
* lstmBlockCell context fix
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for ELU and ELU_BP ops.
* Added tests for eq_scalar, gt_scalar, gte_scalar and lte_scalar ops.
* Added tests for neq_scalar.
* Added test for noop.
* - further work on clipbynorm_bp
Signed-off-by: Yurii <yurii@skymind.io>
* - get rid of concat op call, use instead direct concat helper call
Signed-off-by: Yurii <yurii@skymind.io>
* lstmBlockCell context fix
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for lrelu and lrelu_bp.
* Added tests for selu and selu_bp.
* Fixed lrelu derivative helpers.
* - some corrections in lstm
Signed-off-by: Yurii <yurii@skymind.io>
* operator * result shape fix
Signed-off-by: raver119 <raver119@gmail.com>
* - correct typo in lstmCell
Signed-off-by: Yurii <yurii@skymind.io>
* few tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA inverse broadcast bool fix
Signed-off-by: raver119 <raver119@gmail.com>
* disable MMAP test for CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* BooleanOp syncToDevice
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* additional data types for im2col/col2im
Signed-off-by: raver119 <raver119@gmail.com>
* Added test for firas_sparse op.
* one more RandomBuffer test excluded
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for flatten op.
* Added test for Floor op.
* bunch of tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* mmulDot tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Implemented floordiv_bp op and tests.
* Fixed scalar case with cuda implementation for bds.
* - work on cuda kernel for clip_by_norm backprop op is completed
Signed-off-by: Yurii <yurii@skymind.io>
* Eliminate cbow crach.
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Eliminated abortion with batched nlp test.
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed shared flag initializing.
* disabled bunch of cpu workspaces tests
Signed-off-by: raver119 <raver119@gmail.com>
* scalar operators fix: missing registerSpecialUse call
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed logdet for cuda and tests.
* - correct clipBynorm_bp
Signed-off-by: Yurii <yurii@skymind.io>
* Fixed crop_and_resize shape datatype.
* - correct some mmul tests
Signed-off-by: Yurii <yurii@skymind.io>
* build fix
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI (#97 )
Signed-off-by: raver119 <raver119@gmail.com>
* temporary stack fix
Signed-off-by: raver119 <raver119@gmail.com>
* round robin affinity test
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of legacy CudaContext methods
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of legacy ContextPool classes/methods
Signed-off-by: raver119 <raver119@gmail.com>
* one legacy test removed
Signed-off-by: raver119 <raver119@gmail.com>
* few more fields rearranged
Signed-off-by: raver119 <raver119@gmail.com>
* OpaqueLaunchContext
Signed-off-by: raver119 <raver119@gmail.com>
* OpaqueLaunchContext++
Signed-off-by: raver119 <raver119@gmail.com>
* more of OpaqueLaunchContext methods
Signed-off-by: raver119 <raver119@gmail.com>
* LaunchContext -> CudaContext
Signed-off-by: raver119 <raver119@gmail.com>
* AffinityManger changes
Signed-off-by: raver119 <raver119@gmail.com>
* AffinityManger changes
Signed-off-by: raver119 <raver119@gmail.com>
* cusolver handles
Signed-off-by: raver119 <raver119@gmail.com>
* typo
Signed-off-by: raver119 <raver119@gmail.com>
* cusolver method
Signed-off-by: raver119 <raver119@gmail.com>
* cusolver handle propagated
Signed-off-by: raver119 <raver119@gmail.com>
* blas/solver handles
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* legacy concat implementations replaced with new CustomOp
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* concat now uses way more blocks
Signed-off-by: raver119 <raver119@gmail.com>
* print
Signed-off-by: raver119 <raver119@gmail.com>
* no more triple template mmul
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of kernels have dtypes reconsidered
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of kernels have dtypes reconsidered
Signed-off-by: raver119 <raver119@gmail.com>
* bitonic sort reorganized
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of cpu stuff removed from cuda scope
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of cpu stuff removed from cuda scope
Signed-off-by: raver119 <raver119@gmail.com>
* type conversions moved to generic impl
Signed-off-by: raver119 <raver119@gmail.com>
* cpu data types pass
Signed-off-by: raver119 <raver119@gmail.com>
* non_max_suppression
Signed-off-by: raver119 <raver119@gmail.com>
* sortByValue fix
Signed-off-by: raver119 <raver119@gmail.com>
* ignore all mixed datatype tests for mmul
Signed-off-by: raver119 <raver119@gmail.com>
* special handling of OpProfiler exceptions
Signed-off-by: raver119 <raver119@gmail.com>
* - one failing concat test in cpp
- Nd4j.tile now uses op internally
Signed-off-by: raver119 <raver119@gmail.com>
* get back dtype exception for legacy arrays deserialization
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-14 16:52:34 +03:00
raver119
f49c4ea9d0
int -> long ( #108 )
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-10 09:14:18 +03:00
raver119
24e43e9856
[WIP] build time improvements ( #106 )
...
* fix pad javadoc and @see links. (#72 )
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* [WIP] More fixes (#73 )
* special tests for ConstantTadHelper/ConstantShapeHelper
Signed-off-by: raver119 <raver119@gmail.com>
* release methods for data buffers
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary buffer Java side
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary buffer Java side
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary TadPack C++/Java side (#74 )
Signed-off-by: raver119 <raver119@gmail.com>
* Zoo model TF import test updates (#75 )
* argLine fix, update compression_gru comment
* updated comment for xception
* undid but commented argLine change
* updated xlnet comment
* copyright headers
* - new NDArray methods like()/ulike() (#77 )
- fix for depthwise_conv2d_bp + special test
Signed-off-by: raver119 <raver119@gmail.com>
* upsampling2d fix CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* DL4J trace logging (#79 )
* MLN/CG trace logging for debugging
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Tiny tweak
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* strided_slice_bp shape fn leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* SameDiff fixes and naming (#78 )
* remove SDVariable inplace methods
* import methods
* npe fix in OpVal
* removed SameDiff inplace ops from tests
* Naming updates, moved to centralized methods in SameDiff, should use op_#:# for everything
* quick fixes
* javadoc
* SDVariable eval with placeholders
* use regex match
* better matching
* fix javadoc. (#76 )
* fix javadoc.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* replace most @see with @link s.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* 4 additional tests
Signed-off-by: raver119 <raver119@gmail.com>
* Various DL4J/ND4J fixes (#81 )
* #7954 Force refresh of UI when switching tabs on overview page
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8017 Concurrent modification exception (synchronize) fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8033 Don't initialize updater in middle of writing memory crash dump
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8208 Fix shape checks for ND4J int[] creator methods
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #6385 #7992 Keras import naming fixes + cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8016 Upsampling3D - add NDHWC format support
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Refactor NativeOps.h to export C functions
* Actually export functions from NativeOps.h
* Adapt the Java wrappers in ND4J generated with JavaCPP
* Create C wrappers for some of the C++ classes currently used by ND4J
* remove duplicate code in createBufferDetached. (#83 )
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Keras model import - updater lr fix (#84 )
* Keras model import - updater lr fix
Signed-off-by: eraly <susan.eraly@gmail.com>
* Keras model import - updater lr fix, cleanup
Signed-off-by: eraly <susan.eraly@gmail.com>
* Fix functions of OpaqueVariablesSet
* SameDiff Convolution Config validation, better output methods (#82 )
* Conv Config validation & tests
Signed-off-by: Ryan Nett <rnett@skymind.io>
* stackOutputs utility method
Signed-off-by: Ryan Nett <rnett@skymind.io>
* use constructor for validation, support negative kernel sizes (infered from weights)
Signed-off-by: Ryan Nett <rnett@skymind.io>
* better output methods
Signed-off-by: Ryan Nett <rnett@skymind.io>
* move output to be with fit and evaluate
Signed-off-by: Ryan Nett <rnett@skymind.io>
* fixes
Signed-off-by: Ryan Nett <rnett@skymind.io>
* more fixes
Signed-off-by: Ryan Nett <rnett@skymind.io>
* refactor duplicate code from pad methods. (#86 )
* refactor duplicate code from pad methods.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* replace switch with if.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Various ND4J/DL4J fixes and improvements (#87 )
* Reshape and reallocate - small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Reshape and reallocate - small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #6488 ElementWiseVertex broadcast support
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Constructors and broadcast supported it Transforms.max/min
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8054 ElementWiseVertex now supports broadcast inputs
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8057 Nd4j.create overload dtype fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #7551 ND4J Shape validation fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* [WIP] Numpy boolean import (#91 )
* numpy bool type
Signed-off-by: raver119 <raver119@gmail.com>
* numpy bool java side
Signed-off-by: raver119 <raver119@gmail.com>
* remove create method with unused parameter. (#89 )
* remove create method with unused parameter.
* removed more unused methods.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* removing more unused code.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* last removal of unused code.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* remove createSparse methods. (#92 )
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Various ND4J/DL4J fixes (#90 )
* Deprecate Old*Op instances
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8063 #8054 Broadcast exceptions + cleanup inplace ops
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Remove bad test condition
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #7993 Fix shape function issue in crop_and_resize op
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* DL4J SameDiff lambda layer fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8029 Fix for pnorm backprop math
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8038 Fix Op profiler NaN/Inf triggering + add tests (#93 )
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* createUninitializedDetached refactoring. (#94 )
* wip
* update interface, add null implementations.
* Breaking one test in a weird way.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* createUninitializedDetached refactored.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* cuda build fix for issues introduced by recent refactoring
Signed-off-by: raver119 <raver119@gmail.com>
* [WIP] More of CUDA (#95 )
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* Implementation of hashcode cuda helper. Working edition.
* Fixed parallel test input arangements.
* Fixed tests for hashcode op.
* Fixed shape calculation for image:crop_and_resize op and test.
* NativeOps tests. Initial test suite.
* Added tests for indexReduce methods.
* Added test on execBroadcast with NDArray as dimensions.
* Added test on execBroadcastBool with NDArray as dimensions.
* Added tests on execPairwiseTransform and execPairwiseTransofrmBool.
* Added tests for execReduce with scalar results.
* Added reduce tests for non-empty dims array.
* Added tests for reduce3.
* Added tests for execScalar.
* Added tests for execSummaryStats.
* - provide cpu/cuda code for batch_to_space
- testing it
Signed-off-by: Yurii <yurii@skymind.io>
* - remove old test for batch_to_space (had wrong format and numbers were not checked)
Signed-off-by: Yurii <yurii@skymind.io>
* Fixed complilation errors with test.
* Added test for execTransformFloat.
* Added test for execTransformSame.
* Added test for execTransformBool.
* Added test for execTransformStrict.
* Added tests for execScalar/execScalarBool with TADs.
* Added test for flatten.
* - provide cpu/cuda code for space_to_Batch operaion
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for concat.
* comment unnecessary stuff in s_t_b
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for specialConcat.
* Added tests for memcpy/set routines.
* Fixed pullRow cuda test.
* Added pullRow test.
* Added average test.
* - correct typo in NDArray::applyPairwiseTransform(nd4j::pairwise::BoolOps op...)
Signed-off-by: Yurii <yurii@skymind.io>
* - debugging and fixing cuda tests in JavaInteropTests file
Signed-off-by: Yurii <yurii@skymind.io>
* - correct some tests
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for shuffle.
* Fixed ops declarations.
* Restored omp and added shuffle test.
* Added convertTypes test.
* Added tests for execRandom. Eliminated usage of RandomBuffer with NativeOps.
* Added sort tests.
* Added tests for execCustomOp.
* - further debuging and fixing tests terminated with crash
Signed-off-by: Yurii <yurii@skymind.io>
* Added tests for calculateOutputShapes.
* Addded Benchmarks test.
* Commented benchmark tests.
* change assertion
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for apply_sgd op. Added cpu helper for that op.
* Implement cuda helper for aplly_sgd op. Fixed tests for NativeOps.
* Added test for assign broadcastable.
* Added tests for assign_bp op.
* Added tests for axpy op.
* - assign/execScalar/execTransformAny signature change
- minor test fix
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed axpy op.
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* - fix tests for nativeOps::concat
Signed-off-by: Yurii <yurii@skymind.io>
* sequential transform/scalar
Signed-off-by: raver119 <raver119@gmail.com>
* allow nested parallelism
Signed-off-by: raver119 <raver119@gmail.com>
* assign_bp leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* block setRNG fix
Signed-off-by: raver119 <raver119@gmail.com>
* enable parallelism by default
Signed-off-by: raver119 <raver119@gmail.com>
* enable nested parallelism by default
Signed-off-by: raver119 <raver119@gmail.com>
* Added cuda implementation for row_count helper.
* Added implementation for tnse gains op helper.
* - take into account possible situations when input arrays are empty in reduce_ cuda stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Implemented tsne/edge_forces op cuda-based helper. Parallelized cpu-based helper for edge_forces.
* Added kernel for tsne/symmetrized op heleper.
* Implementation of tsne/symmetrized op cuda helper. Working edition.
* Eliminated waste printfs.
* Added test for broadcastgradientargs op.
* host-only fallback for empty reduce float
Signed-off-by: raver119 <raver119@gmail.com>
* - some tests fixes
Signed-off-by: Yurii <yurii@skymind.io>
* - correct the rest of reduce_ stuff
Signed-off-by: Yurii <yurii@skymind.io>
* - further correction of reduce_ stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for Cbow op. Also added cuda implementation for cbow helpers.
* - improve code of stack operation for scalar case
Signed-off-by: Yurii <yurii@skymind.io>
* - provide cuda kernel for gatherND operation
Signed-off-by: Yurii <yurii@skymind.io>
* Implementation of cbow helpers with cuda kernels.
* minor tests tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* minor tests tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* - further correction of cuda stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Implementatation of cbow op helper with cuda kernels. Working edition.
* Skip random testing for cudablas case.
* lstmBlockCell context fix
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for ELU and ELU_BP ops.
* Added tests for eq_scalar, gt_scalar, gte_scalar and lte_scalar ops.
* Added tests for neq_scalar.
* Added test for noop.
* - further work on clipbynorm_bp
Signed-off-by: Yurii <yurii@skymind.io>
* - get rid of concat op call, use instead direct concat helper call
Signed-off-by: Yurii <yurii@skymind.io>
* lstmBlockCell context fix
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for lrelu and lrelu_bp.
* Added tests for selu and selu_bp.
* Fixed lrelu derivative helpers.
* - some corrections in lstm
Signed-off-by: Yurii <yurii@skymind.io>
* operator * result shape fix
Signed-off-by: raver119 <raver119@gmail.com>
* - correct typo in lstmCell
Signed-off-by: Yurii <yurii@skymind.io>
* few tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA inverse broadcast bool fix
Signed-off-by: raver119 <raver119@gmail.com>
* disable MMAP test for CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* BooleanOp syncToDevice
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* additional data types for im2col/col2im
Signed-off-by: raver119 <raver119@gmail.com>
* Added test for firas_sparse op.
* one more RandomBuffer test excluded
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for flatten op.
* Added test for Floor op.
* bunch of tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* mmulDot tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Implemented floordiv_bp op and tests.
* Fixed scalar case with cuda implementation for bds.
* - work on cuda kernel for clip_by_norm backprop op is completed
Signed-off-by: Yurii <yurii@skymind.io>
* Eliminate cbow crach.
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Eliminated abortion with batched nlp test.
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed shared flag initializing.
* disabled bunch of cpu workspaces tests
Signed-off-by: raver119 <raver119@gmail.com>
* scalar operators fix: missing registerSpecialUse call
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed logdet for cuda and tests.
* - correct clipBynorm_bp
Signed-off-by: Yurii <yurii@skymind.io>
* Fixed crop_and_resize shape datatype.
* - correct some mmul tests
Signed-off-by: Yurii <yurii@skymind.io>
* build fix
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI (#97 )
Signed-off-by: raver119 <raver119@gmail.com>
* temporary stack fix
Signed-off-by: raver119 <raver119@gmail.com>
* couple of legacy groups reorganized into separate compialtion units
Signed-off-by: raver119 <raver119@gmail.com>
* wrong include
Signed-off-by: raver119 <raver119@gmail.com>
* wrong include
Signed-off-by: raver119 <raver119@gmail.com>
* ReductionLoops_float split
Signed-off-by: raver119 <raver119@gmail.com>
* maximum
Signed-off-by: raver119 <raver119@gmail.com>
* some more rearrangements
Signed-off-by: raver119 <raver119@gmail.com>
* spare ifdef
Signed-off-by: raver119 <raver119@gmail.com>
* mirror pad
Signed-off-by: raver119 <raver119@gmail.com>
* - reduce_float split
- mcmodel
Signed-off-by: raver119 <raver119@gmail.com>
* bad include fix
Signed-off-by: raver119 <raver119@gmail.com>
* norelax
Signed-off-by: raver119 <raver119@gmail.com>
* norelax
Signed-off-by: raver119 <raver119@gmail.com>
* norelax
Signed-off-by: raver119 <raver119@gmail.com>
* norelax
Signed-off-by: raver119 <raver119@gmail.com>
* norelax
Signed-off-by: raver119 <raver119@gmail.com>
* norelax gone
Signed-off-by: raver119 <raver119@gmail.com>
* get back sm
Signed-off-by: raver119 <raver119@gmail.com>
* fix couple of tests for msvc
Signed-off-by: raver119 <raver119@gmail.com>
* fix couple of tests for msvc
Signed-off-by: raver119 <raver119@gmail.com>
* compress-all
Signed-off-by: raver119 <raver119@gmail.com>
* reduced arch list
Signed-off-by: raver119 <raver119@gmail.com>
* compress-all
Signed-off-by: raver119 <raver119@gmail.com>
* reduced arch list
Signed-off-by: raver119 <raver119@gmail.com>
* all compute capabilities option for tests
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-07 17:49:13 +03:00
shugeo
c78f5a8225
Shugeo cuda cuda ( #105 )
...
* Refactored extract_image_patches op helpers.
* Eliminated compliler errors with helper implementation.
* Finished implementation for extract_image_patches both cpu and cuda helpers.
* Improved cpu implementation.
* Improved cuda implementation for extract_image_patches helper.
* Added omp to ClipByGlobalNorm helpers implementation.
* Added implementation for thresholedrelu_bp op.
* Fixed cuda kernel with F order.
* Fixed tests for subarray.
* Refactored tests for Gaussian_3 and Truncated_22.
* Added tests for GaussianDistribution with native ops.
* Modified tests for Gaussian distribution.
* Fixed random tests.
* Fixed atomicMin/atomicMax for 64bit cases.
* Fixed tests for execReduce3TAD tests.
* Eliminated waste comments.
2019-08-07 15:29:17 +03:00