raver119
7f90930e7a
bring back cuda cc 30
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-25 09:17:35 +03:00
raver119
064a56ccf1
Few fixes ( #66 )
...
* skip legacy transforms execution in case of empty input arrays
Signed-off-by: raver119 <raver119@gmail.com>
* - BroadcastBool ops now accept extraParams to make MatchCondition possible
- TrueBroadcastHelper now uses samediff::threads
Signed-off-by: raver119 <raver119@gmail.com>
* java side
Signed-off-by: raver119 <raver119@gmail.com>
* trigger jenkins
Signed-off-by: raver119 <raver119@gmail.com>
* update LessThanOrEqual opNum mapping
Signed-off-by: raver119 <raver119@gmail.com>
* update LessThanOrEqual opNum mapping
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-21 15:43:03 +03:00
Yurii Shyrma
66b84b38cf
Shyrma mmul ( #58 )
...
* - get rid of some copy procedures in mmulHelper ops
Signed-off-by: Yurii <iuriish@yahoo.com>
* - further work on embedding cuda api for batched gemm (cublasGemmBatchedEx) in our mmulHelper class
Signed-off-by: Yurii <iuriish@yahoo.com>
* - further work on cuda batched gamm api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - write own cuda kernel performing batched gemm
Signed-off-by: Yurii <iuriish@yahoo.com>
* missing include in MmulHelper
Signed-off-by: raver119 <raver119@gmail.com>
* - forgot to keep in code previous correct kernels for mmulNxN, since it may happen that new onw will fail for some reason in future
Signed-off-by: Yurii <iuriish@yahoo.com>
* disable old tensordot
Signed-off-by: raver119 <raver119@gmail.com>
* - rewrite cuda kernels for usualGemm and usualGemv
Signed-off-by: Yurii <iuriish@yahoo.com>
* - profiling mmul helpers
Signed-off-by: Yurii <iuriish@yahoo.com>
* - prints to check shapes were added
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct type of output array Cin mmulNxN
Signed-off-by: Yurii <iuriish@yahoo.com>
* - take into account possible nans in C array
Signed-off-by: Yurii <iuriish@yahoo.com>
* slightly change numThreads message
Signed-off-by: raver119 <raver119@gmail.com>
* - make corrections in accordance to given notes in pr review
Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-19 15:39:36 +02:00
raver119
1eb3de90d7
[WIP] Platform helpers switches ( #44 )
...
* - platform helpers can be disabled on per-op basis now via Context::allowHelpers
- java has access to it as well
Signed-off-by: raver119 <raver119@gmail.com>
* global platform-helpers trigger
Signed-off-by: raver119 <raver119@gmail.com>
* few signatures renamed
Signed-off-by: raver119 <raver119@gmail.com>
* - few new env variables to follow
- maxThreads/masterThreads differentiation
Signed-off-by: raver119 <raver119@gmail.com>
* Javadoc update
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-14 14:35:02 +03:00
raver119
48df1acdfb
[WIP] ThreadPool ( #8 )
...
This PR removes OpenMP use in 95% of cases
2019-11-13 17:04:59 +03:00
raver119
929c1dc5c7
- new NDArrayFactory scalar constructor
...
- minor tweak in randomuniform
- one more test
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-08 08:49:41 +03:00
shugeo
08853c7829
Shugeo random uniform int ( #30 )
...
* Corrected randomuniform declaration.
* Refactored uniform distribution for both cuda and cpu platforms.
* Refactored uniform distribution and tests.
* Fixed type usage with indices.
* Refactored uniform distribution implementation and tests to full conform with TF implementation.
* Refactored gamma function to use type util method.
* Copyright changes and fixes with ConstantHelper.
* Added error checking on allocate cuda device memory and operations.
2019-11-06 12:49:27 +02:00
raver119
c94013f0a1
cc 52 -> 50
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-03 09:54:35 +03:00
raver119
879a06c913
few typos fixed
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-01 09:13:15 +03:00
Alexander Stoyakin
45a40c8a89
DL4J/ND4J: Do pass on integer casts ( #15 )
...
* Int cast fixes.
* Revert "Int cast fixes."
This reverts commit aa36e8ca
* Int casts
* Int cast
* Int casts
* Get rid of int casts. Dropping deprecated aggregate ops.
* java scatterUpdate changes
Signed-off-by: raver119 <raver119@gmail.com>
* c++ scatterUpdate changes
Signed-off-by: raver119 <raver119@gmail.com>
* Remove aggregated ops.
* Restored test
* Tests restored.
* Minor fixes
2019-10-31 11:23:09 +02:00
Alex Black
d333d29099
SameDiff cleanup and fixes ( #12 )
...
* #8160 Remove resolvePrepertiesFromSameDiffBeforeExecution
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* SameDiff API cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More SameDiff cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8248 Switch SameDiff variable init from lazy to creation time for more predictable behaviour
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8252 TanhDerivative javadoc
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8225 Deconvolution2D input validation
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8265 Switch SameDiff.outputs() to user settable, instead of unreliable 'best guess'
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8224 SameDiff.zero and .one create constants, not variables
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More cleanup and fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small test fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* DL4J SameDiff fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Re-add hack for Deconvolution2DLayer until #8315 is resolved
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8270 Move CUDA device/version logging to Java; can be disabled via existing org.nd4j.log.initialization system property
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* All ND4J init logging checks system property
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small tweak
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Remove redundant device logging
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* One more fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* UX improvements
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Deconv fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Add deconv tests
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Remove debug code
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-10-26 12:38:08 +11:00
Yurii
8f3eaebda5
- replace condition isScalar() by condition length ==1 in some NDArray methodds
...
Signed-off-by: Yurii <iuriish@yahoo.com>
2019-10-21 16:25:13 +03:00
Yurii
70bd925abd
- write 2 versions of new lstmLayer: one is based on own code, second uses mkl dnn api
2019-10-17 20:44:52 +03:00
AlexDBlack
2d750b69e5
Merge remote-tracking branch 'konduit/master'
2019-10-14 17:21:23 +11:00
shugeo
78443ffebf
Working implementation of draw_bounding_boxes op for cpu.
2019-10-07 15:04:44 +03:00
raver119
44a8d19ac6
[WIP] Broadcast changes ( #8257 )
...
* - provide correct call NDArray::applyBroadcast inside of NDArray::applyTrueBroadcast
Signed-off-by: Yurii <yurii@skymind.io>
* - provide new trueBroadcast helper
Signed-off-by: Yurii <yurii@skymind.io>
* example for yurii
Signed-off-by: raver119 <raver119@gmail.com>
* - provide new trueBroadcast helper for cpu
Signed-off-by: Yurii <yurii@skymind.io>
* - start working on new trueBroadcat helper for cuda
Signed-off-by: Yurii <yurii@skymind.io>
* - further work on trueBroadcast for cuda
Signed-off-by: Yurii <yurii@skymind.io>
* - fix bugs in cuda helper trueBroadcast
Signed-off-by: Yurii <yurii@skymind.io>
2019-10-01 09:10:19 +03:00
AlexDBlack
a66e03355e
Merge remote-tracking branch 'fork/master'
2019-09-12 12:20:57 +10:00
raver119
98e2814879
Platform helpers ( #8216 )
...
* platform helpers draft
Signed-off-by: raver119 <raver119@gmail.com>
* typo
Signed-off-by: raver119 <raver119@gmail.com>
* disable platform cmake
Signed-off-by: raver119 <raver119@gmail.com>
* another draft
Signed-off-by: raver119 <raver119@gmail.com>
* mkldnn convolution refactored
Signed-off-by: raver119 <raver119@gmail.com>
* minor tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* one more safety check
Signed-off-by: raver119 <raver119@gmail.com>
* prototype works
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* force static library mode for mkldnn
Signed-off-by: raver119 <raver119@gmail.com>
* - ismax fix
- experimental arg fix
- don't enforce openblas on Apple hardware
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of small fixes
Signed-off-by: raver119@gmail.com <raver119@gmail.com>
* declare concurrent
Signed-off-by: raver119@gmail.com <raver119@gmail.com>
* - MKLDNN version upgrade to 1.0.2
- avgpool2d/maxpool2d APIs update
Signed-off-by: raver119 <raver119@gmail.com>
* - avgpool2d_bp/maxpool2d_bp APIs update
Signed-off-by: raver119 <raver119@gmail.com>
* - conv2d/batchnorm APIs update
Signed-off-by: raver119 <raver119@gmail.com>
* - lrn/conv2d_bp/conv3d/conv3d_bp APIs update
Signed-off-by: raver119 <raver119@gmail.com>
* all ops converted to MKLDNN 1.x
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* namespace for platform helpers
Signed-off-by: raver119 <raver119@gmail.com>
* make sure platform helpers aren't opimized out
Signed-off-by: raver119 <raver119@gmail.com>
* build cpu_features on x86 systems
Signed-off-by: raver119 <raver119@gmail.com>
* build cpu_features on x86 systems
Signed-off-by: raver119 <raver119@gmail.com>
* more of cpu_features
Signed-off-by: raver119 <raver119@gmail.com>
* - mkldnn removed from java
- cpu_features checks in CpuNDArrayFactory
Signed-off-by: raver119 <raver119@gmail.com>
* F16C definition renamed
Signed-off-by: raver119 <raver119@gmail.com>
* some mkldnn rearrangements
Signed-off-by: raver119 <raver119@gmail.com>
* check supported instructions before doing anything
Signed-off-by: raver119 <raver119@gmail.com>
* typo
Signed-off-by: raver119 <raver119@gmail.com>
* missied impl
Signed-off-by: raver119 <raver119@gmail.com>
* BUILD_PIC option
Signed-off-by: raver119 <raver119@gmail.com>
* conv2d fix
Signed-off-by: raver119 <raver119@gmail.com>
* avgpool3d fix
Signed-off-by: raver119 <raver119@gmail.com>
* avgpool3d_bp fix
Signed-off-by: raver119 <raver119@gmail.com>
* avgpool2d_bp leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* avgpool3d_bp leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* maxpool bp leaks fixed
Signed-off-by: raver119 <raver119@gmail.com>
* printf removed
Signed-off-by: raver119 <raver119@gmail.com>
* batchnorm fix
Signed-off-by: raver119 <raver119@gmail.com>
* AVX warning/error polishing
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More polish
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Polish
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* remove previous MKL-DNN support layer
Signed-off-by: raver119 <raver119@gmail.com>
* avx2 tweak
Signed-off-by: raver119 <raver119@gmail.com>
* allow static for apple
Signed-off-by: raver119@gmail.com <raver119@gmail.com>
* exclude mkldnn in one more place
Signed-off-by: raver119 <raver119@gmail.com>
* exclude mkldnn in one more place
Signed-off-by: raver119 <raver119@gmail.com>
* restore OPENBLAS_PATH use
Signed-off-by: raver119 <raver119@gmail.com>
* add runtime check for avx/avx2 support
Signed-off-by: raver119 <raver119@gmail.com>
* convolution_auto
Signed-off-by: raver119 <raver119@gmail.com>
* Add logic for helper argument
* minor test fix
Signed-off-by: raver119 <raver119@gmail.com>
* few tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* few tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* skip OpTracker props for non-x86 builds
Signed-off-by: raver119 <raver119@gmail.com>
* linux arm isn't x86 :)
Signed-off-by: raver119 <raver119@gmail.com>
* avx-512
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA presets fix
Signed-off-by: raver119 <raver119@gmail.com>
* BUILD_PIC
Signed-off-by: raver119 <raver119@gmail.com>
* prefetchw for avx2
Signed-off-by: raver119 <raver119@gmail.com>
* BUILD_PIC again
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-11 21:50:28 +03:00
raver119
589401477d
[WIP] bunch of improvements ( #257 )
...
* - profiling bias_add op
- add some docementation
Signed-off-by: Yurii <yurii@skymind.io>
* - minor change
Signed-off-by: Yurii <yurii@skymind.io>
* - provide addBias cuda kernel
Signed-off-by: Yurii <yurii@skymind.io>
* - improve shape::getIndexOfffset and change its signature
Signed-off-by: Yurii <yurii@skymind.io>
* - same as previous
Signed-off-by: Yurii <yurii@skymind.io>
* - improve and change signature in some shape:: stuff which has to do with calculation of offsets for array elements
Signed-off-by: Yurii <yurii@skymind.io>
* - minor changes in flatten
Signed-off-by: Yurii <shyrma@skymind.io>
* - add function shape::getIndexOffsetOrdered
Signed-off-by: Yurii <shyrma@skymind.io>
* - correct shape::getIndexOffsetOrdered()
Signed-off-by: Yurii <shyrma@skymind.io>
* - move getIndexOffsetOrdered to flatten.h header in order to isolate this function
Signed-off-by: Yurii <shyrma@skymind.io>
2019-09-11 20:12:09 +03:00
AlexDBlack
b7226bdd7a
Merge
...
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-09-05 00:54:11 +10:00
raver119
a90c7dd995
[WIP] Last set of changes ( #234 )
...
* mmul op instead of cublasSgemm
Signed-off-by: raver119 <raver119@gmail.com>
* transB
Signed-off-by: raver119 <raver119@gmail.com>
* jcpp handles
Signed-off-by: raver119 <raver119@gmail.com>
* bitwise and/or/xor
Signed-off-by: raver119 <raver119@gmail.com>
* bitwise and/or/xor mapping
Signed-off-by: raver119 <raver119@gmail.com>
* cuda/cublas version check
Signed-off-by: raver119 <raver119@gmail.com>
* add expected version
Signed-off-by: raver119 <raver119@gmail.com>
* cuda/cublas version check in java
Signed-off-by: raver119 <raver119@gmail.com>
* one more error check
Signed-off-by: raver119 <raver119@gmail.com>
* build fix
Signed-off-by: raver119 <raver119@gmail.com>
* build fix
Signed-off-by: raver119 <raver119@gmail.com>
* build fix
Signed-off-by: raver119 <raver119@gmail.com>
* one more fix
Signed-off-by: raver119 <raver119@gmail.com>
* skip CUDA version check for now
Signed-off-by: raver119 <raver119@gmail.com>
* better wording
Signed-off-by: raver119 <raver119@gmail.com>
* few more tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* few more tweaks
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-04 14:41:08 +03:00
Alex Black
6cc887bee9
Rename flatbuffers DataType to DType ( #228 )
...
* Rename flatbuffers DataType enum to DType
Signed-off-by: Alex Black <blacka101@gmail.com>
* Rename flatbuffers DataType enum to DType
Signed-off-by: Alex Black <blacka101@gmail.com>
* Updates for flatbuffers datatype enum renaming
Signed-off-by: Alex Black <blacka101@gmail.com>
2019-09-04 16:36:11 +10:00
raver119
f6f9437a36
get back cc 7.0 support for cuda 9.2
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-03 09:26:35 +03:00
raver119
e42c34ca55
[WIP] minor ( #218 )
...
* - initial docs commit
- merge* cuda fix
Signed-off-by: raver119 <raver119@gmail.com>
* one more fix
Signed-off-by: raver119 <raver119@gmail.com>
* one more fix
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-02 11:25:48 +03:00
raver119
b71c993ded
[WIP] maxpool_bp cuda fix ( #212 )
...
* one test for alex
Signed-off-by: raver119 <raver119@gmail.com>
* fix
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of safety offset in cpp
Signed-off-by: raver119 <raver119@gmail.com>
* bfloat16
Signed-off-by: raver119 <raver119@gmail.com>
* minor test rearrangement to fastpath launch
Signed-off-by: raver119 <raver119@gmail.com>
* - atomicAdd/Mul/Div fix for float16/bfloat16 misalignment
- one special test for maxpoolbp java
- safety offset of 8 bytes is back to libnd4j legacy
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-31 20:57:05 +03:00
raver119
1003428a18
[WIP] Int broadcastables ( #195 )
...
* Removed invalid resource and fixed tests
Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
* legacy scalar/pairwise/broadcast int ops
Signed-off-by: raver119 <raver119@gmail.com>
* NDArray int broadcastables
Signed-off-by: raver119 <raver119@gmail.com>
* few more bitwise tests
Signed-off-by: raver119 <raver119@gmail.com>
* java side update
Signed-off-by: raver119 <raver119@gmail.com>
* Argument type changed for shift ops
Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
* legacy scalar/pairwise/broadcast int ops
Signed-off-by: raver119 <raver119@gmail.com>
* NDArray int broadcastables
Signed-off-by: raver119 <raver119@gmail.com>
* few more bitwise tests
Signed-off-by: raver119 <raver119@gmail.com>
* java side update
Signed-off-by: raver119 <raver119@gmail.com>
* Argument type changed for shift ops
Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
2019-08-30 10:12:40 +03:00
raver119
b472d7d8c8
[WIP] few more fixes ( #182 )
...
* one noop test
Signed-off-by: raver119 <raver119@gmail.com>
* skip input validation for no-input ops
Signed-off-by: raver119 <raver119@gmail.com>
* - one more noop empty test
- one more validation before sync
Signed-off-by: raver119 <raver119@gmail.com>
* typo
Signed-off-by: raver119 <raver119@gmail.com>
* one more validation fix
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA empty reductions java side
Signed-off-by: raver119 <raver119@gmail.com>
* one svd test
Signed-off-by: raver119 <raver119@gmail.com>
* Corrected segment_mean helpers and added another test.
* Refactored segment_mean kernels to avoid race_condition.
2019-08-27 21:00:38 +03:00
raver119
df84bc7255
[WIP] More tweaks ( #173 )
...
* CUDA empty reduction
Signed-off-by: raver119 <raver119@gmail.com>
* - listdiff synchronization fix for CUDA
- listdiff test
Signed-off-by: raver119 <raver119@gmail.com>
* - IndexReduce ops now allow INDEXING_TYPES output
- topK op accepts only INDEXING_TYPES as output
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-27 10:37:10 +03:00
raver119
25e5c23eae
[WIP] Error handling ( #169 )
...
* CUDA reverse rewrite + couple of tests
Signed-off-by: raver119 <raver119@gmail.com>
* don't throw exception on invalid pointer
Signed-off-by: raver119 <raver119@gmail.com>
* data types validation for fastpath exec mode + 2 tests
Signed-off-by: raver119 <raver119@gmail.com>
* data types validation for fastpath exec mode + 2 tests
Signed-off-by: raver119 <raver119@gmail.com>
* ismax allowed dtypes tweak
Signed-off-by: raver119 <raver119@gmail.com>
* lastErrorCode + lastErrorMessage for native exceptions handling
Signed-off-by: raver119 <raver119@gmail.com>
* exportable ErrorReference
Signed-off-by: raver119 <raver119@gmail.com>
* check error codes in java
Signed-off-by: raver119 <raver119@gmail.com>
* - consume lastErrorCode
- fast_in dtype validation fix
Signed-off-by: raver119 <raver119@gmail.com>
* - sg/cb allowed output type change
- minor logging fix for data type validation
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-26 19:57:51 +03:00
raver119
daf5420f4c
cmake fix for windows debug build
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-26 08:13:22 +03:00
raver119
b091e972ef
- string NDArray flat serde impl + tests ( #163 )
...
- string NDArray equalsTo impl
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-24 14:16:34 +03:00
raver119
f8364997c0
[WIP] maxpool2d_bp fix ( #160 )
...
* one test for maxpool2d_bp
Signed-off-by: raver119 <raver119@gmail.com>
* - maxpool2d_bp cuda fix for NaNs
- streamSync after each custom op execution
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-24 09:20:57 +03:00
raver119
eea3062ccf
[WIP] stb/bts nd ( #144 )
...
* - start working on space_to_batch_nd
Signed-off-by: Yurii <yurii@skymind.io>
* - provide cpu helper for space_to_batch_nd op
Signed-off-by: Yurii <yurii@skymind.io>
* few typos fixed
Signed-off-by: raver119 <raver119@gmail.com>
* - add tests for space_to_batch and correct bugs
Signed-off-by: Yurii <yurii@skymind.io>
* - write cuda kernel for space_to_batch op
Signed-off-by: Yurii <yurii@skymind.io>
* - add order argument to shape::index2coords method in convolution cuda ops
Signed-off-by: Yurii <yurii@skymind.io>
* - restore some previous code
Signed-off-by: Yurii <yurii@skymind.io>
* old col2im kernel activated
Signed-off-by: raver119 <raver119@gmail.com>
* - change coords calculation in col2im kernel
Signed-off-by: Yurii <yurii@skymind.io>
* - restore old col2im kernel
Signed-off-by: Yurii <yurii@skymind.io>
* - add custom op for batch_to_space
Signed-off-by: Yurii <yurii@skymind.io>
* - provide cpu version for batch_to_space_nd op
Signed-off-by: Yurii <yurii@skymind.io>
* - provide cuda kernel for batch_to_space_nd op
Signed-off-by: Yurii <yurii@skymind.io>
2019-08-21 21:11:46 +03:00
raver119
e604ffe0d2
[WIP] repeat op ( #143 )
...
* - write new repeat helper (cpu)
Signed-off-by: Yurii <yurii@skymind.io>
* - update NDArray::cpu
Signed-off-by: Yurii <yurii@skymind.io>
* - update NDArray::repeat cuda
Signed-off-by: Yurii <yurii@skymind.io>
2019-08-21 21:10:29 +03:00
raver119
d9ab299759
[WIP] Minor fixes ( #140 )
...
* - Tile java shape fn removed
- Tile 0 validation added
- scatter_upd test
Signed-off-by: raver119 <raver119@gmail.com>
* additional tile validation
Signed-off-by: raver119 <raver119@gmail.com>
* - provide vector case in cuda scatter op
Signed-off-by: Yurii <yurii@skymind.io>
* cpu ismax view fix
Signed-off-by: raver119 <raver119@gmail.com>
* exp
Signed-off-by: raver119 <raver119@gmail.com>
* cuda ismax fix
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-21 15:05:47 +03:00
raver119
4310e87860
include path fix for java
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-21 07:32:21 +03:00
raver119
269d508ba5
[WIP] cross-device migrations ( #134 )
...
* two more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA device afinity tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* minor tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* prepareAction/registerAction for CustomOps
Signed-off-by: raver119 <raver119@gmail.com>
* lazy allocate host bufer before relocation
Signed-off-by: raver119 <raver119@gmail.com>
* one special test for migration in cpp
Signed-off-by: raver119 <raver119@gmail.com>
* tests update for msvc
Signed-off-by: raver119 <raver119@gmail.com>
* logging
Signed-off-by: raver119 <raver119@gmail.com>
* stick to old col2im impl
Signed-off-by: raver119 <raver119@gmail.com>
* cudaStreams reorganization
Signed-off-by: raver119 <raver119@gmail.com>
* buffer size fix
Signed-off-by: raver119 <raver119@gmail.com>
* c++ data migration
Signed-off-by: raver119 <raver119@gmail.com>
* fix CropAndResize test
Signed-off-by: raver119 <raver119@gmail.com>
* - minor improvment
Signed-off-by: Yurii <yurii@skymind.io>
2019-08-20 18:52:41 +03:00
raver119
23c8738d4a
syncthreads ( #136 )
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-20 18:28:43 +03:00
raver119
aceb915557
[WIP] tests fixes ( #130 )
...
* no openmp for ClipByGlobalNorm
Signed-off-by: raver119 <raver119@gmail.com>
* one more bfloat16 rng test
Signed-off-by: raver119 <raver119@gmail.com>
* assertion fix
Signed-off-by: raver119 <raver119@gmail.com>
* - legacy IsMax gone
- linear IsMax gets shapeInfo argument
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of legacy IsMax tests
Signed-off-by: raver119 <raver119@gmail.com>
* IsMax is custom op now
Signed-off-by: raver119 <raver119@gmail.com>
* more blocks for ismax
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* - sqrt test
- some legacy code removed from CudaExecutioner
- Transforms.asin tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* - TransformFloat fix
Signed-off-by: raver119 <raver119@gmail.com>
* - ismax fix
- SpaceToBatchND/BatchToSpaceND wrappers
- couple of legacy tests removed
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-19 11:33:15 +03:00
shugeo
f083b96c74
Shugeo cuda tests ( #116 )
...
* Added tests for get_seed/set_seed ops.
* Added missed tests for scatter_sub/mul/div ops.
* Added tests for hardsigmoid and hardsigmoid_bp.
* Added tests for hardtanh and hardtanh_bp ops.
* Added test for histogram op.
* Added tests for identity op.
* Refactored mergemaxindex op. Added tests for log1p,mergemaxindex, mod and mod_bp ops.
* Fixed tests for FloorDiv.
* Added test for rank op.
* Added tests for rationaltanh/rationaltanh_bp ops.
* Added tests for realdiv/realdiv_bp.
* Added tests for rectifiedtanh/_bp ops.
* Added tests for shapes_of op.
* Added tests for shapes_of op.
* Added tests for size op.
* Added tests for softplus/_bp ops.
* Added tests for softsign/_bp ops.
* Added tests for toggle_bits op. Fixed processing of OP_IMPL and so on defititions.
* Added test for truncatediv op.
* Added another test for truncatediv op.
* Added another test for histogram.
* Added tests for unstack_list op.
* Refactored to_int32/uint32/float16/float32/double/int64/uint64 ops and tests.
* Refactored mergemaxindex op helper for cuda platform and tests.
* Fixed cuda kernel for histogram op helper.
* Refactor skipgram to avoid early buffers shift.
* Fixed check up with non_max_suppression op cuda helper. Added cuda kernel implementation for skipgram op helpers.
* Added implementation of skipgram op helper for cuda platform. Working revision
* Fixed mergeMaxIndex kernel and move it to separate source file.
2019-08-15 13:54:47 +03:00
raver119
6264530dd8
[WIP] bitwise ops ( #115 )
...
* - cyclic_shift_bits + test
- shift_bits + test
Signed-off-by: raver119 <raver119@gmail.com>
* OMP_IF replacement
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-15 11:49:50 +03:00
raver119
c7277729e9
few fixes for bfloat16 in java and cpp ( #114 )
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-14 21:51:42 +03:00
raver119
53ca9a76e8
[WIP] multi-device support ( #80 )
...
* fix pad javadoc and @see links. (#72 )
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* [WIP] More fixes (#73 )
* special tests for ConstantTadHelper/ConstantShapeHelper
Signed-off-by: raver119 <raver119@gmail.com>
* release methods for data buffers
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary buffer Java side
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary buffer Java side
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary TadPack C++/Java side (#74 )
Signed-off-by: raver119 <raver119@gmail.com>
* Zoo model TF import test updates (#75 )
* argLine fix, update compression_gru comment
* updated comment for xception
* undid but commented argLine change
* updated xlnet comment
* copyright headers
* - new NDArray methods like()/ulike() (#77 )
- fix for depthwise_conv2d_bp + special test
Signed-off-by: raver119 <raver119@gmail.com>
* upsampling2d fix CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* DL4J trace logging (#79 )
* MLN/CG trace logging for debugging
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Tiny tweak
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* strided_slice_bp shape fn leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* SameDiff fixes and naming (#78 )
* remove SDVariable inplace methods
* import methods
* npe fix in OpVal
* removed SameDiff inplace ops from tests
* Naming updates, moved to centralized methods in SameDiff, should use op_#:# for everything
* quick fixes
* javadoc
* SDVariable eval with placeholders
* use regex match
* better matching
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* fix javadoc. (#76 )
* fix javadoc.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* replace most @see with @link s.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* 4 additional tests
Signed-off-by: raver119 <raver119@gmail.com>
* launch context reorganization
Signed-off-by: raver119 <raver119@gmail.com>
* LaunchContext reorganization
Signed-off-by: raver119 <raver119@gmail.com>
* per-device LaunchContext
Signed-off-by: raver119 <raver119@gmail.com>
* Various DL4J/ND4J fixes (#81 )
* #7954 Force refresh of UI when switching tabs on overview page
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8017 Concurrent modification exception (synchronize) fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8033 Don't initialize updater in middle of writing memory crash dump
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8208 Fix shape checks for ND4J int[] creator methods
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #6385 #7992 Keras import naming fixes + cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8016 Upsampling3D - add NDHWC format support
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* Refactor NativeOps.h to export C functions
* Actually export functions from NativeOps.h
* Adapt the Java wrappers in ND4J generated with JavaCPP
* Create C wrappers for some of the C++ classes currently used by ND4J
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* remove duplicate code in createBufferDetached. (#83 )
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Keras model import - updater lr fix (#84 )
* Keras model import - updater lr fix
Signed-off-by: eraly <susan.eraly@gmail.com>
* Keras model import - updater lr fix, cleanup
Signed-off-by: eraly <susan.eraly@gmail.com>
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* Fix functions of OpaqueVariablesSet
* thread-local buffers/affinity
Signed-off-by: raver119 <raver119@gmail.com>
* thread safety for LaunchContext
Signed-off-by: raver119 <raver119@gmail.com>
* more of thread safety
Signed-off-by: raver119 <raver119@gmail.com>
* one more multi threaded test
Signed-off-by: raver119 <raver119@gmail.com>
* SameDiff Convolution Config validation, better output methods (#82 )
* Conv Config validation & tests
Signed-off-by: Ryan Nett <rnett@skymind.io>
* stackOutputs utility method
Signed-off-by: Ryan Nett <rnett@skymind.io>
* use constructor for validation, support negative kernel sizes (infered from weights)
Signed-off-by: Ryan Nett <rnett@skymind.io>
* better output methods
Signed-off-by: Ryan Nett <rnett@skymind.io>
* move output to be with fit and evaluate
Signed-off-by: Ryan Nett <rnett@skymind.io>
* fixes
Signed-off-by: Ryan Nett <rnett@skymind.io>
* more fixes
Signed-off-by: Ryan Nett <rnett@skymind.io>
* refactor duplicate code from pad methods. (#86 )
* refactor duplicate code from pad methods.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* replace switch with if.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Various ND4J/DL4J fixes and improvements (#87 )
* Reshape and reallocate - small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Reshape and reallocate - small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #6488 ElementWiseVertex broadcast support
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Constructors and broadcast supported it Transforms.max/min
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8054 ElementWiseVertex now supports broadcast inputs
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8057 Nd4j.create overload dtype fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #7551 ND4J Shape validation fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* [WIP] Numpy boolean import (#91 )
* numpy bool type
Signed-off-by: raver119 <raver119@gmail.com>
* numpy bool java side
Signed-off-by: raver119 <raver119@gmail.com>
* remove create method with unused parameter. (#89 )
* remove create method with unused parameter.
* removed more unused methods.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* removing more unused code.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* last removal of unused code.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* remove createSparse methods. (#92 )
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Various ND4J/DL4J fixes (#90 )
* Deprecate Old*Op instances
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8063 #8054 Broadcast exceptions + cleanup inplace ops
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Remove bad test condition
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #7993 Fix shape function issue in crop_and_resize op
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* DL4J SameDiff lambda layer fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8029 Fix for pnorm backprop math
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8038 Fix Op profiler NaN/Inf triggering + add tests (#93 )
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* createUninitializedDetached refactoring. (#94 )
* wip
* update interface, add null implementations.
* Breaking one test in a weird way.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* createUninitializedDetached refactored.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* cuda build fix for issues introduced by recent refactoring
Signed-off-by: raver119 <raver119@gmail.com>
* [WIP] More of CUDA (#95 )
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* Implementation of hashcode cuda helper. Working edition.
* Fixed parallel test input arangements.
* Fixed tests for hashcode op.
* Fixed shape calculation for image:crop_and_resize op and test.
* NativeOps tests. Initial test suite.
* Added tests for indexReduce methods.
* Added test on execBroadcast with NDArray as dimensions.
* Added test on execBroadcastBool with NDArray as dimensions.
* Added tests on execPairwiseTransform and execPairwiseTransofrmBool.
* Added tests for execReduce with scalar results.
* Added reduce tests for non-empty dims array.
* Added tests for reduce3.
* Added tests for execScalar.
* Added tests for execSummaryStats.
* - provide cpu/cuda code for batch_to_space
- testing it
Signed-off-by: Yurii <yurii@skymind.io>
* - remove old test for batch_to_space (had wrong format and numbers were not checked)
Signed-off-by: Yurii <yurii@skymind.io>
* Fixed complilation errors with test.
* Added test for execTransformFloat.
* Added test for execTransformSame.
* Added test for execTransformBool.
* Added test for execTransformStrict.
* Added tests for execScalar/execScalarBool with TADs.
* Added test for flatten.
* - provide cpu/cuda code for space_to_Batch operaion
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for concat.
* comment unnecessary stuff in s_t_b
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for specialConcat.
* Added tests for memcpy/set routines.
* Fixed pullRow cuda test.
* Added pullRow test.
* Added average test.
* - correct typo in NDArray::applyPairwiseTransform(nd4j::pairwise::BoolOps op...)
Signed-off-by: Yurii <yurii@skymind.io>
* - debugging and fixing cuda tests in JavaInteropTests file
Signed-off-by: Yurii <yurii@skymind.io>
* - correct some tests
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for shuffle.
* Fixed ops declarations.
* Restored omp and added shuffle test.
* Added convertTypes test.
* Added tests for execRandom. Eliminated usage of RandomBuffer with NativeOps.
* Added sort tests.
* Added tests for execCustomOp.
* - further debuging and fixing tests terminated with crash
Signed-off-by: Yurii <yurii@skymind.io>
* Added tests for calculateOutputShapes.
* Addded Benchmarks test.
* Commented benchmark tests.
* change assertion
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for apply_sgd op. Added cpu helper for that op.
* Implement cuda helper for aplly_sgd op. Fixed tests for NativeOps.
* Added test for assign broadcastable.
* Added tests for assign_bp op.
* Added tests for axpy op.
* - assign/execScalar/execTransformAny signature change
- minor test fix
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed axpy op.
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* - fix tests for nativeOps::concat
Signed-off-by: Yurii <yurii@skymind.io>
* sequential transform/scalar
Signed-off-by: raver119 <raver119@gmail.com>
* allow nested parallelism
Signed-off-by: raver119 <raver119@gmail.com>
* assign_bp leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* block setRNG fix
Signed-off-by: raver119 <raver119@gmail.com>
* enable parallelism by default
Signed-off-by: raver119 <raver119@gmail.com>
* enable nested parallelism by default
Signed-off-by: raver119 <raver119@gmail.com>
* Added cuda implementation for row_count helper.
* Added implementation for tnse gains op helper.
* - take into account possible situations when input arrays are empty in reduce_ cuda stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Implemented tsne/edge_forces op cuda-based helper. Parallelized cpu-based helper for edge_forces.
* Added kernel for tsne/symmetrized op heleper.
* Implementation of tsne/symmetrized op cuda helper. Working edition.
* Eliminated waste printfs.
* Added test for broadcastgradientargs op.
* host-only fallback for empty reduce float
Signed-off-by: raver119 <raver119@gmail.com>
* - some tests fixes
Signed-off-by: Yurii <yurii@skymind.io>
* - correct the rest of reduce_ stuff
Signed-off-by: Yurii <yurii@skymind.io>
* - further correction of reduce_ stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for Cbow op. Also added cuda implementation for cbow helpers.
* - improve code of stack operation for scalar case
Signed-off-by: Yurii <yurii@skymind.io>
* - provide cuda kernel for gatherND operation
Signed-off-by: Yurii <yurii@skymind.io>
* Implementation of cbow helpers with cuda kernels.
* minor tests tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* minor tests tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* - further correction of cuda stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Implementatation of cbow op helper with cuda kernels. Working edition.
* Skip random testing for cudablas case.
* lstmBlockCell context fix
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for ELU and ELU_BP ops.
* Added tests for eq_scalar, gt_scalar, gte_scalar and lte_scalar ops.
* Added tests for neq_scalar.
* Added test for noop.
* - further work on clipbynorm_bp
Signed-off-by: Yurii <yurii@skymind.io>
* - get rid of concat op call, use instead direct concat helper call
Signed-off-by: Yurii <yurii@skymind.io>
* lstmBlockCell context fix
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for lrelu and lrelu_bp.
* Added tests for selu and selu_bp.
* Fixed lrelu derivative helpers.
* - some corrections in lstm
Signed-off-by: Yurii <yurii@skymind.io>
* operator * result shape fix
Signed-off-by: raver119 <raver119@gmail.com>
* - correct typo in lstmCell
Signed-off-by: Yurii <yurii@skymind.io>
* few tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA inverse broadcast bool fix
Signed-off-by: raver119 <raver119@gmail.com>
* disable MMAP test for CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* BooleanOp syncToDevice
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* additional data types for im2col/col2im
Signed-off-by: raver119 <raver119@gmail.com>
* Added test for firas_sparse op.
* one more RandomBuffer test excluded
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for flatten op.
* Added test for Floor op.
* bunch of tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* mmulDot tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Implemented floordiv_bp op and tests.
* Fixed scalar case with cuda implementation for bds.
* - work on cuda kernel for clip_by_norm backprop op is completed
Signed-off-by: Yurii <yurii@skymind.io>
* Eliminate cbow crach.
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Eliminated abortion with batched nlp test.
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed shared flag initializing.
* disabled bunch of cpu workspaces tests
Signed-off-by: raver119 <raver119@gmail.com>
* scalar operators fix: missing registerSpecialUse call
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed logdet for cuda and tests.
* - correct clipBynorm_bp
Signed-off-by: Yurii <yurii@skymind.io>
* Fixed crop_and_resize shape datatype.
* - correct some mmul tests
Signed-off-by: Yurii <yurii@skymind.io>
* build fix
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI (#97 )
Signed-off-by: raver119 <raver119@gmail.com>
* temporary stack fix
Signed-off-by: raver119 <raver119@gmail.com>
* round robin affinity test
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of legacy CudaContext methods
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of legacy ContextPool classes/methods
Signed-off-by: raver119 <raver119@gmail.com>
* one legacy test removed
Signed-off-by: raver119 <raver119@gmail.com>
* few more fields rearranged
Signed-off-by: raver119 <raver119@gmail.com>
* OpaqueLaunchContext
Signed-off-by: raver119 <raver119@gmail.com>
* OpaqueLaunchContext++
Signed-off-by: raver119 <raver119@gmail.com>
* more of OpaqueLaunchContext methods
Signed-off-by: raver119 <raver119@gmail.com>
* LaunchContext -> CudaContext
Signed-off-by: raver119 <raver119@gmail.com>
* AffinityManger changes
Signed-off-by: raver119 <raver119@gmail.com>
* AffinityManger changes
Signed-off-by: raver119 <raver119@gmail.com>
* cusolver handles
Signed-off-by: raver119 <raver119@gmail.com>
* typo
Signed-off-by: raver119 <raver119@gmail.com>
* cusolver method
Signed-off-by: raver119 <raver119@gmail.com>
* cusolver handle propagated
Signed-off-by: raver119 <raver119@gmail.com>
* blas/solver handles
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* legacy concat implementations replaced with new CustomOp
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* concat now uses way more blocks
Signed-off-by: raver119 <raver119@gmail.com>
* print
Signed-off-by: raver119 <raver119@gmail.com>
* no more triple template mmul
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of kernels have dtypes reconsidered
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of kernels have dtypes reconsidered
Signed-off-by: raver119 <raver119@gmail.com>
* bitonic sort reorganized
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of cpu stuff removed from cuda scope
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of cpu stuff removed from cuda scope
Signed-off-by: raver119 <raver119@gmail.com>
* type conversions moved to generic impl
Signed-off-by: raver119 <raver119@gmail.com>
* cpu data types pass
Signed-off-by: raver119 <raver119@gmail.com>
* non_max_suppression
Signed-off-by: raver119 <raver119@gmail.com>
* sortByValue fix
Signed-off-by: raver119 <raver119@gmail.com>
* ignore all mixed datatype tests for mmul
Signed-off-by: raver119 <raver119@gmail.com>
* special handling of OpProfiler exceptions
Signed-off-by: raver119 <raver119@gmail.com>
* - one failing concat test in cpp
- Nd4j.tile now uses op internally
Signed-off-by: raver119 <raver119@gmail.com>
* get back dtype exception for legacy arrays deserialization
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-14 16:52:34 +03:00
raver119
f49c4ea9d0
int -> long ( #108 )
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-10 09:14:18 +03:00
raver119
7fa01288bb
[WIP] cuda concat ( #107 )
...
* - correct cuda concat
Signed-off-by: Yurii <yurii@skymind.io>
* - pooling 2d/3d : take into account possible case when input and gradI have different strides
Signed-off-by: Yurii <yurii@skymind.io>
* master pulled in
Signed-off-by: raver119 <raver119@gmail.com>
* floordiv_bp test reverted
Signed-off-by: raver119 <raver119@gmail.com>
* - add NDArray::printLinearBuffer method
Signed-off-by: Yurii <yurii@skymind.io>
2019-08-08 18:05:21 +03:00
raver119
62a025439b
java cuda compilation fix
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-07 21:36:27 +03:00
raver119
55066d9c41
bad fatbin option removed
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-07 19:34:25 +03:00
raver119
24e43e9856
[WIP] build time improvements ( #106 )
...
* fix pad javadoc and @see links. (#72 )
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* [WIP] More fixes (#73 )
* special tests for ConstantTadHelper/ConstantShapeHelper
Signed-off-by: raver119 <raver119@gmail.com>
* release methods for data buffers
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary buffer Java side
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary buffer Java side
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary TadPack C++/Java side (#74 )
Signed-off-by: raver119 <raver119@gmail.com>
* Zoo model TF import test updates (#75 )
* argLine fix, update compression_gru comment
* updated comment for xception
* undid but commented argLine change
* updated xlnet comment
* copyright headers
* - new NDArray methods like()/ulike() (#77 )
- fix for depthwise_conv2d_bp + special test
Signed-off-by: raver119 <raver119@gmail.com>
* upsampling2d fix CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* DL4J trace logging (#79 )
* MLN/CG trace logging for debugging
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Tiny tweak
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* strided_slice_bp shape fn leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* SameDiff fixes and naming (#78 )
* remove SDVariable inplace methods
* import methods
* npe fix in OpVal
* removed SameDiff inplace ops from tests
* Naming updates, moved to centralized methods in SameDiff, should use op_#:# for everything
* quick fixes
* javadoc
* SDVariable eval with placeholders
* use regex match
* better matching
* fix javadoc. (#76 )
* fix javadoc.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* replace most @see with @link s.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* 4 additional tests
Signed-off-by: raver119 <raver119@gmail.com>
* Various DL4J/ND4J fixes (#81 )
* #7954 Force refresh of UI when switching tabs on overview page
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8017 Concurrent modification exception (synchronize) fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8033 Don't initialize updater in middle of writing memory crash dump
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8208 Fix shape checks for ND4J int[] creator methods
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #6385 #7992 Keras import naming fixes + cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8016 Upsampling3D - add NDHWC format support
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Refactor NativeOps.h to export C functions
* Actually export functions from NativeOps.h
* Adapt the Java wrappers in ND4J generated with JavaCPP
* Create C wrappers for some of the C++ classes currently used by ND4J
* remove duplicate code in createBufferDetached. (#83 )
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Keras model import - updater lr fix (#84 )
* Keras model import - updater lr fix
Signed-off-by: eraly <susan.eraly@gmail.com>
* Keras model import - updater lr fix, cleanup
Signed-off-by: eraly <susan.eraly@gmail.com>
* Fix functions of OpaqueVariablesSet
* SameDiff Convolution Config validation, better output methods (#82 )
* Conv Config validation & tests
Signed-off-by: Ryan Nett <rnett@skymind.io>
* stackOutputs utility method
Signed-off-by: Ryan Nett <rnett@skymind.io>
* use constructor for validation, support negative kernel sizes (infered from weights)
Signed-off-by: Ryan Nett <rnett@skymind.io>
* better output methods
Signed-off-by: Ryan Nett <rnett@skymind.io>
* move output to be with fit and evaluate
Signed-off-by: Ryan Nett <rnett@skymind.io>
* fixes
Signed-off-by: Ryan Nett <rnett@skymind.io>
* more fixes
Signed-off-by: Ryan Nett <rnett@skymind.io>
* refactor duplicate code from pad methods. (#86 )
* refactor duplicate code from pad methods.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* replace switch with if.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Various ND4J/DL4J fixes and improvements (#87 )
* Reshape and reallocate - small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Reshape and reallocate - small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #6488 ElementWiseVertex broadcast support
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Constructors and broadcast supported it Transforms.max/min
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8054 ElementWiseVertex now supports broadcast inputs
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8057 Nd4j.create overload dtype fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #7551 ND4J Shape validation fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* [WIP] Numpy boolean import (#91 )
* numpy bool type
Signed-off-by: raver119 <raver119@gmail.com>
* numpy bool java side
Signed-off-by: raver119 <raver119@gmail.com>
* remove create method with unused parameter. (#89 )
* remove create method with unused parameter.
* removed more unused methods.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* removing more unused code.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* last removal of unused code.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* remove createSparse methods. (#92 )
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Various ND4J/DL4J fixes (#90 )
* Deprecate Old*Op instances
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8063 #8054 Broadcast exceptions + cleanup inplace ops
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Remove bad test condition
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #7993 Fix shape function issue in crop_and_resize op
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* DL4J SameDiff lambda layer fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8029 Fix for pnorm backprop math
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8038 Fix Op profiler NaN/Inf triggering + add tests (#93 )
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* createUninitializedDetached refactoring. (#94 )
* wip
* update interface, add null implementations.
* Breaking one test in a weird way.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* createUninitializedDetached refactored.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* cuda build fix for issues introduced by recent refactoring
Signed-off-by: raver119 <raver119@gmail.com>
* [WIP] More of CUDA (#95 )
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* Implementation of hashcode cuda helper. Working edition.
* Fixed parallel test input arangements.
* Fixed tests for hashcode op.
* Fixed shape calculation for image:crop_and_resize op and test.
* NativeOps tests. Initial test suite.
* Added tests for indexReduce methods.
* Added test on execBroadcast with NDArray as dimensions.
* Added test on execBroadcastBool with NDArray as dimensions.
* Added tests on execPairwiseTransform and execPairwiseTransofrmBool.
* Added tests for execReduce with scalar results.
* Added reduce tests for non-empty dims array.
* Added tests for reduce3.
* Added tests for execScalar.
* Added tests for execSummaryStats.
* - provide cpu/cuda code for batch_to_space
- testing it
Signed-off-by: Yurii <yurii@skymind.io>
* - remove old test for batch_to_space (had wrong format and numbers were not checked)
Signed-off-by: Yurii <yurii@skymind.io>
* Fixed complilation errors with test.
* Added test for execTransformFloat.
* Added test for execTransformSame.
* Added test for execTransformBool.
* Added test for execTransformStrict.
* Added tests for execScalar/execScalarBool with TADs.
* Added test for flatten.
* - provide cpu/cuda code for space_to_Batch operaion
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for concat.
* comment unnecessary stuff in s_t_b
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for specialConcat.
* Added tests for memcpy/set routines.
* Fixed pullRow cuda test.
* Added pullRow test.
* Added average test.
* - correct typo in NDArray::applyPairwiseTransform(nd4j::pairwise::BoolOps op...)
Signed-off-by: Yurii <yurii@skymind.io>
* - debugging and fixing cuda tests in JavaInteropTests file
Signed-off-by: Yurii <yurii@skymind.io>
* - correct some tests
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for shuffle.
* Fixed ops declarations.
* Restored omp and added shuffle test.
* Added convertTypes test.
* Added tests for execRandom. Eliminated usage of RandomBuffer with NativeOps.
* Added sort tests.
* Added tests for execCustomOp.
* - further debuging and fixing tests terminated with crash
Signed-off-by: Yurii <yurii@skymind.io>
* Added tests for calculateOutputShapes.
* Addded Benchmarks test.
* Commented benchmark tests.
* change assertion
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for apply_sgd op. Added cpu helper for that op.
* Implement cuda helper for aplly_sgd op. Fixed tests for NativeOps.
* Added test for assign broadcastable.
* Added tests for assign_bp op.
* Added tests for axpy op.
* - assign/execScalar/execTransformAny signature change
- minor test fix
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed axpy op.
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* - fix tests for nativeOps::concat
Signed-off-by: Yurii <yurii@skymind.io>
* sequential transform/scalar
Signed-off-by: raver119 <raver119@gmail.com>
* allow nested parallelism
Signed-off-by: raver119 <raver119@gmail.com>
* assign_bp leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* block setRNG fix
Signed-off-by: raver119 <raver119@gmail.com>
* enable parallelism by default
Signed-off-by: raver119 <raver119@gmail.com>
* enable nested parallelism by default
Signed-off-by: raver119 <raver119@gmail.com>
* Added cuda implementation for row_count helper.
* Added implementation for tnse gains op helper.
* - take into account possible situations when input arrays are empty in reduce_ cuda stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Implemented tsne/edge_forces op cuda-based helper. Parallelized cpu-based helper for edge_forces.
* Added kernel for tsne/symmetrized op heleper.
* Implementation of tsne/symmetrized op cuda helper. Working edition.
* Eliminated waste printfs.
* Added test for broadcastgradientargs op.
* host-only fallback for empty reduce float
Signed-off-by: raver119 <raver119@gmail.com>
* - some tests fixes
Signed-off-by: Yurii <yurii@skymind.io>
* - correct the rest of reduce_ stuff
Signed-off-by: Yurii <yurii@skymind.io>
* - further correction of reduce_ stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for Cbow op. Also added cuda implementation for cbow helpers.
* - improve code of stack operation for scalar case
Signed-off-by: Yurii <yurii@skymind.io>
* - provide cuda kernel for gatherND operation
Signed-off-by: Yurii <yurii@skymind.io>
* Implementation of cbow helpers with cuda kernels.
* minor tests tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* minor tests tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* - further correction of cuda stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Implementatation of cbow op helper with cuda kernels. Working edition.
* Skip random testing for cudablas case.
* lstmBlockCell context fix
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for ELU and ELU_BP ops.
* Added tests for eq_scalar, gt_scalar, gte_scalar and lte_scalar ops.
* Added tests for neq_scalar.
* Added test for noop.
* - further work on clipbynorm_bp
Signed-off-by: Yurii <yurii@skymind.io>
* - get rid of concat op call, use instead direct concat helper call
Signed-off-by: Yurii <yurii@skymind.io>
* lstmBlockCell context fix
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for lrelu and lrelu_bp.
* Added tests for selu and selu_bp.
* Fixed lrelu derivative helpers.
* - some corrections in lstm
Signed-off-by: Yurii <yurii@skymind.io>
* operator * result shape fix
Signed-off-by: raver119 <raver119@gmail.com>
* - correct typo in lstmCell
Signed-off-by: Yurii <yurii@skymind.io>
* few tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA inverse broadcast bool fix
Signed-off-by: raver119 <raver119@gmail.com>
* disable MMAP test for CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* BooleanOp syncToDevice
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* additional data types for im2col/col2im
Signed-off-by: raver119 <raver119@gmail.com>
* Added test for firas_sparse op.
* one more RandomBuffer test excluded
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for flatten op.
* Added test for Floor op.
* bunch of tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* mmulDot tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Implemented floordiv_bp op and tests.
* Fixed scalar case with cuda implementation for bds.
* - work on cuda kernel for clip_by_norm backprop op is completed
Signed-off-by: Yurii <yurii@skymind.io>
* Eliminate cbow crach.
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Eliminated abortion with batched nlp test.
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed shared flag initializing.
* disabled bunch of cpu workspaces tests
Signed-off-by: raver119 <raver119@gmail.com>
* scalar operators fix: missing registerSpecialUse call
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed logdet for cuda and tests.
* - correct clipBynorm_bp
Signed-off-by: Yurii <yurii@skymind.io>
* Fixed crop_and_resize shape datatype.
* - correct some mmul tests
Signed-off-by: Yurii <yurii@skymind.io>
* build fix
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI (#97 )
Signed-off-by: raver119 <raver119@gmail.com>
* temporary stack fix
Signed-off-by: raver119 <raver119@gmail.com>
* couple of legacy groups reorganized into separate compialtion units
Signed-off-by: raver119 <raver119@gmail.com>
* wrong include
Signed-off-by: raver119 <raver119@gmail.com>
* wrong include
Signed-off-by: raver119 <raver119@gmail.com>
* ReductionLoops_float split
Signed-off-by: raver119 <raver119@gmail.com>
* maximum
Signed-off-by: raver119 <raver119@gmail.com>
* some more rearrangements
Signed-off-by: raver119 <raver119@gmail.com>
* spare ifdef
Signed-off-by: raver119 <raver119@gmail.com>
* mirror pad
Signed-off-by: raver119 <raver119@gmail.com>
* - reduce_float split
- mcmodel
Signed-off-by: raver119 <raver119@gmail.com>
* bad include fix
Signed-off-by: raver119 <raver119@gmail.com>
* norelax
Signed-off-by: raver119 <raver119@gmail.com>
* norelax
Signed-off-by: raver119 <raver119@gmail.com>
* norelax
Signed-off-by: raver119 <raver119@gmail.com>
* norelax
Signed-off-by: raver119 <raver119@gmail.com>
* norelax
Signed-off-by: raver119 <raver119@gmail.com>
* norelax gone
Signed-off-by: raver119 <raver119@gmail.com>
* get back sm
Signed-off-by: raver119 <raver119@gmail.com>
* fix couple of tests for msvc
Signed-off-by: raver119 <raver119@gmail.com>
* fix couple of tests for msvc
Signed-off-by: raver119 <raver119@gmail.com>
* compress-all
Signed-off-by: raver119 <raver119@gmail.com>
* reduced arch list
Signed-off-by: raver119 <raver119@gmail.com>
* compress-all
Signed-off-by: raver119 <raver119@gmail.com>
* reduced arch list
Signed-off-by: raver119 <raver119@gmail.com>
* all compute capabilities option for tests
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-07 17:49:13 +03:00
shugeo
c78f5a8225
Shugeo cuda cuda ( #105 )
...
* Refactored extract_image_patches op helpers.
* Eliminated compliler errors with helper implementation.
* Finished implementation for extract_image_patches both cpu and cuda helpers.
* Improved cpu implementation.
* Improved cuda implementation for extract_image_patches helper.
* Added omp to ClipByGlobalNorm helpers implementation.
* Added implementation for thresholedrelu_bp op.
* Fixed cuda kernel with F order.
* Fixed tests for subarray.
* Refactored tests for Gaussian_3 and Truncated_22.
* Added tests for GaussianDistribution with native ops.
* Modified tests for Gaussian distribution.
* Fixed random tests.
* Fixed atomicMin/atomicMax for 64bit cases.
* Fixed tests for execReduce3TAD tests.
* Eliminated waste comments.
2019-08-07 15:29:17 +03:00
raver119
b75bac750d
exclude two methods for JNI ( #97 )
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-05 11:28:07 +10:00