raver119
cd961727bb
[WIP] perf tests ( #40 )
...
* special maxpool test
Signed-off-by: raver119 <raver119@gmail.com>
* special maxpool test
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-11 17:45:59 +03:00
raver119
929c1dc5c7
- new NDArrayFactory scalar constructor
...
- minor tweak in randomuniform
- one more test
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-08 08:49:41 +03:00
raver119
51f3a1371d
[WIP] Random Uniform ( #36 )
...
* args
Signed-off-by: raver119@gmail.com <raver119@gmail.com>
* T args
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-07 17:09:47 +03:00
shugeo
679e42199a
Shugeo strided slice bp fix2 ( #33 )
...
* Fixed crash and restored broken functionality for strided slice.
* Added comments for strided_slice_bp main step.
2019-11-07 13:44:02 +03:00
raver119
4276e63054
one more test
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-07 08:49:27 +03:00
shugeo
08853c7829
Shugeo random uniform int ( #30 )
...
* Corrected randomuniform declaration.
* Refactored uniform distribution for both cuda and cpu platforms.
* Refactored uniform distribution and tests.
* Fixed type usage with indices.
* Refactored uniform distribution implementation and tests to fully conform with TF implementation.
* Refactored gamma function to use type util method.
* Copyright changes and fixes with ConstantHelper.
* Added error checking on allocate cuda device memory and operations.
2019-11-06 12:49:27 +02:00
AlexDBlack
7583ccfa15
Merge
...
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-11-06 13:28:03 +11:00
Yurii Shyrma
871f3bb3e6
- add additional condition in svd helper to take into account rounding errors ( #31 )
...
Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-05 17:16:17 +02:00
shugeo
9124974e3b
Fixed crash with strided_slice_bp op and tests. ( #29 )
2019-11-05 12:49:15 +02:00
shugeo
7b14a9f603
Gamma and Poisson distributions ( #27 )
...
* Added implementation for random_gamma op.
* Added implementation for random_poisson op and support classes.
* Added helpers for random_poisson and random_gamma ops.
* Implementation of random_poisson. The first working edition.
* Implementation of random_poisson. Parallelized working edition.
* Implementation of random_gamma. Parallelized working edition with alpha only.
* Added cuda implementation for helper of poisson distribution.
* Corrected shape calculation with random_gamma and tests.
* Finished cpu implementation for gamma distribution.
* Finished cuda implementation for random_gamma op.
* Refactored cpu helpers for random_gamma and random_poisson ops.
* Refactored cuda helpers for gamma and poisson distribution.
* Refactored cuda helper for gamma distribution.
* Refactored cpu helper for random_poisson op.
* Refactored cpu helper for random_gamma op.
2019-11-04 15:42:28 +02:00
Alex Black
948ebef41c
Op Fixes ( #28 )
...
* #8280 biasadd_bp nchw arg fixes (java side) + test
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8285 Concat op Java side fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Concat op cpp fix - allow dynamic axis to be negative, same as static axis
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* ignores for deconv3d import tests until deconv3d_tf op is implemented
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-11-05 00:05:04 +11:00
Yurii Shyrma
0cdb5750e0
Shyrma concat ( #24 )
...
* - provide possibility to pass axis as last input array in concat op
- correct summation in bias_add_bp op for NHWC case
Signed-off-by: Yurii <iuriish@yahoo.com>
* - write code for deconv2d op based on mkl dnn api
* no unsafe math
Signed-off-by: raver119 <raver119@gmail.com>
* no unsafe math
Signed-off-by: raver119 <raver119@gmail.com>
* - get rid of e<> and p<> methods in svd helper
Signed-off-by: Yurii <iuriish@yahoo.com>
* - provide mkl api support for deconvolution 3d
Signed-off-by: Yurii <iuriish@yahoo.com>
* - write deconv2d_bp based on mkl api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - write deconv3d_bp based on mkl api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - testing and fixing deconv based on mkl api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - remove dilation from conv2d/3d mkl
Signed-off-by: Yurii <iuriish@yahoo.com>
* - minor changes
Signed-off-by: Yurii <iuriish@yahoo.com>
* - further corrections of deconv ops based on mkl dnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - provide deconv2d_tf based on mkl dnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add minor corrections required by reviewer
Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-03 12:37:19 +02:00
raver119
c94013f0a1
cc 52 -> 50
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-03 09:54:35 +03:00
raver119
879a06c913
few typos fixed
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-01 09:13:15 +03:00
Alexander Stoyakin
45a40c8a89
DL4J/ND4J: Do pass on integer casts ( #15 )
...
* Int cast fixes.
* Revert "Int cast fixes."
This reverts commit aa36e8ca
* Int casts
* Int cast
* Int casts
* Get rid of int casts. Dropping deprecated aggregate ops.
* java scatterUpdate changes
Signed-off-by: raver119 <raver119@gmail.com>
* c++ scatterUpdate changes
Signed-off-by: raver119 <raver119@gmail.com>
* Remove aggregated ops.
* Restored test
* Tests restored.
* Minor fixes
2019-10-31 11:23:09 +02:00
shugeo
95f7ad7b94
Shugeo suppression overlaps ( #9 )
...
* Added non_max_suppression_overlaps op and tests.
* Refactored implementation of non_max_suppression_overlaps.
* Refactoring of implementation of non_max_suppression_overlaps op.
* Refactoring of implementation of non_max_suppression op.
* Fixed portion error.
* Added cuda frontends for image suppression ops.
* Eliminated crash with cuda arch on image.non_max_suppression_overlaps op.
* Improved implementation of image_suppression helper for cpu platform.
* The generic approach of non_max_suppression_overlaps op helper with cuda platform.
* Working cuda implementation of helper non_max_suppression_overlaps op.
* Eliminated waste comments.
* Improved implementations for both platforms
* Refactored cuda implementation of image.non_max_suppression_overlaps op helper.
* Improved cuda implementation of non_max_suppression op helper.
* Refactored cuda implementation of image.non_max_suppression_overlaps op helper.
* Improved cuda implementation of image.non_max_suppression_overlaps op helper.
* Added modifications into cuda implementation for image suppression overlaps op.
* Correct queue emulation with cuda implementation of non_max_suppression_overlaps op.
* Prefinal stage of cuda implementation of non_max_suppression_overlaps.
* Working cuda implementation of non_max_suppression_overlaps helper.
* Fixed return to proper thread.
* Improvements for cuda implementation of image.non_max_suppression_overlaps op helper.
* Fixed implementation issues with non_max_suppression_overlaps on cuda platform.
* Fixed skip for non_max_suppression_overlaps on cuda platform.
* Finalize implementation of image_suppression helper and tests.
* Cosmetic changes only.
2019-10-30 13:43:45 +02:00
Yurii Shyrma
029a69a835
Shyrma bn mkl bp ( #14 )
...
* - write code for new batchnorm backprop
Signed-off-by: Yurii <iuriish@yahoo.com>
* - testing batchnorm backprop
Signed-off-by: Yurii <iuriish@yahoo.com>
* - write code for batchnorm backprop based on mkl dnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - testing and fixing bugs in batchnorm_bp mkl dnn
Signed-off-by: Yurii <iuriish@yahoo.com>
* - made corrections required by reviewer
Signed-off-by: Yurii <iuriish@yahoo.com>
* - change name in java wrapper for batchnorm op
Signed-off-by: Yurii <iuriish@yahoo.com>
2019-10-26 14:14:21 +03:00
Alex Black
d333d29099
SameDiff cleanup and fixes ( #12 )
...
* #8160 Remove resolvePropertiesFromSameDiffBeforeExecution
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* SameDiff API cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More SameDiff cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8248 Switch SameDiff variable init from lazy to creation time for more predictable behaviour
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8252 TanhDerivative javadoc
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8225 Deconvolution2D input validation
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8265 Switch SameDiff.outputs() to user settable, instead of unreliable 'best guess'
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8224 SameDiff.zero and .one create constants, not variables
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More cleanup and fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small test fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* DL4J SameDiff fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Re-add hack for Deconvolution2DLayer until #8315 is resolved
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8270 Move CUDA device/version logging to Java; can be disabled via existing org.nd4j.log.initialization system property
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* All ND4J init logging checks system property
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small tweak
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Remove redundant device logging
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* One more fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* UX improvements
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Deconv fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Add deconv tests
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Remove debug code
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-10-26 12:38:08 +11:00
Alex Black
3f0b4a2d4c
SameDiff execution, TF and memory management overhaul ( #10 )
...
* SameDiff execution memory management improvements, round 1
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Round 2
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Round 3
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Clear node outputs closed array references; Slight change to OpValidation internals to not rely on cached op outputs
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Next steps
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Next step
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More polish
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Add WeakIdentityHashmap
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Session fixes for control ops and next steps
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* First steps for training session + in-line updating
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Next steps
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix losses and history during training
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* BiasAdd and other fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Don't use SDVariable.getArr() in TFGraphTestAllHelper (import tests)
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* First steps for new dependency tracking approach
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Start integrating dependency tracking for memory management
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Non-control op dependency tracking works/passes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Switch/merge
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Next steps
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Cleanup and next steps
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix issue dependency tracking for initial variables/constants
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Add check for aliases when determining if safe to close array
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* First pass on new TF graph import class
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Import fixes, op fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Cleanup and fixes for new TF import mapper
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Partial implementation of new dependency tracker
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Next steps
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* AbstractDependencyTracker for shared code
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Overhaul SameDiff graph execution (dependency tracking)
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More fixes, cleanup, next steps
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Add no-op memory manager, cleanup, fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix switch dependency tracking
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* INDArray.toString: no exception on closed arrays, just note closed
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix enter and exit dependency tracking
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* TensorArray memory management fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Add unique ID for INDArray instances
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix memory management for NextIteration outputs in multi-iteration loops
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Remove (now unnecessary) special case handling for nested enters
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Handle control dependencies during execution; javadoc for memory managers
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Cleanup, polish, code comments, javadoc
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Cleanup and more javadoc
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Add memory validation for all TF import tests - ensure all arrays (except outputs) are released
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Clean up arrays waiting on unexecuted ops at the end of execution
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fixes for enter op memory management in the context of multiple non-nested loops/frames
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix order of operation issues for dependency tracker
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Always clear op fields after execution to avoid leaks or unintended array reuse
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Re-implement dtype conversion
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix for control dependencies execution (dependency tracking)
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix TF import overrides and filtering
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix for constant enter array dependency tracking
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* DL4J Fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More DL4J fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Cleanup and polish
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More polish and javadoc
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More logging level tweaks, small DL4J fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fix to DL4J SameDiffLayer
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix empty array deserialization, add extra deserialization checks
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* FlatBuffers control dep serialization fixes; test serialization as part of all TF import tests
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Variable control dependencies serialization fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix issue with removing inputs for ops
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* FlatBuffers NDArray deserialization fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* FlatBuffers NDArray deserialization fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Final cleanup/polish
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-10-23 21:19:50 +11:00
Alexander Stoyakin
f31661e13b
Merge pull request #7 from KonduitAI/asto_nd4s_10172019
...
KDTree optimization
2019-10-23 12:11:25 +03:00
Yurii
8f3eaebda5
- replace condition isScalar() by condition length == 1 in some NDArray methods
...
Signed-off-by: Yurii <iuriish@yahoo.com>
2019-10-21 16:25:13 +03:00
Yurii
99be467f76
- minor change in recurrent.h
...
Signed-off-by: Yurii <iuriish@yahoo.com>
2019-10-17 20:46:51 +03:00
Yurii
70bd925abd
- write 2 versions of new lstmLayer: one is based on own code, second uses mkl dnn api
2019-10-17 20:44:52 +03:00
Alexander Stoyakin
630bb3c9b6
Merge pull request #2 from KonduitAI/asto_ops_wrapper
...
[WIP] New ops wrapper
2019-10-16 20:21:50 +03:00
shugeo
3662657d5c
Merge pull request #1 from KonduitAI/shugeo_gamma
...
Shugeo gamma
2019-10-16 18:49:33 +03:00
shugeo
24a2b2933f
Added gamma and lgamma functions.
2019-10-16 18:22:18 +03:00
Alexander Stoyakin
96a9a1a733
Fixed output from operation.
2019-10-16 18:07:52 +03:00
shugeo
7617682a46
Added declarations for igamma and igammac ops.
2019-10-16 14:45:10 +03:00
shugeo
478a0c1f97
Added igamma and igammac broadcastable ops implementations and tests.
2019-10-16 14:02:53 +03:00
shugeo
7103aca8c5
Added broadcastable IGamma and IGammac ops.
2019-10-16 13:58:32 +03:00
shugeo
f90e6da97e
Added nd4j_gamma, nd4j_igamma and nd4j_igammac functions.
2019-10-16 13:53:31 +03:00
shugeo
df2448613e
Added gamma distribution functions.
2019-10-15 20:00:07 +03:00
AlexDBlack
2d750b69e5
Merge remote-tracking branch 'konduit/master'
2019-10-14 17:21:23 +11:00
shugeo
ace65355c5
Added doc for fake_quant_with_min_max* op helpers cuda implementations.
2019-10-10 18:35:28 +03:00
shugeo
c890de5a7b
Added doc for fake_quant_with_min_max* op helpers implementations.
2019-10-10 18:31:17 +03:00
shugeo
c3f755d975
Refactored helpers both for cuda and cpu platforms.
2019-10-10 18:02:49 +03:00
shugeo
a09cb5e2be
Added doc for fake_quant_with_min_max_per_channel op declaration.
2019-10-10 17:13:33 +03:00
shugeo
92636b0b86
Eliminated waste operator.
2019-10-10 17:08:59 +03:00
shugeo
d5b352273d
Implementation of cuda kernel for fake_quant_with_min_max_vars_per_channels op. Final revision.
2019-10-10 16:51:29 +03:00
shugeo
02d8616692
Implementation of cuda kernel for fake_quant_with_min_max_vars_per_channels op.
2019-10-10 16:40:56 +03:00
shugeo
3504b0cda9
Implemented fake_quant_with_min_max_vars_per_channel op cuda helper. The first working revision.
2019-10-10 15:44:50 +03:00
shugeo
753565145c
Refactored fake_quant_with_min_max_vars op cuda implementation.
2019-10-10 14:00:49 +03:00
shugeo
c13e945a96
Fixed fake_quant_with_min_max_vars op and tests.
2019-10-10 13:23:11 +03:00
shugeo
3c0c59ab88
Refactored fake_quant_with_min_max_vars op.
2019-10-09 22:09:33 +03:00
shugeo
352f1eee80
Implemented fake_quant_with_min_max_per_channel helper for cpu platform. The first approach.
2019-10-09 21:39:59 +03:00
shugeo
d0cbd33b0e
Added input checks for op.
2019-10-09 15:52:13 +03:00
shugeo
3a89e51811
Added tests for fake_quant_with_min_max_vars_per_channel op.
2019-10-09 13:38:18 +03:00
shugeo
cb56b0b06a
The first approach for fake_quant_with_min_max_vars_per_channel op implementation.
2019-10-08 19:00:41 +03:00
shugeo
8fe5a1fa96
The working implementation of draw_bounding_boxes op.
2019-10-08 15:42:27 +03:00
shugeo
30a8af566c
The first working implementation of cuda kernel for draw_bounding_boxes op helper.
2019-10-08 13:45:18 +03:00
shugeo
ae09cfee32
Next approach of cuda implementation for draw_bounding_boxes op helper.
2019-10-08 00:09:46 +03:00
shugeo
6cf3a8fa9c
Refactored cpu implementation and added cuda approach.
2019-10-07 17:51:07 +03:00
shugeo
78443ffebf
Working implementation of draw_bounding_boxes op for cpu.
2019-10-07 15:04:44 +03:00
shugeo
16a66a65e3
Added helper declaration for draw_bounding_boxes op.
2019-10-04 21:16:34 +03:00
shugeo
53a2ebddbe
Added test and helpers for draw_bounding_boxes op both cpu and cuda related.
2019-10-04 20:46:26 +03:00
shugeo
8f70b4441f
draw_bounding_boxes op implementation. Initial revision.
2019-10-04 18:32:21 +03:00
shugeo
908e4c4912
Added implementation for divide_no_nan op and tests.
2019-10-04 10:29:15 +03:00
raver119
cff26f13c5
Revert "Implement divide_no_nan op."
2019-10-03 20:25:52 +03:00
shugeo
6eaca179d6
Implement divide_no_nan op.
2019-10-03 18:22:17 +03:00
shugeo
130ee25682
Implemented compare_and_bitpack op.
2019-10-03 10:57:48 +03:00
shugeo
75ad3c8153
Fixed test names.
2019-10-02 19:05:26 +03:00
shugeo
f3e42173ef
Refactored buffer copying to avoid wrong usage of buffers.
2019-10-02 16:51:09 +03:00
shugeo
1c6173d218
Added implementation of bitcast op.
2019-10-02 15:04:59 +03:00
shugeo
a27e61553a
Added tests and fixed op name.
2019-10-02 15:04:28 +03:00
shugeo
863ff76878
Added declaration for bincast op.
2019-10-02 12:17:00 +03:00
shugeo
afeb524238
Refactored implementation for adjust_contrast ops.
2019-10-01 14:13:09 +03:00
shugeo
1575c704ae
Added implementation for adjust_contrast_v2 op and tests.
2019-10-01 11:44:27 +03:00
raver119
44a8d19ac6
[WIP] Broadcast changes ( #8257 )
...
* - provide correct call NDArray::applyBroadcast inside of NDArray::applyTrueBroadcast
Signed-off-by: Yurii <yurii@skymind.io>
* - provide new trueBroadcast helper
Signed-off-by: Yurii <yurii@skymind.io>
* example for yurii
Signed-off-by: raver119 <raver119@gmail.com>
* - provide new trueBroadcast helper for cpu
Signed-off-by: Yurii <yurii@skymind.io>
* - start working on new trueBroadcast helper for cuda
Signed-off-by: Yurii <yurii@skymind.io>
* - further work on trueBroadcast for cuda
Signed-off-by: Yurii <yurii@skymind.io>
* - fix bugs in cuda helper trueBroadcast
Signed-off-by: Yurii <yurii@skymind.io>
2019-10-01 09:10:19 +03:00
shugeo
e06dfb5dcc
Implementation of adjust_contrast op.
2019-09-30 18:24:12 +03:00
raver119
78bca543a8
missed include for MklDnnTests run without mkldnn
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-12 10:49:01 +03:00
AlexDBlack
a66e03355e
Merge remote-tracking branch 'fork/master'
2019-09-12 12:20:57 +10:00
raver119
07901ceb69
few more mkldnn dependencies removed
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-12 04:55:59 +03:00
raver119
98e2814879
Platform helpers ( #8216 )
...
* platform helpers draft
Signed-off-by: raver119 <raver119@gmail.com>
* typo
Signed-off-by: raver119 <raver119@gmail.com>
* disable platform cmake
Signed-off-by: raver119 <raver119@gmail.com>
* another draft
Signed-off-by: raver119 <raver119@gmail.com>
* mkldnn convolution refactored
Signed-off-by: raver119 <raver119@gmail.com>
* minor tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* one more safety check
Signed-off-by: raver119 <raver119@gmail.com>
* prototype works
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* force static library mode for mkldnn
Signed-off-by: raver119 <raver119@gmail.com>
* - ismax fix
- experimental arg fix
- don't enforce openblas on Apple hardware
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of small fixes
Signed-off-by: raver119@gmail.com <raver119@gmail.com>
* declare concurrent
Signed-off-by: raver119@gmail.com <raver119@gmail.com>
* - MKLDNN version upgrade to 1.0.2
- avgpool2d/maxpool2d APIs update
Signed-off-by: raver119 <raver119@gmail.com>
* - avgpool2d_bp/maxpool2d_bp APIs update
Signed-off-by: raver119 <raver119@gmail.com>
* - conv2d/batchnorm APIs update
Signed-off-by: raver119 <raver119@gmail.com>
* - lrn/conv2d_bp/conv3d/conv3d_bp APIs update
Signed-off-by: raver119 <raver119@gmail.com>
* all ops converted to MKLDNN 1.x
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* namespace for platform helpers
Signed-off-by: raver119 <raver119@gmail.com>
* make sure platform helpers aren't optimized out
Signed-off-by: raver119 <raver119@gmail.com>
* build cpu_features on x86 systems
Signed-off-by: raver119 <raver119@gmail.com>
* build cpu_features on x86 systems
Signed-off-by: raver119 <raver119@gmail.com>
* more of cpu_features
Signed-off-by: raver119 <raver119@gmail.com>
* - mkldnn removed from java
- cpu_features checks in CpuNDArrayFactory
Signed-off-by: raver119 <raver119@gmail.com>
* F16C definition renamed
Signed-off-by: raver119 <raver119@gmail.com>
* some mkldnn rearrangements
Signed-off-by: raver119 <raver119@gmail.com>
* check supported instructions before doing anything
Signed-off-by: raver119 <raver119@gmail.com>
* typo
Signed-off-by: raver119 <raver119@gmail.com>
* missed impl
Signed-off-by: raver119 <raver119@gmail.com>
* BUILD_PIC option
Signed-off-by: raver119 <raver119@gmail.com>
* conv2d fix
Signed-off-by: raver119 <raver119@gmail.com>
* avgpool3d fix
Signed-off-by: raver119 <raver119@gmail.com>
* avgpool3d_bp fix
Signed-off-by: raver119 <raver119@gmail.com>
* avgpool2d_bp leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* avgpool3d_bp leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* maxpool bp leaks fixed
Signed-off-by: raver119 <raver119@gmail.com>
* printf removed
Signed-off-by: raver119 <raver119@gmail.com>
* batchnorm fix
Signed-off-by: raver119 <raver119@gmail.com>
* AVX warning/error polishing
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More polish
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Polish
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* remove previous MKL-DNN support layer
Signed-off-by: raver119 <raver119@gmail.com>
* avx2 tweak
Signed-off-by: raver119 <raver119@gmail.com>
* allow static for apple
Signed-off-by: raver119@gmail.com <raver119@gmail.com>
* exclude mkldnn in one more place
Signed-off-by: raver119 <raver119@gmail.com>
* exclude mkldnn in one more place
Signed-off-by: raver119 <raver119@gmail.com>
* restore OPENBLAS_PATH use
Signed-off-by: raver119 <raver119@gmail.com>
* add runtime check for avx/avx2 support
Signed-off-by: raver119 <raver119@gmail.com>
* convolution_auto
Signed-off-by: raver119 <raver119@gmail.com>
* Add logic for helper argument
* minor test fix
Signed-off-by: raver119 <raver119@gmail.com>
* few tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* few tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* skip OpTracker props for non-x86 builds
Signed-off-by: raver119 <raver119@gmail.com>
* linux arm isn't x86 :)
Signed-off-by: raver119 <raver119@gmail.com>
* avx-512
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA presets fix
Signed-off-by: raver119 <raver119@gmail.com>
* BUILD_PIC
Signed-off-by: raver119 <raver119@gmail.com>
* prefetchw for avx2
Signed-off-by: raver119 <raver119@gmail.com>
* BUILD_PIC again
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-11 21:50:28 +03:00
shugeo
e1a7460f8e
Shugeo cuda doc2 ( #255 )
...
* Added comments to tileKernel routine.
* Refactored kernel and added doc to it.
* Refactored setDiagonal kernel and added doc for it.
* Added doc for tsne cuda helpers.
* Added doc for diag kernels.
* Added doc for kernel.
* Refactored code with fake quantization.
* Added docs for image resize and crop kernels.
* Added docs for image suppression helpers.
* Added docs to matrix_band helpers.
* Added docs for matrix_diag_part and nth_element helpers.
* Fixed syntax error and refactored getIndexOffset usage.
2019-09-11 21:04:43 +03:00
raver119
589401477d
[WIP] bunch of improvements ( #257 )
...
* - profiling bias_add op
- add some documentation
Signed-off-by: Yurii <yurii@skymind.io>
* - minor change
Signed-off-by: Yurii <yurii@skymind.io>
* - provide addBias cuda kernel
Signed-off-by: Yurii <yurii@skymind.io>
* - improve shape::getIndexOffset and change its signature
Signed-off-by: Yurii <yurii@skymind.io>
* - same as previous
Signed-off-by: Yurii <yurii@skymind.io>
* - improve and change signature in some shape:: stuff which has to do with calculation of offsets for array elements
Signed-off-by: Yurii <yurii@skymind.io>
* - minor changes in flatten
Signed-off-by: Yurii <shyrma@skymind.io>
* - add function shape::getIndexOffsetOrdered
Signed-off-by: Yurii <shyrma@skymind.io>
* - correct shape::getIndexOffsetOrdered()
Signed-off-by: Yurii <shyrma@skymind.io>
* - move getIndexOffsetOrdered to flatten.h header in order to isolate this function
Signed-off-by: Yurii <shyrma@skymind.io>
2019-09-11 20:12:09 +03:00
Alex Black
4f7b35ac82
Update links to eclipse repos ( #252 )
...
* Fix repo links and clean up old github templates
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More link updates
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-09-10 19:09:46 +10:00
raver119
ffae024cda
disable threadlocal for apple
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-10 07:46:35 +03:00
shugeo
c9f8a904ad
Shugeo cuda docs1 ( #249 )
...
* Comments axis shifts.
* Fixed LUP solver usage. Added helpers doc.
* Switch off OMP for roll and lup. Fixed omp usage for ClipByGlobalNorm.
* Switch off omp for ClipByGlobalNorm to reduce omp ambiguity.
2019-09-09 16:27:45 +03:00
raver119
1de9fb218e
- bits_hamming_distance dtype fix ( #8208 )
...
- DataTypeUtils::asString fix + new dtypes added
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-06 08:59:05 +03:00
raver119
46f8c58502
- bits_hamming_distance dtype fix
...
- DataTypeUtils::asString fix + new dtypes added
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-06 08:57:53 +03:00
AlexDBlack
a76a44e198
Merge remote-tracking branch 'fork/master'
2019-09-05 22:04:25 +10:00
raver119
fc222d3325
logging off
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-05 14:45:46 +03:00
Yves Quemener
d1e9b34982
libnd4j: Remove some unused declarations in unit tests ( #8202 )
2019-09-05 15:04:36 +09:00
Ryan Nett
79867f5c5a
cleanup SDRNN and rnn ops ( #238 )
...
Signed-off-by: Ryan Nett <rnett@skymind.io>
2019-09-05 12:25:03 +10:00
AlexDBlack
b7226bdd7a
Merge
...
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-09-05 00:54:11 +10:00
shugeo
548044a1e2
Shugeo doc ( #235 )
...
* Updated doc for tsne ops.
* Added comments for dynamic_stitch op.
* Added comments to dynamic_stitch op implementation.
* Modified comment for unstack_list op.
* Added doc for space_to_depth and depth_to_space ops.
* Added doc for space_to_batch op.
* Enlarge test type for adjustSaturation.
* Added doc for runner.
2019-09-04 14:57:59 +03:00
raver119
a90c7dd995
[WIP] Last set of changes ( #234 )
...
* mmul op instead of cublasSgemm
Signed-off-by: raver119 <raver119@gmail.com>
* transB
Signed-off-by: raver119 <raver119@gmail.com>
* jcpp handles
Signed-off-by: raver119 <raver119@gmail.com>
* bitwise and/or/xor
Signed-off-by: raver119 <raver119@gmail.com>
* bitwise and/or/xor mapping
Signed-off-by: raver119 <raver119@gmail.com>
* cuda/cublas version check
Signed-off-by: raver119 <raver119@gmail.com>
* add expected version
Signed-off-by: raver119 <raver119@gmail.com>
* cuda/cublas version check in java
Signed-off-by: raver119 <raver119@gmail.com>
* one more error check
Signed-off-by: raver119 <raver119@gmail.com>
* build fix
Signed-off-by: raver119 <raver119@gmail.com>
* build fix
Signed-off-by: raver119 <raver119@gmail.com>
* build fix
Signed-off-by: raver119 <raver119@gmail.com>
* one more fix
Signed-off-by: raver119 <raver119@gmail.com>
* skip CUDA version check for now
Signed-off-by: raver119 <raver119@gmail.com>
* better wording
Signed-off-by: raver119 <raver119@gmail.com>
* few more tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* few more tweaks
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-04 14:41:08 +03:00
Alex Black
6cc887bee9
Rename flatbuffers DataType to DType ( #228 )
...
* Rename flatbuffers DataType enum to DType
Signed-off-by: Alex Black <blacka101@gmail.com>
* Rename flatbuffers DataType enum to DType
Signed-off-by: Alex Black <blacka101@gmail.com>
* Updates for flatbuffers datatype enum renaming
Signed-off-by: Alex Black <blacka101@gmail.com>
2019-09-04 16:36:11 +10:00
raver119
64eaafb4cd
remove unwanted noexcept
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-04 08:38:22 +03:00
raver119
7abc574eeb
Snapshot update ( #8194 )
...
* fix double consumption of rng on cpu
Signed-off-by: raver119 <raver119@gmail.com>
* Shyrma docs (#222 )
* - documenting and profiling matrix_set_diag cuda kernel
Signed-off-by: Yurii <yurii@skymind.io>
* - correct formula of pnorm pooling in cuda 2d/3d kernels
- remove helper matrix_diag which duplicates work of helper matrix_set_diag
Signed-off-by: Yurii <yurii@skymind.io>
* cublasHandle sharing + lock
Signed-off-by: raver119 <raver119@gmail.com>
* cublasHandle sharing + lock
Signed-off-by: raver119 <raver119@gmail.com>
* Documentation from serialization/deserialization in NLP (#221 )
* refactoring
Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
* Javadocs
Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
* Javadoc fixed
Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
* Cleanup
Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
* dedicated lock for getCudaCublasHandle
Signed-off-by: raver119 <raver119@gmail.com>
* Small fixes (#223 )
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* ELU DL4J fixes (#224 )
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* javadoc (#225 )
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Small test compilation fix (#226 )
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8182 remove spark version suffix (#227 )
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* [WIP] Thread safety (#229 )
* sync after cublas*gemm
Signed-off-by: raver119 <raver119@gmail.com>
* mutex for CublasHelper
Signed-off-by: raver119 <raver119@gmail.com>
* don't store cublasHandle in LaunchContext, it's per-device anyway
Signed-off-by: raver119 <raver119@gmail.com>
* some printout
Signed-off-by: raver119 <raver119@gmail.com>
* check for field instead
Signed-off-by: raver119 <raver119@gmail.com>
* pew-pew
Signed-off-by: raver119 <raver119@gmail.com>
* don't release ContextBuffers until device changed
Signed-off-by: raver119 <raver119@gmail.com>
* small tweak
Signed-off-by: raver119 <raver119@gmail.com>
* some logging in sgemm
Signed-off-by: raver119 <raver119@gmail.com>
* stream sync
Signed-off-by: raver119 <raver119@gmail.com>
* some more logging
Signed-off-by: raver119 <raver119@gmail.com>
* some more error checks
Signed-off-by: raver119 <raver119@gmail.com>
* one fancy test
Signed-off-by: raver119 <raver119@gmail.com>
* one fancy test
Signed-off-by: raver119 <raver119@gmail.com>
* minor AffinityManager fix
Signed-off-by: raver119 <raver119@gmail.com>
* cudaEvent error logging improvement
Signed-off-by: raver119 <raver119@gmail.com>
* ConstantHelper thread safety
Signed-off-by: raver119 <raver119@gmail.com>
* - minor corrections in ConstantTadHelper
Signed-off-by: Yurii <yurii@skymind.io>
* ConstantShapeHelper thread safety
Signed-off-by: raver119 <raver119@gmail.com>
* ConstantTadHelper.cu updated
Signed-off-by: raver119 <raver119@gmail.com>
* logging off
Signed-off-by: raver119 <raver119@gmail.com>
* logging off
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-03 22:02:02 +03:00
raver119
dddc8a1143
[WIP] Thread safety ( #229 )
...
* sync after cublas*gemm
Signed-off-by: raver119 <raver119@gmail.com>
* mutex for CublasHelper
Signed-off-by: raver119 <raver119@gmail.com>
* don't store cublasHandle in LaunchContext, it's per-device anyway
Signed-off-by: raver119 <raver119@gmail.com>
* some printout
Signed-off-by: raver119 <raver119@gmail.com>
* check for field instead
Signed-off-by: raver119 <raver119@gmail.com>
* pew-pew
Signed-off-by: raver119 <raver119@gmail.com>
* don't release ContextBuffers until device changed
Signed-off-by: raver119 <raver119@gmail.com>
* small tweak
Signed-off-by: raver119 <raver119@gmail.com>
* some logging in sgemm
Signed-off-by: raver119 <raver119@gmail.com>
* stream sync
Signed-off-by: raver119 <raver119@gmail.com>
* some more logging
Signed-off-by: raver119 <raver119@gmail.com>
* some more error checks
Signed-off-by: raver119 <raver119@gmail.com>
* one fancy test
Signed-off-by: raver119 <raver119@gmail.com>
* one fancy test
Signed-off-by: raver119 <raver119@gmail.com>
* minor AffinityManager fix
Signed-off-by: raver119 <raver119@gmail.com>
* cudaEvent error logging improvement
Signed-off-by: raver119 <raver119@gmail.com>
* ConstantHelper thread safety
Signed-off-by: raver119 <raver119@gmail.com>
* - minor corrections in ConstantTadHelper
Signed-off-by: Yurii <yurii@skymind.io>
* ConstantShapeHelper thread safety
Signed-off-by: raver119 <raver119@gmail.com>
* ConstantTadHelper.cu updated
Signed-off-by: raver119 <raver119@gmail.com>
* logging off
Signed-off-by: raver119 <raver119@gmail.com>
* logging off
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-03 22:00:38 +03:00
raver119
9d03bb9425
allow atomicAdd for CUDA 10 only
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-03 13:30:16 +03:00
raver119
f6f9437a36
get back cc 7.0 support for cuda 9.2
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-03 09:26:35 +03:00
Yurii Shyrma
cb4c9377b1
Shyrma docs ( #222 )
...
* - documenting and profiling matrix_set_diag cuda kernel
Signed-off-by: Yurii <yurii@skymind.io>
* - correct formula of pnorm pooling in cuda 2d/3d kernels
- remove helper matrix_diag which duplicates work of helper matrix_set_diag
Signed-off-by: Yurii <yurii@skymind.io>
2019-09-02 16:25:58 +03:00
raver119
106524663b
fix double consumption of rng on cpu
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-02 15:24:51 +03:00
AlexDBlack
7ded4416cb
Merge remote-tracking branch 'fork/master'
2019-09-02 18:52:12 +10:00
raver119
5b8ea3e830
one more tiny cuda fix
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-02 11:49:13 +03:00
raver119
e42c34ca55
[WIP] minor ( #218 )
...
* - initial docs commit
- merge* cuda fix
Signed-off-by: raver119 <raver119@gmail.com>
* one more fix
Signed-off-by: raver119 <raver119@gmail.com>
* one more fix
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-02 11:25:48 +03:00
raver119
c34826da4d
fixed args
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-01 22:06:01 +03:00
raver119
00cf28f477
get rid of builtin_popcount to please ppc
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-01 19:57:39 +03:00
raver119
3679e55c49
fix bits_hamming_distance for ppc
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-01 19:33:23 +03:00
Yurii Shyrma
a35926c6e9
- add parameter alpha to elu and lrelu_bp ( #213 )
...
* - add parameter alpha to elu and lrelu_bp
Signed-off-by: Yurii <yurii@skymind.io>
* - forgot to correct header activations.h
Signed-off-by: Yurii <yurii@skymind.io>
2019-08-31 20:57:39 +03:00
raver119
b71c993ded
[WIP] maxpool_bp cuda fix ( #212 )
...
* one test for alex
Signed-off-by: raver119 <raver119@gmail.com>
* fix
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of safety offset in cpp
Signed-off-by: raver119 <raver119@gmail.com>
* bfloat16
Signed-off-by: raver119 <raver119@gmail.com>
* minor test rearrangement to fastpath launch
Signed-off-by: raver119 <raver119@gmail.com>
* - atomicAdd/Mul/Div fix for float16/bfloat16 misalignment
- one special test for maxpoolbp java
- safety offset of 8 bytes is back to libnd4j legacy
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-31 20:57:05 +03:00
Yurii Shyrma
00fd50cee2
Shyrma softmax ( #209 )
...
* - provide new cuda kernel for softmax
Signed-off-by: Yurii <yurii@skymind.io>
* - further work on cuda kernel for softmax
Signed-off-by: Yurii <yurii@skymind.io>
* - correct cuda kernel for softmax
Signed-off-by: Yurii <yurii@skymind.io>
2019-08-30 20:31:05 +03:00
raver119
bdc3eacafd
one small playground test
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-30 20:13:01 +03:00
raver119
70a9ae5068
[WIP] few tweaks ( #206 )
...
* scatter empty check
Signed-off-by: raver119 <raver119@gmail.com>
* scatter empty test
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* two tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* dup tweak
Signed-off-by: raver119 <raver119@gmail.com>
* - check indices array for emptiness immediately prior to helper run
Signed-off-by: Yurii <yurii@skymind.io>
* minor tests fix
Signed-off-by: raver119 <raver119@gmail.com>
* minor tests fix
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-30 16:32:01 +03:00
raver119
1003428a18
[WIP] Int broadcastables ( #195 )
...
* Removed invalid resource and fixed tests
Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
* legacy scalar/pairwise/broadcast int ops
Signed-off-by: raver119 <raver119@gmail.com>
* NDArray int broadcastables
Signed-off-by: raver119 <raver119@gmail.com>
* few more bitwise tests
Signed-off-by: raver119 <raver119@gmail.com>
* java side update
Signed-off-by: raver119 <raver119@gmail.com>
* Argument type changed for shift ops
Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
* legacy scalar/pairwise/broadcast int ops
Signed-off-by: raver119 <raver119@gmail.com>
* NDArray int broadcastables
Signed-off-by: raver119 <raver119@gmail.com>
* few more bitwise tests
Signed-off-by: raver119 <raver119@gmail.com>
* java side update
Signed-off-by: raver119 <raver119@gmail.com>
* Argument type changed for shift ops
Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
2019-08-30 10:12:40 +03:00
Yurii Shyrma
5395d4fbe5
- rewrite broadcast_dynamic_shape and delete corresponding helpers ( #194 )
...
Signed-off-by: Yurii <yurii@skymind.io>
2019-08-29 20:38:02 +03:00
Yurii Shyrma
70af8c2afc
Shyrma svd ( #191 )
...
* - add one additional test for svd
* - provide float argument in eye op to specify the type of the output array
Signed-off-by: Yurii <yurii@skymind.io>
* - add cuda capability check to mmulHelper
Signed-off-by: Yurii <yurii@skymind.io>
* - use another method for device id evaluation
Signed-off-by: Yurii <yurii@skymind.io>
* Eye data type as T argument
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-28 18:27:08 +03:00
raver119
dec296da17
[WIP] bits_hamming_distance ( #192 )
...
* bits_hamming_distance op
Signed-off-by: raver119 <raver119@gmail.com>
* bits_hamming_distance cuda
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-28 18:20:44 +03:00
raver119
f4860574d7
[WIP] More fixes ( #190 )
...
* Refactored kernels for segment_max/min/sum ops.
* Refactored segment_prod kernels.
* Refactored segment_prod kernels.
* DynamicPartition test
Signed-off-by: raver119 <raver119@gmail.com>
* Added linear test for dynamic_partition op.
* Refactored test with int datatype.
* some logging
Signed-off-by: raver119 <raver119@gmail.com>
* some logging
Signed-off-by: raver119 <raver119@gmail.com>
* some logging
Signed-off-by: raver119 <raver119@gmail.com>
* dynamicPartition fix
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of some logging
Signed-off-by: raver119 <raver119@gmail.com>
* one more test for dynamic_stitch
Signed-off-by: raver119 <raver119@gmail.com>
* one more test for dynamic_stitch
Signed-off-by: raver119 <raver119@gmail.com>
* empty check for stitch
Signed-off-by: raver119 <raver119@gmail.com>
* minor print changes
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-28 15:38:57 +03:00
raver119
3157ec110c
[WIP] reverse_sequence ( #188 )
...
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* one more print
Signed-off-by: raver119 <raver119@gmail.com>
* minor fix
Signed-off-by: raver119 <raver119@gmail.com>
* reverse_sequence fix
Signed-off-by: raver119 <raver119@gmail.com>
* confusion_matrix test updated
Signed-off-by: raver119 <raver119@gmail.com>
* minor tweak
Signed-off-by: raver119 <raver119@gmail.com>
* minor tweak
Signed-off-by: raver119 <raver119@gmail.com>
* one more reverse_sequence test
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-28 11:14:22 +03:00
raver119
b472d7d8c8
[WIP] few more fixes ( #182 )
...
* one noop test
Signed-off-by: raver119 <raver119@gmail.com>
* skip input validation for no-input ops
Signed-off-by: raver119 <raver119@gmail.com>
* - one more noop empty test
- one more validation before sync
Signed-off-by: raver119 <raver119@gmail.com>
* typo
Signed-off-by: raver119 <raver119@gmail.com>
* one more validation fix
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA empty reductions java side
Signed-off-by: raver119 <raver119@gmail.com>
* one svd test
Signed-off-by: raver119 <raver119@gmail.com>
* Corrected segment_mean helpers and added another test.
* Refactored segment_mean kernels to avoid race_condition.
2019-08-27 21:00:38 +03:00
Yurii Shyrma
2144941313
Shyrma fix2 ( #186 )
...
* - further work on layer_norm
Signed-off-by: Yurii <yurii@skymind.io>
* - further work on layer_norm 2
Signed-off-by: Yurii <yurii@skymind.io>
* - correct helpers for svd cuda
Signed-off-by: Yurii <yurii@skymind.io>
2019-08-27 19:57:59 +03:00
shugeo
0849b3c1a4
Shugeo segment fix2 ( #185 )
...
* Added test for segment_mean.
* Added another test for segment_mean.
* Fixed segment_* ops helpers for cuda to properly use external data.
2019-08-27 18:25:39 +03:00
raver119
7f0c660d8b
[WIP] HGemm ( #181 )
...
* skip string arrays for device validation
Signed-off-by: raver119 <raver119@gmail.com>
* confusion_matrix fix
Signed-off-by: raver119 <raver119@gmail.com>
* exclude cublasHGemm from archs < 530
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-27 15:05:43 +03:00
raver119
0e523490e9
[WIP] confusion ( #180 )
...
* skip string arrays for device validation
Signed-off-by: raver119 <raver119@gmail.com>
* confusion_matrix fix
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-27 14:30:37 +03:00
raver119
a49f7c908b
[WIP] More fixes ( #178 )
...
* skip string arrays for device validation
Signed-off-by: raver119 <raver119@gmail.com>
* histogram_fixed_width now really supports indexing types
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-27 13:21:01 +03:00
raver119
efbfafe3f7
[WIP] gatherND fix ( #176 )
...
* one test for gather_nd
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of old concat tests
Signed-off-by: raver119 <raver119@gmail.com>
* one printf
Signed-off-by: raver119 <raver119@gmail.com>
* one more legacy test removed
Signed-off-by: raver119 <raver119@gmail.com>
* gatherNd launch params fix
Signed-off-by: raver119 <raver119@gmail.com>
* gatherNd launch params fix
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-27 12:35:14 +03:00
raver119
df84bc7255
[WIP] More tweaks ( #173 )
...
* CUDA empty reduction
Signed-off-by: raver119 <raver119@gmail.com>
* - listdiff synchronization fix for CUDA
- listdiff test
Signed-off-by: raver119 <raver119@gmail.com>
* - IndexReduce ops now allow INDEXING_TYPES output
- topK op accepts only INDEXING_TYPES as output
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-27 10:37:10 +03:00
raver119
25e5c23eae
[WIP] Error handling ( #169 )
...
* CUDA reverse rewrite + couple of tests
Signed-off-by: raver119 <raver119@gmail.com>
* don't throw exception on invalid pointer
Signed-off-by: raver119 <raver119@gmail.com>
* data types validation for fastpath exec mode + 2 tests
Signed-off-by: raver119 <raver119@gmail.com>
* data types validation for fastpath exec mode + 2 tests
Signed-off-by: raver119 <raver119@gmail.com>
* ismax allowed dtypes tweak
Signed-off-by: raver119 <raver119@gmail.com>
* lastErrorCode + lastErrorMessage for native exceptions handling
Signed-off-by: raver119 <raver119@gmail.com>
* exportable ErrorReference
Signed-off-by: raver119 <raver119@gmail.com>
* check error codes in java
Signed-off-by: raver119 <raver119@gmail.com>
* - consume lastErrorCode
- fast_in dtype validation fix
Signed-off-by: raver119 <raver119@gmail.com>
* - sg/cb allowed output type change
- minor logging fix for data type validation
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-26 19:57:51 +03:00
raver119
bb5fc36e5e
[WIP] ops fixes ( #168 )
...
* - correct layer_norm
Signed-off-by: Yurii <yurii@skymind.io>
* - further fix of layer norm
Signed-off-by: Yurii <yurii@skymind.io>
* - correct scatter_upd op
Signed-off-by: Yurii <yurii@skymind.io>
* - correct cuda kernel for histogram_fixed_width op
Signed-off-by: Yurii <yurii@skymind.io>
* - delete comments
Signed-off-by: Yurii <yurii@skymind.io>
* enabled one ignored test
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-26 19:37:05 +03:00
Alex Black
b417ca21bf
Fix for concat op shape function (empty shapes) ( #167 )
...
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-08-26 23:10:28 +10:00
raver119
daf5420f4c
cmake fix for windows debug build
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-26 08:13:22 +03:00
raver119
ece6a17b11
lup context fix ( #164 )
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-24 16:57:48 +03:00
raver119
841eeb56c5
get rid of context variable
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-24 16:18:38 +03:00
raver119
b091e972ef
- string NDArray flat serde impl + tests ( #163 )
...
- string NDArray equalsTo impl
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-24 14:16:34 +03:00
raver119
f8364997c0
[WIP] maxpool2d_bp fix ( #160 )
...
* one test for maxpool2d_bp
Signed-off-by: raver119 <raver119@gmail.com>
* - maxpool2d_bp cuda fix for NaNs
- streamSync after each custom op execution
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-24 09:20:57 +03:00
raver119
f03b0ee78f
[WIP] more fixes ( #159 )
...
* Added test for MatrixInverse with double input. Fixed matrixDeterminantKernel.
* Fixed kernels to avoid wasteful templating.
* Fixed logDeterminant kernel.
* Refactored type check for lup.
* - decrease blockDim value for zeta op
Signed-off-by: Yurii <yurii@skymind.io>
* Added print for compound matrix with CUDA.
* Refactored upper matrix inversion kernels.
* - provide move constructor and move assignment operator for OpArgsHolder class
Signed-off-by: Yurii <yurii@skymind.io>
* Refactored usage of launch context.
* - add test for mergemax
Signed-off-by: Yurii <yurii@skymind.io>
* get rid of AveragingArrayProxy
Signed-off-by: raver119 <raver119@gmail.com>
* Refactoring of LUP inversion.
* Added prints for inversion.
* - add OpArgsHolder copy constructor and assignment operator
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for lower inversion
* - fix bug in upsampling2d/3d_bp op
Signed-off-by: Yurii <yurii@skymind.io>
* Added expensive printfs to kernel.
* Refactored expensive kernel prints.
* Refactored expensive printfs
* - remove nullify
Signed-off-by: Yurii <yurii@skymind.io>
* Eliminated wasteful prints in tests.
* upsampling2d_bp test
Signed-off-by: raver119 <raver119@gmail.com>
* test updated
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-23 19:20:50 +03:00
raver119
99cdf6d42b
- cpu isMax fix for multidim case + test
...
- INDArray.wasClosed() fix for empty array edge case
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-23 18:44:37 +03:00
raver119
fb8de5006f
- concat empty scalar fix
...
- couple of tests for empty scalar concat
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-23 13:16:50 +03:00
raver119
729dc5e879
[WIP] size etc ( #155 )
...
* one test for size
Signed-off-by: raver119 <raver119@gmail.com>
* - few tests for size op
- size/rank/size_at ops now use p instead of assign
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-23 12:31:12 +03:00
raver119
243bf866c4
[WIP] Few fixes ( #153 )
...
* throw exception if op execution failed
Signed-off-by: raver119 <raver119@gmail.com>
* expected for test
Signed-off-by: raver119 <raver119@gmail.com>
* one more ismax test
Signed-off-by: raver119 <raver119@gmail.com>
* ismax view fix
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-23 09:00:10 +03:00
raver119
930b49e87f
[WIP] DeviceLocalNDArray updates ( #149 )
...
* ContextBuffers are released upon device change
Signed-off-by: raver119 <raver119@gmail.com>
* DeviceLocalNDArray updates + tests
Signed-off-by: raver119 <raver119@gmail.com>
* special array for delayed mode
Signed-off-by: raver119 <raver119@gmail.com>
* additional detach()
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-22 20:01:29 +03:00
Alex Black
e855e47f73
More fixes ( #148 )
...
* Small batch norm fix (cuda/no-mkldnn)
Signed-off-by: Alex Black <blacka101@gmail.com>
* Dropout fix for RnnOutputLayer
Signed-off-by: Alex Black <blacka101@gmail.com>
* Allow block size < 2 in batch_to_space_nd and space_to_batch_nd for import, in spite of what TF docs say
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-08-22 19:55:27 +10:00
raver119
eea3062ccf
[WIP] stb/bts nd ( #144 )
...
* - start working on space_to_batch_nd
Signed-off-by: Yurii <yurii@skymind.io>
* - provide cpu helper for space_to_batch_nd op
Signed-off-by: Yurii <yurii@skymind.io>
* few typos fixed
Signed-off-by: raver119 <raver119@gmail.com>
* - add tests for space_to_batch and correct bugs
Signed-off-by: Yurii <yurii@skymind.io>
* - write cuda kernel for space_to_batch op
Signed-off-by: Yurii <yurii@skymind.io>
* - add order argument to shape::index2coords method in convolution cuda ops
Signed-off-by: Yurii <yurii@skymind.io>
* - restore some previous code
Signed-off-by: Yurii <yurii@skymind.io>
* old col2im kernel activated
Signed-off-by: raver119 <raver119@gmail.com>
* - change coords calculation in col2im kernel
Signed-off-by: Yurii <yurii@skymind.io>
* - restore old col2im kernel
Signed-off-by: Yurii <yurii@skymind.io>
* - add custom op for batch_to_space
Signed-off-by: Yurii <yurii@skymind.io>
* - provide cpu version for batch_to_space_nd op
Signed-off-by: Yurii <yurii@skymind.io>
* - provide cuda kernel for batch_to_space_nd op
Signed-off-by: Yurii <yurii@skymind.io>
2019-08-21 21:11:46 +03:00
raver119
e604ffe0d2
[WIP] repeat op ( #143 )
...
* - write new repeat helper (cpu)
Signed-off-by: Yurii <yurii@skymind.io>
* - update NDArray::cpu
Signed-off-by: Yurii <yurii@skymind.io>
* - update NDArray::repeat cuda
Signed-off-by: Yurii <yurii@skymind.io>
2019-08-21 21:10:29 +03:00
raver119
3cf72e5e30
[WIP] More fixes ( #142 )
...
* atomicAdd cc 70+
Signed-off-by: raver119 <raver119@gmail.com>
* additional 8-byte allocation
Signed-off-by: raver119 <raver119@gmail.com>
* missed include 2019
Signed-off-by: raver119 <raver119@gmail.com>
* less spam
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-21 20:18:29 +03:00
raver119
0adce9a4fa
minor fix for msvc
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-21 16:34:07 +03:00
raver119
d9ab299759
[WIP] Minor fixes ( #140 )
...
* - Tile java shape fn removed
- Tile 0 validation added
- scatter_upd test
Signed-off-by: raver119 <raver119@gmail.com>
* additional tile validation
Signed-off-by: raver119 <raver119@gmail.com>
* - provide vector case in cuda scatter op
Signed-off-by: Yurii <yurii@skymind.io>
* cpu ismax view fix
Signed-off-by: raver119 <raver119@gmail.com>
* exp
Signed-off-by: raver119 <raver119@gmail.com>
* cuda ismax fix
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-21 15:05:47 +03:00
raver119
77805cb7fa
[WIP] cpu ismax fix ( #137 )
...
* cpu ismax fix
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of smaller scalar tests
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-21 10:12:11 +03:00
raver119
4310e87860
include path fix for java
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-21 07:32:21 +03:00
raver119
269d508ba5
[WIP] cross-device migrations ( #134 )
...
* two more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA device affinity tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* minor tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* prepareAction/registerAction for CustomOps
Signed-off-by: raver119 <raver119@gmail.com>
* lazily allocate host buffer before relocation
Signed-off-by: raver119 <raver119@gmail.com>
* one special test for migration in cpp
Signed-off-by: raver119 <raver119@gmail.com>
* tests update for msvc
Signed-off-by: raver119 <raver119@gmail.com>
* logging
Signed-off-by: raver119 <raver119@gmail.com>
* stick to old col2im impl
Signed-off-by: raver119 <raver119@gmail.com>
* cudaStreams reorganization
Signed-off-by: raver119 <raver119@gmail.com>
* buffer size fix
Signed-off-by: raver119 <raver119@gmail.com>
* c++ data migration
Signed-off-by: raver119 <raver119@gmail.com>
* fix CropAndResize test
Signed-off-by: raver119 <raver119@gmail.com>
* - minor improvement
Signed-off-by: Yurii <yurii@skymind.io>
2019-08-20 18:52:41 +03:00
raver119
23c8738d4a
syncthreads ( #136 )
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-20 18:28:43 +03:00
AlexDBlack
01cb57041a
Merge
...
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-08-19 18:46:47 +10:00
raver119
aceb915557
[WIP] tests fixes ( #130 )
...
* no openmp for ClipByGlobalNorm
Signed-off-by: raver119 <raver119@gmail.com>
* one more bfloat16 rng test
Signed-off-by: raver119 <raver119@gmail.com>
* assertion fix
Signed-off-by: raver119 <raver119@gmail.com>
* - legacy IsMax gone
- linear IsMax gets shapeInfo argument
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of legacy IsMax tests
Signed-off-by: raver119 <raver119@gmail.com>
* IsMax is custom op now
Signed-off-by: raver119 <raver119@gmail.com>
* more blocks for ismax
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* - sqrt test
- some legacy code removed from CudaExecutioner
- Transforms.asin tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* - TransformFloat fix
Signed-off-by: raver119 <raver119@gmail.com>
* - ismax fix
- SpaceToBatchND/BatchToSpaceND wrappers
- couple of legacy tests removed
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-19 11:33:15 +03:00
raver119
bb80fe4f94
Merge remote-tracking branch 'origin/master'
2019-08-17 14:52:13 +03:00
raver119
8944e5f67f
no thread_local for cpu
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-17 14:51:54 +03:00
raver119
56910ddee7
no thread_local for cpu
...
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-17 14:51:39 +03:00
raver119
e22880fd76
[WIP] Roll rewritten ( #128 )
...
* Process correct input vector.
* Added tests for roll.
* Refactored roll to conform with TF. Eliminated memory leaks with Roll op tests.
2019-08-17 14:15:08 +03:00