Commit Graph

303 Commits (b686368b82027a756572799182b14c97f8892b44)

Author SHA1 Message Date
shugeo 4187190609 Shugeo release fix2 (#70)
* Corrected input checking and tests for bitcast op.

* Fixed an issue with non_max_suppression form generation and processing with score threshold given.

* Fixed bilinear resize kernel and tests.

* push for Serhii

Signed-off-by: raver119 <raver119@gmail.com>

* Added test for nearest_neighbor resize with int input.

* Added data type check for input/output match.

* Eliminate error in macros.

* Improved output message for type checking.

* Fixed input/output types for op.

* Eliminated waste logging.

* Refactored resize_bilinear helper for multithreading for cpu platform.

* Cosmetic changes only.

* Fixed error for string substitution.

* Skip test for cbow_batch with cuda.

* fix for resizeNearestNeighbor output dtype

Signed-off-by: raver119 <raver119@gmail.com>

* Refactored non_max_suppression helper.

* Refactored shape generation and input handling.

* Added additional test.
2019-11-22 22:42:44 +03:00
Yurii Shyrma 7a90a31cfb
Shyrma deconv3 (#69)
* - profiling cuda kernels for vol2col and im2col

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct addBias helper

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct mkl dilation formula and switch off mkl api for dilation deconvolutions

Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-21 21:17:30 +02:00
raver119 064a56ccf1
Few fixes (#66)
* skip legacy transforms execution in case of empty input arrays

Signed-off-by: raver119 <raver119@gmail.com>

* - BroadcastBool ops now accept extraParams to make MatchCondition possible
- TrueBroadcastHelper now uses samediff::threads

Signed-off-by: raver119 <raver119@gmail.com>

* java side

Signed-off-by: raver119 <raver119@gmail.com>

* trigger jenkins

Signed-off-by: raver119 <raver119@gmail.com>

* update LessThanOrEqual opNum mapping

Signed-off-by: raver119 <raver119@gmail.com>

* update LessThanOrEqual opNum mapping

Signed-off-by: raver119 <raver119@gmail.com>
2019-11-21 15:43:03 +03:00
raver119 83cb0d9329
[WIP] Create and small fix (#67)
* - create op
- skip exec for empty inputs for non_max_suppression
- EmptyHandling idea

Signed-off-by: raver119 <raver119@gmail.com>

* Create op and mapping for it

Signed-off-by: raver119 <raver119@gmail.com>
2019-11-21 13:31:20 +03:00
shugeo dc0036f2c6
Shugeo image resize bicubic (#56)
* Added implementation files for image_resize and resize_bicubic ops.

* Image resize and image.resize_bicubic ops implementation. Initial revision.

* Finished with infrastructure development for image.resize_bilinear op and image_resizo op implementation.

* Refactored resize methods.

* Added processing for Mitchelcubic algorithm.

* Added check for input/output sizes.

* Added int and float types for crop_and_resize op.

* Refactored crop_and_resize output type check.

* Added helper for bicubic interpolation as TF v.1 does.

* Added TF v.1 bicubic helper for cuda platform.

* Added cached class for bicubic algorithm.

* Refactored cuda implementation for crop_and_resize helper to use proper output type.

* Added facilities for bicubic interpolation.

* Portion bicubic interpolation from TF.

* Added tests for resize_bilinear testing.

* Working implementation of bicubic interpolation and tests.

* Refactored routines with image_resize bicubic op helper.

* Refactored code with coding standards.

* Refactored cpu helpers for resize_bicubic op.

* Refactored bicubic helpers.

* Added bicubic resize facilities.

* Implementing cuda kernels for bicubic interpolation. Implementation step.

* Cuda implementation of resize_bicubic op helper.

* Refactor image.resize_bicubic op helpers.

* Refactored helpers for resize_bicubic. Added error checking with cuda implementation.

* Refactored cuda implementation of resize_bicubic op helper. The first working revision.

* Cuda arch implementation for resize_bicubic op helper. Full working single-threaded revision.

* Intermediate bicubic interpolation helper for cuda.

* Refactored cpu helper for resize_bicubic.

* Multithreaded cuda implementation for resize_bicubic.

* Fixed merge issues.

* Refactored nlp helpers.

* Replicated resize_bicubic for 3D also.

* Eliminated waste comments of unused code.

* Eliminated waste comments with unused code.

* Eliminated waste template definitions.

* Eliminated waste debug code.

* Eliminated waste comments.

* Fixed multithreading with helpers.

* Fixed test suites for float and double in float point input lists.

* Fixed usage of reshape with 3D/4D on resizes.

* Final fixes.

* Fixed resize_neighbor op problem.
2019-11-20 21:11:04 +02:00
shugeo 13e5c0a280
Shugeo release fix1 (#61)
* Added a pair of tests for failed ops.

* Fixed cpu helper for draw_bounding_boxes op.

* Refactored implementation of draw_bounding_boxes op to full conform with TF.

* Improved multithreading with draw_bounding_boxes op cuda helper.

* Eliminated log messages.

* Changed logging with draw_bounding_boxes op helper and tests.

* Resize_biliear with 3D input allowed.

* Refactored 3D input acception with resize_bilinear op.

* And another improvement.

* Refactored reshape of input/output for resize_bilinear.

* Improvements final.

* Finished with 3D replication for image.resize_bilinear/_nearest_neighbor.

* Added copyrights for TF code.

* Using new form of multithreading for cpu implementation.

* Fixed shape error.

* Added multithreaded with batches on crop_and_resize functor.

* Refactored multithreading with crop_and_resize and draw_bounding_boxes.
2019-11-20 13:37:48 +02:00
raver119 59e955cedc
- MKL-DNN version upgrade to 1.1.x (#62)
- MKL-DNN namespace changes to match DNNL rename

Signed-off-by: raver119 <raver119@gmail.com>
2019-11-20 13:23:08 +03:00
raver119 7898f3c0cc
fix for is_increasing/non_decreasing ops for empty input case (#63)
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-20 11:12:15 +03:00
Yurii Shyrma 66b84b38cf
Shyrma mmul (#58)
* - get rid of some copy procedures in mmulHelper ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on embedding cuda api for batched gemm (cublasGemmBatchedEx) in our mmulHelper class

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on cuda batched gamm api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - write own cuda kernel performing batched gemm

Signed-off-by: Yurii <iuriish@yahoo.com>

* missing include in MmulHelper

Signed-off-by: raver119 <raver119@gmail.com>

* - forgot to keep in code previous correct kernels for mmulNxN, since it may happen that new onw will fail for some reason in future

Signed-off-by: Yurii <iuriish@yahoo.com>

* disable old tensordot

Signed-off-by: raver119 <raver119@gmail.com>

* - rewrite cuda kernels for usualGemm and usualGemv

Signed-off-by: Yurii <iuriish@yahoo.com>

* - profiling mmul helpers

Signed-off-by: Yurii <iuriish@yahoo.com>

* - prints to check shapes were added

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct type of output array Cin mmulNxN

Signed-off-by: Yurii <iuriish@yahoo.com>

* - take into account possible nans in C array

Signed-off-by: Yurii <iuriish@yahoo.com>

* slightly change numThreads message

Signed-off-by: raver119 <raver119@gmail.com>

* - make corrections in accordance to given notes in pr review

Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-19 15:39:36 +02:00
Alex Black da1944e8e1
SameDiff TF import (#49)
* Added implementation files for image_resize and resize_bicubic ops.

* Image resize and image.resize_bicubic ops implementation. Initial revision.

* Minor fix

* Some TF imports disabled.

* Finished with infrastructure development for image.resize_bilinear op and image_resizo op implementation.

* Refactored resize methods.

* Added processing for Mitchelcubic algorithm.

* adjust_contrast

* Small fix for TF import expected value loading when variable name starts with the test name

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Tests

* Tests added.

* Removed tf names absent in mapping.

* Some fixes.

* Small fixes

* Minor change

* Some failing tests.

* Disable failed test

* Ignore some tests

* Fix import class mapping

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix float property mapping (flatbuffers)

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Override equality function for model 'dropout'

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fail tests

* Failed tests ignored temporarily.

* Minor fixes

* Small fix

* Conflict resolved

* Default implementations of tensorflowName and onnxName
2019-11-19 22:44:29 +11:00
raver119 bbd59a3537
fake quant dtype validation fix (#60)
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-19 12:53:52 +03:00
raver119 db7ca956c5
[WIP] Mish (#55)
* Mish activation function and its derivative

Signed-off-by: raver119 <raver119@gmail.com>

* signature fix

Signed-off-by: raver119 <raver119@gmail.com>

* mish as activation for dl4j

Signed-off-by: raver119 <raver119@gmail.com>

* javadoc

Signed-off-by: raver119 <raver119@gmail.com>

* minor optimization

Signed-off-by: raver119 <raver119@gmail.com>
2019-11-18 13:21:26 +03:00
raver119@gmail.com 9101a0ee15 build fix for clang
Signed-off-by: raver119@gmail.com <raver119@gmail.com>
2019-11-16 22:18:50 +03:00
raver119 1780dcc883
[WIP] Small fixes here and there (#50)
* one range test

Signed-off-by: raver119 <raver119@gmail.com>

* few Context convenience singatures

Signed-off-by: raver119 <raver119@gmail.com>

* one more range test

Signed-off-by: raver119 <raver119@gmail.com>

* "range" "fix"

Signed-off-by: raver119 <raver119@gmail.com>

* adjuct_contrast_v2 now allows scale factor to be provided via input_variable

Signed-off-by: raver119 <raver119@gmail.com>

* adjust_contrast now allows scale factor as variable too

Signed-off-by: raver119 <raver119@gmail.com>

* bitcast shape tests

Signed-off-by: raver119 <raver119@gmail.com>

* BitCast import dtype added

Signed-off-by: raver119 <raver119@gmail.com>

* few more BitCast signatures

Signed-off-by: raver119 <raver119@gmail.com>
2019-11-15 17:04:29 +03:00
Yurii Shyrma 62d8e0d409 - make agreement between our and mkl api dilation/padding formulas (#47)
Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-14 20:21:22 +03:00
raver119 1eb3de90d7
[WIP] Platform helpers switches (#44)
* - platform helpers can be disabled on per-op basis now via Context::allowHelpers
- java has access to it as well

Signed-off-by: raver119 <raver119@gmail.com>

* global platform-helpers trigger

Signed-off-by: raver119 <raver119@gmail.com>

* few signatures renamed

Signed-off-by: raver119 <raver119@gmail.com>

* - few new env variables to follow
- maxThreads/masterThreads differentiation

Signed-off-by: raver119 <raver119@gmail.com>

* Javadoc update

Signed-off-by: raver119 <raver119@gmail.com>
2019-11-14 14:35:02 +03:00
Alex Black 47d19908f4
Various fixes (#43)
* #8172 Enable DL4J MKLDNN batch norm backward pass

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8382 INDArray.toString() rank 1 brackets / ambiguity fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8308 Fix handful of broken links (inc. some in errors)

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Unused dependencies, round 1

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Unused dependencies, round 2

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Unused dependencies, round 3

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Uniform distribution TF import fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-11-14 19:38:20 +11:00
raver119 48df1acdfb
[WIP] ThreadPool (#8)
This PR removes OpenMP use in 95% of cases
2019-11-13 17:04:59 +03:00
raver119 f05c6ee139 INLINE_LOOPS for windows
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-12 15:12:31 +03:00
Alex Black 18c01f5bdc
Add SameDiff memory reuse memory manager (array cache) (#39)
* Attention op comments

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* ArrayCacheMemoryMgr - first pass

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Tweak array cache for use with SameDiff identity arrays

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* ArrayCacheMemoryMgr javadoc and properly get max memory

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* LRU cache policy + add tests

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Resize arrays internally if required for ArrayCacheMemoryMgr

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Test improvement

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small polish

Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-11-12 21:15:44 +11:00
Yurii Shyrma 0eda1e733e Shyrma bnorm bp (#41)
Batchnorm backprop mkldnn
2019-11-12 11:58:48 +03:00
raver119 929c1dc5c7 - new NDArrayFactory scalar constructor
- minor tweak in randomuniform
- one more test

Signed-off-by: raver119 <raver119@gmail.com>
2019-11-08 08:49:41 +03:00
raver119 51f3a1371d
[WIP] Random Uniform (#36)
* args

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* T args

Signed-off-by: raver119 <raver119@gmail.com>
2019-11-07 17:09:47 +03:00
shugeo 679e42199a Shugeo strided slice bp fix2 (#33)
* Fixed crash and restored brocken functionality for strided slice.

* Added comments for strided_slice_bp main step.
2019-11-07 13:44:02 +03:00
shugeo 08853c7829
Shugeo random uniform int (#30)
* Corrected randomuniform declaration.

* Refactored uniform distribution for both cuda and cpu platforms.

* Refactored uniform distribution and tests.

* Fixed type usage with indices.

* Refactored uniform distribution implementation and tests to full conform with TF implementation.

* Refactored gamma function to use type util method.

* Copyright changes and fixes with ConstantHelper.

* Added error checking on allocate cuda device memory and operations.
2019-11-06 12:49:27 +02:00
Yurii Shyrma 871f3bb3e6
- add additional condition in svd helper to take into account rounding errors (#31)
Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-05 17:16:17 +02:00
shugeo 9124974e3b
Fixed crash with strided_slice_bp op and tests. (#29) 2019-11-05 12:49:15 +02:00
shugeo 7b14a9f603
Gamma and Poisson distributions (#27)
* Added implementation for random_gamma op.

* Added implementation for random_poisson op and support classes.

* Added helpers for random_poisson and random_gamma ops.

* Implementation of random_poisson. The first working edition.

* Implementation of random_poisson. Parallelized working edition.

* Implementation of random_gamma. Parallelized working edition with alpha only.

* Added cuda implementation for helper of poisson distribution.

* Corrected shape calculation with random_gamma and tests.

* Finished cpu implementation for gamma distribution.

* Finished cuda implementation for random_gamma op.

* Refactored cpu helpers for random_gamma and random_poisson ops.

* Refactored cuda helpers for gamma and poisson distribution.

* Refactored cuda helper for gamma distribution.

* Refactored cpu helper for random_poisson op.

* Refactored cpu helper for random_gamma op.
2019-11-04 15:42:28 +02:00
Alex Black 948ebef41c
Op Fixes (#28)
* #8280 biasadd_bp nchw arg fixes (java side) + test

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8285 Concat op Java side fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Concat op cpp fix - allow dynamic axis to be negative, same as static axis

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* ignores for deconv3d import tests until deconv3d_tf op is implemented

Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-11-05 00:05:04 +11:00
Yurii Shyrma 0cdb5750e0
Shyrma concat (#24)
* - provide possibility to pass axis as last input array in concat op
- corrcect sumation in bias_add_bp op for NHWC case

Signed-off-by: Yurii <iuriish@yahoo.com>

* - write code for deconv2d op based on mkl dnn api

* no unsafe math

Signed-off-by: raver119 <raver119@gmail.com>

* no unsafe math

Signed-off-by: raver119 <raver119@gmail.com>

* - get rid of e<> and p<> methods in svd helper

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide mkl api support for deconvolution 3d

Signed-off-by: Yurii <iuriish@yahoo.com>

* - write deconv2d_bp based on mkl api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - write deconv3d_bp based on mkl api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - testing and fixing deconv based on mkl api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - remove dilation form conv2d/3d mkl

Signed-off-by: Yurii <iuriish@yahoo.com>

* - minor changes

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further corrections of deconv ops based on mkl dnn api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide deconv2d_tf based on mkl dnn api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add minor corrections required by reviewer

Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-03 12:37:19 +02:00
shugeo 95f7ad7b94
Shugeo suppression overlaps (#9)
* Added non_max_suppression_overlaps op and tests.

* Refactored implementation of non_max_suppression_overlaps.

* Refactoring of implementation of non_max_suppression_overlaps op.

* Refactoring of implementation of non_max_suppression op.

* Fixed portion error.

* Added cuda frontends for image suppression ops.

* Eliminated crash with cuda arch on image.non_max_suppression_overlaps op.

* Improved implementation of image_suppression helper for cpu platform.

* The generic approach of non_max_suppression_overlaps op helper with cuda platform.

* Working cuda implementation of helper non_max_suppression_overlaps op.

* Eliminated waste comments.

* Improved implementations for both platforms

* Refactored cuda implementation of image.non_max_suppression_overlaps op helper.

* Improved cuda implementation of non_max_suppression op helper.

* Refactored cuda implementation of image.non_max_suppression_overlaps op helper.

* Improved cuda implementation of image.non_max_suppression_overlaps op helper.

* Added modifications into cuda implementation for image suppression overlaps op.

* Correct queue emulation with cuda implementation of non_max_suppression_overlaps op.

* Prefinal stage of cuda implementation of non_max_suppression_overlaps.

* Worked cuda implementation of non_max_suppresion_overlaps helper.

* Fixed return to proper thread.

* Improvements for cuda implementation of image.non_max_suppression_overlaps op helper.

* Fixed implementation issues with non_max_suppression_overlaps on cuda platform.

* Fixed skip for non_max_suppression_overlaps on cuda platform.

* Finalize implementation of image_suppression helper and tests.

* Cosmetic changes only.
2019-10-30 13:43:45 +02:00
Yurii Shyrma 029a69a835
Shyrma bn mkl bp (#14)
* - write code for new batchnorm backprop

Signed-off-by: Yurii <iuriish@yahoo.com>

* - testing batchnorm backprop

Signed-off-by: Yurii <iuriish@yahoo.com>

* - write code for batchnorm backprop based on mkl dnn api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - testing and fixing bugs in batchnorm_bp mkl dnn

Signed-off-by: Yurii <iuriish@yahoo.com>

* - made corrections required by reviewer

Signed-off-by: Yurii <iuriish@yahoo.com>

* - change name in java wrapper for batchnorm op

Signed-off-by: Yurii <iuriish@yahoo.com>
2019-10-26 14:14:21 +03:00
Alex Black d333d29099
SameDiff cleanup and fixes (#12)
* #8160 Remove resolvePrepertiesFromSameDiffBeforeExecution

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* SameDiff API cleanup

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* More SameDiff cleanup

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8248 Switch SameDiff variable init from lazy to creation time for more predictable behaviour

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8252 TanhDerivative javadoc

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8225 Deconvolution2D input validation

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8265 Switch SameDiff.outputs() to user settable, instead of unreliable 'best guess'

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8224 SameDiff.zero and .one create constants, not variables

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* More cleanup and fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small test fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* DL4J SameDiff fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Re-add hack for Deconvolution2DLayer until #8315 is resolved

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8270 Move CUDA device/version logging to Java; can be disabled via existing org.nd4j.log.initialization system property

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* All ND4J init logging checks system property

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small tweak

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Remove redundant device logging

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* One more fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* UX improvements

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Deconv fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Add deconv tests

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Cleanup

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Remove debug code

Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-10-26 12:38:08 +11:00
Alex Black 3f0b4a2d4c
SameDiff execution, TF and memory management overhaul (#10)
* SameDiff execution memory management improvements, round 1

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Round 2

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Round 3

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Clear node outputs closed array references; Slight change to OpValidation internals to not rely on cached op outputs

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Next steps

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Next step

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Cleanup

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* More polish

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Add WeakIdentityHashmap

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Session fixes for control ops and next steps

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* More fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* First steps for training session + in-line updating

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Next steps

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix losses and history during training

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* More fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* BiasAdd and other fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Don't use SDVariable.getArr() in TFGraphTestAllHelper (import tests)

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Cleanup

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* First steps for new dependency tracking approach

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Start integrating dependency tracking for memory management

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Non-control op dependency tracking works/passes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Switch/merge

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Next steps

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Cleanup and next steps

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix issue dependency tracking for initial variables/constants

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Add check for aliases when determining if safe to close array

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* First pass on new TF graph import class

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Import fixes, op fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Cleanup and fixes for new TF import mapper

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* More cleanup

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Partial implementation of new dependency tracker

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Next steps

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* AbstractDependencyTracker for shared code

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Overhaul SameDiff graph execution (dependency tracking)

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* More fixes, cleanup, next steps

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Ad no-op memory manager, cleanup, fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix switch dependency tracking

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* INDArray.toString: no exception on closed arrays, just note closed

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix enter and exit dependency tracking

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* TensorArray memory management fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Add unique ID for INDArray instances

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix memory management for NextIteration outputs in multi-iteration loops

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Remove (now unnecessary) special case handling for nested enters

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Cleanup

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Handle control dependencies during execution; javadoc for memory managers

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Cleanup, polish, code comments, javadoc

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Cleanup and more javadoc

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Add memory validation for all TF import tests - ensure all arrays (except outputs) are released

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Clean up arrays waiting on unexecuted ops at the end of execution

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fixes for enter op memory managent in the context of multiple non-nested loops/frames

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix order of operation issues for dependency tracker

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Always clear op fields after execution to avoid leaks or unintended array reuse

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Re-implement dtype conversion

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix for control dependencies execution (dependency tracking)

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix TF import overrides and filtering

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix for constant enter array dependency tracking

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* DL4J Fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* More DL4J fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Cleanup and polish

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* More polish and javadoc

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* More logging level tweaks, small DL4J fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small fix to DL4J SameDiffLayer

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix empty array deserialization, add extra deserialization checks

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* FlatBuffers control dep serialization fixes; test serialization as part of all TF import tests

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Variable control dependencies serialization fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix issue with removing inputs for ops

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* FlatBuffers NDArray deserialization fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* FlatBuffers NDArray deserialization fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Final cleanup/polish

Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-10-23 21:19:50 +11:00
Alexander Stoyakin f31661e13b
Merge pull request #7 from KonduitAI/asto_nd4s_10172019
KDTree optimization
2019-10-23 12:11:25 +03:00
Yurii 99be467f76 - minor change in recurrent.h
Signed-off-by: Yurii <iuriish@yahoo.com>
2019-10-17 20:46:51 +03:00
Yurii 70bd925abd - write 2 versions of new lstmLayer: one is based on own code, second uses mkl dnn api 2019-10-17 20:44:52 +03:00
Alexander Stoyakin 630bb3c9b6
Merge pull request #2 from KonduitAI/asto_ops_wrapper
[WIP] New ops wrapper
2019-10-16 20:21:50 +03:00
shugeo 3662657d5c
Merge pull request #1 from KonduitAI/shugeo_gamma
Shugeo gamma
2019-10-16 18:49:33 +03:00
shugeo 24a2b2933f Added gamma and lgamma functions. 2019-10-16 18:22:18 +03:00
Alexander Stoyakin 96a9a1a733 Fixed output from operation. 2019-10-16 18:07:52 +03:00
shugeo 7617682a46 Added declarations for igamma and igammac ops. 2019-10-16 14:45:10 +03:00
shugeo 478a0c1f97 Added igamma and igammac broadcastable ops implementations and tests. 2019-10-16 14:02:53 +03:00
shugeo 7103aca8c5 Added broadcastable IGamma and IGammac ops. 2019-10-16 13:58:32 +03:00
shugeo f90e6da97e Added nd4j_gamma, nd4j_igamma and nd4j_igammac functions. 2019-10-16 13:53:31 +03:00
shugeo df2448613e Added gamma distribution functions. 2019-10-15 20:00:07 +03:00
AlexDBlack 2d750b69e5 Merge remote-tracking branch 'konduit/master' 2019-10-14 17:21:23 +11:00
shugeo ace65355c5 Added doc for fake_quant_with_min_max* op helpers cuda implementations. 2019-10-10 18:35:28 +03:00
shugeo c890de5a7b Added doc for fake_quant_with_min_max* op helpers implementations. 2019-10-10 18:31:17 +03:00
shugeo c3f755d975 Refactored helpers both for cuda and cpu platforms. 2019-10-10 18:02:49 +03:00
shugeo a09cb5e2be Added doc for fake_quant_with_min_max_per_channel op declaration. 2019-10-10 17:13:33 +03:00
shugeo 92636b0b86 Eliminated waste operator. 2019-10-10 17:08:59 +03:00
shugeo d5b352273d Implementation of cuda kernel for fake_quant_with_min_max_vars_per_channels op. Final revision. 2019-10-10 16:51:29 +03:00
shugeo 02d8616692 Implementation of cuda kernel for fake_quant_with_min_max_vars_per_channels op. 2019-10-10 16:40:56 +03:00
shugeo 3504b0cda9 Implemented fake_quant_with_min_max_vars_per_channel fop cuda helper. The first working revision. 2019-10-10 15:44:50 +03:00
shugeo 753565145c Refactored fake_quant_with_min_max_vars op cuda implementation. 2019-10-10 14:00:49 +03:00
shugeo c13e945a96 Fixed fake_quant_with_min_max_vars op and tests. 2019-10-10 13:23:11 +03:00
shugeo 3c0c59ab88 Refactored fake_quant_with_min_max_vars op. 2019-10-09 22:09:33 +03:00
shugeo 352f1eee80 Implemented fake_quant_with_min_max_per_channel helper for cpu platform. The first approach. 2019-10-09 21:39:59 +03:00
shugeo d0cbd33b0e Added input checks for op. 2019-10-09 15:52:13 +03:00
shugeo cb56b0b06a The first approach for fake_quant_with_min_max_vars_per_channel op implementation. 2019-10-08 19:00:41 +03:00
shugeo 8fe5a1fa96 The working implementation of draw_bounding_boxes op. 2019-10-08 15:42:27 +03:00
shugeo 30a8af566c The first working implementation of cuda kernel for draw_bounding_boxes op helper. 2019-10-08 13:45:18 +03:00
shugeo ae09cfee32 Next approach of cuda imlementation for draw_bounding_boxes op helper. 2019-10-08 00:09:46 +03:00
shugeo 6cf3a8fa9c Refactored cpu implementatio and added cuda aproach. 2019-10-07 17:51:07 +03:00
shugeo 78443ffebf Working implementation of draw_bounding_boxes op for cpu. 2019-10-07 15:04:44 +03:00
shugeo 16a66a65e3 Added helper declaration for draw_bounding_boxes op. 2019-10-04 21:16:34 +03:00
shugeo 53a2ebddbe Added test and helpers for draw_bounding_boxes op both cpu and cuda related. 2019-10-04 20:46:26 +03:00
shugeo 8f70b4441f draw_bounding_boxes op implementation. Inital revision. 2019-10-04 18:32:21 +03:00
shugeo 908e4c4912 Added implementation for divide_no_nan op and tests. 2019-10-04 10:29:15 +03:00
raver119 cff26f13c5
Revert "Implement divide_no_nan op." 2019-10-03 20:25:52 +03:00
shugeo 6eaca179d6 Implement divide_no_nan op. 2019-10-03 18:22:17 +03:00
shugeo 130ee25682 Implemented compare_and_bitpack op. 2019-10-03 10:57:48 +03:00
shugeo f3e42173ef Refactored buffer copying to avoid wrong usage of buffers. 2019-10-02 16:51:09 +03:00
shugeo 1c6173d218 Added implementation of bitcast op. 2019-10-02 15:04:59 +03:00
shugeo a27e61553a Added tests and fixed op name. 2019-10-02 15:04:28 +03:00
shugeo 863ff76878 Added declaration for bincast op. 2019-10-02 12:17:00 +03:00
shugeo afeb524238 Refactored implementation for adjust_contrast ops. 2019-10-01 14:13:09 +03:00
shugeo 1575c704ae Added implementation for adjust_contrast_v2 op and tests. 2019-10-01 11:44:27 +03:00
raver119 44a8d19ac6
[WIP] Broadcast changes (#8257)
* - provide correct call NDArray::applyBroadcast inside of NDArray::applyTrueBroadcast

Signed-off-by: Yurii <yurii@skymind.io>

* - provide new trueBroadcast helper

Signed-off-by: Yurii <yurii@skymind.io>

* example for yurii

Signed-off-by: raver119 <raver119@gmail.com>

* - provide new trueBroadcast helper for cpu

Signed-off-by: Yurii <yurii@skymind.io>

* - start working on new trueBroadcat helper for cuda

Signed-off-by: Yurii <yurii@skymind.io>

* - further work on trueBroadcast for cuda

Signed-off-by: Yurii <yurii@skymind.io>

* - fix bugs in cuda helper trueBroadcast

Signed-off-by: Yurii <yurii@skymind.io>
2019-10-01 09:10:19 +03:00
shugeo e06dfb5dcc Implementation of adjust_contrast op. 2019-09-30 18:24:12 +03:00
AlexDBlack a66e03355e Merge remote-tracking branch 'fork/master' 2019-09-12 12:20:57 +10:00
raver119 98e2814879
Platform helpers (#8216)
* platform helpers draft

Signed-off-by: raver119 <raver119@gmail.com>

* typo

Signed-off-by: raver119 <raver119@gmail.com>

* disable platform cmake

Signed-off-by: raver119 <raver119@gmail.com>

* another draft

Signed-off-by: raver119 <raver119@gmail.com>

* mkldnn convolution refactored

Signed-off-by: raver119 <raver119@gmail.com>

* minor tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* one more safety check

Signed-off-by: raver119 <raver119@gmail.com>

* prototype works

Signed-off-by: raver119 <raver119@gmail.com>

* meh

Signed-off-by: raver119 <raver119@gmail.com>

* force static library mode for mkldnn

Signed-off-by: raver119 <raver119@gmail.com>

* - ismax fix
- experimental arg fix
- don't enforce openblas on Apple hardware

Signed-off-by: raver119 <raver119@gmail.com>

* bunch of small fixes

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* declare concurrent

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* - MKLDNN version upgrade to 1.0.2
- avgpool2d/maxpool2d APIs update

Signed-off-by: raver119 <raver119@gmail.com>

* - avgpool2d_bp/maxpool2d_bp APIs update

Signed-off-by: raver119 <raver119@gmail.com>

* - conv2d/batchnorm APIs update

Signed-off-by: raver119 <raver119@gmail.com>

* - lrn/conv2d_bp/conv3d/conv3d_bp APIs update

Signed-off-by: raver119 <raver119@gmail.com>

* all ops converted to MKLDNN 1.x

Signed-off-by: raver119 <raver119@gmail.com>

* bunch of tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* namespace for platform helpers

Signed-off-by: raver119 <raver119@gmail.com>

* make sure platform helpers aren't opimized out

Signed-off-by: raver119 <raver119@gmail.com>

* build cpu_features on x86 systems

Signed-off-by: raver119 <raver119@gmail.com>

* build cpu_features on x86 systems

Signed-off-by: raver119 <raver119@gmail.com>

* more of cpu_features

Signed-off-by: raver119 <raver119@gmail.com>

* - mkldnn removed from java
- cpu_features checks in CpuNDArrayFactory

Signed-off-by: raver119 <raver119@gmail.com>

* F16C definition renamed

Signed-off-by: raver119 <raver119@gmail.com>

* some mkldnn rearrangements

Signed-off-by: raver119 <raver119@gmail.com>

* check supported instructions before doing anything

Signed-off-by: raver119 <raver119@gmail.com>

* typo

Signed-off-by: raver119 <raver119@gmail.com>

* missied impl

Signed-off-by: raver119 <raver119@gmail.com>

* BUILD_PIC option

Signed-off-by: raver119 <raver119@gmail.com>

* conv2d fix

Signed-off-by: raver119 <raver119@gmail.com>

* avgpool3d fix

Signed-off-by: raver119 <raver119@gmail.com>

* avgpool3d_bp fix

Signed-off-by: raver119 <raver119@gmail.com>

* avgpool2d_bp leak fix

Signed-off-by: raver119 <raver119@gmail.com>

* avgpool3d_bp leak fix

Signed-off-by: raver119 <raver119@gmail.com>

* maxpool bp leaks fixed

Signed-off-by: raver119 <raver119@gmail.com>

* printf removed

Signed-off-by: raver119 <raver119@gmail.com>

* batchnorm fix

Signed-off-by: raver119 <raver119@gmail.com>

* AVX warning/error polishing

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* More polish

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Polish

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* remove previous MKL-DNN support layer

Signed-off-by: raver119 <raver119@gmail.com>

* avx2 tweak

Signed-off-by: raver119 <raver119@gmail.com>

* allow static for apple

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* exclude mkldnn in one more place

Signed-off-by: raver119 <raver119@gmail.com>

* exclude mkldnn in one more place

Signed-off-by: raver119 <raver119@gmail.com>

* restore OPENBLAS_PATH use

Signed-off-by: raver119 <raver119@gmail.com>

* add runtime check for avx/avx2 support

Signed-off-by: raver119 <raver119@gmail.com>

* convolution_auto

Signed-off-by: raver119 <raver119@gmail.com>

* Add logic for helper argument

* minor test fix

Signed-off-by: raver119 <raver119@gmail.com>

* few tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* few tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* skip OpTracker props for non-x86 builds

Signed-off-by: raver119 <raver119@gmail.com>

* linux arm isn't x86 :)

Signed-off-by: raver119 <raver119@gmail.com>

* avx-512

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA presets fix

Signed-off-by: raver119 <raver119@gmail.com>

* BUILD_PIC

Signed-off-by: raver119 <raver119@gmail.com>

* prefetchw for avx2

Signed-off-by: raver119 <raver119@gmail.com>

* BUILD_PIC again

Signed-off-by: raver119 <raver119@gmail.com>
2019-09-11 21:50:28 +03:00
shugeo e1a7460f8e Shugeo cuda doc2 (#255)
* Added comments to tileKernel routine.

* Refactored kernel and added doc to it.

* Refactored setDiagonal kernel and added doc for it.

* Added doc for tnse cuda helpers.

* Added doc for diag kernels.

* Added doc for kernel.

* Refactored code with fake quantization.

* Added docs for image resize and crop kernels.

* Added docs for image suppression helpers.

* Added docs to matrix_band helpers.

* Added docs for matrix_diag_part and nth_element helpers.

* Fixed syntax error and refactored getIndexOffset usage.
2019-09-11 21:04:43 +03:00
raver119 589401477d
[WIP] bunch of improvements (#257)
* - profiling bias_add op
- add some docementation

Signed-off-by: Yurii <yurii@skymind.io>

* - minor change

Signed-off-by: Yurii <yurii@skymind.io>

* - provide addBias cuda kernel

Signed-off-by: Yurii <yurii@skymind.io>

* - improve shape::getIndexOfffset and change its signature

Signed-off-by: Yurii <yurii@skymind.io>

* - same as previous

Signed-off-by: Yurii <yurii@skymind.io>

* - improve and change signature in some shape:: stuff which has to do with calculation of offsets for array elements

Signed-off-by: Yurii <yurii@skymind.io>

* - minor changes in flatten

Signed-off-by: Yurii <shyrma@skymind.io>

* - add function shape::getIndexOffsetOrdered

Signed-off-by: Yurii <shyrma@skymind.io>

* - correct shape::getIndexOffsetOrdered()

Signed-off-by: Yurii <shyrma@skymind.io>

* - move getIndexOffsetOrdered to flatten.h header in order to isolate this function

Signed-off-by: Yurii <shyrma@skymind.io>
2019-09-11 20:12:09 +03:00
raver119 ffae024cda disable threadlocal for apple
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-10 07:46:35 +03:00
shugeo c9f8a904ad Shugeo cuda docs1 (#249)
* Comments axis shifts.

* Fixed LUP solver usage. Added helpers doc.

* Switch off OMP for roll and lup. Fixed omp usage for ClipByGlobalNorm.

* Switch off omp for ClipByGlobalNorm to reduce omp ambigiousness.
2019-09-09 16:27:45 +03:00
raver119 1de9fb218e
- bits_hamming_distance dtype fix (#8208)
- DataTypeUtils::asString fixe + new dtypes added

Signed-off-by: raver119 <raver119@gmail.com>
2019-09-06 08:59:05 +03:00
raver119 46f8c58502 - bits_hamming_distance dtype fix
- DataTypeUtils::asString fixe + new dtypes added

Signed-off-by: raver119 <raver119@gmail.com>
2019-09-06 08:57:53 +03:00
AlexDBlack a76a44e198 Merge remote-tracking branch 'fork/master' 2019-09-05 22:04:25 +10:00
raver119 fc222d3325 logging off
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-05 14:45:46 +03:00
Ryan Nett 79867f5c5a cleanup SDRNN and rnn ops (#238)
Signed-off-by: Ryan Nett <rnett@skymind.io>
2019-09-05 12:25:03 +10:00
AlexDBlack b7226bdd7a Merge
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-09-05 00:54:11 +10:00
shugeo 548044a1e2 Shugeo doc (#235)
* Actualized doc to tnse ops.

* Added comments for dynamic_stitch op.

* Added comments to dynamic_stitch op implementation.

* Modified comment for unstack_list op.

* Added doc for space_to_depth and depth_to_space ops.

* Added doc for space_to_batch op.

* Enlarge test type for adjustSaturation.

* Added doc for runner.
2019-09-04 14:57:59 +03:00
raver119 a90c7dd995
[WIP] Last set of changes (#234)
* mmul op instead of cublasSgemm

Signed-off-by: raver119 <raver119@gmail.com>

* transB

Signed-off-by: raver119 <raver119@gmail.com>

* jcpp handles

Signed-off-by: raver119 <raver119@gmail.com>

* bitwise and/or/xor

Signed-off-by: raver119 <raver119@gmail.com>

* bitwise and/or/xor mapping

Signed-off-by: raver119 <raver119@gmail.com>

* cuda/cublas version check

Signed-off-by: raver119 <raver119@gmail.com>

* add expected version

Signed-off-by: raver119 <raver119@gmail.com>

* cuda/cublas version check in java

Signed-off-by: raver119 <raver119@gmail.com>

* one more error check

Signed-off-by: raver119 <raver119@gmail.com>

* build fix

Signed-off-by: raver119 <raver119@gmail.com>

* build fix

Signed-off-by: raver119 <raver119@gmail.com>

* build fix

Signed-off-by: raver119 <raver119@gmail.com>

* one more fix

Signed-off-by: raver119 <raver119@gmail.com>

* skip CUDA version check for now

Signed-off-by: raver119 <raver119@gmail.com>

* better wording

Signed-off-by: raver119 <raver119@gmail.com>

* few more tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* few more tweaks

Signed-off-by: raver119 <raver119@gmail.com>
2019-09-04 14:41:08 +03:00
Alex Black 6cc887bee9
Rename flatbuffers DataType to DType (#228)
* Rename flatbuffers DataType enum to DType

Signed-off-by: Alex Black <blacka101@gmail.com>

* Rename flatbuffers DataType enum to DType

Signed-off-by: Alex Black <blacka101@gmail.com>

* Updates for flatbuffers datatype enum renaming

Signed-off-by: Alex Black <blacka101@gmail.com>
2019-09-04 16:36:11 +10:00
raver119 64eaafb4cd remove unwanted noexcept
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-04 08:38:22 +03:00
raver119 7abc574eeb
Snapshot update (#8194)
* fix double consumption of rng on cpu

Signed-off-by: raver119 <raver119@gmail.com>

* Shyrma docs (#222)

* - documenting and profiling matrix_set_diag cuda kernel

Signed-off-by: Yurii <yurii@skymind.io>

* - correct formula of pnorm pooling in cuda 2d/3d kernels
- remove helper matrix_diag which duplicates work of helper matrix_set_diag

Signed-off-by: Yurii <yurii@skymind.io>

* cublasHandle sharing + lock

Signed-off-by: raver119 <raver119@gmail.com>

* cublasHandle sharing + lock

Signed-off-by: raver119 <raver119@gmail.com>

* Documentation from serialization/deserialization in NLP (#221)

* refactoring

Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>

* Javadocs

Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>

* Javadoc fixed

Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>

* Cleanup

Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>

* dedicated lock for getCudaCublasHandle

Signed-off-by: raver119 <raver119@gmail.com>

* Small fixes (#223)

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* ELU DL4J fixes (#224)

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* javadoc (#225)

Signed-off-by: Robert Altena <Rob@Ra-ai.com>

* Small test compilation fix (#226)

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8182 remove spark version suffix (#227)

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* [WIP] Thread safety (#229)

* sync after cublas*gemm

Signed-off-by: raver119 <raver119@gmail.com>

* mutex for CublasHelper

Signed-off-by: raver119 <raver119@gmail.com>

* don't store cublasHandle in LaunchContext, it's per-device anyway

Signed-off-by: raver119 <raver119@gmail.com>

* some printout

Signed-off-by: raver119 <raver119@gmail.com>

* check for field instead

Signed-off-by: raver119 <raver119@gmail.com>

* pew-pew

Signed-off-by: raver119 <raver119@gmail.com>

* don't release ContextBuffers until device changed

Signed-off-by: raver119 <raver119@gmail.com>

* small tweak

Signed-off-by: raver119 <raver119@gmail.com>

* some logging in sgemm

Signed-off-by: raver119 <raver119@gmail.com>

* stream sync

Signed-off-by: raver119 <raver119@gmail.com>

* some more logging

Signed-off-by: raver119 <raver119@gmail.com>

* some more error checks

Signed-off-by: raver119 <raver119@gmail.com>

* one fancy test

Signed-off-by: raver119 <raver119@gmail.com>

* one fancy test

Signed-off-by: raver119 <raver119@gmail.com>

* minor AffinityManager fix

Signed-off-by: raver119 <raver119@gmail.com>

* cudaEvent error logging improvement

Signed-off-by: raver119 <raver119@gmail.com>

* ConstantHelper thread safety

Signed-off-by: raver119 <raver119@gmail.com>

* - minor corrections in ConstantTadHelper

Signed-off-by: Yurii <yurii@skymind.io>

* ConstantShapeHelper thread safety

Signed-off-by: raver119 <raver119@gmail.com>

* ConstantTadHelper.cu updated

Signed-off-by: raver119 <raver119@gmail.com>

* logging off

Signed-off-by: raver119 <raver119@gmail.com>

* logging off

Signed-off-by: raver119 <raver119@gmail.com>
2019-09-03 22:02:02 +03:00
raver119 dddc8a1143
[WIP] Thread safety (#229)
* sync after cublas*gemm

Signed-off-by: raver119 <raver119@gmail.com>

* mutex for CublasHelper

Signed-off-by: raver119 <raver119@gmail.com>

* don't store cublasHandle in LaunchContext, it's per-device anyway

Signed-off-by: raver119 <raver119@gmail.com>

* some printout

Signed-off-by: raver119 <raver119@gmail.com>

* check for field instead

Signed-off-by: raver119 <raver119@gmail.com>

* pew-pew

Signed-off-by: raver119 <raver119@gmail.com>

* don't release ContextBuffers until device changed

Signed-off-by: raver119 <raver119@gmail.com>

* small tweak

Signed-off-by: raver119 <raver119@gmail.com>

* some logging in sgemm

Signed-off-by: raver119 <raver119@gmail.com>

* stream sync

Signed-off-by: raver119 <raver119@gmail.com>

* some more logging

Signed-off-by: raver119 <raver119@gmail.com>

* some more error checks

Signed-off-by: raver119 <raver119@gmail.com>

* one fancy test

Signed-off-by: raver119 <raver119@gmail.com>

* one fancy test

Signed-off-by: raver119 <raver119@gmail.com>

* minor AffinityManager fix

Signed-off-by: raver119 <raver119@gmail.com>

* cudaEvent error logging improvement

Signed-off-by: raver119 <raver119@gmail.com>

* ConstantHelper thread safety

Signed-off-by: raver119 <raver119@gmail.com>

* - minor corrections in ConstantTadHelper

Signed-off-by: Yurii <yurii@skymind.io>

* ConstantShapeHelper thread safety

Signed-off-by: raver119 <raver119@gmail.com>

* ConstantTadHelper.cu updated

Signed-off-by: raver119 <raver119@gmail.com>

* logging off

Signed-off-by: raver119 <raver119@gmail.com>

* logging off

Signed-off-by: raver119 <raver119@gmail.com>
2019-09-03 22:00:38 +03:00
raver119 9d03bb9425 allow atomicAdd for CUDA 10 only
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-03 13:30:16 +03:00
Yurii Shyrma cb4c9377b1 Shyrma docs (#222)
* - documenting and profiling matrix_set_diag cuda kernel

Signed-off-by: Yurii <yurii@skymind.io>

* - correct formula of pnorm pooling in cuda 2d/3d kernels
- remove helper matrix_diag which duplicates work of helper matrix_set_diag

Signed-off-by: Yurii <yurii@skymind.io>
2019-09-02 16:25:58 +03:00
raver119 106524663b fix double consumption of rng on cpu
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-02 15:24:51 +03:00
AlexDBlack 7ded4416cb Merge remote-tracking branch 'fork/master' 2019-09-02 18:52:12 +10:00
raver119 5b8ea3e830 one more tiny cuda fux
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-02 11:49:13 +03:00
raver119 e42c34ca55
[WIP] minor (#218)
* - initial docs commit
- merge* cuda fix

Signed-off-by: raver119 <raver119@gmail.com>

* one more fix

Signed-off-by: raver119 <raver119@gmail.com>

* one more fix

Signed-off-by: raver119 <raver119@gmail.com>
2019-09-02 11:25:48 +03:00
raver119 c34826da4d fixed args
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-01 22:06:01 +03:00
raver119 00cf28f477 get rid of builtin_popcount to please ppc
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-01 19:57:39 +03:00
raver119 3679e55c49 fix bits_hamming_distance for ppc
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-01 19:33:23 +03:00
Yurii Shyrma a35926c6e9 - add parameter alpha to elu and lrelu_bp (#213)
* - add parameter alpha to elu and lrelu_bp

Signed-off-by: Yurii <yurii@skymind.io>

* - forgot to correct header activations.h

Signed-off-by: Yurii <yurii@skymind.io>
2019-08-31 20:57:39 +03:00
raver119 b71c993ded
[WIP] maxpool_bp cuda fix (#212)
* one test for alex

Signed-off-by: raver119 <raver119@gmail.com>

* fix

Signed-off-by: raver119 <raver119@gmail.com>

* get rid of safety offset in cpp

Signed-off-by: raver119 <raver119@gmail.com>

* bfloat16

Signed-off-by: raver119 <raver119@gmail.com>

* minor test rearrangement to fastpath launch

Signed-off-by: raver119 <raver119@gmail.com>

* - atomicAdd/Mul/Div fix for float16/bfloat16 misalignment
- one special test for maxpoolbp java
- safety offset of 8 bytes is back to libnd4j legacy

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-31 20:57:05 +03:00
Yurii Shyrma 00fd50cee2
Shyrma softmax (#209)
* - provide new cuda kernel for softmax

Signed-off-by: Yurii <yurii@skymind.io>

* - further work on cuda kernel for softmax

Signed-off-by: Yurii <yurii@skymind.io>

* - correction cuda kernel for softmax

Signed-off-by: Yurii <yurii@skymind.io>
2019-08-30 20:31:05 +03:00
raver119 bdc3eacafd one small playground test
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-30 20:13:01 +03:00
raver119 70a9ae5068
[WIP] few tweaks (#206)
* scatter empty check

Signed-off-by: raver119 <raver119@gmail.com>

* scatter empty test

Signed-off-by: raver119 <raver119@gmail.com>

* one more test

Signed-off-by: raver119 <raver119@gmail.com>

* two tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* dup tweak

Signed-off-by: raver119 <raver119@gmail.com>

* - put empty checking of indices array immediately prior  helper run

Signed-off-by: Yurii <yurii@skymind.io>

* minor tests fix

Signed-off-by: raver119 <raver119@gmail.com>

* minor tests fix

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-30 16:32:01 +03:00
raver119 1003428a18
[WIP] Int broadcastables (#195)
* Removed invalid resource and fixed tests

Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>

* legacy scalar/pairwise/broadcast int ops

Signed-off-by: raver119 <raver119@gmail.com>

* NDArray int broadcastables

Signed-off-by: raver119 <raver119@gmail.com>

* few more bitwise tests

Signed-off-by: raver119 <raver119@gmail.com>

* java side update

Signed-off-by: raver119 <raver119@gmail.com>

* Argument type changed for shift ops

Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>

* legacy scalar/pairwise/broadcast int ops

Signed-off-by: raver119 <raver119@gmail.com>

* NDArray int broadcastables

Signed-off-by: raver119 <raver119@gmail.com>

* few more bitwise tests

Signed-off-by: raver119 <raver119@gmail.com>

* java side update

Signed-off-by: raver119 <raver119@gmail.com>

* Argument type changed for shift ops

Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
2019-08-30 10:12:40 +03:00
Yurii Shyrma 5395d4fbe5 - rewrite broadcast_dynamic_shape and delete corresponding helpers (#194)
Signed-off-by: Yurii <yurii@skymind.io>
2019-08-29 20:38:02 +03:00
Yurii Shyrma 70af8c2afc Shyrma svd (#191)
* - add one additional test for svd

* - provide float argument in eye op to be a type of output array

Signed-off-by: Yurii <yurii@skymind.io>

* - add cuda capability check to mmulHelper

Signed-off-by: Yurii <yurii@skymind.io>

* - make use another method for divice id evaluation

Signed-off-by: Yurii <yurii@skymind.io>

* Eye data type as T argument

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-28 18:27:08 +03:00
raver119 dec296da17
[WIP] bits_hamming_distance (#192)
* bits_hamming_distance op

Signed-off-by: raver119 <raver119@gmail.com>

* bits_hamming_distance cuda

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-28 18:20:44 +03:00
raver119 f4860574d7
[WIP] More fixes (#190)
* Refactored kernels for segment_max/min/sum ops.

* Refactored segment_prod kernels.

* Refactored segment_prod kernels.

* DynamicPartition test

Signed-off-by: raver119 <raver119@gmail.com>

* Addede linear test for dynamic_partition op.

* Refactored test with int datatype.

* some logging

Signed-off-by: raver119 <raver119@gmail.com>

* some logging

Signed-off-by: raver119 <raver119@gmail.com>

* some logging

Signed-off-by: raver119 <raver119@gmail.com>

* dynamicPartition fix

Signed-off-by: raver119 <raver119@gmail.com>

* get rid of some logging

Signed-off-by: raver119 <raver119@gmail.com>

* one more test for dynamic_stitch

Signed-off-by: raver119 <raver119@gmail.com>

* one more test for dynamic_stitch

Signed-off-by: raver119 <raver119@gmail.com>

* empty check for stitch

Signed-off-by: raver119 <raver119@gmail.com>

* minor print changes

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-28 15:38:57 +03:00
raver119 3157ec110c
[WIP] reverse_sequence (#188)
* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* one more print

Signed-off-by: raver119 <raver119@gmail.com>

* minor fix

Signed-off-by: raver119 <raver119@gmail.com>

* reverse_sequence fix

Signed-off-by: raver119 <raver119@gmail.com>

* confusion_matrix test updated

Signed-off-by: raver119 <raver119@gmail.com>

* minor tweak

Signed-off-by: raver119 <raver119@gmail.com>

* minor tweak

Signed-off-by: raver119 <raver119@gmail.com>

* one more reverse_sequence test

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-28 11:14:22 +03:00
raver119 b472d7d8c8
[WIP] few more fixes (#182)
* one noop test

Signed-off-by: raver119 <raver119@gmail.com>

* skip input validation for no-input ops

Signed-off-by: raver119 <raver119@gmail.com>

* - one more noop empty test
- one more validation before sync

Signed-off-by: raver119 <raver119@gmail.com>

* typo

Signed-off-by: raver119 <raver119@gmail.com>

* one more validation fix

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA empty reductions java side

Signed-off-by: raver119 <raver119@gmail.com>

* one svd test

Signed-off-by: raver119 <raver119@gmail.com>

* Corrected segment_mean helpers and added another test.

* Refactored segment_mean kernels to avoid race_condition.
2019-08-27 21:00:38 +03:00
Yurii Shyrma 2144941313 Shyrma fix2 (#186)
* - further work on layer_norm

Signed-off-by: Yurii <yurii@skymind.io>

* - further work on layer_norm 2

Signed-off-by: Yurii <yurii@skymind.io>

* - correct helpers for svd cuda

Signed-off-by: Yurii <yurii@skymind.io>
2019-08-27 19:57:59 +03:00
shugeo 0849b3c1a4 Shugeo segment fix2 (#185)
* Added test for segment_mean.

* Added another test for segment_mean.

* Fixed segment_* ops helpers for cuda to proper use external data.
2019-08-27 18:25:39 +03:00
raver119 7f0c660d8b
[WIP] HGemm (#181)
* skip string arrays for device validation

Signed-off-by: raver119 <raver119@gmail.com>

* confusion_matrix fix

Signed-off-by: raver119 <raver119@gmail.com>

* exclude cublasHGemm from archs < 530

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-27 15:05:43 +03:00
raver119 0e523490e9
[WIP] confusion (#180)
* skip string arrays for device validation

Signed-off-by: raver119 <raver119@gmail.com>

* confusion_matrix fix

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-27 14:30:37 +03:00
raver119 a49f7c908b
[WIP] More fixes (#178)
* skip string arrays for device validation

Signed-off-by: raver119 <raver119@gmail.com>

* histogram_fixed_width now really supports indexing types

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-27 13:21:01 +03:00
raver119 efbfafe3f7
[WIP] gatherND fix (#176)
* one test for gather_nd

Signed-off-by: raver119 <raver119@gmail.com>

* get rid of old concat tests

Signed-off-by: raver119 <raver119@gmail.com>

* one printf

Signed-off-by: raver119 <raver119@gmail.com>

* one more legacy test removed

Signed-off-by: raver119 <raver119@gmail.com>

* gatherNd launch params fix

Signed-off-by: raver119 <raver119@gmail.com>

* gatherNd launch params fix

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-27 12:35:14 +03:00
raver119 df84bc7255
[WIP] More tweaks (#173)
* CUDA empty reduction

Signed-off-by: raver119 <raver119@gmail.com>

* - listdiff synchronization fix for CUDA
- listdiff test

Signed-off-by: raver119 <raver119@gmail.com>

* - IndexReduce ops now allow INDEXING_TYPES output
- topK op accepts only INDEXING_TYPES as output

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-27 10:37:10 +03:00
raver119 25e5c23eae
[WIP] Error handling (#169)
* CUDA reverse rewrite + couple of tests

Signed-off-by: raver119 <raver119@gmail.com>

* don't throw exception on invalid pointer

Signed-off-by: raver119 <raver119@gmail.com>

* data types validation for fastpath exec mode + 2 tests

Signed-off-by: raver119 <raver119@gmail.com>

* data types validation for fastpath exec mode + 2 tests

Signed-off-by: raver119 <raver119@gmail.com>

* ismax allowed dtypes tweak

Signed-off-by: raver119 <raver119@gmail.com>

* lastErrorCode + lastErrorMessage for native exceptions handling

Signed-off-by: raver119 <raver119@gmail.com>

* exportable ErrorReference

Signed-off-by: raver119 <raver119@gmail.com>

* check error codes in java

Signed-off-by: raver119 <raver119@gmail.com>

* - consume lastErrorCode
- fast_in dtype validation fix

Signed-off-by: raver119 <raver119@gmail.com>

* - sg/cb allowed output type change
- minor logging fix for data type validation

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-26 19:57:51 +03:00
raver119 bb5fc36e5e
[WIP] ops fixes (#168)
* - correct layer_norm

Signed-off-by: Yurii <yurii@skymind.io>

* - further fix of layer norm

Signed-off-by: Yurii <yurii@skymind.io>

* - correct scatter_upd op

Signed-off-by: Yurii <yurii@skymind.io>

* - correct cuda kernel for histogram_fixed_width op

Signed-off-by: Yurii <yurii@skymind.io>

* - delete comments

Signed-off-by: Yurii <yurii@skymind.io>

* enabled one ignored test

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-26 19:37:05 +03:00
Alex Black b417ca21bf
Fix for concat op shape function (empty shapes) (#167)
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-08-26 23:10:28 +10:00
raver119 ece6a17b11
lup context fix (#164)
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-24 16:57:48 +03:00
raver119 841eeb56c5 get rid of context variable
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-24 16:18:38 +03:00
raver119 b091e972ef
- string NDArray flat serde impl + tests (#163)
- string NDArray equalsTo impl

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-24 14:16:34 +03:00
raver119 f8364997c0
[WIP] maxpool2d_bp fix (#160)
* one test for maxpool2d_bp

Signed-off-by: raver119 <raver119@gmail.com>

* - maxpool2d_bp cuda fix for NaNs
- streamSync after each custom op execution

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-24 09:20:57 +03:00
raver119 f03b0ee78f
[WIP] more fixes (#159)
* Added test for MatrixInverse with double input. Fixed matrixDeterminantKernel.

* Fixed kernels to avoid waste templating.

* Fixed logDeterminant kernel.

* Refactored type check for lup'

* - decrease blockDim value for zeta op

Signed-off-by: Yurii <yurii@skymind.io>

* Added print for compound matrix with CUDA.

* Refactored upper matrix invertion kernels.

* - provide move constructor and move assignment operator for OpArgsHoder class

Signed-off-by: Yurii <yurii@skymind.io>

* Refactored usage of launch context.

* - add test for mergemax

Signed-off-by: Yurii <yurii@skymind.io>

* get rid of AveragingArrayProxy

Signed-off-by: raver119 <raver119@gmail.com>

* Refactoring of LUP inversion.

* Added prints for invertion.

* - add OpArgsHolder copy constructor and assignment operator

Signed-off-by: Yurii <yurii@skymind.io>

* Added test for lower inversion

* - fix bug in upsampling2d/3d_bp op

Signed-off-by: Yurii <yurii@skymind.io>

* Added expensive printfs to kernel.

* Refactored expensive kernel prints.

* Refactored expensive printfs

* - remove nullify

Signed-off-by: Yurii <yurii@skymind.io>

* Eliminated waste prints with tests.

* upsampling2d_bp test

Signed-off-by: raver119 <raver119@gmail.com>

* test updated

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-23 19:20:50 +03:00
raver119 99cdf6d42b - cpu isMax fix for multidim case + test
- INDArray.wasClosed() fix for empty array edge case

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-23 18:44:37 +03:00
raver119 fb8de5006f - concat empty scalar fix
- couple of tests for empty scalar concat

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-23 13:16:50 +03:00
raver119 729dc5e879
[WIP] size etc (#155)
* one test for size

Signed-off-by: raver119 <raver119@gmail.com>

* - few tests for size op
- size/rank/size_at ops now use p instead of assign

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-23 12:31:12 +03:00
raver119 243bf866c4
[WIP] Few fixes (#153)
* throw exception if op execution failed

Signed-off-by: raver119 <raver119@gmail.com>

* expected for test

Signed-off-by: raver119 <raver119@gmail.com>

* one more ismax test

Signed-off-by: raver119 <raver119@gmail.com>

* ismax view fix

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-23 09:00:10 +03:00
raver119 930b49e87f
[WIP] DeviceLocalNDArray updates (#149)
* ContextBuffers are released upon device change

Signed-off-by: raver119 <raver119@gmail.com>

* DeviceLocalNDArray updates + tests

Signed-off-by: raver119 <raver119@gmail.com>

* special array for delayed mode

Signed-off-by: raver119 <raver119@gmail.com>

* additional detach()

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-22 20:01:29 +03:00
Alex Black e855e47f73
More fixes (#148)
* Small batch norm fix (cuda/no-mkldnn)

Signed-off-by: Alex Black <blacka101@gmail.com>

* Dropout fix for RnnOutputLayer

Signed-off-by: Alex Black <blacka101@gmail.com>

* Allow block size < 2 in batch_to_space_nd and space_to_batch_nd for import, in spite of what TF docs say

Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-08-22 19:55:27 +10:00
raver119 eea3062ccf
[WIP] stb/bts nd (#144)
* - start working on space_to_batch_nd

Signed-off-by: Yurii <yurii@skymind.io>

* - provide cpu helper for space_to_batch_nd op

Signed-off-by: Yurii <yurii@skymind.io>

* few typos fixed

Signed-off-by: raver119 <raver119@gmail.com>

* - add tests for space_to_batch and correct bugs

Signed-off-by: Yurii <yurii@skymind.io>

* - write cuda kernel for space_to_batch op

Signed-off-by: Yurii <yurii@skymind.io>

* - add order argument to shape::index2coords method in convolution cuda ops

Signed-off-by: Yurii <yurii@skymind.io>

* - restore some previous code

Signed-off-by: Yurii <yurii@skymind.io>

* old col2im kernel activated

Signed-off-by: raver119 <raver119@gmail.com>

* - change coords calculation in col2im kernel

Signed-off-by: Yurii <yurii@skymind.io>

* - restore old col2im kernel

Signed-off-by: Yurii <yurii@skymind.io>

* - add custom op for batch_to_space

Signed-off-by: Yurii <yurii@skymind.io>

* - provide cpu version for batch_to_space_nd op

Signed-off-by: Yurii <yurii@skymind.io>

* - provide cuda kernel for batch_to_space_nd op

Signed-off-by: Yurii <yurii@skymind.io>
2019-08-21 21:11:46 +03:00
raver119 e604ffe0d2
[WIP] repeat op (#143)
* - write new repeat helper (cpu)

Signed-off-by: Yurii <yurii@skymind.io>

* - update NDArray::cpu

Signed-off-by: Yurii <yurii@skymind.io>

* - update NDArray::repeat cuda

Signed-off-by: Yurii <yurii@skymind.io>
2019-08-21 21:10:29 +03:00
raver119 3cf72e5e30
[WIP] More fixes (#142)
* atomicAdd cc 70+

Signed-off-by: raver119 <raver119@gmail.com>

* additional 8 bytes alocation

Signed-off-by: raver119 <raver119@gmail.com>

* missed include 2019

Signed-off-by: raver119 <raver119@gmail.com>

* less spam

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-21 20:18:29 +03:00
raver119 d9ab299759
[WIP] Minor fixes (#140)
* - Tile java shape fn removed
- Tile 0 validation added
- scatter_upd test

Signed-off-by: raver119 <raver119@gmail.com>

* additional tile validation

Signed-off-by: raver119 <raver119@gmail.com>

* - provide vector case in cuda scatter op

Signed-off-by: Yurii <yurii@skymind.io>

* cpu ismax view fix

Signed-off-by: raver119 <raver119@gmail.com>

* exp

Signed-off-by: raver119 <raver119@gmail.com>

* cuda ismax fix

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-21 15:05:47 +03:00
raver119 77805cb7fa
[WIP] cpu ismax fix (#137)
* cpu ismax fix

Signed-off-by: raver119 <raver119@gmail.com>

* bunch of smaller scalar tests

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-21 10:12:11 +03:00
raver119 269d508ba5
[WIP] cross-device migrations (#134)
* two more tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA device afinity tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* minor tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* prepareAction/registerAction for CustomOps

Signed-off-by: raver119 <raver119@gmail.com>

* lazy allocate host bufer before relocation

Signed-off-by: raver119 <raver119@gmail.com>

* one special test for migration in cpp

Signed-off-by: raver119 <raver119@gmail.com>

* tests update for msvc

Signed-off-by: raver119 <raver119@gmail.com>

* logging

Signed-off-by: raver119 <raver119@gmail.com>

* stick to old col2im impl

Signed-off-by: raver119 <raver119@gmail.com>

* cudaStreams reorganization

Signed-off-by: raver119 <raver119@gmail.com>

* buffer size fix

Signed-off-by: raver119 <raver119@gmail.com>

* c++ data migration

Signed-off-by: raver119 <raver119@gmail.com>

* fix CropAndResize test

Signed-off-by: raver119 <raver119@gmail.com>

* - minor improvment

Signed-off-by: Yurii <yurii@skymind.io>
2019-08-20 18:52:41 +03:00
raver119 23c8738d4a
syncthreads (#136)
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-20 18:28:43 +03:00
AlexDBlack 01cb57041a Merge
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-08-19 18:46:47 +10:00
raver119 aceb915557
[WIP] tests fixes (#130)
* no openmp for ClipByGlobalNorm

Signed-off-by: raver119 <raver119@gmail.com>

* one more bfloat16 rng test

Signed-off-by: raver119 <raver119@gmail.com>

* assertion fix

Signed-off-by: raver119 <raver119@gmail.com>

* - legacy IsMax gone
- linear IsMax gets shapeInfo argument

Signed-off-by: raver119 <raver119@gmail.com>

* get rid of legacy IsMax tests

Signed-off-by: raver119 <raver119@gmail.com>

* IsMax is custom op now

Signed-off-by: raver119 <raver119@gmail.com>

* more blocks for ismax

Signed-off-by: raver119 <raver119@gmail.com>

* one more test

Signed-off-by: raver119 <raver119@gmail.com>

*  - sqrt test
 - some legacy code removed from CudaExecutioner
 - Transforms.asin tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* - TransformFloat fix

Signed-off-by: raver119 <raver119@gmail.com>

* - ismax fix
- SpaceToBatchND/BatchToSpaceND wrappers
- couple of legacy tests removed

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-19 11:33:15 +03:00