Commit Graph

228 Commits (22141759345fbb464e19be943c54f2000a3978b0)

Author SHA1 Message Date
Abdelrauf 69d91e272a
- new implementations for Index Reductions (#421)
* - new implementations for Index Reductions
- small fix in the legacy reduction
- disabled index reduction bench tests inside Playground

Signed-off-by: Abdelrauf <rauf@konduit.ai>

* Allow LIBND4J_TYPES

Signed-off-by: Abdelrauf <rauf@konduit.ai>

* index reduction stuff split into bunch of units

* meh

* IMax switched to new impl

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* minor fix + test

* minor fix

* index range fix

Signed-off-by: Abdelrauf <rauf@konduit.ai>

* noop on empty outputs

* minor fix

* minor fix

Signed-off-by: Abdelrauf <rauf@konduit.ai>

* ArgMax replaces IMax

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* argmax/argmin/argamax/argamin shape functions updated

* ArgAmax/ArgAmin/ArgMin replaces IAMax/IAMin/IMin

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* argmax/argmin/argamax/argamin CUDA

* IMax replaced in dl4j

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* Codegen output

* imports fixed

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* fix compilation issue

Signed-off-by: Abdelrauf <rauf@konduit.ai>

* Auto-generate compilation units

Signed-off-by: Abdelrauf <rauf@konduit.ai>

* Should fix NDArray refactored function calls in indexReductions.cu

Signed-off-by: Abdelrauf <rauf@konduit.ai>

Co-authored-by: raver119@gmail.com <raver119@gmail.com>
Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
2020-05-14 13:41:55 +03:00
raver119 c396fcb960
More pre-release fixes (#456)
* - numPrefixBlocks fix for threshold_encoding
- temparrays pointers fixed

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* auto configuration of memory workspace for gradients sharing

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* limit sparse encoding message size

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* one more workspace test

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* one more CUDA-specific test

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* one more CUDA-specific workspace test

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* one more CUDA-specific workspace test

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* one more CUDA-specific workspace test

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* add separate host/device reset for circular workspace mode

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* new PW builder method for encoder memory amount

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* "inplace" execution for threshold encoding

Signed-off-by: raver119@gmail.com <raver119@gmail.com>
2020-05-13 08:12:07 +03:00
raver119 7d6f628b88
stop using static library for tests (#450)
Signed-off-by: raver119 <raver119@gmail.com>
2020-05-12 07:48:29 +03:00
Yurii Shyrma 76f3553679
Shyrma merge max ind (#443)
* - provide correct possible output types in mergeMaxIndex op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - cleaning up the unneeded backprop arg in reverse_bp op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - improve clipByNorm both ff and bp

Signed-off-by: Yurii <iuriish@yahoo.com>

* - implementation and testing clipByAvgNorm_bp op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - pass biases in any way in dnnl lstm op, they are zeros when user doesn't provide them to us

Signed-off-by: Yurii <iuriish@yahoo.com>

* - start working on mkldnn concat op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on mkldnn concat

Signed-off-by: Yurii <iuriish@yahoo.com>

* missing declaration fix

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* - polishing mkl ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - testing and fixing bugs in mkl concat op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - fix linkage error for windows cuda build

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further conflicts resolving with master

Signed-off-by: Yurii <iuriish@yahoo.com>

* - fix format tags in mkldnn matmul op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide additional type cast in clip.cu

Signed-off-by: Yurii <iuriish@yahoo.com>

* - finally bug in mkldnn tanh_bp was caught

Co-authored-by: raver119@gmail.com <raver119@gmail.com>
2020-05-12 07:47:09 +03:00
raver119 f0adb6f788
[WIP] minor fixes (#447)
* couple of tests disabled

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* few syncs removed, some logging added

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* some logging added

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* some logging added

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* fix min num_threads

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* fixed wrong release function for scalarPointer

Signed-off-by: raver119@gmail.com <raver119@gmail.com>
2020-05-11 15:41:43 +03:00
raver119 320924278d
Legacy API changes (#441)
* initial commit

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* another initial commit

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* another initial commit

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* one more initial commit

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* next step

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* next step

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* next step

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* next step

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* Refactored buffer() and shapeInfo() methods usage with NDArray class.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopt Graph class methods to use const shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopt choose op to use constant shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopt where op shape method to use constant shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopt lstsq op to use constant empty shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopt matrix_diag_part op shape routine to use constant shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopt determinant ops to use constant shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopt mean_pairwssqerr_loss ops to use constant shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopt ops shape methods.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopt shape methods for loss ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopt log_loss op shape method.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopt shape methods for ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopt dilation2d ops shape methods.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopted deconv2d ops shape methods.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopted dynamicRNN op shape method.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopted shape methods for ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopted shape methods for lstm layer ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* few updates

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* first cuda tweak

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* Adopt constant shapes for sconv2d ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopt constant shapes for gru ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopt constant shapes with shape methods for segment ops and so on.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopted constant shapes with unsorted_segment_* ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopted constant shapes with gamma op shape method.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopted shape methods of reduce_stddev ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopted shape methods for reduce_* ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopt shape method for squeeze op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopt strided_slice shape method.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored concat op shape method to adopt constant shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopted shape method for mirror_pad op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopted split op shape method.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopted tile ops shape methods.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added const cast for mkldnn routines handles.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored logSoftMaxForVector_ routine to conform with proper data and shape pointer casts.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Cosmetic changes to proper usage of constant pointers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored a couple shape comparators for strides and addBias helpers to proper use data pointers with inplace option.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored depthToSpace helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored histogram helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored im2col helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored gather and gatherND helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed buffer usage on percentile helper.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed gather shape with helpers and range buffer usage.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed buffer usage with space to depth helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed buffer usage and constant shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed buffer usage with LUP decomposition>

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored onehot_ helper.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored pad and prefix to use constant shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactoed softmax helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed space to batch helpers to use buffers properly.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed stack and split helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed buffer usage with sparse to dense helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed buffer usage with mindistance_ helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed buffer usage with tile helper.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed constant shape usage.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed constant shape usage with legacy pairwise bool ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored a couple of methods to adopt constant shape usage.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed broadcasting with constant shape."

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed const usage with inplace reverse and constant shapes with legacy reduction.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored legacy ops with const shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored sort to adopt constant shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected sort for constant shape usage.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed constant shape usage with special methods.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored Context to conform with constant shape usage.

Signed-off-by: shugeo <sgazeos@gmail.com>

* CUDA broadcasting headers

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* pairwise/indexreduce/random headers

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* Refactored native ops to adopt constant shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* legacy reduce3/scalar headers

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* Corrected pullRow signature and tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected routines to proper use of constant shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored tests to use constant shapes properly.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored legacy ops tests to use constant shapes properly.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored buffer usage with NDArray tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed native ops tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed special concat routine.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed buffer usage with test.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed buffer usage with a test.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored TAD.h and tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored calcStrides* routines to use constant shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed miscelaneous errors with constant shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* NativeOps const changes

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* Corrected definitions for declared functions.

Signed-off-by: shugeo <sgazeos@gmail.com>

* NativeOps const changes

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* few more const changes

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* Fixed const shapes with shape routines.

Signed-off-by: shugeo <sgazeos@gmail.com>

* few more const changes

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* Fixed shape method for broadcastable case.

Signed-off-by: shugeo <sgazeos@gmail.com>

* few more const changes

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* xw_plus_b BP shape fn restored

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* Fixed signatures with broadcasting.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Repaired backprops shape methods for a set of operations.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored broadcast bool for cuda.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored methods for 3 args with const qualifier.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed a couple of kernel signatures for broadcasting.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed kernels signatures for const buffers and shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored pairwise methods to persistent buffers and shapes usage.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopt const to buffers and shapes with kernels.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopt const to buffers and shapes with scalar kernels.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored indexreduce kernels signatures to use const buffers and shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored pairwise kernels to adopt cons shapes and buffers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored pairwise bool kernels to adopt cons shapes and buffers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored random special ops to conform with const shapes and buffers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored native ops to conform with const shapes and buffers under cuda platform.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Cosmetical changes only.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed const shapes and buffers error.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected start pos routine.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored methods to conform with const shapes and buffers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored helpers to use proper methods instead.

Signed-off-by: shugeo <sgazeos@gmail.com>

* bunch of changes

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* next bunch of changes

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* next bunch of changes

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* Fixed execScalar declaration.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed execScalar declaration.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected const shape cases with sort and so on.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed const shapes for sort.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored kernel declarations to adopt const shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed kernels declarations to adopt const shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected kernel declarations to adopt const shapes and buffers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed kernels declarations to adopt const shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed segment helpers kernels declarations and so on to adopt const shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed const shape usage with segment and solve helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed kernel declaration with adjustWeight helper.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed cuda implementations for constant shape helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopted const shape usage with kernels.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adopted top_k kernels to use const shapes and buffers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected kernels declarations to adopt const shapes with helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored NDArray definitions to adopt const shapes and buffers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed const shapes with image suppression helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Slight improvement with buffers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored buffer usage.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored buffer usage with tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed const shape usage with definitions.

Signed-off-by: shugeo <sgazeos@gmail.com>

* minor updates on cpu side

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* Refactored const shape usage with ConstantDescritor and native ops with cuda platform.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored tear and tile kernels to adopt with const shapes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* softmax_loop fix

Signed-off-by: raver119 <raver119@gmail.com>

* update missing signature

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* softmax again

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* few more missing consts

Signed-off-by: raver119 <raver119@gmail.com>

* new methods updated

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

Co-authored-by: shugeo <sgazeos@gmail.com>
2020-05-09 08:06:14 +03:00
raver119 0613485654
compression ops (#436)
* Added declarations for decode/encode_bitmap ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added implementation for bitmap encoding/decoding ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added helpers for encode/decode bitmap ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored encodingBitmap helper.

Signed-off-by: shugeo <sgazeos@gmail.com>

* threshold encode/decode skeleton

* helper skeleton

* minor import fix

* encoder shape fn & op impl

* thresholdEncode cpu impl

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* thresholdDecode cpu impl

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* Only cosmetical changes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* placeholder

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* Added cuda implementation for bitmap decode helper.

Signed-off-by: shugeo <sgazeos@gmail.com>

* cuda thresholdEstimate

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* cuda thresholdDecode

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* next step

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* - nano cmakelist update (get rid of Clion section)
- fixed forgotten throw in AtomicTests

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* thesholdEncode cuda impl

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* Added tests for bitmap encoding/decoding ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed tests for encode/decode bitmaps.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored decode/encode helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed crashes with bitmap decode/encode helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* bitmap encode/decode CPU

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* bitmap encode/decode CUDA

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* C API removed for threshold/bitmap encode

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* EncodeBitmap/DecodeBitmap Java side

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* EncodeThreshold/DecodeThreshold Java side

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* EncodeThreshold/DecodeThreshold Java side

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* few more tests for threshold encoding

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* minor test tweak

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* two special tests

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* encodeBitmap CPU fix

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* parallel_long/parallel_double proper spans fix

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* encodeThreshold CUDA fix

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* nano fix

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* grid tweaks

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* RTX adaptation for thresholdEncode

Signed-off-by: raver119 <raver119@gmail.com>

* don't allow threshold encoding for length < 2

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* get rid of NDArrayCompressor in EncodingHandler

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* one more minor update of EncodingHandler

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* one more minor tweak of EncodingHandler

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* - matmul allows integer data types use
- EncodingHandler boundary default value
- few tests for integer matmul

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* minor fix of CUDA bitmap encode

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* boundary changed to integer everywhere

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* boundary changed to integer everywhere

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* re-enable CUDA deallocator

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* threshold encoder fix for systems without omp

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* - encode_threshold now requires non-negative boundary
- minor tweak in EncodingHandler

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* restore parallelism in decode_bitmap

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* fall back to omp for encode_bitmap cpu

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* single time casts

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* - additional test for encode_threshold
- sync buffers to device before calling for shape function

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

Co-authored-by: shugeo <sgazeos@gmail.com>
2020-05-08 20:59:39 +03:00
raver119 c9d1454743
MKLDNN tweaks (#415)
* one simple test

Signed-off-by: raver119 <raver119@gmail.com>

* fix

Signed-off-by: raver119 <raver119@gmail.com>

* hmmmm...

Signed-off-by: raver119 <raver119@gmail.com>

* mkl matmul skip tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* minor fix for MemoryTracker

* long shapes in matmul

* - 2 new tests for mkldnn tanh
- mkldnn isn't used for scalar tanh
2020-04-27 17:37:53 +03:00
shugeo fc3f5d4ffb
Shugeo exponential distribution infinities fix (#403)
* Fixed bound problem with Exponential distribution implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added test for Exponential distribution to avoid infinities.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added a test for exponential distribution with 1M elements.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Cosmetical changes only and tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Modified test and implementation for exponential_distribution op.

Signed-off-by: shugeo <sgazeos@gmail.com>

Co-authored-by: raver119 <raver119@gmail.com>
2020-04-22 12:12:00 +03:00
shugeo a5db0e33be
Shugeo segment fix4 (#385)
* Added test and fixed error message for unsorted_segment_sqrt_n op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed error message for unsorted_segment_* ops when 1 segment is given.

Signed-off-by: shugeo <sgazeos@gmail.com>
2020-04-20 09:04:35 +03:00
Oleh 3d15706ffa
Lin_space operation improve (#373)
* libnd4j update linspace op

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j #8513 update lin_space op, tests added

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* - minor linspace tweaks (num_elements now iArg)
- java linspace updates
- couple of additional tests for linspace

Signed-off-by: raver119 <raver119@gmail.com>

* roll back timeout change

Signed-off-by: raver119 <raver119@gmail.com>

Co-authored-by: raver119 <raver119@gmail.com>
2020-04-16 14:53:56 +03:00
Yurii Shyrma 4247718f61
Shyrma gru bp (#377)
* - update gru ff op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - implementation and testing gru_bp op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - neglect dependencies between dLdh/dLdhLast/dLdcLast in lstmLayer backprop

Signed-off-by: Yurii <iuriish@yahoo.com>
2020-04-16 08:09:04 +03:00
Yurii Shyrma 23e4aa99ad
Shyrma lstm layer bp (#370)
* - start working on bp for lstm

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further working on bp for lstmLayer

Signed-off-by: Yurii <iuriish@yahoo.com>

* - minor change

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on bp for lstmLayer 2

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on bp for lstmLayer 3

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on bp for lstmLayer 4

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on bp for lstmLayer 5

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on bp for lstmLayer 6

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on bp for lstmLayer 7

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on bp for lstmLayer 8

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on bp for lstmLayer 9

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide lstmLayerCell and lstmLayerCellBp as separate CUSTOM_OPs

Signed-off-by: Yurii <iuriish@yahoo.com>

* - testing and fixing lstmLayerCellBp helper

Signed-off-by: Yurii <iuriish@yahoo.com>

* - implement lstmLayerCellBp as separate op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - implement lstmLayerBp as separate op (not tested)

Signed-off-by: Yurii <iuriish@yahoo.com>

* - fixing calculations of dLdWp and dLdb in lstmLayerCellBp

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on bp for lstmLayer 10

Signed-off-by: Yurii <iuriish@yahoo.com>

* - fixing typo in lstmLayerTimeLoop

Signed-off-by: Yurii <iuriish@yahoo.com>

* - forgot to perform clipping of c array and calculate corresponding derivative in lstmLayerCellBp

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on bp for lstmLayer 10

Signed-off-by: Yurii <iuriish@yahoo.com>

* - testing and fixing bugs in lstmLayer_bp op 1

Signed-off-by: Yurii <iuriish@yahoo.com>

* - testing and fixing bugs in lstmLayer_bp op 2

Signed-off-by: Yurii <iuriish@yahoo.com>

* - turn off heavy tests for cuda for lstmLayer_bp op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - forgot to nullify gradients at eliminated time steps (when sequnce length array is present )

Signed-off-by: Yurii <iuriish@yahoo.com>
2020-04-13 13:21:51 +03:00
raver119 3e2dbc65dd
MatMul for gemm/gemv calls (#365)
* libnd4j added optional alpha and beta support to matmul

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j typos fixes

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j add optional alpha and beta to matmul_bp

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j one more typo fix

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j added optional alpha and beta to mkl implementation

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* MatMul alpha/beta on java side

Signed-off-by: raver119 <raver119@gmail.com>

* alpha/beta fix in libnd4j

Signed-off-by: raver119 <raver119@gmail.com>

* alpha/beta fix in matmul_bp

Signed-off-by: raver119 <raver119@gmail.com>

* restored view validation

Signed-off-by: raver119 <raver119@gmail.com>

* gemv/gemm now use MatMul op

Signed-off-by: raver119 <raver119@gmail.com>

* few tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* additional INDArray.mmul signature

Signed-off-by: raver119 <raver119@gmail.com>

* make C order default for INDArray.mmul, unless both A/B have F order

Signed-off-by: raver119 <raver119@gmail.com>

* Nd4j.gemm validation fix

Signed-off-by: raver119 <raver119@gmail.com>

* disable mkldnn matmul for xxf with beta != 0 case

Signed-off-by: raver119 <raver119@gmail.com>

* SimpleRnn workspace fix + timeouts

Signed-off-by: Alex Black <blacka101@gmail.com>

* two more tests + minor fix in matmul platform check

Signed-off-by: raver119 <raver119@gmail.com>

* Flaky test fixes

Signed-off-by: Alex Black <blacka101@gmail.com>

* propagate testresources profile

Signed-off-by: raver119 <raver119@gmail.com>

* Resources fix + flaky test fix

Signed-off-by: Alex Black <blacka101@gmail.com>

Co-authored-by: Oleg <oleg.semeniv@gmail.com>
Co-authored-by: Alex Black <blacka101@gmail.com>
2020-04-10 17:57:02 +03:00
raver119 04b2b4f9b6
Few fixes (#361)
* INDArray.close() fix for CPU

Signed-off-by: raver119 <raver119@gmail.com>

* - BroadcastableBoolOp introduced
- ConfusionMatrix now supports explicit DataType argument

Signed-off-by: raver119 <raver119@gmail.com>

* confusion_matrix: dtype is still optional

Signed-off-by: raver119 <raver119@gmail.com>

* disable bert tests in debug builds

Signed-off-by: raver119 <raver119@gmail.com>

* Affinity fix

Signed-off-by: raver119 <raver119@gmail.com>

* minor workspace tweak to allow close() on scoped out borrowed workspace

Signed-off-by: raver119 <raver119@gmail.com>
2020-04-06 21:01:59 +03:00
Yurii Shyrma 48102c61d0
- correct reshape op for empty shapes (#354)
* - correct reshape op for empty shape in case of -1 at the end

Signed-off-by: Yurii <iuriish@yahoo.com>

* Fix test + new reshape op constructor

Signed-off-by: Alex Black <blacka101@gmail.com>

Co-authored-by: Alex Black <blacka101@gmail.com>
2020-04-01 15:13:34 +11:00
Oleh 1d004b542a
xw_plus_b mkldnn implementation (#247)
* libnd4j first step of mkldnn for xw_plus_b and test of aurora crash in imageHelper

* libnd4j sync folders with master

* libnd4j merge master, raw implementation of xw_plus_b on mkldnn, clean up, need testing and adding checks for corresponded input shapes

* libnd4j corrections and checks added to xw_plus_b mkl

* libnd4j corrected dataType description based on mkl operation description, need more investigation

* libnd4j fixe xw_blus_b mkl implementation, need testing

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j two unit tests added

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed check input dimensions bug

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libndj4 one more test added to cover different order handling

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j added optional int arg support to define weights format, if arg == 1, mkldnn (do not need transpose in mkldnn implementation), else mmul weights format, corrected check points, added unit test

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j merge master

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j some improvements to avoid NDArray transpose in xw_plus_b operation

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed issues connected with weights rank, also added support of one case based on tf (for mkldnn, cpu, cuda), test case added

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j added proper handling of empty inputs (all implementations)

* libnd4j fixed compilation error

* libnd4j several more corrections after conflict solve and fixed typos

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j removed unsupported data types

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j merge master and fixed issues

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j added propagation implementation for xw_plus_b, fixed issue connected with mkl weights data format, avoided data copy in transpose mode, test cases added, manually tested with gradCheck

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j one minor fix of double operation declaration

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j code clean up

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j minor tests fixes

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed build problem, integrate helpers changes

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

Co-authored-by: raver119 <raver119@gmail.com>
2020-03-31 13:03:10 +03:00
Yurii Shyrma 29e61579c1
Shyrma reshape empty (#338)
* - start working on reshape op which operates with empty shapes

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct reshaping for empty arrays

Signed-off-by: Yurii <iuriish@yahoo.com>

* - remove unnecessary check in Loopkind

Signed-off-by: Yurii <iuriish@yahoo.com>
2020-03-31 07:41:16 +03:00
Oleh e8cbf5255a
Backpropagation implementation of mergemax, mergeadd and mergeavg ops (#343)
* libnd4j: first step of merge_max implementation

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed typos

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j some corrections for mergeMaxBp

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j some minor corrections

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j test added for mergemax_bp

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed several problems tests added, check with gradCheck

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j remove duplicated tests

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j split implementation of transforms ops into separate file implementation

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j code clean up, added mergeavg_bp and mergeadd_bp, need testing

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j merge master, fixed typos and added tests

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j some minor fixes

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j added helper for mergeAddBp operation, this permits to skip nullify

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j file renaming changes and cuda some corrections, need some additional corrections

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j some additional corrections for merge ops

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j more corrections per request for cuda more proper usage

Signed-off-by: Oleg <oleg.semeniv@gmail.com>
2020-03-25 08:40:30 +03:00
Oleh 69c92ca5ae
Learning updaters for gradient (#335)
* libnd4j raw implementation of sgd upader

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j some corrections and simple test added

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j some corrections after discussion

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j integrate applyScalar

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j raw implementation of rmsPropUpdater on cpu

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fix operations declaration

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j rmsPropUpdater added, test cases for sgd, etc

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed several typos

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j some fixes and improvements for rmsPropUpdater based on Java tests

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed cuda implementation, update tests and corrected behavior according java tests

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j adaGrad updater added

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j one minor fix for ada grad

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j several more fixes for ada_grad

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j nesterovs updater added

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed nesterovs updater behavior, several typos and rename file

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j one minor typo

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j ada max updater added

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed several typos in adaMax updater

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed several typos in adaMaxUpdater

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j several fixes for adaMax, added Adam Updater

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j adaDeltaUpdater added, minor fixes for adamUpdater

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j several fixes for adaDeltaUpdater

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j nadamUpdater added

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j one more correction for nadam updater

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j several fixes for nadam updater and added amsGradUpdater

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j several typos fixed in amsGradUpdater

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j some corrections and added f order support rmsProp updater

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j added support of f order for all updaters and modify tests for testing in place

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed issues for updates when not in place mode used, added tests for f order

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j added input shape checks

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j some corrections for different cases handling

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j some code clean up and optimize per request

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j updaters refactoring after review

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* SgdUpdater wrapper

Signed-off-by: raver119 <raver119@gmail.com>

* first test

Signed-off-by: raver119 <raver119@gmail.com>

* RmsPropUpdater added

Signed-off-by: raver119 <raver119@gmail.com>

* NadamUpdater + NesterovsUpdater

Signed-off-by: raver119 <raver119@gmail.com>

* AmsGradUpdater

Signed-off-by: raver119 <raver119@gmail.com>

* AdamUpdater added

Signed-off-by: raver119 <raver119@gmail.com>

* AdaGradUpdater + AdaDeltaUpdater + AdaMaxUpdater

Signed-off-by: raver119 <raver119@gmail.com>

* AdaGradUpdater test added

Signed-off-by: raver119 <raver119@gmail.com>

* libnd4j remove input parameters parsing through NDArray, split implementation of helpers to separate files, added some rename, etc

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j next step to split operations implementation into separate files

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j merge master and minor corrections

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j revert some changes of split implementation

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j forgot to add header file

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* public default constructors

Signed-off-by: raver119 <raver119@gmail.com>

* ImportClassMapping updated

Signed-off-by: raver119 <raver119@gmail.com>

Co-authored-by: raver119 <raver119@gmail.com>
2020-03-23 07:28:31 +03:00
Yurii Shyrma e700b59f80
Shyrma weights format (#329)
* - start to introduce additional weights formats into conv2d ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide weights format variety in backprop conv2d and deconv2d ops, testing and fixing bugs

Signed-off-by: Yurii <iuriish@yahoo.com>

* - forgot to recover kernels sizes in deconv2d_bp test

Signed-off-by: Yurii <iuriish@yahoo.com>

* - built in weights format in depthwise conv 2d op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide new weights formats in mkl dnn conv ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide new weights formats in cuda conv helpers

Signed-off-by: Yurii <iuriish@yahoo.com>

* - working with new weights format in cudnn conv api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - take into account order of arrays in cudnn tensor descriptions

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide new weights formats in cpu conv3d (ff/bp)

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide new weights formats in cpu deconv3d (ff/bp)

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide new weights formats in conv3d ops (ff/bp) based on mkl api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide new weights formats in conv3d ops (ff/bp) based on cudnn api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - resolve conflicts 2

Signed-off-by: Yurii <iuriish@yahoo.com>

Co-authored-by: raver119 <raver119@gmail.com>
2020-03-20 12:11:27 +03:00
shugeo 5dae4069cf
Shugeo random expo fix2 (#295)
* Refactored exponential distribution implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored exponential distribution and tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored test to new result sets.

Signed-off-by: shugeo <sgazeos@gmail.com>
2020-03-20 11:33:20 +03:00
raver119 7a2ac800dd
Nullify (#304)
* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* bunch of tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* hamming distance nullification

Signed-off-by: raver119 <raver119@gmail.com>

* Add output array value assignment for testing/debugging

Signed-off-by: Alex Black <blacka101@gmail.com>

* don't assign empty arrays

Signed-off-by: raver119 <raver119@gmail.com>

* conv2d/conv3d/depthwise2d nullified

Signed-off-by: raver119 <raver119@gmail.com>

* conv2d/conv3d/depthwise2d nullified

Signed-off-by: raver119 <raver119@gmail.com>

* conv2d/conv3d/depthwise2d nullified

Signed-off-by: raver119 <raver119@gmail.com>

* few more fixes

Signed-off-by: raver119 <raver119@gmail.com>

* im2col

Signed-off-by: raver119 <raver119@gmail.com>

* pooling?

Signed-off-by: raver119 <raver119@gmail.com>

* more nullified

Signed-off-by: raver119 <raver119@gmail.com>

* ismax nullified

Signed-off-by: raver119 <raver119@gmail.com>

* rollback ismax nullification

Signed-off-by: raver119 <raver119@gmail.com>

* synchronized cublas handle use on per-device basis

Signed-off-by: raver119 <raver119@gmail.com>

* hiding method from jcpp

Signed-off-by: raver119 <raver119@gmail.com>

* get rid of test assigns in DeclarableOp

Signed-off-by: raver119 <raver119@gmail.com>

* get rid of assigns

Signed-off-by: raver119 <raver119@gmail.com>

* proper deviceId is back

Signed-off-by: raver119 <raver119@gmail.com>

* include fixed

Signed-off-by: raver119 <raver119@gmail.com>

Co-authored-by: Alex Black <blacka101@gmail.com>
2020-03-20 08:49:28 +03:00
raver119 77244f5496
avg/max pooling3d bp fixed (#323)
Signed-off-by: raver119 <raver119@gmail.com>
2020-03-16 18:17:42 +03:00
raver119 4cf2afad2b
benchmarks fixes (#321)
* bunch of small fixes

Signed-off-by: raver119 <raver119@gmail.com>

* validation for legacy random op

Signed-off-by: raver119 <raver119@gmail.com>

* get rid of test

Signed-off-by: raver119 <raver119@gmail.com>
2020-03-16 10:31:06 +03:00
Oleh e7a995e959
Tanh backpropagation mkldnn implementation (#308)
* libnd4j first step of tanh_bp operation implementation on mkldnn

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j optimize several places and added test case for tanh_bp

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j minor corrections and renaming, added one more test case

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j missed mkldnn data format definition

Signed-off-by: Oleg <oleg.semeniv@gmail.com>
2020-03-13 19:01:00 +03:00
Yurii Shyrma e42b4e96c3
correct output empty shapes deducing in split op (#311)
* - correct output empty shapes deducing in split op

Signed-off-by: Yurii <iuriish@yahoo.com>

* java test fixed

Signed-off-by: raver119 <raver119@gmail.com>

* - split broadcast::exec function on individual functions corresponding to switch arg

Signed-off-by: Yurii <iuriish@yahoo.com>

* - split broadcast::exec _int and _bool function on individual functions corresponding to switch arg

Signed-off-by: Yurii <iuriish@yahoo.com>

Co-authored-by: raver119 <raver119@gmail.com>
2020-03-12 18:25:54 +03:00
Oleh 41bde8f885
Softmax BP mkldnn implementation (#301)
* libnd4j mkldnn softmax_bp operation implementation and integration, 2 tests added, need some refactoring and code clean up and more testing with different input shapes

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j softmax_bp update, code refactoring, etc

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j merge master, fixed typos, minor tweaks, code clean up

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j  integrate mkldnnUtils helpers in other mkldnn operations

Signed-off-by: Oleg <oleg.semeniv@gmail.com>
2020-03-12 18:25:29 +03:00
Yurii Shyrma 58550b7c98
[WIP] Shyrma coords (#305)
* - provide faster index2coords function for cpu

Signed-off-by: Yurii <iuriish@yahoo.com>

* - new faster index2coords function is introduced into cpu code

Signed-off-by: Yurii <iuriish@yahoo.com>

* - replace long long coordinates with int coordinates

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add missed reload of coords2index function

Signed-off-by: Yurii <iuriish@yahoo.com>

* - reststart  jenkins

Signed-off-by: Yurii <iuriish@yahoo.com>

* - rollback changes in convolutions.cu and addBias.cu

Signed-off-by: Yurii <iuriish@yahoo.com>
2020-03-11 16:21:59 +03:00
Yurii Shyrma 6aaca58506
Shyrma broadcast (#302)
* - profiling TrueBroadcastHelper

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further improving of TrueBroadcastHelper

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further profiling of broadcast op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - implementation of broadcastShapeHelper which inserts unities in shapes of arrays to be broadcasted

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide additional method in ConstantShapeHelper class for deducing broadcast shapes with unities

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide new NativeOps helpers for usual and true broadcast methods

Signed-off-by: Yurii <iuriish@yahoo.com>

* enable bert profiler

Signed-off-by: raver119 <raver119@gmail.com>

* - delete unnessesary tests

Signed-off-by: Yurii <iuriish@yahoo.com>

Co-authored-by: raver119 <raver119@gmail.com>
2020-03-10 16:29:09 +03:00
Oleh c3223dbc7a
Improve ResultSet usage in libnd4j (#281)
* libnd4j profiling DeclarableOp and Tests by replacing return ResultSet pointer by instance

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j profiling semantic change in tests cases

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j some corrections to make new ResultSet semantic works, fixed one test

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j more tests fixes

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* - correct copy and move assignment operators of ResultSet class

Signed-off-by: Yurii <iuriish@yahoo.com>

Co-authored-by: Yurii <iuriish@yahoo.com>
Co-authored-by: raver119 <raver119@gmail.com>
2020-03-10 07:42:50 +03:00
raver119 57210b936c
Revert "OpenMP Threads execution (#297)" (#299)
This reverts commit dd2043ef48.
2020-03-09 08:22:49 +03:00
raver119 dd2043ef48
OpenMP Threads execution (#297)
* omp threads backported

Signed-off-by: raver119 <raver119@gmail.com>

* omp scalar reduce

Signed-off-by: raver119 <raver119@gmail.com>

* timing

Signed-off-by: raver119 <raver119@gmail.com>

* timing

Signed-off-by: raver119 <raver119@gmail.com>

* minor tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* minor tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* namespace change

Signed-off-by: raver119 <raver119@gmail.com>

* num_threads

Signed-off-by: raver119 <raver119@gmail.com>

* one minor fix

Signed-off-by: raver119 <raver119@gmail.com>
2020-03-09 08:21:44 +03:00
Oleh ead5162c97
Tanh mkldnn implementation (#296)
* libnd4j first step of softmax mkldnn implementation

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j raw implementation of mkldnn softmax

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j merge master and added softmax to MklDnnTests

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j some corrections for softmax mkldnn

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j merge branch, fixed problem with negative axis, fixed dnnl::memory::format_tag selection, test cases added

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j minor corrections to avoid risk connected with negative axis usage

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed windows builds, added switcher to use mkldnn sofmax version only for 3D, 4D, 5D, 6D arrays

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed dataType selection per request

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fix for mac and windows builds

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j builds fix

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j first spet of elementwize tanh implementation on mkldnn

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed typo in error message for softmax MKLDNN, test case added, implementation of tanh on MKLDNN, need supported DataType testing

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j several fixes for tanh and temporary performance test added

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed mkldnn platform loader for tanh

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j MklDnn tanh removed unsupported data types, removed performance test case, added more appropriate equivalence test case, code clean up

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed problem with empty input case for MklDnn tanh and softmax

Signed-off-by: Oleg <oleg.semeniv@gmail.com>
2020-03-06 17:11:22 +03:00
Oleh 4d81af9fe9
Softmax operation implementation for mkldnn (#286)
* libnd4j first step of softmax mkldnn implementation

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j raw implementation of mkldnn softmax

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j merge master and added softmax to MklDnnTests

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j some corrections for softmax mkldnn

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j merge branch, fixed problem with negative axis, fixed dnnl::memory::format_tag selection, test cases added

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j minor corrections to avoid risk connected with negative axis usage

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed windows builds, added switcher to use mkldnn sofmax version only for 3D, 4D, 5D, 6D arrays

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed dataType selection per request

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fix for mac and windows builds

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j builds fix

Signed-off-by: Oleg <oleg.semeniv@gmail.com>
2020-03-04 19:36:42 +03:00
Yurii Shyrma 78934c17ad
profiling of stack and unstack ops (#261)
* - profiling of stack and unstack ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - fix bug in cpu concat op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correction of cuda stack and unstack

Signed-off-by: Yurii <iuriish@yahoo.com>

* - change shape.h method which operates with unity dimensions strides

Signed-off-by: Yurii <iuriish@yahoo.com>

* - rearrange stack tests

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct evaluation of smallest stride for moving through contiguous axis

Signed-off-by: Yurii <iuriish@yahoo.com>

* - forgot to update signature of function strideOverContigAxis in cuda concat and split ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - remove ShapeUtils::shapeAsString method applied before input arrays validations

Signed-off-by: Yurii <iuriish@yahoo.com>

* -  further removing of ShapeUtils::shapeAsString

Signed-off-by: Yurii <iuriish@yahoo.com>

* - take sub-array shapeIndo/offset calculation out of NDArray class
- add possibility of contiguous memory copy in execTransformAny op if opNum == assign

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct test_empty_scatter_2 in EmptyTests.cpp

Signed-off-by: Yurii <iuriish@yahoo.com>

* - profiling of slice op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - get rid of contiguous memcpy for some cases in concat and split ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - forgot to declare oid nd4j::SpecialMethods<T>::splitCpuGeneric

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct typo in calculation of threads in cuda split op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - forgot to correct another set of threads variables in split cuda ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further conflicts resolving

Signed-off-by: Yurii <iuriish@yahoo.com>

Co-authored-by: raver119 <raver119@gmail.com>
2020-03-03 07:32:37 +03:00
raver119 c54cdaab75
full bert graph (#282)
Signed-off-by: raver119 <raver119@gmail.com>
2020-03-02 18:14:32 +03:00
raver119 63fa3c2ef3
libnd4j polishing (#273)
* initial set of include changes

Signed-off-by: raver119 <raver119@gmail.com>

* one more tweak

Signed-off-by: raver119 <raver119@gmail.com>

* few more rearrangements

Signed-off-by: raver119 <raver119@gmail.com>

* few more rearrangements

Signed-off-by: raver119 <raver119@gmail.com>

* few more rearrangements

Signed-off-by: raver119 <raver119@gmail.com>

* cuda includes rearrangements

Signed-off-by: raver119 <raver119@gmail.com>

* java update

Signed-off-by: raver119 <raver119@gmail.com>

* = namespace changed to sd
- few CMake variables renamed with SD_ prefix

Signed-off-by: raver119 <raver119@gmail.com>

* java update

Signed-off-by: raver119 <raver119@gmail.com>

* LoopKind minor fix

Signed-off-by: raver119 <raver119@gmail.com>

* few more changes

Signed-off-by: raver119 <raver119@gmail.com>

* few more changes

Signed-off-by: raver119 <raver119@gmail.com>

* few more changes

Signed-off-by: raver119 <raver119@gmail.com>

* sanitizer is optional now

Signed-off-by: raver119 <raver119@gmail.com>

* dev tests updated

Signed-off-by: raver119 <raver119@gmail.com>

* few more changes

Signed-off-by: raver119 <raver119@gmail.com>

* last update

Signed-off-by: raver119 <raver119@gmail.com>

* java update

Signed-off-by: raver119 <raver119@gmail.com>
2020-03-02 12:49:41 +03:00
shugeo 330a69d4e2
Shugeo solve ls (#203)
* lstsq op. Initial commit.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Least squares linear problem solve op (lstsq). Cpu draft implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed shape routine and tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added test for lstsq op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Rectification for lstsq op implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected test to avoid numerical inconsistensy.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added prints for check computing.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected tests to use evalueate facility instead.

Signed-off-by: shugeo <sgazeos@gmail.com>

* CPU implementation of MatrixSolveLs op and tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added cuda implementation for helpers with lstsq op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored tests for lstsq op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added processing for empty inputs.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Merged tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored lstsq op for fast case.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed test.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored lstsq op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed some issues with solve.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed lstsq op to avoid erros.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added kernel for giagonal factor

Signed-off-by: shugeo <sgazeos@gmail.com>

* lstsq wrapper and triangular_solve fixed

* Added proper processing empty inputs and test.

Signed-off-by: shugeo <sgazeos@gmail.com>

* SequenceMask test

* Build fixed

* Added proper processing of empty inputs with solve op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Mapping added

* Added check of input shapes with solve op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added a couple of tests for lstsq op and minor changes with cuda helper for one.'

Signed-off-by: shugeo <sgazeos@gmail.com>

* Tests on

* Refactored test for lstsq op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed test

* Added another approach for lstsq op aka solve_ls.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Finished cpu part for solve_ls op helpers.

* Added helper for low triangular matrix inversion.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored alternate solve_ls cpu implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Removed alternate approach for solve_ls op. Added multithreading with matrix inversion.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Assert fixed

* Refactored multithreading for inverse matricies.

Signed-off-by: shugeo <sgazeos@gmail.com>

Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
2020-02-28 11:37:26 +03:00
raver119 241ed05c64
VariableSpace uses unordered maps as well (#270)
Signed-off-by: raver119 <raver119@gmail.com>
2020-02-24 21:58:23 +03:00
shugeo 1bb3ae4b03
Shugeo unordered map (#256)
* Refactored usage of std::map to std::unordered_map instead.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Eliminated crash with wrong ShapeDescriptor hash.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Eliminated crash with TadDescriptor hash.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored Stash hash.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored hashes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored TadDescriptor hash and top_k mapping.

* Refactored hashes for ShapeDescriptor and TadDescriptor classes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored hash for ConstantDescriptor and ShapeDescriptor classes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed map using with cuda platform.

Signed-off-by: shugeo <sgazeos@gmail.com>

* - few rearrangements for hash functions
- shared openblas allowed

Signed-off-by: raver119 <raver119@gmail.com>

* exports

Signed-off-by: raver119 <raver119@gmail.com>

* exports

Signed-off-by: raver119 <raver119@gmail.com>

* Stash reverted to std::map

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* Added additional test.

Signed-off-by: shugeo <sgazeos@gmail.com>

* different maps for different compilers

Signed-off-by: raver119 <raver119@gmail.com>

* missing include

Signed-off-by: raver119 <raver119@gmail.com>

* fix the leak

Signed-off-by: raver119 <raver119@gmail.com>

Co-authored-by: raver119 <raver119@gmail.com>
2020-02-24 07:51:01 +03:00
Oleh 0748c7e7c2
Oleh broadcast4d (#257)
* libnd4j raw implementation of native broadcast for special cases

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed bugs for special case of 4D loop broadcast, add some tests, need more testing and discussion

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j added 3D and 5D cases support and tests, need testing with different orders

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j correctd case selection for broadcast 3,4,5D loops, fixed several places for more stable behavior, clean up

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j minor corrections to avoid some risks in strides selection, added tests and rename some variables

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j optimize usage the stride selection for all loops in separate ShapeUtils method copyCertainStridesFromShapeInfo, merge master

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j remove per request several tests for 3D, 4D and 5D broadcast loops

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j removed some loac changes that had not been sync with serve playground, turn on new loops usage
2020-02-21 07:46:05 +03:00
Yurii Shyrma f7a9190407
profiling of concat op (both cuda and cpu) (#151)
* - profiling of concat op (both cuda and cpu)

Signed-off-by: Yurii <iuriish@yahoo.com>

* better comparison for large concat

Signed-off-by: raver119 <raver119@gmail.com>

* - further improving of concat op

Signed-off-by: Yurii <iuriish@yahoo.com>

* some loggin

Signed-off-by: raver119 <raver119@gmail.com>

* - add possibility to verify presence of trailing unities in shape and set strides/ews correspondingly
- restrict second simple case in concat op to c order only

Signed-off-by: Yurii <iuriish@yahoo.com>

* - move concat op to specials_single.cpp file

Signed-off-by: Yurii <iuriish@yahoo.com>

* - get rid of second concat op declaration in transforms.cpp file

Signed-off-by: Yurii <iuriish@yahoo.com>

Co-authored-by: raver119 <raver119@gmail.com>
2020-02-20 21:19:01 +03:00
raver119 da39a63c9b one more bert-like test
Signed-off-by: raver119 <raver119@gmail.com>
2020-02-18 11:20:38 +03:00
Yurii Shyrma 22c7aa9acf
Shyrma mkl matmul (#250)
* - provide matmul code based on mkl api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct typo in mkl matmul op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - take into account empty arrays in mkl matmul op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - fix bug in mkl matmul and group all matmul tests in one file

Signed-off-by: Yurii <iuriish@yahoo.com>
2020-02-18 08:58:01 +03:00
raver119 2698fbf541
Broadcast perf improvements (#248)
* broadcast as scalar edge case

Signed-off-by: raver119 <raver119@gmail.com>

* missing return

Signed-off-by: raver119 <raver119@gmail.com>

* few fixes

Signed-off-by: raver119 <raver119@gmail.com>

* one more fix

Signed-off-by: raver119 <raver119@gmail.com>

* no need for lambdas

Signed-off-by: raver119 <raver119@gmail.com>
2020-02-17 16:25:09 +03:00
raver119 f9d51b7278
More compilation units (#246)
* weird edge case

Signed-off-by: raver119 <raver119@gmail.com>

* weird edge case

Signed-off-by: raver119 <raver119@gmail.com>

* get rid of it

Signed-off-by: raver119 <raver119@gmail.com>

* crop and resize reorganized

Signed-off-by: raver119 <raver119@gmail.com>

* restore test

Signed-off-by: raver119 <raver119@gmail.com>

* remove unwanted unit refs in cmale

Signed-off-by: raver119 <raver119@gmail.com>
2020-02-17 10:23:05 +03:00
Yurii Shyrma 011c272fde
Shyrma transpose (#244)
* - provide contiguous strides for ouput in transpose op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide contiguous strides for output in permute op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - take into account empty shapes properly in transpose/permute op

Signed-off-by: Yurii <iuriish@yahoo.com>
2020-02-17 08:04:28 +03:00
raver119 9e3c1b02b1
Perf improvements (#242)
* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* meh

Signed-off-by: raver119 <raver119@gmail.com>

* better ExpandDims impl

Signed-off-by: raver119 <raver119@gmail.com>

* better Squeeze impl

Signed-off-by: raver119 <raver119@gmail.com>

* better Softmax impl

Signed-off-by: raver119 <raver119@gmail.com>

* one test disabled

Signed-off-by: raver119 <raver119@gmail.com>

* more accurate impl

Signed-off-by: raver119 <raver119@gmail.com>

* - GraphProfiler now prints full shapeInfo instead of shape
- softmax typo fix

Signed-off-by: raver119 <raver119@gmail.com>
2020-02-14 16:20:31 +03:00
Oleh 6e6289b6b9
Oleh bert multiply true broad cast (#239)
* libnd4j trueBroadcast rank 3 row implementation of special case

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j rule clarify for second special case for all tests pass

* libnd4j parallel_tad loop switch on in special case

* libnd4j more general case for special case 2, need additional testing

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j more general case for trueBroadcast special cases added

* libnd4j minor corrections and clean up

* libnd4j one more minor fix

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed check point to support all Y common vector representations in first special case for trueBroadcast

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

Co-authored-by: raver119 <raver119@gmail.com>
2020-02-14 12:04:38 +03:00