Commit Graph

103 Commits (dc66a52bc7c6fcda7d37cd9fa1b9f85abaa673db)

Author SHA1 Message Date
Ryan Nett 79867f5c5a cleanup SDRNN and rnn ops (#238)
Signed-off-by: Ryan Nett <rnett@skymind.io>
2019-09-05 12:25:03 +10:00
shugeo 548044a1e2 Shugeo doc (#235)
* Actualized doc to tnse ops.

* Added comments for dynamic_stitch op.

* Added comments to dynamic_stitch op implementation.

* Modified comment for unstack_list op.

* Added doc for space_to_depth and depth_to_space ops.

* Added doc for space_to_batch op.

* Enlarge test type for adjustSaturation.

* Added doc for runner.
2019-09-04 14:57:59 +03:00
raver119 a90c7dd995
[WIP] Last set of changes (#234)
* mmul op instead of cublasSgemm

Signed-off-by: raver119 <raver119@gmail.com>

* transB

Signed-off-by: raver119 <raver119@gmail.com>

* jcpp handles

Signed-off-by: raver119 <raver119@gmail.com>

* bitwise and/or/xor

Signed-off-by: raver119 <raver119@gmail.com>

* bitwise and/or/xor mapping

Signed-off-by: raver119 <raver119@gmail.com>

* cuda/cublas version check

Signed-off-by: raver119 <raver119@gmail.com>

* add expected version

Signed-off-by: raver119 <raver119@gmail.com>

* cuda/cublas version check in java

Signed-off-by: raver119 <raver119@gmail.com>

* one more error check

Signed-off-by: raver119 <raver119@gmail.com>

* build fix

Signed-off-by: raver119 <raver119@gmail.com>

* build fix

Signed-off-by: raver119 <raver119@gmail.com>

* build fix

Signed-off-by: raver119 <raver119@gmail.com>

* one more fix

Signed-off-by: raver119 <raver119@gmail.com>

* skip CUDA version check for now

Signed-off-by: raver119 <raver119@gmail.com>

* better wording

Signed-off-by: raver119 <raver119@gmail.com>

* few more tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* few more tweaks

Signed-off-by: raver119 <raver119@gmail.com>
2019-09-04 14:41:08 +03:00
Yurii Shyrma cb4c9377b1 Shyrma docs (#222)
* - documenting and profiling matrix_set_diag cuda kernel

Signed-off-by: Yurii <yurii@skymind.io>

* - correct formula of pnorm pooling in cuda 2d/3d kernels
- remove helper matrix_diag which duplicates work of helper matrix_set_diag

Signed-off-by: Yurii <yurii@skymind.io>
2019-09-02 16:25:58 +03:00
Yurii Shyrma a35926c6e9 - add parameter alpha to elu and lrelu_bp (#213)
* - add parameter alpha to elu and lrelu_bp

Signed-off-by: Yurii <yurii@skymind.io>

* - forgot to correct header activations.h

Signed-off-by: Yurii <yurii@skymind.io>
2019-08-31 20:57:39 +03:00
raver119 70a9ae5068
[WIP] few tweaks (#206)
* scatter empty check

Signed-off-by: raver119 <raver119@gmail.com>

* scatter empty test

Signed-off-by: raver119 <raver119@gmail.com>

* one more test

Signed-off-by: raver119 <raver119@gmail.com>

* two tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* dup tweak

Signed-off-by: raver119 <raver119@gmail.com>

* - put empty checking of indices array immediately prior  helper run

Signed-off-by: Yurii <yurii@skymind.io>

* minor tests fix

Signed-off-by: raver119 <raver119@gmail.com>

* minor tests fix

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-30 16:32:01 +03:00
raver119 1003428a18
[WIP] Int broadcastables (#195)
* Removed invalid resource and fixed tests

Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>

* legacy scalar/pairwise/broadcast int ops

Signed-off-by: raver119 <raver119@gmail.com>

* NDArray int broadcastables

Signed-off-by: raver119 <raver119@gmail.com>

* few more bitwise tests

Signed-off-by: raver119 <raver119@gmail.com>

* java side update

Signed-off-by: raver119 <raver119@gmail.com>

* Argument type changed for shift ops

Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>

* legacy scalar/pairwise/broadcast int ops

Signed-off-by: raver119 <raver119@gmail.com>

* NDArray int broadcastables

Signed-off-by: raver119 <raver119@gmail.com>

* few more bitwise tests

Signed-off-by: raver119 <raver119@gmail.com>

* java side update

Signed-off-by: raver119 <raver119@gmail.com>

* Argument type changed for shift ops

Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
2019-08-30 10:12:40 +03:00
Yurii Shyrma 5395d4fbe5 - rewrite broadcast_dynamic_shape and delete corresponding helpers (#194)
Signed-off-by: Yurii <yurii@skymind.io>
2019-08-29 20:38:02 +03:00
Yurii Shyrma 70af8c2afc Shyrma svd (#191)
* - add one additional test for svd

* - provide float argument in eye op to be a type of output array

Signed-off-by: Yurii <yurii@skymind.io>

* - add cuda capability check to mmulHelper

Signed-off-by: Yurii <yurii@skymind.io>

* - make use another method for divice id evaluation

Signed-off-by: Yurii <yurii@skymind.io>

* Eye data type as T argument

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-28 18:27:08 +03:00
raver119 dec296da17
[WIP] bits_hamming_distance (#192)
* bits_hamming_distance op

Signed-off-by: raver119 <raver119@gmail.com>

* bits_hamming_distance cuda

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-28 18:20:44 +03:00
Yurii Shyrma 2144941313 Shyrma fix2 (#186)
* - further work on layer_norm

Signed-off-by: Yurii <yurii@skymind.io>

* - further work on layer_norm 2

Signed-off-by: Yurii <yurii@skymind.io>

* - correct helpers for svd cuda

Signed-off-by: Yurii <yurii@skymind.io>
2019-08-27 19:57:59 +03:00
raver119 a49f7c908b
[WIP] More fixes (#178)
* skip string arrays for device validation

Signed-off-by: raver119 <raver119@gmail.com>

* histogram_fixed_width now really supports indexing types

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-27 13:21:01 +03:00
raver119 df84bc7255
[WIP] More tweaks (#173)
* CUDA empty reduction

Signed-off-by: raver119 <raver119@gmail.com>

* - listdiff synchronization fix for CUDA
- listdiff test

Signed-off-by: raver119 <raver119@gmail.com>

* - IndexReduce ops now allow INDEXING_TYPES output
- topK op accepts only INDEXING_TYPES as output

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-27 10:37:10 +03:00
raver119 25e5c23eae
[WIP] Error handling (#169)
* CUDA reverse rewrite + couple of tests

Signed-off-by: raver119 <raver119@gmail.com>

* don't throw exception on invalid pointer

Signed-off-by: raver119 <raver119@gmail.com>

* data types validation for fastpath exec mode + 2 tests

Signed-off-by: raver119 <raver119@gmail.com>

* data types validation for fastpath exec mode + 2 tests

Signed-off-by: raver119 <raver119@gmail.com>

* ismax allowed dtypes tweak

Signed-off-by: raver119 <raver119@gmail.com>

* lastErrorCode + lastErrorMessage for native exceptions handling

Signed-off-by: raver119 <raver119@gmail.com>

* exportable ErrorReference

Signed-off-by: raver119 <raver119@gmail.com>

* check error codes in java

Signed-off-by: raver119 <raver119@gmail.com>

* - consume lastErrorCode
- fast_in dtype validation fix

Signed-off-by: raver119 <raver119@gmail.com>

* - sg/cb allowed output type change
- minor logging fix for data type validation

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-26 19:57:51 +03:00
raver119 bb5fc36e5e
[WIP] ops fixes (#168)
* - correct layer_norm

Signed-off-by: Yurii <yurii@skymind.io>

* - further fix of layer norm

Signed-off-by: Yurii <yurii@skymind.io>

* - correct scatter_upd op

Signed-off-by: Yurii <yurii@skymind.io>

* - correct cuda kernel for histogram_fixed_width op

Signed-off-by: Yurii <yurii@skymind.io>

* - delete comments

Signed-off-by: Yurii <yurii@skymind.io>

* enabled one ignored test

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-26 19:37:05 +03:00
Alex Black b417ca21bf
Fix for concat op shape function (empty shapes) (#167)
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-08-26 23:10:28 +10:00
raver119 f03b0ee78f
[WIP] more fixes (#159)
* Added test for MatrixInverse with double input. Fixed matrixDeterminantKernel.

* Fixed kernels to avoid waste templating.

* Fixed logDeterminant kernel.

* Refactored type check for lup'

* - decrease blockDim value for zeta op

Signed-off-by: Yurii <yurii@skymind.io>

* Added print for compound matrix with CUDA.

* Refactored upper matrix invertion kernels.

* - provide move constructor and move assignment operator for OpArgsHoder class

Signed-off-by: Yurii <yurii@skymind.io>

* Refactored usage of launch context.

* - add test for mergemax

Signed-off-by: Yurii <yurii@skymind.io>

* get rid of AveragingArrayProxy

Signed-off-by: raver119 <raver119@gmail.com>

* Refactoring of LUP inversion.

* Added prints for invertion.

* - add OpArgsHolder copy constructor and assignment operator

Signed-off-by: Yurii <yurii@skymind.io>

* Added test for lower inversion

* - fix bug in upsampling2d/3d_bp op

Signed-off-by: Yurii <yurii@skymind.io>

* Added expensive printfs to kernel.

* Refactored expensive kernel prints.

* Refactored expensive printfs

* - remove nullify

Signed-off-by: Yurii <yurii@skymind.io>

* Eliminated waste prints with tests.

* upsampling2d_bp test

Signed-off-by: raver119 <raver119@gmail.com>

* test updated

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-23 19:20:50 +03:00
raver119 fb8de5006f - concat empty scalar fix
- couple of tests for empty scalar concat

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-23 13:16:50 +03:00
raver119 729dc5e879
[WIP] size etc (#155)
* one test for size

Signed-off-by: raver119 <raver119@gmail.com>

* - few tests for size op
- size/rank/size_at ops now use p instead of assign

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-23 12:31:12 +03:00
Alex Black e855e47f73
More fixes (#148)
* Small batch norm fix (cuda/no-mkldnn)

Signed-off-by: Alex Black <blacka101@gmail.com>

* Dropout fix for RnnOutputLayer

Signed-off-by: Alex Black <blacka101@gmail.com>

* Allow block size < 2 in batch_to_space_nd and space_to_batch_nd for import, in spite of what TF docs say

Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-08-22 19:55:27 +10:00
raver119 eea3062ccf
[WIP] stb/bts nd (#144)
* - start working on space_to_batch_nd

Signed-off-by: Yurii <yurii@skymind.io>

* - provide cpu helper for space_to_batch_nd op

Signed-off-by: Yurii <yurii@skymind.io>

* few typos fixed

Signed-off-by: raver119 <raver119@gmail.com>

* - add tests for space_to_batch and correct bugs

Signed-off-by: Yurii <yurii@skymind.io>

* - write cuda kernel for space_to_batch op

Signed-off-by: Yurii <yurii@skymind.io>

* - add order argument to shape::index2coords method in convolution cuda ops

Signed-off-by: Yurii <yurii@skymind.io>

* - restore some previous code

Signed-off-by: Yurii <yurii@skymind.io>

* old col2im kernel activated

Signed-off-by: raver119 <raver119@gmail.com>

* - change coords calculation in col2im kernel

Signed-off-by: Yurii <yurii@skymind.io>

* - restore old col2im kernel

Signed-off-by: Yurii <yurii@skymind.io>

* - add custom op for batch_to_space

Signed-off-by: Yurii <yurii@skymind.io>

* - provide cpu version for batch_to_space_nd op

Signed-off-by: Yurii <yurii@skymind.io>

* - provide cuda kernel for batch_to_space_nd op

Signed-off-by: Yurii <yurii@skymind.io>
2019-08-21 21:11:46 +03:00
raver119 e604ffe0d2
[WIP] repeat op (#143)
* - write new repeat helper (cpu)

Signed-off-by: Yurii <yurii@skymind.io>

* - update NDArray::cpu

Signed-off-by: Yurii <yurii@skymind.io>

* - update NDArray::repeat cuda

Signed-off-by: Yurii <yurii@skymind.io>
2019-08-21 21:10:29 +03:00
raver119 d9ab299759
[WIP] Minor fixes (#140)
* - Tile java shape fn removed
- Tile 0 validation added
- scatter_upd test

Signed-off-by: raver119 <raver119@gmail.com>

* additional tile validation

Signed-off-by: raver119 <raver119@gmail.com>

* - provide vector case in cuda scatter op

Signed-off-by: Yurii <yurii@skymind.io>

* cpu ismax view fix

Signed-off-by: raver119 <raver119@gmail.com>

* exp

Signed-off-by: raver119 <raver119@gmail.com>

* cuda ismax fix

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-21 15:05:47 +03:00
raver119 aceb915557
[WIP] tests fixes (#130)
* no openmp for ClipByGlobalNorm

Signed-off-by: raver119 <raver119@gmail.com>

* one more bfloat16 rng test

Signed-off-by: raver119 <raver119@gmail.com>

* assertion fix

Signed-off-by: raver119 <raver119@gmail.com>

* - legacy IsMax gone
- linear IsMax gets shapeInfo argument

Signed-off-by: raver119 <raver119@gmail.com>

* get rid of legacy IsMax tests

Signed-off-by: raver119 <raver119@gmail.com>

* IsMax is custom op now

Signed-off-by: raver119 <raver119@gmail.com>

* more blocks for ismax

Signed-off-by: raver119 <raver119@gmail.com>

* one more test

Signed-off-by: raver119 <raver119@gmail.com>

*  - sqrt test
 - some legacy code removed from CudaExecutioner
 - Transforms.asin tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* - TransformFloat fix

Signed-off-by: raver119 <raver119@gmail.com>

* - ismax fix
- SpaceToBatchND/BatchToSpaceND wrappers
- couple of legacy tests removed

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-19 11:33:15 +03:00
raver119 e22880fd76
[WIP] Roll rewritten (#128)
* Process correct input vector.

* Added tests for roll.

* Refactored roll to conform with TF. Eliminated memory leaks with Roll op tests.
2019-08-17 14:15:08 +03:00
raver119 5c908886b0
OP/CONFIGURABLE_OP shapefn fix (#125)
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-16 08:53:30 +03:00
raver119 987bb80c46
[WIP] right shift ops (#118)
* right shift ops

Signed-off-by: raver119 <raver119@gmail.com>

* typo

Signed-off-by: raver119 <raver119@gmail.com>

* rotr test

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-15 20:35:15 +03:00
shugeo 3c7cd2397c Eliminated memory leaks and dropped waste prints with tests. (#117) 2019-08-15 15:28:19 +03:00
shugeo f083b96c74
Shugeo cuda tests (#116)
* Added tests for get_seed/set_seed ops.

* Added missed tests for scatter_sub/mul/div ops.

* Added tests for hardsigmoid and hardsigmoid_bp.

* Added tests for hardtanh and hardtanh_bp ops.

* Added test for histogram op.

* Added tests for identity op.

* Refactored mergemaxindex op. Added tests for log1p,mergemaxindex, mod and mod_bp ops.

* Fixed tests for FloorDiv.

* Added test for rank op.

* Added tests for rationaltanh/rationaltanh_bp ops.

* Added tests for realdiv/realdiv_bp.

* Added tests for rectifiedtanh/_bp ops.

* Added tests for shapes_of op.

* Added tests for shapes_of op.

* Added tests for size op.

* Added tests for softplus/_bp ops.

* Added tests for softsign/_bp ops.

* Added tests for toggle_bits op. Fixed processing of OP_IMPL and so on defititions.

* Added test for truncatediv op.

* Added another test for truncatediv op.

* Added another test for histogram.

* Added tests for unstack_list op.

* Refactored to_int32/uint32/float16/float32/double/int64/uint64 ops and tests.

* Refactored mergemaxindex op helper for cuda platform and tests.

* Fixed cuda kernel for histogram op helper.

* Refactor skipgram to avoid early buffers shift.

* Fixed check up with non_max_suppression op cuda helper. Added cuda kernel implementation for skipgram op helpers.

* Added implementation of skipgram op helper for cuda platform. Working revision

* Fixed mergeMaxIndex kernel and move it to separate source file.
2019-08-15 13:54:47 +03:00
raver119 6264530dd8
[WIP] bitwise ops (#115)
* - cyclic_shift_bits + test
- shift_bits + test

Signed-off-by: raver119 <raver119@gmail.com>

* OMP_IF replacement

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-15 11:49:50 +03:00
raver119 e0f8d86eac bunch of shape functions fixed
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-14 18:28:25 +03:00
raver119 53ca9a76e8
[WIP] multi-device support (#80)
* fix pad javadoc and @see links. (#72)

Signed-off-by: Robert Altena <Rob@Ra-ai.com>

* [WIP] More fixes (#73)

* special tests for ConstantTadHelper/ConstantShapeHelper

Signed-off-by: raver119 <raver119@gmail.com>

* release methods for data buffers

Signed-off-by: raver119 <raver119@gmail.com>

* delete temporary buffer Java side

Signed-off-by: raver119 <raver119@gmail.com>

* delete temporary buffer Java side

Signed-off-by: raver119 <raver119@gmail.com>

* delete temporary TadPack C++/Java side (#74)

Signed-off-by: raver119 <raver119@gmail.com>

* Zoo model TF import test updates (#75)

* argLine fix, update compression_gru comment

* updated comment for xception

* undid but commented argLine change

* updated xlnet comment

* copyright headers

* - new NDArray methods like()/ulike() (#77)

- fix for depthwise_conv2d_bp + special test

Signed-off-by: raver119 <raver119@gmail.com>

* upsampling2d fix CUDA

Signed-off-by: raver119 <raver119@gmail.com>

* DL4J trace logging (#79)

* MLN/CG trace logging for debugging

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Tiny tweak

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* strided_slice_bp shape fn leak fix

Signed-off-by: raver119 <raver119@gmail.com>

* SameDiff fixes and naming (#78)

* remove SDVariable inplace methods

* import methods

* npe fix in OpVal

* removed SameDiff inplace ops from tests

* Naming updates, moved to centralized methods in SameDiff, should use op_#:# for everything

* quick fixes

* javadoc

* SDVariable eval with placeholders

* use regex match

* better matching

* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* fix javadoc. (#76)

* fix javadoc.

Signed-off-by: Robert Altena <Rob@Ra-ai.com>

* replace most @see with @link s.

Signed-off-by: Robert Altena <Rob@Ra-ai.com>

* 4 additional tests

Signed-off-by: raver119 <raver119@gmail.com>

* launch context reorganization

Signed-off-by: raver119 <raver119@gmail.com>

* LaunchContext reorganization

Signed-off-by: raver119 <raver119@gmail.com>

* per-device LaunchContext

Signed-off-by: raver119 <raver119@gmail.com>

* Various DL4J/ND4J fixes (#81)

* #7954 Force refresh of UI when switching tabs on overview page

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8017 Concurrent modification exception (synchronize) fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8033 Don't initialize updater in middle of writing memory crash dump

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8208 Fix shape checks for ND4J int[] creator methods

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #6385 #7992 Keras import naming fixes + cleanup

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8016 Upsampling3D - add NDHWC format support

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* ContextBuffers as separate entity

Signed-off-by: raver119 <raver119@gmail.com>

* Refactor NativeOps.h to export C functions

* Actually export functions from NativeOps.h

* Adapt the Java wrappers in ND4J generated with JavaCPP

* Create C wrappers for some of the C++ classes currently used by ND4J

* ContextBuffers as separate entity

Signed-off-by: raver119 <raver119@gmail.com>

* remove duplicate code in createBufferDetached. (#83)

Signed-off-by: Robert Altena <Rob@Ra-ai.com>

* Keras model import - updater lr fix (#84)

* Keras model import - updater lr fix

Signed-off-by: eraly <susan.eraly@gmail.com>

* Keras model import - updater lr fix, cleanup

Signed-off-by: eraly <susan.eraly@gmail.com>

* ContextBuffers as separate entity

Signed-off-by: raver119 <raver119@gmail.com>

* ContextBuffers as separate entity

Signed-off-by: raver119 <raver119@gmail.com>

* Fix functions of OpaqueVariablesSet

* thread-local buffers/affinity

Signed-off-by: raver119 <raver119@gmail.com>

* thread safety for LaunchContext

Signed-off-by: raver119 <raver119@gmail.com>

* more of thread safety

Signed-off-by: raver119 <raver119@gmail.com>

* one more multi threaded test

Signed-off-by: raver119 <raver119@gmail.com>

* SameDiff Convolution Config validation, better output methods (#82)

* Conv Config validation & tests

Signed-off-by: Ryan Nett <rnett@skymind.io>

* stackOutputs utility method

Signed-off-by: Ryan Nett <rnett@skymind.io>

* use constructor for validation, support negative kernel sizes (infered from weights)

Signed-off-by: Ryan Nett <rnett@skymind.io>

* better output methods

Signed-off-by: Ryan Nett <rnett@skymind.io>

* move output to be with fit and evaluate

Signed-off-by: Ryan Nett <rnett@skymind.io>

* fixes

Signed-off-by: Ryan Nett <rnett@skymind.io>

* more fixes

Signed-off-by: Ryan Nett <rnett@skymind.io>

* refactor duplicate code from pad methods. (#86)

* refactor duplicate code from pad methods.

Signed-off-by: Robert Altena <Rob@Ra-ai.com>

* replace switch with if.

Signed-off-by: Robert Altena <Rob@Ra-ai.com>

* Various ND4J/DL4J fixes and improvements (#87)

* Reshape and reallocate - small fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Reshape and reallocate - small fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #6488 ElementWiseVertex broadcast support

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Constructors and broadcast supported it Transforms.max/min

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8054 ElementWiseVertex now supports broadcast inputs

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8057 Nd4j.create overload dtype fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #7551 ND4J Shape validation fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* [WIP] Numpy boolean import (#91)

* numpy bool type

Signed-off-by: raver119 <raver119@gmail.com>

* numpy bool java side

Signed-off-by: raver119 <raver119@gmail.com>

* remove create method with unused parameter. (#89)

* remove create method with unused parameter.

* removed more unused methods.

Signed-off-by: Robert Altena <Rob@Ra-ai.com>

* removing more unused code.

Signed-off-by: Robert Altena <Rob@Ra-ai.com>

* last removal of unused code.

Signed-off-by: Robert Altena <Rob@Ra-ai.com>

* remove createSparse methods. (#92)

Signed-off-by: Robert Altena <Rob@Ra-ai.com>

* Various ND4J/DL4J fixes (#90)

* Deprecate Old*Op instances

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8063 #8054 Broadcast exceptions + cleanup inplace ops

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Remove bad test condition

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #7993 Fix shape function issue in crop_and_resize op

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* DL4J SameDiff lambda layer fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8029 Fix for pnorm backprop math

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8038 Fix Op profiler NaN/Inf triggering + add tests (#93)

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* createUninitializedDetached refactoring. (#94)

* wip

* update interface, add null implementations.

* Breaking one test in a weird way.

Signed-off-by: Robert Altena <Rob@Ra-ai.com>

* createUninitializedDetached refactored.

Signed-off-by: Robert Altena <Rob@Ra-ai.com>

* cuda build fix for issues introduced by recent refactoring

Signed-off-by: raver119 <raver119@gmail.com>

* [WIP] More of CUDA (#95)

* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* Implementation of hashcode cuda helper. Working edition.

* Fixed parallel test input arangements.

* Fixed tests for hashcode op.

* Fixed shape calculation for image:crop_and_resize op and test.

* NativeOps tests. Initial test suite.

* Added tests for indexReduce methods.

* Added test on execBroadcast with NDArray as dimensions.

* Added test on execBroadcastBool with NDArray as dimensions.

* Added tests on execPairwiseTransform and execPairwiseTransofrmBool.

* Added tests for execReduce with scalar results.

* Added reduce tests for non-empty dims array.

* Added tests for reduce3.

* Added tests for execScalar.

* Added tests for execSummaryStats.

* - provide cpu/cuda code for batch_to_space
- testing it

Signed-off-by: Yurii <yurii@skymind.io>

* - remove old test for batch_to_space (had wrong format and numbers were not checked)

Signed-off-by: Yurii <yurii@skymind.io>

* Fixed complilation errors with test.

* Added test for execTransformFloat.

* Added test for execTransformSame.

* Added test for execTransformBool.

* Added test for execTransformStrict.

* Added tests for execScalar/execScalarBool with TADs.

* Added test for flatten.

* - provide cpu/cuda code for space_to_Batch operaion

Signed-off-by: Yurii <yurii@skymind.io>

* Added test for concat.

* comment unnecessary stuff in s_t_b

Signed-off-by: Yurii <yurii@skymind.io>

* Added test for specialConcat.

* Added tests for memcpy/set routines.

* Fixed pullRow cuda test.

* Added pullRow test.

* Added average test.

* - correct typo in NDArray::applyPairwiseTransform(nd4j::pairwise::BoolOps op...)

Signed-off-by: Yurii <yurii@skymind.io>

* - debugging and fixing cuda tests in JavaInteropTests file

Signed-off-by: Yurii <yurii@skymind.io>

* - correct some tests

Signed-off-by: Yurii <yurii@skymind.io>

* Added test for shuffle.

* Fixed ops declarations.

* Restored omp and added shuffle test.

* Added convertTypes test.

* Added tests for execRandom. Eliminated usage of RandomBuffer with NativeOps.

* Added sort tests.

* Added tests for execCustomOp.

* - further debuging and fixing tests terminated with crash

Signed-off-by: Yurii <yurii@skymind.io>

* Added tests for calculateOutputShapes.

* Addded Benchmarks test.

* Commented benchmark tests.

* change assertion

Signed-off-by: raver119 <raver119@gmail.com>

* Added tests for apply_sgd op. Added cpu helper for that op.

* Implement cuda helper for aplly_sgd op. Fixed tests for NativeOps.

* Added test for assign broadcastable.

* Added tests for assign_bp op.

* Added tests for axpy op.

* - assign/execScalar/execTransformAny signature change
- minor test fix

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed axpy op.

* meh

Signed-off-by: raver119 <raver119@gmail.com>

* - fix tests for nativeOps::concat

Signed-off-by: Yurii <yurii@skymind.io>

* sequential transform/scalar

Signed-off-by: raver119 <raver119@gmail.com>

* allow nested parallelism

Signed-off-by: raver119 <raver119@gmail.com>

* assign_bp leak fix

Signed-off-by: raver119 <raver119@gmail.com>

* block setRNG fix

Signed-off-by: raver119 <raver119@gmail.com>

* enable parallelism by default

Signed-off-by: raver119 <raver119@gmail.com>

* enable nested parallelism by default

Signed-off-by: raver119 <raver119@gmail.com>

* Added cuda implementation for row_count helper.

* Added implementation for tnse gains op helper.

* - take into account possible situations when input arrays are empty in reduce_ cuda stuff

Signed-off-by: Yurii <yurii@skymind.io>

* Implemented tsne/edge_forces op cuda-based helper. Parallelized cpu-based helper for edge_forces.

* Added kernel for tsne/symmetrized op heleper.

* Implementation of tsne/symmetrized op cuda helper. Working edition.

* Eliminated waste printfs.

* Added test for broadcastgradientargs op.

* host-only fallback for empty reduce float

Signed-off-by: raver119 <raver119@gmail.com>

* - some tests fixes

Signed-off-by: Yurii <yurii@skymind.io>

* - correct the rest of reduce_ stuff

Signed-off-by: Yurii <yurii@skymind.io>

* - further correction of reduce_ stuff

Signed-off-by: Yurii <yurii@skymind.io>

* Added test for Cbow op. Also added cuda implementation for cbow helpers.

* - improve code of stack operation for scalar case

Signed-off-by: Yurii <yurii@skymind.io>

* - provide cuda kernel for gatherND operation

Signed-off-by: Yurii <yurii@skymind.io>

* Implementation of cbow helpers with cuda kernels.

* minor tests tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* minor tests tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* - further correction of cuda stuff

Signed-off-by: Yurii <yurii@skymind.io>

* Implementatation of cbow op helper with cuda kernels. Working edition.

* Skip random testing for cudablas case.

* lstmBlockCell context fix

Signed-off-by: raver119 <raver119@gmail.com>

* Added tests for ELU and ELU_BP ops.

* Added tests for eq_scalar, gt_scalar, gte_scalar and lte_scalar ops.

* Added tests for neq_scalar.

* Added test for noop.

* - further work on clipbynorm_bp

Signed-off-by: Yurii <yurii@skymind.io>

* - get rid of concat op call, use instead direct concat helper call

Signed-off-by: Yurii <yurii@skymind.io>

* lstmBlockCell context fix

Signed-off-by: raver119 <raver119@gmail.com>

* Added tests for lrelu and lrelu_bp.

* Added tests for selu and selu_bp.

* Fixed lrelu derivative helpers.

* - some corrections in lstm

Signed-off-by: Yurii <yurii@skymind.io>

* operator * result shape fix

Signed-off-by: raver119 <raver119@gmail.com>

* - correct typo in lstmCell

Signed-off-by: Yurii <yurii@skymind.io>

* few tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA inverse broadcast bool fix

Signed-off-by: raver119 <raver119@gmail.com>

* disable MMAP test for CUDA

Signed-off-by: raver119 <raver119@gmail.com>

* BooleanOp syncToDevice

Signed-off-by: raver119 <raver119@gmail.com>

* meh

Signed-off-by: raver119 <raver119@gmail.com>

* additional data types for im2col/col2im

Signed-off-by: raver119 <raver119@gmail.com>

* Added test for firas_sparse op.

* one more RandomBuffer test excluded

Signed-off-by: raver119 <raver119@gmail.com>

* Added tests for flatten op.

* Added test for Floor op.

* bunch of tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* mmulDot tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* more tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* Implemented floordiv_bp op and tests.

* Fixed scalar case with cuda implementation for bds.

* - work on cuda kernel for clip_by_norm backprop op is completed

Signed-off-by: Yurii <yurii@skymind.io>

* Eliminate cbow crach.

* more tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* more tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* Eliminated abortion with batched nlp test.

* more tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed shared flag initializing.

* disabled bunch of cpu workspaces tests

Signed-off-by: raver119 <raver119@gmail.com>

* scalar operators fix: missing registerSpecialUse call

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed logdet for cuda and tests.

* - correct clipBynorm_bp

Signed-off-by: Yurii <yurii@skymind.io>

* Fixed crop_and_resize shape datatype.

* - correct some mmul tests

Signed-off-by: Yurii <yurii@skymind.io>

* build fix

Signed-off-by: raver119 <raver119@gmail.com>

* exclude two methods for JNI

Signed-off-by: raver119 <raver119@gmail.com>

* exclude two methods for JNI

Signed-off-by: raver119 <raver119@gmail.com>

* exclude two methods for JNI (#97)

Signed-off-by: raver119 <raver119@gmail.com>

* temporary stack fix

Signed-off-by: raver119 <raver119@gmail.com>

* round robin affinity test

Signed-off-by: raver119 <raver119@gmail.com>

* get rid of legacy CudaContext methods

Signed-off-by: raver119 <raver119@gmail.com>

* get rid of legacy ContextPool classes/methods

Signed-off-by: raver119 <raver119@gmail.com>

* one legacy test removed

Signed-off-by: raver119 <raver119@gmail.com>

* few more fields rearranged

Signed-off-by: raver119 <raver119@gmail.com>

* OpaqueLaunchContext

Signed-off-by: raver119 <raver119@gmail.com>

* OpaqueLaunchContext++

Signed-off-by: raver119 <raver119@gmail.com>

* more of OpaqueLaunchContext methods

Signed-off-by: raver119 <raver119@gmail.com>

* LaunchContext -> CudaContext

Signed-off-by: raver119 <raver119@gmail.com>

* AffinityManger changes

Signed-off-by: raver119 <raver119@gmail.com>

* AffinityManger changes

Signed-off-by: raver119 <raver119@gmail.com>

* cusolver handles

Signed-off-by: raver119 <raver119@gmail.com>

* typo

Signed-off-by: raver119 <raver119@gmail.com>

* cusolver method

Signed-off-by: raver119 <raver119@gmail.com>

* cusolver handle propagated

Signed-off-by: raver119 <raver119@gmail.com>

* blas/solver handles

Signed-off-by: raver119 <raver119@gmail.com>

* one more test

Signed-off-by: raver119 <raver119@gmail.com>

* legacy concat implementations replaced with new CustomOp

Signed-off-by: raver119 <raver119@gmail.com>

* one more test

Signed-off-by: raver119 <raver119@gmail.com>

* concat now uses way more blocks

Signed-off-by: raver119 <raver119@gmail.com>

* print

Signed-off-by: raver119 <raver119@gmail.com>

* no more triple template mmul

Signed-off-by: raver119 <raver119@gmail.com>

* bunch of kernels have dtypes reconsidered

Signed-off-by: raver119 <raver119@gmail.com>

* bunch of kernels have dtypes reconsidered

Signed-off-by: raver119 <raver119@gmail.com>

* bitonic sort reorganized

Signed-off-by: raver119 <raver119@gmail.com>

* bunch of cpu stuff removed from cuda scope

Signed-off-by: raver119 <raver119@gmail.com>

* bunch of cpu stuff removed from cuda scope

Signed-off-by: raver119 <raver119@gmail.com>

* type conversions moved to generic impl

Signed-off-by: raver119 <raver119@gmail.com>

* cpu data types pass

Signed-off-by: raver119 <raver119@gmail.com>

* non_max_suppression

Signed-off-by: raver119 <raver119@gmail.com>

* sortByValue fix

Signed-off-by: raver119 <raver119@gmail.com>

* ignore all mixed datatype tests for mmul

Signed-off-by: raver119 <raver119@gmail.com>

* special handling of OpProfiler exceptions

Signed-off-by: raver119 <raver119@gmail.com>

* - one failing concat test in cpp
- Nd4j.tile now uses op internally

Signed-off-by: raver119 <raver119@gmail.com>

* get back dtype exception for legacy arrays deserialization

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-14 16:52:34 +03:00
raver119 f49c4ea9d0
int -> long (#108)
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-10 09:14:18 +03:00
Alex Black b8846113bd
Small number of fixes + cleanup + some missing op methods + constructors (#100)
* Remove unused op class - DefaultOpConverter

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Add SDImage class; INDArray constructor additions

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Floordiv

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small polish to image methods

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small DataVec test fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-08-05 22:31:46 +10:00
raver119 3c4e959e21 [WIP] More of CUDA (#95)
* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* Implementation of hashcode cuda helper. Working edition.

* Fixed parallel test input arangements.

* Fixed tests for hashcode op.

* Fixed shape calculation for image:crop_and_resize op and test.

* NativeOps tests. Initial test suite.

* Added tests for indexReduce methods.

* Added test on execBroadcast with NDArray as dimensions.

* Added test on execBroadcastBool with NDArray as dimensions.

* Added tests on execPairwiseTransform and execPairwiseTransofrmBool.

* Added tests for execReduce with scalar results.

* Added reduce tests for non-empty dims array.

* Added tests for reduce3.

* Added tests for execScalar.

* Added tests for execSummaryStats.

* - provide cpu/cuda code for batch_to_space
- testing it

Signed-off-by: Yurii <yurii@skymind.io>

* - remove old test for batch_to_space (had wrong format and numbers were not checked)

Signed-off-by: Yurii <yurii@skymind.io>

* Fixed complilation errors with test.

* Added test for execTransformFloat.

* Added test for execTransformSame.

* Added test for execTransformBool.

* Added test for execTransformStrict.

* Added tests for execScalar/execScalarBool with TADs.

* Added test for flatten.

* - provide cpu/cuda code for space_to_Batch operaion

Signed-off-by: Yurii <yurii@skymind.io>

* Added test for concat.

* comment unnecessary stuff in s_t_b

Signed-off-by: Yurii <yurii@skymind.io>

* Added test for specialConcat.

* Added tests for memcpy/set routines.

* Fixed pullRow cuda test.

* Added pullRow test.

* Added average test.

* - correct typo in NDArray::applyPairwiseTransform(nd4j::pairwise::BoolOps op...)

Signed-off-by: Yurii <yurii@skymind.io>

* - debugging and fixing cuda tests in JavaInteropTests file

Signed-off-by: Yurii <yurii@skymind.io>

* - correct some tests

Signed-off-by: Yurii <yurii@skymind.io>

* Added test for shuffle.

* Fixed ops declarations.

* Restored omp and added shuffle test.

* Added convertTypes test.

* Added tests for execRandom. Eliminated usage of RandomBuffer with NativeOps.

* Added sort tests.

* Added tests for execCustomOp.

* - further debuging and fixing tests terminated with crash

Signed-off-by: Yurii <yurii@skymind.io>

* Added tests for calculateOutputShapes.

* Addded Benchmarks test.

* Commented benchmark tests.

* change assertion

Signed-off-by: raver119 <raver119@gmail.com>

* Added tests for apply_sgd op. Added cpu helper for that op.

* Implement cuda helper for aplly_sgd op. Fixed tests for NativeOps.

* Added test for assign broadcastable.

* Added tests for assign_bp op.

* Added tests for axpy op.

* - assign/execScalar/execTransformAny signature change
- minor test fix

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed axpy op.

* meh

Signed-off-by: raver119 <raver119@gmail.com>

* - fix tests for nativeOps::concat

Signed-off-by: Yurii <yurii@skymind.io>

* sequential transform/scalar

Signed-off-by: raver119 <raver119@gmail.com>

* allow nested parallelism

Signed-off-by: raver119 <raver119@gmail.com>

* assign_bp leak fix

Signed-off-by: raver119 <raver119@gmail.com>

* block setRNG fix

Signed-off-by: raver119 <raver119@gmail.com>

* enable parallelism by default

Signed-off-by: raver119 <raver119@gmail.com>

* enable nested parallelism by default

Signed-off-by: raver119 <raver119@gmail.com>

* Added cuda implementation for row_count helper.

* Added implementation for tnse gains op helper.

* - take into account possible situations when input arrays are empty in reduce_ cuda stuff

Signed-off-by: Yurii <yurii@skymind.io>

* Implemented tsne/edge_forces op cuda-based helper. Parallelized cpu-based helper for edge_forces.

* Added kernel for tsne/symmetrized op heleper.

* Implementation of tsne/symmetrized op cuda helper. Working edition.

* Eliminated waste printfs.

* Added test for broadcastgradientargs op.

* host-only fallback for empty reduce float

Signed-off-by: raver119 <raver119@gmail.com>

* - some tests fixes

Signed-off-by: Yurii <yurii@skymind.io>

* - correct the rest of reduce_ stuff

Signed-off-by: Yurii <yurii@skymind.io>

* - further correction of reduce_ stuff

Signed-off-by: Yurii <yurii@skymind.io>

* Added test for Cbow op. Also added cuda implementation for cbow helpers.

* - improve code of stack operation for scalar case

Signed-off-by: Yurii <yurii@skymind.io>

* - provide cuda kernel for gatherND operation

Signed-off-by: Yurii <yurii@skymind.io>

* Implementation of cbow helpers with cuda kernels.

* minor tests tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* minor tests tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* - further correction of cuda stuff

Signed-off-by: Yurii <yurii@skymind.io>

* Implementatation of cbow op helper with cuda kernels. Working edition.

* Skip random testing for cudablas case.

* lstmBlockCell context fix

Signed-off-by: raver119 <raver119@gmail.com>

* Added tests for ELU and ELU_BP ops.

* Added tests for eq_scalar, gt_scalar, gte_scalar and lte_scalar ops.

* Added tests for neq_scalar.

* Added test for noop.

* - further work on clipbynorm_bp

Signed-off-by: Yurii <yurii@skymind.io>

* - get rid of concat op call, use instead direct concat helper call

Signed-off-by: Yurii <yurii@skymind.io>

* lstmBlockCell context fix

Signed-off-by: raver119 <raver119@gmail.com>

* Added tests for lrelu and lrelu_bp.

* Added tests for selu and selu_bp.

* Fixed lrelu derivative helpers.

* - some corrections in lstm

Signed-off-by: Yurii <yurii@skymind.io>

* operator * result shape fix

Signed-off-by: raver119 <raver119@gmail.com>

* - correct typo in lstmCell

Signed-off-by: Yurii <yurii@skymind.io>

* few tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA inverse broadcast bool fix

Signed-off-by: raver119 <raver119@gmail.com>

* disable MMAP test for CUDA

Signed-off-by: raver119 <raver119@gmail.com>

* BooleanOp syncToDevice

Signed-off-by: raver119 <raver119@gmail.com>

* meh

Signed-off-by: raver119 <raver119@gmail.com>

* additional data types for im2col/col2im

Signed-off-by: raver119 <raver119@gmail.com>

* Added test for firas_sparse op.

* one more RandomBuffer test excluded

Signed-off-by: raver119 <raver119@gmail.com>

* Added tests for flatten op.

* Added test for Floor op.

* bunch of tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* mmulDot tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* more tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* Implemented floordiv_bp op and tests.

* Fixed scalar case with cuda implementation for bds.

* - work on cuda kernel for clip_by_norm backprop op is completed

Signed-off-by: Yurii <yurii@skymind.io>

* Eliminate cbow crach.

* more tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* more tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* Eliminated abortion with batched nlp test.

* more tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed shared flag initializing.

* disabled bunch of cpu workspaces tests

Signed-off-by: raver119 <raver119@gmail.com>

* scalar operators fix: missing registerSpecialUse call

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed logdet for cuda and tests.

* - correct clipBynorm_bp

Signed-off-by: Yurii <yurii@skymind.io>

* Fixed crop_and_resize shape datatype.

* - correct some mmul tests

Signed-off-by: Yurii <yurii@skymind.io>
2019-08-05 11:27:05 +10:00
Alex Black e18e2dc014 Various ND4J/DL4J fixes (#90)
* Deprecate Old*Op instances

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8063 #8054 Broadcast exceptions + cleanup inplace ops

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Remove bad test condition

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #7993 Fix shape function issue in crop_and_resize op

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* DL4J SameDiff lambda layer fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8029 Fix for pnorm backprop math

Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-08-05 11:25:48 +10:00
Ryan Nett d4e7997134 SameDiff Convolution Config validation, better output methods (#82)
* Conv Config validation & tests

Signed-off-by: Ryan Nett <rnett@skymind.io>

* stackOutputs utility method

Signed-off-by: Ryan Nett <rnett@skymind.io>

* use constructor for validation, support negative kernel sizes (infered from weights)

Signed-off-by: Ryan Nett <rnett@skymind.io>

* better output methods

Signed-off-by: Ryan Nett <rnett@skymind.io>

* move output to be with fit and evaluate

Signed-off-by: Ryan Nett <rnett@skymind.io>

* fixes

Signed-off-by: Ryan Nett <rnett@skymind.io>

* more fixes

Signed-off-by: Ryan Nett <rnett@skymind.io>
2019-08-05 11:24:20 +10:00
Samuel Audet dcc72e23b2 Refactor NativeOps.h to export C functions 2019-08-05 11:22:59 +10:00
raver119 7c5c84bea8 4 additional tests
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-05 11:21:23 +10:00
raver119 ce0743da17 strided_slice_bp shape fn leak fix
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-05 11:13:42 +10:00
raver119 763a225c6a [WIP] More of CUDA operations (#69)
* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* - gruCell_bp further

Signed-off-by: Yurii <yurii@skymind.io>

* - further work on gruCell_bp

Signed-off-by: Yurii <yurii@skymind.io>

* Inverse matrix cublas implementation. Partial working revision.

* Separation of segment ops helpers. Max separation.

* Separated segment_min ops.

* Separation of segment_mean/sum/prod/sqrtN ops heleprs.

* Fixed diagonal processing with LUP decomposition.

* Modified inversion approach using current state of LU decomposition.

* Implementation of matrix_inverse op with cuda kernels. Working revision.

* Implemented sequence_mask cuda helper. Eliminated waste printf with matrix_inverse implementation. Added proper tests.

* - further work on gruCell_bp (ff/cuda)

Signed-off-by: Yurii <yurii@skymind.io>

* comment one test for gruCell_bp

Signed-off-by: Yurii <yurii@skymind.io>

* - provide cuda static_rnn

Signed-off-by: Yurii <yurii@skymind.io>

* Refactored random_shuffle op to use new random generator.

* Refactored random_shuffle op helper.

* Fixed debug tests with random ops tests.

* Implement random_shuffle op cuda kernel helper and tests.

* - provide cuda scatter_update

Signed-off-by: Yurii <yurii@skymind.io>

* Implementation of random_shuffle for linear case with cuda kernels and tests.

* Implemented random_shuffle with cuda kernels. Final revision.

* - finally gruCell_bp is completed

Signed-off-by: Yurii <yurii@skymind.io>

* Dropout op cuda helper implementation.

* Implemented dropout_bp cuda helper.

* Implemented alpha_dropout_bp with cuda kernel helpers.

* Refactored helper.

* Implementation of suppresion helper with cuda kernels.

* - provide cpu code fot hsvToRgb, rgbToHsv, adjustHue

Signed-off-by: Yurii <yurii@skymind.io>

* Using sort by value method.

* Implementation of image.non_max_suppression op cuda-based helper.

* - correcting and testing adjust_hue, adjust_saturation cpu/cuda code

Signed-off-by: Yurii <yurii@skymind.io>

* Added cuda device prefixes to declarations.

* Implementation of hashcode op with cuda helper. Initital revision.

* rnn cu impl removed

Signed-off-by: raver119 <raver119@gmail.com>
2019-07-20 23:20:41 +10:00
Alex Black d94bc7257c Various fixes (#65)
* #7977 deprecate legacy MultiLayerNetwork/ComputationGraph.params(boolean) method

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix bad test

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix Histogram mapping

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix incorrect name handling in DifferentialFunction

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* More fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Histogram fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Proper histogram fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* ToString/NDArrayStrings fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* JSON UTF8 serialization fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-07-20 23:16:41 +10:00
raver119 91a8fb0d90 [WIP] More of CUDA (#63)
* less spam

Signed-off-by: raver119 <raver119@gmail.com>

* flatten kernel

Signed-off-by: raver119 <raver119@gmail.com>

* adjust_hue/adjust_saturation tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* adjust_hue cuda single

Signed-off-by: raver119 <raver119@gmail.com>

* adjust_hue cuda batch

Signed-off-by: raver119 <raver119@gmail.com>

* adjust_saturation cuda

Signed-off-by: raver119 <raver119@gmail.com>
2019-07-20 23:15:14 +10:00
raver119 fd6c0df024 [WIP] More CUDA fixes/updates (#62)
* CUDA reallocation update

Signed-off-by: raver119 <raver119@gmail.com>

* Legacy SoftMax/LogSoftMax/SoftMaxDerivative removed from cpp

Signed-off-by: raver119 <raver119@gmail.com>

* SoftMaxDerivative op removed

Signed-off-by: raver119 <raver119@gmail.com>

* few tests updates

Signed-off-by: raver119 <raver119@gmail.com>

* RNG fixes

Signed-off-by: raver119 <raver119@gmail.com>

* few more tests updates

Signed-off-by: raver119 <raver119@gmail.com>

* legacy Histogram/Pooling2D removed

Signed-off-by: raver119 <raver119@gmail.com>

* legacy Histogram removed

Signed-off-by: raver119 <raver119@gmail.com>

* histogram moved

Signed-off-by: raver119 <raver119@gmail.com>

* histogram moved cuda

Signed-off-by: raver119 <raver119@gmail.com>

* Histogram custom op

Signed-off-by: raver119 <raver119@gmail.com>
2019-07-20 23:07:42 +10:00
raver119 9cf28ea6c9 [WIP] CUDA tweaks (#60)
* special cpu concat

Signed-off-by: raver119 <raver119@gmail.com>

* special concat fix

Signed-off-by: raver119 <raver119@gmail.com>

* OpProfiler tweak for absent host pointers

Signed-off-by: raver119 <raver119@gmail.com>

* minor test tweak to see orders

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA broadcasting diff orders fix

Signed-off-by: raver119 <raver119@gmail.com>

* faster iterations

Signed-off-by: raver119 <raver119@gmail.com>

* OldSoftMax/OldLogSoftMax gone

Signed-off-by: raver119 <raver119@gmail.com>

* RandomLauncher tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* additional check int randomtests

Signed-off-by: raver119 <raver119@gmail.com>

* skip prepare/register action for empty arrays

Signed-off-by: raver119 <raver119@gmail.com>

* npz float16 fix

Signed-off-by: raver119 <raver119@gmail.com>

* empty reduction cuda fixes

Signed-off-by: raver119 <raver119@gmail.com>

* ShapeBufferTests tweaks

Signed-off-by: raver119 <raver119@gmail.com>
2019-07-20 23:06:48 +10:00
raver119 c969b724bb [WIP] more CUDA stuff (#57)
* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* Added gradcheck test for dynamic_partition_bp op.

* - implementation of dilation op (cpu and cuda)

Signed-off-by: Yurii <yurii@skymind.io>

* Fixed broadcast_dynamic_shape 1D case and tests.

* Fixed usage of default integer arguments.

* Fixed dynamic_partition_bp op and tests.

* Eliminated test with grad check for dynamic_partition_bp op.

* start working on cuda svd - porting available corresponding api from cuSOLVER library

Signed-off-by: Yurii <yurii@skymind.io>

* provide prelu_bp

Signed-off-by: Yurii <yurii@skymind.io>

* - provide gruCell_bp (old version ??)

Signed-off-by: Yurii <yurii@skymind.io>

* - polishing cumsum_bp and cumprod_bp tests

Signed-off-by: Yurii <yurii@skymind.io>

* provide sparseSoftmaxCrossEntropyWithLogits and sparseSoftmaxCrossEntropyWithLogits_grad

Signed-off-by: Yurii <yurii@skymind.io>

* Fixed atomicMul with float input/output

* implementation of cuda kernel for triu_bp operation

Signed-off-by: Yurii <yurii@skymind.io>

* Refactored lup helper to add parrallel computing.

* cusolver libraries

Signed-off-by: raver119 <raver119@gmail.com>

* uncomment cuSolver APIs in svd.cu

Signed-off-by: Yurii <yurii@skymind.io>

* cusolver var

Signed-off-by: raver119 <raver119@gmail.com>

* - further work on cuSolver svd

Signed-off-by: Yurii <yurii@skymind.io>

* Implement usage of cuda solver to LUP decomposition.

* - correct naames in lup functions

Signed-off-by: Yurii <yurii@skymind.io>

* correct svdQR cuda

Signed-off-by: Yurii <yurii@skymind.io>

* - provide transpositions of input matrices in case of c order in svdCudaQR

Signed-off-by: Yurii <yurii@skymind.io>

* Fixed implementation issues with LUP usign cuda solver.

* Implementation of matrix_determinant helper with cuda kernels. Working revision.

* Implemented log_matrix_determinant helper with cuda kernels.

* - implementation of batched cuda svd

Signed-off-by: Yurii <yurii@skymind.io>

* Refactored cholesky helper and implementation of cuda solver cholesky batch.

* - implementation of cuda kernel for tile bp

Signed-off-by: Yurii <yurii@skymind.io>

* Implementation of cholesky and logdet with cuda kernels.

* - implementation of cuda kernel for sru_bidirectional

Signed-off-by: Yurii <yurii@skymind.io>

* Fixed cholesky helper.

* Cholesky op helper implementation. Working double-based cublas implementation.

* bad import excluded

Signed-off-by: raver119 <raver119@gmail.com>

* Finished with cuda implementation of cholesky helper and tests.

* - implementation of cuda kernel for sru_bidirectional_backprop operation

Signed-off-by: Yurii <yurii@skymind.io>

* Implementation of matrix_inverse op helper with cuda kernels. The first revision.

* - start working on gruCell_bp

Signed-off-by: Yurii <yurii@skymind.io>

* Implementation of matrix_inverse helper.

* - further work on new gruCell_bp

Signed-off-by: Yurii <yurii@skymind.io>

* cuBLAS related fixes

Signed-off-by: raver119 <raver119@gmail.com>

* calculateOutputShapes() now passes device buffers as well

Signed-off-by: raver119 <raver119@gmail.com>

* special concat/average/accumulate init host pointers now

Signed-off-by: raver119 <raver119@gmail.com>

* few more tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* additional CudaDataBufferFactory signatures certain for data types

Signed-off-by: raver119 <raver119@gmail.com>

* cuSolver host buffer

Signed-off-by: raver119 <raver119@gmail.com>

* buffer to buffer memcpy host ptr allocation

Signed-off-by: raver119 <raver119@gmail.com>
2019-07-20 23:05:21 +10:00
Alex Black cb6654bebb Add libnd4j benchmarks (#3)
This PR adds 2 libnd4j benchmarking suits
2019-07-20 22:54:44 +10:00
raver119 5708fc087a [WIP] INDArray hashCode() impl (#50)
* initial commit

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* one more initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* parallel hashCode prototype

Signed-off-by: raver119 <raver119@gmail.com>

* longBytes for hashCode

Signed-off-by: raver119 <raver119@gmail.com>

* INDArray hashCode java side

Signed-off-by: raver119 <raver119@gmail.com>

* few tests fixed for MSVC

Signed-off-by: raver119 <raver119@gmail.com>

* Small gradcheck validation util fix - hash names not SDVariables

Signed-off-by: Alex Black <blacka101@gmail.com>

* Small fix + ignore for logged issue

Signed-off-by: Alex Black <blacka101@gmail.com>

* - scrollable iterator fix
- sptree hashset replaced with collection

Signed-off-by: raver119 <raver119@gmail.com>

* hashcode exception removed

* int hashCode for java
2019-07-20 22:22:11 +10:00
Alex Black cc65c01118 Op validation improvements + encoding fix (#49)
* Op validation updates

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small logging fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #7986 Fix minor character encoding issues

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small ignore fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-07-20 22:21:46 +10:00
raver119 dde50ee570 Small fixes (#43)
* minor cmake changes to make macos happy

* space_to_batch/batch_to_space validation fix

* - choose op tweaks
- tests updated to match appleid tweaks

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* - get rid of bad import

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* - choose now uses shape function
- choose test updated
2019-07-20 22:19:20 +10:00
Alex Black 1170827c18 Merge master to upstream (#7945)
* Shugeo strided slice zeros (#14)

* Modified strided_slice op to properly work with empty-like shapes.

* Fixed test for reduce_mean with empty-like input.

* [WIP] Last merge (#15)

* correct logsoftmax looss (#2)

* Small SameDiff listener fix (#4)

* Various fixes (#6)

* #7839 Fix for asXMatrix and tests

* #7866 EmbeddingSequenceLayer dtype fix + test

* #7856 SameDiff save/load stream methods

* #7859 RegressionEvaluation rank 4 fix + tests + axis configuration

* EvaluationBinary 3d/4d

* More evaluation 3d/4d tests

* #7847 Evaluation empty checks

* Small test ifx

* #7848 Fix median edge case

* Improve DL4J samediff layer tests

* [WIP] FastText wrapper implemented (#8)

* FastText implemented

* Some fixes

* Fix shapes for wordsNearest

* Validation of input vectors

* Fixes

* Fixed test

* Thread tagged

* Some tweaks

* setContextClassLoader for DeallocatorServiceThread

* Numpy format tests (#1)

* Various fixes (#11)

* #7852 SameDiff gather fix

* #7892 SameDiff placeholder to constant conversion

* #7890 validate input rank for MLN/CG init methods

* Fix broken permute shape calculation

* Permute and gather fixes

* Tests

* #7850 LogSumExp fix + test

* Handful of test fixes

* Empty arrays with non-scalar shapes (#10)

* minor rearrangements for lambdas

* empty tensors with non-scalar shapes

* numpy empty tensors with non-scalar shapes

* few more empty tweaks

* Small fixes

* conv3d signature update

* micro fix in batchnorm mkldnn

* Import fixes

* Fix

* MKL-DNN update

* Small fill fix

* fill with empty input + test

* Fixes

* Small error improvement

* Fix

* one special test

* couple of fixes for lstm

* Rewrite TFGraphMapper.getNDArrayFromTensor to be maintainable and less error prone

* Fixes

* FP16

* Unsigned

* BFloat16

* Fill op - empty tweaks

* - couple of fixes for empty arrays construction
- stack updated

* strided slice fix

* one transform test

* provide method for reducing shapeInfo in case of input array is empty

* Fixed reduceAlongDimensions to use empty input properly.

* couple of broadcast tests

* couple of tests broadcast tests + tweak to make them pass

* add check of non-empty to methods producing sub-arrays

* Fixed reshapeC with zeros in shape.

* complete empty check in reduce_... legacy ops

* Concat and cumsum/prod

* Tweak to empty shape inference on import

* add empty check to the rest of reduce legacy ops

* one more test

* correct typo in evalReduceShapeInfoEmpty

* Added tests for reduce_* ops to tests with zero shapes.

* few more tests for empty reductions

* Fixed strided_slice op with empty case and tests.

* one more empty reduction test

* Fixed strided_slice test.

* add empty check to NDArray::reshapei

* infOrMax

* empty min/max with infinity tests

* made unstack working correctly with empty arrays

* few IndexReduce tests + tweaks for empty shapes

* add test for empty concat

* few tests fixed

* Validation fix for reductions on empty shapes

* Reverse fix

* Reduction shape calc fixes

* SameDiff.generateOutputVariable: don't use shape function to determine number of outputs

* Range fix

* - NDArray constructor updated for scalars/empty arrays
- few tests fixed

* More fixes

* Empty creator fixes

* concat fix

* concat fix

* TF import tests: allow 'both all NaN' and 'both all inf' to pass

* Slice, zero fraction, and reshape fixes

* transpose, gather

* Zero fraction

* scalar cast fix

* Empty reduction axis support

* few more tests fixed

* Fixed input checks conforming with TF for concat op and tests.

* few tests fixed

* matmul scalar shape fix

* Fixed checkout for data type and scalarity with concat to allow non-empty scalars with vector concats.

* broadcast bool fix

* few more tests

* few more tests

* correct evalReduceShapeInfoEmpty

* argmax/argmin + tests

* one more empty edge case + one more test

* argmax/argmin/realdiv_bp tweaks

* empty reshape test + fix

* Helper fixes

* Small fixes

* Gather test fix

* Gather test fix

* Small fixes

* reduce scalar zero values

* scalar mean workaround

* Remove debug code

* along dim mean workaround

* one more test

* - equalsTo() tweak for empty arrays
- one more test

* broadcast tweaks

* [WIP] Fixing outstanding issues for NLP (#9)

* Avoid using not-inited objects

* Test fixed.

* Redundant method avoided for models like FastText

* KMeans++ implementation

* KMeans++ implementation

* Disable parallel execution

* KMeans++

* Tests

* Dev branch merge (#16)

* SameDiff: convertDataType and gradient check util improvements (#12)

* GradCheck util improvements

* StopGradient constructor + test

* SameDiff: Add datatype conversion

* Javadoc and add DataType.isNumerical()

* Small fix

* Fix SameDiff TF import test cases intermediate naming (workaround for bad default)

* TFGraphTestAllHelper: check intermediates in execution order

* Add missing debug listener

* [WIP] lstmBlock fix + other changes (#13)

- fixes lstmBlock issue
- changes NDArray method reshape(), permute(), transpose() by making them return instance instead of pointer
- CheckNumerics op
- fixes for ReduceBool IsInfOrNan & IsFinite

* Small test fix

* CheckNumerics op wrapper

* Fix some issues on master (#17)

* Fix DataVec test issue

* Fix issue with dl4j SameDiff output layer

* Dtype fix for lambda layers

* #7912 BertIterator dtype fix (use float32 not global default)

* [WIP] Next set of CUDA stuff (#7)

New CUDA implementations and improvements

* bad file

* Dev branch master merge (#23)

* SameDiff: convertDataType and gradient check util improvements (#12)

* GradCheck util improvements

* StopGradient constructor + test

* SameDiff: Add datatype conversion

* Javadoc and add DataType.isNumerical()

* Small fix

* Fix SameDiff TF import test cases intermediate naming (workaround for bad default)

* TFGraphTestAllHelper: check intermediates in execution order

* Add missing debug listener

* [WIP] lstmBlock fix + other changes (#13)

- fixes lstmBlock issue
- changes NDArray method reshape(), permute(), transpose() by making them return instance instead of pointer
- CheckNumerics op
- fixes for ReduceBool IsInfOrNan & IsFinite

* Small test fix

* CheckNumerics op wrapper

* Compatibility of deserialization (#18)

Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>

* SameDiff: add activation gradient checking support for debugging (#19)

* SameDiff gradient checker: first pass on activation gradient checks

* Fixes + tests for activation gradient checking

* Javadoc

* [WIP] Some nd4j data type corrections (#20)

* Adjust data type

* Set correct Data type.

* Size of proper data type.

* fix averaged cpu load (#22)

* SameDiff ops, TF import and fixes (#24)

* CheckNumerics tests + fixes + misc fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fake quant

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* FakeQuantWithMinMaxArgs

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* CheckNumerics fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix libnd4j ALL_INTS and ALL_FLOATS declaration (uint and bfloat types)

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Javadoc

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Exception tweak

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix for out of scope stack allocated var use

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Ignores

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Ignore for known failing test (already logged issue)

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Merge upstream to fork (#25)

* Add thousand-separator commas to TotalParams (#7915)

* Add thousand-separator commas to TotalParams

The number of parameters can be quite large, and it would help the reading of the summary printout to have the TotalParams column & values at the bottom have thousand-separator-commas in them.

* Add thousand-separator commas to MultiLayerNetwork

Corresponding change to MultiLayerNetwork

Signed-off-by: Jxtps Jxtps <jxtps435@gmail.com>

* Update contributing and issue/PR templates (#7934)

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix link to AdaDelta paper (#7942)

Fix link to AdaDelta paper hosted on matthewzeiler.com

Signed-off-by: Jxtps

* Fixes, and ignores for known/logged failing issues (#7943)

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* SameDiff + DL4J/SameDiff: Multiple fixes (#28)

* #7919 HDF5 attribute buffer length fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #7909 Arbiter constructor exception ux improvements

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #7925 RNN output layer length checks

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #7939 Add listener for validating inputs are not incorrectly modified

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #7939 Integrate NonInplaceValidationListener into tests

* #7844 DL4J SameDiff fixes for variable minibatch size

* DL4J SameDiff fixes - ensure gradient for input placeholder is available

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Tweaks to ExternalErrorsFunction - use placeholders, make more robust

* Another fix

* More fixes

* More SameDiff/DL4J fixes

* Scope out scalar array creation in BaseScalarOp

* Remove debug code

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* [WIP] Final dev branch merge (#29)

* SameDiff: convertDataType and gradient check util improvements (#12)

* GradCheck util improvements

* StopGradient constructor + test

* SameDiff: Add datatype conversion

* Javadoc and add DataType.isNumerical()

* Small fix

* Fix SameDiff TF import test cases intermediate naming (workaround for bad default)

* TFGraphTestAllHelper: check intermediates in execution order

* Add missing debug listener

* [WIP] lstmBlock fix + other changes (#13)

- fixes lstmBlock issue
- changes NDArray method reshape(), permute(), transpose() by making them return instance instead of pointer
- CheckNumerics op
- fixes for ReduceBool IsInfOrNan & IsFinite

* Small test fix

* CheckNumerics op wrapper

* Compatibility of deserialization (#18)

Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>

* SameDiff: add activation gradient checking support for debugging (#19)

* SameDiff gradient checker: first pass on activation gradient checks

* Fixes + tests for activation gradient checking

* Javadoc

* [WIP] Some nd4j data type corrections (#20)

* Adjust data type

* Set correct Data type.

* Size of proper data type.

* fix averaged cpu load (#22)

* [WIP] Multiple dataset iterators (#27)

* Splitting dataset into arbitrary number

* Fixes

* Multiple split of iterator

* Test

* Test

* Some fixes

* signature change

* one more tweak

Signed-off-by: raver119 <raver119@gmail.com>

* one more test for sequential use of DataSetIteratorSplitter

Signed-off-by: raver119 <raver119@gmail.com>

* Fixes

* Fixes

* one more test for Alexander

Signed-off-by: raver119 <raver119@gmail.com>

* Some fixes

* Some fixes

* one more test for Alexander

Signed-off-by: raver119 <raver119@gmail.com>

* minor test fix

Signed-off-by: raver119 <raver119@gmail.com>

* Some fixes

* Some fixes

* couple of assertions tweaked

Signed-off-by: raver119 <raver119@gmail.com>

* MDS splitter test :/

Signed-off-by: raver119 <raver119@gmail.com>

* Minor refactoring

* Multi dataset

* Some fixes

* More tests

* Small number of test fixes/improvements (failures on CI) (#31)

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* [WIP] More CUDA stuff (#26)

* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* LRN BP CUDA

Signed-off-by: raver119 <raver119@gmail.com>

* less memory

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed bug with crop_and_resize op helper.

* get rid of unnecessary index-calculation dunction

Signed-off-by: Yurii <yurii@skymind.io>

* Fixed sort with nth_element cuda-based helper.

* Refactored nth_element.

* Refactored nth_element op and tests.

* Modified usage of dim array with sortTad routine.

* Refactored main routine of helper for non_max_image_suppression op.

* non_max_image_suppression op helper with cuda kernel implementation. Initial revision.

* fix vol2col cuda kernel

* meh

Signed-off-by: raver119 <raver119@gmail.com>

* topK concept

Signed-off-by: raver119 <raver119@gmail.com>

* unsorted topK with scanWitdh of 1

Signed-off-by: raver119 <raver119@gmail.com>

* correct vol2col tests

* sorted/unsorted topK

Signed-off-by: raver119 <raver119@gmail.com>

* implementation and fixing col2im/col2vol

* Corrected usage flags with input/output with reverse op.

* dup is const now

Signed-off-by: raver119 <raver119@gmail.com>

* percentile op

Signed-off-by: raver119 <raver119@gmail.com>

* group tests for mapool2d

Signed-off-by: Yurii <yurii@skymind.io>

* special test for george

Signed-off-by: raver119 <raver119@gmail.com>

* less threads for sortTad

Signed-off-by: raver119 <raver119@gmail.com>

* provide conv2d for cuda

Signed-off-by: Yurii <yurii@skymind.io>

* remove auther in sort tad kernel code

Signed-off-by: Yurii <yurii@skymind.io>

* provide depthwise_conv2d for cuda

Signed-off-by: Yurii <yurii@skymind.io>

* - max_pooling_with_argmax
- null check for special use

Signed-off-by: raver119 <raver119@gmail.com>

* dts cuda

Signed-off-by: raver119 <raver119@gmail.com>

* provide sconv2d for cuda

Signed-off-by: Yurii <yurii@skymind.io>

* std cuda

Signed-off-by: raver119 <raver119@gmail.com>

* Refactored non_max_suppression op to conform TF implementation.

* Improved suppression helper.

* provide pooling3d for cuda

Signed-off-by: Yurii <yurii@skymind.io>

* minor lstm rearrangements

Signed-off-by: raver119 <raver119@gmail.com>

* more of minor lstm rearrangements

Signed-off-by: raver119 <raver119@gmail.com>

* (bi)dynamic_rnn

Signed-off-by: raver119 <raver119@gmail.com>

* templates init order

Signed-off-by: raver119 <raver119@gmail.com>

* Refactored non_max_suppression op.

* Added cuda kernel for non_max_suppression.

* CPU sort by key/value

Signed-off-by: raver119 <raver119@gmail.com>

* CPU sort TAD by key/value

Signed-off-by: raver119 <raver119@gmail.com>

* CPU sort TAD by key/value tests

Signed-off-by: raver119 <raver119@gmail.com>

* Eliminate compiler error with cuda implementation.

* - repaired gradCheck in cuda
- provide conv2d_bp for cuda

Signed-off-by: Yurii <yurii@skymind.io>

* missed signature

Signed-off-by: raver119 <raver119@gmail.com>

* provide depthwise_conv2d_bp for cuda

Signed-off-by: Yurii <yurii@skymind.io>

* Implementation of lup helper with cuda kernel. Initial commit.

* further work on backprops for convolutions

Signed-off-by: Yurii <yurii@skymind.io>

* CUDA linear sort by key/val

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA tad sort by key/val

Signed-off-by: raver119 <raver119@gmail.com>

* start providing of backprop for pooling2d/3d

Signed-off-by: Yurii <yurii@skymind.io>

* Added atomicAdd for bool datatype.

* dynamic partition concept

Signed-off-by: raver119 <raver119@gmail.com>

* dynamic partition concept

Signed-off-by: raver119 <raver119@gmail.com>

* dynamic partition scalar CUDA

Signed-off-by: raver119 <raver119@gmail.com>

* important comment

Signed-off-by: raver119 <raver119@gmail.com>

* fix pooling2d/3d backprop helpers

Signed-off-by: Yurii <yurii@skymind.io>

* Added non-linear test with dynamic_partition.

* Improved test for dynamic_partition.

* dynamic_partition TAD concept

Signed-off-by: raver119 <raver119@gmail.com>

* - dynamic_partition TAD CUDA impl
- dynamic_partition TAD CPU fix

Signed-off-by: raver119 <raver119@gmail.com>

* - rewrite cpu code for usampling2d/3d
- write cuda code for usampling2d/3d

Signed-off-by: Yurii <yurii@skymind.io>

* dynamic_stitch CUDA vector case

Signed-off-by: raver119 <raver119@gmail.com>

* dynamic_stitch CUDA TAD case concept

Signed-off-by: raver119 <raver119@gmail.com>

* dynamic_stitch CUDA TAD case impl

Signed-off-by: raver119 <raver119@gmail.com>

* Added tests for dynamic_stitch 3D-4D cases.

* minor tests tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed type check for dynamic stitch.

* min/max bp

Signed-off-by: raver119 <raver119@gmail.com>

* rewrite code for upsampling2d/3d cpu

Signed-off-by: Yurii <yurii@skymind.io>

* reduce min/max/norm_max bp

Signed-off-by: raver119 <raver119@gmail.com>

* lup implementation. Additional enhancements.

* provide code for upsamling2d/3d backprop

Signed-off-by: Yurii <yurii@skymind.io>

* weightedCrossEntropyWithLogits

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed template math atomicMul for 64bit ints.

* Refactored dynamic_partition_bp op.

* inverseBroadcast fix

Signed-off-by: raver119 <raver119@gmail.com>

* DynamicPartitionBP test datatype fixed.

* - nd4j_atomicMul Windows fix
- cpu/NDArrayLambda.hpp excluded from CUDA

Signed-off-by: raver119 <raver119@gmail.com>
2019-06-27 18:37:04 +03:00
Alex Black 68ea5f3688
Dev branch merge: dev_20190606 (#7904)
* correct logsoftmax looss (#2)

* Small SameDiff listener fix (#4)

* Various fixes (#6)

* #7839 Fix for asXMatrix and tests

* #7866 EmbeddingSequenceLayer dtype fix + test

* #7856 SameDiff save/load stream methods

* #7859 RegressionEvaluation rank 4 fix + tests + axis configuration

* EvaluationBinary 3d/4d

* More evaluation 3d/4d tests

* #7847 Evaluation empty checks

* Small test ifx

* #7848 Fix median edge case

* Improve DL4J samediff layer tests

* [WIP] FastText wrapper implemented (#8)

* FastText implemented

* Some fixes

* Fix shapes for wordsNearest

* Validation of input vectors

* Fixes

* Fixed test

* Thread tagged

* Some tweaks

* setContextClassLoader for DeallocatorServiceThread

* Numpy format tests (#1)

* Various fixes (#11)

* #7852 SameDiff gather fix

* #7892 SameDiff placeholder to constant conversion

* #7890 validate input rank for MLN/CG init methods

* Fix broken permute shape calculation

* Permute and gather fixes

* Tests

* #7850 LogSumExp fix + test

* Handful of test fixes

* Empty arrays with non-scalar shapes (#10)

* minor rearrangements for lambdas

* empty tensors with non-scalar shapes

* numpy empty tensors with non-scalar shapes

* few more empty tweaks

* Small fixes

* conv3d signature update

* micro fix in batchnorm mkldnn

* Import fixes

* Fix

* MKL-DNN update

* Small fill fix

* fill with empty input + test

* Fixes

* Small error improvement

* Fix

* one special test

* couple of fixes for lstm

* Rewrite TFGraphMapper.getNDArrayFromTensor to be maintainable and less error prone

* Fixes

* FP16

* Unsigned

* BFloat16

* Fill op - empty tweaks

* - couple of fixes for empty arrays construction
- stack updated

* strided slice fix

* one transform test

* provide method for reducing shapeInfo in case of input array is empty

* Fixed reduceAlongDimensions to use empty input properly.

* couple of broadcast tests

* couple of tests broadcast tests + tweak to make them pass

* add check of non-empty to methods producing sub-arrays

* Fixed reshapeC with zeros in shape.

* complete empty check in reduce_... legacy ops

* Concat and cumsum/prod

* Tweak to empty shape inference on import

* add empty check to the rest of reduce legacy ops

* one more test

* correct typo in evalReduceShapeInfoEmpty

* Added tests for reduce_* ops to tests with zero shapes.

* few more tests for empty reductions

* Fixed strided_slice op with empty case and tests.

* one more empty reduction test

* Fixed strided_slice test.

* add empty check to NDArray::reshapei

* infOrMax

* empty min/max with infinity tests

* made unstack working correctly with empty arrays

* few IndexReduce tests + tweaks for empty shapes

* add test for empty concat

* few tests fixed

* Validation fix for reductions on empty shapes

* Reverse fix

* Reduction shape calc fixes

* SameDiff.generateOutputVariable: don't use shape function to determine number of outputs

* Range fix

* - NDArray constructor updated for scalars/empty arrays
- few tests fixed

* More fixes

* Empty creator fixes

* concat fix

* concat fix

* TF import tests: allow 'both all NaN' and 'both all inf' to pass

* Slice, zero fraction, and reshape fixes

* transpose, gather

* Zero fraction

* scalar cast fix

* Empty reduction axis support

* few more tests fixed

* Fixed input checks conforming with TF for concat op and tests.

* few tests fixed

* matmul scalar shape fix

* Fixed checkout for data type and scalarity with concat to allow non-empty scalars with vector concats.

* broadcast bool fix

* few more tests

* few more tests

* correct evalReduceShapeInfoEmpty

* argmax/argmin + tests

* one more empty edge case + one more test

* argmax/argmin/realdiv_bp tweaks

* empty reshape test + fix

* Helper fixes

* Small fixes

* Gather test fix

* Gather test fix

* Small fixes

* reduce scalar zero values

* scalar mean workaround

* Remove debug code

* along dim mean workaround

* one more test

* - equalsTo() tweak for empty arrays
- one more test

* broadcast tweaks
2019-06-15 21:34:34 +10:00
skymindops b5f0ec072f Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00