* libnd4j cast loop types
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j more type casting added to loops
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j sync casting types of iterated variable in loops
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j more loops reviewed for vectorization problem fix
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j fixed several typos
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j several more files reviewed to fix auto-vectorization problem in loops
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j merged master and reviewed more files to fix auto-vectorization problem in loops
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j added several type casts in broadcasting that were missed, fixed mac builds
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j double check all files and fix several more places in loops
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j fixed builds
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j revert changes for lup.cpp
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j moved split operation implementation to helpers before special case adding
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j minor fixes for general split operation move, merge master
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j split cpu implementation
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* - provide cuda helper for split op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - minor correction
Signed-off-by: Yurii <iuriish@yahoo.com>
* - minor correction 2
Signed-off-by: Yurii <iuriish@yahoo.com>
Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>
Co-authored-by: raver119 <raver119@gmail.com>
* libnd4j raw implementation of native broadcast for special cases
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j fixed bugs for special case of 4D loop broadcast, add some tests, need more testing and discussion
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j added 3D and 5D cases support and tests, need testing with different orders
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j corrected case selection for broadcast 3,4,5D loops, fixed several places for more stable behavior, clean up
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j minor corrections to avoid some risks in strides selection, added tests and rename some variables
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j optimize stride selection for all loops via separate ShapeUtils method copyCertainStridesFromShapeInfo, merge master
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j remove per request several tests for 3D, 4D and 5D broadcast loops
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j removed some local changes that had not been synced with serve playground, turn on new loops usage
* - profiling of concat op (both cuda and cpu)
Signed-off-by: Yurii <iuriish@yahoo.com>
* better comparison for large concat
Signed-off-by: raver119 <raver119@gmail.com>
* - further improving of concat op
Signed-off-by: Yurii <iuriish@yahoo.com>
* some logging
Signed-off-by: raver119 <raver119@gmail.com>
* - add possibility to verify presence of trailing unities in shape and set strides/ews correspondingly
- restrict second simple case in concat op to c order only
Signed-off-by: Yurii <iuriish@yahoo.com>
* - move concat op to specials_single.cpp file
Signed-off-by: Yurii <iuriish@yahoo.com>
* - get rid of second concat op declaration in transforms.cpp file
Signed-off-by: Yurii <iuriish@yahoo.com>
Co-authored-by: raver119 <raver119@gmail.com>
* static increments in loops
Signed-off-by: raver119 <raver119@gmail.com>
* specials and concat split into separate units
Signed-off-by: raver119 <raver119@gmail.com>
* - profiling gather op for aurora
Signed-off-by: Yurii <iuriish@yahoo.com>
* - include contiguous memcpy in gather op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - provide matmul code based on mkl api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct typo in mkl matmul op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - take into account empty arrays in mkl matmul op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - fix bug in mkl matmul and group all matmul tests in one file
Signed-off-by: Yurii <iuriish@yahoo.com>
* Test spam reduction
Signed-off-by: Alex Black <blacka101@gmail.com>
* Arbiter bad import fixes
Signed-off-by: Alex Black <blacka101@gmail.com>
* Small spark test tweak
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Arbiter test log spam reduction
Signed-off-by: Alex Black <blacka101@gmail.com>
* More test spam reduction
Signed-off-by: Alex Black <blacka101@gmail.com>
* broadcast as scalar edge case
Signed-off-by: raver119 <raver119@gmail.com>
* missing return
Signed-off-by: raver119 <raver119@gmail.com>
* few fixes
Signed-off-by: raver119 <raver119@gmail.com>
* one more fix
Signed-off-by: raver119 <raver119@gmail.com>
* no need for lambdas
Signed-off-by: raver119 <raver119@gmail.com>
* - provide contiguous strides for output in transpose op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - provide contiguous strides for output in permute op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - take into account empty shapes properly in transpose/permute op
Signed-off-by: Yurii <iuriish@yahoo.com>
* libnd4j trueBroadcast rank 3 row implementation of special case
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j clarify rule for second special case so that all tests pass
* libnd4j parallel_tad loop switch on in special case
* libnd4j more general case for special case 2, need additional testing
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j more general case for trueBroadcast special cases added
* libnd4j minor corrections and clean up
* libnd4j one more minor fix
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j fixed check point to support all Y common vector representations in first special case for trueBroadcast
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
Co-authored-by: raver119 <raver119@gmail.com>
* Copied and pasted RegressionTest100b4.java to RegressionTest100b6.java with renamed b4->b6
* assertEquals > assertTrue for half dtype
Signed-off-by: atuzhykov <andrewtuzhukov@gmail.com>
* Libnd4j: TensorMMul backprop op #8174, raw implementation
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 merge master and some corrections
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 algorithm update, need testing, sync with master
* Libnd4j: TensorMMul backprop op #8174 fixed incorrect B axes calculation
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 optimize axes identification and fix bug of overlapping indices, added first test, need testing with different shapes
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 some fixes and improvements need more testing
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 fixed order of matrix multiply
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 fixed issue of incorrect axes definition, add tests based on TF, need additional testing for case dLdC not equal 1
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 fixed scalar case add test
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 fixed bp algorithm, axes definition, need some more testing with different order combinations (f,c; c,f; f,f) and add some checks for inputs
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 some checks and corrections, added tests; a problem remains with support for different input orders A-f B-c and A-f B-f
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 sync master
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* - correct bug in MmulHelper::tensorDot(a, b, c, axes_a, axes_b,permutForC)
Signed-off-by: Yurii <iuriish@yahoo.com>
* Libnd4j: TensorMMul backprop op #8174 code clean up and refactoring
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* - add check for linspace ordered permutations in ShapeUtils::evalShapeForTensorDot
Signed-off-by: Yurii <iuriish@yahoo.com>
* - provide additional code in shape::reshape stuff in order to reduce amount of allocation/copy operations during reshaping procedure
Signed-off-by: Yurii <iuriish@yahoo.com>
* - further work on problem of wrong shape evaluation during permute/reshape procedures
Signed-off-by: Yurii <iuriish@yahoo.com>
* - still looking for bug reason in reshape/permute stuff
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct bug in transform cuda native ops
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct bug in NDArray::assign
Signed-off-by: Yurii <iuriish@yahoo.com>
* - remove old shape::reshape stuff
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add possibility to disable copy of old buffer to new buffer during reshape operation in NDArray class
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct bug in tensorDot which had to do with wrong pointer assignments
Signed-off-by: Yurii <iuriish@yahoo.com>
Co-authored-by: Oleh <oleg.semeniv@gmail.com>
* Gradients tests added
* Fix for Standard deviation serialization + test
Signed-off-by: Alex Black <blacka101@gmail.com>
* More fixes
Signed-off-by: Alex Black <blacka101@gmail.com>
* Test fixed
* Spark config driver host config for CI
Signed-off-by: Alex Black <blacka101@gmail.com>
* Op validation timeout increase
Signed-off-by: Alex Black <blacka101@gmail.com>
* Gradient check - fix for low probability test failure due to randomly all 0s mask
Signed-off-by: AlexDBlack <blacka101@gmail.com>
Co-authored-by: Alex Black <blacka101@gmail.com>
* Making TypeName enum public
* Ignoring None type object for PythonExceptions
* better handling of None + test
Co-authored-by: Fariz Rahman <farizrahman4u@gmail.com>
* Fixed a couple of issues with resize_area op.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Added additional test for alternate params for resize_area testing.
Signed-off-by: shugeo <sgazeos@gmail.com>
* libnd4j trueBroadcast special case
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j fix trueBroadcast special case
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j special case of TrueBroadcastHelper
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j trueBroadcast special case and test
* libnd4j minor changes sync with master
* libnd4j changes to TrueBroadcastHelper.hpp per request
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Change the regular expression for the Bert tokenizer.
The previous regular expression causes StackOverflowErrors
if given a document with a large amount of whitespace. I
believe that the one I've provided is equivalent.
* Add test for new BertWordPieceTokenizer RegEx.
This test should cause a StackOverflowError with the previous version.
* Fix assert off by one.
* special workaround methods for DataBuffer.write
Signed-off-by: raver119 <raver119@gmail.com>
* one test removed
Signed-off-by: raver119 <raver119@gmail.com>
* more of unsynced
Signed-off-by: raver119 <raver119@gmail.com>
* missing asLong for BaseCudaDataBuffer
Signed-off-by: raver119 <raver119@gmail.com>
* Test speedups / integration test run only for CUDA - NLP
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* nlp-uima CUDA slow tests
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Spark CUDA timeout fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* - provide nhwc format in mkl conv ops
Signed-off-by: Yurii <iuriish@yahoo.com>
* - corrections in mkl conv3d
Signed-off-by: Yurii <iuriish@yahoo.com>
* - corrections in mkl batchnorm
Signed-off-by: Yurii <iuriish@yahoo.com>
* - corrections in mkl maxpooling2d
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add format format_tag::any to outputs in mkl conv ops
Signed-off-by: Yurii <iuriish@yahoo.com>
* - complete corrections in mkl conv ops
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add test for comparison of execution speeds of mkl conv2d op with different weights format
Signed-off-by: Yurii <iuriish@yahoo.com>
* - take into account order f in mkl conv ops
Signed-off-by: Yurii <iuriish@yahoo.com>
* Fixed sequence_mask op and tests.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Cuda fix for sequence_mask op.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Fixed sequence_mask op for both platforms and tests.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Fixed solve and triangular_solve for more than 2D for adjoint cases.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Added adjoint solve test again.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Added a set of tests for triangular_solve and generic solve ops.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Added a pair of tests for triangular_solve
Signed-off-by: shugeo <sgazeos@gmail.com>
* Added tests for triangular_solve op.
Signed-off-by: shugeo <sgazeos@gmail.com>