* libnd4j cast loop types
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j more type castination added to loops
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j sync casting types of iterated variable in loops
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j more loops reviewed for vectorization problem fix
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j fixed several typos
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j several more files reviewed to fix auto-vectorization problem in loops
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j merge master and reviewed more files to fix auto-vectorization problem in loops
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j several type casting added in broadcasting that were missed, fixed mac builds
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j double check all files and fix several more places in loops
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j fixed builds
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j revert changes for lup.cpp
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op #8570 first raw step of multinomial random data generator implementation
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op #8570 next step of multinomial random categories generator implementation on both cpu and cuda, need corrections and code clean up before review and testing
* libnd4j: Multinomial op #8570 code clean up and fixed issues data selecting, moved from coords to tads
* libnd4j: Multinomial op #8570 fixed cuda build add reference for math materials that was used for implementation
* libnd4j: Multinomial op #8570 fixed several bugs, added several tests and improved cuda version. current implementation works, need testing of reproduction with the same seed
* libnd4j: Multinomial op #8570 fixes and optimization after discussion in both cuda and cpu
* libnd4j: Multinomial op #8570 add corrections after review, removed tads, replace 2D parallel loop by 3D
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op fixed declaration and add tests need discussion
* libnd4j: Multinomial op fix in test
* libnd4j: Multinomial op corrected behavior to get reproducible results, fixed issue in uniform value getting, tests added, need cuda review and cuda testing
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op fixed indexing on uniform calculation
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op some corrections in max min declaration
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op fixed index calculation, added rewind, corrected input declaration, added stats tests, both cuda and cpu. cuda need testing
* libnd4j: Multinomial op fixed bugs on cuda nad cpu. need review
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op corrected tests to handle different orders
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op some improvements after code review
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op more corrections after review
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op fixed seed usage, update tests, fixed cuda based on comments, fixed bug of rewind, removed one behavior, minor corrections.
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op minor corrections
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op rise the bound of fluctuation for random cases
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op modified operation inputs and update implementation and tests on both cpu and cuda
* libnd4j: Multinomial op corrected data types according ops.proto
Co-authored-by: raver119 <raver119@gmail.com>
* - specifying template instantiation for certain types in float16 and bloat16
Signed-off-by: Yurii <iuriish@yahoo.com>
* - polishing bfloat16 and float16 member functions template specialization
Signed-off-by: Yurii <iuriish@yahoo.com>
* - rewrite and overload array +-*/ scalar and scalar +-*/ arr in NDAray class
Signed-off-by: Yurii <iuriish@yahoo.com>
* - make corrections which have to do with and rvalue lvalue conversions
Signed-off-by: Yurii <iuriish@yahoo.com>
* - provide move semantic in NDArray operators array +-/* array
Signed-off-by: Yurii <iuriish@yahoo.com>
* float16/bfloat16 tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* one more tweak
Signed-off-by: raver119 <raver119@gmail.com>
* - make float16 and bfloat16 to compile successfully on cuda
Signed-off-by: Yurii <iuriish@yahoo.com>
* - do not use resources of view-like arrays when move semantics is applied
Signed-off-by: Yurii <iuriish@yahoo.com>
* - get rid of pointers in signatures NDArray methods 1
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correction of signature of NDArray::dup method
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correction of signature of NDArray::reduceAlongDimension method
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::applyIndexReduce and applyTrueBroadcast methods
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::applyReduce3 and varianceAlongDimension methods
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::tensorsAlongDimension and diagonal methods
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::allTensorsAlongDimension
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::reduceAlongDimension 2
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::applyTransform 2
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::applyPairwiseTransform 2
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::applyBroadcast 2
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::applyTrueBroadcast 2
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::applyScalar and applyScalarArr
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::lambda methods
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::reduce3 methods 2
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of following NDArray methods: add/sub/mul/div row/column and fillAsTriangular
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::tileToShape methods
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::isShapeSameStrict method
Signed-off-by: Yurii <iuriish@yahoo.com>
* minor corrections in tests
Signed-off-by: Yurii <iuriish@yahoo.com>
* - replace reduce op in batchnorm mkldnn
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add explicit templates instantiations for operator+(NDArray&&. const scalar)
Signed-off-by: Yurii <iuriish@yahoo.com>
* - corrections of casts in float16/bfloat16
Signed-off-by: Yurii <iuriish@yahoo.com>
* - provide move semantics in following NDArray methods: transform, applyTrueBroadcast, transpose, reshape, permute
Signed-off-by: Yurii <iuriish@yahoo.com>
* - get rid of input array A duplicate in svd cuda op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - avoid available bug in svd cuda API
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add temporary global memory buffer in svd cuda when calcUV = false and m != n
Signed-off-by: Yurii <iuriish@yahoo.com>
* - remove test with blfoat16 type for betainC
Signed-off-by: Yurii <iuriish@yahoo.com>
* - resolve conflicts after master has been merged in
Signed-off-by: Yurii <iuriish@yahoo.com>
* - changed type of affected input array in fused_batch_norm
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add several explicit type castings
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add ND4J_EXPORT to operators
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add explicit template types in instantiations of template arithm operators of NDArray class
Signed-off-by: Yurii <iuriish@yahoo.com>
* - one more test fix
Signed-off-by: Yurii <iuriish@yahoo.com>
Co-authored-by: raver119 <raver119@gmail.com>
* Corrected randomuniform declaration.
* Refactored uniform distribution for both cuda and cpu platforms.
* Refactored uniform distribution and tests.
* Fixed type usage with indices.
* Refactored uniform distribution implementation and tests to full conform with TF implementation.
* Refactored gamma function to use type util method.
* Copyright changes and fixes with ConstantHelper.
* Added error checking on allocate cuda device memory and operations.
* Added implementation for random_gamma op.
* Added implementation for random_poisson op and support classes.
* Added helpers for random_poisson and random_gamma ops.
* Implementation of random_poisson. The first working edition.
* Implementation of random_poisson. Parallelized working edition.
* Implementation of random_gamma. Parallelized working edition with alpha only.
* Added cuda implementation for helper of poisson distribution.
* Corrected shape calculation with random_gamma and tests.
* Finished cpu implementation for gamma distribution.
* Finished cuda implementation for random_gamma op.
* Refactored cpu helpers for random_gamma and random_poisson ops.
* Refactored cuda helpers for gamma and poisson distribution.
* Refactored cuda helper for gamma distribution.
* Refactored cpu helper for random_poisson op.
* Refactored cpu helper for random_gamma op.