Commit Graph

16 Commits (cf67c7165aa839a433b74c977498a51deb404da9)

Author SHA1 Message Date
Oleh d52e67209e
Oleh convert (#200)
* StringUtils for utf convertor raw implementation of all possible combinations, need to be add counter of bytes per symbol for any type and add api to call convertors and store data

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor more corrections to support convertors

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor some corrections and bug fixes, need review to discuss how to add multi-threading

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 some corrections to move to multi-threading, add one test need discussion data inputs/outputs array presentation, need discussion the way of multi-threading

* StringUtils for utf convertor #8613 tests added some corrections to optimize build

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 some corrections and code clean up

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 code clean up and optimize usage, need update ndarray factory before replace std usage

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 some staff to integrate converters into NDArrayFactory, update tests and add some functionality

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 minor corrections and bug fix before discussion

* StringUtils for utf convertor #8613 some fixes and tets

* StringUtils for utf convertor #8613 some more staff to support different unicode

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 fix linking bug

* StringUtils for utf convertor #8613 corrected several tests as defaults for string ndarray changed

* StringUtils for utf convertor #8613 replace some incorrect implementation, revert some test changes, need sync before testing

* StringUtils for utf convertor #8613 fixed several thing that were badly implemented yesterday, need optimization, testing (before testing have to be add support of u32 and u16 buffer visualization)

* StringUtils for utf convertor #8613 fixed to support u16 and u32, and convertor in ndarray, fix buffer print, etc

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 merge master and sync with server

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 some correction for string cast, need print check only asci support

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 merge master, remove copies and add cast, need test, refactoring according review and clean up

* StringUtils for utf convertor #8613 fixed cast and copy issues

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 fixed cuda and update tests

* StringUtils for utf convertor #8613 integration into NdArray, fix several tests for build pass, refactoring, etc

* - avoid ambiguity of NDArray ctrs overloading in some tests

Signed-off-by: Yurii <iuriish@yahoo.com>

* StringUtils for utf convertor #8613 NDArray string constructors added, updated NDArrayFactory, refactoring unicode and tests, etc

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 fixed cuda build and test, refactoring and void* added to some functions

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613  void* integration, removed copy operation, refactoring, added tests for NDArray string constructors, etc

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 several more fixes, improvements and updates

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 master merge, code clean up and optimization before review

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 minor fixes string element size define

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 revert last changes as mistake

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 fixed NDArray constructor build problem, remove order from string factory, fixed order use for factory via project, added catch of incorrect sync in cast of arrays to data types, fixed e method for strings, etc

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 added javacpp hack, added multi-threading, minor corrections in license agreement

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 windows builds fix, as "sting" is not treated as utf8

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>
2020-01-31 16:30:49 +03:00
raver119 ba961c7601
DataTypes & FlatBuffers (#197)
* flatbuffers version upgrade

Signed-off-by: raver119 <raver119@gmail.com>

* flatbuffers version upgrade java side

Signed-off-by: raver119 <raver119@gmail.com>

* flatbuffers dependency version upgrade java side

Signed-off-by: raver119 <raver119@gmail.com>

* MKLDNN version upgrade

Signed-off-by: raver119 <raver119@gmail.com>

* DArgs first pass

Signed-off-by: raver119 <raver119@gmail.com>

* signatures first pass

Signed-off-by: raver119 <raver119@gmail.com>

* signatures second pass

Signed-off-by: raver119 <raver119@gmail.com>

* signatures third pass

Signed-off-by: raver119 <raver119@gmail.com>

* signatures third pass

Signed-off-by: raver119 <raver119@gmail.com>

* signatures fourth pass

Signed-off-by: raver119 <raver119@gmail.com>

* signatures fifth pass

Signed-off-by: raver119 <raver119@gmail.com>

* flatbuffers UI version upgrade java side

Signed-off-by: raver119 <raver119@gmail.com>

* flatbuffers ui update

Signed-off-by: raver119 <raver119@gmail.com>

* flatbuffers downgrade

Signed-off-by: raver119 <raver119@gmail.com>

* flatbuffers downgrade java side

Signed-off-by: raver119 <raver119@gmail.com>
2020-01-30 10:07:24 +03:00
Oleh 8fc0e63ce7 Oleh powderev (#171)
* Libnd4j: Add broadcastable elementwise power derivative #7461 first step of Pow_bp operation implementation

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: Add broadcastable elementwise power derivative #7461 some corrections of calculation steps

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: Add broadcastable elementwise power derivative #7461 some bug fixes, the PowDerevative op made broadcastable, add the raw tests for op, need refactoring to use broadcast ops

* Libnd4j: Add broadcastable elementwise power derivative #7461 fixed several bugs add broadcast support and tests, need to fix scalar+array and array+scalar

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: Add broadcastable elementwise power derivative #7461 fixed bugs for scalar inputs, fixed multinomial tests, added tests

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: Add broadcastable elementwise power derivative #7461 fised bugs for different shapes support, tests updated

* Libnd4j: Add broadcastable elementwise power derivative #7461 applied all possible variants via tiled arrays, add support of broadcast for Pow and PowDerivative ops, covered by tests, before review have to be replaced tiled implementation by applyTrueBroadcast

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: Add broadcastable elementwise power derivative #7461 replaced tile by broadcast implementation, fixed issue with negative x input, corrected tests, need additional testing

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: Add broadcastable elementwise power derivative #7461 added and corrected test cases, corrected implementation need review

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: Add broadcastable elementwise power derivative #7461 code clean up

* Libnd4j: Add broadcastable elementwise power derivative #7461 code clean up, removed some tests, add tests with scalar

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: Add broadcastable elementwise power derivative #7461 code improvement and clean up, split tests

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: Add broadcastable elementwise power derivative #7461 some code clean up

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: Add broadcastable elementwise power derivative replace __isnanf by internal realization

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* pow_bp wrapper

* Fixed PowBp wrapper

* Tests added

* Test fixed

* Fix return type

* Disable powBp usage

* Pow backprop changed

Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
2020-01-20 12:59:12 +03:00
Oleh 2404be5fe0 Oleh multinomial (#163)
* libnd4j: Multinomial op #8570 first raw step of multinomial random data generator implementation

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op #8570 next step of multinomial random categories generator implementation on both cpu and cuda, need corrections and code clean up before review and testing

* libnd4j: Multinomial op #8570 code clean up and fixed issues data selecting, moved from coords to tads

* libnd4j: Multinomial op #8570 fixed cuda build add reference for math materials that was used for implementation

* libnd4j: Multinomial op #8570 fixed several bugs, added several tests and improved cuda version. current implementation works, need testing of reproduction with the same seed

* libnd4j: Multinomial op #8570 fixes and optimization after discussion in both cuda and cpu

* libnd4j: Multinomial op #8570 add corrections after review, removed tads, replace 2D parallel loop by 3D

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op fixed declaration and add tests need discussion

* libnd4j: Multinomial op fix in test

* libnd4j: Multinomial op corrected behavior to get reproducible results, fixed issue in uniform value getting, tests added, need cuda review and cuda testing

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op fixed indexing on uniform calculation

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op some corrections in max min declaration

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op fixed index calculation, added rewind, corrected input declaration, added stats tests, both cuda and cpu. cuda need testing

* libnd4j: Multinomial op fixed bugs on cuda nad cpu. need review

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op corrected tests to handle different orders

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op some improvements after code review

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op more corrections after review

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op fixed seed usage, update tests, fixed cuda based on comments, fixed bug of rewind, removed one behavior, minor corrections.

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op minor corrections

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op rise the bound of fluctuation for random cases

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op modified operation inputs and update implementation and tests on both cpu and cuda

* libnd4j: Multinomial op corrected data types according ops.proto

Co-authored-by: raver119 <raver119@gmail.com>
2020-01-06 22:35:05 +03:00
Yurii Shyrma 5d9b2a16e5 Shyrma temp (#131)
* - specifying template instantiation for certain types in float16 and bloat16

Signed-off-by: Yurii <iuriish@yahoo.com>

* - polishing bfloat16 and float16 member functions template specialization

Signed-off-by: Yurii <iuriish@yahoo.com>

* - rewrite and overload array +-*/ scalar and scalar +-*/ arr in NDAray class

Signed-off-by: Yurii <iuriish@yahoo.com>

* - make corrections which have to do with and rvalue lvalue conversions

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide move semantic in NDArray operators array +-/* array

Signed-off-by: Yurii <iuriish@yahoo.com>

* float16/bfloat16 tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* one more tweak

Signed-off-by: raver119 <raver119@gmail.com>

* - make float16 and bfloat16 to compile successfully on cuda

Signed-off-by: Yurii <iuriish@yahoo.com>

* - do not use resources of view-like arrays when move semantics is applied

Signed-off-by: Yurii <iuriish@yahoo.com>

* - get rid of pointers in signatures NDArray methods 1

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correction of signature of NDArray::dup method

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correction of signature of NDArray::reduceAlongDimension method

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::applyIndexReduce and applyTrueBroadcast methods

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::applyReduce3 and varianceAlongDimension methods

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::tensorsAlongDimension and diagonal methods

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::allTensorsAlongDimension

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::reduceAlongDimension 2

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::applyTransform 2

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::applyPairwiseTransform 2

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::applyBroadcast 2

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::applyTrueBroadcast 2

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::applyScalar and applyScalarArr

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::lambda methods

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::reduce3 methods 2

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of following NDArray methods: add/sub/mul/div row/column and fillAsTriangular

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::tileToShape methods

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::isShapeSameStrict method

Signed-off-by: Yurii <iuriish@yahoo.com>

* minor corrections in tests

Signed-off-by: Yurii <iuriish@yahoo.com>

* - replace reduce op in batchnorm mkldnn

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add explicit templates instantiations for operator+(NDArray&&. const scalar)

Signed-off-by: Yurii <iuriish@yahoo.com>

* - corrections of casts in float16/bfloat16

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide move semantics in following NDArray methods: transform, applyTrueBroadcast, transpose, reshape, permute

Signed-off-by: Yurii <iuriish@yahoo.com>

* - get rid of input array A duplicate in svd cuda op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - avoid available bug in svd cuda API

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add temporary global memory buffer in svd cuda when calcUV = false and  m != n

Signed-off-by: Yurii <iuriish@yahoo.com>

* - remove test with blfoat16 type for betainC

Signed-off-by: Yurii <iuriish@yahoo.com>

* - resolve conflicts after master has been merged in

Signed-off-by: Yurii <iuriish@yahoo.com>

* - changed type of affected input array in fused_batch_norm

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add several explicit type castings

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add ND4J_EXPORT to operators

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add explicit template types in instantiations of template arithm operators of NDArray class

Signed-off-by: Yurii <iuriish@yahoo.com>

* - one more test fix

Signed-off-by: Yurii <iuriish@yahoo.com>

Co-authored-by: raver119 <raver119@gmail.com>
2019-12-20 22:35:39 +03:00
raver119 6de00bf75f
[WIP] Weekly update of repo (#8390)
* [WIP] Fix compilation after nd4j changes (#37)

* Fix compilation.

* Some tests fixed

* Disable tests temporarily.

* Restored test

* Tests restored.

* Test restored.

* [WIP] perf tests (#40)

* special maxpool test

Signed-off-by: raver119 <raver119@gmail.com>

* special maxpool test

Signed-off-by: raver119 <raver119@gmail.com>

* Shyrma bnorm bp (#41)

Batchnorm backprop mkldnn

* Add SameDiff memory reuse memory manager (array cache) (#39)

* Attention op comments

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* ArrayCacheMemoryMgr - first pass

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Tweak array cache for use with SameDiff identity arrays

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* ArrayCacheMemoryMgr javadoc and properly get max memory

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* LRU cache policy + add tests

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Resize arrays internally if required for ArrayCacheMemoryMgr

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Test improvement

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small polish

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* SameDiff op runtime benchmarking listener (#42)

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* INLINE_LOOPS for windows

Signed-off-by: raver119 <raver119@gmail.com>

* [WIP] ThreadPool (#8)

This PR removes OpenMP use in 95% of cases
2019-11-13 17:15:18 +03:00
raver119 929c1dc5c7 - new NDArrayFactory scalar constructor
- minor tweak in randomuniform
- one more test

Signed-off-by: raver119 <raver119@gmail.com>
2019-11-08 08:49:41 +03:00
shugeo 08853c7829
Shugeo random uniform int (#30)
* Corrected randomuniform declaration.

* Refactored uniform distribution for both cuda and cpu platforms.

* Refactored uniform distribution and tests.

* Fixed type usage with indices.

* Refactored uniform distribution implementation and tests to full conform with TF implementation.

* Refactored gamma function to use type util method.

* Copyright changes and fixes with ConstantHelper.

* Added error checking on allocate cuda device memory and operations.
2019-11-06 12:49:27 +02:00
shugeo 7b14a9f603
Gamma and Poisson distributions (#27)
* Added implementation for random_gamma op.

* Added implementation for random_poisson op and support classes.

* Added helpers for random_poisson and random_gamma ops.

* Implementation of random_poisson. The first working edition.

* Implementation of random_poisson. Parallelized working edition.

* Implementation of random_gamma. Parallelized working edition with alpha only.

* Added cuda implementation for helper of poisson distribution.

* Corrected shape calculation with random_gamma and tests.

* Finished cpu implementation for gamma distribution.

* Finished cuda implementation for random_gamma op.

* Refactored cpu helpers for random_gamma and random_poisson ops.

* Refactored cuda helpers for gamma and poisson distribution.

* Refactored cuda helper for gamma distribution.

* Refactored cpu helper for random_poisson op.

* Refactored cpu helper for random_gamma op.
2019-11-04 15:42:28 +02:00
shugeo 3c7cd2397c Eliminated memory leaks and dropped waste prints with tests. (#117) 2019-08-15 15:28:19 +03:00
shugeo c78f5a8225
Shugeo cuda cuda (#105)
* Refactored extract_image_patches op helpers.

* Eliminated compliler errors with helper implementation.

* Finished implementation for extract_image_patches both cpu and cuda helpers.

* Improved cpu implementation.

* Improved cuda implementation for extract_image_patches helper.

* Added omp to ClipByGlobalNorm helpers implementation.

* Added implementation for thresholedrelu_bp op.

* Fixed cuda kernel with F order.

* Fixed tests for subarray.

* Refactored tests for Gaussian_3 and Truncated_22.

* Added tests for GaussianDistribution with native ops.

* Modified tests for Gaussian distribution.

* Fixed random tests.

* Fixed atomicMin/atomicMax for 64bit cases.

* Fixed tests for execReduce3TAD tests.

* Eliminated waste comments.
2019-08-07 15:29:17 +03:00
raver119 3c4e959e21 [WIP] More of CUDA (#95)
* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* Implementation of hashcode cuda helper. Working edition.

* Fixed parallel test input arangements.

* Fixed tests for hashcode op.

* Fixed shape calculation for image:crop_and_resize op and test.

* NativeOps tests. Initial test suite.

* Added tests for indexReduce methods.

* Added test on execBroadcast with NDArray as dimensions.

* Added test on execBroadcastBool with NDArray as dimensions.

* Added tests on execPairwiseTransform and execPairwiseTransofrmBool.

* Added tests for execReduce with scalar results.

* Added reduce tests for non-empty dims array.

* Added tests for reduce3.

* Added tests for execScalar.

* Added tests for execSummaryStats.

* - provide cpu/cuda code for batch_to_space
- testing it

Signed-off-by: Yurii <yurii@skymind.io>

* - remove old test for batch_to_space (had wrong format and numbers were not checked)

Signed-off-by: Yurii <yurii@skymind.io>

* Fixed complilation errors with test.

* Added test for execTransformFloat.

* Added test for execTransformSame.

* Added test for execTransformBool.

* Added test for execTransformStrict.

* Added tests for execScalar/execScalarBool with TADs.

* Added test for flatten.

* - provide cpu/cuda code for space_to_Batch operaion

Signed-off-by: Yurii <yurii@skymind.io>

* Added test for concat.

* comment unnecessary stuff in s_t_b

Signed-off-by: Yurii <yurii@skymind.io>

* Added test for specialConcat.

* Added tests for memcpy/set routines.

* Fixed pullRow cuda test.

* Added pullRow test.

* Added average test.

* - correct typo in NDArray::applyPairwiseTransform(nd4j::pairwise::BoolOps op...)

Signed-off-by: Yurii <yurii@skymind.io>

* - debugging and fixing cuda tests in JavaInteropTests file

Signed-off-by: Yurii <yurii@skymind.io>

* - correct some tests

Signed-off-by: Yurii <yurii@skymind.io>

* Added test for shuffle.

* Fixed ops declarations.

* Restored omp and added shuffle test.

* Added convertTypes test.

* Added tests for execRandom. Eliminated usage of RandomBuffer with NativeOps.

* Added sort tests.

* Added tests for execCustomOp.

* - further debuging and fixing tests terminated with crash

Signed-off-by: Yurii <yurii@skymind.io>

* Added tests for calculateOutputShapes.

* Addded Benchmarks test.

* Commented benchmark tests.

* change assertion

Signed-off-by: raver119 <raver119@gmail.com>

* Added tests for apply_sgd op. Added cpu helper for that op.

* Implement cuda helper for aplly_sgd op. Fixed tests for NativeOps.

* Added test for assign broadcastable.

* Added tests for assign_bp op.

* Added tests for axpy op.

* - assign/execScalar/execTransformAny signature change
- minor test fix

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed axpy op.

* meh

Signed-off-by: raver119 <raver119@gmail.com>

* - fix tests for nativeOps::concat

Signed-off-by: Yurii <yurii@skymind.io>

* sequential transform/scalar

Signed-off-by: raver119 <raver119@gmail.com>

* allow nested parallelism

Signed-off-by: raver119 <raver119@gmail.com>

* assign_bp leak fix

Signed-off-by: raver119 <raver119@gmail.com>

* block setRNG fix

Signed-off-by: raver119 <raver119@gmail.com>

* enable parallelism by default

Signed-off-by: raver119 <raver119@gmail.com>

* enable nested parallelism by default

Signed-off-by: raver119 <raver119@gmail.com>

* Added cuda implementation for row_count helper.

* Added implementation for tnse gains op helper.

* - take into account possible situations when input arrays are empty in reduce_ cuda stuff

Signed-off-by: Yurii <yurii@skymind.io>

* Implemented tsne/edge_forces op cuda-based helper. Parallelized cpu-based helper for edge_forces.

* Added kernel for tsne/symmetrized op heleper.

* Implementation of tsne/symmetrized op cuda helper. Working edition.

* Eliminated waste printfs.

* Added test for broadcastgradientargs op.

* host-only fallback for empty reduce float

Signed-off-by: raver119 <raver119@gmail.com>

* - some tests fixes

Signed-off-by: Yurii <yurii@skymind.io>

* - correct the rest of reduce_ stuff

Signed-off-by: Yurii <yurii@skymind.io>

* - further correction of reduce_ stuff

Signed-off-by: Yurii <yurii@skymind.io>

* Added test for Cbow op. Also added cuda implementation for cbow helpers.

* - improve code of stack operation for scalar case

Signed-off-by: Yurii <yurii@skymind.io>

* - provide cuda kernel for gatherND operation

Signed-off-by: Yurii <yurii@skymind.io>

* Implementation of cbow helpers with cuda kernels.

* minor tests tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* minor tests tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* - further correction of cuda stuff

Signed-off-by: Yurii <yurii@skymind.io>

* Implementatation of cbow op helper with cuda kernels. Working edition.

* Skip random testing for cudablas case.

* lstmBlockCell context fix

Signed-off-by: raver119 <raver119@gmail.com>

* Added tests for ELU and ELU_BP ops.

* Added tests for eq_scalar, gt_scalar, gte_scalar and lte_scalar ops.

* Added tests for neq_scalar.

* Added test for noop.

* - further work on clipbynorm_bp

Signed-off-by: Yurii <yurii@skymind.io>

* - get rid of concat op call, use instead direct concat helper call

Signed-off-by: Yurii <yurii@skymind.io>

* lstmBlockCell context fix

Signed-off-by: raver119 <raver119@gmail.com>

* Added tests for lrelu and lrelu_bp.

* Added tests for selu and selu_bp.

* Fixed lrelu derivative helpers.

* - some corrections in lstm

Signed-off-by: Yurii <yurii@skymind.io>

* operator * result shape fix

Signed-off-by: raver119 <raver119@gmail.com>

* - correct typo in lstmCell

Signed-off-by: Yurii <yurii@skymind.io>

* few tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA inverse broadcast bool fix

Signed-off-by: raver119 <raver119@gmail.com>

* disable MMAP test for CUDA

Signed-off-by: raver119 <raver119@gmail.com>

* BooleanOp syncToDevice

Signed-off-by: raver119 <raver119@gmail.com>

* meh

Signed-off-by: raver119 <raver119@gmail.com>

* additional data types for im2col/col2im

Signed-off-by: raver119 <raver119@gmail.com>

* Added test for firas_sparse op.

* one more RandomBuffer test excluded

Signed-off-by: raver119 <raver119@gmail.com>

* Added tests for flatten op.

* Added test for Floor op.

* bunch of tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* mmulDot tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* more tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* Implemented floordiv_bp op and tests.

* Fixed scalar case with cuda implementation for bds.

* - work on cuda kernel for clip_by_norm backprop op is completed

Signed-off-by: Yurii <yurii@skymind.io>

* Eliminate cbow crach.

* more tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* more tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* Eliminated abortion with batched nlp test.

* more tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed shared flag initializing.

* disabled bunch of cpu workspaces tests

Signed-off-by: raver119 <raver119@gmail.com>

* scalar operators fix: missing registerSpecialUse call

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed logdet for cuda and tests.

* - correct clipBynorm_bp

Signed-off-by: Yurii <yurii@skymind.io>

* Fixed crop_and_resize shape datatype.

* - correct some mmul tests

Signed-off-by: Yurii <yurii@skymind.io>
2019-08-05 11:27:05 +10:00
Samuel Audet dcc72e23b2 Refactor NativeOps.h to export C functions 2019-08-05 11:22:59 +10:00
raver119 763a225c6a [WIP] More of CUDA operations (#69)
* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* - gruCell_bp further

Signed-off-by: Yurii <yurii@skymind.io>

* - further work on gruCell_bp

Signed-off-by: Yurii <yurii@skymind.io>

* Inverse matrix cublas implementation. Partial working revision.

* Separation of segment ops helpers. Max separation.

* Separated segment_min ops.

* Separation of segment_mean/sum/prod/sqrtN ops heleprs.

* Fixed diagonal processing with LUP decomposition.

* Modified inversion approach using current state of LU decomposition.

* Implementation of matrix_inverse op with cuda kernels. Working revision.

* Implemented sequence_mask cuda helper. Eliminated waste printf with matrix_inverse implementation. Added proper tests.

* - further work on gruCell_bp (ff/cuda)

Signed-off-by: Yurii <yurii@skymind.io>

* comment one test for gruCell_bp

Signed-off-by: Yurii <yurii@skymind.io>

* - provide cuda static_rnn

Signed-off-by: Yurii <yurii@skymind.io>

* Refactored random_shuffle op to use new random generator.

* Refactored random_shuffle op helper.

* Fixed debug tests with random ops tests.

* Implement random_shuffle op cuda kernel helper and tests.

* - provide cuda scatter_update

Signed-off-by: Yurii <yurii@skymind.io>

* Implementation of random_shuffle for linear case with cuda kernels and tests.

* Implemented random_shuffle with cuda kernels. Final revision.

* - finally gruCell_bp is completed

Signed-off-by: Yurii <yurii@skymind.io>

* Dropout op cuda helper implementation.

* Implemented dropout_bp cuda helper.

* Implemented alpha_dropout_bp with cuda kernel helpers.

* Refactored helper.

* Implementation of suppresion helper with cuda kernels.

* - provide cpu code fot hsvToRgb, rgbToHsv, adjustHue

Signed-off-by: Yurii <yurii@skymind.io>

* Using sort by value method.

* Implementation of image.non_max_suppression op cuda-based helper.

* - correcting and testing adjust_hue, adjust_saturation cpu/cuda code

Signed-off-by: Yurii <yurii@skymind.io>

* Added cuda device prefixes to declarations.

* Implementation of hashcode op with cuda helper. Initital revision.

* rnn cu impl removed

Signed-off-by: raver119 <raver119@gmail.com>
2019-07-20 23:20:41 +10:00
raver119 cf2311859a cpp tests fixes
Signed-off-by: raver119 <raver119@gmail.com>
2019-07-20 23:08:07 +10:00
skymindops b5f0ec072f Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00