Commit Graph

115 Commits (358c650b62d340d8bc4a0966b4586b04ac1b2356)

Author SHA1 Message Date
Oleh f0706b21aa
Split operation improvement (#262)
* libnd4j moved split operation implementation to helpers before adding the special case

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j minor fixes for general split operation move, merge master

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j split cpu implementation

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* - provide cuda helper for split op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - minor correction

Signed-off-by: Yurii <iuriish@yahoo.com>

* - minor correction 2

Signed-off-by: Yurii <iuriish@yahoo.com>

Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>
Co-authored-by: raver119 <raver119@gmail.com>
2020-02-24 08:22:41 +03:00
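
A minimal sketch of the fast path such a split helper can take once the implementation lives in helpers: when the input is c-ordered and the split axis is 0, every output is a contiguous slice of the input buffer. Names and shapes below are hypothetical, not the libnd4j helper signature.

```cpp
#include <cstring>
#include <vector>

// Split a c-ordered [rows, cols] buffer into numSplits equal parts along axis 0.
// Each part is contiguous, so a single memcpy per output suffices.
void splitAxis0(const float* input, int rows, int cols,
                int numSplits, std::vector<float*>& outputs) {
    const int    rowsPerPart = rows / numSplits;                 // assumes rows % numSplits == 0
    const size_t partLen     = static_cast<size_t>(rowsPerPart) * cols;
    for (int i = 0; i < numSplits; ++i)
        std::memcpy(outputs[i], input + i * partLen, partLen * sizeof(float));
}
```

The general case (arbitrary axis and orders) falls back to per-element or per-TAD copies, which is where the dedicated CPU and CUDA helpers referenced above come in.
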
Yurii Shyrma f7a9190407
profiling of concat op (both cuda and cpu) (#151)
* - profiling of concat op (both cuda and cpu)

Signed-off-by: Yurii <iuriish@yahoo.com>

* better comparison for large concat

Signed-off-by: raver119 <raver119@gmail.com>

* - further improving of concat op

Signed-off-by: Yurii <iuriish@yahoo.com>

* some logging

Signed-off-by: raver119 <raver119@gmail.com>

* - add possibility to verify presence of trailing unities in shape and set strides/ews correspondingly
- restrict second simple case in concat op to c order only

Signed-off-by: Yurii <iuriish@yahoo.com>

* - move concat op to specials_single.cpp file

Signed-off-by: Yurii <iuriish@yahoo.com>

* - get rid of second concat op declaration in transforms.cpp file

Signed-off-by: Yurii <iuriish@yahoo.com>

Co-authored-by: raver119 <raver119@gmail.com>
2020-02-20 21:19:01 +03:00
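
The "second simple case" restricted to c order above is essentially the mirror image of the split fast path: when all inputs are contiguous (ews = 1), c-ordered and concatenated along axis 0, the result is just the inputs appended back to back. A hedged sketch of that fast path, not the actual concat kernel:

```cpp
#include <cstring>
#include <vector>

// Concatenate c-ordered, contiguous [rows_i, cols] inputs along axis 0.
void concatAxis0(const std::vector<const float*>& inputs,
                 const std::vector<int>& rows, int cols, float* output) {
    size_t offset = 0;
    for (size_t i = 0; i < inputs.size(); ++i) {
        const size_t len = static_cast<size_t>(rows[i]) * cols;
        std::memcpy(output + offset, inputs[i], len * sizeof(float));
        offset += len;
    }
}
```

Checking for trailing unities in the shape, as one commit adds, is what lets more inputs qualify for this contiguous path.
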
raver119 f9d51b7278
More compilation units (#246)
* weird edge case

Signed-off-by: raver119 <raver119@gmail.com>

* weird edge case

Signed-off-by: raver119 <raver119@gmail.com>

* get rid of it

Signed-off-by: raver119 <raver119@gmail.com>

* crop and resize reorganized

Signed-off-by: raver119 <raver119@gmail.com>

* restore test

Signed-off-by: raver119 <raver119@gmail.com>

* remove unwanted unit refs in cmake

Signed-off-by: raver119 <raver119@gmail.com>
2020-02-17 10:23:05 +03:00
Yurii Shyrma fe47f52896
Oleh tenzor mmul (#231)
* Libnd4j: TensorMMul backprop op #8174, raw implementation

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: TensorMMul backprop op #8174 merge master and some corrections

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: TensorMMul backprop op #8174 algorithm update, need testing, sync with  master

* Libnd4j: TensorMMul backprop op #8174 fixed incorrect B axes calculation

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: TensorMMul backprop op #8174 optimize axes identification and fix bug of overlapping indices, added first test. need testing with different shapes

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: TensorMMul backprop op #8174 some fixes and improvements need more testing

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: TensorMMul backprop op #8174 fixed order of matrix multiply

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: TensorMMul backprop op #8174 fixed issue of incorrect axes definition, add tests based on TF, need additional testing for case dLdC not equal 1

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: TensorMMul backprop op #8174 fixed scalar case add test

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: TensorMMul backprop op #8174 fixed bp algorithm and axes definition, need some more testing with different order combinations f,c; c,f; f,f and add some checks for inputs

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: TensorMMul backprop op #8174 some checks and corrections added, tests added; a problem remains with support for different input orders A-f B-c and A-f B-f

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: TensorMMul backprop op #8174 sync master

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* - correct bug in MmulHelper::tensorDot(a, b, c, axes_a, axes_b,permutForC)

Signed-off-by: Yurii <iuriish@yahoo.com>

* Libnd4j: TensorMMul backprop op #8174 code clean up and refactoring

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* - add check for linspace-ordered permutations in ShapeUtils::evalShapeForTensorDot

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide additional code in shape::reshape stuff in order to reduce amount of allocation/copy operations during reshaping procedure

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on problem of wrong shape evaluation during permute/reshape procedures

Signed-off-by: Yurii <iuriish@yahoo.com>

* - still looking for bug reason in reshape/permute stuff

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct bug in transform cuda native ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct bug in NDArray::assign

Signed-off-by: Yurii <iuriish@yahoo.com>

* - remove old shape::reshape stuff

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add possibility to disable copy of old buffer to new buffer during reshape operation in NDArray class

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct bug in tensorDot which had to do with wrong pointer assignments

Signed-off-by: Yurii <iuriish@yahoo.com>

Co-authored-by: Oleh <oleg.semeniv@gmail.com>
2020-02-13 20:33:54 +03:00
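
For orientation on what the TensorMMul backprop above computes: in the plain matrix case C = A·B the gradients come from multiplying the upstream gradient by the transposed other factor, and the general tensordot case reduces to the same pattern once the contracted axes are permuted into place. The standard matrix-case identities (textbook math, not a quote from the sources):

```latex
\frac{\partial L}{\partial A} = \frac{\partial L}{\partial C}\, B^{T},
\qquad
\frac{\partial L}{\partial B} = A^{T}\, \frac{\partial L}{\partial C}.
```

In the general case dLdA is a tensordot of dLdC with B over B's free axes followed by a permutation back to A's original axis order (and symmetrically for dLdB), which is exactly where the axes-identification and permutation issues mentioned in these commits tend to hide.
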
shugeo f0c684020f
Shugeo resize area fix4 (#229)
* Fixed a couple of issues with resize_area op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added additional test for alternate params for resize_area testing.

Signed-off-by: shugeo <sgazeos@gmail.com>
2020-02-12 19:02:42 +03:00
shugeo 5ae40f6e38
Shugeo sequence mask fix2 (#216)
* Fixed sequence_mask op and tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Cuda fix for sequence_mask op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed sequence_mask op for both platforms and tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed solve and triangular_solve for more than 2D for adjoint cases.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added adjoint solve test again.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added a set of tests for triangual_solve and generic solve ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added a pair tests for triangular_solve

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added tests for triangular_solve op.

Signed-off-by: shugeo <sgazeos@gmail.com>
2020-02-06 21:06:50 +03:00
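
sequence_mask itself is small enough to state in a few lines: given a vector of lengths and a maximum length, element (i, j) of the mask is true exactly when j < lengths[i]. A minimal CPU sketch with a hypothetical signature, not the op's declaration:

```cpp
#include <vector>

// mask has shape [lengths.size(), maxLen], row-major.
void sequenceMask(const std::vector<int>& lengths, int maxLen, std::vector<bool>& mask) {
    mask.assign(lengths.size() * maxLen, false);
    for (size_t i = 0; i < lengths.size(); ++i)
        for (int j = 0; j < maxLen; ++j)
            mask[i * maxLen + j] = (j < lengths[i]);
}
```
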
shugeo 41ff907bc6
Shugeo solve linear (#191)
* linear equations systems solve op. Initial commit.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed compiling issues.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Linear equations systems solve. The next stage commit.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added test for linear equations systems solve operation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added additional test and fixed lower matrix retrieval.

* Implementation of solve for systems of linear equations.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored permutation generation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added restore for permutations batched with cuda helper for solve op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Finished cuda implementation for solve op helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored cpu helpers for solve op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fix gtest output on Windows

* Fixed issue with permutation matrix for cuda implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed issue with permutation matrix for cpu implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Eliminated waste comments.

Signed-off-by: shugeo <sgazeos@gmail.com>

* LinearSolve added

* Mapping added

* Javadoc added

* Refactored implementation of triangular_solve helpers and tests for solve matrix equations generally.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added a test for solve op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Solve test added

* Fix for TF import

Co-authored-by: Serhii Shepel <9946053+sshepel@users.noreply.github.com>
Co-authored-by: raver119 <raver119@gmail.com>
Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
2020-02-04 08:59:11 +03:00
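
The solve op above takes the usual decomposition route: factor the coefficient matrix with partial pivoting, then reduce the problem to two triangular solves. Schematically (standard linear algebra, not a quote from the implementation):

```latex
PA = LU, \qquad Ax = b \;\Longrightarrow\; Ly = Pb \;\;(\text{forward substitution}), \qquad Ux = y \;\;(\text{back substitution}).
```

The "restore for permutations batched" step presumably corresponds to applying the per-batch permutation P to the right-hand side before the forward substitution.
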
shugeo 99a54829c2 Shugeo resize area fix2 (#181)
* Added test for issue with resize_area op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added a pair of tests for resize_are op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored resize_area kernel to avoid shared memory overflow.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Eliminated prints with tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* ignore bad test

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed test with resize_area.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed test for float constants.

Signed-off-by: shugeo <sgazeos@gmail.com>

Co-authored-by: raver119 <raver119@gmail.com>
2020-01-24 20:55:25 +03:00
shugeo 2717b25931 Shugeo qr (#153)
* Added qr op implementation. Initial version.

* Fixed doc for qr op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Implementation of QR decomposition. CPU platform version.

* Added a pair of tests for qr op testing.

Signed-off-by: shugeo <sgazeos@gmail.com>

* QR implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected norm using.

* Properly calculated intermediate results with QR decomposition.

* Another step to implement QR algorithm by householder.

* CPU implementation for QR decomposition. The first working edition.

* Corrected test to QR decomposition.

* Added tad multithreading with QR implementation.

* Finished cpu implementation for QR decomposition helpers.

* Refactored tests and improved multithreading.

* Refactored QR cpu implementation and update cuda implementation helpers.

* Cuda QR helper implementation. The first working edition.

* Eliminated waste prints.

* Restore multithreading with cuda implementation.

* Ops names corrected

* Refactored qr op helpers to optimize.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Eliminated waste manual ticking.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored memory allocation to avoid waste memory usage.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored matrixMinor method both for cuda and cpu platforms.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored method of vmul to use raw buffers instead type conversion.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored temporary array of matrices.

Signed-off-by: shugeo <sgazeos@gmail.com>

Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
Co-authored-by: raver119 <raver119@gmail.com>
2020-01-22 13:59:36 +03:00
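
The Householder approach behind the QR helpers zeroes the sub-diagonal of one column at a time using a reflector built from that column. The textbook form of one step (standard math rather than the helper's exact code):

```latex
v = x + \operatorname{sign}(x_1)\,\lVert x \rVert\, e_1, \qquad
H = I - 2\,\frac{v v^{T}}{v^{T} v}, \qquad
R = H_k \cdots H_2 H_1 A, \qquad Q = H_1 H_2 \cdots H_k .
```

The matrixMinor and vmul routines mentioned above are presumably the usual building blocks here: extracting the trailing submatrix for the next column and applying the reflection via v without forming H explicitly.
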
shugeo 815a2908af Shugeo solve triangular (#173)
* Added implementation of the triangular_solve op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed compilation issues.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added verification of input data and helper facilities for triangular_solve op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added cpu implementation for triangular_solve helpers.

* Added tests and implementation for upper triangular equations.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added a pair of cases to tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added multithreading with cpu helpers for triangular_solve op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added cuda implementation of triangular_solve op helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Finished cuda implementation of triangular_solve helpers and tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed copyright marks.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected grammar errors with doc and error messages.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored matrices processing with triangular_solve cuda helper implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added triangular_solve wrapper

* Fixed mapping

* Added processing for adjoint with cpu helpers of triangular_solve op implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added implementation for adjoint routine with cuda platform.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added multithreading with adjoint routine for cpu platform.

Signed-off-by: shugeo <sgazeos@gmail.com>

Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
2020-01-22 10:48:03 +03:00
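
A minimal CPU sketch of the forward substitution at the heart of a lower-triangular solve; the upper-triangular and adjoint variants walk the rows in the opposite direction or use the conjugate transpose. Hypothetical signature, single right-hand side, row-major:

```cpp
#include <vector>

// Solve L * x = b where L is an n x n lower-triangular matrix, row-major.
void lowerTriangularSolve(const std::vector<double>& L, const std::vector<double>& b,
                          int n, std::vector<double>& x) {
    x.assign(n, 0.0);
    for (int i = 0; i < n; ++i) {
        double sum = b[i];
        for (int j = 0; j < i; ++j)
            sum -= L[i * n + j] * x[j];
        x[i] = sum / L[i * n + i];   // assumes a non-zero diagonal
    }
}
```

The multithreading added in the commits is presumably over independent matrices (batches), since the row loop itself is inherently sequential.
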
shugeo e50b285c2c Shugeo resize area (#162)
* Added implementation for resize_area op. Initial commit.

* Added implementation of resize_area op. Initial revision.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected resizeArea functor call.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Implementation of resize_area. Cpu platform helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Implementation for resize_area helpers. The first part revision.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added a set of tests for resize_area op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Cuda implementation for resize_area. Initial approach.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adding multithreading for resize_area algorithm.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Cuda implementation of resize_area helpers. Shared memory approach.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored resizeAreaKernel with cuda implementation.

* Eliminated compilation errors.

* ResizeArea helpers for cuda platform. The first working revision.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added test for batched resize_area op testing.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Implementation of resize_area for cuda platform and tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed multithreading with resize_area op helper.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected copyright marks with sources.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected copyright mark for resize_area op implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected copyright mark for parity ops header.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected typo in strings and so on with image resize ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored resize_area helpers and multithreading.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added ResizeArea wrapper

* Added test with align_corners and fixed shape processing with only int args given for output size.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added test

* TF mapping for ResizeArea

* Fixed implementation issues with resize_area op for both platforms.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored image resizer struct to use flexible types for ints and floats.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Improved multithreading with resizeAreaKernel launch.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Use asynchronous memory copying with cuda platform image resize allocations.

Signed-off-by: shugeo <sgazeos@gmail.com>

Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
2020-01-22 10:46:33 +03:00
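
Conceptually, resize_area treats every output pixel as an average of the input over the source rectangle it maps back to, with boundary pixels weighted by their fractional overlap (this is the TF semantics the op targets; the formula is the standard description, not the kernel itself):

```latex
\text{out}(y, x) \;=\; \frac{1}{s_y s_x} \sum_{(v,u)} \operatorname{overlap}\!\big((v,u),\, R(y,x)\big)\; \text{in}(v, u),
\qquad
R(y,x) = [\,y\,s_y,\,(y{+}1)\,s_y\,) \times [\,x\,s_x,\,(x{+}1)\,s_x\,),
```

where s_y and s_x are the height and width scale factors and the sum runs over input pixels intersecting R(y, x). The shared-memory and multithreading commits above are about evaluating exactly this sum efficiently per output pixel.
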
raver119 7783012f39
cuDNN integration (#150)
* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* one file

Signed-off-by: raver119 <raver119@gmail.com>

* few more includes

Signed-off-by: raver119 <raver119@gmail.com>

* m?

Signed-off-by: raver119 <raver119@gmail.com>

* const

Signed-off-by: raver119 <raver119@gmail.com>

* cudnn linkage in tests

Signed-off-by: raver119 <raver119@gmail.com>

* culibos

Signed-off-by: raver119 <raver119@gmail.com>

* static reminder

Signed-off-by: raver119 <raver119@gmail.com>

* platform engine tag

Signed-off-by: raver119 <raver119@gmail.com>

* HAVE_CUDNN moved to config.h.in

Signed-off-by: raver119 <raver119@gmail.com>

* include

Signed-off-by: raver119 <raver119@gmail.com>

* include

Signed-off-by: raver119 <raver119@gmail.com>

* skip cudnn handle creation if there's no cudnn

Signed-off-by: raver119 <raver119@gmail.com>

* meh

Signed-off-by: raver119 <raver119@gmail.com>

* target device in context

Signed-off-by: raver119 <raver119@gmail.com>

* platform engines

Signed-off-by: raver119 <raver119@gmail.com>

* platform engines

Signed-off-by: raver119 <raver119@gmail.com>

* allow multiple -h args

Signed-off-by: raver119 <raver119@gmail.com>

* allow multiple -h args

Signed-off-by: raver119 <raver119@gmail.com>

* move mkldnn out of CPU block

Signed-off-by: raver119 <raver119@gmail.com>

* link to mkldnn on cuda

Signed-off-by: raver119 <raver119@gmail.com>

* less prints

Signed-off-by: raver119 <raver119@gmail.com>

* minor tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* next step

Signed-off-by: raver119 <raver119@gmail.com>

* conv2d NCHW draft

Signed-off-by: raver119 <raver119@gmail.com>

* conv2d biasAdd

Signed-off-by: raver119 <raver119@gmail.com>

* test for MKL/CUDNN combined use

Signed-off-by: raver119 <raver119@gmail.com>

* - provide additional code for conv2d ff based on cudnn api, not tested yet

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on conv2d helper based on using cudnn api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - fixing several cuda bugs which appeared after the cudnn lib came into use

Signed-off-by: Yurii <iuriish@yahoo.com>

* - implementation of conv2d backprop op based on cudnn api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - implementation of conv3d and conv3d_bp ops based on cudnn api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - bugs fixing in conv3d/conv3d_bp ops (cudnn in use)

Signed-off-by: Yurii <iuriish@yahoo.com>

* - implementation of depthwiseConv2d (ff/bp) op based on cudnn api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - implementation of batchnorm ff op based on cudnn api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - disable cudnn batchnorm temporary

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add minor change in cmake

Signed-off-by: Yurii <iuriish@yahoo.com>

* engine for depthwise mkldnn

Signed-off-by: raver119 <raver119@gmail.com>

* couple of includes

Signed-off-by: raver119 <raver119@gmail.com>

* - provide permutation to cudnn batchnorm ff when format is NHWC

Signed-off-by: Yurii <iuriish@yahoo.com>

* lgamma fix

Signed-off-by: raver119 <raver119@gmail.com>

* - eliminate memory leak in two tests

Signed-off-by: Yurii <iuriish@yahoo.com>

Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>
2020-01-20 21:32:46 +03:00
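
For orientation, a conv2d forward helper "based on cudnn api" follows the standard cuDNN descriptor sequence. A heavily condensed sketch under the assumption of NCHW float tensors; error checks, workspace sizing and algorithm selection are omitted, and the names are placeholders rather than the helper's actual wiring:

```cpp
#include <cudnn.h>

// x, w, y and workspace are device pointers prepared by the caller.
void conv2dForward(cudnnHandle_t h, const float* x, const float* w, float* y,
                   void* workspace, size_t wsBytes,
                   int N, int C, int H, int W, int K, int kH, int kW,
                   int padH, int padW, int sH, int sW) {
    cudnnTensorDescriptor_t xD, yD;  cudnnFilterDescriptor_t wD;  cudnnConvolutionDescriptor_t cD;
    cudnnCreateTensorDescriptor(&xD);  cudnnCreateTensorDescriptor(&yD);
    cudnnCreateFilterDescriptor(&wD);  cudnnCreateConvolutionDescriptor(&cD);

    cudnnSetTensor4dDescriptor(xD, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, N, C, H, W);
    cudnnSetFilter4dDescriptor(wD, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, K, C, kH, kW);
    cudnnSetConvolution2dDescriptor(cD, padH, padW, sH, sW, /*dilation*/ 1, 1,
                                    CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);

    int n, c, ho, wo;
    cudnnGetConvolution2dForwardOutputDim(cD, xD, wD, &n, &c, &ho, &wo);
    cudnnSetTensor4dDescriptor(yD, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, n, c, ho, wo);

    const float alpha = 1.f, beta = 0.f;
    cudnnConvolutionForward(h, &alpha, xD, x, wD, w, cD,
                            CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM,
                            workspace, wsBytes, &beta, yD, y);

    cudnnDestroyTensorDescriptor(xD);  cudnnDestroyTensorDescriptor(yD);
    cudnnDestroyFilterDescriptor(wD);  cudnnDestroyConvolutionDescriptor(cD);
}
```

The batchnorm commit that adds a permutation for NHWC reflects the same theme: cuDNN works against layout-carrying descriptors, so inputs sometimes need to be permuted into a format the chosen routine accepts.
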
shugeo 6943a5f57a Shugeo lgamma (#170)
* lgamma op. Initial version.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored lgamma op and test.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Lgamma wrapper

* Added TF mapping

Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
2020-01-20 12:29:36 +03:00
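
The lgamma op is an element-wise transform; stripped of libnd4j's NDArray and type-dispatch machinery, the CPU reference behaviour is roughly this (using std::lgamma from <cmath>):

```cpp
#include <cmath>
#include <cstddef>

// Element-wise natural log of the absolute value of the gamma function.
void lgammaOp(const double* in, double* out, size_t length) {
    for (size_t i = 0; i < length; ++i)
        out[i] = std::lgamma(in[i]);
}
```
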
Yurii Shyrma bbf88b53dd - fix wrong calculation of elements offsets in batchnorm op when input arrays have unusual (#169)
Signed-off-by: Yurii <iuriish@yahoo.com>
2020-01-11 00:14:20 +03:00
Oleh 2404be5fe0 Oleh multinomial (#163)
* libnd4j: Multinomial op #8570 first raw step of multinomial random data generator implementation

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op #8570 next step of multinomial random categories generator implementation on both cpu and cuda, need corrections and code clean up before review and testing

* libnd4j: Multinomial op #8570 code clean up and fixed issues with data selection, moved from coords to tads

* libnd4j: Multinomial op #8570 fixed cuda build, added a reference to the math materials used for the implementation

* libnd4j: Multinomial op #8570 fixed several bugs, added several tests and improved cuda version. current implementation works, need testing of reproduction with the same seed

* libnd4j: Multinomial op #8570 fixes and optimization after discussion in both cuda and cpu

* libnd4j: Multinomial op #8570 add corrections after review, removed tads, replace 2D parallel loop by 3D

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op fixed declaration and added tests, needs discussion

* libnd4j: Multinomial op fix in test

* libnd4j: Multinomial op corrected behavior to get reproducible results, fixed issue in uniform value retrieval, tests added, needs cuda review and cuda testing

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op fixed indexing on uniform calculation

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op some corrections in max min declaration

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op fixed index calculation, added rewind, corrected input declaration, added stats tests, both cuda and cpu. cuda need testing

* libnd4j: Multinomial op fixed bugs on cuda and cpu. needs review

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op corrected tests to handle different orders

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op some improvements after code review

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op more corrections after review

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op fixed seed usage, update tests, fixed cuda based on comments, fixed bug of rewind, removed one behavior, minor corrections.

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op minor corrections

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op raise the bound of fluctuation for random cases

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: Multinomial op modified operation inputs and update implementation and tests on both cpu and cuda

* libnd4j: Multinomial op corrected data types according to ops.proto

Co-authored-by: raver119 <raver119@gmail.com>
2020-01-06 22:35:05 +03:00
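
For a sense of what the multinomial generator has to produce per output sample: build a cumulative distribution from the class weights and invert a uniform draw against it. A hedged single-row CPU sketch; the actual op works on logits, batches and a shared RNG with rewind, all of which this ignores:

```cpp
#include <random>
#include <vector>

// Draw numSamples category indices from unnormalized non-negative weights.
std::vector<int> multinomialSample(const std::vector<double>& weights, int numSamples,
                                   std::mt19937_64& rng) {
    std::vector<double> cdf(weights.size());
    double sum = 0.0;
    for (size_t k = 0; k < weights.size(); ++k) { sum += weights[k]; cdf[k] = sum; }

    std::uniform_real_distribution<double> uni(0.0, sum);
    std::vector<int> samples(numSamples);
    for (int s = 0; s < numSamples; ++s) {
        const double u = uni(rng);
        int k = 0;
        while (k + 1 < static_cast<int>(cdf.size()) && cdf[k] < u) ++k;  // inverse-CDF walk
        samples[s] = k;
    }
    return samples;
}
```

Reproducibility with a fixed seed, which several commits above chase, hinges on each output index consuming the random stream in a deterministic order regardless of thread count.
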
raver119 29e8e09db6
String changes (#3)
* initial commit

* additional data types & tensor type

Signed-off-by: raver119 <raver119@gmail.com>

* next step

Signed-off-by: raver119 <raver119@gmail.com>

* missing include

* sparse_to_dense

Signed-off-by: raver119 <raver119@gmail.com>

* few more tests files

Signed-off-by: raver119 <raver119@gmail.com>

* draft

Signed-off-by: raver119 <raver119@gmail.com>

* numeric sparse_to_dense

Signed-off-by: raver119 <raver119@gmail.com>

* comment

Signed-off-by: raver119 <raver119@gmail.com>

* string sparse_to_dense version

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA DataBuffer expand

Signed-off-by: raver119 <raver119@gmail.com>

* few tweaks for CUDA build

Signed-off-by: raver119 <raver119@gmail.com>

* shape fn for string_split

Signed-off-by: raver119 <raver119@gmail.com>

* one more comment

Signed-off-by: raver119 <raver119@gmail.com>

* string_split indices

Signed-off-by: raver119 <raver119@gmail.com>

* next step

Signed-off-by: raver119 <raver119@gmail.com>

* test passes

Signed-off-by: raver119 <raver119@gmail.com>

* few rearrangements for databuffer implementations

Signed-off-by: raver119 <raver119@gmail.com>

* DataBuffer: move inline methods to common implementations

Signed-off-by: raver119 <raver119@gmail.com>

* add native DataBuffer to Nd4j presets

Signed-off-by: raver119 <raver119@gmail.com>

* DataBuffer creation

Signed-off-by: raver119 <raver119@gmail.com>

* use DataBuffer for allocation

Signed-off-by: raver119 <raver119@gmail.com>

* cpu databuffer as deallocatable

Signed-off-by: raver119 <raver119@gmail.com>

* DataBuffer setters for bufers

Signed-off-by: raver119 <raver119@gmail.com>

* couple of wrappers

Signed-off-by: raver119 <raver119@gmail.com>

* DataBuffers being passed around

Signed-off-by: raver119 <raver119@gmail.com>

* Bunch of ByteBuffer-related signatures gone

Signed-off-by: raver119 <raver119@gmail.com>

* - few more Nd4j signatures removed
- minor fix for bfloat16

Signed-off-by: raver119 <raver119@gmail.com>

* nullptr pointer is still a pointer, but 0 as address :)

Signed-off-by: raver119 <raver119@gmail.com>

* one special test

Signed-off-by: raver119 <raver119@gmail.com>

* empty string array init

Signed-off-by: raver119 <raver119@gmail.com>

* one more test in cpp

Signed-off-by: raver119 <raver119@gmail.com>

* memcpy instead of databuffer swap

Signed-off-by: raver119 <raver119@gmail.com>

* special InteropDataBuffer for front-end languages

Signed-off-by: raver119 <raver119@gmail.com>

* few tweaks for java

Signed-off-by: raver119 <raver119@gmail.com>

* pointer/indexer actualization

Signed-off-by: raver119 <raver119@gmail.com>

* CustomOp returns list for inputArguments and outputArguments instead of array

Signed-off-by: raver119 <raver119@gmail.com>

* redundant call

Signed-off-by: raver119 <raver119@gmail.com>

* print_variable op

Signed-off-by: raver119 <raver119@gmail.com>

* - view handling (but wrong one)
- print_variable java wrapper

Signed-off-by: raver119 <raver119@gmail.com>

* one more test

Signed-off-by: raver119 <raver119@gmail.com>

* - empty arrays handling

Signed-off-by: raver119 <raver119@gmail.com>

* - deserialization works now

Signed-off-by: raver119 <raver119@gmail.com>

* minor fix

Signed-off-by: raver119 <raver119@gmail.com>

* meh

Signed-off-by: raver119 <raver119@gmail.com>

* one more fix

Signed-off-by: raver119 <raver119@gmail.com>

* initial cuda commit

Signed-off-by: raver119 <raver119@gmail.com>

* print_variable message validation

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA views

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA special buffer size

Signed-off-by: raver119 <raver119@gmail.com>

* minor update to match master changes

Signed-off-by: raver119 <raver119@gmail.com>

* - consider arrays always actual on device for CUDA
- additional PrintVariable constructor
- CudaUtf8Buffer now allocates host buffer by default

Signed-off-by: raver119 <raver119@gmail.com>

* meh

Signed-off-by: raver119 <raver119@gmail.com>

* - print_variable now allows print from device

Signed-off-by: raver119 <raver119@gmail.com>

* InteropDataBuffer data type fix

Signed-off-by: raver119 <raver119@gmail.com>

* ...

Signed-off-by: raver119 <raver119@gmail.com>

* disable some debug messages

Signed-off-by: raver119 <raver119@gmail.com>

* master pulled in

Signed-off-by: raver119 <raver119@gmail.com>

* couple of new methods for DataBuffer interop

Signed-off-by: raver119 <raver119@gmail.com>

* java side

Signed-off-by: raver119 <raver119@gmail.com>

* offsetted constructor

Signed-off-by: raver119 <raver119@gmail.com>

* new CUDA deallocator

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA backend torn apart

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA backend torn apart 2

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA backend torn apart 3

Signed-off-by: raver119 <raver119@gmail.com>

* - few new tests
- few new methods for DataBuffer management

Signed-off-by: raver119 <raver119@gmail.com>

* few more tests + few more tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* two failing tests

Signed-off-by: raver119 <raver119@gmail.com>

* one more test

Signed-off-by: raver119 <raver119@gmail.com>

* two failing tests pass

Signed-off-by: raver119 <raver119@gmail.com>

* now we pass DataBuffer to legacy ops too

Signed-off-by: raver119 <raver119@gmail.com>

* Native DataBuffer for legacy ops, Java side

Signed-off-by: raver119 <raver119@gmail.com>

* CPU java side update

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA java side update

Signed-off-by: raver119 <raver119@gmail.com>

* no more prepare/register action on java side

Signed-off-by: raver119 <raver119@gmail.com>

* NDArray::prepare/register use now accepts vectors

Signed-off-by: raver119 <raver119@gmail.com>

* InteropDataBuffer now has few more convenience methods

Signed-off-by: raver119 <raver119@gmail.com>

* java bindings update

Signed-off-by: raver119 <raver119@gmail.com>

* tick device in NativeOps

Signed-off-by: raver119 <raver119@gmail.com>

* Corrected usage of OpaqueBuffer for tests.

* Corrected usage of OpaqueBuffer for java tests.

* NativeOpsTests fixes.

* print_variable now returns scalar

Signed-off-by: raver119 <raver119@gmail.com>

* one more test

Signed-off-by: raver119 <raver119@gmail.com>

* compat_string_split fix for CUDA

Signed-off-by: raver119 <raver119@gmail.com>

* - CUDA execScalar fix
- CUDA lazyAllocateHostPointer now checks java indexer/pointer instead of native pointer

Signed-off-by: raver119 <raver119@gmail.com>

* legacy ops DataBuffer migration prototype

Signed-off-by: raver119 <raver119@gmail.com>

* ignore device shapeinfo coming from java

Signed-off-by: raver119 <raver119@gmail.com>

* minor fix

Signed-off-by: raver119 <raver119@gmail.com>

* minor transformAny fix

Signed-off-by: raver119 <raver119@gmail.com>

* minor tweak for lazy host allocation

Signed-off-by: raver119 <raver119@gmail.com>

* - DataBuffer::memcpy method
- bitcast now uses memcpy

Signed-off-by: raver119 <raver119@gmail.com>

* - IndexReduce CUDA dimension buffer fix

Signed-off-by: raver119 <raver119@gmail.com>

* views for CPU and CUDA

Signed-off-by: raver119 <raver119@gmail.com>

* less spam

Signed-off-by: raver119 <raver119@gmail.com>

* optional memory init

Signed-off-by: raver119 <raver119@gmail.com>

* async memset

Signed-off-by: raver119 <raver119@gmail.com>

* - SummaryStats CUDA fix
- DataBuffer.sameUnderlyingData() impl
- execBroadcast fix

Signed-off-by: raver119 <raver119@gmail.com>

* - reduce3All fix
switch to CUDA 10 temporarily

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA version

Signed-off-by: raver119 <raver119@gmail.com>

* proper memory deallocator registration

Signed-off-by: raver119 <raver119@gmail.com>

* HOST_ONLY workspace allocation

Signed-off-by: raver119 <raver119@gmail.com>

* temp commit

Signed-off-by: raver119 <raver119@gmail.com>

* few conflicts resolved

Signed-off-by: raver119 <raver119@gmail.com>

* few minor fixes

Signed-off-by: raver119 <raver119@gmail.com>

* one more minor fix

Signed-off-by: raver119 <raver119@gmail.com>

* NDArray permute should operate on JVM primitives

Signed-off-by: raver119 <raver119@gmail.com>

* - create InteropDataBuffer for shapes as well
- update pointers after view creation in Java

Signed-off-by: raver119 <raver119@gmail.com>

* - addressPointer temporary moved to C++

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA: don't account offset twice

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA: DataBuffer pointer constructor updated

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA NDArray.unsafeDuplication() simplified

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA minor workspace-related fixes

Signed-off-by: raver119 <raver119@gmail.com>

* CPU DataBuffer.reallocate()

Signed-off-by: raver119 <raver119@gmail.com>

* print_affinity op

Signed-off-by: raver119 <raver119@gmail.com>

* print_affinity java side

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA more tweaks for data locality

Signed-off-by: raver119 <raver119@gmail.com>

* - compat_string_split tweak
- CudaUtf8Buffer update

Signed-off-by: raver119 <raver119@gmail.com>

* INDArray.close() mechanic restored

Signed-off-by: raver119 <raver119@gmail.com>

* one more test fixed

Signed-off-by: raver119 <raver119@gmail.com>

* - CUDA DataBuffer.reallocate() updated
- cudaMemcpy (synchronous) restored

Signed-off-by: raver119 <raver119@gmail.com>

* one last fix

Signed-off-by: raver119 <raver119@gmail.com>

* bad import removed

Signed-off-by: raver119 <raver119@gmail.com>

* another small fix

Signed-off-by: raver119 <raver119@gmail.com>

* one special test

Signed-off-by: raver119 <raver119@gmail.com>

* fix bad databuffer size

Signed-off-by: raver119 <raver119@gmail.com>

* release primaryBuffer on replace

Signed-off-by: raver119 <raver119@gmail.com>

* higher timeout

Signed-off-by: raver119 <raver119@gmail.com>

* disable timeouts

Signed-off-by: raver119 <raver119@gmail.com>

* dbCreateView now validates offset and length of a view

Signed-off-by: raver119 <raver119@gmail.com>

* additional validation for dbExpand

Signed-off-by: raver119 <raver119@gmail.com>

* restore timeout back again

Signed-off-by: raver119 <raver119@gmail.com>

* smaller distribution for rng test to prevent timeouts

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA DataBuffer::memcpy now copies to device all the time

Signed-off-by: raver119 <raver119@gmail.com>

* OpaqueDataBuffer now contains all required methods for interop

Signed-off-by: raver119 <raver119@gmail.com>

* some javadoc

Signed-off-by: raver119 <raver119@gmail.com>

* GC on failed allocations

Signed-off-by: raver119 <raver119@gmail.com>

* minor memcpy tweak

Signed-off-by: raver119 <raver119@gmail.com>

* one more bitcast test

Signed-off-by: raver119 <raver119@gmail.com>

* - NDArray::deviceId() propagation
- special multi-threaded test for data locality checks

Signed-off-by: raver119 <raver119@gmail.com>

* DataBuffer additional syncStream

Signed-off-by: raver119 <raver119@gmail.com>

* DataBuffer additional syncStream

Signed-off-by: raver119 <raver119@gmail.com>

* one ignored test

Signed-off-by: raver119 <raver119@gmail.com>

* skip host alloc for empty arrays

Signed-off-by: raver119 <raver119@gmail.com>

* ByteBuffer support is back

Signed-off-by: raver119 <raver119@gmail.com>

* DataBuffer::memcpy minor fix

Signed-off-by: raver119 <raver119@gmail.com>

* few minor prelu/bp tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* nullify-related fixes

Signed-off-by: raver119 <raver119@gmail.com>

* PReLU fixes (#157)

Signed-off-by: Alex Black <blacka101@gmail.com>

* Build fixed

* Fix tests

* one more ByteBuffer signature restored

Signed-off-by: raver119 <raver119@gmail.com>

* nd4j-jdbc-hsql profiles fix

Signed-off-by: raver119 <raver119@gmail.com>

* nd4j-jdbc-hsql profiles fix

Signed-off-by: raver119 <raver119@gmail.com>

* PReLU weight init fix

Signed-off-by: Alex Black <blacka101@gmail.com>

* Small PReLU fix

Signed-off-by: Alex Black <blacka101@gmail.com>

* - INDArray.migrate() reactivated
- DataBuffer::setDeviceId(...) added
- InteropDataBuffer Z syncToDevice added for views

Signed-off-by: raver119 <raver119@gmail.com>

* missed file

Signed-off-by: raver119 <raver119@gmail.com>

* Small tweak

Signed-off-by: Alex Black <blacka101@gmail.com>

* cuda 10.2

Signed-off-by: raver119 <raver119@gmail.com>

* minor fix

Signed-off-by: raver119 <raver119@gmail.com>

Co-authored-by: shugeo <sgazeos@gmail.com>
Co-authored-by: Alex Black <blacka101@gmail.com>
Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
2020-01-04 13:27:50 +03:00
shugeo fbf7c9d38b Fixed lu for cuda platform and tests. (#158)
Signed-off-by: shugeo <sgazeos@gmail.com>
2020-01-02 23:25:41 +03:00
Oleh 75123b0a4c [WIP] Oleh rgb yuv (#147)
* libnd4j: RgbToYuv and YuvToRgb, both implementations for both cpu and cuda. Need adding tests and review

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: RgbToYuv and YuvToRgb, replace coords method on Tad in both cpu and cuda, add tests, fixed bugs

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: RgbToYuv and YuvToRgb minor corrections

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: RgbToYuv and YuvToRgb corrections to use operations in-place
2019-12-24 18:30:54 +03:00
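
Per pixel, the RgbToYuv transform is a 3x3 linear map. A sketch with the classic BT.601 analog-YUV coefficients; the exact constants used by the helper are an assumption here, not a copy of its code:

```cpp
// One pixel: RGB in [0,1] -> YUV (BT.601 analog coefficients, assumed).
inline void rgbToYuv(float r, float g, float b, float& y, float& u, float& v) {
    y = 0.299f * r + 0.587f * g + 0.114f * b;
    u = 0.492f * (b - y);   // ~ -0.147 R - 0.289 G + 0.436 B
    v = 0.877f * (r - y);   // ~  0.615 R - 0.515 G - 0.100 B
}
```

Replacing per-coordinate indexing with TADs, as the second commit does, presumably lets the helper process whole channel triplets per TAD instead of recomputing full coordinates for every element.
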
raver119 d1e5e79c10
[WIP] CUDA concat tweak (#148)
* one special test

Signed-off-by: raver119 <raver119@gmail.com>

* one special test

Signed-off-by: raver119 <raver119@gmail.com>

* local memory for concat

Signed-off-by: raver119 <raver119@gmail.com>

* fixed grid size for concat

Signed-off-by: raver119 <raver119@gmail.com>

* fixed grid size for concat

Signed-off-by: raver119 <raver119@gmail.com>

* test commented out

Signed-off-by: raver119 <raver119@gmail.com>
2019-12-24 17:01:03 +03:00
Abdelrauf 39d43ca170 RgbToYiq and YiqToRgb operations (#142)
* RgbToYiq and YiqToRgb

Signed-off-by: Abdelrauf <rauf@konduit.ai>

* CUDA impl for RgbToYiq and YiqToRgb

Signed-off-by: raver119 <raver119@gmail.com>

* remove print

Signed-off-by: raver119 <raver119@gmail.com>

* allow inplace for hsv,rgb,yiq ops

Signed-off-by: Abdelrauf <rauf@konduit.ai>

Co-authored-by: raver119 <raver119@gmail.com>
2019-12-24 15:20:35 +03:00
Yurii Shyrma 5d9b2a16e5 Shyrma temp (#131)
* - specifying template instantiation for certain types in float16 and bfloat16

Signed-off-by: Yurii <iuriish@yahoo.com>

* - polishing bfloat16 and float16 member functions template specialization

Signed-off-by: Yurii <iuriish@yahoo.com>

* - rewrite and overload array +-*/ scalar and scalar +-*/ arr in NDArray class

Signed-off-by: Yurii <iuriish@yahoo.com>

* - make corrections which have to do with rvalue/lvalue conversions

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide move semantic in NDArray operators array +-/* array

Signed-off-by: Yurii <iuriish@yahoo.com>

* float16/bfloat16 tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* one more tweak

Signed-off-by: raver119 <raver119@gmail.com>

* - make float16 and bfloat16 compile successfully on cuda

Signed-off-by: Yurii <iuriish@yahoo.com>

* - do not use resources of view-like arrays when move semantics is applied

Signed-off-by: Yurii <iuriish@yahoo.com>

* - get rid of pointers in signatures of NDArray methods 1

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correction of signature of NDArray::dup method

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correction of signature of NDArray::reduceAlongDimension method

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::applyIndexReduce and applyTrueBroadcast methods

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::applyReduce3 and varianceAlongDimension methods

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::tensorsAlongDimension and diagonal methods

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::allTensorsAlongDimension

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::reduceAlongDimension 2

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::applyTransform 2

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::applyPairwiseTransform 2

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::applyBroadcast 2

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::applyTrueBroadcast 2

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::applyScalar and applyScalarArr

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::lambda methods

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::reduce3 methods 2

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of following NDArray methods: add/sub/mul/div row/column and fillAsTriangular

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::tileToShape methods

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::isShapeSameStrict method

Signed-off-by: Yurii <iuriish@yahoo.com>

* minor corrections in tests

Signed-off-by: Yurii <iuriish@yahoo.com>

* - replace reduce op in batchnorm mkldnn

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add explicit templates instantiations for operator+(NDArray&&. const scalar)

Signed-off-by: Yurii <iuriish@yahoo.com>

* - corrections of casts in float16/bfloat16

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide move semantics in following NDArray methods: transform, applyTrueBroadcast, transpose, reshape, permute

Signed-off-by: Yurii <iuriish@yahoo.com>

* - get rid of input array A duplicate in svd cuda op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - avoid available bug in svd cuda API

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add temporary global memory buffer in svd cuda when calcUV = false and  m != n

Signed-off-by: Yurii <iuriish@yahoo.com>

* - remove test with bfloat16 type for betainc

Signed-off-by: Yurii <iuriish@yahoo.com>

* - resolve conflicts after master has been merged in

Signed-off-by: Yurii <iuriish@yahoo.com>

* - changed type of affected input array in fused_batch_norm

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add several explicit type castings

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add ND4J_EXPORT to operators

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add explicit template types in instantiations of template arithm operators of NDArray class

Signed-off-by: Yurii <iuriish@yahoo.com>

* - one more test fix

Signed-off-by: Yurii <iuriish@yahoo.com>

Co-authored-by: raver119 <raver119@gmail.com>
2019-12-20 22:35:39 +03:00
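
The "move semantics in NDArray operators" items above come down to a familiar C++ pattern: when an operand is an expiring temporary, reuse its buffer instead of allocating a new result. Illustrated on a toy array type, not the NDArray API:

```cpp
#include <utility>
#include <vector>

struct Arr { std::vector<float> data; };

// lvalue + scalar: must allocate a fresh result.
Arr operator+(const Arr& a, float s) {
    Arr r{a.data};
    for (auto& v : r.data) v += s;
    return r;
}

// rvalue + scalar: the temporary's buffer is reused, no extra allocation.
Arr operator+(Arr&& a, float s) {
    for (auto& v : a.data) v += s;
    return std::move(a);
}
```

The commit about not using the resources of view-like arrays is presumably the necessary caveat: a view does not own its buffer, so the rvalue overload has to fall back to the copying path for it.
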
Oleh 211c0df76f Oleh rgb to gray scale (#138)
* libnd4j: RgbToGrayscale op #8536 - raw implementation in user branch, need checks for integration and adding other orders

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: RgbToGrayscale op #8536 next step of merging images

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: RgbToGrayscale op #8536, reverted merge of hsv_to_rgb and rgb_to_hsv as it caused naming conflicts and needs refactoring before merge; added implementation of rgb_to_grs

* libnd4j: RgbToGrayscale op #8536 imlementation and conflict resolve

* libnd4j: RgbToGrayscale op #8536 merged operations with images into image, renamed methods and files

* libnd4j: RgbToGrayscale op #8536 added test for rgbToGrayScale, needs clarification; fixed test case run

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j: RgbToGrayscale op #8536 bug fixing and need review

* libnd4j: RgbToGrayscale op #8536 some additional corrections after review

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* - minor corrections in rgbToGrs test1

Signed-off-by: Yurii <iuriish@yahoo.com>

* libnd4j: RgbToGrayscale op #8536, corrected tests and rgb_to_grs, fixed problems, refactoring, needs review

* libnd4j: RgbToGrayscale op #8536 fix for 'f' order in rgbToGrs

* libnd4j: RgbToGrayscale op #8536 fixed several bugs with dimC, test case refactoring and improve

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* - add cuda kernel for rgbToGrs op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - fix linkage errors

Signed-off-by: Yurii <iuriish@yahoo.com>

Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>
2019-12-20 20:59:29 +03:00
shugeo 67d8199165 [WIP] Shugeo lup (#126)
* Added infrastructure for implementation op lu for both cuda and cpu platforms.

* Added implementation of helpers with lu op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored LU decomposition to use vector of permutations instead.

* Refactored helpers for lu op.

* Fixed crash with determinant op.

* Refactored cpu LU op helper.

* Added implementation for lu op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed issue with argmax on column.

* Added multithreaded behaviour for lu op helper.

* Fixed multithreaded cpu implementation helpers for lu op.

* Added cuda implementation for lu op helper.

* Finished lu helper implementation for cuda platform.

* Eliminated waste prints and comments.

* Fixed race condition and multithreading issues.

* Fixed memory leak with shape construction.

* Corrected test for lu op to avoid near-zero elements on the main diagonal.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Improved test for adjust_contrast op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed issues with cuda implementation of resize_bicubic helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>
2019-12-20 17:56:28 +03:00
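
The LU helper follows the usual in-place factorization with a vector of row permutations (as the refactoring commit above states). A compact single-matrix CPU sketch of that scheme with partial pivoting, ignoring batching and the multithreading added later:

```cpp
#include <cmath>
#include <vector>

// In-place LU of a row-major n x n matrix A; perm records the row order.
// Afterwards the strict lower triangle holds L (unit diagonal implied),
// the upper triangle holds U.
bool luDecompose(std::vector<double>& A, int n, std::vector<int>& perm) {
    perm.resize(n);
    for (int i = 0; i < n; ++i) perm[i] = i;

    for (int k = 0; k < n; ++k) {
        int pivot = k;                                          // partial pivoting: largest |A[i][k]|
        for (int i = k + 1; i < n; ++i)
            if (std::fabs(A[i * n + k]) > std::fabs(A[pivot * n + k])) pivot = i;
        if (A[pivot * n + k] == 0.0) return false;              // singular matrix
        if (pivot != k) {
            for (int j = 0; j < n; ++j) std::swap(A[k * n + j], A[pivot * n + j]);
            std::swap(perm[k], perm[pivot]);
        }
        for (int i = k + 1; i < n; ++i) {
            A[i * n + k] /= A[k * n + k];                       // L entry
            for (int j = k + 1; j < n; ++j)
                A[i * n + j] -= A[i * n + k] * A[k * n + j];
        }
    }
    return true;
}
```

The "argmax on column" fix above maps onto the pivot-selection loop, and the near-zero-diagonal test adjustment avoids exercising the ill-conditioned corner of exactly this algorithm.
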
Abdelrauf 3c9a2a5cd9 Fix for hsv and rgb ranges (#136)
Signed-off-by: Abdelrauf <rauf@konduit.ai>
2019-12-20 08:48:30 +03:00
Abdelrauf e0a9cb6c08 [WIP] HSV,RGB color model conversions (#125)
* CUDA implementation for hsv_to_rgb and rgb_to_hsv

Signed-off-by: raver119 <raver119@gmail.com>

* hsv_to_rgb and rgb_to_hsv operations
Test coverage: c order 1d, 2d, 3d array

Signed-off-by: Abdelrauf <rauf@konduit.ai>

* Index check

Signed-off-by: Abdelrauf <rauf@konduit.ai>

* Suppress Msvc floating point errors

Signed-off-by: Abdelrauf <rauf@konduit.ai>

* Added Index Check for adjust_saturation and adjust_hue

Signed-off-by: Abdelrauf <rauf@konduit.ai>

* minor fix

Signed-off-by: raver119 <raver119@gmail.com>

* Fixes missed Msvc floating narrowing errors

Signed-off-by: Abdelrauf <rauf@konduit.ai>
2019-12-17 09:42:09 +03:00
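
As a reference for what the rgb_to_hsv kernel computes per pixel, here is the standard max/min formulation of the forward direction, with channels assumed in [0,1]; it is the textbook algorithm, not the kernel's code:

```cpp
#include <algorithm>
#include <cmath>

// RGB in [0,1] -> H in [0,360), S and V in [0,1].
inline void rgbToHsv(float r, float g, float b, float& h, float& s, float& v) {
    const float mx = std::max({r, g, b});
    const float mn = std::min({r, g, b});
    const float d  = mx - mn;

    v = mx;
    s = (mx > 0.f) ? d / mx : 0.f;

    if (d == 0.f)       h = 0.f;
    else if (mx == r)   h = 60.f * std::fmod((g - b) / d, 6.f);
    else if (mx == g)   h = 60.f * ((b - r) / d + 2.f);
    else                h = 60.f * ((r - g) / d + 4.f);
    if (h < 0.f) h += 360.f;
}
```

The index checks added above presumably guard the channel dimension (it must have size 3), since the conversion reads all three channels of every pixel.
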
raver119 b32dd1bf92
[WIP] resize_bicubic types (#116)
* resize_bicubic: allow more dtypes

Signed-off-by: raver119 <raver119@gmail.com>

* resize_bicubic: allow less dtypes

Signed-off-by: raver119 <raver119@gmail.com>

* Refactored resize_bicubic op to full conform with TF1.5 and tests.

* Corrected test to proper data type output.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected double input test to float constant outputs.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Finished correcting tests for expected bicubic-interpolated resizes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed adjust_contrast ops to allow non-RGB inputs.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored adjust_contrast_v2 to conform with TF one.

Signed-off-by: shugeo <sgazeos@gmail.com>

* AdjustContrast tests activated

* two typos fixed

Signed-off-by: raver119 <raver119@gmail.com>
2019-12-06 18:58:37 +03:00
shugeo e09a785232 Shugeo resize fix5 (#102)
* Refactored resize images ops to use TF-like bool args as input.

* Refactored helpers for cpu implementation of resize_bilinear and resize_nearest_neighbor ops.

* Refactored cuda implementation for image.resize_bilinear and image.resize_nearest_neighbor ops helpers.

* Refactored nearest_neighbor resize op.

* Added a pair of tests for special case of resize_bilinear algorithm.

* Fixed issue with resize_bilinear op.

* Refactored cpu implementation for helpers with resize_nearest_neighbor op.

* Final fixes for resize ops to conform to TF v1.5

* Refactored cuda helpers for resize_nearest_neighbor op.

* Fixed resize_bilinear to accept proper data.

* Fixed issue with non-float input for resize_bilinear op.

* Refactored cuda helper for resize_bilinear to proper process non-float inputs.

* Added tests for resize_bilinear to int inputs.

* Fixed ResizeBilinear wrapper

* Tests fixed

* Fixed float and bool constant to avoid overflow for some kind of compilers.

* Corrected float constants with float data type.

* Added f suffix for float constants.

* Corrected float constant to avoid overflow with initializing lists.

* Corrected float initializing list with float input.

* Corrected bool constant with initializing list.

* Corrected float and bool values with initializing lists.

* Fixed wrong constant.

* Fixed issue with 1x1 input picture for resize.

* ResizeBilinear default values on import fix

Signed-off-by: raver119 <raver119@gmail.com>
2019-12-05 22:05:33 +03:00
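
The resize_bilinear special cases above all reduce to the usual two-axis linear blend: each output pixel mixes its four nearest source pixels by the fractional offsets. Standard formula, not specific to this implementation:

```latex
\text{out}(y,x) = (1-\delta_y)(1-\delta_x)\, I(y_0,x_0) + (1-\delta_y)\,\delta_x\, I(y_0,x_1)
                + \delta_y (1-\delta_x)\, I(y_1,x_0) + \delta_y\,\delta_x\, I(y_1,x_1),
```

where y0 = floor(y * s_y), delta_y = y * s_y - y0, y1 = y0 + 1 (clamped to the image), and likewise for x. The non-float-input fixes above presumably amount to computing this blend in floating point even when the source pixels are integers, then casting back to the requested output type.
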
raver119 355c6b6096
[WIP] reverse improvements (#115)
* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* reverse draft

Signed-off-by: raver119 <raver119@gmail.com>

* reverse kernel

Signed-off-by: raver119 <raver119@gmail.com>

* reverse kernel

Signed-off-by: raver119 <raver119@gmail.com>
2019-12-05 20:03:10 +03:00
Alex Black 578a5abb68 DNNL/MKLDNN dilated causal conv1d + betainc (#103)
* - add padding calculation in same mode in causal conv1d op for right mkl paddings

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct causal condition in mkldnnUtils.cpp

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct some code which caused additional rounding errors in betainc op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - put float in place of template parameter in nan assign in betainc op

Signed-off-by: Yurii <iuriish@yahoo.com>
2019-12-04 14:50:17 +03:00
Yurii Shyrma 1f5e15b541 Shyrma adjust (#98)
* - add possibility of passing scalar-array as input parameter for scale factor in adjust hue/contrast/saturation ops
- correct typo in function which calculates regularized incomplete beta integral

Signed-off-by: Yurii <iuriish@yahoo.com>

* - fix bug in betainc cuda kernel

Signed-off-by: Yurii <iuriish@yahoo.com>

* - start working on implementation of digamma function

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on digamma function (cpu)

Signed-off-by: Yurii <iuriish@yahoo.com>

* - testing and fixing bugs in digamma op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - make correction in cuda kernel for polyGamma

Signed-off-by: Yurii <iuriish@yahoo.com>

* - remove unnecessary stuff from betaInc cuda kernel

Signed-off-by: Yurii <iuriish@yahoo.com>

* - resolve conflicts in DeclarableOpsTests3.cpp after master branch has been merged

Signed-off-by: Yurii <iuriish@yahoo.com>

* - restore id number of Not operation in legacy_ops.h

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct padding calculation in mkl dnn conv1d causal

Signed-off-by: Yurii <iuriish@yahoo.com>

* restore empty check in adjust_contrast_v2

Signed-off-by: raver119 <raver119@gmail.com>
2019-12-03 09:40:45 +03:00
shugeo 1e9ff114aa Shugeo atomic tests (#97)
* Added atomic tests for atomicAdd, atomicSub and atomicDiv.

* Fixed atomicAdd for 16bit ints.

* Fixed atomicMul for 16 floats.

* Eliminated waste prints.

* Fixed problems with double type on matrix inverse helpers.

* Eliminated commented wrong code.

* Refactored atomicMul for 16bit types.

* few more minor tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed fake_quant_with_min_max_vars_per_channel args processing.
2019-12-02 21:40:54 +03:00
Yurii Shyrma d19eeaec52 Shyrma casual conv1d (#90)
* - add causal mode of padding to convolutions

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add additional tests for causal conv1d

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add causal mode for cuda conv kernels

Signed-off-by: Yurii <iuriish@yahoo.com>

* Java side of Conv1D changes

Signed-off-by: raver119 <raver119@gmail.com>

* Add Conv1DDerivative op

Signed-off-by: Alex Black <blacka101@gmail.com>

* Causal Conv1D gradient checks

Signed-off-by: Alex Black <blacka101@gmail.com>

* Tweaks

Signed-off-by: Alex Black <blacka101@gmail.com>

* - add causal padding mode to conv2d_bp

Signed-off-by: Yurii <iuriish@yahoo.com>

* More thorough causal conv1d tests

Signed-off-by: Alex Black <blacka101@gmail.com>
2019-11-29 14:14:30 +03:00
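
The causal padding mode keeps each output step from seeing future inputs by padding only on the left. For stride 1 the amount is the standard one:

```latex
\text{pad}_{\text{left}} = (k - 1)\cdot d, \qquad \text{pad}_{\text{right}} = 0,
```

where k is the kernel size and d the dilation, so the output length matches the input length and output position t depends only on inputs at positions <= t. This is the same quantity the mkl-dnn causal conv1d padding commits elsewhere in this log correct.
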
shugeo 009007120b Shugeo_release_fixes3 (#81)
* Implementation for non_max_suppression_v3 was added. Initial version

* Added check for overcome threshold.

* Added definition for V3 method.

* java remapping for NonMaxSuppressionV3

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed proper processing of an empty output and test.

* Refactored op to less threshold data to float.

* Implemented cuda-based helper for non_max_suppression_v3 op.

* Fixed fake_quant_with_min_max_vars op.

* Fixed tests with float numbers.

* - assert now stops execution
- sortByKey/sortByValue now have input validation

Signed-off-by: raver119 <raver119@gmail.com>

* missing var

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed proper processing for zero max_size inputs.

* Refactored kernel callers.

* Fixed return statement for logdet op helper.

* Refactored unsorted segment SqrtN op.

* get back 8 tail bytes on CUDA

Signed-off-by: raver119 <raver119@gmail.com>

* Refactored segment prod ops and helpers for cuda and tests.

* Additional test.

* CudaWorkspace tests updated for 8 tail bytes

Signed-off-by: raver119 <raver119@gmail.com>

* special atomic test

Signed-off-by: raver119 <raver119@gmail.com>

* atomicMul/atomicDiv fix for 16bit values

Signed-off-by: raver119 <raver119@gmail.com>

* Eliminated waste prints.
2019-11-28 21:08:51 +03:00
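
non_max_suppression_v3 is the greedy IoU-suppression loop with an added score threshold. A hedged CPU sketch with boxes as (y1, x1, y2, x2) and scores already sorted descending; the real op also handles sorting, empty outputs and zero max_size, per the commits above:

```cpp
#include <algorithm>
#include <vector>

struct Box { float y1, x1, y2, x2; };

static float iou(const Box& a, const Box& b) {
    const float iy1 = std::max(a.y1, b.y1), ix1 = std::max(a.x1, b.x1);
    const float iy2 = std::min(a.y2, b.y2), ix2 = std::min(a.x2, b.x2);
    const float inter = std::max(0.f, iy2 - iy1) * std::max(0.f, ix2 - ix1);
    const float areaA = (a.y2 - a.y1) * (a.x2 - a.x1);
    const float areaB = (b.y2 - b.y1) * (b.x2 - b.x1);
    return inter > 0.f ? inter / (areaA + areaB - inter) : 0.f;
}

// Returns indices of kept boxes, at most maxOut of them.
std::vector<int> nonMaxSuppression(const std::vector<Box>& boxes, const std::vector<float>& scores,
                                   int maxOut, float iouThreshold, float scoreThreshold) {
    std::vector<int> keep;
    for (size_t i = 0; i < boxes.size() && static_cast<int>(keep.size()) < maxOut; ++i) {
        if (scores[i] < scoreThreshold) break;                 // scores are sorted descending
        bool suppressed = false;
        for (int k : keep)
            if (iou(boxes[i], boxes[k]) > iouThreshold) { suppressed = true; break; }
        if (!suppressed) keep.push_back(static_cast<int>(i));
    }
    return keep;
}
```
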
Yurii Shyrma a8dd6713aa Shyrma scatter (#84)
* - improve performance of scatter (no lock) ops for 1D case

Signed-off-by: Yurii <iuriish@yahoo.com>

* - improve scatter lock op performance for 1D case

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add kernel for verification of input indices-array elements in scatter and scatter_nd ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide fast indices checking on cpu side for scatter and gather ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - apply corrections requested by pr reviewer

Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-26 20:29:09 +03:00
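
The 1D scatter fast path plus the indices-verification kernel mentioned above amount to very little code on the CPU side. A hedged sketch of the idea with scatter-update semantics and a hypothetical signature:

```cpp
#include <vector>

// Verify indices first, then out[indices[i]] = updates[i].
// Returns false if any index is out of range.
bool scatterUpdate1D(std::vector<float>& out, const std::vector<int>& indices,
                     const std::vector<float>& updates) {
    for (int idx : indices)                                    // verification pass
        if (idx < 0 || idx >= static_cast<int>(out.size())) return false;
    for (size_t i = 0; i < indices.size(); ++i)                // 1D case: plain assignment
        out[indices[i]] = updates[i];
    return true;
}
```

The "no lock" improvement presumably relies on the 1D layout letting each target element be written independently; accumulating variants are the ones that need atomics or locking when indices repeat.
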
shugeo 4187190609 Shugeo release fix2 (#70)
* Corrected input checking and tests for bitcast op.

* Fixed an issue with non_max_suppression form generation and processing with score threshold given.

* Fixed bilinear resize kernel and tests.

* push for Serhii

Signed-off-by: raver119 <raver119@gmail.com>

* Added test for nearest_neighbor resize with int input.

* Added data type check for input/output match.

* Eliminate error in macros.

* Improved output message for type checking.

* Fixed input/output types for op.

* Eliminated waste logging.

* Refactored resize_bilinear helper for multithreading for cpu platform.

* Cosmetic changes only.

* Fixed error for string substitution.

* Skip test for cbow_batch with cuda.

* fix for resizeNearestNeighbor output dtype

Signed-off-by: raver119 <raver119@gmail.com>

* Refactored non_max_suppression helper.

* Refactored shape generation and input handling.

* Added additional test.
2019-11-22 22:42:44 +03:00
Yurii Shyrma 7a90a31cfb
Shyrma deconv3 (#69)
* - profiling cuda kernels for vol2col and im2col

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct addBias helper

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct mkl dilation formula and switch off mkl api for dilation deconvolutions

Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-21 21:17:30 +02:00
shugeo dc0036f2c6
Shugeo image resize bicubic (#56)
* Added implementation files for image_resize and resize_bicubic ops.

* Image resize and image.resize_bicubic ops implementation. Initial revision.

* Finished with infrastructure development for image.resize_bilinear op and image_resize op implementation.

* Refactored resize methods.

* Added processing for the Mitchell cubic algorithm.

* Added check for input/output sizes.

* Added int and float types for crop_and_resize op.

* Refactored crop_and_resize output type check.

* Added helper for bicubic interpolation as TF v.1 does.

* Added TF v.1 bicubic helper for cuda platform.

* Added cached class for bicubic algorithm.

* Refactored cuda implementation for crop_and_resize helper to use proper output type.

* Added facilities for bicubic interpolation.

* Portion bicubic interpolation from TF.

* Added tests for resize_bilinear testing.

* Working implementation of bicubic interpolation and tests.

* Refactored routines with image_resize bicubic op helper.

* Refactored code with coding standards.

* Refactored cpu helpers for resize_bicubic op.

* Refactored bicubic helpers.

* Added bicubic resize facilities.

* Implementing cuda kernels for bicubic interpolation. Implementation step.

* Cuda implementation of resize_bicubic op helper.

* Refactor image.resize_bicubic op helpers.

* Refactored helpers for resize_bicubic. Added error checking with cuda implementation.

* Refactored cuda implementation of resize_bicubic op helper. The first working revision.

* Cuda arch implementation for resize_bicubic op helper. Full working single-threaded revision.

* Intermediate bicubic interpolation helper for cuda.

* Refactored cpu helper for resize_bicubic.

* Multithreaded cuda implementation for resize_bicubic.

* Fixed merge issues.

* Refactored nlp helpers.

* Replicated resize_bicubic for 3D also.

* Eliminated waste comments of unused code.

* Eliminated waste comments with unused code.

* Eliminated waste template definitions.

* Eliminated waste debug code.

* Eliminated waste comments.

* Fixed multithreading with helpers.

* Fixed test suites for float and double in float point input lists.

* Fixed usage of reshape with 3D/4D on resizes.

* Final fixes.

* Fixed resize_neighbor op problem.
2019-11-20 21:11:04 +02:00
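
The "bicubic helper as TF v.1 does" above centers on the Keys cubic convolution kernel: each output pixel is a weighted sum of a 4x4 source neighborhood, with weights drawn from a piecewise cubic. The standard kernel is below; the TF-style implementation is commonly cited with A = -0.75, so treat that constant as an assumption rather than a fact about this code:

```latex
W(x) =
\begin{cases}
(A+2)\lvert x\rvert^{3} - (A+3)\lvert x\rvert^{2} + 1, & \lvert x\rvert \le 1,\\
A\lvert x\rvert^{3} - 5A\lvert x\rvert^{2} + 8A\lvert x\rvert - 4A, & 1 < \lvert x\rvert < 2,\\
0, & \text{otherwise.}
\end{cases}
```

The "cached class" in the commits presumably corresponds to the usual optimization of precomputing these weights per fractional offset instead of re-evaluating the cubic for every output pixel.
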
shugeo 13e5c0a280
Shugeo release fix1 (#61)
* Added a pair of tests for failed ops.

* Fixed cpu helper for draw_bounding_boxes op.

* Refactored implementation of draw_bounding_boxes op to full conform with TF.

* Improved multithreading with draw_bounding_boxes op cuda helper.

* Eliminated log messages.

* Changed logging with draw_bounding_boxes op helper and tests.

* Resize_bilinear with 3D input allowed.

* Refactored 3D input acception with resize_bilinear op.

* And another improvement.

* Refactored reshape of input/output for resize_bilinear.

* Improvements final.

* Finished with 3D replication for image.resize_bilinear/_nearest_neighbor.

* Added copyrights for TF code.

* Using new form of multithreading for cpu implementation.

* Fixed shape error.

* Added multithreaded with batches on crop_and_resize functor.

* Refactored multithreading with crop_and_resize and draw_bounding_boxes.
2019-11-20 13:37:48 +02:00
Yurii Shyrma 66b84b38cf
Shyrma mmul (#58)
* - get rid of some copy procedures in mmulHelper ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on embedding cuda api for batched gemm (cublasGemmBatchedEx) in our mmulHelper class

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on cuda batched gemm api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - write own cuda kernel performing batched gemm

Signed-off-by: Yurii <iuriish@yahoo.com>

* missing include in MmulHelper

Signed-off-by: raver119 <raver119@gmail.com>

* - forgot to keep the previous correct kernels for mmulNxN in the code, since the new one may fail for some reason in the future

Signed-off-by: Yurii <iuriish@yahoo.com>

* disable old tensordot

Signed-off-by: raver119 <raver119@gmail.com>

* - rewrite cuda kernels for usualGemm and usualGemv

Signed-off-by: Yurii <iuriish@yahoo.com>

* - profiling mmul helpers

Signed-off-by: Yurii <iuriish@yahoo.com>

* - prints to check shapes were added

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct type of output array C in mmulNxN

Signed-off-by: Yurii <iuriish@yahoo.com>

* - take into account possible nans in C array

Signed-off-by: Yurii <iuriish@yahoo.com>

* slightly change numThreads message

Signed-off-by: raver119 <raver119@gmail.com>

* - make corrections in accordance to given notes in pr review

Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-19 15:39:36 +02:00
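As context for the batched-gemm work above: for every batch index b the kernels compute C[b] = alpha * A[b] * B[b] + beta * C[b]. The loop nest below is only a plain CPU reference for those semantics, assuming row-major MxK and KxN operands; the actual PR routes this through cuBLAS (cublasGemmBatchedEx) and hand-written cuda kernels.

```cpp
// Semantic reference for batched gemm, not a performant implementation:
// C[b] = alpha * A[b] * B[b] + beta * C[b] for each batch b, row-major layout.
#include <vector>

void batchedGemm(const std::vector<const float*>& A,   // each MxK
                 const std::vector<const float*>& B,   // each KxN
                 const std::vector<float*>& C,         // each MxN, updated in place
                 int M, int N, int K, float alpha, float beta) {
    for (size_t b = 0; b < A.size(); ++b)
        for (int i = 0; i < M; ++i)
            for (int j = 0; j < N; ++j) {
                float acc = 0.f;
                for (int k = 0; k < K; ++k)
                    acc += A[b][i * K + k] * B[b][k * N + j];
                C[b][i * N + j] = alpha * acc + beta * C[b][i * N + j];
            }
}
```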
Alex Black da1944e8e1
SameDiff TF import (#49)
* Added implementation files for image_resize and resize_bicubic ops.

* Image resize and image.resize_bicubic ops implementation. Initial revision.

* Minor fix

* Some TF imports disabled.

* Finished with infrastructure development for image.resize_bilinear op and image_resize op implementation.

* Refactored resize methods.

* Added processing for the Mitchellcubic algorithm.

* adjust_contrast

* Small fix for TF import expected value loading when variable name starts with the test name

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Tests

* Tests added.

* Removed tf names absent in mapping.

* Some fixes.

* Small fixes

* Minor change

* Some failing tests.

* Disable failed test

* Ignore some tests

* Fix import class mapping

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix float property mapping (flatbuffers)

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Override equality function for model 'dropout'

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Failing tests

* Failed tests ignored temporarily.

* Minor fixes

* Small fix

* Conflict resolved

* Default implementations of tensorflowName and onnxName
2019-11-19 22:44:29 +11:00
Alex Black 47d19908f4
Various fixes (#43)
* #8172 Enable DL4J MKLDNN batch norm backward pass

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8382 INDArray.toString() rank 1 brackets / ambiguity fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8308 Fix handful of broken links (inc. some in errors)

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Unused dependencies, round 1

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Unused dependencies, round 2

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Unused dependencies, round 3

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Uniform distribution TF import fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-11-14 19:38:20 +11:00
raver119 48df1acdfb
[WIP] ThreadPool (#8)
This PR removes OpenMP use in 95% of cases
2019-11-13 17:04:59 +03:00
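The PR above swaps most OpenMP parallel-for loops for a thread-pool based scheduler. Below is a minimal sketch of that pattern, assuming a hypothetical parallelFor helper built on std::thread; the real ThreadPool reuses long-lived worker threads rather than spawning them per call.

```cpp
// Minimal stand-in for an OpenMP "parallel for": split [start, stop) into
// roughly equal chunks and run body(lo, hi) on each from its own thread.
// Illustrative only; not the ThreadPool introduced by the PR above.
#include <algorithm>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

void parallelFor(int64_t start, int64_t stop, int numThreads,
                 const std::function<void(int64_t, int64_t)>& body) {
    int64_t span = stop - start;
    if (span <= 0) return;
    numThreads = (int)std::min<int64_t>(numThreads, span);
    int64_t chunk = (span + numThreads - 1) / numThreads;
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        int64_t lo = start + t * chunk;
        int64_t hi = std::min(lo + chunk, stop);
        if (lo >= hi) break;
        workers.emplace_back(body, lo, hi);   // one chunk per worker
    }
    for (auto& w : workers) w.join();
}
```

With such a helper, a loop like `for (int64_t i = 0; i < N; ++i) out[i] = f(in[i]);` becomes `parallelFor(0, N, threads, [&](int64_t lo, int64_t hi) { for (auto i = lo; i < hi; ++i) out[i] = f(in[i]); });`.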
shugeo 08853c7829
Shugeo random uniform int (#30)
* Corrected randomuniform declaration.

* Refactored uniform distribution for both cuda and cpu platforms.

* Refactored uniform distribution and tests.

* Fixed type usage with indices.

* Refactored uniform distribution implementation and tests to fully conform with the TF implementation.

* Refactored gamma function to use type util method.

* Copyright changes and fixes with ConstantHelper.

* Added error checking for cuda device memory allocation and operations.
2019-11-06 12:49:27 +02:00
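For orientation, the entry above makes the uniform op produce genuine integer samples rather than truncated floats. As a standard-library analogy (not the libnd4j RNG), integer uniforms come from a dedicated integer distribution; the seed and ranges below are arbitrary.

```cpp
// Standard-library sketch of uniform sampling for integer vs. float outputs.
// Not the libnd4j random helpers; values shown are illustrative only.
#include <cstdio>
#include <random>

int main() {
    std::mt19937_64 rng(119);                          // fixed seed for reproducibility
    std::uniform_int_distribution<int>    ints(0, 9);  // inclusive integer range
    std::uniform_real_distribution<float> reals(0.f, 1.f);
    for (int i = 0; i < 5; ++i)
        std::printf("int: %d  float: %.3f\n", ints(rng), reals(rng));
    return 0;
}
```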
shugeo 7b14a9f603
Gamma and Poisson distributions (#27)
* Added implementation for random_gamma op.

* Added implementation for random_poisson op and support classes.

* Added helpers for random_poisson and random_gamma ops.

* Implementation of random_poisson. The first working edition.

* Implementation of random_poisson. Parallelized working edition.

* Implementation of random_gamma. Parallelized working edition with alpha only.

* Added cuda implementation for helper of poisson distribution.

* Corrected shape calculation with random_gamma and tests.

* Finished cpu implementation for gamma distribution.

* Finished cuda implementation for random_gamma op.

* Refactored cpu helpers for random_gamma and random_poisson ops.

* Refactored cuda helpers for gamma and poisson distribution.

* Refactored cuda helper for gamma distribution.

* Refactored cpu helper for random_poisson op.

* Refactored cpu helper for random_gamma op.
2019-11-04 15:42:28 +02:00
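As a plain-C++ reference for what the random_gamma and random_poisson ops sample (the real helpers fill NDArrays on both cpu and cuda), here is a sketch using the standard library's distributions; the shape/scale and lambda parameters are arbitrary.

```cpp
// Standard-library stand-in for gamma and Poisson sampling. Illustrative only;
// the libnd4j helpers in the entry above implement their own parallel samplers.
#include <cstdio>
#include <random>

int main() {
    std::mt19937_64 rng(42);
    std::gamma_distribution<double> gamma(/*alpha=*/2.0, /*beta=*/1.5); // shape, scale
    std::poisson_distribution<int>  poisson(/*lambda=*/4.0);
    for (int i = 0; i < 5; ++i)
        std::printf("gamma: %.3f  poisson: %d\n", gamma(rng), poisson(rng));
    return 0;
}
```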
shugeo 95f7ad7b94
Shugeo suppression overlaps (#9)
* Added non_max_suppression_overlaps op and tests.

* Refactored implementation of non_max_suppression_overlaps.

* Refactoring of implementation of non_max_suppression_overlaps op.

* Refactoring of implementation of non_max_suppression op.

* Fixed porting error.

* Added cuda frontends for image suppression ops.

* Eliminated crash with cuda arch on image.non_max_suppression_overlaps op.

* Improved implementation of image_suppression helper for cpu platform.

* Generic approach for the non_max_suppression_overlaps op helper on the cuda platform.

* Working cuda implementation of helper non_max_suppression_overlaps op.

* Eliminated stale comments.

* Improved implementations for both platforms.

* Refactored cuda implementation of image.non_max_suppression_overlaps op helper.

* Improved cuda implementation of non_max_suppression op helper.

* Refactored cuda implementation of image.non_max_suppression_overlaps op helper.

* Improved cuda implementation of image.non_max_suppression_overlaps op helper.

* Added modifications into cuda implementation for image suppression overlaps op.

* Corrected queue emulation in the cuda implementation of the non_max_suppression_overlaps op.

* Prefinal stage of cuda implementation of non_max_suppression_overlaps.

* Working cuda implementation of the non_max_suppression_overlaps helper.

* Fixed return to proper thread.

* Improvements for cuda implementation of image.non_max_suppression_overlaps op helper.

* Fixed implementation issues with non_max_suppression_overlaps on cuda platform.

* Fixed skip for non_max_suppression_overlaps on cuda platform.

* Finalize implementation of image_suppression helper and tests.

* Cosmetic changes only.
2019-10-30 13:43:45 +02:00
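The overlap-based suppression in the entry above boils down to a greedy selection: walk boxes by descending score and keep each one whose precomputed overlap with every already-kept box stays under a threshold. The CPU sketch below shows only that selection rule; the cuda helper emulates the same priority-queue logic on device, and the function name here is illustrative.

```cpp
// Greedy selection core of non_max_suppression_overlaps, given a precomputed
// pairwise overlap matrix and per-box scores. Illustration only.
#include <algorithm>
#include <numeric>
#include <vector>

std::vector<int> nmsOverlaps(const std::vector<std::vector<float>>& overlaps,
                             const std::vector<float>& scores,
                             float overlapThreshold, int maxOutputSize) {
    std::vector<int> order(scores.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return scores[a] > scores[b]; });  // best score first
    std::vector<int> keep;
    for (int idx : order) {
        if ((int)keep.size() >= maxOutputSize) break;
        bool accept = true;
        for (int k : keep)
            if (overlaps[idx][k] > overlapThreshold) { accept = false; break; }
        if (accept) keep.push_back(idx);
    }
    return keep;                     // indices of the selected boxes
}
```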
shugeo ace65355c5 Added docs for the fake_quant_with_min_max* op helpers' cuda implementations. 2019-10-10 18:35:28 +03:00
shugeo c3f755d975 Refactored helpers both for cuda and cpu platforms. 2019-10-10 18:02:49 +03:00
shugeo d5b352273d Implementation of cuda kernel for fake_quant_with_min_max_vars_per_channels op. Final revision. 2019-10-10 16:51:29 +03:00
shugeo 02d8616692 Implementation of cuda kernel for fake_quant_with_min_max_vars_per_channels op. 2019-10-10 16:40:56 +03:00
shugeo 3504b0cda9 Implemented fake_quant_with_min_max_vars_per_channel op cuda helper. The first working revision. 2019-10-10 15:44:50 +03:00
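For context on the fake_quant_with_min_max* helpers in the entries above: fake quantization snaps a float to one of 2^numBits uniformly spaced levels spanning [min, max] and maps it back to float, so training sees quantization error while staying in floating point. The sketch below shows only that round trip; TF's nudging of min/max onto the quantization grid is omitted, and the helper name and the assumption max > min are illustrative.

```cpp
// Conceptual quantize/dequantize round trip behind fake quantization.
// Illustration only; not the libnd4j cuda helper. Assumes max > min.
#include <algorithm>
#include <cmath>

float fakeQuantize(float x, float min, float max, int numBits = 8) {
    const float levels = float((1 << numBits) - 1);           // e.g. 255 for 8 bits
    const float scale  = (max - min) / levels;                 // grid step
    float clamped   = std::min(std::max(x, min), max);
    float quantized = std::round((clamped - min) / scale);     // nearest integer level
    return quantized * scale + min;                            // back to float
}
```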