cavis

Author	SHA1	Message	Date
Yurii Shyrma	011c272fde	Shyrma transpose (#244 ) * - provide contiguous strides for ouput in transpose op Signed-off-by: Yurii <iuriish@yahoo.com> * - provide contiguous strides for output in permute op Signed-off-by: Yurii <iuriish@yahoo.com> * - take into account empty shapes properly in transpose/permute op Signed-off-by: Yurii <iuriish@yahoo.com>	2020-02-17 08:04:28 +03:00
raver119	9e3c1b02b1	Perf improvements (#242 ) * initial commit Signed-off-by: raver119 <raver119@gmail.com> * meh Signed-off-by: raver119 <raver119@gmail.com> * better ExpandDims impl Signed-off-by: raver119 <raver119@gmail.com> * better Squeeze impl Signed-off-by: raver119 <raver119@gmail.com> * better Softmax impl Signed-off-by: raver119 <raver119@gmail.com> * one test disabled Signed-off-by: raver119 <raver119@gmail.com> * more accurate impl Signed-off-by: raver119 <raver119@gmail.com> * - GraphProfiler now prints full shapeInfo instead of shape - softmax typo fix Signed-off-by: raver119 <raver119@gmail.com>	2020-02-14 16:20:31 +03:00
raver119	3de3cd8277	R119 tests (#238 ) * one small test Signed-off-by: raver119 <raver119@gmail.com> * one small test Signed-off-by: raver119 <raver119@gmail.com> * bert test Signed-off-by: raver119 <raver119@gmail.com> * Graph FlowPath fix Signed-off-by: raver119 <raver119@gmail.com> * - GraphProfiler tweaks - NodeProfile now includes shapes Signed-off-by: raver119 <raver119@gmail.com> * RELU_layer inplace tweak Signed-off-by: raver119 <raver119@gmail.com> * meh Signed-off-by: raver119 <raver119@gmail.com> * identity tweaks Signed-off-by: raver119 <raver119@gmail.com> * bert result validation Signed-off-by: raver119 <raver119@gmail.com> * - bunch of Shape ops have inplace exec forbidden now - Legacy ops have inplace exec disabled by default now Signed-off-by: raver119 <raver119@gmail.com> * ffast-math enabled Signed-off-by: raver119 <raver119@gmail.com> * ffast-math enabled Signed-off-by: raver119 <raver119@gmail.com> * allow some legacy ops to be inplace Signed-off-by: raver119 <raver119@gmail.com> * disable -fast_math Signed-off-by: raver119 <raver119@gmail.com> * disable expensive test for cuda Signed-off-by: raver119 <raver119@gmail.com>	2020-02-13 20:59:35 +03:00
Yurii Shyrma	fe47f52896	Oleh tenzor mmul (#231 ) * Libnd4j: TensorMMul backprop op #8174, raw implementation Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 merge master and some corrections Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 algorithm update, need testing, sync with master * Libnd4j: TensorMMul backprop op #8174 fixed incorrect B axes calculation Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 optimize axes identification and fix bug of indeces overlapping, added first test. need testing with different shapes Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 some fixes and improvements need more testing Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 fixed order of matrix multiply Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 fixed issue of incorrect axes definition, add tests based on TF, need additional testing for case dLdC not equal 1 Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 fixed scalar case add test Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 fixed bp algorithm, axes definition, need some mode testing with different orders combination f,c; c,f f,f and add some checks for inputs Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 some checks and corrections added tests, exists the problem with different input orders support A-f B-c and A-f B-f Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 sync master Signed-off-by: Oleg <oleg.semeniv@gmail.com> * - correct bug in MmulHelper::tensorDot(a, b, c, axes_a, axes_b,permutForC) Signed-off-by: Yurii <iuriish@yahoo.com> * Libnd4j: TensorMMul backprop op #8174 code clean up and refactoring Signed-off-by: Oleg <oleg.semeniv@gmail.com> * - add check for linspase ordered permutations in ShapeUtils::evalShapeForTensorDot Signed-off-by: Yurii <iuriish@yahoo.com> * - provide additional code in shape::reshape stuff in order to reduce amount of allocation/copy operations during reshaping procedure Signed-off-by: Yurii <iuriish@yahoo.com> * - further work on problem of wrong shape evaluation during permute/reshape procedures Signed-off-by: Yurii <iuriish@yahoo.com> * - still looking for bug reason in reshape/permute stuff Signed-off-by: Yurii <iuriish@yahoo.com> * - correct bug in transform cuda native ops Signed-off-by: Yurii <iuriish@yahoo.com> * - correct bug in NDArray::assign Signed-off-by: Yurii <iuriish@yahoo.com> * - remove old shape::reshape stuff Signed-off-by: Yurii <iuriish@yahoo.com> * - add possibility to disable copy of old buffer to new buffer during reshape operation in NDArray class Signed-off-by: Yurii <iuriish@yahoo.com> * - correct bug in tensorDot which had to do with wrong pointers assigments Signed-off-by: Yurii <iuriish@yahoo.com> Co-authored-by: Oleh <oleg.semeniv@gmail.com>	2020-02-13 20:33:54 +03:00
shugeo	f0c684020f	Shugeo resize area fix4 (#229 ) * Fixed a couple of issues with resize_area op. Signed-off-by: shugeo <sgazeos@gmail.com> * Added additional test for alternate params for resize_area testing. Signed-off-by: shugeo <sgazeos@gmail.com>	2020-02-12 19:02:42 +03:00
Yurii Shyrma	948646b32d	Shyrma mkl test (#211 ) * - provide nhwc format in mkl conv ops Signed-off-by: Yurii <iuriish@yahoo.com> * - corrections in mkl conv3d Signed-off-by: Yurii <iuriish@yahoo.com> * - corrections in mkl batchnorm Signed-off-by: Yurii <iuriish@yahoo.com> * - corrections in mkl maxpooling2d Signed-off-by: Yurii <iuriish@yahoo.com> * - add format format_tag::any to outputs in mkl conv ops Signed-off-by: Yurii <iuriish@yahoo.com> * - complete corrections in mkl conv ops Signed-off-by: Yurii <iuriish@yahoo.com> * - add test for comparison of execution speeds of mkl conv2d op with different weights format Signed-off-by: Yurii <iuriish@yahoo.com> * - take into account order f in mkl conv ops Signed-off-by: Yurii <iuriish@yahoo.com>	2020-02-06 21:12:54 +03:00
shugeo	5ae40f6e38	Shugeo sequence mask fix2 (#216 ) * Fixed sequence_mask op and tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Cuda fix for sequence_mask op. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed sequence_mask op for both platforms and tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed solve and triangular_solve for more than 2D for adjoint cases. Signed-off-by: shugeo <sgazeos@gmail.com> * Added adjoint solve test again. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a set of tests for triangual_solve and generic solve ops. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a pair tests for triangular_solve Signed-off-by: shugeo <sgazeos@gmail.com> * Added tests for triangular_solve op. Signed-off-by: shugeo <sgazeos@gmail.com>	2020-02-06 21:06:50 +03:00
shugeo	41ff907bc6	Shugeo solve linear (#191 ) * linear equations systems solve op. Initial commit. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed compiling issues. Signed-off-by: shugeo <sgazeos@gmail.com> * Linear equations systems solve. The next stage commit. Signed-off-by: shugeo <sgazeos@gmail.com> * Added test for linear equations systems solve operation. Signed-off-by: shugeo <sgazeos@gmail.com> * Added additional test and fixed lower matrix retrievance. * Implementation for solve of the systems of linear equations." Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored permutation generation. Signed-off-by: shugeo <sgazeos@gmail.com> * Added restore for permutations batched with cuda helper for solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Finished cuda implementation for solve op helpers. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored cpu helpers for solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Fix gtest output on Windows * Fixed issue with permutation matrix for cuda implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed issue with permutation matrix for cpu implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Eliminated waste comments. Signed-off-by: shugeo <sgazeos@gmail.com> * LinearSolve added * Mapping added * Javadoc added * Refactored implementation of triangular_solve helpers and tests for solve matrix equations generally. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a test for solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Solve test added * Fix for TF import Co-authored-by: Serhii Shepel <9946053+sshepel@users.noreply.github.com> Co-authored-by: raver119 <raver119@gmail.com> Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>	2020-02-04 08:59:11 +03:00
raver119	9bb5798cac	Null arrays fix (#208 ) * don't skip null arrays Signed-off-by: raver119 <raver119@gmail.com> * one test tweak Signed-off-by: raver119 <raver119@gmail.com>	2020-02-02 23:14:00 +03:00
Oleh	d52e67209e	Oleh convert (#200 ) * StringUtils for utf convertor raw implementation of all possible combinations, need to be add counter of bytes per symbol for any type and add api to call convertors and store data Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor more corrections to support convertors Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor some corrections and bug fixes, need review to discuss how to add multi-threading Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 some corrections to move to multi-threading, add one test need discussion data inputs/outputs array presentation, need discussion the way of multi-threading * StringUtils for utf convertor #8613 tests added some corrections to optimize build Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 some corrections and code clean up Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 code clean up and optimize usage, need update ndarray factory before replace std usage Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 some staff to integrate converters into NDArrayFactory, update tests and add some functionality Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 minor corrections and bug fix before discussion * StringUtils for utf convertor #8613 some fixes and tets * StringUtils for utf convertor #8613 some more staff to support different unicode Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 fix linking bug * StringUtils for utf convertor #8613 corrected several tests as defaults for string ndarray changed * StringUtils for utf convertor #8613 replace some incorrect implementation, revert some test changes, need sync before testing * StringUtils for utf convertor #8613 fixed several thing that were badly implemented yesterday, need optimization, testing (before testing have to be add support of u32 and u16 buffer visualization) * StringUtils for utf convertor #8613 fixed to support u16 and u32, and convertor in ndarray, fix buffer print, etc Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 merge master and sync with server Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 some correction for string cast, need print check only asci support Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 merge master, remove copies and add cast, need test, refactoring according review and clean up * StringUtils for utf convertor #8613 fixed cast and copy issues Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 fixed cuda and update tests * StringUtils for utf convertor #8613 integration into NdArray, fix several tests for build pass, refactoring, etc * - avoid ambiguity of NDArray ctrs overloading in some tests Signed-off-by: Yurii <iuriish@yahoo.com> * StringUtils for utf convertor #8613 NDArray string constructors added, updated NDArrayFactory, refactoring unicode and tests, etc Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 fixed cuda build and test, refactoring and void* added to some functions Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 void* integration, removed copy operation, refactoring, added tests for NDArray string constructors, etc Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 several more fixes, improvements and updates Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 master merge, code clean up and optimization before review Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 minor fixes string element size define Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 revert last changes as mistake Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 fixed NDArray constructor build problem, remove order from string factory, fixed order use for factory via project, added catch of incorrect sync in cast of arrays to data types, fixed e method for strings, etc Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 added javacpp hack, added multi-threading, minor corrections in license agreement Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 windows builds fix, as "sting" is not treated as utf8 Signed-off-by: Oleg <oleg.semeniv@gmail.com> Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>	2020-01-31 16:30:49 +03:00
raver119	1ab86d1306	Range op data type (#204 ) * - range op now accepts dargs - dargs now can be in signature Signed-off-by: raver119 <raver119@gmail.com> * range dtype java side Signed-off-by: raver119 <raver119@gmail.com> * linspace fix Signed-off-by: raver119 <raver119@gmail.com> * lin_space fix for scalar outputs Signed-off-by: raver119 <raver119@gmail.com>	2020-01-31 10:45:40 +03:00
raver119	5d98cfcf47	Configurable DataType for ops (#201 ) * initial commit Signed-off-by: raver119 <raver119@gmail.com> * - one more test for OneHot with dtype - one more signature in Nd4j Signed-off-by: raver119 <raver119@gmail.com> * ones_as/zeros_as now accept dtype Signed-off-by: raver119 <raver119@gmail.com> * one more test Signed-off-by: raver119 <raver119@gmail.com> * - more updates for configurable data types - ones_as/zeros_as java side + tests Signed-off-by: raver119 <raver119@gmail.com> * few c++ tests fixed Signed-off-by: raver119 <raver119@gmail.com> * few more changes around DArgs Signed-off-by: raver119 <raver119@gmail.com>	2020-01-30 18:46:12 +03:00
raver119	ba961c7601	DataTypes & FlatBuffers (#197 ) * flatbuffers version upgrade Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers version upgrade java side Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers dependency version upgrade java side Signed-off-by: raver119 <raver119@gmail.com> * MKLDNN version upgrade Signed-off-by: raver119 <raver119@gmail.com> * DArgs first pass Signed-off-by: raver119 <raver119@gmail.com> * signatures first pass Signed-off-by: raver119 <raver119@gmail.com> * signatures second pass Signed-off-by: raver119 <raver119@gmail.com> * signatures third pass Signed-off-by: raver119 <raver119@gmail.com> * signatures third pass Signed-off-by: raver119 <raver119@gmail.com> * signatures fourth pass Signed-off-by: raver119 <raver119@gmail.com> * signatures fifth pass Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers UI version upgrade java side Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers ui update Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers downgrade Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers downgrade java side Signed-off-by: raver119 <raver119@gmail.com>	2020-01-30 10:07:24 +03:00
Yurii Shyrma	7a7ee4b021	Shyrma cudnn (#192 ) * - implementation of cudnn batchnorm_bp op Signed-off-by: Yurii <iuriish@yahoo.com> * - testing and fixing bugs in batchnorm_bp based on cudnn api Signed-off-by: Yurii <iuriish@yahoo.com> * - move pooling mkl code and delete some unnecessary files Signed-off-by: Yurii <iuriish@yahoo.com> * - implementation and testing cudnn pooling2d ops (avg/max, ff/bp) Signed-off-by: Yurii <iuriish@yahoo.com> * - implementation and testing cudnn pooling 3d (ff/bp) ops Signed-off-by: Yurii <iuriish@yahoo.com> * - provide ff step in case of cudnn maxpool3d_bp op Signed-off-by: Yurii <iuriish@yahoo.com> * - remove half type from set of supported types in mkl dpethwise conv op Signed-off-by: Yurii <iuriish@yahoo.com> * - bring back cudaStreamSynchronize in batchnorm and pooling cudnn ops Signed-off-by: Yurii <iuriish@yahoo.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-01-28 18:23:07 +03:00
shugeo	2717b25931	Shugeo qr (#153 ) * Added qr op implementation. Initial version. * Fixed doc for qr op. Signed-off-by: shugeo <sgazeos@gmail.com> * Implementation of QR decomposition. CPU platform version. * Added a pair of tests for qr op testing. Signed-off-by: shugeo <sgazeos@gmail.com> * QR implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected norm using. * Properly calculated intermediate results with QR decomposition. * Another step to implement QR algorithm by householder. * Cpu implementatio for QR decomposition. The first working edition. * Corrected test to QR decomposition. * Added tad multithreading with QR implementation. * Finished cpu implementation for QR decomposition helpers. * Refactored tests and improved multithreading. * Refactored QR cpu implementation and update cuda implementation helpers. * Cuda QR helper implementation. The first working edition. * Eliminated waste prints. * Restore multithreading with cuda implementation. * Ops names corrected * Refactored qr op helpers to optimize. Signed-off-by: shugeo <sgazeos@gmail.com> * Eliminated waste manual ticking. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored memory allocation to avoid waste memory usage. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored matrixMinor method both for cuda and cpu platforms. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored method of vmul to use raw buffers instead type conversion. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored temporary array of matricies. Signed-off-by: shugeo <sgazeos@gmail.com> Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-01-22 13:59:36 +03:00
shugeo	815a2908af	Shugeo solve triangular (#173 ) * Added implementation of the triangular_solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed compilation issues. Signed-off-by: shugeo <sgazeos@gmail.com> * Added verification of input data and helpers facilities for triangular_solve op.' Signed-off-by: shugeo <sgazeos@gmail.com> * Added cpu implementation for triangular_solve helpers. * Added tests and implementation for upper triangular equations. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a pair of cases to tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Added multithreading with cpu helpers for triangular_solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Added cuda implementation of triangular_solve op helpers. Signed-off-by: shugeo <sgazeos@gmail.com> * Finished cuda implementation of triangular_solve helpers and tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed copyright marks. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected grammar errors with doc and error messages. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored matricies processing with triangular_solve cuda helper implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Added triangular_solve wrapper * Fixed mapping * Added processing for adjoint with cpu helpers of triangular_solve op implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Added implementation for adjoint routine with cuda platform. Signed-off-by: shugeo <sgazeos@gmail.com> * Added multithreading with adjoint routine for cpu platform. Signed-off-by: shugeo <sgazeos@gmail.com> Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>	2020-01-22 10:48:03 +03:00
shugeo	e50b285c2c	Shugeo resize area (#162 ) * Added implementation for resize_area op. Initial commit. * Added implementation of resize_area op. Initial revision. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected resizeArea functor call. Signed-off-by: shugeo <sgazeos@gmail.com> * Implementation of resize_area. Cpu platform helpers. Signed-off-by: shugeo <sgazeos@gmail.com> * Implementation for resize_area helpers. The first part revision. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a set of tests for resize_area op. Signed-off-by: shugeo <sgazeos@gmail.com> * Cuda implementation for resize_area. Initial approach. Signed-off-by: shugeo <sgazeos@gmail.com> * Adding multithreading for resize_area algorithm. Signed-off-by: shugeo <sgazeos@gmail.com> * Cuda implementation of resize_area helpers. Shared memory approach. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored resizeAreaKernel with cuda implementation. * Eliminated compilation errors. * ResizeArea helpers for cuda platform. The first working revision. Signed-off-by: shugeo <sgazeos@gmail.com> * Added test for batched resize_area op testing. Signed-off-by: shugeo <sgazeos@gmail.com> * Implementation of resize_are for cuda platform and tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed multithreading with resize_area op helper. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected copyright marks with sources. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected copyright mark for resize_area op implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected copyright mark for parity ops header. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected typo in strings and so on with image resize ops. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored resize_area helpers and multithreading. Signed-off-by: shugeo <sgazeos@gmail.com> * Added ResizeArea wrapper * Added test with align_corners and fixed shape processing with only int args given for output size. Signed-off-by: shugeo <sgazeos@gmail.com> * Added test * TF mapping for ResizeArea * Fixed implementation issues with resize_area op for both platforms. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored image resizer struct to use flexible types for ints and floats. Signed-off-by: shugeo <sgazeos@gmail.com> * Improved multithreading with resizeAreaKernel launch. Signed-off-by: shugeo <sgazeos@gmail.com> * Use asynchronical memory copying with cuda platform image resize allocations. Signed-off-by: shugeo <sgazeos@gmail.com> Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>	2020-01-22 10:46:33 +03:00
raver119	7783012f39	cuDNN integration (#150 ) * initial commit Signed-off-by: raver119 <raver119@gmail.com> * one file Signed-off-by: raver119 <raver119@gmail.com> * few more includes Signed-off-by: raver119 <raver119@gmail.com> * m? Signed-off-by: raver119 <raver119@gmail.com> * const Signed-off-by: raver119 <raver119@gmail.com> * cudnn linkage in tests Signed-off-by: raver119 <raver119@gmail.com> * culibos Signed-off-by: raver119 <raver119@gmail.com> * static reminder Signed-off-by: raver119 <raver119@gmail.com> * platform engine tag Signed-off-by: raver119 <raver119@gmail.com> * HAVE_CUDNN moved to config.h.in Signed-off-by: raver119 <raver119@gmail.com> * include Signed-off-by: raver119 <raver119@gmail.com> * include Signed-off-by: raver119 <raver119@gmail.com> * skip cudnn handle creation if there's not cudnn Signed-off-by: raver119 <raver119@gmail.com> * meh Signed-off-by: raver119 <raver119@gmail.com> * target device in context Signed-off-by: raver119 <raver119@gmail.com> * platform engines Signed-off-by: raver119 <raver119@gmail.com> * platform engines Signed-off-by: raver119 <raver119@gmail.com> * allow multiple -h args Signed-off-by: raver119 <raver119@gmail.com> * allow multiple -h args Signed-off-by: raver119 <raver119@gmail.com> * move mkldnn out of CPU block Signed-off-by: raver119 <raver119@gmail.com> * link to mkldnn on cuda Signed-off-by: raver119 <raver119@gmail.com> * less prints Signed-off-by: raver119 <raver119@gmail.com> * minor tweaks Signed-off-by: raver119 <raver119@gmail.com> * next step Signed-off-by: raver119 <raver119@gmail.com> * conv2d NCHW draft Signed-off-by: raver119 <raver119@gmail.com> * conv2d biasAdd Signed-off-by: raver119 <raver119@gmail.com> * test for MKL/CUDNN combined use Signed-off-by: raver119 <raver119@gmail.com> * - provide additional code for conv2d ff based on cudnn api, not tested yet Signed-off-by: Yurii <iuriish@yahoo.com> * - further work on conv2d helper based on using cudnn api Signed-off-by: Yurii <iuriish@yahoo.com> * - fixing several cuda bugs which appeared after cudnn lib had been started to use Signed-off-by: Yurii <iuriish@yahoo.com> * - implementation of conv2d backprop op based on cudnn api Signed-off-by: Yurii <iuriish@yahoo.com> * - implementaion of conv3d and conv3d_bp ops based on cudnn api Signed-off-by: Yurii <iuriish@yahoo.com> * - bugs fixing in conv3d/conv3d_bp ops (cudnn in use) Signed-off-by: Yurii <iuriish@yahoo.com> * - implementation of depthwiseConv2d (ff/bp) op based on cudnn api Signed-off-by: Yurii <iuriish@yahoo.com> * - implementation of batchnorm ff op based on cudnn api Signed-off-by: Yurii <iuriish@yahoo.com> * - disable cudnn batchnorm temporary Signed-off-by: Yurii <iuriish@yahoo.com> * - add minor change in cmake Signed-off-by: Yurii <iuriish@yahoo.com> * engine for depthwise mkldnn Signed-off-by: raver119 <raver119@gmail.com> * couple of includes Signed-off-by: raver119 <raver119@gmail.com> * - provide permutation to cudnn batchnorm ff when format is NHWC Signed-off-by: Yurii <iuriish@yahoo.com> * lgamma fix Signed-off-by: raver119 <raver119@gmail.com> * - eliminate memory leak in two tests Signed-off-by: Yurii <iuriish@yahoo.com> Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>	2020-01-20 21:32:46 +03:00
Oleh	8fc0e63ce7	Oleh powderev (#171 ) * Libnd4j: Add broadcastable elementwise power derivative #7461 first step of Pow_bp operation implementation Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: Add broadcastable elementwise power derivative #7461 some corrections of calculation steps Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: Add broadcastable elementwise power derivative #7461 some bug fixes, the PowDerevative op made broadcastable, add the raw tests for op, need refactoring to use broadcast ops * Libnd4j: Add broadcastable elementwise power derivative #7461 fixed several bugs add broadcast support and tests, need to fix scalar+array and array+scalar Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: Add broadcastable elementwise power derivative #7461 fixed bugs for scalar inputs, fixed multinomial tests, added tests Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: Add broadcastable elementwise power derivative #7461 fised bugs for different shapes support, tests updated * Libnd4j: Add broadcastable elementwise power derivative #7461 applied all possible variants via tiled arrays, add support of broadcast for Pow and PowDerivative ops, covered by tests, before review have to be replaced tiled implementation by applyTrueBroadcast Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: Add broadcastable elementwise power derivative #7461 replaced tile by broadcast implementation, fixed issue with negative x input, corrected tests, need additional testing Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: Add broadcastable elementwise power derivative #7461 added and corrected test cases, corrected implementation need review Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: Add broadcastable elementwise power derivative #7461 code clean up * Libnd4j: Add broadcastable elementwise power derivative #7461 code clean up, removed some tests, add tests with scalar Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: Add broadcastable elementwise power derivative #7461 code improvement and clean up, split tests Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: Add broadcastable elementwise power derivative #7461 some code clean up Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: Add broadcastable elementwise power derivative replace __isnanf by internal realization Signed-off-by: Oleg <oleg.semeniv@gmail.com> * pow_bp wrapper * Fixed PowBp wrapper * Tests added * Test fixed * Fix return type * Disable powBp usage * Pow backprop changed Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>	2020-01-20 12:59:12 +03:00
shugeo	6943a5f57a	Shugeo lgamma (#170 ) * lgamma op. Initial version. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored lgamma op and test. Signed-off-by: shugeo <sgazeos@gmail.com> * Lgamma wrapper * Added TF mapping Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>	2020-01-20 12:29:36 +03:00
Yurii Shyrma	cae5ef4180	Shyrma depthconv (#156 ) * - implementation of depthwise_conv2d (both ff/bp) based on mkl dnn api * - minor corrections in deconv3d Signed-off-by: Yurii <iuriish@yahoo.com> * - remove unnecessary time test Signed-off-by: Yurii <iuriish@yahoo.com> * - update mkl dnn version in cmake Signed-off-by: Yurii <iuriish@yahoo.com> * - take into account several notes given by pr reviewer Signed-off-by: Yurii <iuriish@yahoo.com> * - fix bug in depthwise conv2d op based on mkl Signed-off-by: Yurii <iuriish@yahoo.com>	2020-01-11 07:36:40 +03:00
Oleh	2404be5fe0	Oleh multinomial (#163 ) * libnd4j: Multinomial op #8570 first raw step of multinomial random data generator implementation Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op #8570 next step of multinomial random categories generator implementation on both cpu and cuda, need corrections and code clean up before review and testing * libnd4j: Multinomial op #8570 code clean up and fixed issues data selecting, moved from coords to tads * libnd4j: Multinomial op #8570 fixed cuda build add reference for math materials that was used for implementation * libnd4j: Multinomial op #8570 fixed several bugs, added several tests and improved cuda version. current implementation works, need testing of reproduction with the same seed * libnd4j: Multinomial op #8570 fixes and optimization after discussion in both cuda and cpu * libnd4j: Multinomial op #8570 add corrections after review, removed tads, replace 2D parallel loop by 3D Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op fixed declaration and add tests need discussion * libnd4j: Multinomial op fix in test * libnd4j: Multinomial op corrected behavior to get reproducible results, fixed issue in uniform value getting, tests added, need cuda review and cuda testing Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op fixed indexing on uniform calculation Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op some corrections in max min declaration Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op fixed index calculation, added rewind, corrected input declaration, added stats tests, both cuda and cpu. cuda need testing * libnd4j: Multinomial op fixed bugs on cuda nad cpu. need review Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op corrected tests to handle different orders Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op some improvements after code review Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op more corrections after review Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op fixed seed usage, update tests, fixed cuda based on comments, fixed bug of rewind, removed one behavior, minor corrections. Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op minor corrections Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op rise the bound of fluctuation for random cases Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op modified operation inputs and update implementation and tests on both cpu and cuda * libnd4j: Multinomial op corrected data types according ops.proto Co-authored-by: raver119 <raver119@gmail.com>	2020-01-06 22:35:05 +03:00
raver119	29e8e09db6	String changes (#3 ) * initial commit * additional data types & tensor type Signed-off-by: raver119 <raver119@gmail.com> * next step Signed-off-by: raver119 <raver119@gmail.com> * missing include * sparse_to_dense Signed-off-by: raver119 <raver119@gmail.com> * few more tests files Signed-off-by: raver119 <raver119@gmail.com> * draft Signed-off-by: raver119 <raver119@gmail.com> * numeric sparse_to_dense Signed-off-by: raver119 <raver119@gmail.com> * comment Signed-off-by: raver119 <raver119@gmail.com> * string sparse_to_dense version Signed-off-by: raver119 <raver119@gmail.com> * CUDA DataBuffer expand Signed-off-by: raver119 <raver119@gmail.com> * few tweaks for CUDA build Signed-off-by: raver119 <raver119@gmail.com> * shape fn for string_split Signed-off-by: raver119 <raver119@gmail.com> * one more comment Signed-off-by: raver119 <raver119@gmail.com> * string_split indices Signed-off-by: raver119 <raver119@gmail.com> * next step Signed-off-by: raver119 <raver119@gmail.com> * test passes Signed-off-by: raver119 <raver119@gmail.com> * few rearrangements for databuffer implementations Signed-off-by: raver119 <raver119@gmail.com> * DataBuffer: move inline methods to common implementations Signed-off-by: raver119 <raver119@gmail.com> * add native DataBuffer to Nd4j presets Signed-off-by: raver119 <raver119@gmail.com> * DataBuffer creation Signed-off-by: raver119 <raver119@gmail.com> * use DataBuffer for allocation Signed-off-by: raver119 <raver119@gmail.com> * cpu databuffer as deallocatable Signed-off-by: raver119 <raver119@gmail.com> * DataBuffer setters for bufers Signed-off-by: raver119 <raver119@gmail.com> * couple of wrappers Signed-off-by: raver119 <raver119@gmail.com> * DataBuffers being passed around Signed-off-by: raver119 <raver119@gmail.com> * Bunch of ByteBuffer-related signatures gone Signed-off-by: raver119 <raver119@gmail.com> * - few more Nd4j signatures removed - minor fix for bfloat16 Signed-off-by: raver119 <raver119@gmail.com> * nullptr pointer is still a pointer, but 0 as address :) Signed-off-by: raver119 <raver119@gmail.com> * one special test Signed-off-by: raver119 <raver119@gmail.com> * empty string array init Signed-off-by: raver119 <raver119@gmail.com> * one more test in cpp Signed-off-by: raver119 <raver119@gmail.com> * memcpy instead of databuffer swap Signed-off-by: raver119 <raver119@gmail.com> * special InteropDataBuffer for front-end languages Signed-off-by: raver119 <raver119@gmail.com> * few tweaks for java Signed-off-by: raver119 <raver119@gmail.com> * pointer/indexer actualization Signed-off-by: raver119 <raver119@gmail.com> * CustomOp returns list for inputArumgents and outputArguments instead of array Signed-off-by: raver119 <raver119@gmail.com> * redundant call Signed-off-by: raver119 <raver119@gmail.com> * print_variable op Signed-off-by: raver119 <raver119@gmail.com> * - view handling (but wrong one) - print_variable java wrapper Signed-off-by: raver119 <raver119@gmail.com> * one more test Signed-off-by: raver119 <raver119@gmail.com> * - empty arrays handling Signed-off-by: raver119 <raver119@gmail.com> * - deserialization works now Signed-off-by: raver119 <raver119@gmail.com> * minor fix Signed-off-by: raver119 <raver119@gmail.com> * meh Signed-off-by: raver119 <raver119@gmail.com> * one more fix Signed-off-by: raver119 <raver119@gmail.com> * initial cuda commit Signed-off-by: raver119 <raver119@gmail.com> * print_variable message validation Signed-off-by: raver119 <raver119@gmail.com> * CUDA views Signed-off-by: raver119 <raver119@gmail.com> * CUDA special buffer size Signed-off-by: raver119 <raver119@gmail.com> * minor update to match master changes Signed-off-by: raver119 <raver119@gmail.com> * - consider arrays always actual on device for CUDA - additional PrintVariable constructor - CudaUtf8Buffer now allocates host buffer by default Signed-off-by: raver119 <raver119@gmail.com> * meh Signed-off-by: raver119 <raver119@gmail.com> * - print_variable now allows print from device Signed-off-by: raver119 <raver119@gmail.com> * InteropDataBuffer data type fix Signed-off-by: raver119 <raver119@gmail.com> * ... Signed-off-by: raver119 <raver119@gmail.com> * disable some debug messages Signed-off-by: raver119 <raver119@gmail.com> * master pulled in Signed-off-by: raver119 <raver119@gmail.com> * couple of new methods for DataBuffer interop Signed-off-by: raver119 <raver119@gmail.com> * java side Signed-off-by: raver119 <raver119@gmail.com> * offsetted constructor Signed-off-by: raver119 <raver119@gmail.com> * new CUDA deallocator Signed-off-by: raver119 <raver119@gmail.com> * CUDA backend torn apart Signed-off-by: raver119 <raver119@gmail.com> * CUDA backend torn apart 2 Signed-off-by: raver119 <raver119@gmail.com> * CUDA backend torn apart 3 Signed-off-by: raver119 <raver119@gmail.com> * - few new tests - few new methods for DataBuffer management Signed-off-by: raver119 <raver119@gmail.com> * few more tests + few more tweaks Signed-off-by: raver119 <raver119@gmail.com> * two failing tests Signed-off-by: raver119 <raver119@gmail.com> * one more test Signed-off-by: raver119 <raver119@gmail.com> * two failing tests pass Signed-off-by: raver119 <raver119@gmail.com> * now we pass DataBuffer to legacy ops too Signed-off-by: raver119 <raver119@gmail.com> * Native DataBuffer for legacy ops, Java side Signed-off-by: raver119 <raver119@gmail.com> * CPU java side update Signed-off-by: raver119 <raver119@gmail.com> * CUDA java side update Signed-off-by: raver119 <raver119@gmail.com> * no more prepare/register action on java side Signed-off-by: raver119 <raver119@gmail.com> * NDArray::prepare/register use now accepts vectors Signed-off-by: raver119 <raver119@gmail.com> * InteropDataBuffer now has few more convenience methods Signed-off-by: raver119 <raver119@gmail.com> * java bindings update Signed-off-by: raver119 <raver119@gmail.com> * tick device in NativeOps Signed-off-by: raver119 <raver119@gmail.com> * Corrected usage of OpaqueBuffer for tests. * Corrected usage of OpaqueBuffer for java tests. * NativeOpsTests fixes. * print_variable now returns scalar Signed-off-by: raver119 <raver119@gmail.com> * one more test Signed-off-by: raver119 <raver119@gmail.com> * compat_string_split fix for CUDA Signed-off-by: raver119 <raver119@gmail.com> * - CUDA execScalar fix - CUDA lazyAllocateHostPointer now checks java indexer/pointer instead of native pointer Signed-off-by: raver119 <raver119@gmail.com> * legacy ops DataBuffer migration prototype Signed-off-by: raver119 <raver119@gmail.com> * ignore device shapeinfo coming from java Signed-off-by: raver119 <raver119@gmail.com> * minor fix Signed-off-by: raver119 <raver119@gmail.com> * minor transformAny fix Signed-off-by: raver119 <raver119@gmail.com> * minor tweak for lazy host allocation Signed-off-by: raver119 <raver119@gmail.com> * - DataBuffer::memcpy method - bitcast now uses memcpy Signed-off-by: raver119 <raver119@gmail.com> * - IndexReduce CUDA dimension buffer fix Signed-off-by: raver119 <raver119@gmail.com> * views for CPU and CUDA Signed-off-by: raver119 <raver119@gmail.com> * less spam Signed-off-by: raver119 <raver119@gmail.com> * optional memory init Signed-off-by: raver119 <raver119@gmail.com> * async memset Signed-off-by: raver119 <raver119@gmail.com> * - SummaryStats CUDA fix - DataBuffer.sameUnderlyingData() impl - execBroadcast fix Signed-off-by: raver119 <raver119@gmail.com> * - reduce3All fix switch to CUDA 10 temporarily Signed-off-by: raver119 <raver119@gmail.com> * CUDA version Signed-off-by: raver119 <raver119@gmail.com> * proper memory deallocator registration Signed-off-by: raver119 <raver119@gmail.com> * HOST_ONLY workspace allocation Signed-off-by: raver119 <raver119@gmail.com> * temp commit Signed-off-by: raver119 <raver119@gmail.com> * few conflicts resolved Signed-off-by: raver119 <raver119@gmail.com> * few minor fixes Signed-off-by: raver119 <raver119@gmail.com> * one more minor fix Signed-off-by: raver119 <raver119@gmail.com> * NDArray permute should operate on JVM primitives Signed-off-by: raver119 <raver119@gmail.com> * - create InteropDataBuffer for shapes as well - update pointers after view creation in Java Signed-off-by: raver119 <raver119@gmail.com> * - addressPointer temporary moved to C++ Signed-off-by: raver119 <raver119@gmail.com> * CUDA: don't account offset twice Signed-off-by: raver119 <raver119@gmail.com> * CUDA: DataBuffer pointer constructor updated Signed-off-by: raver119 <raver119@gmail.com> * CUDA NDArray.unsafeDuplication() simplified Signed-off-by: raver119 <raver119@gmail.com> * CUDA minor workspace-related fixes Signed-off-by: raver119 <raver119@gmail.com> * CPU DataBuffer.reallocate() Signed-off-by: raver119 <raver119@gmail.com> * print_affinity op Signed-off-by: raver119 <raver119@gmail.com> * print_affinity java side Signed-off-by: raver119 <raver119@gmail.com> * CUDA more tweaks for data locality Signed-off-by: raver119 <raver119@gmail.com> * - compat_string_split tweak - CudaUtf8Buffer update Signed-off-by: raver119 <raver119@gmail.com> * INDArray.close() mechanic restored Signed-off-by: raver119 <raver119@gmail.com> * one more test fixed Signed-off-by: raver119 <raver119@gmail.com> * - CUDA DataBuffer.reallocate() updated - cudaMemcpy (synchronous) restored Signed-off-by: raver119 <raver119@gmail.com> * one last fix Signed-off-by: raver119 <raver119@gmail.com> * bad import removed Signed-off-by: raver119 <raver119@gmail.com> * another small fix Signed-off-by: raver119 <raver119@gmail.com> * one special test Signed-off-by: raver119 <raver119@gmail.com> * fix bad databuffer size Signed-off-by: raver119 <raver119@gmail.com> * release primaryBuffer on replace Signed-off-by: raver119 <raver119@gmail.com> * higher timeout Signed-off-by: raver119 <raver119@gmail.com> * disable timeouts Signed-off-by: raver119 <raver119@gmail.com> * dbCreateView now validates offset and length of a view Signed-off-by: raver119 <raver119@gmail.com> * additional validation for dbExpand Signed-off-by: raver119 <raver119@gmail.com> * restore timeout back again Signed-off-by: raver119 <raver119@gmail.com> * smaller distribution for rng test to prevent timeouts Signed-off-by: raver119 <raver119@gmail.com> * CUDA DataBuffer::memcpy now copies to device all the time Signed-off-by: raver119 <raver119@gmail.com> * OpaqueDataBuffer now contains all required methods for interop Signed-off-by: raver119 <raver119@gmail.com> * some javadoc Signed-off-by: raver119 <raver119@gmail.com> * GC on failed allocations Signed-off-by: raver119 <raver119@gmail.com> * minoe memcpu tweak Signed-off-by: raver119 <raver119@gmail.com> * one more bitcast test Signed-off-by: raver119 <raver119@gmail.com> * - NDArray::deviceId() propagation - special multi-threaded test for data locality checks Signed-off-by: raver119 <raver119@gmail.com> * DataBuffer additional syncStream Signed-off-by: raver119 <raver119@gmail.com> * DataBuffer additional syncStream Signed-off-by: raver119 <raver119@gmail.com> * one ignored test Signed-off-by: raver119 <raver119@gmail.com> * skip host alloc for empty arrays Signed-off-by: raver119 <raver119@gmail.com> * ByteBuffer support is back Signed-off-by: raver119 <raver119@gmail.com> * DataBuffer::memcpy minor fix Signed-off-by: raver119 <raver119@gmail.com> * few minor prelu/bp tweaks Signed-off-by: raver119 <raver119@gmail.com> * nullify-related fixes Signed-off-by: raver119 <raver119@gmail.com> * PReLU fixes (#157) Signed-off-by: Alex Black <blacka101@gmail.com> * Build fixed * Fix tests * one more ByteBuffer signature restored Signed-off-by: raver119 <raver119@gmail.com> * nd4j-jdbc-hsql profiles fix Signed-off-by: raver119 <raver119@gmail.com> * nd4j-jdbc-hsql profiles fix Signed-off-by: raver119 <raver119@gmail.com> * PReLU weight init fix Signed-off-by: Alex Black <blacka101@gmail.com> * Small PReLU fix Signed-off-by: Alex Black <blacka101@gmail.com> * - INDArray.migrate() reactivated - DataBuffer::setDeviceId(...) added - InteropDataBuffer Z syncToDevice added for views Signed-off-by: raver119 <raver119@gmail.com> * missed file Signed-off-by: raver119 <raver119@gmail.com> * Small tweak Signed-off-by: Alex Black <blacka101@gmail.com> * cuda 10.2 Signed-off-by: raver119 <raver119@gmail.com> * minor fix Signed-off-by: raver119 <raver119@gmail.com> Co-authored-by: shugeo <sgazeos@gmail.com> Co-authored-by: Alex Black <blacka101@gmail.com> Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>	2020-01-04 13:27:50 +03:00
shugeo	fbf7c9d38b	Fixed lu for cuda platform and tests. (#158 ) Signed-off-by: shugeo <sgazeos@gmail.com>	2020-01-02 23:25:41 +03:00
raver119	fc760de348	RgbToYuv & YuvToRgb skip empty arrays Signed-off-by: raver119 <raver119@gmail.com>	2019-12-24 18:45:54 +03:00
Oleh	75123b0a4c	[WIP] Oleh rgb yuv (#147 ) * libnd4j: RgbToYuv and YuvToRgb, both implementations for both cpu and cuda. Need adding tests and review Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: RgbToYuv and YuvToRgb, replace coords method on Tad in both cpu and cuda, add tests, fixed bugs Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: RgbToYuv and YuvToRgb minor corrections Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: RgbToYuv and YuvToRgb corrections to use operations in-place	2019-12-24 18:30:54 +03:00
Abdelrauf	39d43ca170	RgbToYiq and YiqToRgb operations (#142 ) * RgbToYiq and YiqToRgb Signed-off-by: Abdelrauf <rauf@konduit.ai> * CUDA impl for RgbToYiq and YiqToRgb Signed-off-by: raver119 <raver119@gmail.com> * remove print Signed-off-by: raver119 <raver119@gmail.com> * allow inplace for hsv,rgb,yiq ops Signed-off-by: Abdelrauf <rauf@konduit.ai> Co-authored-by: raver119 <raver119@gmail.com>	2019-12-24 15:20:35 +03:00
raver119	62f93ac211	negative handling for empty arrays (#146 ) Signed-off-by: raver119 <raver119@gmail.com>	2019-12-24 13:23:25 +03:00
raver119	495256c827	minor build fix (#139 ) Signed-off-by: raver119 <raver119@gmail.com>	2019-12-21 08:07:13 +03:00
Yurii Shyrma	5d9b2a16e5	Shyrma temp (#131 ) * - specifying template instantiation for certain types in float16 and bloat16 Signed-off-by: Yurii <iuriish@yahoo.com> * - polishing bfloat16 and float16 member functions template specialization Signed-off-by: Yurii <iuriish@yahoo.com> * - rewrite and overload array +-/ scalar and scalar +-/ arr in NDAray class Signed-off-by: Yurii <iuriish@yahoo.com> * - make corrections which have to do with and rvalue lvalue conversions Signed-off-by: Yurii <iuriish@yahoo.com> * - provide move semantic in NDArray operators array +-/* array Signed-off-by: Yurii <iuriish@yahoo.com> * float16/bfloat16 tweaks Signed-off-by: raver119 <raver119@gmail.com> * one more tweak Signed-off-by: raver119 <raver119@gmail.com> * - make float16 and bfloat16 to compile successfully on cuda Signed-off-by: Yurii <iuriish@yahoo.com> * - do not use resources of view-like arrays when move semantics is applied Signed-off-by: Yurii <iuriish@yahoo.com> * - get rid of pointers in signatures NDArray methods 1 Signed-off-by: Yurii <iuriish@yahoo.com> * - correction of signature of NDArray::dup method Signed-off-by: Yurii <iuriish@yahoo.com> * - correction of signature of NDArray::reduceAlongDimension method Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::applyIndexReduce and applyTrueBroadcast methods Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::applyReduce3 and varianceAlongDimension methods Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::tensorsAlongDimension and diagonal methods Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::allTensorsAlongDimension Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::reduceAlongDimension 2 Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::applyTransform 2 Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::applyPairwiseTransform 2 Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::applyBroadcast 2 Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::applyTrueBroadcast 2 Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::applyScalar and applyScalarArr Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::lambda methods Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::reduce3 methods 2 Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of following NDArray methods: add/sub/mul/div row/column and fillAsTriangular Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::tileToShape methods Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::isShapeSameStrict method Signed-off-by: Yurii <iuriish@yahoo.com> * minor corrections in tests Signed-off-by: Yurii <iuriish@yahoo.com> * - replace reduce op in batchnorm mkldnn Signed-off-by: Yurii <iuriish@yahoo.com> * - add explicit templates instantiations for operator+(NDArray&&. const scalar) Signed-off-by: Yurii <iuriish@yahoo.com> * - corrections of casts in float16/bfloat16 Signed-off-by: Yurii <iuriish@yahoo.com> * - provide move semantics in following NDArray methods: transform, applyTrueBroadcast, transpose, reshape, permute Signed-off-by: Yurii <iuriish@yahoo.com> * - get rid of input array A duplicate in svd cuda op Signed-off-by: Yurii <iuriish@yahoo.com> * - avoid available bug in svd cuda API Signed-off-by: Yurii <iuriish@yahoo.com> * - add temporary global memory buffer in svd cuda when calcUV = false and m != n Signed-off-by: Yurii <iuriish@yahoo.com> * - remove test with blfoat16 type for betainC Signed-off-by: Yurii <iuriish@yahoo.com> * - resolve conflicts after master has been merged in Signed-off-by: Yurii <iuriish@yahoo.com> * - changed type of affected input array in fused_batch_norm Signed-off-by: Yurii <iuriish@yahoo.com> * - add several explicit type castings Signed-off-by: Yurii <iuriish@yahoo.com> * - add ND4J_EXPORT to operators Signed-off-by: Yurii <iuriish@yahoo.com> * - add explicit template types in instantiations of template arithm operators of NDArray class Signed-off-by: Yurii <iuriish@yahoo.com> * - one more test fix Signed-off-by: Yurii <iuriish@yahoo.com> Co-authored-by: raver119 <raver119@gmail.com>	2019-12-20 22:35:39 +03:00
Oleh	211c0df76f	Oleh rgb to gray scale (#138 ) * libnd4j: RgbToGrayscale op #8536 - raw implementation in user branch, need checks for integration and adding other orders Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: RgbToGrayscale op #8536 next step of merging images Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: RgbToGrayscale op #8536, Revert merge of hsv_to_rgb and rgb_to_hsv as cause conflicts in naming need refactoring before merge, implementation of rbg_to_grs added * libnd4j: RgbToGrayscale op #8536 imlementation and conflict resolve * libnd4j: RgbToGrayscale op #8536 merged operations with images into image, renamed methods and files * libnd4j: RgbToGrayscale op #8536 added test for rgbToGrayScale, need clarification and fixed tests case run Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: RgbToGrayscale op #8536 bug fixing and need review * libnd4j: RgbToGrayscale op #8536 some additional corrections after review Signed-off-by: Oleg <oleg.semeniv@gmail.com> * - minor corrections in rgbToGrs test1 Signed-off-by: Yurii <iuriish@yahoo.com> * libnd4j: RgbToGrayscale op #8536, corrected tests and rbf_to_grs, fixed problems, refactoring, need review * libnd4j: RgbToGrayscale op #8536 fix for 'f' order in rgbToGrs * libnd4j: RgbToGrayscale op #8536 fixed several bugs with dimC, test case refactoring and improve Signed-off-by: Oleg <oleg.semeniv@gmail.com> * - add cuda kernel for rgbToGrs op Signed-off-by: Yurii <iuriish@yahoo.com> * - fix linkage errors Signed-off-by: Yurii <iuriish@yahoo.com> Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>	2019-12-20 20:59:29 +03:00
shugeo	67d8199165	[WIP] Shugeo lup (#126 ) * Added infrastructure for implementation op lu for both cuda and cpu platforms. * Added implementation of helpers with lu op. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored LU decomposition to use vector of permutations instead. * Refactored helpers for lu op. * Fixed crash with determinant op. * Refactored cpu LU op heleper. * Added implementation for lu op. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed issue with argmax on column. * Added multithreaded behaviour for lu op helper. * Fixed multithreaded cpu implementation helpers for lu op. * Added cuda implementation for lu op helper. * Finished lu helper implementation for cuda platform. * Eliminated waste prints and comments. * Fixed race condition and multithreading issues. * Fixed memory leak with shape construction. * Corrected test for lu op to avoid near zero elements on the main diagonal." Signed-off-by: shugeo <sgazeos@gmail.com> * Improved test for adjust_constast op. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed issues with cuda implementation of resize_bicubic helpers. Signed-off-by: shugeo <sgazeos@gmail.com>	2019-12-20 17:56:28 +03:00
shugeo	e303c06042	Shugeo pad fix3 (#132 ) * Expanding allowed paddings type to 64bit ints also. * Extended to int64 paddins data types for mirror_pad op. Signed-off-by: shugeo <sgazeos@gmail.com>	2019-12-19 13:14:02 +03:00
shugeo	fc7c6d4e82	Shugeo roll fix3 (#127 ) * Added tests for roll with scalar shift and axis. * Fixed problem with roll on 1D input with scalar axis and test. * Only cosmetic changes.	2019-12-19 13:10:06 +03:00
Alexander Stoyakin	f5068f3980	Added missing Java ops wrappers (#122 ) * Timeouts added * Added some ops * Ops added * Fixed tests * Minor fix * Some fixes * Digamma added * Small fixes * Timeouts added * Added some ops * Ops added * Fixed tests * Minor fix * Some fixes * Digamma added * Small fixes * Fused batch norm fixes- Signed-off-by: AlexDBlack <blacka101@gmail.com> * Tests switched off. * Added test for resize_bicubic. * Eliminated wasted in test of bicubic resize. * Switched off multithreading explicit. * HsvToRgb and RgbToHsv added * Eliminated waste comments and conform proper float constants. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed multithreading with resize_bicubic helper for cpu platform. Signed-off-by: shugeo <sgazeos@gmail.com> * ResizeBicubic was fixed. * Some fixes * Fix op name * Validation fixed. * Clarifications for tests * Wrappers and small fixes for new ops.	2019-12-19 20:15:48 +11:00
Abdelrauf	e0a9cb6c08	[WIP] HSV,RGB color model conversions (#125 ) * CUDA implementation for hsv_to_rgb and rgb_to_hsv Signed-off-by: raver119 <raver119@gmail.com> * hsv_to_rgb and rgb_to_hsv operations Test coverage: c order 1d, 2d, 3d array Signed-off-by: Abdelrauf <rauf@konduit.ai> * Index check Signed-off-by: Abdelrauf <rauf@konduit.ai> * Suppress Msvc floating point errors Signed-off-by: Abdelrauf <rauf@konduit.ai> * Added Index Check for adjust_saturation and adjust_hue Signed-off-by: Abdelrauf <rauf@konduit.ai> * minor fix Signed-off-by: raver119 <raver119@gmail.com> * Fixes missed Msvc floating narrowing errors Signed-off-by: Abdelrauf <rauf@konduit.ai>	2019-12-17 09:42:09 +03:00
Alexander Stoyakin	927d591421	ResizeBicubic added (#117 ) * ResizeBicubic added Some fixes. * Test fixed * Narrowed argument type changed to boolean * Clean up	2019-12-09 18:25:39 +11:00
raver119	70e08c3a6c	reshape validation fix Signed-off-by: raver119 <raver119@gmail.com>	2019-12-06 21:19:33 +03:00
raver119	b32dd1bf92	[WIP] resize_bicubic types (#116 ) * resize_bicubic: allow more dtypes Signed-off-by: raver119 <raver119@gmail.com> * resize_bicubic: allow less dtypes Signed-off-by: raver119 <raver119@gmail.com> * Refactored resize_bicubic op to full conform with TF1.5 and tests. * Corrected test to proper data type output. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected double input test to float constant outputs. Signed-off-by: shugeo <sgazeos@gmail.com> * Finished with correction of tests for bicubic interpolated resizes expected. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed adjust_contrast ops to allow non-RGB inputs. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored adjust_contrast_v2 to conform with TF one. Signed-off-by: shugeo <sgazeos@gmail.com> * AdjustContrast tests activated * two typos fixed Signed-off-by: raver119 <raver119@gmail.com>	2019-12-06 18:58:37 +03:00
shugeo	e09a785232	Shugeo resize fix5 (#102 ) * Refactored resize images ops to use TF-like bool args as input. * Refactored helpers for cpu implementation of resize_bilinear and resize_nearest_neighbor ops. * Refactored cuda implementation for image.resize_bilinear and image.resize_nearest_neighbor ops helpers. * Refactored nearest_neighbor resize op. * Added a pair of tests for special case of resize_bilinear algorithm. * Fixed issue with resize_bilinear op. * Refactored cpu implementation for helpers with resize_nearest_neighbor op. * Final fixed for resize ops to conform TF v.1.5 * Refactored cuda helpers for resize_neares_neighbor op. * Fixed resize_bilinear to accept proper data. * Fixed issue with non-float input for resize_bilinear op. * Refactored cuda helper for resize_bilinear to proper process non-float inputs. * Added tests for resize_bilinear to int inputs. * Fixed ResizeBilinear wrapper * Tests fixed * Fixed float and bool constant to avoid overflow for some kind of compilers. * Corrected float constants with float data type. * Added f suffix for float constants. * Corrected float constant to avoid overflow with initializing lists. * Corrected float initializing list with float input. * Corrected bool constant with initalizing list. * Corrected float and bool values with initializing lists. * Fixed wrong constant. * Fixed issue with 1x1 input picture for resize. * ResizeBilinear default values on import fix Signed-off-by: raver119 <raver119@gmail.com>	2019-12-05 22:05:33 +03:00
Alex Black	2052ce7026	Various pre-release fixes (#111 ) * Various fixes Signed-off-by: AlexDBlack <blacka101@gmail.com> * Fix default dtypes for MaxPoolWithArgmax Signed-off-by: AlexDBlack <blacka101@gmail.com>	2019-12-05 14:20:03 +11:00
Alex Black	578a5abb68	DNNL/MKLDNN dilated causal conv1d + betainc (#103 ) * - add padding calculation in same mode in causal conv1d op for right mkl paddings Signed-off-by: Yurii <iuriish@yahoo.com> * - correct causal condition in mkldnnUtils.cpp Signed-off-by: Yurii <iuriish@yahoo.com> * - correct some code which caused additional round errors is betainc op Signed-off-by: Yurii <iuriish@yahoo.com> * - put float in place of template parameter in nan assign in betainc op Signed-off-by: Yurii <iuriish@yahoo.com>	2019-12-04 14:50:17 +03:00
shugeo	190575196c	Refactored pad and mirror_pad ops to conform with TF. (#100 )	2019-12-03 15:06:38 +03:00
Yurii Shyrma	1f5e15b541	Shyrma adjust (#98 ) * - add possibility of passing scalar-array as input parameter for scale factor in adjust hue/contrast/saturation ops - correct typo in function which calculates regularized incomplete beta integral Signed-off-by: Yurii <iuriish@yahoo.com> * - fix bug in betainc cuda kernel Signed-off-by: Yurii <iuriish@yahoo.com> * - start working on implementation of digamma function Signed-off-by: Yurii <iuriish@yahoo.com> * - further work on digamma function (cpu) Signed-off-by: Yurii <iuriish@yahoo.com> * - testing and fixing bugs in digamma op Signed-off-by: Yurii <iuriish@yahoo.com> * - make correction n cuda kernel for polyGamma Signed-off-by: Yurii <iuriish@yahoo.com> * - remove unnecessary stuff from betaInc cuda kernel Signed-off-by: Yurii <iuriish@yahoo.com> * - resolve conflicts in DeclarableOpsTests3.cpp after master branch has been merged Signed-off-by: Yurii <iuriish@yahoo.com> * - restore id number of Not opertion in legacy_ops.h Signed-off-by: Yurii <iuriish@yahoo.com> * - correct padding calculation in mkl dnn conv1d causal Signed-off-by: Yurii <iuriish@yahoo.com> * restore empty check in adjust_contrast_v2 Signed-off-by: raver119 <raver119@gmail.com>	2019-12-03 09:40:45 +03:00
shugeo	1e9ff114aa	Shugeo atomic tests (#97 ) * Added atomic tests for atomicAdd, atomicSub and atomicDiv. * Fixed atomicAdd for 16bit ints. * Fixed atomicMul for 16 floats. * Eliminated waste prints. * Fixed problems with double type on matrix inverse helepers. * Eliminated commented wrong code. * Refactored atomicMul for 16bit types. * few more minor tweaks Signed-off-by: raver119 <raver119@gmail.com> * Fixed fake_quant_with_min_max_vars_per_channel args processing.	2019-12-02 21:40:54 +03:00
raver119	25b3cd9b80	[WIP] CUDA tests (#95 ) * one more CI test Signed-off-by: raver119 <raver119@gmail.com> * export additional symbols Signed-off-by: raver119 <raver119@gmail.com> * few more tweaks Signed-off-by: raver119 <raver119@gmail.com> * one more tweak for linux Signed-off-by: raver119 <raver119@gmail.com> * fix dtype in few tests Signed-off-by: raver119 <raver119@gmail.com> * missing sync and memset in couple of tests Signed-off-by: raver119 <raver119@gmail.com> * copy step for libnd4j cuda Signed-off-by: raver119 <raver119@gmail.com> * no-op on empty for adjust hue/contrast/saturation Signed-off-by: raver119 <raver119@gmail.com> * CUDA_VERBOSE Off Signed-off-by: raver119 <raver119@gmail.com> * BroadcastBool fix + few tests Signed-off-by: raver119 <raver119@gmail.com> * trigger jenkins Signed-off-by: raver119 <raver119@gmail.com> * trigger jenkins Signed-off-by: raver119 <raver119@gmail.com> * - ignore couple of warnings - remove redundant compiler options Signed-off-by: raver119 <raver119@gmail.com>	2019-12-02 21:37:21 +03:00
raver119	4ada65b384	[WIP] MSVC-related tests fixes (#88 ) * fix narrowing down cast Signed-off-by: raver119 <raver119@gmail.com> * trigger jenkins Signed-off-by: raver119 <raver119@gmail.com> * few more fixes for MSVC and Windows Signed-off-by: raver119 <raver119@gmail.com> * few more fixes for MSVC and Windows Signed-off-by: raver119 <raver119@gmail.com> * few more fixes for MSVC and Windows Signed-off-by: raver119 <raver119@gmail.com> * few more fixes for MSVC and Windows Signed-off-by: raver119 <raver119@gmail.com> * few more tweaks Signed-off-by: raver119 <raver119@gmail.com> * few more tweaks Signed-off-by: raver119 <raver119@gmail.com> * few more tweaks Signed-off-by: raver119 <raver119@gmail.com> * few more tweaks Signed-off-by: raver119 <raver119@gmail.com> * few more tweaks Signed-off-by: raver119 <raver119@gmail.com> * - few more tweaks - tensormmul dtype validation Signed-off-by: raver119 <raver119@gmail.com> * - few more tweaks - batched gemm dtype validation Signed-off-by: raver119 <raver119@gmail.com> * - few more tweaks Signed-off-by: raver119 <raver119@gmail.com> * - few more tweaks Signed-off-by: raver119 <raver119@gmail.com> * - few more tweaks Signed-off-by: raver119 <raver119@gmail.com> * - few more tweaks Signed-off-by: raver119 <raver119@gmail.com>	2019-11-30 16:02:07 +03:00
shugeo	dc66a52bc7	[WIP] Shugeo release fixes4 (#91 ) * Fixed fake_quant_with_min_max_vars op. * Refactored bitcast op. * bad linspace removed Signed-off-by: raver119 <raver119@gmail.com> * Corrected tests for bitcast op. * Eliminated debug prints. * one fix Signed-off-by: raver119 <raver119@gmail.com> * one fix Signed-off-by: raver119 <raver119@gmail.com> * Added a pair of comments.	2019-11-29 16:05:08 +03:00
Yurii Shyrma	d19eeaec52	Shyrma casual conv1d (#90 ) * - add causal mode of padding to convolutions Signed-off-by: Yurii <iuriish@yahoo.com> * - add additional tests for causal conv1d Signed-off-by: Yurii <iuriish@yahoo.com> * - add causal mode for cuda conv kernels Signed-off-by: Yurii <iuriish@yahoo.com> * Java side of Conv1D changes Signed-off-by: raver119 <raver119@gmail.com> * Add Conv1DDerivative op Signed-off-by: Alex Black <blacka101@gmail.com> * Causal Conv1D gradient checks Signed-off-by: Alex Black <blacka101@gmail.com> * Tweaks Signed-off-by: Alex Black <blacka101@gmail.com> * - add causal padding mode to conv2d_bp Signed-off-by: Yurii <iuriish@yahoo.com> * More thorough causal conv1d tests Signed-off-by: Alex Black <blacka101@gmail.com>	2019-11-29 14:14:30 +03:00
shugeo	009007120b	Shugeo_release_fixes3 (#81 ) * Implementation for non_max_suppression_v3 was added. Initial version * Added check for overcome threshold. * Added definition for V3 method. * java remapping for NonMaxSuppressionV3 Signed-off-by: raver119 <raver119@gmail.com> * Fixed proporly processing of an empty output and test. * Refactored op to less threshold data to float. * Implemented cuda-based helper for non_max_suppression_v3 op. * Fixed fake_quant_with_min_max_vars op. * Fixed tests with float numbers. * - assert now stops execution - sortByKey/sortByValue now have input validation Signed-off-by: raver119 <raver119@gmail.com> * missing var Signed-off-by: raver119 <raver119@gmail.com> * Fixed proper processing for zero max_size inputs. * Refactored kernel callers. * Fixed return statement for logdet op helper. * Refactored unsorted segment SqrtN op. * get back 8 tail bytes on CUDA Signed-off-by: raver119 <raver119@gmail.com> * Refactored segment prod ops and helpers for cuda and tests. * Additional test. * CudaWorkspace tests updated for 8 tail bytes Signed-off-by: raver119 <raver119@gmail.com> * special atomic test Signed-off-by: raver119 <raver119@gmail.com> * atomicMul/atomicDiv fix for 16bit values Signed-off-by: raver119 <raver119@gmail.com> * Eliminated waste prints.	2019-11-28 21:08:51 +03:00

1 2 3 4

200 Commits