cavis

Author	SHA1	Message	Date
shugeo	f0c684020f	Shugeo resize area fix4 (#229 ) * Fixed a couple of issues with resize_area op. Signed-off-by: shugeo <sgazeos@gmail.com> * Added additional test for alternate params for resize_area testing. Signed-off-by: shugeo <sgazeos@gmail.com>	2020-02-12 19:02:42 +03:00
Abdelrauf	bead656feb	Initial performance improvement for Bias Add and etc #8556 (#217 ) * Initial performance improvement for Bias Add, loop coords helpers and increment aligned parallel threading Signed-off-by: AbdelRauf <rauf@konduit.ai> * One more test for Rauf Signed-off-by: raver119 <raver119@gmail.com> * disable couple of perf tests Signed-off-by: raver119 <raver119@gmail.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-02-08 15:31:30 +03:00
shugeo	5ae40f6e38	Shugeo sequence mask fix2 (#216 ) * Fixed sequence_mask op and tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Cuda fix for sequence_mask op. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed sequence_mask op for both platforms and tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed solve and triangular_solve for more than 2D for adjoint cases. Signed-off-by: shugeo <sgazeos@gmail.com> * Added adjoint solve test again. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a set of tests for triangual_solve and generic solve ops. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a pair tests for triangular_solve Signed-off-by: shugeo <sgazeos@gmail.com> * Added tests for triangular_solve op. Signed-off-by: shugeo <sgazeos@gmail.com>	2020-02-06 21:06:50 +03:00
shugeo	41ff907bc6	Shugeo solve linear (#191 ) * linear equations systems solve op. Initial commit. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed compiling issues. Signed-off-by: shugeo <sgazeos@gmail.com> * Linear equations systems solve. The next stage commit. Signed-off-by: shugeo <sgazeos@gmail.com> * Added test for linear equations systems solve operation. Signed-off-by: shugeo <sgazeos@gmail.com> * Added additional test and fixed lower matrix retrievance. * Implementation for solve of the systems of linear equations." Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored permutation generation. Signed-off-by: shugeo <sgazeos@gmail.com> * Added restore for permutations batched with cuda helper for solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Finished cuda implementation for solve op helpers. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored cpu helpers for solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Fix gtest output on Windows * Fixed issue with permutation matrix for cuda implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed issue with permutation matrix for cpu implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Eliminated waste comments. Signed-off-by: shugeo <sgazeos@gmail.com> * LinearSolve added * Mapping added * Javadoc added * Refactored implementation of triangular_solve helpers and tests for solve matrix equations generally. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a test for solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Solve test added * Fix for TF import Co-authored-by: Serhii Shepel <9946053+sshepel@users.noreply.github.com> Co-authored-by: raver119 <raver119@gmail.com> Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>	2020-02-04 08:59:11 +03:00
raver119	ba961c7601	DataTypes & FlatBuffers (#197 ) * flatbuffers version upgrade Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers version upgrade java side Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers dependency version upgrade java side Signed-off-by: raver119 <raver119@gmail.com> * MKLDNN version upgrade Signed-off-by: raver119 <raver119@gmail.com> * DArgs first pass Signed-off-by: raver119 <raver119@gmail.com> * signatures first pass Signed-off-by: raver119 <raver119@gmail.com> * signatures second pass Signed-off-by: raver119 <raver119@gmail.com> * signatures third pass Signed-off-by: raver119 <raver119@gmail.com> * signatures third pass Signed-off-by: raver119 <raver119@gmail.com> * signatures fourth pass Signed-off-by: raver119 <raver119@gmail.com> * signatures fifth pass Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers UI version upgrade java side Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers ui update Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers downgrade Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers downgrade java side Signed-off-by: raver119 <raver119@gmail.com>	2020-01-30 10:07:24 +03:00
shugeo	99a54829c2	Shugeo resize area fix2 (#181 ) * Added test for issue with resize_area op. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a pair of tests for resize_are op. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored resize_area kernel to avoid shared memory overflow. Signed-off-by: shugeo <sgazeos@gmail.com> * Eliminated prints with tests. Signed-off-by: shugeo <sgazeos@gmail.com> * ignore bad test Signed-off-by: raver119 <raver119@gmail.com> * Fixed test with resize_area. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed test for float constants. Signed-off-by: shugeo <sgazeos@gmail.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-01-24 20:55:25 +03:00
raver119	5d69069177	[WIP] Memory limits (#167 ) * initial commit Signed-off-by: raver119 <raver119@gmail.com> * one more initial commit Signed-off-by: raver119 <raver119@gmail.com> * additional initial commit Signed-off-by: raver119 <raver119@gmail.com> * subsequent initial commit Signed-off-by: raver119 <raver119@gmail.com> * initial commit testing Signed-off-by: raver119 <raver119@gmail.com> * initial commit per device Signed-off-by: raver119 <raver119@gmail.com> * initial commit per group Signed-off-by: raver119 <raver119@gmail.com> * initial commit for cuda Signed-off-by: raver119 <raver119@gmail.com> * initial commit for cuda + few missed lines Signed-off-by: raver119 <raver119@gmail.com> * initial commit for cuda + missed includes Signed-off-by: raver119 <raver119@gmail.com> * initial commit for cuda + one more missed include Signed-off-by: raver119 <raver119@gmail.com> * initial commit shouldn't count host mem as dev0 in cuda Signed-off-by: raver119 <raver119@gmail.com> * initial commit that tracks HOST group limits for CUDA Signed-off-by: raver119 <raver119@gmail.com> * initial commit with some Environment changes Signed-off-by: raver119 <raver119@gmail.com> * initial commit with more Environment changes Signed-off-by: raver119 <raver119@gmail.com> * initial commit with maxMasterThreads fix Signed-off-by: raver119 <raver119@gmail.com> * initial commit with maxMasterThreads fix Signed-off-by: raver119 <raver119@gmail.com> * initial commit without maxMasterThreads exception Signed-off-by: raver119 <raver119@gmail.com> * initial commit without Nd4jULong in Environment Signed-off-by: raver119 <raver119@gmail.com> * add sleep and more iterations for OOM cases Signed-off-by: raver119 <raver119@gmail.com> * limits propagation from java side Signed-off-by: raver119 <raver119@gmail.com> * - consume ErrorCode every time - one test for memory limits Signed-off-by: raver119 <raver119@gmail.com> * unordered_map Signed-off-by: raver119 <raver119@gmail.com> * 
unordered_map Signed-off-by: raver119 <raver119@gmail.com> * unordered_map Signed-off-by: raver119 <raver119@gmail.com> * RSub op mapping fixed Signed-off-by: raver119 <raver119@gmail.com> * typo fixed Signed-off-by: raver119 <raver119@gmail.com> * one bad test fixed Signed-off-by: raver119 <raver119@gmail.com>	2020-01-24 10:11:09 +03:00
shugeo	2717b25931	Shugeo qr (#153 ) * Added qr op implementation. Initial version. * Fixed doc for qr op. Signed-off-by: shugeo <sgazeos@gmail.com> * Implementation of QR decomposition. CPU platform version. * Added a pair of tests for qr op testing. Signed-off-by: shugeo <sgazeos@gmail.com> * QR implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected norm using. * Properly calculated intermediate results with QR decomposition. * Another step to implement QR algorithm by householder. * Cpu implementatio for QR decomposition. The first working edition. * Corrected test to QR decomposition. * Added tad multithreading with QR implementation. * Finished cpu implementation for QR decomposition helpers. * Refactored tests and improved multithreading. * Refactored QR cpu implementation and update cuda implementation helpers. * Cuda QR helper implementation. The first working edition. * Eliminated waste prints. * Restore multithreading with cuda implementation. * Ops names corrected * Refactored qr op helpers to optimize. Signed-off-by: shugeo <sgazeos@gmail.com> * Eliminated waste manual ticking. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored memory allocation to avoid waste memory usage. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored matrixMinor method both for cuda and cpu platforms. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored method of vmul to use raw buffers instead type conversion. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored temporary array of matricies. Signed-off-by: shugeo <sgazeos@gmail.com> Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-01-22 13:59:36 +03:00
shugeo	815a2908af	Shugeo solve triangular (#173 ) * Added implementation of the triangular_solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed compilation issues. Signed-off-by: shugeo <sgazeos@gmail.com> * Added verification of input data and helpers facilities for triangular_solve op.' Signed-off-by: shugeo <sgazeos@gmail.com> * Added cpu implementation for triangular_solve helpers. * Added tests and implementation for upper triangular equations. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a pair of cases to tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Added multithreading with cpu helpers for triangular_solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Added cuda implementation of triangular_solve op helpers. Signed-off-by: shugeo <sgazeos@gmail.com> * Finished cuda implementation of triangular_solve helpers and tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed copyright marks. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected grammar errors with doc and error messages. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored matricies processing with triangular_solve cuda helper implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Added triangular_solve wrapper * Fixed mapping * Added processing for adjoint with cpu helpers of triangular_solve op implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Added implementation for adjoint routine with cuda platform. Signed-off-by: shugeo <sgazeos@gmail.com> * Added multithreading with adjoint routine for cpu platform. Signed-off-by: shugeo <sgazeos@gmail.com> Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>	2020-01-22 10:48:03 +03:00
shugeo	e50b285c2c	Shugeo resize area (#162 ) * Added implementation for resize_area op. Initial commit. * Added implementation of resize_area op. Initial revision. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected resizeArea functor call. Signed-off-by: shugeo <sgazeos@gmail.com> * Implementation of resize_area. Cpu platform helpers. Signed-off-by: shugeo <sgazeos@gmail.com> * Implementation for resize_area helpers. The first part revision. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a set of tests for resize_area op. Signed-off-by: shugeo <sgazeos@gmail.com> * Cuda implementation for resize_area. Initial approach. Signed-off-by: shugeo <sgazeos@gmail.com> * Adding multithreading for resize_area algorithm. Signed-off-by: shugeo <sgazeos@gmail.com> * Cuda implementation of resize_area helpers. Shared memory approach. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored resizeAreaKernel with cuda implementation. * Eliminated compilation errors. * ResizeArea helpers for cuda platform. The first working revision. Signed-off-by: shugeo <sgazeos@gmail.com> * Added test for batched resize_area op testing. Signed-off-by: shugeo <sgazeos@gmail.com> * Implementation of resize_are for cuda platform and tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed multithreading with resize_area op helper. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected copyright marks with sources. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected copyright mark for resize_area op implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected copyright mark for parity ops header. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected typo in strings and so on with image resize ops. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored resize_area helpers and multithreading. Signed-off-by: shugeo <sgazeos@gmail.com> * Added ResizeArea wrapper * Added test with align_corners and fixed shape processing with only int args given for output size. 
Signed-off-by: shugeo <sgazeos@gmail.com> * Added test * TF mapping for ResizeArea * Fixed implementation issues with resize_area op for both platforms. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored image resizer struct to use flexible types for ints and floats. Signed-off-by: shugeo <sgazeos@gmail.com> * Improved multithreading with resizeAreaKernel launch. Signed-off-by: shugeo <sgazeos@gmail.com> * Use asynchronical memory copying with cuda platform image resize allocations. Signed-off-by: shugeo <sgazeos@gmail.com> Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>	2020-01-22 10:46:33 +03:00
raver119	7783012f39	cuDNN integration (#150 ) * initial commit Signed-off-by: raver119 <raver119@gmail.com> * one file Signed-off-by: raver119 <raver119@gmail.com> * few more includes Signed-off-by: raver119 <raver119@gmail.com> * m? Signed-off-by: raver119 <raver119@gmail.com> * const Signed-off-by: raver119 <raver119@gmail.com> * cudnn linkage in tests Signed-off-by: raver119 <raver119@gmail.com> * culibos Signed-off-by: raver119 <raver119@gmail.com> * static reminder Signed-off-by: raver119 <raver119@gmail.com> * platform engine tag Signed-off-by: raver119 <raver119@gmail.com> * HAVE_CUDNN moved to config.h.in Signed-off-by: raver119 <raver119@gmail.com> * include Signed-off-by: raver119 <raver119@gmail.com> * include Signed-off-by: raver119 <raver119@gmail.com> * skip cudnn handle creation if there's not cudnn Signed-off-by: raver119 <raver119@gmail.com> * meh Signed-off-by: raver119 <raver119@gmail.com> * target device in context Signed-off-by: raver119 <raver119@gmail.com> * platform engines Signed-off-by: raver119 <raver119@gmail.com> * platform engines Signed-off-by: raver119 <raver119@gmail.com> * allow multiple -h args Signed-off-by: raver119 <raver119@gmail.com> * allow multiple -h args Signed-off-by: raver119 <raver119@gmail.com> * move mkldnn out of CPU block Signed-off-by: raver119 <raver119@gmail.com> * link to mkldnn on cuda Signed-off-by: raver119 <raver119@gmail.com> * less prints Signed-off-by: raver119 <raver119@gmail.com> * minor tweaks Signed-off-by: raver119 <raver119@gmail.com> * next step Signed-off-by: raver119 <raver119@gmail.com> * conv2d NCHW draft Signed-off-by: raver119 <raver119@gmail.com> * conv2d biasAdd Signed-off-by: raver119 <raver119@gmail.com> * test for MKL/CUDNN combined use Signed-off-by: raver119 <raver119@gmail.com> * - provide additional code for conv2d ff based on cudnn api, not tested yet Signed-off-by: Yurii <iuriish@yahoo.com> * - further work on conv2d helper based on using cudnn api Signed-off-by: Yurii 
<iuriish@yahoo.com> * - fixing several cuda bugs which appeared after cudnn lib had been started to use Signed-off-by: Yurii <iuriish@yahoo.com> * - implementation of conv2d backprop op based on cudnn api Signed-off-by: Yurii <iuriish@yahoo.com> * - implementaion of conv3d and conv3d_bp ops based on cudnn api Signed-off-by: Yurii <iuriish@yahoo.com> * - bugs fixing in conv3d/conv3d_bp ops (cudnn in use) Signed-off-by: Yurii <iuriish@yahoo.com> * - implementation of depthwiseConv2d (ff/bp) op based on cudnn api Signed-off-by: Yurii <iuriish@yahoo.com> * - implementation of batchnorm ff op based on cudnn api Signed-off-by: Yurii <iuriish@yahoo.com> * - disable cudnn batchnorm temporary Signed-off-by: Yurii <iuriish@yahoo.com> * - add minor change in cmake Signed-off-by: Yurii <iuriish@yahoo.com> * engine for depthwise mkldnn Signed-off-by: raver119 <raver119@gmail.com> * couple of includes Signed-off-by: raver119 <raver119@gmail.com> * - provide permutation to cudnn batchnorm ff when format is NHWC Signed-off-by: Yurii <iuriish@yahoo.com> * lgamma fix Signed-off-by: raver119 <raver119@gmail.com> * - eliminate memory leak in two tests Signed-off-by: Yurii <iuriish@yahoo.com> Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>	2020-01-20 21:32:46 +03:00
shugeo	6943a5f57a	Shugeo lgamma (#170 ) * lgamma op. Initial version. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored lgamma op and test. Signed-off-by: shugeo <sgazeos@gmail.com> * Lgamma wrapper * Added TF mapping Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>	2020-01-20 12:29:36 +03:00
Yurii Shyrma	bbf88b53dd	- fix wrong calculation of elements offsets in batchnorm op when input arrays have unusual (#169 ) Signed-off-by: Yurii <iuriish@yahoo.com>	2020-01-11 00:14:20 +03:00
Oleh	2404be5fe0	Oleh multinomial (#163 ) * libnd4j: Multinomial op #8570 first raw step of multinomial random data generator implementation Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op #8570 next step of multinomial random categories generator implementation on both cpu and cuda, need corrections and code clean up before review and testing * libnd4j: Multinomial op #8570 code clean up and fixed issues data selecting, moved from coords to tads * libnd4j: Multinomial op #8570 fixed cuda build add reference for math materials that was used for implementation * libnd4j: Multinomial op #8570 fixed several bugs, added several tests and improved cuda version. current implementation works, need testing of reproduction with the same seed * libnd4j: Multinomial op #8570 fixes and optimization after discussion in both cuda and cpu * libnd4j: Multinomial op #8570 add corrections after review, removed tads, replace 2D parallel loop by 3D Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op fixed declaration and add tests need discussion * libnd4j: Multinomial op fix in test * libnd4j: Multinomial op corrected behavior to get reproducible results, fixed issue in uniform value getting, tests added, need cuda review and cuda testing Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op fixed indexing on uniform calculation Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op some corrections in max min declaration Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op fixed index calculation, added rewind, corrected input declaration, added stats tests, both cuda and cpu. cuda need testing * libnd4j: Multinomial op fixed bugs on cuda nad cpu. 
need review Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op corrected tests to handle different orders Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op some improvements after code review Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op more corrections after review Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op fixed seed usage, update tests, fixed cuda based on comments, fixed bug of rewind, removed one behavior, minor corrections. Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op minor corrections Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op rise the bound of fluctuation for random cases Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: Multinomial op modified operation inputs and update implementation and tests on both cpu and cuda * libnd4j: Multinomial op corrected data types according ops.proto Co-authored-by: raver119 <raver119@gmail.com>	2020-01-06 22:35:05 +03:00
raver119	d9ef5e2467	Minor fixes (#165 ) * ios-arm excluded Signed-off-by: raver119 <raver119@gmail.com> * histogram single threaded Signed-off-by: raver119 <raver119@gmail.com>	2020-01-04 15:27:16 +03:00
raver119	29e8e09db6	String changes (#3 ) * initial commit * additional data types & tensor type Signed-off-by: raver119 <raver119@gmail.com> * next step Signed-off-by: raver119 <raver119@gmail.com> * missing include * sparse_to_dense Signed-off-by: raver119 <raver119@gmail.com> * few more tests files Signed-off-by: raver119 <raver119@gmail.com> * draft Signed-off-by: raver119 <raver119@gmail.com> * numeric sparse_to_dense Signed-off-by: raver119 <raver119@gmail.com> * comment Signed-off-by: raver119 <raver119@gmail.com> * string sparse_to_dense version Signed-off-by: raver119 <raver119@gmail.com> * CUDA DataBuffer expand Signed-off-by: raver119 <raver119@gmail.com> * few tweaks for CUDA build Signed-off-by: raver119 <raver119@gmail.com> * shape fn for string_split Signed-off-by: raver119 <raver119@gmail.com> * one more comment Signed-off-by: raver119 <raver119@gmail.com> * string_split indices Signed-off-by: raver119 <raver119@gmail.com> * next step Signed-off-by: raver119 <raver119@gmail.com> * test passes Signed-off-by: raver119 <raver119@gmail.com> * few rearrangements for databuffer implementations Signed-off-by: raver119 <raver119@gmail.com> * DataBuffer: move inline methods to common implementations Signed-off-by: raver119 <raver119@gmail.com> * add native DataBuffer to Nd4j presets Signed-off-by: raver119 <raver119@gmail.com> * DataBuffer creation Signed-off-by: raver119 <raver119@gmail.com> * use DataBuffer for allocation Signed-off-by: raver119 <raver119@gmail.com> * cpu databuffer as deallocatable Signed-off-by: raver119 <raver119@gmail.com> * DataBuffer setters for bufers Signed-off-by: raver119 <raver119@gmail.com> * couple of wrappers Signed-off-by: raver119 <raver119@gmail.com> * DataBuffers being passed around Signed-off-by: raver119 <raver119@gmail.com> * Bunch of ByteBuffer-related signatures gone Signed-off-by: raver119 <raver119@gmail.com> * - few more Nd4j signatures removed - minor fix for bfloat16 Signed-off-by: raver119 <raver119@gmail.com> * 
nullptr pointer is still a pointer, but 0 as address :) Signed-off-by: raver119 <raver119@gmail.com> * one special test Signed-off-by: raver119 <raver119@gmail.com> * empty string array init Signed-off-by: raver119 <raver119@gmail.com> * one more test in cpp Signed-off-by: raver119 <raver119@gmail.com> * memcpy instead of databuffer swap Signed-off-by: raver119 <raver119@gmail.com> * special InteropDataBuffer for front-end languages Signed-off-by: raver119 <raver119@gmail.com> * few tweaks for java Signed-off-by: raver119 <raver119@gmail.com> * pointer/indexer actualization Signed-off-by: raver119 <raver119@gmail.com> * CustomOp returns list for inputArumgents and outputArguments instead of array Signed-off-by: raver119 <raver119@gmail.com> * redundant call Signed-off-by: raver119 <raver119@gmail.com> * print_variable op Signed-off-by: raver119 <raver119@gmail.com> * - view handling (but wrong one) - print_variable java wrapper Signed-off-by: raver119 <raver119@gmail.com> * one more test Signed-off-by: raver119 <raver119@gmail.com> * - empty arrays handling Signed-off-by: raver119 <raver119@gmail.com> * - deserialization works now Signed-off-by: raver119 <raver119@gmail.com> * minor fix Signed-off-by: raver119 <raver119@gmail.com> * meh Signed-off-by: raver119 <raver119@gmail.com> * one more fix Signed-off-by: raver119 <raver119@gmail.com> * initial cuda commit Signed-off-by: raver119 <raver119@gmail.com> * print_variable message validation Signed-off-by: raver119 <raver119@gmail.com> * CUDA views Signed-off-by: raver119 <raver119@gmail.com> * CUDA special buffer size Signed-off-by: raver119 <raver119@gmail.com> * minor update to match master changes Signed-off-by: raver119 <raver119@gmail.com> * - consider arrays always actual on device for CUDA - additional PrintVariable constructor - CudaUtf8Buffer now allocates host buffer by default Signed-off-by: raver119 <raver119@gmail.com> * meh Signed-off-by: raver119 <raver119@gmail.com> * - print_variable now allows 
print from device Signed-off-by: raver119 <raver119@gmail.com> * InteropDataBuffer data type fix Signed-off-by: raver119 <raver119@gmail.com> * ... Signed-off-by: raver119 <raver119@gmail.com> * disable some debug messages Signed-off-by: raver119 <raver119@gmail.com> * master pulled in Signed-off-by: raver119 <raver119@gmail.com> * couple of new methods for DataBuffer interop Signed-off-by: raver119 <raver119@gmail.com> * java side Signed-off-by: raver119 <raver119@gmail.com> * offsetted constructor Signed-off-by: raver119 <raver119@gmail.com> * new CUDA deallocator Signed-off-by: raver119 <raver119@gmail.com> * CUDA backend torn apart Signed-off-by: raver119 <raver119@gmail.com> * CUDA backend torn apart 2 Signed-off-by: raver119 <raver119@gmail.com> * CUDA backend torn apart 3 Signed-off-by: raver119 <raver119@gmail.com> * - few new tests - few new methods for DataBuffer management Signed-off-by: raver119 <raver119@gmail.com> * few more tests + few more tweaks Signed-off-by: raver119 <raver119@gmail.com> * two failing tests Signed-off-by: raver119 <raver119@gmail.com> * one more test Signed-off-by: raver119 <raver119@gmail.com> * two failing tests pass Signed-off-by: raver119 <raver119@gmail.com> * now we pass DataBuffer to legacy ops too Signed-off-by: raver119 <raver119@gmail.com> * Native DataBuffer for legacy ops, Java side Signed-off-by: raver119 <raver119@gmail.com> * CPU java side update Signed-off-by: raver119 <raver119@gmail.com> * CUDA java side update Signed-off-by: raver119 <raver119@gmail.com> * no more prepare/register action on java side Signed-off-by: raver119 <raver119@gmail.com> * NDArray::prepare/register use now accepts vectors Signed-off-by: raver119 <raver119@gmail.com> * InteropDataBuffer now has few more convenience methods Signed-off-by: raver119 <raver119@gmail.com> * java bindings update Signed-off-by: raver119 <raver119@gmail.com> * tick device in NativeOps Signed-off-by: raver119 <raver119@gmail.com> * Corrected usage of OpaqueBuffer 
for tests. * Corrected usage of OpaqueBuffer for java tests. * NativeOpsTests fixes. * print_variable now returns scalar Signed-off-by: raver119 <raver119@gmail.com> * one more test Signed-off-by: raver119 <raver119@gmail.com> * compat_string_split fix for CUDA Signed-off-by: raver119 <raver119@gmail.com> * - CUDA execScalar fix - CUDA lazyAllocateHostPointer now checks java indexer/pointer instead of native pointer Signed-off-by: raver119 <raver119@gmail.com> * legacy ops DataBuffer migration prototype Signed-off-by: raver119 <raver119@gmail.com> * ignore device shapeinfo coming from java Signed-off-by: raver119 <raver119@gmail.com> * minor fix Signed-off-by: raver119 <raver119@gmail.com> * minor transformAny fix Signed-off-by: raver119 <raver119@gmail.com> * minor tweak for lazy host allocation Signed-off-by: raver119 <raver119@gmail.com> * - DataBuffer::memcpy method - bitcast now uses memcpy Signed-off-by: raver119 <raver119@gmail.com> * - IndexReduce CUDA dimension buffer fix Signed-off-by: raver119 <raver119@gmail.com> * views for CPU and CUDA Signed-off-by: raver119 <raver119@gmail.com> * less spam Signed-off-by: raver119 <raver119@gmail.com> * optional memory init Signed-off-by: raver119 <raver119@gmail.com> * async memset Signed-off-by: raver119 <raver119@gmail.com> * - SummaryStats CUDA fix - DataBuffer.sameUnderlyingData() impl - execBroadcast fix Signed-off-by: raver119 <raver119@gmail.com> * - reduce3All fix switch to CUDA 10 temporarily Signed-off-by: raver119 <raver119@gmail.com> * CUDA version Signed-off-by: raver119 <raver119@gmail.com> * proper memory deallocator registration Signed-off-by: raver119 <raver119@gmail.com> * HOST_ONLY workspace allocation Signed-off-by: raver119 <raver119@gmail.com> * temp commit Signed-off-by: raver119 <raver119@gmail.com> * few conflicts resolved Signed-off-by: raver119 <raver119@gmail.com> * few minor fixes Signed-off-by: raver119 <raver119@gmail.com> * one more minor fix Signed-off-by: raver119 
<raver119@gmail.com> * NDArray permute should operate on JVM primitives Signed-off-by: raver119 <raver119@gmail.com> * - create InteropDataBuffer for shapes as well - update pointers after view creation in Java Signed-off-by: raver119 <raver119@gmail.com> * - addressPointer temporary moved to C++ Signed-off-by: raver119 <raver119@gmail.com> * CUDA: don't account offset twice Signed-off-by: raver119 <raver119@gmail.com> * CUDA: DataBuffer pointer constructor updated Signed-off-by: raver119 <raver119@gmail.com> * CUDA NDArray.unsafeDuplication() simplified Signed-off-by: raver119 <raver119@gmail.com> * CUDA minor workspace-related fixes Signed-off-by: raver119 <raver119@gmail.com> * CPU DataBuffer.reallocate() Signed-off-by: raver119 <raver119@gmail.com> * print_affinity op Signed-off-by: raver119 <raver119@gmail.com> * print_affinity java side Signed-off-by: raver119 <raver119@gmail.com> * CUDA more tweaks for data locality Signed-off-by: raver119 <raver119@gmail.com> * - compat_string_split tweak - CudaUtf8Buffer update Signed-off-by: raver119 <raver119@gmail.com> * INDArray.close() mechanic restored Signed-off-by: raver119 <raver119@gmail.com> * one more test fixed Signed-off-by: raver119 <raver119@gmail.com> * - CUDA DataBuffer.reallocate() updated - cudaMemcpy (synchronous) restored Signed-off-by: raver119 <raver119@gmail.com> * one last fix Signed-off-by: raver119 <raver119@gmail.com> * bad import removed Signed-off-by: raver119 <raver119@gmail.com> * another small fix Signed-off-by: raver119 <raver119@gmail.com> * one special test Signed-off-by: raver119 <raver119@gmail.com> * fix bad databuffer size Signed-off-by: raver119 <raver119@gmail.com> * release primaryBuffer on replace Signed-off-by: raver119 <raver119@gmail.com> * higher timeout Signed-off-by: raver119 <raver119@gmail.com> * disable timeouts Signed-off-by: raver119 <raver119@gmail.com> * dbCreateView now validates offset and length of a view Signed-off-by: raver119 <raver119@gmail.com> * additional validation for dbExpand Signed-off-by: raver119 <raver119@gmail.com> * restore timeout back again Signed-off-by: raver119 <raver119@gmail.com> * smaller distribution for rng test to prevent timeouts Signed-off-by: raver119 <raver119@gmail.com> * CUDA DataBuffer::memcpy now copies to device all the time Signed-off-by: raver119 <raver119@gmail.com> * OpaqueDataBuffer now contains all required methods for interop Signed-off-by: raver119 <raver119@gmail.com> * some javadoc Signed-off-by: raver119 <raver119@gmail.com> * GC on failed allocations Signed-off-by: raver119 <raver119@gmail.com> * minoe memcpu tweak Signed-off-by: raver119 <raver119@gmail.com> * one more bitcast test Signed-off-by: raver119 <raver119@gmail.com> * - NDArray::deviceId() propagation - special multi-threaded test for data locality checks Signed-off-by: raver119 <raver119@gmail.com> * DataBuffer additional syncStream Signed-off-by: raver119 <raver119@gmail.com> * DataBuffer additional syncStream Signed-off-by: raver119 <raver119@gmail.com> * one ignored test Signed-off-by: raver119 <raver119@gmail.com> * skip host alloc for empty arrays Signed-off-by: raver119 <raver119@gmail.com> * ByteBuffer support is back Signed-off-by: raver119 <raver119@gmail.com> * DataBuffer::memcpy minor fix Signed-off-by: raver119 <raver119@gmail.com> * few minor prelu/bp tweaks Signed-off-by: raver119 <raver119@gmail.com> * nullify-related fixes Signed-off-by: raver119 <raver119@gmail.com> * PReLU fixes (#157) Signed-off-by: Alex Black <blacka101@gmail.com> * Build fixed * Fix tests * one more ByteBuffer signature restored Signed-off-by: raver119 <raver119@gmail.com> * nd4j-jdbc-hsql profiles fix Signed-off-by: raver119 <raver119@gmail.com> * nd4j-jdbc-hsql profiles fix Signed-off-by: raver119 <raver119@gmail.com> * PReLU weight init fix Signed-off-by: Alex Black <blacka101@gmail.com> * Small PReLU fix Signed-off-by: Alex Black <blacka101@gmail.com> * - INDArray.migrate() reactivated - DataBuffer::setDeviceId(...) added - InteropDataBuffer Z syncToDevice added for views Signed-off-by: raver119 <raver119@gmail.com> * missed file Signed-off-by: raver119 <raver119@gmail.com> * Small tweak Signed-off-by: Alex Black <blacka101@gmail.com> * cuda 10.2 Signed-off-by: raver119 <raver119@gmail.com> * minor fix Signed-off-by: raver119 <raver119@gmail.com> Co-authored-by: shugeo <sgazeos@gmail.com> Co-authored-by: Alex Black <blacka101@gmail.com> Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>	2020-01-04 13:27:50 +03:00
Alex Black	29104083cc	Various fixes (#143 ) * #8568 ArrayUtil optimization Signed-off-by: AlexDBlack <blacka101@gmail.com> * #6171 Keras ReLU and ELU support Signed-off-by: AlexDBlack <blacka101@gmail.com> * Keras softmax layer import Signed-off-by: AlexDBlack <blacka101@gmail.com> * #8549 Webjars dependency management Signed-off-by: AlexDBlack <blacka101@gmail.com> * Fix for TF import names ':0' suffix issue / NPE Signed-off-by: AlexDBlack <blacka101@gmail.com> * BiasAdd: fix default data format for TF import Signed-off-by: AlexDBlack <blacka101@gmail.com> * Update zoo test ignores Signed-off-by: AlexDBlack <blacka101@gmail.com> * #8509 SameDiff Listener API - provide frame + iteration Signed-off-by: AlexDBlack <blacka101@gmail.com> * #8520 ND4J Environment Signed-off-by: AlexDBlack <blacka101@gmail.com> * Deconv3d Signed-off-by: AlexDBlack <blacka101@gmail.com> * Deconv3d fixes + gradient check Signed-off-by: AlexDBlack <blacka101@gmail.com> * Conv3d fixes + deconv3d DType test Signed-off-by: AlexDBlack <blacka101@gmail.com> * Fix issue with deconv3d gradinet check weight init Signed-off-by: AlexDBlack <blacka101@gmail.com> * #8579 Fix BaseCudaDataBuffer constructor fix for UINT16 Signed-off-by: AlexDBlack <blacka101@gmail.com> * DataType.isNumerical() returns false for BOOL type Signed-off-by: AlexDBlack <blacka101@gmail.com> * #8504 Reduce Spark log spam for tests Signed-off-by: AlexDBlack <blacka101@gmail.com> * Clean up DL4J gradient check test spam Signed-off-by: AlexDBlack <blacka101@gmail.com> * More Gradient check spam reduction Signed-off-by: AlexDBlack <blacka101@gmail.com> * SameDiff test spam reduction Signed-off-by: AlexDBlack <blacka101@gmail.com> * Fixes for FlatBuffers mapping Signed-off-by: AlexDBlack <blacka101@gmail.com> * SameDiff log spam cleanup Signed-off-by: AlexDBlack <blacka101@gmail.com> * Tests should extend BaseNd4jTest Signed-off-by: AlexDBlack <blacka101@gmail.com> * Remove debug line in c++ op Signed-off-by: AlexDBlack <blacka101@gmail.com> * ND4J test spam cleanup Signed-off-by: AlexDBlack <blacka101@gmail.com> * DL4J test spam reduction Signed-off-by: AlexDBlack <blacka101@gmail.com> * More Dl4J and datavec test spam cleanup Signed-off-by: AlexDBlack <blacka101@gmail.com> * Fix for bad conv3d test Signed-off-by: AlexDBlack <blacka101@gmail.com> * Additional test Signed-off-by: AlexDBlack <blacka101@gmail.com> * Embedding layers: don't inherit global default activation function Signed-off-by: AlexDBlack <blacka101@gmail.com> * Trigger CI Signed-off-by: AlexDBlack <blacka101@gmail.com> * Consolidate all BaseDL4JTest classes to single class used everywhere; make timeout configurable per class Signed-off-by: AlexDBlack <blacka101@gmail.com> * Test fixes and timeout increases Signed-off-by: AlexDBlack <blacka101@gmail.com> * Timeouts and PReLU fixes Signed-off-by: AlexDBlack <blacka101@gmail.com> * Restore libnd4j build threads arg for CUDA build Signed-off-by: AlexDBlack <blacka101@gmail.com> * Increase timeouts on a few tests to avoid spurious failures on some CI machines Signed-off-by: AlexDBlack <blacka101@gmail.com> * More timeout fixes Signed-off-by: AlexDBlack <blacka101@gmail.com> * More test timeout fixes Signed-off-by: AlexDBlack <blacka101@gmail.com> * Tweak timeout for one more test Signed-off-by: AlexDBlack <blacka101@gmail.com> * Final tweaks Signed-off-by: AlexDBlack <blacka101@gmail.com> * One more ignore Signed-off-by: AlexDBlack <blacka101@gmail.com>	2020-01-04 13:45:07 +11:00
shugeo	fbf7c9d38b	Fixed lu for cuda platform and tests. (#158 ) Signed-off-by: shugeo <sgazeos@gmail.com>	2020-01-02 23:25:41 +03:00
raver119	9b329d2601	[WIP] bias_add NHWC loop (#149 ) * one more test Signed-off-by: raver119 <raver119@gmail.com> * one more test Signed-off-by: raver119 <raver119@gmail.com> * one more test Signed-off-by: raver119 <raver119@gmail.com> * bias_add nhwc 4D Signed-off-by: raver119 <raver119@gmail.com> * bias_add nhwc 4D Signed-off-by: raver119 <raver119@gmail.com> * bias_add nhwc 4D Signed-off-by: raver119 <raver119@gmail.com> * bias_add nhwc 4D Signed-off-by: raver119 <raver119@gmail.com> * disable test Signed-off-by: raver119 <raver119@gmail.com>	2019-12-24 20:56:49 +03:00
Oleh	75123b0a4c	[WIP] Oleh rgb yuv (#147 ) * libnd4j: RgbToYuv and YuvToRgb, both implementations for both cpu and cuda. Need adding tests and review Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: RgbToYuv and YuvToRgb, replace coords method on Tad in both cpu and cuda, add tests, fixed bugs Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: RgbToYuv and YuvToRgb minor corrections Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: RgbToYuv and YuvToRgb corrections to use operations in-place	2019-12-24 18:30:54 +03:00
raver119	d1e5e79c10	[WIP] CUDA concat tweak (#148 ) * one special test Signed-off-by: raver119 <raver119@gmail.com> * one special test Signed-off-by: raver119 <raver119@gmail.com> * local memory for concat Signed-off-by: raver119 <raver119@gmail.com> * fixed grid size for concat Signed-off-by: raver119 <raver119@gmail.com> * fixed grid size for concat Signed-off-by: raver119 <raver119@gmail.com> * test commented out Signed-off-by: raver119 <raver119@gmail.com>	2019-12-24 17:01:03 +03:00
Abdelrauf	39d43ca170	RgbToYiq and YiqToRgb operations (#142 ) * RgbToYiq and YiqToRgb Signed-off-by: Abdelrauf <rauf@konduit.ai> * CUDA impl for RgbToYiq and YiqToRgb Signed-off-by: raver119 <raver119@gmail.com> * remove print Signed-off-by: raver119 <raver119@gmail.com> * allow inplace for hsv,rgb,yiq ops Signed-off-by: Abdelrauf <rauf@konduit.ai> Co-authored-by: raver119 <raver119@gmail.com>	2019-12-24 15:20:35 +03:00
Yurii Shyrma	5d9b2a16e5	Shyrma temp (#131 ) * - specifying template instantiation for certain types in float16 and bloat16 Signed-off-by: Yurii <iuriish@yahoo.com> * - polishing bfloat16 and float16 member functions template specialization Signed-off-by: Yurii <iuriish@yahoo.com> * - rewrite and overload array +-/ scalar and scalar +-/ arr in NDAray class Signed-off-by: Yurii <iuriish@yahoo.com> * - make corrections which have to do with and rvalue lvalue conversions Signed-off-by: Yurii <iuriish@yahoo.com> * - provide move semantic in NDArray operators array +-/* array Signed-off-by: Yurii <iuriish@yahoo.com> * float16/bfloat16 tweaks Signed-off-by: raver119 <raver119@gmail.com> * one more tweak Signed-off-by: raver119 <raver119@gmail.com> * - make float16 and bfloat16 to compile successfully on cuda Signed-off-by: Yurii <iuriish@yahoo.com> * - do not use resources of view-like arrays when move semantics is applied Signed-off-by: Yurii <iuriish@yahoo.com> * - get rid of pointers in signatures NDArray methods 1 Signed-off-by: Yurii <iuriish@yahoo.com> * - correction of signature of NDArray::dup method Signed-off-by: Yurii <iuriish@yahoo.com> * - correction of signature of NDArray::reduceAlongDimension method Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::applyIndexReduce and applyTrueBroadcast methods Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::applyReduce3 and varianceAlongDimension methods Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::tensorsAlongDimension and diagonal methods Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::allTensorsAlongDimension Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::reduceAlongDimension 2 Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::applyTransform 2 Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::applyPairwiseTransform 2 Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::applyBroadcast 2 Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::applyTrueBroadcast 2 Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::applyScalar and applyScalarArr Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::lambda methods Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::reduce3 methods 2 Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of following NDArray methods: add/sub/mul/div row/column and fillAsTriangular Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::tileToShape methods Signed-off-by: Yurii <iuriish@yahoo.com> * - signature correction of NDArray::isShapeSameStrict method Signed-off-by: Yurii <iuriish@yahoo.com> * minor corrections in tests Signed-off-by: Yurii <iuriish@yahoo.com> * - replace reduce op in batchnorm mkldnn Signed-off-by: Yurii <iuriish@yahoo.com> * - add explicit templates instantiations for operator+(NDArray&&. const scalar) Signed-off-by: Yurii <iuriish@yahoo.com> * - corrections of casts in float16/bfloat16 Signed-off-by: Yurii <iuriish@yahoo.com> * - provide move semantics in following NDArray methods: transform, applyTrueBroadcast, transpose, reshape, permute Signed-off-by: Yurii <iuriish@yahoo.com> * - get rid of input array A duplicate in svd cuda op Signed-off-by: Yurii <iuriish@yahoo.com> * - avoid available bug in svd cuda API Signed-off-by: Yurii <iuriish@yahoo.com> * - add temporary global memory buffer in svd cuda when calcUV = false and m != n Signed-off-by: Yurii <iuriish@yahoo.com> * - remove test with blfoat16 type for betainC Signed-off-by: Yurii <iuriish@yahoo.com> * - resolve conflicts after master has been merged in Signed-off-by: Yurii <iuriish@yahoo.com> * - changed type of affected input array in fused_batch_norm Signed-off-by: Yurii <iuriish@yahoo.com> * - add several explicit type castings Signed-off-by: Yurii <iuriish@yahoo.com> * - add ND4J_EXPORT to operators Signed-off-by: Yurii <iuriish@yahoo.com> * - add explicit template types in instantiations of template arithm operators of NDArray class Signed-off-by: Yurii <iuriish@yahoo.com> * - one more test fix Signed-off-by: Yurii <iuriish@yahoo.com> Co-authored-by: raver119 <raver119@gmail.com>	2019-12-20 22:35:39 +03:00
Oleh	211c0df76f	Oleh rgb to gray scale (#138 ) * libnd4j: RgbToGrayscale op #8536 - raw implementation in user branch, need checks for integration and adding other orders Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: RgbToGrayscale op #8536 next step of merging images Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: RgbToGrayscale op #8536, Revert merge of hsv_to_rgb and rgb_to_hsv as cause conflicts in naming need refactoring before merge, implementation of rbg_to_grs added * libnd4j: RgbToGrayscale op #8536 imlementation and conflict resolve * libnd4j: RgbToGrayscale op #8536 merged operations with images into image, renamed methods and files * libnd4j: RgbToGrayscale op #8536 added test for rgbToGrayScale, need clarification and fixed tests case run Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j: RgbToGrayscale op #8536 bug fixing and need review * libnd4j: RgbToGrayscale op #8536 some additional corrections after review Signed-off-by: Oleg <oleg.semeniv@gmail.com> * - minor corrections in rgbToGrs test1 Signed-off-by: Yurii <iuriish@yahoo.com> * libnd4j: RgbToGrayscale op #8536, corrected tests and rbf_to_grs, fixed problems, refactoring, need review * libnd4j: RgbToGrayscale op #8536 fix for 'f' order in rgbToGrs * libnd4j: RgbToGrayscale op #8536 fixed several bugs with dimC, test case refactoring and improve Signed-off-by: Oleg <oleg.semeniv@gmail.com> * - add cuda kernel for rgbToGrs op Signed-off-by: Yurii <iuriish@yahoo.com> * - fix linkage errors Signed-off-by: Yurii <iuriish@yahoo.com> Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>	2019-12-20 20:59:29 +03:00
shugeo	67d8199165	[WIP] Shugeo lup (#126 ) * Added infrastructure for implementation op lu for both cuda and cpu platforms. * Added implementation of helpers with lu op. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored LU decomposition to use vector of permutations instead. * Refactored helpers for lu op. * Fixed crash with determinant op. * Refactored cpu LU op heleper. * Added implementation for lu op. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed issue with argmax on column. * Added multithreaded behaviour for lu op helper. * Fixed multithreaded cpu implementation helpers for lu op. * Added cuda implementation for lu op helper. * Finished lu helper implementation for cuda platform. * Eliminated waste prints and comments. * Fixed race condition and multithreading issues. * Fixed memory leak with shape construction. * Corrected test for lu op to avoid near zero elements on the main diagonal." Signed-off-by: shugeo <sgazeos@gmail.com> * Improved test for adjust_constast op. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed issues with cuda implementation of resize_bicubic helpers. Signed-off-by: shugeo <sgazeos@gmail.com>	2019-12-20 17:56:28 +03:00
Abdelrauf	3c9a2a5cd9	Fix for hsv and rgb ranges (#136 ) Signed-off-by: Abdelrauf <rauf@konduit.ai>	2019-12-20 08:48:30 +03:00
Alexander Stoyakin	f5068f3980	Added missing Java ops wrappers (#122 ) * Timeouts added * Added some ops * Ops added * Fixed tests * Minor fix * Some fixes * Digamma added * Small fixes * Timeouts added * Added some ops * Ops added * Fixed tests * Minor fix * Some fixes * Digamma added * Small fixes * Fused batch norm fixes- Signed-off-by: AlexDBlack <blacka101@gmail.com> * Tests switched off. * Added test for resize_bicubic. * Eliminated wasted in test of bicubic resize. * Switched off multithreading explicit. * HsvToRgb and RgbToHsv added * Eliminated waste comments and conform proper float constants. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed multithreading with resize_bicubic helper for cpu platform. Signed-off-by: shugeo <sgazeos@gmail.com> * ResizeBicubic was fixed. * Some fixes * Fix op name * Validation fixed. * Clarifications for tests * Wrappers and small fixes for new ops.	2019-12-19 20:15:48 +11:00
Abdelrauf	e0a9cb6c08	[WIP] HSV,RGB color model conversions (#125 ) * CUDA implementation for hsv_to_rgb and rgb_to_hsv Signed-off-by: raver119 <raver119@gmail.com> * hsv_to_rgb and rgb_to_hsv operations Test coverage: c order 1d, 2d, 3d array Signed-off-by: Abdelrauf <rauf@konduit.ai> * Index check Signed-off-by: Abdelrauf <rauf@konduit.ai> * Suppress Msvc floating point errors Signed-off-by: Abdelrauf <rauf@konduit.ai> * Added Index Check for adjust_saturation and adjust_hue Signed-off-by: Abdelrauf <rauf@konduit.ai> * minor fix Signed-off-by: raver119 <raver119@gmail.com> * Fixes missed Msvc floating narrowing errors Signed-off-by: Abdelrauf <rauf@konduit.ai>	2019-12-17 09:42:09 +03:00
AlexDBlack	0df1b46c8c	Merge	2019-12-10 15:08:50 +11:00
raver119	b32dd1bf92	[WIP] resize_bicubic types (#116 ) * resize_bicubic: allow more dtypes Signed-off-by: raver119 <raver119@gmail.com> * resize_bicubic: allow less dtypes Signed-off-by: raver119 <raver119@gmail.com> * Refactored resize_bicubic op to full conform with TF1.5 and tests. * Corrected test to proper data type output. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected double input test to float constant outputs. Signed-off-by: shugeo <sgazeos@gmail.com> * Finished with correction of tests for bicubic interpolated resizes expected. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed adjust_contrast ops to allow non-RGB inputs. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored adjust_contrast_v2 to conform with TF one. Signed-off-by: shugeo <sgazeos@gmail.com> * AdjustContrast tests activated * two typos fixed Signed-off-by: raver119 <raver119@gmail.com>	2019-12-06 18:58:37 +03:00
raver119	972fae60dc	Update master (#8511 ) * cleaned up bert iterator tests (#110) Signed-off-by: eraly <susan.eraly@gmail.com> * Various pre-release fixes (#111) * Various fixes Signed-off-by: AlexDBlack <blacka101@gmail.com> * Fix default dtypes for MaxPoolWithArgmax Signed-off-by: AlexDBlack <blacka101@gmail.com> * Small pre-release tweak (#112) * Log UI address on launch as in previous Play-based UI Signed-off-by: AlexDBlack <blacka101@gmail.com> * Logging level tweak for UI Signed-off-by: AlexDBlack <blacka101@gmail.com> * http not https Signed-off-by: AlexDBlack <blacka101@gmail.com> * datavec python ensure host (#113) * ensure host * one more host ensure * info->debug * [WIP] reverse improvements (#115) * initial commit Signed-off-by: raver119 <raver119@gmail.com> * reverse draft Signed-off-by: raver119 <raver119@gmail.com> * reverse kernel Signed-off-by: raver119 <raver119@gmail.com> * reverse kernel Signed-off-by: raver119 <raver119@gmail.com> * 2 micro fixes Signed-off-by: raver119 <raver119@gmail.com> * Shugeo resize fix5 (#102) * Refactored resize images ops to use TF-like bool args as input. * Refactored helpers for cpu implementation of resize_bilinear and resize_nearest_neighbor ops. * Refactored cuda implementation for image.resize_bilinear and image.resize_nearest_neighbor ops helpers. * Refactored nearest_neighbor resize op. * Added a pair of tests for special case of resize_bilinear algorithm. * Fixed issue with resize_bilinear op. * Refactored cpu implementation for helpers with resize_nearest_neighbor op. * Final fixed for resize ops to conform TF v.1.5 * Refactored cuda helpers for resize_neares_neighbor op. * Fixed resize_bilinear to accept proper data. * Fixed issue with non-float input for resize_bilinear op. * Refactored cuda helper for resize_bilinear to proper process non-float inputs. * Added tests for resize_bilinear to int inputs. * Fixed ResizeBilinear wrapper * Tests fixed * Fixed float and bool constant to avoid overflow for some kind of compilers. * Corrected float constants with float data type. * Added f suffix for float constants. * Corrected float constant to avoid overflow with initializing lists. * Corrected float initializing list with float input. * Corrected bool constant with initalizing list. * Corrected float and bool values with initializing lists. * Fixed wrong constant. * Fixed issue with 1x1 input picture for resize. * ResizeBilinear default values on import fix Signed-off-by: raver119 <raver119@gmail.com>	2019-12-06 11:10:44 +03:00
shugeo	e09a785232	Shugeo resize fix5 (#102 ) * Refactored resize images ops to use TF-like bool args as input. * Refactored helpers for cpu implementation of resize_bilinear and resize_nearest_neighbor ops. * Refactored cuda implementation for image.resize_bilinear and image.resize_nearest_neighbor ops helpers. * Refactored nearest_neighbor resize op. * Added a pair of tests for special case of resize_bilinear algorithm. * Fixed issue with resize_bilinear op. * Refactored cpu implementation for helpers with resize_nearest_neighbor op. * Final fixed for resize ops to conform TF v.1.5 * Refactored cuda helpers for resize_neares_neighbor op. * Fixed resize_bilinear to accept proper data. * Fixed issue with non-float input for resize_bilinear op. * Refactored cuda helper for resize_bilinear to proper process non-float inputs. * Added tests for resize_bilinear to int inputs. * Fixed ResizeBilinear wrapper * Tests fixed * Fixed float and bool constant to avoid overflow for some kind of compilers. * Corrected float constants with float data type. * Added f suffix for float constants. * Corrected float constant to avoid overflow with initializing lists. * Corrected float initializing list with float input. * Corrected bool constant with initalizing list. * Corrected float and bool values with initializing lists. * Fixed wrong constant. * Fixed issue with 1x1 input picture for resize. * ResizeBilinear default values on import fix Signed-off-by: raver119 <raver119@gmail.com>	2019-12-05 22:05:33 +03:00
raver119	355c6b6096	[WIP] reverse improvements (#115 ) * initial commit Signed-off-by: raver119 <raver119@gmail.com> * reverse draft Signed-off-by: raver119 <raver119@gmail.com> * reverse kernel Signed-off-by: raver119 <raver119@gmail.com> * reverse kernel Signed-off-by: raver119 <raver119@gmail.com>	2019-12-05 20:03:10 +03:00
Alex Black	578a5abb68	DNNL/MKLDNN dilated causal conv1d + betainc (#103 ) * - add padding calculation in same mode in causal conv1d op for right mkl paddings Signed-off-by: Yurii <iuriish@yahoo.com> * - correct causal condition in mkldnnUtils.cpp Signed-off-by: Yurii <iuriish@yahoo.com> * - correct some code which caused additional round errors is betainc op Signed-off-by: Yurii <iuriish@yahoo.com> * - put float in place of template parameter in nan assign in betainc op Signed-off-by: Yurii <iuriish@yahoo.com>	2019-12-04 14:50:17 +03:00
Yurii Shyrma	1f5e15b541	Shyrma adjust (#98 ) * - add possibility of passing scalar-array as input parameter for scale factor in adjust hue/contrast/saturation ops - correct typo in function which calculates regularized incomplete beta integral Signed-off-by: Yurii <iuriish@yahoo.com> * - fix bug in betainc cuda kernel Signed-off-by: Yurii <iuriish@yahoo.com> * - start working on implementation of digamma function Signed-off-by: Yurii <iuriish@yahoo.com> * - further work on digamma function (cpu) Signed-off-by: Yurii <iuriish@yahoo.com> * - testing and fixing bugs in digamma op Signed-off-by: Yurii <iuriish@yahoo.com> * - make correction n cuda kernel for polyGamma Signed-off-by: Yurii <iuriish@yahoo.com> * - remove unnecessary stuff from betaInc cuda kernel Signed-off-by: Yurii <iuriish@yahoo.com> * - resolve conflicts in DeclarableOpsTests3.cpp after master branch has been merged Signed-off-by: Yurii <iuriish@yahoo.com> * - restore id number of Not opertion in legacy_ops.h Signed-off-by: Yurii <iuriish@yahoo.com> * - correct padding calculation in mkl dnn conv1d causal Signed-off-by: Yurii <iuriish@yahoo.com> * restore empty check in adjust_contrast_v2 Signed-off-by: raver119 <raver119@gmail.com>	2019-12-03 09:40:45 +03:00
shugeo	1e9ff114aa	Shugeo atomic tests (#97 ) * Added atomic tests for atomicAdd, atomicSub and atomicDiv. * Fixed atomicAdd for 16bit ints. * Fixed atomicMul for 16 floats. * Eliminated waste prints. * Fixed problems with double type on matrix inverse helepers. * Eliminated commented wrong code. * Refactored atomicMul for 16bit types. * few more minor tweaks Signed-off-by: raver119 <raver119@gmail.com> * Fixed fake_quant_with_min_max_vars_per_channel args processing.	2019-12-02 21:40:54 +03:00
raver119	25b3cd9b80	[WIP] CUDA tests (#95 ) * one more CI test Signed-off-by: raver119 <raver119@gmail.com> * export additional symbols Signed-off-by: raver119 <raver119@gmail.com> * few more tweaks Signed-off-by: raver119 <raver119@gmail.com> * one more tweak for linux Signed-off-by: raver119 <raver119@gmail.com> * fix dtype in few tests Signed-off-by: raver119 <raver119@gmail.com> * missing sync and memset in couple of tests Signed-off-by: raver119 <raver119@gmail.com> * copy step for libnd4j cuda Signed-off-by: raver119 <raver119@gmail.com> * no-op on empty for adjust hue/contrast/saturation Signed-off-by: raver119 <raver119@gmail.com> * CUDA_VERBOSE Off Signed-off-by: raver119 <raver119@gmail.com> * BroadcastBool fix + few tests Signed-off-by: raver119 <raver119@gmail.com> * trigger jenkins Signed-off-by: raver119 <raver119@gmail.com> * trigger jenkins Signed-off-by: raver119 <raver119@gmail.com> * - ignore couple of warnings - remove redundant compiler options Signed-off-by: raver119 <raver119@gmail.com>	2019-12-02 21:37:21 +03:00
Yurii Shyrma	d19eeaec52	Shyrma casual conv1d (#90 ) * - add causal mode of padding to convolutions Signed-off-by: Yurii <iuriish@yahoo.com> * - add additional tests for causal conv1d Signed-off-by: Yurii <iuriish@yahoo.com> * - add causal mode for cuda conv kernels Signed-off-by: Yurii <iuriish@yahoo.com> * Java side of Conv1D changes Signed-off-by: raver119 <raver119@gmail.com> * Add Conv1DDerivative op Signed-off-by: Alex Black <blacka101@gmail.com> * Causal Conv1D gradient checks Signed-off-by: Alex Black <blacka101@gmail.com> * Tweaks Signed-off-by: Alex Black <blacka101@gmail.com> * - add causal padding mode to conv2d_bp Signed-off-by: Yurii <iuriish@yahoo.com> * More thorough causal conv1d tests Signed-off-by: Alex Black <blacka101@gmail.com>	2019-11-29 14:14:30 +03:00
shugeo	009007120b	Shugeo_release_fixes3 (#81 ) * Implementation for non_max_suppression_v3 was added. Initial version * Added check for overcome threshold. * Added definition for V3 method. * java remapping for NonMaxSuppressionV3 Signed-off-by: raver119 <raver119@gmail.com> * Fixed proporly processing of an empty output and test. * Refactored op to less threshold data to float. * Implemented cuda-based helper for non_max_suppression_v3 op. * Fixed fake_quant_with_min_max_vars op. * Fixed tests with float numbers. * - assert now stops execution - sortByKey/sortByValue now have input validation Signed-off-by: raver119 <raver119@gmail.com> * missing var Signed-off-by: raver119 <raver119@gmail.com> * Fixed proper processing for zero max_size inputs. * Refactored kernel callers. * Fixed return statement for logdet op helper. * Refactored unsorted segment SqrtN op. * get back 8 tail bytes on CUDA Signed-off-by: raver119 <raver119@gmail.com> * Refactored segment prod ops and helpers for cuda and tests. * Additional test. * CudaWorkspace tests updated for 8 tail bytes Signed-off-by: raver119 <raver119@gmail.com> * special atomic test Signed-off-by: raver119 <raver119@gmail.com> * atomicMul/atomicDiv fix for 16bit values Signed-off-by: raver119 <raver119@gmail.com> * Eliminated waste prints.	2019-11-28 21:08:51 +03:00
Yurii Shyrma	a8dd6713aa	Shyrma scatter (#84 ) * - improve performance of scatter (no lock) ops for 1D case Signed-off-by: Yurii <iuriish@yahoo.com> * - improve scatter lock op performance for 1D case Signed-off-by: Yurii <iuriish@yahoo.com> * - add kernel for verification of input indices-array elements in scatter and scatter_nd ops Signed-off-by: Yurii <iuriish@yahoo.com> * - provide fast indices checking on cpu side for scatter and gather osp Signed-off-by: Yurii <iuriish@yahoo.com> * - apply corrections requested by pr reviewer Signed-off-by: Yurii <iuriish@yahoo.com>	2019-11-26 20:29:09 +03:00
shugeo	4187190609	Shugeo release fix2 (#70 ) * Corrected input checking and tests for bitcast op. * Fixed an issue with non_max_suppression form generation and processing with score threshold given. * Fixed bilinear resize kernel and tests. * push for Serhii Signed-off-by: raver119 <raver119@gmail.com> * Added test for nearest_neighbor resize with int input. * Added data type check for input/output match. * Eliminate error in macros. * Improved output message for type checking. * Fixed input/output types for op. * Eliminated waste logging. * Refactored resize_bilinear helper for multithreading for cpu platform. * Cosmetic changes only. * Fixed error for string substitution. * Skip test for cbow_batch with cuda. * fix for resizeNearestNeighbor output dtype Signed-off-by: raver119 <raver119@gmail.com> * Refactored non_max_suppression helper. * Refactored shape generation and input handling. * Added additional test.	2019-11-22 22:42:44 +03:00
Yurii Shyrma	7a90a31cfb	Shyrma deconv3 (#69 ) * - profiling cuda kernels for vol2col and im2col Signed-off-by: Yurii <iuriish@yahoo.com> * - correct addBias helper Signed-off-by: Yurii <iuriish@yahoo.com> * - correct mkl dilation formula and switch off mkl api for dilation deconvolutions Signed-off-by: Yurii <iuriish@yahoo.com>	2019-11-21 21:17:30 +02:00
shugeo	dc0036f2c6	Shugeo image resize bicubic (#56 ) * Added implementation files for image_resize and resize_bicubic ops. * Image resize and image.resize_bicubic ops implementation. Initial revision. * Finished with infrastructure development for image.resize_bilinear op and image_resizo op implementation. * Refactored resize methods. * Added processing for Mitchelcubic algorithm. * Added check for input/output sizes. * Added int and float types for crop_and_resize op. * Refactored crop_and_resize output type check. * Added helper for bicubic interpolation as TF v.1 does. * Added TF v.1 bicubic helper for cuda platform. * Added cached class for bicubic algorithm. * Refactored cuda implementation for crop_and_resize helper to use proper output type. * Added facilities for bicubic interpolation. * Portion bicubic interpolation from TF. * Added tests for resize_bilinear testing. * Working implementation of bicubic interpolation and tests. * Refactored routines with image_resize bicubic op helper. * Refactored code with coding standards. * Refactored cpu helpers for resize_bicubic op. * Refactored bicubic helpers. * Added bicubic resize facilities. * Implementing cuda kernels for bicubic interpolation. Implementation step. * Cuda implementation of resize_bicubic op helper. * Refactor image.resize_bicubic op helpers. * Refactored helpers for resize_bicubic. Added error checking with cuda implementation. * Refactored cuda implementation of resize_bicubic op helper. The first working revision. * Cuda arch implementation for resize_bicubic op helper. Full working single-threaded revision. * Intermediate bicubic interpolation helper for cuda. * Refactored cpu helper for resize_bicubic. * Multithreaded cuda implementation for resize_bicubic. * Fixed merge issues. * Refactored nlp helpers. * Replicated resize_bicubic for 3D also. * Eliminated waste comments of unused code. * Eliminated waste comments with unused code. * Eliminated waste template definitions. * Eliminated waste debug code. * Eliminated waste comments. * Fixed multithreading with helpers. * Fixed test suites for float and double in float point input lists. * Fixed usage of reshape with 3D/4D on resizes. * Final fixes. * Fixed resize_neighbor op problem.	2019-11-20 21:11:04 +02:00
shugeo	13e5c0a280	Shugeo release fix1 (#61 ) * Added a pair of tests for failed ops. * Fixed cpu helper for draw_bounding_boxes op. * Refactored implementation of draw_bounding_boxes op to full conform with TF. * Improved multithreading with draw_bounding_boxes op cuda helper. * Eliminated log messages. * Changed logging with draw_bounding_boxes op helper and tests. * Resize_biliear with 3D input allowed. * Refactored 3D input acception with resize_bilinear op. * And another improvement. * Refactored reshape of input/output for resize_bilinear. * Improvements final. * Finished with 3D replication for image.resize_bilinear/_nearest_neighbor. * Added copyrights for TF code. * Using new form of multithreading for cpu implementation. * Fixed shape error. * Added multithreaded with batches on crop_and_resize functor. * Refactored multithreading with crop_and_resize and draw_bounding_boxes.	2019-11-20 13:37:48 +02:00
Yurii Shyrma	66b84b38cf	Shyrma mmul (#58 ) * - get rid of some copy procedures in mmulHelper ops Signed-off-by: Yurii <iuriish@yahoo.com> * - further work on embedding cuda api for batched gemm (cublasGemmBatchedEx) in our mmulHelper class Signed-off-by: Yurii <iuriish@yahoo.com> * - further work on cuda batched gamm api Signed-off-by: Yurii <iuriish@yahoo.com> * - write own cuda kernel performing batched gemm Signed-off-by: Yurii <iuriish@yahoo.com> * missing include in MmulHelper Signed-off-by: raver119 <raver119@gmail.com> * - forgot to keep in code previous correct kernels for mmulNxN, since it may happen that new onw will fail for some reason in future Signed-off-by: Yurii <iuriish@yahoo.com> * disable old tensordot Signed-off-by: raver119 <raver119@gmail.com> * - rewrite cuda kernels for usualGemm and usualGemv Signed-off-by: Yurii <iuriish@yahoo.com> * - profiling mmul helpers Signed-off-by: Yurii <iuriish@yahoo.com> * - prints to check shapes were added Signed-off-by: Yurii <iuriish@yahoo.com> * - correct type of output array Cin mmulNxN Signed-off-by: Yurii <iuriish@yahoo.com> * - take into account possible nans in C array Signed-off-by: Yurii <iuriish@yahoo.com> * slightly change numThreads message Signed-off-by: raver119 <raver119@gmail.com> * - make corrections in accordance to given notes in pr review Signed-off-by: Yurii <iuriish@yahoo.com>	2019-11-19 15:39:36 +02:00
Alex Black	da1944e8e1	SameDiff TF import (#49 ) * Added implementation files for image_resize and resize_bicubic ops. * Image resize and image.resize_bicubic ops implementation. Initial revision. * Minor fix * Some TF imports disabled. * Finished with infrastructure development for image.resize_bilinear op and image_resizo op implementation. * Refactored resize methods. * Added processing for Mitchelcubic algorithm. * adjust_contrast * Small fix for TF import expected value loading when variable name starts with the test name Signed-off-by: AlexDBlack <blacka101@gmail.com> * Tests * Tests added. * Removed tf names absent in mapping. * Some fixes. * Small fixes * Minor change * Some failing tests. * Disable failed test * Ignore some tests * Fix import class mapping Signed-off-by: AlexDBlack <blacka101@gmail.com> * Fix float property mapping (flatbuffers) Signed-off-by: AlexDBlack <blacka101@gmail.com> * Override equality function for model 'dropout' Signed-off-by: AlexDBlack <blacka101@gmail.com> * Fail tests * Failed tests ignored temporarily. * Minor fixes * Small fix * Conflict resolved * Default implementations of tensorflowName and onnxName	2019-11-19 22:44:29 +11:00
Yurii Shyrma	62d8e0d409	- make agreement between our and mkl api dilation/padding formulas (#47 ) Signed-off-by: Yurii <iuriish@yahoo.com>	2019-11-14 20:21:22 +03:00
Alex Black	47d19908f4	Various fixes (#43 ) * #8172 Enable DL4J MKLDNN batch norm backward pass Signed-off-by: AlexDBlack <blacka101@gmail.com> * #8382 INDArray.toString() rank 1 brackets / ambiguity fix Signed-off-by: AlexDBlack <blacka101@gmail.com> * #8308 Fix handful of broken links (inc. some in errors) Signed-off-by: AlexDBlack <blacka101@gmail.com> * Unused dependencies, round 1 Signed-off-by: AlexDBlack <blacka101@gmail.com> * Unused dependencies, round 2 Signed-off-by: AlexDBlack <blacka101@gmail.com> * Unused dependencies, round 3 Signed-off-by: AlexDBlack <blacka101@gmail.com> * Small fix Signed-off-by: AlexDBlack <blacka101@gmail.com> * Uniform distribution TF import fix Signed-off-by: AlexDBlack <blacka101@gmail.com>	2019-11-14 19:38:20 +11:00
raver119	48df1acdfb	[WIP] ThreadPool (#8 ) This PR removes OpenMP use in 95% of cases	2019-11-13 17:04:59 +03:00
raver119	929c1dc5c7	- new NDArrayFactory scalar constructor - minor tweak in randomuniform - one more test Signed-off-by: raver119 <raver119@gmail.com>	2019-11-08 08:49:41 +03:00


207 Commits