cavis

Author	SHA1	Message	Date
raver119	784a2d13f8	separate omp impl for softmax (#289 ) Signed-off-by: raver119 <raver119@gmail.com>	2020-03-05 11:14:22 +03:00
raver119	3bb22a6ff8	strided_slice without view (#288 ) Signed-off-by: raver119 <raver119@gmail.com>	2020-03-05 09:56:52 +03:00
raver119	ca96a13ed0	softmax as standalone compilation unit Signed-off-by: raver119 <raver119@gmail.com>	2020-03-05 08:45:10 +03:00
Oleh	4d81af9fe9	Softmax operation implementation for mkldnn (#286 ) * libnd4j first step of softmax mkldnn implementation Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j raw implementation of mkldnn softmax Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j merge master and added softmax to MklDnnTests Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j some corrections for softmax mkldnn Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j merge branch, fixed problem with negative axis, fixed dnnl::memory::format_tag selection, test cases added Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j minor corrections to avoid risk connected with negative axis usage Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j fixed windows builds, added switcher to use mkldnn sofmax version only for 3D, 4D, 5D, 6D arrays Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j fixed dataType selection per request Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j fix for mac and windows builds Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j builds fix Signed-off-by: Oleg <oleg.semeniv@gmail.com>	2020-03-04 19:36:42 +03:00
raver119	f990b2486d	simplified addBias2D for CUDA (#285 ) Signed-off-by: raver119 <raver119@gmail.com>	2020-03-04 09:50:55 +03:00
Yurii Shyrma	78934c17ad	profiling of stack and unstack ops (#261 ) * - profiling of stack and unstack ops Signed-off-by: Yurii <iuriish@yahoo.com> * - fix bug in cpu concat op Signed-off-by: Yurii <iuriish@yahoo.com> * - correction of cuda stack and unstack Signed-off-by: Yurii <iuriish@yahoo.com> * - change shape.h method which operates with unity dimensions strides Signed-off-by: Yurii <iuriish@yahoo.com> * - rearrange stack tests Signed-off-by: Yurii <iuriish@yahoo.com> * - correct evaluation of smallest stride for moving through contiguous axis Signed-off-by: Yurii <iuriish@yahoo.com> * - forgot to update signature of function strideOverContigAxis in cuda concat and split ops Signed-off-by: Yurii <iuriish@yahoo.com> * - remove ShapeUtils::shapeAsString method applied before input arrays validations Signed-off-by: Yurii <iuriish@yahoo.com> * - further removing of ShapeUtils::shapeAsString Signed-off-by: Yurii <iuriish@yahoo.com> * - take sub-array shapeIndo/offset calculation out of NDArray class - add possibility of contiguous memory copy in execTransformAny op if opNum == assign Signed-off-by: Yurii <iuriish@yahoo.com> * - correct test_empty_scatter_2 in EmptyTests.cpp Signed-off-by: Yurii <iuriish@yahoo.com> * - profiling of slice op Signed-off-by: Yurii <iuriish@yahoo.com> * - get rid of contiguous memcpy for some cases in concat and split ops Signed-off-by: Yurii <iuriish@yahoo.com> * - forgot to declare oid nd4j::SpecialMethods<T>::splitCpuGeneric Signed-off-by: Yurii <iuriish@yahoo.com> * - correct typo in calculation of threads in cuda split op Signed-off-by: Yurii <iuriish@yahoo.com> * - forgot to correct another set of threads variables in split cuda ops Signed-off-by: Yurii <iuriish@yahoo.com> * - further conflicts resolving Signed-off-by: Yurii <iuriish@yahoo.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-03-03 07:32:37 +03:00
raver119	63fa3c2ef3	libnd4j polishing (#273 ) * initial set of include changes Signed-off-by: raver119 <raver119@gmail.com> * one more tweak Signed-off-by: raver119 <raver119@gmail.com> * few more rearrangements Signed-off-by: raver119 <raver119@gmail.com> * few more rearrangements Signed-off-by: raver119 <raver119@gmail.com> * few more rearrangements Signed-off-by: raver119 <raver119@gmail.com> * cuda includes rearrangements Signed-off-by: raver119 <raver119@gmail.com> * java update Signed-off-by: raver119 <raver119@gmail.com> * = namespace changed to sd - few CMake variables renamed with SD_ prefix Signed-off-by: raver119 <raver119@gmail.com> * java update Signed-off-by: raver119 <raver119@gmail.com> * LoopKind minor fix Signed-off-by: raver119 <raver119@gmail.com> * few more changes Signed-off-by: raver119 <raver119@gmail.com> * few more changes Signed-off-by: raver119 <raver119@gmail.com> * few more changes Signed-off-by: raver119 <raver119@gmail.com> * sanitizer is optional now Signed-off-by: raver119 <raver119@gmail.com> * dev tests updated Signed-off-by: raver119 <raver119@gmail.com> * few more changes Signed-off-by: raver119 <raver119@gmail.com> * last update Signed-off-by: raver119 <raver119@gmail.com> * java update Signed-off-by: raver119 <raver119@gmail.com>	2020-03-02 12:49:41 +03:00
Oleh	f116f53d61	Loops auto-vectorization problem fix (#277 ) * libnd4j cast loop types Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j more type castination added to loops Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j sync casting types of iterated variable in loops Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j more loops reviewed for vectorization problem fix Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j fixed several typos Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j several more files reviewed to fix auto-vectorization problem in loops Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j merge master and reviewed more files to fix auto-vectorization problem in loops Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j several type casting added in broadcasting that were missed, fixed mac builds Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j double check all files and fix several more places in loops Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j fixed builds Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j revert changes for lup.cpp Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j more files reviewed for auto-vectorization problem fix Signed-off-by: Oleg <oleg.semeniv@gmail.com>	2020-02-28 17:04:45 +03:00
raver119	5332ace32b	better inplace exec with FastPath (#280 ) Signed-off-by: raver119 <raver119@gmail.com>	2020-02-28 12:06:30 +03:00
shugeo	330a69d4e2	Shugeo solve ls (#203 ) * lstsq op. Initial commit. Signed-off-by: shugeo <sgazeos@gmail.com> * Least squares linear problem solve op (lstsq). Cpu draft implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed shape routine and tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Added test for lstsq op. Signed-off-by: shugeo <sgazeos@gmail.com> * Rectification for lstsq op implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected test to avoid numerical inconsistensy. Signed-off-by: shugeo <sgazeos@gmail.com> * Added prints for check computing. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected tests to use evalueate facility instead. Signed-off-by: shugeo <sgazeos@gmail.com> * CPU implementation of MatrixSolveLs op and tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Added cuda implementation for helpers with lstsq op. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored tests for lstsq op. Signed-off-by: shugeo <sgazeos@gmail.com> * Added processing for empty inputs. Signed-off-by: shugeo <sgazeos@gmail.com> * Merged tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored lstsq op for fast case. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed test. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored lstsq op. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed some issues with solve. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed lstsq op to avoid erros. Signed-off-by: shugeo <sgazeos@gmail.com> * Added kernel for giagonal factor Signed-off-by: shugeo <sgazeos@gmail.com> * lstsq wrapper and triangular_solve fixed * Added proper processing empty inputs and test. Signed-off-by: shugeo <sgazeos@gmail.com> * SequenceMask test * Build fixed * Added proper processing of empty inputs with solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Mapping added * Added check of input shapes with solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a couple of tests for lstsq op and minor changes with cuda helper for one.' Signed-off-by: shugeo <sgazeos@gmail.com> * Tests on * Refactored test for lstsq op. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed test * Added another approach for lstsq op aka solve_ls. Signed-off-by: shugeo <sgazeos@gmail.com> * Finished cpu part for solve_ls op helpers. * Added helper for low triangular matrix inversion. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored alternate solve_ls cpu implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Removed alternate approach for solve_ls op. Added multithreading with matrix inversion. Signed-off-by: shugeo <sgazeos@gmail.com> * Assert fixed * Refactored multithreading for inverse matricies. Signed-off-by: shugeo <sgazeos@gmail.com> Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>	2020-02-28 11:37:26 +03:00
raver119	358c650b62	one micro fix Signed-off-by: raver119 <raver119@gmail.com>	2020-02-27 19:28:26 +03:00
raver119	31e3a2f7a5	transparent conversion to FastPath execution within Graph (#278 ) Signed-off-by: raver119 <raver119@gmail.com>	2020-02-27 16:10:38 +03:00
Oleh	b4575d11e9	Loops auto-vectorization problem fix (#274 ) * libnd4j cast loop types Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j more type castination added to loops Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j sync casting types of iterated variable in loops Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j more loops reviewed for vectorization problem fix Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j fixed several typos Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j several more files reviewed to fix auto-vectorization problem in loops Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j merge master and reviewed more files to fix auto-vectorization problem in loops Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j several type casting added in broadcasting that were missed, fixed mac builds Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j double check all files and fix several more places in loops Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j fixed builds Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j revert changes for lup.cpp Signed-off-by: Oleg <oleg.semeniv@gmail.com>	2020-02-26 21:12:19 +03:00
raver119	5c806d2fb5	reshape tweak (#275 ) * - expand dims tweak - reshape memcpy Signed-off-by: raver119 <raver119@gmail.com> * validation fix Signed-off-by: raver119 <raver119@gmail.com>	2020-02-26 14:05:32 +03:00
Oleh	b686368b82	Refactoring split operation (#266 ) * libnd4j moved split operation implementation to helpers before special case adding Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j minor fixes for general split operation move, merge master Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libndj4 split cpu implementation Signed-off-by: Oleg <oleg.semeniv@gmail.com> * - provide cuda helper for split op Signed-off-by: Yurii <iuriish@yahoo.com> * - minor correction Signed-off-by: Yurii <iuriish@yahoo.com> * - minor correction 2 Signed-off-by: Yurii <iuriish@yahoo.com> * libnd4j moved split implementation from specials to split.cpp Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j update loopkind selections for 3D, 4D and 5D cases Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j removed unnecessary BUILD_SINGLE_TEMPLATE Signed-off-by: Oleg <oleg.semeniv@gmail.com> Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>	2020-02-26 10:20:39 +03:00
raver119	cf67c7165a	nano fix Signed-off-by: raver119 <raver119@gmail.com>	2020-02-25 15:20:51 +03:00
raver119	f6442b6724	few minor tweaks (#272 ) Signed-off-by: raver119 <raver119@gmail.com>	2020-02-25 11:13:23 +03:00
Oleh	f0706b21aa	Split operation improvement (#262 ) * libnd4j moved split operation implementation to helpers before special case adding Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j minor fixes for general split operation move, merge master Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libndj4 split cpu implementation Signed-off-by: Oleg <oleg.semeniv@gmail.com> * - provide cuda helper for split op Signed-off-by: Yurii <iuriish@yahoo.com> * - minor correction Signed-off-by: Yurii <iuriish@yahoo.com> * - minor correction 2 Signed-off-by: Yurii <iuriish@yahoo.com> Co-authored-by: Yurii Shyrma <iuriish@yahoo.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-02-24 08:22:41 +03:00
shugeo	1bb3ae4b03	Shugeo unordered map (#256 ) * Refactored usage of std::map to std::unordered_map instead. Signed-off-by: shugeo <sgazeos@gmail.com> * Eliminated crash with wrong ShapeDescriptor hash. Signed-off-by: shugeo <sgazeos@gmail.com> * Eliminated crash with TadDescriptor hash. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored Stash hash. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored hashes. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored TadDescriptor hash and top_k mapping. * Refactored hashes for ShapeDescriptor and TadDescriptor classes. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored hash for ConstantDescriptor and ShapeDescriptor classes. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed map using with cuda platform. Signed-off-by: shugeo <sgazeos@gmail.com> * - few rearrangements for hash functions - shared openblas allowed Signed-off-by: raver119 <raver119@gmail.com> * exports Signed-off-by: raver119 <raver119@gmail.com> * exports Signed-off-by: raver119 <raver119@gmail.com> * Stash reverted to std::map Signed-off-by: raver119@gmail.com <raver119@gmail.com> * Added additional test. Signed-off-by: shugeo <sgazeos@gmail.com> * different maps for different compilers Signed-off-by: raver119 <raver119@gmail.com> * missing include Signed-off-by: raver119 <raver119@gmail.com> * fix the leak Signed-off-by: raver119 <raver119@gmail.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-02-24 07:51:01 +03:00
raver119	e78be14cc1	arm fix (#260 ) * range check for scalar_int Signed-off-by: raver119 <raver119@gmail.com> * no simd Signed-off-by: raver119 <raver119@gmail.com> * no ops Signed-off-by: raver119 <raver119@gmail.com> * cyclic shift? Signed-off-by: raver119 <raver119@gmail.com> * left split Signed-off-by: raver119 <raver119@gmail.com> * left split Signed-off-by: raver119 <raver119@gmail.com> * rot ops unrolled templates Signed-off-by: raver119 <raver119@gmail.com> * no rotl/rotr for uint64 Signed-off-by: raver119 <raver119@gmail.com> * no rotl/rotr for uint64 2 Signed-off-by: raver119 <raver119@gmail.com> * no rotl/rotr for uint64 3 Signed-off-by: raver119 <raver119@gmail.com> * ARM_BUILD declared Signed-off-by: raver119 <raver119@gmail.com>	2020-02-21 14:31:00 +03:00
Yurii Shyrma	f7a9190407	profiling of concat op (both cuda and cpu) (#151 ) * - profiling of concat op (both cuda and cpu) Signed-off-by: Yurii <iuriish@yahoo.com> * better comparison for large concat Signed-off-by: raver119 <raver119@gmail.com> * - further improving of concat op Signed-off-by: Yurii <iuriish@yahoo.com> * some loggin Signed-off-by: raver119 <raver119@gmail.com> * - add possibility to verify presence of trailing unities in shape and set strides/ews correspondingly - restrict second simple case in concat op to c order only Signed-off-by: Yurii <iuriish@yahoo.com> * - move concat op to specials_single.cpp file Signed-off-by: Yurii <iuriish@yahoo.com> * - get rid of second concat op declaration in transforms.cpp file Signed-off-by: Yurii <iuriish@yahoo.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-02-20 21:19:01 +03:00
raver119	215641ea9e	Minor improvements (#255 ) * static increments in loops Signed-off-by: raver119 <raver119@gmail.com> * specials and concat split into separate units Signed-off-by: raver119 <raver119@gmail.com>	2020-02-20 11:43:26 +03:00
Yurii Shyrma	c5193ecb81	Shyrma gather (#254 ) * - profiling gather op for aurora Signed-off-by: Yurii <iuriish@yahoo.com> * - include contiguous memcpy in gather op Signed-off-by: Yurii <iuriish@yahoo.com>	2020-02-19 09:35:52 +03:00
Yurii Shyrma	22c7aa9acf	Shyrma mkl matmul (#250 ) * - provide matmul code based on mkl api Signed-off-by: Yurii <iuriish@yahoo.com> * - correct typo in mkl matmul op Signed-off-by: Yurii <iuriish@yahoo.com> * - take into account empty arrays in mkl matmul op Signed-off-by: Yurii <iuriish@yahoo.com> * - fix bug in mkl matmul and group all matmul tests in one file Signed-off-by: Yurii <iuriish@yahoo.com>	2020-02-18 08:58:01 +03:00
raver119	f9d51b7278	More compilation units (#246 ) * weird edge case Signed-off-by: raver119 <raver119@gmail.com> * weird edge case Signed-off-by: raver119 <raver119@gmail.com> * get rid of it Signed-off-by: raver119 <raver119@gmail.com> * crop and resize reorganized Signed-off-by: raver119 <raver119@gmail.com> * restore test Signed-off-by: raver119 <raver119@gmail.com> * remove unwanted unit refs in cmale Signed-off-by: raver119 <raver119@gmail.com>	2020-02-17 10:23:05 +03:00
Yurii Shyrma	011c272fde	Shyrma transpose (#244 ) * - provide contiguous strides for ouput in transpose op Signed-off-by: Yurii <iuriish@yahoo.com> * - provide contiguous strides for output in permute op Signed-off-by: Yurii <iuriish@yahoo.com> * - take into account empty shapes properly in transpose/permute op Signed-off-by: Yurii <iuriish@yahoo.com>	2020-02-17 08:04:28 +03:00
raver119	9e3c1b02b1	Perf improvements (#242 ) * initial commit Signed-off-by: raver119 <raver119@gmail.com> * meh Signed-off-by: raver119 <raver119@gmail.com> * better ExpandDims impl Signed-off-by: raver119 <raver119@gmail.com> * better Squeeze impl Signed-off-by: raver119 <raver119@gmail.com> * better Softmax impl Signed-off-by: raver119 <raver119@gmail.com> * one test disabled Signed-off-by: raver119 <raver119@gmail.com> * more accurate impl Signed-off-by: raver119 <raver119@gmail.com> * - GraphProfiler now prints full shapeInfo instead of shape - softmax typo fix Signed-off-by: raver119 <raver119@gmail.com>	2020-02-14 16:20:31 +03:00
raver119	3de3cd8277	R119 tests (#238 ) * one small test Signed-off-by: raver119 <raver119@gmail.com> * one small test Signed-off-by: raver119 <raver119@gmail.com> * bert test Signed-off-by: raver119 <raver119@gmail.com> * Graph FlowPath fix Signed-off-by: raver119 <raver119@gmail.com> * - GraphProfiler tweaks - NodeProfile now includes shapes Signed-off-by: raver119 <raver119@gmail.com> * RELU_layer inplace tweak Signed-off-by: raver119 <raver119@gmail.com> * meh Signed-off-by: raver119 <raver119@gmail.com> * identity tweaks Signed-off-by: raver119 <raver119@gmail.com> * bert result validation Signed-off-by: raver119 <raver119@gmail.com> * - bunch of Shape ops have inplace exec forbidden now - Legacy ops have inplace exec disabled by default now Signed-off-by: raver119 <raver119@gmail.com> * ffast-math enabled Signed-off-by: raver119 <raver119@gmail.com> * ffast-math enabled Signed-off-by: raver119 <raver119@gmail.com> * allow some legacy ops to be inplace Signed-off-by: raver119 <raver119@gmail.com> * disable -fast_math Signed-off-by: raver119 <raver119@gmail.com> * disable expensive test for cuda Signed-off-by: raver119 <raver119@gmail.com>	2020-02-13 20:59:35 +03:00
Yurii Shyrma	fe47f52896	Oleh tenzor mmul (#231 ) * Libnd4j: TensorMMul backprop op #8174, raw implementation Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 merge master and some corrections Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 algorithm update, need testing, sync with master * Libnd4j: TensorMMul backprop op #8174 fixed incorrect B axes calculation Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 optimize axes identification and fix bug of indeces overlapping, added first test. need testing with different shapes Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 some fixes and improvements need more testing Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 fixed order of matrix multiply Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 fixed issue of incorrect axes definition, add tests based on TF, need additional testing for case dLdC not equal 1 Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 fixed scalar case add test Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 fixed bp algorithm, axes definition, need some mode testing with different orders combination f,c; c,f f,f and add some checks for inputs Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 some checks and corrections added tests, exists the problem with different input orders support A-f B-c and A-f B-f Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 sync master Signed-off-by: Oleg <oleg.semeniv@gmail.com> * - correct bug in MmulHelper::tensorDot(a, b, c, axes_a, axes_b,permutForC) Signed-off-by: Yurii <iuriish@yahoo.com> * Libnd4j: TensorMMul backprop op #8174 code clean up and refactoring Signed-off-by: Oleg <oleg.semeniv@gmail.com> * - add check for linspase ordered permutations in ShapeUtils::evalShapeForTensorDot Signed-off-by: Yurii <iuriish@yahoo.com> * - provide additional code in shape::reshape stuff in order to reduce amount of allocation/copy operations during reshaping procedure Signed-off-by: Yurii <iuriish@yahoo.com> * - further work on problem of wrong shape evaluation during permute/reshape procedures Signed-off-by: Yurii <iuriish@yahoo.com> * - still looking for bug reason in reshape/permute stuff Signed-off-by: Yurii <iuriish@yahoo.com> * - correct bug in transform cuda native ops Signed-off-by: Yurii <iuriish@yahoo.com> * - correct bug in NDArray::assign Signed-off-by: Yurii <iuriish@yahoo.com> * - remove old shape::reshape stuff Signed-off-by: Yurii <iuriish@yahoo.com> * - add possibility to disable copy of old buffer to new buffer during reshape operation in NDArray class Signed-off-by: Yurii <iuriish@yahoo.com> * - correct bug in tensorDot which had to do with wrong pointers assigments Signed-off-by: Yurii <iuriish@yahoo.com> Co-authored-by: Oleh <oleg.semeniv@gmail.com>	2020-02-13 20:33:54 +03:00
shugeo	f0c684020f	Shugeo resize area fix4 (#229 ) * Fixed a couple of issues with resize_area op. Signed-off-by: shugeo <sgazeos@gmail.com> * Added additional test for alternate params for resize_area testing. Signed-off-by: shugeo <sgazeos@gmail.com>	2020-02-12 19:02:42 +03:00
raver119	237c137166	few more smaller compilation units (#226 ) Signed-off-by: raver119 <raver119@gmail.com>	2020-02-10 10:57:18 +03:00
raver119	8a0d5e3b97	Compilation units (#224 ) * - TrueBroadcastHelper split into multiple compilation units - legacy gemm.cpp disabled Signed-off-by: raver119 <raver119@gmail.com> * - IndexReduce int32/int64 split into multiple compilation units Signed-off-by: raver119 <raver119@gmail.com> * - Reduce3 ops split into multiple compilation units Signed-off-by: raver119 <raver119@gmail.com>	2020-02-09 19:48:32 +03:00
Abdelrauf	bead656feb	Initial performance improvement for Bias Add and etc #8556 (#217 ) * Initial performance improvement for Bias Add, loop coords helpers and increment aligned parallel threading Signed-off-by: AbdelRauf <rauf@konduit.ai> * One more test for Rauf Signed-off-by: raver119 <raver119@gmail.com> * disable couple of perf tests Signed-off-by: raver119 <raver119@gmail.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-02-08 15:31:30 +03:00
Yurii Shyrma	948646b32d	Shyrma mkl test (#211 ) * - provide nhwc format in mkl conv ops Signed-off-by: Yurii <iuriish@yahoo.com> * - corrections in mkl conv3d Signed-off-by: Yurii <iuriish@yahoo.com> * - corrections in mkl batchnorm Signed-off-by: Yurii <iuriish@yahoo.com> * - corrections in mkl maxpooling2d Signed-off-by: Yurii <iuriish@yahoo.com> * - add format format_tag::any to outputs in mkl conv ops Signed-off-by: Yurii <iuriish@yahoo.com> * - complete corrections in mkl conv ops Signed-off-by: Yurii <iuriish@yahoo.com> * - add test for comparison of execution speeds of mkl conv2d op with different weights format Signed-off-by: Yurii <iuriish@yahoo.com> * - take into account order f in mkl conv ops Signed-off-by: Yurii <iuriish@yahoo.com>	2020-02-06 21:12:54 +03:00
shugeo	5ae40f6e38	Shugeo sequence mask fix2 (#216 ) * Fixed sequence_mask op and tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Cuda fix for sequence_mask op. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed sequence_mask op for both platforms and tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed solve and triangular_solve for more than 2D for adjoint cases. Signed-off-by: shugeo <sgazeos@gmail.com> * Added adjoint solve test again. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a set of tests for triangual_solve and generic solve ops. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a pair tests for triangular_solve Signed-off-by: shugeo <sgazeos@gmail.com> * Added tests for triangular_solve op. Signed-off-by: shugeo <sgazeos@gmail.com>	2020-02-06 21:06:50 +03:00
raver119	5d28e6143d	OpContext handling (#214 ) * nano tweaks Signed-off-by: raver119 <raver119@gmail.com> * OpContext tweaks Signed-off-by: raver119 <raver119@gmail.com> * OpContext deallocators Signed-off-by: raver119 <raver119@gmail.com> * get rid of few mkldnn safety checks Signed-off-by: raver119 <raver119@gmail.com> * databuffer setSpecial fix Signed-off-by: raver119 <raver119@gmail.com>	2020-02-05 07:27:24 +03:00
shugeo	41ff907bc6	Shugeo solve linear (#191 ) * linear equations systems solve op. Initial commit. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed compiling issues. Signed-off-by: shugeo <sgazeos@gmail.com> * Linear equations systems solve. The next stage commit. Signed-off-by: shugeo <sgazeos@gmail.com> * Added test for linear equations systems solve operation. Signed-off-by: shugeo <sgazeos@gmail.com> * Added additional test and fixed lower matrix retrievance. * Implementation for solve of the systems of linear equations." Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored permutation generation. Signed-off-by: shugeo <sgazeos@gmail.com> * Added restore for permutations batched with cuda helper for solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Finished cuda implementation for solve op helpers. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored cpu helpers for solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Fix gtest output on Windows * Fixed issue with permutation matrix for cuda implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed issue with permutation matrix for cpu implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Eliminated waste comments. Signed-off-by: shugeo <sgazeos@gmail.com> * LinearSolve added * Mapping added * Javadoc added * Refactored implementation of triangular_solve helpers and tests for solve matrix equations generally. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a test for solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Solve test added * Fix for TF import Co-authored-by: Serhii Shepel <9946053+sshepel@users.noreply.github.com> Co-authored-by: raver119 <raver119@gmail.com> Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>	2020-02-04 08:59:11 +03:00
raver119	9bb5798cac	Null arrays fix (#208 ) * don't skip null arrays Signed-off-by: raver119 <raver119@gmail.com> * one test tweak Signed-off-by: raver119 <raver119@gmail.com>	2020-02-02 23:14:00 +03:00
Oleh	d52e67209e	Oleh convert (#200 ) * StringUtils for utf convertor raw implementation of all possible combinations, need to be add counter of bytes per symbol for any type and add api to call convertors and store data Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor more corrections to support convertors Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor some corrections and bug fixes, need review to discuss how to add multi-threading Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 some corrections to move to multi-threading, add one test need discussion data inputs/outputs array presentation, need discussion the way of multi-threading * StringUtils for utf convertor #8613 tests added some corrections to optimize build Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 some corrections and code clean up Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 code clean up and optimize usage, need update ndarray factory before replace std usage Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 some staff to integrate converters into NDArrayFactory, update tests and add some functionality Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 minor corrections and bug fix before discussion * StringUtils for utf convertor #8613 some fixes and tets * StringUtils for utf convertor #8613 some more staff to support different unicode Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 fix linking bug * StringUtils for utf convertor #8613 corrected several tests as defaults for string ndarray changed * StringUtils for utf convertor #8613 replace some incorrect implementation, revert some test changes, need sync before testing * StringUtils for utf convertor #8613 fixed several thing that were badly implemented yesterday, need optimization, testing (before testing have to be add support of u32 and u16 buffer visualization) * StringUtils for utf convertor #8613 fixed to support u16 and u32, and convertor in ndarray, fix buffer print, etc Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 merge master and sync with server Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 some correction for string cast, need print check only asci support Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 merge master, remove copies and add cast, need test, refactoring according review and clean up * StringUtils for utf convertor #8613 fixed cast and copy issues Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 fixed cuda and update tests * StringUtils for utf convertor #8613 integration into NdArray, fix several tests for build pass, refactoring, etc * - avoid ambiguity of NDArray ctrs overloading in some tests Signed-off-by: Yurii <iuriish@yahoo.com> * StringUtils for utf convertor #8613 NDArray string constructors added, updated NDArrayFactory, refactoring unicode and tests, etc Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 fixed cuda build and test, refactoring and void* added to some functions Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 void* integration, removed copy operation, refactoring, added tests for NDArray string constructors, etc Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 several more fixes, improvements and updates Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 master merge, code clean up and optimization before review Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 minor fixes string element size define Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 revert last changes as mistake Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 fixed NDArray constructor build problem, remove order from string factory, fixed order use for factory via project, added catch of incorrect sync in cast of arrays to data types, fixed e method for strings, etc Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 added javacpp hack, added multi-threading, minor corrections in license agreement Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 windows builds fix, as "sting" is not treated as utf8 Signed-off-by: Oleg <oleg.semeniv@gmail.com> Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>	2020-01-31 16:30:49 +03:00
raver119	1ab86d1306	Range op data type (#204 ) * - range op now accepts dargs - dargs now can be in signature Signed-off-by: raver119 <raver119@gmail.com> * range dtype java side Signed-off-by: raver119 <raver119@gmail.com> * linspace fix Signed-off-by: raver119 <raver119@gmail.com> * lin_space fix for scalar outputs Signed-off-by: raver119 <raver119@gmail.com>	2020-01-31 10:45:40 +03:00
raver119	5d98cfcf47	Configurable DataType for ops (#201 ) * initial commit Signed-off-by: raver119 <raver119@gmail.com> * - one more test for OneHot with dtype - one more signature in Nd4j Signed-off-by: raver119 <raver119@gmail.com> * ones_as/zeros_as now accept dtype Signed-off-by: raver119 <raver119@gmail.com> * one more test Signed-off-by: raver119 <raver119@gmail.com> * - more updates for configurable data types - ones_as/zeros_as java side + tests Signed-off-by: raver119 <raver119@gmail.com> * few c++ tests fixed Signed-off-by: raver119 <raver119@gmail.com> * few more changes around DArgs Signed-off-by: raver119 <raver119@gmail.com>	2020-01-30 18:46:12 +03:00
raver119	ba961c7601	DataTypes & FlatBuffers (#197 ) * flatbuffers version upgrade Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers version upgrade java side Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers dependency version upgrade java side Signed-off-by: raver119 <raver119@gmail.com> * MKLDNN version upgrade Signed-off-by: raver119 <raver119@gmail.com> * DArgs first pass Signed-off-by: raver119 <raver119@gmail.com> * signatures first pass Signed-off-by: raver119 <raver119@gmail.com> * signatures second pass Signed-off-by: raver119 <raver119@gmail.com> * signatures third pass Signed-off-by: raver119 <raver119@gmail.com> * signatures third pass Signed-off-by: raver119 <raver119@gmail.com> * signatures fourth pass Signed-off-by: raver119 <raver119@gmail.com> * signatures fifth pass Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers UI version upgrade java side Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers ui update Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers downgrade Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers downgrade java side Signed-off-by: raver119 <raver119@gmail.com>	2020-01-30 10:07:24 +03:00
Yurii Shyrma	7a7ee4b021	Shyrma cudnn (#192 ) * - implementation of cudnn batchnorm_bp op Signed-off-by: Yurii <iuriish@yahoo.com> * - testing and fixing bugs in batchnorm_bp based on cudnn api Signed-off-by: Yurii <iuriish@yahoo.com> * - move pooling mkl code and delete some unnecessary files Signed-off-by: Yurii <iuriish@yahoo.com> * - implementation and testing cudnn pooling2d ops (avg/max, ff/bp) Signed-off-by: Yurii <iuriish@yahoo.com> * - implementation and testing cudnn pooling 3d (ff/bp) ops Signed-off-by: Yurii <iuriish@yahoo.com> * - provide ff step in case of cudnn maxpool3d_bp op Signed-off-by: Yurii <iuriish@yahoo.com> * - remove half type from set of supported types in mkl dpethwise conv op Signed-off-by: Yurii <iuriish@yahoo.com> * - bring back cudaStreamSynchronize in batchnorm and pooling cudnn ops Signed-off-by: Yurii <iuriish@yahoo.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-01-28 18:23:07 +03:00
shugeo	99a54829c2	Shugeo resize area fix2 (#181 ) * Added test for issue with resize_area op. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a pair of tests for resize_are op. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored resize_area kernel to avoid shared memory overflow. Signed-off-by: shugeo <sgazeos@gmail.com> * Eliminated prints with tests. Signed-off-by: shugeo <sgazeos@gmail.com> * ignore bad test Signed-off-by: raver119 <raver119@gmail.com> * Fixed test with resize_area. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed test for float constants. Signed-off-by: shugeo <sgazeos@gmail.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-01-24 20:55:25 +03:00
raver119	5d69069177	[WIP] Memory limits (#167 ) * initial commit Signed-off-by: raver119 <raver119@gmail.com> * one more initial commit Signed-off-by: raver119 <raver119@gmail.com> * additional initial commit Signed-off-by: raver119 <raver119@gmail.com> * subsequent initial commit Signed-off-by: raver119 <raver119@gmail.com> * initial commit testing Signed-off-by: raver119 <raver119@gmail.com> * initial commit per device Signed-off-by: raver119 <raver119@gmail.com> * initial commit per group Signed-off-by: raver119 <raver119@gmail.com> * initial commit for cuda Signed-off-by: raver119 <raver119@gmail.com> * initial commit for cuda + few missed lines Signed-off-by: raver119 <raver119@gmail.com> * initial commit for cuda + missed includes Signed-off-by: raver119 <raver119@gmail.com> * initial commit for cuda + one more missed include Signed-off-by: raver119 <raver119@gmail.com> * initial commit shouldn't count host mem as dev0 in cuda Signed-off-by: raver119 <raver119@gmail.com> * initial commit that tracks HOST group limits for CUDA Signed-off-by: raver119 <raver119@gmail.com> * initial commit with some Environment changes Signed-off-by: raver119 <raver119@gmail.com> * initial commit with more Environment changes Signed-off-by: raver119 <raver119@gmail.com> * initial commit with maxMasterThreads fix Signed-off-by: raver119 <raver119@gmail.com> * initial commit with maxMasterThreads fix Signed-off-by: raver119 <raver119@gmail.com> * initial commit without maxMasterThreads exception Signed-off-by: raver119 <raver119@gmail.com> * initial commit without Nd4jULong in Environment Signed-off-by: raver119 <raver119@gmail.com> * add sleep and more iterations for OOM cases Signed-off-by: raver119 <raver119@gmail.com> * limits propagation from java side Signed-off-by: raver119 <raver119@gmail.com> * - consume ErrorCode every time - one test for memory limits Signed-off-by: raver119 <raver119@gmail.com> * unordered_map Signed-off-by: raver119 <raver119@gmail.com> * unordered_map Signed-off-by: raver119 <raver119@gmail.com> * unordered_map Signed-off-by: raver119 <raver119@gmail.com> * RSub op mapping fixed Signed-off-by: raver119 <raver119@gmail.com> * typo fixed Signed-off-by: raver119 <raver119@gmail.com> * one bad test fixed Signed-off-by: raver119 <raver119@gmail.com>	2020-01-24 10:11:09 +03:00
shugeo	2717b25931	Shugeo qr (#153 ) * Added qr op implementation. Initial version. * Fixed doc for qr op. Signed-off-by: shugeo <sgazeos@gmail.com> * Implementation of QR decomposition. CPU platform version. * Added a pair of tests for qr op testing. Signed-off-by: shugeo <sgazeos@gmail.com> * QR implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected norm using. * Properly calculated intermediate results with QR decomposition. * Another step to implement QR algorithm by householder. * Cpu implementatio for QR decomposition. The first working edition. * Corrected test to QR decomposition. * Added tad multithreading with QR implementation. * Finished cpu implementation for QR decomposition helpers. * Refactored tests and improved multithreading. * Refactored QR cpu implementation and update cuda implementation helpers. * Cuda QR helper implementation. The first working edition. * Eliminated waste prints. * Restore multithreading with cuda implementation. * Ops names corrected * Refactored qr op helpers to optimize. Signed-off-by: shugeo <sgazeos@gmail.com> * Eliminated waste manual ticking. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored memory allocation to avoid waste memory usage. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored matrixMinor method both for cuda and cpu platforms. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored method of vmul to use raw buffers instead type conversion. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored temporary array of matricies. Signed-off-by: shugeo <sgazeos@gmail.com> Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-01-22 13:59:36 +03:00
shugeo	815a2908af	Shugeo solve triangular (#173 ) * Added implementation of the triangular_solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed compilation issues. Signed-off-by: shugeo <sgazeos@gmail.com> * Added verification of input data and helpers facilities for triangular_solve op.' Signed-off-by: shugeo <sgazeos@gmail.com> * Added cpu implementation for triangular_solve helpers. * Added tests and implementation for upper triangular equations. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a pair of cases to tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Added multithreading with cpu helpers for triangular_solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Added cuda implementation of triangular_solve op helpers. Signed-off-by: shugeo <sgazeos@gmail.com> * Finished cuda implementation of triangular_solve helpers and tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed copyright marks. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected grammar errors with doc and error messages. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored matricies processing with triangular_solve cuda helper implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Added triangular_solve wrapper * Fixed mapping * Added processing for adjoint with cpu helpers of triangular_solve op implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Added implementation for adjoint routine with cuda platform. Signed-off-by: shugeo <sgazeos@gmail.com> * Added multithreading with adjoint routine for cpu platform. Signed-off-by: shugeo <sgazeos@gmail.com> Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>	2020-01-22 10:48:03 +03:00
shugeo	e50b285c2c	Shugeo resize area (#162 ) * Added implementation for resize_area op. Initial commit. * Added implementation of resize_area op. Initial revision. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected resizeArea functor call. Signed-off-by: shugeo <sgazeos@gmail.com> * Implementation of resize_area. Cpu platform helpers. Signed-off-by: shugeo <sgazeos@gmail.com> * Implementation for resize_area helpers. The first part revision. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a set of tests for resize_area op. Signed-off-by: shugeo <sgazeos@gmail.com> * Cuda implementation for resize_area. Initial approach. Signed-off-by: shugeo <sgazeos@gmail.com> * Adding multithreading for resize_area algorithm. Signed-off-by: shugeo <sgazeos@gmail.com> * Cuda implementation of resize_area helpers. Shared memory approach. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored resizeAreaKernel with cuda implementation. * Eliminated compilation errors. * ResizeArea helpers for cuda platform. The first working revision. Signed-off-by: shugeo <sgazeos@gmail.com> * Added test for batched resize_area op testing. Signed-off-by: shugeo <sgazeos@gmail.com> * Implementation of resize_are for cuda platform and tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed multithreading with resize_area op helper. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected copyright marks with sources. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected copyright mark for resize_area op implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected copyright mark for parity ops header. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected typo in strings and so on with image resize ops. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored resize_area helpers and multithreading. Signed-off-by: shugeo <sgazeos@gmail.com> * Added ResizeArea wrapper * Added test with align_corners and fixed shape processing with only int args given for output size. Signed-off-by: shugeo <sgazeos@gmail.com> * Added test * TF mapping for ResizeArea * Fixed implementation issues with resize_area op for both platforms. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored image resizer struct to use flexible types for ints and floats. Signed-off-by: shugeo <sgazeos@gmail.com> * Improved multithreading with resizeAreaKernel launch. Signed-off-by: shugeo <sgazeos@gmail.com> * Use asynchronical memory copying with cuda platform image resize allocations. Signed-off-by: shugeo <sgazeos@gmail.com> Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>	2020-01-22 10:46:33 +03:00
raver119	7783012f39	cuDNN integration (#150 ) * initial commit Signed-off-by: raver119 <raver119@gmail.com> * one file Signed-off-by: raver119 <raver119@gmail.com> * few more includes Signed-off-by: raver119 <raver119@gmail.com> * m? Signed-off-by: raver119 <raver119@gmail.com> * const Signed-off-by: raver119 <raver119@gmail.com> * cudnn linkage in tests Signed-off-by: raver119 <raver119@gmail.com> * culibos Signed-off-by: raver119 <raver119@gmail.com> * static reminder Signed-off-by: raver119 <raver119@gmail.com> * platform engine tag Signed-off-by: raver119 <raver119@gmail.com> * HAVE_CUDNN moved to config.h.in Signed-off-by: raver119 <raver119@gmail.com> * include Signed-off-by: raver119 <raver119@gmail.com> * include Signed-off-by: raver119 <raver119@gmail.com> * skip cudnn handle creation if there's not cudnn Signed-off-by: raver119 <raver119@gmail.com> * meh Signed-off-by: raver119 <raver119@gmail.com> * target device in context Signed-off-by: raver119 <raver119@gmail.com> * platform engines Signed-off-by: raver119 <raver119@gmail.com> * platform engines Signed-off-by: raver119 <raver119@gmail.com> * allow multiple -h args Signed-off-by: raver119 <raver119@gmail.com> * allow multiple -h args Signed-off-by: raver119 <raver119@gmail.com> * move mkldnn out of CPU block Signed-off-by: raver119 <raver119@gmail.com> * link to mkldnn on cuda Signed-off-by: raver119 <raver119@gmail.com> * less prints Signed-off-by: raver119 <raver119@gmail.com> * minor tweaks Signed-off-by: raver119 <raver119@gmail.com> * next step Signed-off-by: raver119 <raver119@gmail.com> * conv2d NCHW draft Signed-off-by: raver119 <raver119@gmail.com> * conv2d biasAdd Signed-off-by: raver119 <raver119@gmail.com> * test for MKL/CUDNN combined use Signed-off-by: raver119 <raver119@gmail.com> * - provide additional code for conv2d ff based on cudnn api, not tested yet Signed-off-by: Yurii <iuriish@yahoo.com> * - further work on conv2d helper based on using cudnn api Signed-off-by: Yurii <iuriish@yahoo.com> * - fixing several cuda bugs which appeared after cudnn lib had been started to use Signed-off-by: Yurii <iuriish@yahoo.com> * - implementation of conv2d backprop op based on cudnn api Signed-off-by: Yurii <iuriish@yahoo.com> * - implementaion of conv3d and conv3d_bp ops based on cudnn api Signed-off-by: Yurii <iuriish@yahoo.com> * - bugs fixing in conv3d/conv3d_bp ops (cudnn in use) Signed-off-by: Yurii <iuriish@yahoo.com> * - implementation of depthwiseConv2d (ff/bp) op based on cudnn api Signed-off-by: Yurii <iuriish@yahoo.com> * - implementation of batchnorm ff op based on cudnn api Signed-off-by: Yurii <iuriish@yahoo.com> * - disable cudnn batchnorm temporary Signed-off-by: Yurii <iuriish@yahoo.com> * - add minor change in cmake Signed-off-by: Yurii <iuriish@yahoo.com> * engine for depthwise mkldnn Signed-off-by: raver119 <raver119@gmail.com> * couple of includes Signed-off-by: raver119 <raver119@gmail.com> * - provide permutation to cudnn batchnorm ff when format is NHWC Signed-off-by: Yurii <iuriish@yahoo.com> * lgamma fix Signed-off-by: raver119 <raver119@gmail.com> * - eliminate memory leak in two tests Signed-off-by: Yurii <iuriish@yahoo.com> Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>	2020-01-20 21:32:46 +03:00
Oleh	8fc0e63ce7	Oleh powderev (#171 ) * Libnd4j: Add broadcastable elementwise power derivative #7461 first step of Pow_bp operation implementation Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: Add broadcastable elementwise power derivative #7461 some corrections of calculation steps Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: Add broadcastable elementwise power derivative #7461 some bug fixes, the PowDerevative op made broadcastable, add the raw tests for op, need refactoring to use broadcast ops * Libnd4j: Add broadcastable elementwise power derivative #7461 fixed several bugs add broadcast support and tests, need to fix scalar+array and array+scalar Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: Add broadcastable elementwise power derivative #7461 fixed bugs for scalar inputs, fixed multinomial tests, added tests Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: Add broadcastable elementwise power derivative #7461 fised bugs for different shapes support, tests updated * Libnd4j: Add broadcastable elementwise power derivative #7461 applied all possible variants via tiled arrays, add support of broadcast for Pow and PowDerivative ops, covered by tests, before review have to be replaced tiled implementation by applyTrueBroadcast Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: Add broadcastable elementwise power derivative #7461 replaced tile by broadcast implementation, fixed issue with negative x input, corrected tests, need additional testing Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: Add broadcastable elementwise power derivative #7461 added and corrected test cases, corrected implementation need review Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: Add broadcastable elementwise power derivative #7461 code clean up * Libnd4j: Add broadcastable elementwise power derivative #7461 code clean up, removed some tests, add tests with scalar Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: Add broadcastable elementwise power derivative #7461 code improvement and clean up, split tests Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: Add broadcastable elementwise power derivative #7461 some code clean up Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: Add broadcastable elementwise power derivative replace __isnanf by internal realization Signed-off-by: Oleg <oleg.semeniv@gmail.com> * pow_bp wrapper * Fixed PowBp wrapper * Tests added * Test fixed * Fix return type * Disable powBp usage * Pow backprop changed Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>	2020-01-20 12:59:12 +03:00

264 Commits