cavis

Author	SHA1	Message	Date
raver119	7a2ac800dd	Nullify (#304 ) * initial commit Signed-off-by: raver119 <raver119@gmail.com> * bunch of tweaks Signed-off-by: raver119 <raver119@gmail.com> * hamming distance nullification Signed-off-by: raver119 <raver119@gmail.com> * Add output array value assignment for testing/debugging Signed-off-by: Alex Black <blacka101@gmail.com> * don't assign empty arrays Signed-off-by: raver119 <raver119@gmail.com> * conv2d/conv3d/depthwise2d nullified Signed-off-by: raver119 <raver119@gmail.com> * conv2d/conv3d/depthwise2d nullified Signed-off-by: raver119 <raver119@gmail.com> * conv2d/conv3d/depthwise2d nullified Signed-off-by: raver119 <raver119@gmail.com> * few more fixes Signed-off-by: raver119 <raver119@gmail.com> * im2col Signed-off-by: raver119 <raver119@gmail.com> * pooling? Signed-off-by: raver119 <raver119@gmail.com> * more nullified Signed-off-by: raver119 <raver119@gmail.com> * ismax nullified Signed-off-by: raver119 <raver119@gmail.com> * rollback ismax nullification Signed-off-by: raver119 <raver119@gmail.com> * synchronized cublas handle use on per-device basis Signed-off-by: raver119 <raver119@gmail.com> * hiding method from jcpp Signed-off-by: raver119 <raver119@gmail.com> * get rid of test assigns in DeclarableOp Signed-off-by: raver119 <raver119@gmail.com> * get rid of assigns Signed-off-by: raver119 <raver119@gmail.com> * proper deviceId is back Signed-off-by: raver119 <raver119@gmail.com> * include fixed Signed-off-by: raver119 <raver119@gmail.com> Co-authored-by: Alex Black <blacka101@gmail.com>	2020-03-20 08:49:28 +03:00
raver119	77244f5496	avg/max pooling3d bp fixed (#323 ) Signed-off-by: raver119 <raver119@gmail.com>	2020-03-16 18:17:42 +03:00
raver119	4cf2afad2b	benchmarks fixes (#321 ) * bunch of small fixes Signed-off-by: raver119 <raver119@gmail.com> * validation for legacy random op Signed-off-by: raver119 <raver119@gmail.com> * get rid of test Signed-off-by: raver119 <raver119@gmail.com>	2020-03-16 10:31:06 +03:00
Oleh	e7a995e959	Tanh backpropagation mkldnn implementation (#308 ) * libnd4j first step of tanh_bp operation implementation on mkldnn Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j optimize several places and added test case for tanh_bp Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j minor corrections and renaming, added one more test case Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j missed mkldnn data format definition Signed-off-by: Oleg <oleg.semeniv@gmail.com>	2020-03-13 19:01:00 +03:00
Yurii Shyrma	e42b4e96c3	correct output empty shapes deducing in split op (#311 ) * - correct output empty shapes deducing in split op Signed-off-by: Yurii <iuriish@yahoo.com> * java test fixed Signed-off-by: raver119 <raver119@gmail.com> * - split broadcast::exec function on individual functions corresponding to switch arg Signed-off-by: Yurii <iuriish@yahoo.com> * - split broadcast::exec _int and _bool function on individual functions corresponding to switch arg Signed-off-by: Yurii <iuriish@yahoo.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-03-12 18:25:54 +03:00
Oleh	41bde8f885	Softmax BP mkldnn implementation (#301 ) * libnd4j mkldnn softmax_bp operation implementation and integration, 2 tests added, need some refactoring and code clean up and more testing with different input shapes Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j softmax_bp update, code refactoring, etc Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j merge master, fixed typos, minor tweaks, code clean up Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j integrate mkldnnUtils helpers in other mkldnn operations Signed-off-by: Oleg <oleg.semeniv@gmail.com>	2020-03-12 18:25:29 +03:00
Yurii Shyrma	58550b7c98	[WIP] Shyrma coords (#305 ) * - provide faster index2coords function for cpu Signed-off-by: Yurii <iuriish@yahoo.com> * - new faster index2coords function is introduced into cpu code Signed-off-by: Yurii <iuriish@yahoo.com> * - replace long long coordinates with int coordinates Signed-off-by: Yurii <iuriish@yahoo.com> * - add missed reload of coords2index function Signed-off-by: Yurii <iuriish@yahoo.com> * - reststart jenkins Signed-off-by: Yurii <iuriish@yahoo.com> * - rollback changes in convolutions.cu and addBias.cu Signed-off-by: Yurii <iuriish@yahoo.com>	2020-03-11 16:21:59 +03:00
Yurii Shyrma	6aaca58506	Shyrma broadcast (#302 ) * - profiling TrueBroadcastHelper Signed-off-by: Yurii <iuriish@yahoo.com> * - further improving of TrueBroadcastHelper Signed-off-by: Yurii <iuriish@yahoo.com> * - further profiling of broadcast op Signed-off-by: Yurii <iuriish@yahoo.com> * - implementation of broadcastShapeHelper which inserts unities in shapes of arrays to be broadcasted Signed-off-by: Yurii <iuriish@yahoo.com> * - provide additional method in ConstantShapeHelper class for deducing broadcast shapes with unities Signed-off-by: Yurii <iuriish@yahoo.com> * - provide new NativeOps helpers for usual and true broadcast methods Signed-off-by: Yurii <iuriish@yahoo.com> * enable bert profiler Signed-off-by: raver119 <raver119@gmail.com> * - delete unnessesary tests Signed-off-by: Yurii <iuriish@yahoo.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-03-10 16:29:09 +03:00
Oleh	c3223dbc7a	Improve ResultSet usage in libnd4j (#281 ) * libnd4j profiling DeclarableOp and Tests by replacing return ResultSet pointer by instance Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j profiling semantic change in tests cases Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j some corrections to make new ResultSet semantic works, fixed one test Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j more tests fixes Signed-off-by: Oleg <oleg.semeniv@gmail.com> * - correct copy and move assignment operators of ResultSet class Signed-off-by: Yurii <iuriish@yahoo.com> Co-authored-by: Yurii <iuriish@yahoo.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-03-10 07:42:50 +03:00
raver119	57210b936c	Revert "OpenMP Threads execution (#297 )" (#299 ) This reverts commit dd2043ef485a96de3d64563f1eed4c50a8cd72f7.	2020-03-09 08:22:49 +03:00
raver119	dd2043ef48	OpenMP Threads execution (#297 ) * omp threads backported Signed-off-by: raver119 <raver119@gmail.com> * omp scalar reduce Signed-off-by: raver119 <raver119@gmail.com> * timing Signed-off-by: raver119 <raver119@gmail.com> * timing Signed-off-by: raver119 <raver119@gmail.com> * minor tweaks Signed-off-by: raver119 <raver119@gmail.com> * minor tweaks Signed-off-by: raver119 <raver119@gmail.com> * namespace change Signed-off-by: raver119 <raver119@gmail.com> * num_threads Signed-off-by: raver119 <raver119@gmail.com> * one minor fix Signed-off-by: raver119 <raver119@gmail.com>	2020-03-09 08:21:44 +03:00
Oleh	ead5162c97	Tanh mkldnn implementation (#296 ) * libnd4j first step of softmax mkldnn implementation Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j raw implementation of mkldnn softmax Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j merge master and added softmax to MklDnnTests Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j some corrections for softmax mkldnn Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j merge branch, fixed problem with negative axis, fixed dnnl::memory::format_tag selection, test cases added Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j minor corrections to avoid risk connected with negative axis usage Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j fixed windows builds, added switcher to use mkldnn sofmax version only for 3D, 4D, 5D, 6D arrays Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j fixed dataType selection per request Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j fix for mac and windows builds Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j builds fix Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j first spet of elementwize tanh implementation on mkldnn Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j fixed typo in error message for softmax MKLDNN, test case added, implementation of tanh on MKLDNN, need supported DataType testing Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j several fixes for tanh and temporary performance test added Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j fixed mkldnn platform loader for tanh Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j MklDnn tanh removed unsupported data types, removed performance test case, added more appropriate equivalence test case, code clean up Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j fixed problem with empty input case for MklDnn tanh and softmax Signed-off-by: Oleg <oleg.semeniv@gmail.com>	2020-03-06 17:11:22 +03:00
Oleh	4d81af9fe9	Softmax operation implementation for mkldnn (#286 ) * libnd4j first step of softmax mkldnn implementation Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j raw implementation of mkldnn softmax Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j merge master and added softmax to MklDnnTests Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j some corrections for softmax mkldnn Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j merge branch, fixed problem with negative axis, fixed dnnl::memory::format_tag selection, test cases added Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j minor corrections to avoid risk connected with negative axis usage Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j fixed windows builds, added switcher to use mkldnn sofmax version only for 3D, 4D, 5D, 6D arrays Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j fixed dataType selection per request Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j fix for mac and windows builds Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j builds fix Signed-off-by: Oleg <oleg.semeniv@gmail.com>	2020-03-04 19:36:42 +03:00
Yurii Shyrma	78934c17ad	profiling of stack and unstack ops (#261 ) * - profiling of stack and unstack ops Signed-off-by: Yurii <iuriish@yahoo.com> * - fix bug in cpu concat op Signed-off-by: Yurii <iuriish@yahoo.com> * - correction of cuda stack and unstack Signed-off-by: Yurii <iuriish@yahoo.com> * - change shape.h method which operates with unity dimensions strides Signed-off-by: Yurii <iuriish@yahoo.com> * - rearrange stack tests Signed-off-by: Yurii <iuriish@yahoo.com> * - correct evaluation of smallest stride for moving through contiguous axis Signed-off-by: Yurii <iuriish@yahoo.com> * - forgot to update signature of function strideOverContigAxis in cuda concat and split ops Signed-off-by: Yurii <iuriish@yahoo.com> * - remove ShapeUtils::shapeAsString method applied before input arrays validations Signed-off-by: Yurii <iuriish@yahoo.com> * - further removing of ShapeUtils::shapeAsString Signed-off-by: Yurii <iuriish@yahoo.com> * - take sub-array shapeIndo/offset calculation out of NDArray class - add possibility of contiguous memory copy in execTransformAny op if opNum == assign Signed-off-by: Yurii <iuriish@yahoo.com> * - correct test_empty_scatter_2 in EmptyTests.cpp Signed-off-by: Yurii <iuriish@yahoo.com> * - profiling of slice op Signed-off-by: Yurii <iuriish@yahoo.com> * - get rid of contiguous memcpy for some cases in concat and split ops Signed-off-by: Yurii <iuriish@yahoo.com> * - forgot to declare oid nd4j::SpecialMethods<T>::splitCpuGeneric Signed-off-by: Yurii <iuriish@yahoo.com> * - correct typo in calculation of threads in cuda split op Signed-off-by: Yurii <iuriish@yahoo.com> * - forgot to correct another set of threads variables in split cuda ops Signed-off-by: Yurii <iuriish@yahoo.com> * - further conflicts resolving Signed-off-by: Yurii <iuriish@yahoo.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-03-03 07:32:37 +03:00
raver119	c54cdaab75	full bert graph (#282 ) Signed-off-by: raver119 <raver119@gmail.com>	2020-03-02 18:14:32 +03:00
raver119	63fa3c2ef3	libnd4j polishing (#273 ) * initial set of include changes Signed-off-by: raver119 <raver119@gmail.com> * one more tweak Signed-off-by: raver119 <raver119@gmail.com> * few more rearrangements Signed-off-by: raver119 <raver119@gmail.com> * few more rearrangements Signed-off-by: raver119 <raver119@gmail.com> * few more rearrangements Signed-off-by: raver119 <raver119@gmail.com> * cuda includes rearrangements Signed-off-by: raver119 <raver119@gmail.com> * java update Signed-off-by: raver119 <raver119@gmail.com> * = namespace changed to sd - few CMake variables renamed with SD_ prefix Signed-off-by: raver119 <raver119@gmail.com> * java update Signed-off-by: raver119 <raver119@gmail.com> * LoopKind minor fix Signed-off-by: raver119 <raver119@gmail.com> * few more changes Signed-off-by: raver119 <raver119@gmail.com> * few more changes Signed-off-by: raver119 <raver119@gmail.com> * few more changes Signed-off-by: raver119 <raver119@gmail.com> * sanitizer is optional now Signed-off-by: raver119 <raver119@gmail.com> * dev tests updated Signed-off-by: raver119 <raver119@gmail.com> * few more changes Signed-off-by: raver119 <raver119@gmail.com> * last update Signed-off-by: raver119 <raver119@gmail.com> * java update Signed-off-by: raver119 <raver119@gmail.com>	2020-03-02 12:49:41 +03:00
shugeo	330a69d4e2	Shugeo solve ls (#203 ) * lstsq op. Initial commit. Signed-off-by: shugeo <sgazeos@gmail.com> * Least squares linear problem solve op (lstsq). Cpu draft implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed shape routine and tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Added test for lstsq op. Signed-off-by: shugeo <sgazeos@gmail.com> * Rectification for lstsq op implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected test to avoid numerical inconsistensy. Signed-off-by: shugeo <sgazeos@gmail.com> * Added prints for check computing. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected tests to use evalueate facility instead. Signed-off-by: shugeo <sgazeos@gmail.com> * CPU implementation of MatrixSolveLs op and tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Added cuda implementation for helpers with lstsq op. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored tests for lstsq op. Signed-off-by: shugeo <sgazeos@gmail.com> * Added processing for empty inputs. Signed-off-by: shugeo <sgazeos@gmail.com> * Merged tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored lstsq op for fast case. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed test. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored lstsq op. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed some issues with solve. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed lstsq op to avoid erros. Signed-off-by: shugeo <sgazeos@gmail.com> * Added kernel for giagonal factor Signed-off-by: shugeo <sgazeos@gmail.com> * lstsq wrapper and triangular_solve fixed * Added proper processing empty inputs and test. Signed-off-by: shugeo <sgazeos@gmail.com> * SequenceMask test * Build fixed * Added proper processing of empty inputs with solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Mapping added * Added check of input shapes with solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a couple of tests for lstsq op and minor changes with cuda helper for one.' Signed-off-by: shugeo <sgazeos@gmail.com> * Tests on * Refactored test for lstsq op. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed test * Added another approach for lstsq op aka solve_ls. Signed-off-by: shugeo <sgazeos@gmail.com> * Finished cpu part for solve_ls op helpers. * Added helper for low triangular matrix inversion. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored alternate solve_ls cpu implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Removed alternate approach for solve_ls op. Added multithreading with matrix inversion. Signed-off-by: shugeo <sgazeos@gmail.com> * Assert fixed * Refactored multithreading for inverse matricies. Signed-off-by: shugeo <sgazeos@gmail.com> Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>	2020-02-28 11:37:26 +03:00
raver119	241ed05c64	VariableSpace uses unordered maps as well (#270 ) Signed-off-by: raver119 <raver119@gmail.com>	2020-02-24 21:58:23 +03:00
shugeo	1bb3ae4b03	Shugeo unordered map (#256 ) * Refactored usage of std::map to std::unordered_map instead. Signed-off-by: shugeo <sgazeos@gmail.com> * Eliminated crash with wrong ShapeDescriptor hash. Signed-off-by: shugeo <sgazeos@gmail.com> * Eliminated crash with TadDescriptor hash. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored Stash hash. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored hashes. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored TadDescriptor hash and top_k mapping. * Refactored hashes for ShapeDescriptor and TadDescriptor classes. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored hash for ConstantDescriptor and ShapeDescriptor classes. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed map using with cuda platform. Signed-off-by: shugeo <sgazeos@gmail.com> * - few rearrangements for hash functions - shared openblas allowed Signed-off-by: raver119 <raver119@gmail.com> * exports Signed-off-by: raver119 <raver119@gmail.com> * exports Signed-off-by: raver119 <raver119@gmail.com> * Stash reverted to std::map Signed-off-by: raver119@gmail.com <raver119@gmail.com> * Added additional test. Signed-off-by: shugeo <sgazeos@gmail.com> * different maps for different compilers Signed-off-by: raver119 <raver119@gmail.com> * missing include Signed-off-by: raver119 <raver119@gmail.com> * fix the leak Signed-off-by: raver119 <raver119@gmail.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-02-24 07:51:01 +03:00
Oleh	0748c7e7c2	Oleh broadcast4d (#257 ) * libnd4j raw implementation of native broadcast for special cases Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j fixed bugs for special case of 4D loop broadcast, add some tests, need more testing and discussion Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j added 3D and 5D cases support and tests, need testing with different orders Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j correctd case selection for broadcast 3,4,5D loops, fixed several places for more stable behavior, clean up Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j minor corrections to avoid some risks in strides selection, added tests and rename some variables Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j optimize usage the stride selection for all loops in separate ShapeUtils method copyCertainStridesFromShapeInfo, merge master Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j remove per request several tests for 3D, 4D and 5D broadcast loops Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j removed some loac changes that had not been sync with serve playground, turn on new loops usage	2020-02-21 07:46:05 +03:00
Yurii Shyrma	f7a9190407	profiling of concat op (both cuda and cpu) (#151 ) * - profiling of concat op (both cuda and cpu) Signed-off-by: Yurii <iuriish@yahoo.com> * better comparison for large concat Signed-off-by: raver119 <raver119@gmail.com> * - further improving of concat op Signed-off-by: Yurii <iuriish@yahoo.com> * some loggin Signed-off-by: raver119 <raver119@gmail.com> * - add possibility to verify presence of trailing unities in shape and set strides/ews correspondingly - restrict second simple case in concat op to c order only Signed-off-by: Yurii <iuriish@yahoo.com> * - move concat op to specials_single.cpp file Signed-off-by: Yurii <iuriish@yahoo.com> * - get rid of second concat op declaration in transforms.cpp file Signed-off-by: Yurii <iuriish@yahoo.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-02-20 21:19:01 +03:00
raver119	da39a63c9b	one more bert-like test Signed-off-by: raver119 <raver119@gmail.com>	2020-02-18 11:20:38 +03:00
Yurii Shyrma	22c7aa9acf	Shyrma mkl matmul (#250 ) * - provide matmul code based on mkl api Signed-off-by: Yurii <iuriish@yahoo.com> * - correct typo in mkl matmul op Signed-off-by: Yurii <iuriish@yahoo.com> * - take into account empty arrays in mkl matmul op Signed-off-by: Yurii <iuriish@yahoo.com> * - fix bug in mkl matmul and group all matmul tests in one file Signed-off-by: Yurii <iuriish@yahoo.com>	2020-02-18 08:58:01 +03:00
raver119	2698fbf541	Broadcast perf improvements (#248 ) * broadcast as scalar edge case Signed-off-by: raver119 <raver119@gmail.com> * missing return Signed-off-by: raver119 <raver119@gmail.com> * few fixes Signed-off-by: raver119 <raver119@gmail.com> * one more fix Signed-off-by: raver119 <raver119@gmail.com> * no need for lambdas Signed-off-by: raver119 <raver119@gmail.com>	2020-02-17 16:25:09 +03:00
raver119	f9d51b7278	More compilation units (#246 ) * weird edge case Signed-off-by: raver119 <raver119@gmail.com> * weird edge case Signed-off-by: raver119 <raver119@gmail.com> * get rid of it Signed-off-by: raver119 <raver119@gmail.com> * crop and resize reorganized Signed-off-by: raver119 <raver119@gmail.com> * restore test Signed-off-by: raver119 <raver119@gmail.com> * remove unwanted unit refs in cmale Signed-off-by: raver119 <raver119@gmail.com>	2020-02-17 10:23:05 +03:00
Yurii Shyrma	011c272fde	Shyrma transpose (#244 ) * - provide contiguous strides for ouput in transpose op Signed-off-by: Yurii <iuriish@yahoo.com> * - provide contiguous strides for output in permute op Signed-off-by: Yurii <iuriish@yahoo.com> * - take into account empty shapes properly in transpose/permute op Signed-off-by: Yurii <iuriish@yahoo.com>	2020-02-17 08:04:28 +03:00
raver119	9e3c1b02b1	Perf improvements (#242 ) * initial commit Signed-off-by: raver119 <raver119@gmail.com> * meh Signed-off-by: raver119 <raver119@gmail.com> * better ExpandDims impl Signed-off-by: raver119 <raver119@gmail.com> * better Squeeze impl Signed-off-by: raver119 <raver119@gmail.com> * better Softmax impl Signed-off-by: raver119 <raver119@gmail.com> * one test disabled Signed-off-by: raver119 <raver119@gmail.com> * more accurate impl Signed-off-by: raver119 <raver119@gmail.com> * - GraphProfiler now prints full shapeInfo instead of shape - softmax typo fix Signed-off-by: raver119 <raver119@gmail.com>	2020-02-14 16:20:31 +03:00
Oleh	6e6289b6b9	Oleh bert multiply true broad cast (#239 ) * libnd4j trueBroadcast rank 3 row implementation of special case Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j rule clarify for second special case for all tests pass * libnd4j parallel_tad loop switch on in special case * libnd4j more general case for special case 2, need additional testing Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j more general case for trueBroadcast special cases added * libnd4j minor corrections and clean up * libnd4j one more minor fix Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j fixed check point to support all Y common vector representations in first special case for trueBroadcast Signed-off-by: Oleg <oleg.semeniv@gmail.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-02-14 12:04:38 +03:00
raver119	3de3cd8277	R119 tests (#238 ) * one small test Signed-off-by: raver119 <raver119@gmail.com> * one small test Signed-off-by: raver119 <raver119@gmail.com> * bert test Signed-off-by: raver119 <raver119@gmail.com> * Graph FlowPath fix Signed-off-by: raver119 <raver119@gmail.com> * - GraphProfiler tweaks - NodeProfile now includes shapes Signed-off-by: raver119 <raver119@gmail.com> * RELU_layer inplace tweak Signed-off-by: raver119 <raver119@gmail.com> * meh Signed-off-by: raver119 <raver119@gmail.com> * identity tweaks Signed-off-by: raver119 <raver119@gmail.com> * bert result validation Signed-off-by: raver119 <raver119@gmail.com> * - bunch of Shape ops have inplace exec forbidden now - Legacy ops have inplace exec disabled by default now Signed-off-by: raver119 <raver119@gmail.com> * ffast-math enabled Signed-off-by: raver119 <raver119@gmail.com> * ffast-math enabled Signed-off-by: raver119 <raver119@gmail.com> * allow some legacy ops to be inplace Signed-off-by: raver119 <raver119@gmail.com> * disable -fast_math Signed-off-by: raver119 <raver119@gmail.com> * disable expensive test for cuda Signed-off-by: raver119 <raver119@gmail.com>	2020-02-13 20:59:35 +03:00
Yurii Shyrma	fe47f52896	Oleh tenzor mmul (#231 ) * Libnd4j: TensorMMul backprop op #8174, raw implementation Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 merge master and some corrections Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 algorithm update, need testing, sync with master * Libnd4j: TensorMMul backprop op #8174 fixed incorrect B axes calculation Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 optimize axes identification and fix bug of indeces overlapping, added first test. need testing with different shapes Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 some fixes and improvements need more testing Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 fixed order of matrix multiply Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 fixed issue of incorrect axes definition, add tests based on TF, need additional testing for case dLdC not equal 1 Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 fixed scalar case add test Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 fixed bp algorithm, axes definition, need some mode testing with different orders combination f,c; c,f f,f and add some checks for inputs Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 some checks and corrections added tests, exists the problem with different input orders support A-f B-c and A-f B-f Signed-off-by: Oleg <oleg.semeniv@gmail.com> * Libnd4j: TensorMMul backprop op #8174 sync master Signed-off-by: Oleg <oleg.semeniv@gmail.com> * - correct bug in MmulHelper::tensorDot(a, b, c, axes_a, axes_b,permutForC) Signed-off-by: Yurii <iuriish@yahoo.com> * Libnd4j: TensorMMul backprop op #8174 code clean up and refactoring Signed-off-by: Oleg <oleg.semeniv@gmail.com> * - add check for linspase ordered permutations in ShapeUtils::evalShapeForTensorDot Signed-off-by: Yurii <iuriish@yahoo.com> * - provide additional code in shape::reshape stuff in order to reduce amount of allocation/copy operations during reshaping procedure Signed-off-by: Yurii <iuriish@yahoo.com> * - further work on problem of wrong shape evaluation during permute/reshape procedures Signed-off-by: Yurii <iuriish@yahoo.com> * - still looking for bug reason in reshape/permute stuff Signed-off-by: Yurii <iuriish@yahoo.com> * - correct bug in transform cuda native ops Signed-off-by: Yurii <iuriish@yahoo.com> * - correct bug in NDArray::assign Signed-off-by: Yurii <iuriish@yahoo.com> * - remove old shape::reshape stuff Signed-off-by: Yurii <iuriish@yahoo.com> * - add possibility to disable copy of old buffer to new buffer during reshape operation in NDArray class Signed-off-by: Yurii <iuriish@yahoo.com> * - correct bug in tensorDot which had to do with wrong pointers assigments Signed-off-by: Yurii <iuriish@yahoo.com> Co-authored-by: Oleh <oleg.semeniv@gmail.com>	2020-02-13 20:33:54 +03:00
shugeo	f0c684020f	Shugeo resize area fix4 (#229 ) * Fixed a couple of issues with resize_area op. Signed-off-by: shugeo <sgazeos@gmail.com> * Added additional test for alternate params for resize_area testing. Signed-off-by: shugeo <sgazeos@gmail.com>	2020-02-12 19:02:42 +03:00
Oleh	11cb561045	Oleh true broadcast opt (#234 ) * libnd4j trueBroadcast special case Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j fix trueBroadcast special case Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j special case of TrueBroadcastHelper Signed-off-by: Oleg <oleg.semeniv@gmail.com> * libnd4j trueBroadCast special case and test * libnd4j minor changes sync with master * libnd4j changes to TrueBroadcastHelper.hpp per require Signed-off-by: Oleg <oleg.semeniv@gmail.com>	2020-02-12 14:12:17 +03:00
raver119	f3fa4fd632	C++ NPY (#233 ) * import .npy files in C++ Signed-off-by: raver119 <raver119@gmail.com> * reuse existing method Signed-off-by: raver119 <raver119@gmail.com> * add CPU_FEATURES to static lib Signed-off-by: raver119 <raver119@gmail.com>	2020-02-12 12:38:10 +03:00
raver119	8a0d5e3b97	Compilation units (#224 ) * - TrueBroadcastHelper split into multiple compilation units - legacy gemm.cpp disabled Signed-off-by: raver119 <raver119@gmail.com> * - IndexReduce int32/int64 split into multiple compilation units Signed-off-by: raver119 <raver119@gmail.com> * - Reduce3 ops split into multiple compilation units Signed-off-by: raver119 <raver119@gmail.com>	2020-02-09 19:48:32 +03:00
Abdelrauf	bead656feb	Initial performance improvement for Bias Add and etc #8556 (#217 ) * Initial performance improvement for Bias Add, loop coords helpers and increment aligned parallel threading Signed-off-by: AbdelRauf <rauf@konduit.ai> * One more test for Rauf Signed-off-by: raver119 <raver119@gmail.com> * disable couple of perf tests Signed-off-by: raver119 <raver119@gmail.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-02-08 15:31:30 +03:00
Yurii Shyrma	948646b32d	Shyrma mkl test (#211 ) * - provide nhwc format in mkl conv ops Signed-off-by: Yurii <iuriish@yahoo.com> * - corrections in mkl conv3d Signed-off-by: Yurii <iuriish@yahoo.com> * - corrections in mkl batchnorm Signed-off-by: Yurii <iuriish@yahoo.com> * - corrections in mkl maxpooling2d Signed-off-by: Yurii <iuriish@yahoo.com> * - add format format_tag::any to outputs in mkl conv ops Signed-off-by: Yurii <iuriish@yahoo.com> * - complete corrections in mkl conv ops Signed-off-by: Yurii <iuriish@yahoo.com> * - add test for comparison of execution speeds of mkl conv2d op with different weights format Signed-off-by: Yurii <iuriish@yahoo.com> * - take into account order f in mkl conv ops Signed-off-by: Yurii <iuriish@yahoo.com>	2020-02-06 21:12:54 +03:00
shugeo	5ae40f6e38	Shugeo sequence mask fix2 (#216 ) * Fixed sequence_mask op and tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Cuda fix for sequence_mask op. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed sequence_mask op for both platforms and tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed solve and triangular_solve for more than 2D for adjoint cases. Signed-off-by: shugeo <sgazeos@gmail.com> * Added adjoint solve test again. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a set of tests for triangual_solve and generic solve ops. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a pair tests for triangular_solve Signed-off-by: shugeo <sgazeos@gmail.com> * Added tests for triangular_solve op. Signed-off-by: shugeo <sgazeos@gmail.com>	2020-02-06 21:06:50 +03:00
shugeo	41ff907bc6	Shugeo solve linear (#191 ) * linear equations systems solve op. Initial commit. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed compiling issues. Signed-off-by: shugeo <sgazeos@gmail.com> * Linear equations systems solve. The next stage commit. Signed-off-by: shugeo <sgazeos@gmail.com> * Added test for linear equations systems solve operation. Signed-off-by: shugeo <sgazeos@gmail.com> * Added additional test and fixed lower matrix retrievance. * Implementation for solve of the systems of linear equations." Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored permutation generation. Signed-off-by: shugeo <sgazeos@gmail.com> * Added restore for permutations batched with cuda helper for solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Finished cuda implementation for solve op helpers. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored cpu helpers for solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Fix gtest output on Windows * Fixed issue with permutation matrix for cuda implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed issue with permutation matrix for cpu implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Eliminated waste comments. Signed-off-by: shugeo <sgazeos@gmail.com> * LinearSolve added * Mapping added * Javadoc added * Refactored implementation of triangular_solve helpers and tests for solve matrix equations generally. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a test for solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Solve test added * Fix for TF import Co-authored-by: Serhii Shepel <9946053+sshepel@users.noreply.github.com> Co-authored-by: raver119 <raver119@gmail.com> Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>	2020-02-04 08:59:11 +03:00
raver119	9bb5798cac	Null arrays fix (#208 ) * don't skip null arrays Signed-off-by: raver119 <raver119@gmail.com> * one test tweak Signed-off-by: raver119 <raver119@gmail.com>	2020-02-02 23:14:00 +03:00
Oleh	d52e67209e	Oleh convert (#200 ) * StringUtils for utf convertor raw implementation of all possible combinations, need to be add counter of bytes per symbol for any type and add api to call convertors and store data Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor more corrections to support convertors Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor some corrections and bug fixes, need review to discuss how to add multi-threading Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 some corrections to move to multi-threading, add one test need discussion data inputs/outputs array presentation, need discussion the way of multi-threading * StringUtils for utf convertor #8613 tests added some corrections to optimize build Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 some corrections and code clean up Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 code clean up and optimize usage, need update ndarray factory before replace std usage Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 some staff to integrate converters into NDArrayFactory, update tests and add some functionality Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 minor corrections and bug fix before discussion * StringUtils for utf convertor #8613 some fixes and tets * StringUtils for utf convertor #8613 some more staff to support different unicode Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 fix linking bug * StringUtils for utf convertor #8613 corrected several tests as defaults for string ndarray changed * StringUtils for utf convertor #8613 replace some incorrect implementation, revert some test changes, need sync before testing * StringUtils for utf convertor #8613 fixed several thing that were badly implemented yesterday, need 
optimization, testing (before testing have to be add support of u32 and u16 buffer visualization) * StringUtils for utf convertor #8613 fixed to support u16 and u32, and convertor in ndarray, fix buffer print, etc Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 merge master and sync with server Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 some correction for string cast, need print check only asci support Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 merge master, remove copies and add cast, need test, refactoring according review and clean up * StringUtils for utf convertor #8613 fixed cast and copy issues Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 fixed cuda and update tests * StringUtils for utf convertor #8613 integration into NdArray, fix several tests for build pass, refactoring, etc * - avoid ambiguity of NDArray ctrs overloading in some tests Signed-off-by: Yurii <iuriish@yahoo.com> * StringUtils for utf convertor #8613 NDArray string constructors added, updated NDArrayFactory, refactoring unicode and tests, etc Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 fixed cuda build and test, refactoring and void* added to some functions Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 void* integration, removed copy operation, refactoring, added tests for NDArray string constructors, etc Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 several more fixes, improvements and updates Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 master merge, code clean up and optimization before review Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 minor fixes string element size define Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor 
#8613 revert last changes as mistake Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 fixed NDArray constructor build problem, remove order from string factory, fixed order use for factory via project, added catch of incorrect sync in cast of arrays to data types, fixed e method for strings, etc Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 added javacpp hack, added multi-threading, minor corrections in license agreement Signed-off-by: Oleg <oleg.semeniv@gmail.com> * StringUtils for utf convertor #8613 windows builds fix, as "sting" is not treated as utf8 Signed-off-by: Oleg <oleg.semeniv@gmail.com> Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>	2020-01-31 16:30:49 +03:00
raver119	1ab86d1306	Range op data type (#204 ) * - range op now accepts dargs - dargs now can be in signature Signed-off-by: raver119 <raver119@gmail.com> * range dtype java side Signed-off-by: raver119 <raver119@gmail.com> * linspace fix Signed-off-by: raver119 <raver119@gmail.com> * lin_space fix for scalar outputs Signed-off-by: raver119 <raver119@gmail.com>	2020-01-31 10:45:40 +03:00
raver119	5d98cfcf47	Configurable DataType for ops (#201 ) * initial commit Signed-off-by: raver119 <raver119@gmail.com> * - one more test for OneHot with dtype - one more signature in Nd4j Signed-off-by: raver119 <raver119@gmail.com> * ones_as/zeros_as now accept dtype Signed-off-by: raver119 <raver119@gmail.com> * one more test Signed-off-by: raver119 <raver119@gmail.com> * - more updates for configurable data types - ones_as/zeros_as java side + tests Signed-off-by: raver119 <raver119@gmail.com> * few c++ tests fixed Signed-off-by: raver119 <raver119@gmail.com> * few more changes around DArgs Signed-off-by: raver119 <raver119@gmail.com>	2020-01-30 18:46:12 +03:00
raver119	ba961c7601	DataTypes & FlatBuffers (#197 ) * flatbuffers version upgrade Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers version upgrade java side Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers dependency version upgrade java side Signed-off-by: raver119 <raver119@gmail.com> * MKLDNN version upgrade Signed-off-by: raver119 <raver119@gmail.com> * DArgs first pass Signed-off-by: raver119 <raver119@gmail.com> * signatures first pass Signed-off-by: raver119 <raver119@gmail.com> * signatures second pass Signed-off-by: raver119 <raver119@gmail.com> * signatures third pass Signed-off-by: raver119 <raver119@gmail.com> * signatures third pass Signed-off-by: raver119 <raver119@gmail.com> * signatures fourth pass Signed-off-by: raver119 <raver119@gmail.com> * signatures fifth pass Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers UI version upgrade java side Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers ui update Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers downgrade Signed-off-by: raver119 <raver119@gmail.com> * flatbuffers downgrade java side Signed-off-by: raver119 <raver119@gmail.com>	2020-01-30 10:07:24 +03:00
Yurii Shyrma	7a7ee4b021	Shyrma cudnn (#192 ) * - implementation of cudnn batchnorm_bp op Signed-off-by: Yurii <iuriish@yahoo.com> * - testing and fixing bugs in batchnorm_bp based on cudnn api Signed-off-by: Yurii <iuriish@yahoo.com> * - move pooling mkl code and delete some unnecessary files Signed-off-by: Yurii <iuriish@yahoo.com> * - implementation and testing cudnn pooling2d ops (avg/max, ff/bp) Signed-off-by: Yurii <iuriish@yahoo.com> * - implementation and testing cudnn pooling 3d (ff/bp) ops Signed-off-by: Yurii <iuriish@yahoo.com> * - provide ff step in case of cudnn maxpool3d_bp op Signed-off-by: Yurii <iuriish@yahoo.com> * - remove half type from set of supported types in mkl dpethwise conv op Signed-off-by: Yurii <iuriish@yahoo.com> * - bring back cudaStreamSynchronize in batchnorm and pooling cudnn ops Signed-off-by: Yurii <iuriish@yahoo.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-01-28 18:23:07 +03:00
shugeo	99a54829c2	Shugeo resize area fix2 (#181 ) * Added test for issue with resize_area op. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a pair of tests for resize_are op. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored resize_area kernel to avoid shared memory overflow. Signed-off-by: shugeo <sgazeos@gmail.com> * Eliminated prints with tests. Signed-off-by: shugeo <sgazeos@gmail.com> * ignore bad test Signed-off-by: raver119 <raver119@gmail.com> * Fixed test with resize_area. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed test for float constants. Signed-off-by: shugeo <sgazeos@gmail.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-01-24 20:55:25 +03:00
raver119	99cfb88933	two tests fixes Signed-off-by: raver119 <raver119@gmail.com>	2020-01-24 15:26:46 +03:00
raver119	5d69069177	[WIP] Memory limits (#167 ) * initial commit Signed-off-by: raver119 <raver119@gmail.com> * one more initial commit Signed-off-by: raver119 <raver119@gmail.com> * additional initial commit Signed-off-by: raver119 <raver119@gmail.com> * subsequent initial commit Signed-off-by: raver119 <raver119@gmail.com> * initial commit testing Signed-off-by: raver119 <raver119@gmail.com> * initial commit per device Signed-off-by: raver119 <raver119@gmail.com> * initial commit per group Signed-off-by: raver119 <raver119@gmail.com> * initial commit for cuda Signed-off-by: raver119 <raver119@gmail.com> * initial commit for cuda + few missed lines Signed-off-by: raver119 <raver119@gmail.com> * initial commit for cuda + missed includes Signed-off-by: raver119 <raver119@gmail.com> * initial commit for cuda + one more missed include Signed-off-by: raver119 <raver119@gmail.com> * initial commit shouldn't count host mem as dev0 in cuda Signed-off-by: raver119 <raver119@gmail.com> * initial commit that tracks HOST group limits for CUDA Signed-off-by: raver119 <raver119@gmail.com> * initial commit with some Environment changes Signed-off-by: raver119 <raver119@gmail.com> * initial commit with more Environment changes Signed-off-by: raver119 <raver119@gmail.com> * initial commit with maxMasterThreads fix Signed-off-by: raver119 <raver119@gmail.com> * initial commit with maxMasterThreads fix Signed-off-by: raver119 <raver119@gmail.com> * initial commit without maxMasterThreads exception Signed-off-by: raver119 <raver119@gmail.com> * initial commit without Nd4jULong in Environment Signed-off-by: raver119 <raver119@gmail.com> * add sleep and more iterations for OOM cases Signed-off-by: raver119 <raver119@gmail.com> * limits propagation from java side Signed-off-by: raver119 <raver119@gmail.com> * - consume ErrorCode every time - one test for memory limits Signed-off-by: raver119 <raver119@gmail.com> * unordered_map Signed-off-by: raver119 <raver119@gmail.com> * 
unordered_map Signed-off-by: raver119 <raver119@gmail.com> * unordered_map Signed-off-by: raver119 <raver119@gmail.com> * RSub op mapping fixed Signed-off-by: raver119 <raver119@gmail.com> * typo fixed Signed-off-by: raver119 <raver119@gmail.com> * one bad test fixed Signed-off-by: raver119 <raver119@gmail.com>	2020-01-24 10:11:09 +03:00
shugeo	2717b25931	Shugeo qr (#153 ) * Added qr op implementation. Initial version. * Fixed doc for qr op. Signed-off-by: shugeo <sgazeos@gmail.com> * Implementation of QR decomposition. CPU platform version. * Added a pair of tests for qr op testing. Signed-off-by: shugeo <sgazeos@gmail.com> * QR implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected norm using. * Properly calculated intermediate results with QR decomposition. * Another step to implement QR algorithm by householder. * Cpu implementatio for QR decomposition. The first working edition. * Corrected test to QR decomposition. * Added tad multithreading with QR implementation. * Finished cpu implementation for QR decomposition helpers. * Refactored tests and improved multithreading. * Refactored QR cpu implementation and update cuda implementation helpers. * Cuda QR helper implementation. The first working edition. * Eliminated waste prints. * Restore multithreading with cuda implementation. * Ops names corrected * Refactored qr op helpers to optimize. Signed-off-by: shugeo <sgazeos@gmail.com> * Eliminated waste manual ticking. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored memory allocation to avoid waste memory usage. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored matrixMinor method both for cuda and cpu platforms. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored method of vmul to use raw buffers instead type conversion. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored temporary array of matricies. Signed-off-by: shugeo <sgazeos@gmail.com> Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com> Co-authored-by: raver119 <raver119@gmail.com>	2020-01-22 13:59:36 +03:00
shugeo	815a2908af	Shugeo solve triangular (#173 ) * Added implementation of the triangular_solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed compilation issues. Signed-off-by: shugeo <sgazeos@gmail.com> * Added verification of input data and helpers facilities for triangular_solve op.' Signed-off-by: shugeo <sgazeos@gmail.com> * Added cpu implementation for triangular_solve helpers. * Added tests and implementation for upper triangular equations. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a pair of cases to tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Added multithreading with cpu helpers for triangular_solve op. Signed-off-by: shugeo <sgazeos@gmail.com> * Added cuda implementation of triangular_solve op helpers. Signed-off-by: shugeo <sgazeos@gmail.com> * Finished cuda implementation of triangular_solve helpers and tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed copyright marks. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected grammar errors with doc and error messages. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored matricies processing with triangular_solve cuda helper implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Added triangular_solve wrapper * Fixed mapping * Added processing for adjoint with cpu helpers of triangular_solve op implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Added implementation for adjoint routine with cuda platform. Signed-off-by: shugeo <sgazeos@gmail.com> * Added multithreading with adjoint routine for cpu platform. Signed-off-by: shugeo <sgazeos@gmail.com> Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>	2020-01-22 10:48:03 +03:00
shugeo	e50b285c2c	Shugeo resize area (#162 ) * Added implementation for resize_area op. Initial commit. * Added implementation of resize_area op. Initial revision. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected resizeArea functor call. Signed-off-by: shugeo <sgazeos@gmail.com> * Implementation of resize_area. Cpu platform helpers. Signed-off-by: shugeo <sgazeos@gmail.com> * Implementation for resize_area helpers. The first part revision. Signed-off-by: shugeo <sgazeos@gmail.com> * Added a set of tests for resize_area op. Signed-off-by: shugeo <sgazeos@gmail.com> * Cuda implementation for resize_area. Initial approach. Signed-off-by: shugeo <sgazeos@gmail.com> * Adding multithreading for resize_area algorithm. Signed-off-by: shugeo <sgazeos@gmail.com> * Cuda implementation of resize_area helpers. Shared memory approach. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored resizeAreaKernel with cuda implementation. * Eliminated compilation errors. * ResizeArea helpers for cuda platform. The first working revision. Signed-off-by: shugeo <sgazeos@gmail.com> * Added test for batched resize_area op testing. Signed-off-by: shugeo <sgazeos@gmail.com> * Implementation of resize_are for cuda platform and tests. Signed-off-by: shugeo <sgazeos@gmail.com> * Fixed multithreading with resize_area op helper. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected copyright marks with sources. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected copyright mark for resize_area op implementation. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected copyright mark for parity ops header. Signed-off-by: shugeo <sgazeos@gmail.com> * Corrected typo in strings and so on with image resize ops. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored resize_area helpers and multithreading. Signed-off-by: shugeo <sgazeos@gmail.com> * Added ResizeArea wrapper * Added test with align_corners and fixed shape processing with only int args given for output size. 
Signed-off-by: shugeo <sgazeos@gmail.com> * Added test * TF mapping for ResizeArea * Fixed implementation issues with resize_area op for both platforms. Signed-off-by: shugeo <sgazeos@gmail.com> * Refactored image resizer struct to use flexible types for ints and floats. Signed-off-by: shugeo <sgazeos@gmail.com> * Improved multithreading with resizeAreaKernel launch. Signed-off-by: shugeo <sgazeos@gmail.com> * Use asynchronical memory copying with cuda platform image resize allocations. Signed-off-by: shugeo <sgazeos@gmail.com> Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>	2020-01-22 10:46:33 +03:00

204 Commits