cavis

Author	SHA1	Message	Date
Yurii Shyrma	7a90a31cfb	Shyrma deconv3 (#69 ) * - profiling cuda kernels for vol2col and im2col Signed-off-by: Yurii <iuriish@yahoo.com> * - correct addBias helper Signed-off-by: Yurii <iuriish@yahoo.com> * - correct mkl dilation formula and switch off mkl api for dilation deconvolutions Signed-off-by: Yurii <iuriish@yahoo.com>	2019-11-21 21:17:30 +02:00
raver119	064a56ccf1	Few fixes (#66 ) * skip legacy transforms execution in case of empty input arrays Signed-off-by: raver119 <raver119@gmail.com> * - BroadcastBool ops now accept extraParams to make MatchCondition possible - TrueBroadcastHelper now uses samediff::threads Signed-off-by: raver119 <raver119@gmail.com> * java side Signed-off-by: raver119 <raver119@gmail.com> * trigger jenkins Signed-off-by: raver119 <raver119@gmail.com> * update LessThanOrEqual opNum mapping Signed-off-by: raver119 <raver119@gmail.com> * update LessThanOrEqual opNum mapping Signed-off-by: raver119 <raver119@gmail.com>	2019-11-21 15:43:03 +03:00
shugeo	dc0036f2c6	Shugeo image resize bicubic (#56 ) * Added implementation files for image_resize and resize_bicubic ops. * Image resize and image.resize_bicubic ops implementation. Initial revision. * Finished with infrastructure development for image.resize_bilinear op and image_resize op implementation. * Refactored resize methods. * Added processing for Mitchelcubic algorithm. * Added check for input/output sizes. * Added int and float types for crop_and_resize op. * Refactored crop_and_resize output type check. * Added helper for bicubic interpolation as TF v.1 does. * Added TF v.1 bicubic helper for cuda platform. * Added cached class for bicubic algorithm. * Refactored cuda implementation for crop_and_resize helper to use proper output type. * Added facilities for bicubic interpolation. * Portion bicubic interpolation from TF. * Added tests for resize_bilinear testing. * Working implementation of bicubic interpolation and tests. * Refactored routines with image_resize bicubic op helper. * Refactored code with coding standards. * Refactored cpu helpers for resize_bicubic op. * Refactored bicubic helpers. * Added bicubic resize facilities. * Implementing cuda kernels for bicubic interpolation. Implementation step. * Cuda implementation of resize_bicubic op helper. * Refactor image.resize_bicubic op helpers. * Refactored helpers for resize_bicubic. Added error checking with cuda implementation. * Refactored cuda implementation of resize_bicubic op helper. The first working revision. * Cuda arch implementation for resize_bicubic op helper. Full working single-threaded revision. * Intermediate bicubic interpolation helper for cuda. * Refactored cpu helper for resize_bicubic. * Multithreaded cuda implementation for resize_bicubic. * Fixed merge issues. * Refactored nlp helpers. * Replicated resize_bicubic for 3D also. * Eliminated waste comments of unused code. * Eliminated waste comments with unused code. * Eliminated waste template definitions. * Eliminated waste debug code. * Eliminated waste comments. * Fixed multithreading with helpers. * Fixed test suites for float and double in float point input lists. * Fixed usage of reshape with 3D/4D on resizes. * Final fixes. * Fixed resize_neighbor op problem.	2019-11-20 21:11:04 +02:00
shugeo	13e5c0a280	Shugeo release fix1 (#61 ) * Added a pair of tests for failed ops. * Fixed cpu helper for draw_bounding_boxes op. * Refactored implementation of draw_bounding_boxes op to full conform with TF. * Improved multithreading with draw_bounding_boxes op cuda helper. * Eliminated log messages. * Changed logging with draw_bounding_boxes op helper and tests. * Resize_bilinear with 3D input allowed. * Refactored 3D input acceptance with resize_bilinear op. * And another improvement. * Refactored reshape of input/output for resize_bilinear. * Improvements final. * Finished with 3D replication for image.resize_bilinear/_nearest_neighbor. * Added copyrights for TF code. * Using new form of multithreading for cpu implementation. * Fixed shape error. * Added multithreaded with batches on crop_and_resize functor. * Refactored multithreading with crop_and_resize and draw_bounding_boxes.	2019-11-20 13:37:48 +02:00
raver119	59e955cedc	- MKL-DNN version upgrade to 1.1.x (#62 ) - MKL-DNN namespace changes to match DNNL rename Signed-off-by: raver119 <raver119@gmail.com>	2019-11-20 13:23:08 +03:00
raver119	7898f3c0cc	fix for is_increasing/non_decreasing ops for empty input case (#63 ) Signed-off-by: raver119 <raver119@gmail.com>	2019-11-20 11:12:15 +03:00
Yurii Shyrma	66b84b38cf	Shyrma mmul (#58 ) * - get rid of some copy procedures in mmulHelper ops Signed-off-by: Yurii <iuriish@yahoo.com> * - further work on embedding cuda api for batched gemm (cublasGemmBatchedEx) in our mmulHelper class Signed-off-by: Yurii <iuriish@yahoo.com> * - further work on cuda batched gemm api Signed-off-by: Yurii <iuriish@yahoo.com> * - write own cuda kernel performing batched gemm Signed-off-by: Yurii <iuriish@yahoo.com> * missing include in MmulHelper Signed-off-by: raver119 <raver119@gmail.com> * - forgot to keep in code previous correct kernels for mmulNxN, since it may happen that new one will fail for some reason in future Signed-off-by: Yurii <iuriish@yahoo.com> * disable old tensordot Signed-off-by: raver119 <raver119@gmail.com> * - rewrite cuda kernels for usualGemm and usualGemv Signed-off-by: Yurii <iuriish@yahoo.com> * - profiling mmul helpers Signed-off-by: Yurii <iuriish@yahoo.com> * - prints to check shapes were added Signed-off-by: Yurii <iuriish@yahoo.com> * - correct type of output array C in mmulNxN Signed-off-by: Yurii <iuriish@yahoo.com> * - take into account possible nans in C array Signed-off-by: Yurii <iuriish@yahoo.com> * slightly change numThreads message Signed-off-by: raver119 <raver119@gmail.com> * - make corrections in accordance to given notes in pr review Signed-off-by: Yurii <iuriish@yahoo.com>	2019-11-19 15:39:36 +02:00
raver119	bbd59a3537	fake quant dtype validation fix (#60 ) Signed-off-by: raver119 <raver119@gmail.com>	2019-11-19 12:53:52 +03:00
raver119	1780dcc883	[WIP] Small fixes here and there (#50 ) * one range test Signed-off-by: raver119 <raver119@gmail.com> * few Context convenience signatures Signed-off-by: raver119 <raver119@gmail.com> * one more range test Signed-off-by: raver119 <raver119@gmail.com> * "range" "fix" Signed-off-by: raver119 <raver119@gmail.com> * adjust_contrast_v2 now allows scale factor to be provided via input_variable Signed-off-by: raver119 <raver119@gmail.com> * adjust_contrast now allows scale factor as variable too Signed-off-by: raver119 <raver119@gmail.com> * bitcast shape tests Signed-off-by: raver119 <raver119@gmail.com> * BitCast import dtype added Signed-off-by: raver119 <raver119@gmail.com> * few more BitCast signatures Signed-off-by: raver119 <raver119@gmail.com>	2019-11-15 17:04:29 +03:00
Yurii Shyrma	62d8e0d409	- make agreement between our and mkl api dilation/padding formulas (#47 ) Signed-off-by: Yurii <iuriish@yahoo.com>	2019-11-14 20:21:22 +03:00
Alex Black	47d19908f4	Various fixes (#43 ) * #8172 Enable DL4J MKLDNN batch norm backward pass Signed-off-by: AlexDBlack <blacka101@gmail.com> * #8382 INDArray.toString() rank 1 brackets / ambiguity fix Signed-off-by: AlexDBlack <blacka101@gmail.com> * #8308 Fix handful of broken links (inc. some in errors) Signed-off-by: AlexDBlack <blacka101@gmail.com> * Unused dependencies, round 1 Signed-off-by: AlexDBlack <blacka101@gmail.com> * Unused dependencies, round 2 Signed-off-by: AlexDBlack <blacka101@gmail.com> * Unused dependencies, round 3 Signed-off-by: AlexDBlack <blacka101@gmail.com> * Small fix Signed-off-by: AlexDBlack <blacka101@gmail.com> * Uniform distribution TF import fix Signed-off-by: AlexDBlack <blacka101@gmail.com>	2019-11-14 19:38:20 +11:00
raver119	48df1acdfb	[WIP] ThreadPool (#8 ) This PR removes OpenMP use in 95% of cases	2019-11-13 17:04:59 +03:00
Yurii Shyrma	0eda1e733e	Shyrma bnorm bp (#41 ) Batchnorm backprop mkldnn	2019-11-12 11:58:48 +03:00
raver119	cd961727bb	[WIP] perf tests (#40 ) * special maxpool test Signed-off-by: raver119 <raver119@gmail.com> * special maxpool test Signed-off-by: raver119 <raver119@gmail.com>	2019-11-11 17:45:59 +03:00
raver119	929c1dc5c7	- new NDArrayFactory scalar constructor - minor tweak in randomuniform - one more test Signed-off-by: raver119 <raver119@gmail.com>	2019-11-08 08:49:41 +03:00
shugeo	679e42199a	Shugeo strided slice bp fix2 (#33 ) * Fixed crash and restored broken functionality for strided slice. * Added comments for strided_slice_bp main step.	2019-11-07 13:44:02 +03:00
raver119	4276e63054	one more test Signed-off-by: raver119 <raver119@gmail.com>	2019-11-07 08:49:27 +03:00
shugeo	08853c7829	Shugeo random uniform int (#30 ) * Corrected randomuniform declaration. * Refactored uniform distribution for both cuda and cpu platforms. * Refactored uniform distribution and tests. * Fixed type usage with indices. * Refactored uniform distribution implementation and tests to full conform with TF implementation. * Refactored gamma function to use type util method. * Copyright changes and fixes with ConstantHelper. * Added error checking on allocate cuda device memory and operations.	2019-11-06 12:49:27 +02:00
shugeo	9124974e3b	Fixed crash with strided_slice_bp op and tests. (#29 )	2019-11-05 12:49:15 +02:00
shugeo	7b14a9f603	Gamma and Poisson distributions (#27 ) * Added implementation for random_gamma op. * Added implementation for random_poisson op and support classes. * Added helpers for random_poisson and random_gamma ops. * Implementation of random_poisson. The first working edition. * Implementation of random_poisson. Parallelized working edition. * Implementation of random_gamma. Parallelized working edition with alpha only. * Added cuda implementation for helper of poisson distribution. * Corrected shape calculation with random_gamma and tests. * Finished cpu implementation for gamma distribution. * Finished cuda implementation for random_gamma op. * Refactored cpu helpers for random_gamma and random_poisson ops. * Refactored cuda helpers for gamma and poisson distribution. * Refactored cuda helper for gamma distribution. * Refactored cpu helper for random_poisson op. * Refactored cpu helper for random_gamma op.	2019-11-04 15:42:28 +02:00
Yurii Shyrma	0cdb5750e0	Shyrma concat (#24 ) * - provide possibility to pass axis as last input array in concat op - correct summation in bias_add_bp op for NHWC case Signed-off-by: Yurii <iuriish@yahoo.com> * - write code for deconv2d op based on mkl dnn api * no unsafe math Signed-off-by: raver119 <raver119@gmail.com> * no unsafe math Signed-off-by: raver119 <raver119@gmail.com> * - get rid of e<> and p<> methods in svd helper Signed-off-by: Yurii <iuriish@yahoo.com> * - provide mkl api support for deconvolution 3d Signed-off-by: Yurii <iuriish@yahoo.com> * - write deconv2d_bp based on mkl api Signed-off-by: Yurii <iuriish@yahoo.com> * - write deconv3d_bp based on mkl api Signed-off-by: Yurii <iuriish@yahoo.com> * - testing and fixing deconv based on mkl api Signed-off-by: Yurii <iuriish@yahoo.com> * - remove dilation from conv2d/3d mkl Signed-off-by: Yurii <iuriish@yahoo.com> * - minor changes Signed-off-by: Yurii <iuriish@yahoo.com> * - further corrections of deconv ops based on mkl dnn api Signed-off-by: Yurii <iuriish@yahoo.com> * - provide deconv2d_tf based on mkl dnn api Signed-off-by: Yurii <iuriish@yahoo.com> * - add minor corrections required by reviewer Signed-off-by: Yurii <iuriish@yahoo.com>	2019-11-03 12:37:19 +02:00
shugeo	95f7ad7b94	Shugeo suppression overlaps (#9 ) * Added non_max_suppression_overlaps op and tests. * Refactored implementation of non_max_suppression_overlaps. * Refactoring of implementation of non_max_suppression_overlaps op. * Refactoring of implementation of non_max_suppression op. * Fixed portion error. * Added cuda frontends for image suppression ops. * Eliminated crash with cuda arch on image.non_max_suppression_overlaps op. * Improved implementation of image_suppression helper for cpu platform. * The generic approach of non_max_suppression_overlaps op helper with cuda platform. * Working cuda implementation of helper non_max_suppression_overlaps op. * Eliminated waste comments. * Improved implementations for both platforms * Refactored cuda implementation of image.non_max_suppression_overlaps op helper. * Improved cuda implementation of non_max_suppression op helper. * Refactored cuda implementation of image.non_max_suppression_overlaps op helper. * Improved cuda implementation of image.non_max_suppression_overlaps op helper. * Added modifications into cuda implementation for image suppression overlaps op. * Correct queue emulation with cuda implementation of non_max_suppression_overlaps op. * Prefinal stage of cuda implementation of non_max_suppression_overlaps. * Worked cuda implementation of non_max_suppresion_overlaps helper. * Fixed return to proper thread. * Improvements for cuda implementation of image.non_max_suppression_overlaps op helper. * Fixed implementation issues with non_max_suppression_overlaps on cuda platform. * Fixed skip for non_max_suppression_overlaps on cuda platform. * Finalize implementation of image_suppression helper and tests. * Cosmetic changes only.	2019-10-30 13:43:45 +02:00
Yurii Shyrma	029a69a835	Shyrma bn mkl bp (#14 ) * - write code for new batchnorm backprop Signed-off-by: Yurii <iuriish@yahoo.com> * - testing batchnorm backprop Signed-off-by: Yurii <iuriish@yahoo.com> * - write code for batchnorm backprop based on mkl dnn api Signed-off-by: Yurii <iuriish@yahoo.com> * - testing and fixing bugs in batchnorm_bp mkl dnn Signed-off-by: Yurii <iuriish@yahoo.com> * - made corrections required by reviewer Signed-off-by: Yurii <iuriish@yahoo.com> * - change name in java wrapper for batchnorm op Signed-off-by: Yurii <iuriish@yahoo.com>	2019-10-26 14:14:21 +03:00
Alexander Stoyakin	f31661e13b	Merge pull request #7 from KonduitAI/asto_nd4s_10172019 KDTree optimization	2019-10-23 12:11:25 +03:00
Yurii	70bd925abd	- write 2 versions of new lstmLayer: one is based on own code, second uses mkl dnn api	2019-10-17 20:44:52 +03:00
shugeo	478a0c1f97	Added igamma and igammac broadcastable ops implementations and tests.	2019-10-16 14:02:53 +03:00
shugeo	d5b352273d	Implementation of cuda kernel for fake_quant_with_min_max_vars_per_channels op. Final revision.	2019-10-10 16:51:29 +03:00
shugeo	c13e945a96	Fixed fake_quant_with_min_max_vars op and tests.	2019-10-10 13:23:11 +03:00
shugeo	352f1eee80	Implemented fake_quant_with_min_max_per_channel helper for cpu platform. The first approach.	2019-10-09 21:39:59 +03:00
shugeo	3a89e51811	Added tests for fake_quant_with_min_max_vars_per_channel op.	2019-10-09 13:38:18 +03:00
shugeo	30a8af566c	The first working implementation of cuda kernel for draw_bounding_boxes op helper.	2019-10-08 13:45:18 +03:00
shugeo	6cf3a8fa9c	Refactored cpu implementation and added cuda approach.	2019-10-07 17:51:07 +03:00
shugeo	78443ffebf	Working implementation of draw_bounding_boxes op for cpu.	2019-10-07 15:04:44 +03:00
shugeo	53a2ebddbe	Added test and helpers for draw_bounding_boxes op both cpu and cuda related.	2019-10-04 20:46:26 +03:00
shugeo	8f70b4441f	draw_bounding_boxes op implementation. Initial revision.	2019-10-04 18:32:21 +03:00
shugeo	908e4c4912	Added implementation for divide_no_nan op and tests.	2019-10-04 10:29:15 +03:00
raver119	cff26f13c5	Revert "Implement divide_no_nan op."	2019-10-03 20:25:52 +03:00
shugeo	6eaca179d6	Implement divide_no_nan op.	2019-10-03 18:22:17 +03:00
shugeo	130ee25682	Implemented compare_and_bitpack op.	2019-10-03 10:57:48 +03:00
shugeo	75ad3c8153	Fixed test names.	2019-10-02 19:05:26 +03:00
shugeo	a27e61553a	Added tests and fixed op name.	2019-10-02 15:04:28 +03:00
shugeo	1575c704ae	Added implementation for adjust_contrast_v2 op and tests.	2019-10-01 11:44:27 +03:00
shugeo	e06dfb5dcc	Implementation of adjust_contrast op.	2019-09-30 18:24:12 +03:00
raver119	78bca543a8	missed include for MklDnnTests run without mkldnn Signed-off-by: raver119 <raver119@gmail.com>	2019-09-12 10:49:01 +03:00
AlexDBlack	a66e03355e	Merge remote-tracking branch 'fork/master'	2019-09-12 12:20:57 +10:00
raver119	98e2814879	Platform helpers (#8216 ) * platform helpers draft Signed-off-by: raver119 <raver119@gmail.com> * typo Signed-off-by: raver119 <raver119@gmail.com> * disable platform cmake Signed-off-by: raver119 <raver119@gmail.com> * another draft Signed-off-by: raver119 <raver119@gmail.com> * mkldnn convolution refactored Signed-off-by: raver119 <raver119@gmail.com> * minor tweaks Signed-off-by: raver119 <raver119@gmail.com> * one more safety check Signed-off-by: raver119 <raver119@gmail.com> * prototype works Signed-off-by: raver119 <raver119@gmail.com> * meh Signed-off-by: raver119 <raver119@gmail.com> * force static library mode for mkldnn Signed-off-by: raver119 <raver119@gmail.com> * - ismax fix - experimental arg fix - don't enforce openblas on Apple hardware Signed-off-by: raver119 <raver119@gmail.com> * bunch of small fixes Signed-off-by: raver119@gmail.com <raver119@gmail.com> * declare concurrent Signed-off-by: raver119@gmail.com <raver119@gmail.com> * - MKLDNN version upgrade to 1.0.2 - avgpool2d/maxpool2d APIs update Signed-off-by: raver119 <raver119@gmail.com> * - avgpool2d_bp/maxpool2d_bp APIs update Signed-off-by: raver119 <raver119@gmail.com> * - conv2d/batchnorm APIs update Signed-off-by: raver119 <raver119@gmail.com> * - lrn/conv2d_bp/conv3d/conv3d_bp APIs update Signed-off-by: raver119 <raver119@gmail.com> * all ops converted to MKLDNN 1.x Signed-off-by: raver119 <raver119@gmail.com> * bunch of tweaks Signed-off-by: raver119 <raver119@gmail.com> * namespace for platform helpers Signed-off-by: raver119 <raver119@gmail.com> * make sure platform helpers aren't optimized out Signed-off-by: raver119 <raver119@gmail.com> * build cpu_features on x86 systems Signed-off-by: raver119 <raver119@gmail.com> * build cpu_features on x86 systems Signed-off-by: raver119 <raver119@gmail.com> * more of cpu_features Signed-off-by: raver119 <raver119@gmail.com> * - mkldnn removed from java - cpu_features checks in CpuNDArrayFactory Signed-off-by: raver119 <raver119@gmail.com> * F16C definition renamed Signed-off-by: raver119 <raver119@gmail.com> * some mkldnn rearrangements Signed-off-by: raver119 <raver119@gmail.com> * check supported instructions before doing anything Signed-off-by: raver119 <raver119@gmail.com> * typo Signed-off-by: raver119 <raver119@gmail.com> * missed impl Signed-off-by: raver119 <raver119@gmail.com> * BUILD_PIC option Signed-off-by: raver119 <raver119@gmail.com> * conv2d fix Signed-off-by: raver119 <raver119@gmail.com> * avgpool3d fix Signed-off-by: raver119 <raver119@gmail.com> * avgpool3d_bp fix Signed-off-by: raver119 <raver119@gmail.com> * avgpool2d_bp leak fix Signed-off-by: raver119 <raver119@gmail.com> * avgpool3d_bp leak fix Signed-off-by: raver119 <raver119@gmail.com> * maxpool bp leaks fixed Signed-off-by: raver119 <raver119@gmail.com> * printf removed Signed-off-by: raver119 <raver119@gmail.com> * batchnorm fix Signed-off-by: raver119 <raver119@gmail.com> * AVX warning/error polishing Signed-off-by: AlexDBlack <blacka101@gmail.com> * Fix Signed-off-by: AlexDBlack <blacka101@gmail.com> * More polish Signed-off-by: AlexDBlack <blacka101@gmail.com> * Polish Signed-off-by: AlexDBlack <blacka101@gmail.com> * remove previous MKL-DNN support layer Signed-off-by: raver119 <raver119@gmail.com> * avx2 tweak Signed-off-by: raver119 <raver119@gmail.com> * allow static for apple Signed-off-by: raver119@gmail.com <raver119@gmail.com> * exclude mkldnn in one more place Signed-off-by: raver119 <raver119@gmail.com> * exclude mkldnn in one more place Signed-off-by: raver119 <raver119@gmail.com> * restore OPENBLAS_PATH use Signed-off-by: raver119 <raver119@gmail.com> * add runtime check for avx/avx2 support Signed-off-by: raver119 <raver119@gmail.com> * convolution_auto Signed-off-by: raver119 <raver119@gmail.com> * Add logic for helper argument * minor test fix Signed-off-by: raver119 <raver119@gmail.com> * few tweaks Signed-off-by: raver119 <raver119@gmail.com> * few tweaks Signed-off-by: raver119 <raver119@gmail.com> * skip OpTracker props for non-x86 builds Signed-off-by: raver119 <raver119@gmail.com> * linux arm isn't x86 :) Signed-off-by: raver119 <raver119@gmail.com> * avx-512 Signed-off-by: raver119 <raver119@gmail.com> * CUDA presets fix Signed-off-by: raver119 <raver119@gmail.com> * BUILD_PIC Signed-off-by: raver119 <raver119@gmail.com> * prefetchw for avx2 Signed-off-by: raver119 <raver119@gmail.com> * BUILD_PIC again Signed-off-by: raver119 <raver119@gmail.com>	2019-09-11 21:50:28 +03:00
raver119	589401477d	[WIP] bunch of improvements (#257 ) * - profiling bias_add op - add some documentation Signed-off-by: Yurii <yurii@skymind.io> * - minor change Signed-off-by: Yurii <yurii@skymind.io> * - provide addBias cuda kernel Signed-off-by: Yurii <yurii@skymind.io> * - improve shape::getIndexOffset and change its signature Signed-off-by: Yurii <yurii@skymind.io> * - same as previous Signed-off-by: Yurii <yurii@skymind.io> * - improve and change signature in some shape:: stuff which has to do with calculation of offsets for array elements Signed-off-by: Yurii <yurii@skymind.io> * - minor changes in flatten Signed-off-by: Yurii <shyrma@skymind.io> * - add function shape::getIndexOffsetOrdered Signed-off-by: Yurii <shyrma@skymind.io> * - correct shape::getIndexOffsetOrdered() Signed-off-by: Yurii <shyrma@skymind.io> * - move getIndexOffsetOrdered to flatten.h header in order to isolate this function Signed-off-by: Yurii <shyrma@skymind.io>	2019-09-11 20:12:09 +03:00
raver119	1de9fb218e	- bits_hamming_distance dtype fix (#8208 ) - DataTypeUtils::asString fix + new dtypes added Signed-off-by: raver119 <raver119@gmail.com>	2019-09-06 08:59:05 +03:00
raver119	46f8c58502	- bits_hamming_distance dtype fix - DataTypeUtils::asString fix + new dtypes added Signed-off-by: raver119 <raver119@gmail.com>	2019-09-06 08:57:53 +03:00
Yves Quemener	d1e9b34982	libnd4j: Remove some unused declarations in unit tests (#8202 )	2019-09-05 15:04:36 +09:00


111 Commits