110 Commits (972fae60dc3afb39c08fcf05cb80ce2f170c1365)

Author SHA1 Message Date
raver119 972fae60dc
Update master (#8511)
* cleaned up bert iterator tests (#110)

Signed-off-by: eraly <susan.eraly@gmail.com>

* Various pre-release fixes (#111)

* Various fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix default dtypes for MaxPoolWithArgmax

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small pre-release tweak (#112)

* Log UI address on launch as in previous Play-based UI

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Logging level tweak for UI

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* http not https

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* datavec python ensure host (#113)

* ensure host

* one more host ensure

* info->debug

* [WIP] reverse improvements (#115)

* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* reverse draft

Signed-off-by: raver119 <raver119@gmail.com>

* reverse kernel

Signed-off-by: raver119 <raver119@gmail.com>

* reverse kernel

Signed-off-by: raver119 <raver119@gmail.com>

* 2 micro fixes

Signed-off-by: raver119 <raver119@gmail.com>

* Shugeo resize fix5 (#102)

* Refactored resize images ops to use TF-like bool args as input.

* Refactored helpers for cpu implementation of resize_bilinear and resize_nearest_neighbor ops.

* Refactored cuda implementation for image.resize_bilinear and image.resize_nearest_neighbor ops helpers.

* Refactored nearest_neighbor resize op.

* Added a pair of tests for special case of resize_bilinear algorithm.

* Fixed issue with resize_bilinear op.

* Refactored cpu implementation for helpers with resize_nearest_neighbor op.

* Final fixes for resize ops to conform to TF v.1.5

* Refactored cuda helpers for resize_nearest_neighbor op.

* Fixed resize_bilinear to accept proper data.

* Fixed issue with non-float input for resize_bilinear op.

* Refactored cuda helper for resize_bilinear to properly process non-float inputs.

* Added tests for resize_bilinear to int inputs.

* Fixed ResizeBilinear wrapper

* Tests fixed

* Fixed float and bool constant to avoid overflow for some kind of compilers.

* Corrected float constants with float data type.

* Added f suffix for float constants.

* Corrected float constant to avoid overflow with initializing lists.

* Corrected float initializing list with float input.

* Corrected bool constant with initializing list.

* Corrected float and bool values with initializing lists.

* Fixed wrong constant.

* Fixed issue with 1x1 input picture for resize.

* ResizeBilinear default values on import fix

Signed-off-by: raver119 <raver119@gmail.com>
2019-12-06 11:10:44 +03:00
Alex Black 578a5abb68 DNNL/MKLDNN dilated causal conv1d + betainc (#103)
* - add padding calculation in same mode in causal conv1d op for right mkl paddings

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct causal condition in mkldnnUtils.cpp

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct some code which caused additional rounding errors in betainc op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - put float in place of template parameter in nan assign in betainc op

Signed-off-by: Yurii <iuriish@yahoo.com>
2019-12-04 14:50:17 +03:00
shugeo 190575196c Refactored pad and mirror_pad ops to conform with TF. (#100) 2019-12-03 15:06:38 +03:00
Yurii Shyrma 1f5e15b541 Shyrma adjust (#98)
* - add possibility of passing scalar-array as input parameter for scale factor in adjust hue/contrast/saturation ops
- correct typo in function which calculates regularized incomplete beta integral

Signed-off-by: Yurii <iuriish@yahoo.com>

* - fix bug in betainc cuda kernel

Signed-off-by: Yurii <iuriish@yahoo.com>

* - start working on implementation of digamma function

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on digamma function (cpu)

Signed-off-by: Yurii <iuriish@yahoo.com>

* - testing and fixing bugs in digamma op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - make correction in cuda kernel for polyGamma

Signed-off-by: Yurii <iuriish@yahoo.com>

* - remove unnecessary stuff from betaInc cuda kernel

Signed-off-by: Yurii <iuriish@yahoo.com>

* - resolve conflicts in DeclarableOpsTests3.cpp after master branch has been merged

Signed-off-by: Yurii <iuriish@yahoo.com>

* - restore id number of Not operation in legacy_ops.h

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct padding calculation in mkl dnn conv1d causal

Signed-off-by: Yurii <iuriish@yahoo.com>

* restore empty check in adjust_contrast_v2

Signed-off-by: raver119 <raver119@gmail.com>
2019-12-03 09:40:45 +03:00
shugeo 1e9ff114aa Shugeo atomic tests (#97)
* Added atomic tests for atomicAdd, atomicSub and atomicDiv.

* Fixed atomicAdd for 16bit ints.

* Fixed atomicMul for 16 floats.

* Eliminated waste prints.

* Fixed problems with double type on matrix inverse helpers.

* Eliminated commented wrong code.

* Refactored atomicMul for 16bit types.

* few more minor tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed fake_quant_with_min_max_vars_per_channel args processing.
2019-12-02 21:40:54 +03:00
raver119 25b3cd9b80
[WIP] CUDA tests (#95)
* one more CI test

Signed-off-by: raver119 <raver119@gmail.com>

* export additional symbols

Signed-off-by: raver119 <raver119@gmail.com>

* few more tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* one more tweak for linux

Signed-off-by: raver119 <raver119@gmail.com>

* fix dtype in few tests

Signed-off-by: raver119 <raver119@gmail.com>

* missing sync and memset in couple of tests

Signed-off-by: raver119 <raver119@gmail.com>

* copy step for libnd4j cuda

Signed-off-by: raver119 <raver119@gmail.com>

* no-op on empty for adjust hue/contrast/saturation

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA_VERBOSE Off

Signed-off-by: raver119 <raver119@gmail.com>

* BroadcastBool fix + few tests

Signed-off-by: raver119 <raver119@gmail.com>

* trigger jenkins

Signed-off-by: raver119 <raver119@gmail.com>

* trigger jenkins

Signed-off-by: raver119 <raver119@gmail.com>

* - ignore couple of warnings
- remove redundant compiler options

Signed-off-by: raver119 <raver119@gmail.com>
2019-12-02 21:37:21 +03:00
raver119 4ada65b384
[WIP] MSVC-related tests fixes (#88)
* fix narrowing down cast

Signed-off-by: raver119 <raver119@gmail.com>

* trigger jenkins

Signed-off-by: raver119 <raver119@gmail.com>

* few more fixes for MSVC and Windows

Signed-off-by: raver119 <raver119@gmail.com>

* few more fixes for MSVC and Windows

Signed-off-by: raver119 <raver119@gmail.com>

* few more fixes for MSVC and Windows

Signed-off-by: raver119 <raver119@gmail.com>

* few more fixes for MSVC and Windows

Signed-off-by: raver119 <raver119@gmail.com>

* few more tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* few more tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* few more tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* few more tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* few more tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* - few more tweaks
- tensormmul dtype validation

Signed-off-by: raver119 <raver119@gmail.com>

* - few more tweaks
- batched gemm dtype validation

Signed-off-by: raver119 <raver119@gmail.com>

* - few more tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* - few more tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* - few more tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* - few more tweaks

Signed-off-by: raver119 <raver119@gmail.com>
2019-11-30 16:02:07 +03:00
shugeo dc66a52bc7 [WIP] Shugeo release fixes4 (#91)
* Fixed fake_quant_with_min_max_vars op.

* Refactored bitcast op.

* bad linspace removed

Signed-off-by: raver119 <raver119@gmail.com>

* Corrected tests for bitcast op.

* Eliminated debug prints.

* one fix

Signed-off-by: raver119 <raver119@gmail.com>

* one fix

Signed-off-by: raver119 <raver119@gmail.com>

* Added a pair of comments.
2019-11-29 16:05:08 +03:00
Yurii Shyrma d19eeaec52 Shyrma causal conv1d (#90)
* - add causal mode of padding to convolutions

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add additional tests for causal conv1d

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add causal mode for cuda conv kernels

Signed-off-by: Yurii <iuriish@yahoo.com>

* Java side of Conv1D changes

Signed-off-by: raver119 <raver119@gmail.com>

* Add Conv1DDerivative op

Signed-off-by: Alex Black <blacka101@gmail.com>

* Causal Conv1D gradient checks

Signed-off-by: Alex Black <blacka101@gmail.com>

* Tweaks

Signed-off-by: Alex Black <blacka101@gmail.com>

* - add causal padding mode to conv2d_bp

Signed-off-by: Yurii <iuriish@yahoo.com>

* More thorough causal conv1d tests

Signed-off-by: Alex Black <blacka101@gmail.com>
2019-11-29 14:14:30 +03:00
shugeo 009007120b Shugeo_release_fixes3 (#81)
* Implementation for non_max_suppression_v3 was added. Initial version

* Added check for overcome threshold.

* Added definition for V3 method.

* java remapping for NonMaxSuppressionV3

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed proper processing of an empty output and test.

* Refactored op to less threshold data to float.

* Implemented cuda-based helper for non_max_suppression_v3 op.

* Fixed fake_quant_with_min_max_vars op.

* Fixed tests with float numbers.

* - assert now stops execution
- sortByKey/sortByValue now have input validation

Signed-off-by: raver119 <raver119@gmail.com>

* missing var

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed proper processing for zero max_size inputs.

* Refactored kernel callers.

* Fixed return statement for logdet op helper.

* Refactored unsorted segment SqrtN op.

* get back 8 tail bytes on CUDA

Signed-off-by: raver119 <raver119@gmail.com>

* Refactored segment prod ops and helpers for cuda and tests.

* Additional test.

* CudaWorkspace tests updated for 8 tail bytes

Signed-off-by: raver119 <raver119@gmail.com>

* special atomic test

Signed-off-by: raver119 <raver119@gmail.com>

* atomicMul/atomicDiv fix for 16bit values

Signed-off-by: raver119 <raver119@gmail.com>

* Eliminated waste prints.
2019-11-28 21:08:51 +03:00
Yurii Shyrma a8dd6713aa Shyrma scatter (#84)
* - improve performance of scatter (no lock) ops for 1D case

Signed-off-by: Yurii <iuriish@yahoo.com>

* - improve scatter lock op performance for 1D case

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add kernel for verification of input indices-array elements in scatter and scatter_nd ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide fast indices checking on cpu side for scatter and gather ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - apply corrections requested by pr reviewer

Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-26 20:29:09 +03:00
shugeo 4187190609 Shugeo release fix2 (#70)
* Corrected input checking and tests for bitcast op.

* Fixed an issue with non_max_suppression form generation and processing with score threshold given.

* Fixed bilinear resize kernel and tests.

* push for Serhii

Signed-off-by: raver119 <raver119@gmail.com>

* Added test for nearest_neighbor resize with int input.

* Added data type check for input/output match.

* Eliminate error in macros.

* Improved output message for type checking.

* Fixed input/output types for op.

* Eliminated waste logging.

* Refactored resize_bilinear helper for multithreading for cpu platform.

* Cosmetic changes only.

* Fixed error for string substitution.

* Skip test for cbow_batch with cuda.

* fix for resizeNearestNeighbor output dtype

Signed-off-by: raver119 <raver119@gmail.com>

* Refactored non_max_suppression helper.

* Refactored shape generation and input handling.

* Added additional test.
2019-11-22 22:42:44 +03:00
Yurii Shyrma 7a90a31cfb
Shyrma deconv3 (#69)
* - profiling cuda kernels for vol2col and im2col

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct addBias helper

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct mkl dilation formula and switch off mkl api for dilation deconvolutions

Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-21 21:17:30 +02:00
raver119 83cb0d9329
[WIP] Create and small fix (#67)
* - create op
- skip exec for empty inputs for non_max_suppression
- EmptyHandling idea

Signed-off-by: raver119 <raver119@gmail.com>

* Create op and mapping for it

Signed-off-by: raver119 <raver119@gmail.com>
2019-11-21 13:31:20 +03:00
shugeo dc0036f2c6
Shugeo image resize bicubic (#56)
* Added implementation files for image_resize and resize_bicubic ops.

* Image resize and image.resize_bicubic ops implementation. Initial revision.

* Finished with infrastructure development for image.resize_bilinear op and image_resize op implementation.

* Refactored resize methods.

* Added processing for Mitchellcubic algorithm.

* Added check for input/output sizes.

* Added int and float types for crop_and_resize op.

* Refactored crop_and_resize output type check.

* Added helper for bicubic interpolation as TF v.1 does.

* Added TF v.1 bicubic helper for cuda platform.

* Added cached class for bicubic algorithm.

* Refactored cuda implementation for crop_and_resize helper to use proper output type.

* Added facilities for bicubic interpolation.

* Portion bicubic interpolation from TF.

* Added tests for resize_bilinear testing.

* Working implementation of bicubic interpolation and tests.

* Refactored routines with image_resize bicubic op helper.

* Refactored code with coding standards.

* Refactored cpu helpers for resize_bicubic op.

* Refactored bicubic helpers.

* Added bicubic resize facilities.

* Implementing cuda kernels for bicubic interpolation. Implementation step.

* Cuda implementation of resize_bicubic op helper.

* Refactor image.resize_bicubic op helpers.

* Refactored helpers for resize_bicubic. Added error checking with cuda implementation.

* Refactored cuda implementation of resize_bicubic op helper. The first working revision.

* Cuda arch implementation for resize_bicubic op helper. Full working single-threaded revision.

* Intermediate bicubic interpolation helper for cuda.

* Refactored cpu helper for resize_bicubic.

* Multithreaded cuda implementation for resize_bicubic.

* Fixed merge issues.

* Refactored nlp helpers.

* Replicated resize_bicubic for 3D also.

* Eliminated waste comments of unused code.

* Eliminated waste comments with unused code.

* Eliminated waste template definitions.

* Eliminated waste debug code.

* Eliminated waste comments.

* Fixed multithreading with helpers.

* Fixed test suites for float and double in float point input lists.

* Fixed usage of reshape with 3D/4D on resizes.

* Final fixes.

* Fixed resize_neighbor op problem.
2019-11-20 21:11:04 +02:00
shugeo 13e5c0a280
Shugeo release fix1 (#61)
* Added a pair of tests for failed ops.

* Fixed cpu helper for draw_bounding_boxes op.

* Refactored implementation of draw_bounding_boxes op to fully conform with TF.

* Improved multithreading with draw_bounding_boxes op cuda helper.

* Eliminated log messages.

* Changed logging with draw_bounding_boxes op helper and tests.

* Resize_bilinear with 3D input allowed.

* Refactored 3D input acceptance with resize_bilinear op.

* And another improvement.

* Refactored reshape of input/output for resize_bilinear.

* Improvements final.

* Finished with 3D replication for image.resize_bilinear/_nearest_neighbor.

* Added copyrights for TF code.

* Using new form of multithreading for cpu implementation.

* Fixed shape error.

* Added multithreaded with batches on crop_and_resize functor.

* Refactored multithreading with crop_and_resize and draw_bounding_boxes.
2019-11-20 13:37:48 +02:00
raver119 7898f3c0cc
fix for is_increasing/non_decreasing ops for empty input case (#63)
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-20 11:12:15 +03:00
Alex Black da1944e8e1
SameDiff TF import (#49)
* Added implementation files for image_resize and resize_bicubic ops.

* Image resize and image.resize_bicubic ops implementation. Initial revision.

* Minor fix

* Some TF imports disabled.

* Finished with infrastructure development for image.resize_bilinear op and image_resize op implementation.

* Refactored resize methods.

* Added processing for Mitchellcubic algorithm.

* adjust_contrast

* Small fix for TF import expected value loading when variable name starts with the test name

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Tests

* Tests added.

* Removed tf names absent in mapping.

* Some fixes.

* Small fixes

* Minor change

* Some failing tests.

* Disable failed test

* Ignore some tests

* Fix import class mapping

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix float property mapping (flatbuffers)

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Override equality function for model 'dropout'

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fail tests

* Failed tests ignored temporarily.

* Minor fixes

* Small fix

* Conflict resolved

* Default implementations of tensorflowName and onnxName
2019-11-19 22:44:29 +11:00
raver119 bbd59a3537
fake quant dtype validation fix (#60)
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-19 12:53:52 +03:00
raver119 1780dcc883
[WIP] Small fixes here and there (#50)
* one range test

Signed-off-by: raver119 <raver119@gmail.com>

* few Context convenience signatures

Signed-off-by: raver119 <raver119@gmail.com>

* one more range test

Signed-off-by: raver119 <raver119@gmail.com>

* "range" "fix"

Signed-off-by: raver119 <raver119@gmail.com>

* adjust_contrast_v2 now allows scale factor to be provided via input_variable

Signed-off-by: raver119 <raver119@gmail.com>

* adjust_contrast now allows scale factor as variable too

Signed-off-by: raver119 <raver119@gmail.com>

* bitcast shape tests

Signed-off-by: raver119 <raver119@gmail.com>

* BitCast import dtype added

Signed-off-by: raver119 <raver119@gmail.com>

* few more BitCast signatures

Signed-off-by: raver119 <raver119@gmail.com>
2019-11-15 17:04:29 +03:00
raver119 48df1acdfb
[WIP] ThreadPool (#8)
This PR removes OpenMP use in 95% of cases
2019-11-13 17:04:59 +03:00
Alex Black 18c01f5bdc
Add SameDiff memory reuse memory manager (array cache) (#39)
* Attention op comments

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* ArrayCacheMemoryMgr - first pass

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Tweak array cache for use with SameDiff identity arrays

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* ArrayCacheMemoryMgr javadoc and properly get max memory

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* LRU cache policy + add tests

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Resize arrays internally if required for ArrayCacheMemoryMgr

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Test improvement

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small polish

Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-11-12 21:15:44 +11:00
Yurii Shyrma 0eda1e733e Shyrma bnorm bp (#41)
Batchnorm backprop mkldnn
2019-11-12 11:58:48 +03:00
raver119 929c1dc5c7 - new NDArrayFactory scalar constructor
- minor tweak in randomuniform
- one more test

Signed-off-by: raver119 <raver119@gmail.com>
2019-11-08 08:49:41 +03:00
raver119 51f3a1371d
[WIP] Random Uniform (#36)
* args

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* T args

Signed-off-by: raver119 <raver119@gmail.com>
2019-11-07 17:09:47 +03:00
shugeo 679e42199a Shugeo strided slice bp fix2 (#33)
* Fixed crash and restored broken functionality for strided slice.

* Added comments for strided_slice_bp main step.
2019-11-07 13:44:02 +03:00
shugeo 08853c7829
Shugeo random uniform int (#30)
* Corrected randomuniform declaration.

* Refactored uniform distribution for both cuda and cpu platforms.

* Refactored uniform distribution and tests.

* Fixed type usage with indices.

* Refactored uniform distribution implementation and tests to fully conform with TF implementation.

* Refactored gamma function to use type util method.

* Copyright changes and fixes with ConstantHelper.

* Added error checking on allocate cuda device memory and operations.
2019-11-06 12:49:27 +02:00
shugeo 9124974e3b
Fixed crash with strided_slice_bp op and tests. (#29) 2019-11-05 12:49:15 +02:00
shugeo 7b14a9f603
Gamma and Poisson distributions (#27)
* Added implementation for random_gamma op.

* Added implementation for random_poisson op and support classes.

* Added helpers for random_poisson and random_gamma ops.

* Implementation of random_poisson. The first working edition.

* Implementation of random_poisson. Parallelized working edition.

* Implementation of random_gamma. Parallelized working edition with alpha only.

* Added cuda implementation for helper of poisson distribution.

* Corrected shape calculation with random_gamma and tests.

* Finished cpu implementation for gamma distribution.

* Finished cuda implementation for random_gamma op.

* Refactored cpu helpers for random_gamma and random_poisson ops.

* Refactored cuda helpers for gamma and poisson distribution.

* Refactored cuda helper for gamma distribution.

* Refactored cpu helper for random_poisson op.

* Refactored cpu helper for random_gamma op.
2019-11-04 15:42:28 +02:00
Alex Black 948ebef41c
Op Fixes (#28)
* #8280 biasadd_bp nchw arg fixes (java side) + test

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8285 Concat op Java side fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Concat op cpp fix - allow dynamic axis to be negative, same as static axis

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* ignores for deconv3d import tests until deconv3d_tf op is implemented

Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-11-05 00:05:04 +11:00
Yurii Shyrma 0cdb5750e0
Shyrma concat (#24)
* - provide possibility to pass axis as last input array in concat op
- correct summation in bias_add_bp op for NHWC case

Signed-off-by: Yurii <iuriish@yahoo.com>

* - write code for deconv2d op based on mkl dnn api

* no unsafe math

Signed-off-by: raver119 <raver119@gmail.com>

* no unsafe math

Signed-off-by: raver119 <raver119@gmail.com>

* - get rid of e<> and p<> methods in svd helper

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide mkl api support for deconvolution 3d

Signed-off-by: Yurii <iuriish@yahoo.com>

* - write deconv2d_bp based on mkl api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - write deconv3d_bp based on mkl api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - testing and fixing deconv based on mkl api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - remove dilation from conv2d/3d mkl

Signed-off-by: Yurii <iuriish@yahoo.com>

* - minor changes

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further corrections of deconv ops based on mkl dnn api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide deconv2d_tf based on mkl dnn api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add minor corrections required by reviewer

Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-03 12:37:19 +02:00
shugeo 95f7ad7b94
Shugeo suppression overlaps (#9)
* Added non_max_suppression_overlaps op and tests.

* Refactored implementation of non_max_suppression_overlaps.

* Refactoring of implementation of non_max_suppression_overlaps op.

* Refactoring of implementation of non_max_suppression op.

* Fixed portion error.

* Added cuda frontends for image suppression ops.

* Eliminated crash with cuda arch on image.non_max_suppression_overlaps op.

* Improved implementation of image_suppression helper for cpu platform.

* The generic approach of non_max_suppression_overlaps op helper with cuda platform.

* Working cuda implementation of helper non_max_suppression_overlaps op.

* Eliminated waste comments.

* Improved implementations for both platforms

* Refactored cuda implementation of image.non_max_suppression_overlaps op helper.

* Improved cuda implementation of non_max_suppression op helper.

* Refactored cuda implementation of image.non_max_suppression_overlaps op helper.

* Improved cuda implementation of image.non_max_suppression_overlaps op helper.

* Added modifications into cuda implementation for image suppression overlaps op.

* Correct queue emulation with cuda implementation of non_max_suppression_overlaps op.

* Prefinal stage of cuda implementation of non_max_suppression_overlaps.

* Working cuda implementation of non_max_suppression_overlaps helper.

* Fixed return to proper thread.

* Improvements for cuda implementation of image.non_max_suppression_overlaps op helper.

* Fixed implementation issues with non_max_suppression_overlaps on cuda platform.

* Fixed skip for non_max_suppression_overlaps on cuda platform.

* Finalize implementation of image_suppression helper and tests.

* Cosmetic changes only.
2019-10-30 13:43:45 +02:00
Yurii Shyrma 029a69a835
Shyrma bn mkl bp (#14)
* - write code for new batchnorm backprop

Signed-off-by: Yurii <iuriish@yahoo.com>

* - testing batchnorm backprop

Signed-off-by: Yurii <iuriish@yahoo.com>

* - write code for batchnorm backprop based on mkl dnn api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - testing and fixing bugs in batchnorm_bp mkl dnn

Signed-off-by: Yurii <iuriish@yahoo.com>

* - made corrections required by reviewer

Signed-off-by: Yurii <iuriish@yahoo.com>

* - change name in java wrapper for batchnorm op

Signed-off-by: Yurii <iuriish@yahoo.com>
2019-10-26 14:14:21 +03:00
Alex Black d333d29099
SameDiff cleanup and fixes (#12)
* #8160 Remove resolvePropertiesFromSameDiffBeforeExecution

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* SameDiff API cleanup

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* More SameDiff cleanup

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8248 Switch SameDiff variable init from lazy to creation time for more predictable behaviour

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8252 TanhDerivative javadoc

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8225 Deconvolution2D input validation

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8265 Switch SameDiff.outputs() to user settable, instead of unreliable 'best guess'

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8224 SameDiff.zero and .one create constants, not variables

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* More cleanup and fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small test fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* DL4J SameDiff fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Re-add hack for Deconvolution2DLayer until #8315 is resolved

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* #8270 Move CUDA device/version logging to Java; can be disabled via existing org.nd4j.log.initialization system property

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* All ND4J init logging checks system property

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small tweak

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Remove redundant device logging

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* One more fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* UX improvements

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Deconv fix

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Add deconv tests

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Cleanup

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Remove debug code

Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-10-26 12:38:08 +11:00
Alexander Stoyakin f31661e13b
Merge pull request #7 from KonduitAI/asto_nd4s_10172019
KDTree optimization
2019-10-23 12:11:25 +03:00
Yurii 70bd925abd - write 2 versions of new lstmLayer: one is based on own code, second uses mkl dnn api 2019-10-17 20:44:52 +03:00
Alexander Stoyakin 630bb3c9b6
Merge pull request #2 from KonduitAI/asto_ops_wrapper
[WIP] New ops wrapper
2019-10-16 20:21:50 +03:00
Alexander Stoyakin 96a9a1a733 Fixed output from operation. 2019-10-16 18:07:52 +03:00
shugeo 478a0c1f97 Added igamma and igammac broadcastable ops implementations and tests. 2019-10-16 14:02:53 +03:00
shugeo 92636b0b86 Eliminated waste operator. 2019-10-10 17:08:59 +03:00
shugeo 02d8616692 Implementation of cuda kernel for fake_quant_with_min_max_vars_per_channel op. 2019-10-10 16:40:56 +03:00
shugeo d0cbd33b0e Added input checks for op. 2019-10-09 15:52:13 +03:00
shugeo cb56b0b06a The first approach for fake_quant_with_min_max_vars_per_channel op implementation. 2019-10-08 19:00:41 +03:00
shugeo 53a2ebddbe Added test and helpers for draw_bounding_boxes op both cpu and cuda related. 2019-10-04 20:46:26 +03:00
shugeo 8f70b4441f draw_bounding_boxes op implementation. Initial revision. 2019-10-04 18:32:21 +03:00
shugeo 908e4c4912 Added implementation for divide_no_nan op and tests. 2019-10-04 10:29:15 +03:00
shugeo 130ee25682 Implemented compare_and_bitpack op. 2019-10-03 10:57:48 +03:00
shugeo f3e42173ef Refactored buffer copying to avoid wrong usage of buffers. 2019-10-02 16:51:09 +03:00
shugeo 1c6173d218 Added implementation of bitcast op. 2019-10-02 15:04:59 +03:00
shugeo afeb524238 Refactored implementation for adjust_contrast ops. 2019-10-01 14:13:09 +03:00