* Libnd4j: Add broadcastable elementwise power derivative #7461 first step of Pow_bp operation implementation
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: Add broadcastable elementwise power derivative #7461 some corrections of calculation steps
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: Add broadcastable elementwise power derivative #7461 some bug fixes, the PowDerivative op made broadcastable, add the raw tests for op, need refactoring to use broadcast ops
* Libnd4j: Add broadcastable elementwise power derivative #7461 fixed several bugs add broadcast support and tests, need to fix scalar+array and array+scalar
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: Add broadcastable elementwise power derivative #7461 fixed bugs for scalar inputs, fixed multinomial tests, added tests
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: Add broadcastable elementwise power derivative #7461 fixed bugs for different shapes support, tests updated
* Libnd4j: Add broadcastable elementwise power derivative #7461 applied all possible variants via tiled arrays, add support of broadcast for Pow and PowDerivative ops, covered by tests, before review have to be replaced tiled implementation by applyTrueBroadcast
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: Add broadcastable elementwise power derivative #7461 replaced tile by broadcast implementation, fixed issue with negative x input, corrected tests, need additional testing
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: Add broadcastable elementwise power derivative #7461 added and corrected test cases, corrected implementation need review
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: Add broadcastable elementwise power derivative #7461 code clean up
* Libnd4j: Add broadcastable elementwise power derivative #7461 code clean up, removed some tests, add tests with scalar
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: Add broadcastable elementwise power derivative #7461 code improvement and clean up, split tests
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: Add broadcastable elementwise power derivative #7461 some code clean up
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: Add broadcastable elementwise power derivative replace __isnanf by internal realization
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* pow_bp wrapper
* Fixed PowBp wrapper
* Tests added
* Test fixed
* Fix return type
* Disable powBp usage
* Pow backprop changed
Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
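
Note on the pow_bp commits above: a minimal sketch of what a broadcastable power derivative computes, assuming z = x ** y with an array x and a scalar y that is broadcast against it. The function name `pow_bp` and its list-based signature are illustrative only and are not the libnd4j API. Skipping the log term for non-positive x is an assumption made here, not necessarily what the op does for negative inputs.

```python
import math

def pow_bp(x, y, grad_out):
    """Gradients of z = x ** y for a list x and a scalar exponent y."""
    # dz/dx = y * x ** (y - 1), applied elementwise.
    grad_x = [g * y * xi ** (y - 1) for xi, g in zip(x, grad_out)]
    # dz/dy = x ** y * ln(x). The scalar y was broadcast over x, so its
    # gradient is the sum over every position it was broadcast to.
    # Assumption: elements with x <= 0 contribute nothing to grad_y.
    grad_y = sum(g * xi ** y * math.log(xi)
                 for xi, g in zip(x, grad_out) if xi > 0)
    return grad_x, grad_y
```

The reduction over the broadcast positions is the step that distinguishes the broadcastable case from a plain elementwise derivative.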
* - implementation of depthwise_conv2d (both ff/bp) based on mkl dnn api
* - minor corrections in deconv3d
Signed-off-by: Yurii <iuriish@yahoo.com>
* - remove unnecessary time test
Signed-off-by: Yurii <iuriish@yahoo.com>
* - update mkl dnn version in cmake
Signed-off-by: Yurii <iuriish@yahoo.com>
* - take into account several notes given by pr reviewer
Signed-off-by: Yurii <iuriish@yahoo.com>
* - fix bug in depthwise conv2d op based on mkl
Signed-off-by: Yurii <iuriish@yahoo.com>
* libnd4j: Multinomial op #8570 first raw step of multinomial random data generator implementation
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op #8570 next step of multinomial random categories generator implementation on both cpu and cuda, need corrections and code clean up before review and testing
* libnd4j: Multinomial op #8570 code clean up and fixed issues data selecting, moved from coords to tads
* libnd4j: Multinomial op #8570 fixed cuda build add reference for math materials that was used for implementation
* libnd4j: Multinomial op #8570 fixed several bugs, added several tests and improved cuda version. current implementation works, need testing of reproduction with the same seed
* libnd4j: Multinomial op #8570 fixes and optimization after discussion in both cuda and cpu
* libnd4j: Multinomial op #8570 add corrections after review, removed tads, replace 2D parallel loop by 3D
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op fixed declaration and add tests need discussion
* libnd4j: Multinomial op fix in test
* libnd4j: Multinomial op corrected behavior to get reproducible results, fixed issue in uniform value getting, tests added, need cuda review and cuda testing
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op fixed indexing on uniform calculation
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op some corrections in max min declaration
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op fixed index calculation, added rewind, corrected input declaration, added stats tests, both cuda and cpu. cuda need testing
* libnd4j: Multinomial op fixed bugs on cuda and cpu. need review
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op corrected tests to handle different orders
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op some improvements after code review
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op more corrections after review
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op fixed seed usage, update tests, fixed cuda based on comments, fixed bug of rewind, removed one behavior, minor corrections.
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op minor corrections
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op raise the bound of fluctuation for random cases
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: Multinomial op modified operation inputs and update implementation and tests on both cpu and cuda
* libnd4j: Multinomial op corrected data types according ops.proto
Co-authored-by: raver119 <raver119@gmail.com>
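
Note on the Multinomial commits above: a rough sketch of drawing category indices from unnormalized log-probabilities with a seeded uniform generator, so that the same seed reproduces the same samples. The inverse-CDF method, the function name `multinomial` and its arguments are assumptions for illustration; the commits do not state which sampling scheme the op uses.

```python
import math
import random

def multinomial(logits, num_samples, seed):
    """Draw num_samples category indices from unnormalized log-probabilities."""
    rng = random.Random(seed)          # same seed -> same sequence of draws
    peak = max(logits)                 # subtract the max for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    samples = []
    for _ in range(num_samples):
        u = rng.random() * total       # uniform value in [0, total)
        acc = 0.0
        chosen = len(weights) - 1      # fallback guards against rounding at the top end
        for idx, w in enumerate(weights):
            acc += w
            if u < acc:
                chosen = idx
                break
        samples.append(chosen)
    return samples
```

Calling it twice with the same seed returns identical lists, which is the property the reproducibility tests mentioned above would check.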
* libnd4j: RgbToYuv and YuvToRgb, both implementations for both cpu and cuda. Need adding tests and review
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: RgbToYuv and YuvToRgb, replace coords method on Tad in both cpu and cuda, add tests, fixed bugs
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: RgbToYuv and YuvToRgb minor corrections
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: RgbToYuv and YuvToRgb corrections to use operations in-place
* one special test
Signed-off-by: raver119 <raver119@gmail.com>
* one special test
Signed-off-by: raver119 <raver119@gmail.com>
* local memory for concat
Signed-off-by: raver119 <raver119@gmail.com>
* fixed grid size for concat
Signed-off-by: raver119 <raver119@gmail.com>
* fixed grid size for concat
Signed-off-by: raver119 <raver119@gmail.com>
* test commented out
Signed-off-by: raver119 <raver119@gmail.com>
* - specifying template instantiation for certain types in float16 and bfloat16
Signed-off-by: Yurii <iuriish@yahoo.com>
* - polishing bfloat16 and float16 member functions template specialization
Signed-off-by: Yurii <iuriish@yahoo.com>
* - rewrite and overload array +-*/ scalar and scalar +-*/ arr in NDArray class
Signed-off-by: Yurii <iuriish@yahoo.com>
* - make corrections related to rvalue and lvalue conversions
Signed-off-by: Yurii <iuriish@yahoo.com>
* - provide move semantic in NDArray operators array +-/* array
Signed-off-by: Yurii <iuriish@yahoo.com>
* float16/bfloat16 tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* one more tweak
Signed-off-by: raver119 <raver119@gmail.com>
* - make float16 and bfloat16 to compile successfully on cuda
Signed-off-by: Yurii <iuriish@yahoo.com>
* - do not use resources of view-like arrays when move semantics is applied
Signed-off-by: Yurii <iuriish@yahoo.com>
* - get rid of pointers in signatures NDArray methods 1
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correction of signature of NDArray::dup method
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correction of signature of NDArray::reduceAlongDimension method
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::applyIndexReduce and applyTrueBroadcast methods
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::applyReduce3 and varianceAlongDimension methods
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::tensorsAlongDimension and diagonal methods
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::allTensorsAlongDimension
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::reduceAlongDimension 2
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::applyTransform 2
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::applyPairwiseTransform 2
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::applyBroadcast 2
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::applyTrueBroadcast 2
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::applyScalar and applyScalarArr
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::lambda methods
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::reduce3 methods 2
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of following NDArray methods: add/sub/mul/div row/column and fillAsTriangular
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::tileToShape methods
Signed-off-by: Yurii <iuriish@yahoo.com>
* - signature correction of NDArray::isShapeSameStrict method
Signed-off-by: Yurii <iuriish@yahoo.com>
* minor corrections in tests
Signed-off-by: Yurii <iuriish@yahoo.com>
* - replace reduce op in batchnorm mkldnn
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add explicit template instantiations for operator+(NDArray&&, const scalar)
Signed-off-by: Yurii <iuriish@yahoo.com>
* - corrections of casts in float16/bfloat16
Signed-off-by: Yurii <iuriish@yahoo.com>
* - provide move semantics in following NDArray methods: transform, applyTrueBroadcast, transpose, reshape, permute
Signed-off-by: Yurii <iuriish@yahoo.com>
* - get rid of input array A duplicate in svd cuda op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - avoid available bug in svd cuda API
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add temporary global memory buffer in svd cuda when calcUV = false and m != n
Signed-off-by: Yurii <iuriish@yahoo.com>
* - remove test with bfloat16 type for betainC
Signed-off-by: Yurii <iuriish@yahoo.com>
* - resolve conflicts after master has been merged in
Signed-off-by: Yurii <iuriish@yahoo.com>
* - changed type of affected input array in fused_batch_norm
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add several explicit type castings
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add ND4J_EXPORT to operators
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add explicit template types in instantiations of template arithm operators of NDArray class
Signed-off-by: Yurii <iuriish@yahoo.com>
* - one more test fix
Signed-off-by: Yurii <iuriish@yahoo.com>
Co-authored-by: raver119 <raver119@gmail.com>
* libnd4j: RgbToGrayscale op #8536 - raw implementation in user branch, need checks for integration and adding other orders
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: RgbToGrayscale op #8536 next step of merging images
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: RgbToGrayscale op #8536, reverted merge of hsv_to_rgb and rgb_to_hsv as it causes naming conflicts and needs refactoring before merge, implementation of rgb_to_grs added
* libnd4j: RgbToGrayscale op #8536 implementation and conflict resolve
* libnd4j: RgbToGrayscale op #8536 merged operations with images into image, renamed methods and files
* libnd4j: RgbToGrayscale op #8536 added test for rgbToGrayScale, need clarification and fixed tests case run
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j: RgbToGrayscale op #8536 bug fixing and need review
* libnd4j: RgbToGrayscale op #8536 some additional corrections after review
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* - minor corrections in rgbToGrs test1
Signed-off-by: Yurii <iuriish@yahoo.com>
* libnd4j: RgbToGrayscale op #8536, corrected tests and rgb_to_grs, fixed problems, refactoring, need review
* libnd4j: RgbToGrayscale op #8536 fix for 'f' order in rgbToGrs
* libnd4j: RgbToGrayscale op #8536 fixed several bugs with dimC, test case refactoring and improve
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* - add cuda kernel for rgbToGrs op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - fix linkage errors
Signed-off-by: Yurii <iuriish@yahoo.com>
Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>
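
Note on the RgbToGrayscale commits above: a small sketch of an RGB to grayscale conversion as a weighted sum over the channel dimension. The weights 0.2989, 0.5870 and 0.1140 are the commonly used luma coefficients and are an assumption here; the commits do not list the constants the op uses. The function name and the list-of-tuples layout are illustrative only.

```python
def rgb_to_grayscale(pixels):
    """Convert a list of (r, g, b) tuples to single-channel luma values."""
    # Assumed luma weights, not taken from the commits above.
    r_w, g_w, b_w = 0.2989, 0.5870, 0.1140
    return [r_w * r + g_w * g + b_w * b for r, g, b in pixels]
```

In the real op the channel dimension (dimC in the commits) can sit at different positions and the array can be in 'c' or 'f' order; this sketch fixes the channels as the last axis.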
* Added infrastructure for implementation op lu for both cuda and cpu platforms.
* Added implementation of helpers with lu op.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Refactored LU decomposition to use vector of permutations instead.
* Refactored helpers for lu op.
* Fixed crash with determinant op.
* Refactored cpu LU op helper.
* Added implementation for lu op.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Fixed issue with argmax on column.
* Added multithreaded behaviour for lu op helper.
* Fixed multithreaded cpu implementation helpers for lu op.
* Added cuda implementation for lu op helper.
* Finished lu helper implementation for cuda platform.
* Eliminated waste prints and comments.
* Fixed race condition and multithreading issues.
* Fixed memory leak with shape construction.
* Corrected test for lu op to avoid near zero elements on the main diagonal.
Signed-off-by: shugeo <sgazeos@gmail.com>
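
Note on the lu op commits above: a minimal sketch of LU decomposition with partial pivoting that returns a permutation vector rather than a permutation matrix, picking each pivot as the largest-magnitude entry in the current column. The function name `lu_decompose` and the compact storage (L below the diagonal with an implicit unit diagonal, U on and above it) are assumptions for illustration, not the libnd4j helper.

```python
def lu_decompose(matrix):
    """Return (lu, perm): compact LU factors and the row permutation vector."""
    n = len(matrix)
    lu = [list(map(float, row)) for row in matrix]
    perm = list(range(n))
    for col in range(n):
        # Partial pivoting: pick the row with the largest magnitude in this column.
        pivot = max(range(col, n), key=lambda r: abs(lu[r][col]))
        if lu[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != col:
            lu[col], lu[pivot] = lu[pivot], lu[col]
            perm[col], perm[pivot] = perm[pivot], perm[col]
        for row in range(col + 1, n):
            lu[row][col] /= lu[col][col]      # multiplier stored in the L part
            for k in range(col + 1, n):
                lu[row][k] -= lu[row][col] * lu[col][k]
    return lu, perm
```

Row i of the factored product L*U equals row perm[i] of the input, so the vector carries the same information as a full permutation matrix at a fraction of the storage.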
* Improved test for adjust_contrast op.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Fixed issues with cuda implementation of resize_bicubic helpers.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Expanding allowed paddings type to 64bit ints also.
* Extended to int64 paddings data types for mirror_pad op.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Timeouts added
* Added some ops
* Ops added
* Fixed tests
* Minor fix
* Some fixes
* Digamma added
* Small fixes
* Fused batch norm fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Tests switched off.
* Added test for resize_bicubic.
* Eliminated waste in test of bicubic resize.
* Switched off multithreading explicit.
* HsvToRgb and RgbToHsv added
* Eliminated waste comments and conform proper float constants.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Fixed multithreading with resize_bicubic helper for cpu platform.
Signed-off-by: shugeo <sgazeos@gmail.com>
* ResizeBicubic was fixed.
* Some fixes
* Fix op name
* Validation fixed.
* Clarifications for tests
* Wrappers and small fixes for new ops.
* CUDA implementation for hsv_to_rgb and rgb_to_hsv
Signed-off-by: raver119 <raver119@gmail.com>
* hsv_to_rgb and rgb_to_hsv operations
Test coverage: c order 1d, 2d, 3d array
Signed-off-by: Abdelrauf <rauf@konduit.ai>
* Index check
Signed-off-by: Abdelrauf <rauf@konduit.ai>
* Suppress Msvc floating point errors
Signed-off-by: Abdelrauf <rauf@konduit.ai>
* Added Index Check for adjust_saturation and adjust_hue
Signed-off-by: Abdelrauf <rauf@konduit.ai>
* minor fix
Signed-off-by: raver119 <raver119@gmail.com>
* Fixes missed Msvc floating narrowing errors
Signed-off-by: Abdelrauf <rauf@konduit.ai>
* IndexReduce and Reduce3 split into few units
Signed-off-by: raver119 <raver119@gmail.com>
* IndexReductionLoops split as well
Signed-off-by: raver119 <raver119@gmail.com>
* reduce_float split as well
Signed-off-by: raver119 <raver119@gmail.com>
* working prototype of new CUDA build with cmake
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of older stuff
Signed-off-by: raver119 <raver119@gmail.com>
* remove legacy CUDA debug section
Signed-off-by: raver119 <raver119@gmail.com>
* fPIC for GCC
Signed-off-by: raver119 <raver119@gmail.com>
* - switch to /MD
- make MSVC runtime lib configurable from 1 place
Signed-off-by: raver119 <raver119@gmail.com>
* few last tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* make static library optional
Signed-off-by: raver119 <raver119@gmail.com>
* typo fixed
Signed-off-by: raver119 <raver119@gmail.com>
* resize_bicubic: allow more dtypes
Signed-off-by: raver119 <raver119@gmail.com>
* resize_bicubic: allow fewer dtypes
Signed-off-by: raver119 <raver119@gmail.com>
* Refactored resize_bicubic op to fully conform with TF1.5 and tests.
* Corrected test to proper data type output.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Corrected double input test to float constant outputs.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Finished with correction of tests for bicubic interpolated resizes expected.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Fixed adjust_contrast ops to allow non-RGB inputs.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Refactored adjust_contrast_v2 to conform with TF one.
Signed-off-by: shugeo <sgazeos@gmail.com>
* AdjustContrast tests activated
* two typos fixed
Signed-off-by: raver119 <raver119@gmail.com>
* cleaned up bert iterator tests (#110)
Signed-off-by: eraly <susan.eraly@gmail.com>
* Various pre-release fixes (#111)
* Various fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix default dtypes for MaxPoolWithArgmax
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small pre-release tweak (#112)
* Log UI address on launch as in previous Play-based UI
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Logging level tweak for UI
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* http not https
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* datavec python ensure host (#113)
* ensure host
* one more host ensure
* info->debug
* [WIP] reverse improvements (#115)
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* reverse draft
Signed-off-by: raver119 <raver119@gmail.com>
* reverse kernel
Signed-off-by: raver119 <raver119@gmail.com>
* reverse kernel
Signed-off-by: raver119 <raver119@gmail.com>
* 2 micro fixes
Signed-off-by: raver119 <raver119@gmail.com>
* Shugeo resize fix5 (#102)
* Refactored resize images ops to use TF-like bool args as input.
* Refactored helpers for cpu implementation of resize_bilinear and resize_nearest_neighbor ops.
* Refactored cuda implementation for image.resize_bilinear and image.resize_nearest_neighbor ops helpers.
* Refactored nearest_neighbor resize op.
* Added a pair of tests for special case of resize_bilinear algorithm.
* Fixed issue with resize_bilinear op.
* Refactored cpu implementation for helpers with resize_nearest_neighbor op.
* Final fixes for resize ops to conform to TF v.1.5
* Refactored cuda helpers for resize_nearest_neighbor op.
* Fixed resize_bilinear to accept proper data.
* Fixed issue with non-float input for resize_bilinear op.
* Refactored cuda helper for resize_bilinear to properly process non-float inputs.
* Added tests for resize_bilinear to int inputs.
* Fixed ResizeBilinear wrapper
* Tests fixed
* Fixed float and bool constant to avoid overflow for some kind of compilers.
* Corrected float constants with float data type.
* Added f suffix for float constants.
* Corrected float constant to avoid overflow with initializing lists.
* Corrected float initializing list with float input.
* Corrected bool constant with initializing list.
* Corrected float and bool values with initializing lists.
* Fixed wrong constant.
* Fixed issue with 1x1 input picture for resize.
* ResizeBilinear default values on import fix
Signed-off-by: raver119 <raver119@gmail.com>
* - add padding calculation in same mode in causal conv1d op for right mkl paddings
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct causal condition in mkldnnUtils.cpp
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct some code which caused additional rounding errors in betainc op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - put float in place of template parameter in nan assign in betainc op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add possibility of passing scalar-array as input parameter for scale factor in adjust hue/contrast/saturation ops
- correct typo in function which calculates regularized incomplete beta integral
Signed-off-by: Yurii <iuriish@yahoo.com>
* - fix bug in betainc cuda kernel
Signed-off-by: Yurii <iuriish@yahoo.com>
* - start working on implementation of digamma function
Signed-off-by: Yurii <iuriish@yahoo.com>
* - further work on digamma function (cpu)
Signed-off-by: Yurii <iuriish@yahoo.com>
* - testing and fixing bugs in digamma op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - make correction in cuda kernel for polyGamma
Signed-off-by: Yurii <iuriish@yahoo.com>
* - remove unnecessary stuff from betaInc cuda kernel
Signed-off-by: Yurii <iuriish@yahoo.com>
* - resolve conflicts in DeclarableOpsTests3.cpp after master branch has been merged
Signed-off-by: Yurii <iuriish@yahoo.com>
* - restore id number of Not operation in legacy_ops.h
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct padding calculation in mkl dnn conv1d causal
Signed-off-by: Yurii <iuriish@yahoo.com>
* restore empty check in adjust_contrast_v2
Signed-off-by: raver119 <raver119@gmail.com>
* fix narrowing down cast
Signed-off-by: raver119 <raver119@gmail.com>
* trigger jenkins
Signed-off-by: raver119 <raver119@gmail.com>
* few more fixes for MSVC and Windows
Signed-off-by: raver119 <raver119@gmail.com>
* few more fixes for MSVC and Windows
Signed-off-by: raver119 <raver119@gmail.com>
* few more fixes for MSVC and Windows
Signed-off-by: raver119 <raver119@gmail.com>
* few more fixes for MSVC and Windows
Signed-off-by: raver119 <raver119@gmail.com>
* few more tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* few more tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* few more tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* few more tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* few more tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* - few more tweaks
- tensormmul dtype validation
Signed-off-by: raver119 <raver119@gmail.com>
* - few more tweaks
- batched gemm dtype validation
Signed-off-by: raver119 <raver119@gmail.com>
* - few more tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* - few more tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* - few more tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* - few more tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* - add causal mode of padding to convolutions
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add additional tests for causal conv1d
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add causal mode for cuda conv kernels
Signed-off-by: Yurii <iuriish@yahoo.com>
* Java side of Conv1D changes
Signed-off-by: raver119 <raver119@gmail.com>
* Add Conv1DDerivative op
Signed-off-by: Alex Black <blacka101@gmail.com>
* Causal Conv1D gradient checks
Signed-off-by: Alex Black <blacka101@gmail.com>
* Tweaks
Signed-off-by: Alex Black <blacka101@gmail.com>
* - add causal padding mode to conv2d_bp
Signed-off-by: Yurii <iuriish@yahoo.com>
* More thorough causal conv1d tests
Signed-off-by: Alex Black <blacka101@gmail.com>
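
Note on the causal conv1d commits above: a minimal sketch of causal padding, assuming stride 1 and the usual definition where all (kernel_size - 1) * dilation padding goes on the left so that output[t] depends only on input positions up to t. The function name `causal_conv1d` and the single-channel list layout are illustrative and are not the op's signature.

```python
def causal_conv1d(signal, kernel, dilation=1):
    """1D cross-correlation where output[t] only sees signal[0..t]."""
    pad = (len(kernel) - 1) * dilation        # all padding goes on the left
    padded = [0.0] * pad + list(signal)
    out = []
    for t in range(len(signal)):              # stride 1: output length == input length
        acc = 0.0
        for k, w in enumerate(kernel):
            acc += w * padded[t + k * dilation]
        out.append(acc)
    return out
```

With this layout the last kernel tap always lines up with the current sample, so changing a later input value never changes an earlier output.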
* Implementation for non_max_suppression_v3 was added. Initial version
* Added check for overcome threshold.
* Added definition for V3 method.
* java remapping for NonMaxSuppressionV3
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed proper processing of an empty output and test.
* Refactored op to less threshold data to float.
* Implemented cuda-based helper for non_max_suppression_v3 op.
* Fixed fake_quant_with_min_max_vars op.
* Fixed tests with float numbers.
* - assert now stops execution
- sortByKey/sortByValue now have input validation
Signed-off-by: raver119 <raver119@gmail.com>
* missing var
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed proper processing for zero max_size inputs.
* Refactored kernel callers.
* Fixed return statement for logdet op helper.
* Refactored unsorted segment SqrtN op.
* get back 8 tail bytes on CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* Refactored segment prod ops and helpers for cuda and tests.
* Additional test.
* CudaWorkspace tests updated for 8 tail bytes
Signed-off-by: raver119 <raver119@gmail.com>
* special atomic test
Signed-off-by: raver119 <raver119@gmail.com>
* atomicMul/atomicDiv fix for 16bit values
Signed-off-by: raver119 <raver119@gmail.com>
* Eliminated waste prints.
* - improve performance of scatter (no lock) ops for 1D case
Signed-off-by: Yurii <iuriish@yahoo.com>
* - improve scatter lock op performance for 1D case
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add kernel for verification of input indices-array elements in scatter and scatter_nd ops
Signed-off-by: Yurii <iuriish@yahoo.com>
* - provide fast indices checking on cpu side for scatter and gather ops
Signed-off-by: Yurii <iuriish@yahoo.com>
* - apply corrections requested by pr reviewer
Signed-off-by: Yurii <iuriish@yahoo.com>
* Corrected input checking and tests for bitcast op.
* Fixed an issue with non_max_suppression form generation and processing with score threshold given.
* Fixed bilinear resize kernel and tests.
* push for Serhii
Signed-off-by: raver119 <raver119@gmail.com>
* Added test for nearest_neighbor resize with int input.
* Added data type check for input/output match.
* Eliminate error in macros.
* Improved output message for type checking.
* Fixed input/output types for op.
* Eliminated waste logging.
* Refactored resize_bilinear helper for multithreading for cpu platform.
* Cosmetic changes only.
* Fixed error for string substitution.
* Skip test for cbow_batch with cuda.
* fix for resizeNearestNeighbor output dtype
Signed-off-by: raver119 <raver119@gmail.com>
* Refactored non_max_suppression helper.
* Refactored shape generation and input handling.
* Added additional test.
* - profiling cuda kernels for vol2col and im2col
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct addBias helper
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct mkl dilation formula and switch off mkl api for dilation deconvolutions
Signed-off-by: Yurii <iuriish@yahoo.com>
* - create op
- skip exec for empty inputs for non_max_suppression
- EmptyHandling idea
Signed-off-by: raver119 <raver119@gmail.com>
* Create op and mapping for it
Signed-off-by: raver119 <raver119@gmail.com>
* Added implementation files for image_resize and resize_bicubic ops.
* Image resize and image.resize_bicubic ops implementation. Initial revision.
* Finished with infrastructure development for image.resize_bilinear op and image_resize op implementation.
* Refactored resize methods.
* Added processing for Mitchellcubic algorithm.
* Added check for input/output sizes.
* Added int and float types for crop_and_resize op.
* Refactored crop_and_resize output type check.
* Added helper for bicubic interpolation as TF v.1 does.
* Added TF v.1 bicubic helper for cuda platform.
* Added cached class for bicubic algorithm.
* Refactored cuda implementation for crop_and_resize helper to use proper output type.
* Added facilities for bicubic interpolation.
* Portion bicubic interpolation from TF.
* Added tests for resize_bilinear testing.
* Working implementation of bicubic interpolation and tests.
* Refactored routines with image_resize bicubic op helper.
* Refactored code with coding standards.
* Refactored cpu helpers for resize_bicubic op.
* Refactored bicubic helpers.
* Added bicubic resize facilities.
* Implementing cuda kernels for bicubic interpolation. Implementation step.
* Cuda implementation of resize_bicubic op helper.
* Refactor image.resize_bicubic op helpers.
* Refactored helpers for resize_bicubic. Added error checking with cuda implementation.
* Refactored cuda implementation of resize_bicubic op helper. The first working revision.
* Cuda arch implementation for resize_bicubic op helper. Full working single-threaded revision.
* Intermediate bicubic interpolation helper for cuda.
* Refactored cpu helper for resize_bicubic.
* Multithreaded cuda implementation for resize_bicubic.
* Fixed merge issues.
* Refactored nlp helpers.
* Replicated resize_bicubic for 3D also.
* Eliminated waste comments of unused code.
* Eliminated waste comments with unused code.
* Eliminated waste template definitions.
* Eliminated waste debug code.
* Eliminated waste comments.
* Fixed multithreading with helpers.
* Fixed test suites for float and double in float point input lists.
* Fixed usage of reshape with 3D/4D on resizes.
* Final fixes.
* Fixed resize_neighbor op problem.
* Added a pair of tests for failed ops.
* Fixed cpu helper for draw_bounding_boxes op.
* Refactored implementation of draw_bounding_boxes op to fully conform with TF.
* Improved multithreading with draw_bounding_boxes op cuda helper.
* Eliminated log messages.
* Changed logging with draw_bounding_boxes op helper and tests.
* Resize_bilinear with 3D input allowed.
* Refactored 3D input acceptance with resize_bilinear op.
* And another improvement.
* Refactored reshape of input/output for resize_bilinear.
* Improvements final.
* Finished with 3D replication for image.resize_bilinear/_nearest_neighbor.
* Added copyrights for TF code.
* Using new form of multithreading for cpu implementation.
* Fixed shape error.
* Added multithreaded with batches on crop_and_resize functor.
* Refactored multithreading with crop_and_resize and draw_bounding_boxes.
* - get rid of some copy procedures in mmulHelper ops
Signed-off-by: Yurii <iuriish@yahoo.com>
* - further work on embedding cuda api for batched gemm (cublasGemmBatchedEx) in our mmulHelper class
Signed-off-by: Yurii <iuriish@yahoo.com>
* - further work on cuda batched gemm api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - write own cuda kernel performing batched gemm
Signed-off-by: Yurii <iuriish@yahoo.com>
* missing include in MmulHelper
Signed-off-by: raver119 <raver119@gmail.com>
* - forgot to keep the previous correct kernels for mmulNxN in code, since the new one may fail for some reason in future
Signed-off-by: Yurii <iuriish@yahoo.com>
* disable old tensordot
Signed-off-by: raver119 <raver119@gmail.com>
* - rewrite cuda kernels for usualGemm and usualGemv
Signed-off-by: Yurii <iuriish@yahoo.com>
* - profiling mmul helpers
Signed-off-by: Yurii <iuriish@yahoo.com>
* - prints to check shapes were added
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct type of output array C in mmulNxN
Signed-off-by: Yurii <iuriish@yahoo.com>
* - take into account possible nans in C array
Signed-off-by: Yurii <iuriish@yahoo.com>
* slightly change numThreads message
Signed-off-by: raver119 <raver119@gmail.com>
* - make corrections in accordance to given notes in pr review
Signed-off-by: Yurii <iuriish@yahoo.com>
* Added implementation files for image_resize and resize_bicubic ops.
* Image resize and image.resize_bicubic ops implementation. Initial revision.
* Minor fix
* Some TF imports disabled.
* Finished with infrastructure development for image.resize_bilinear op and image_resize op implementation.
* Refactored resize methods.
* Added processing for Mitchellcubic algorithm.
* adjust_contrast
* Small fix for TF import expected value loading when variable name starts with the test name
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Tests
* Tests added.
* Removed tf names absent in mapping.
* Some fixes.
* Small fixes
* Minor change
* Some failing tests.
* Disable failed test
* Ignore some tests
* Fix import class mapping
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix float property mapping (flatbuffers)
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Override equality function for model 'dropout'
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fail tests
* Failed tests ignored temporarily.
* Minor fixes
* Small fix
* Conflict resolved
* Default implementations of tensorflowName and onnxName
* one range test
Signed-off-by: raver119 <raver119@gmail.com>
* few Context convenience signatures
Signed-off-by: raver119 <raver119@gmail.com>
* one more range test
Signed-off-by: raver119 <raver119@gmail.com>
* "range" "fix"
Signed-off-by: raver119 <raver119@gmail.com>
* adjust_contrast_v2 now allows scale factor to be provided via input_variable
Signed-off-by: raver119 <raver119@gmail.com>
* adjust_contrast now allows scale factor as variable too
Signed-off-by: raver119 <raver119@gmail.com>
* bitcast shape tests
Signed-off-by: raver119 <raver119@gmail.com>
* BitCast import dtype added
Signed-off-by: raver119 <raver119@gmail.com>
* few more BitCast signatures
Signed-off-by: raver119 <raver119@gmail.com>
* - platform helpers can be disabled on per-op basis now via Context::allowHelpers
- java has access to it as well
Signed-off-by: raver119 <raver119@gmail.com>
* global platform-helpers trigger
Signed-off-by: raver119 <raver119@gmail.com>
* few signatures renamed
Signed-off-by: raver119 <raver119@gmail.com>
* - few new env variables to follow
- maxThreads/masterThreads differentiation
Signed-off-by: raver119 <raver119@gmail.com>
* Javadoc update
Signed-off-by: raver119 <raver119@gmail.com>
* Corrected randomuniform declaration.
* Refactored uniform distribution for both cuda and cpu platforms.
* Refactored uniform distribution and tests.
* Fixed type usage with indices.
* Refactored uniform distribution implementation and tests to fully conform with TF implementation.
* Refactored gamma function to use type util method.
* Copyright changes and fixes with ConstantHelper.
* Added error checking on allocate cuda device memory and operations.
* Added implementation for random_gamma op.
* Added implementation for random_poisson op and support classes.
* Added helpers for random_poisson and random_gamma ops.
* Implementation of random_poisson. The first working edition.
* Implementation of random_poisson. Parallelized working edition.
* Implementation of random_gamma. Parallelized working edition with alpha only.
* Added cuda implementation for helper of poisson distribution.
* Corrected shape calculation with random_gamma and tests.
* Finished cpu implementation for gamma distribution.
* Finished cuda implementation for random_gamma op.
* Refactored cpu helpers for random_gamma and random_poisson ops.
* Refactored cuda helpers for gamma and poisson distribution.
* Refactored cuda helper for gamma distribution.
* Refactored cpu helper for random_poisson op.
* Refactored cpu helper for random_gamma op.
* #8280 biasadd_bp nchw arg fixes (java side) + test
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8285 Concat op Java side fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Concat op cpp fix - allow dynamic axis to be negative, same as static axis
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* ignores for deconv3d import tests until deconv3d_tf op is implemented
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* - provide possibility to pass axis as last input array in concat op
- correct summation in bias_add_bp op for NHWC case
Signed-off-by: Yurii <iuriish@yahoo.com>
* - write code for deconv2d op based on mkl dnn api
* no unsafe math
Signed-off-by: raver119 <raver119@gmail.com>
* no unsafe math
Signed-off-by: raver119 <raver119@gmail.com>
* - get rid of e<> and p<> methods in svd helper
Signed-off-by: Yurii <iuriish@yahoo.com>
* - provide mkl api support for deconvolution 3d
Signed-off-by: Yurii <iuriish@yahoo.com>
* - write deconv2d_bp based on mkl api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - write deconv3d_bp based on mkl api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - testing and fixing deconv based on mkl api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - remove dilation from conv2d/3d mkl
Signed-off-by: Yurii <iuriish@yahoo.com>
* - minor changes
Signed-off-by: Yurii <iuriish@yahoo.com>
* - further corrections of deconv ops based on mkl dnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - provide deconv2d_tf based on mkl dnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add minor corrections required by reviewer
Signed-off-by: Yurii <iuriish@yahoo.com>
* Added non_max_suppression_overlaps op and tests.
* Refactored implementation of non_max_suppression_overlaps.
* Refactoring of implementation of non_max_suppression_overlaps op.
* Refactoring of implementation of non_max_suppression op.
* Fixed portion error.
* Added cuda frontends for image suppression ops.
* Eliminated crash with cuda arch on image.non_max_suppression_overlaps op.
* Improved implementation of image_suppression helper for cpu platform.
* The generic approach of non_max_suppression_overlaps op helper with cuda platform.
* Working cuda implementation of helper non_max_suppression_overlaps op.
* Eliminated waste comments.
* Improved implementations for both platforms
* Refactored cuda implementation of image.non_max_suppression_overlaps op helper.
* Improved cuda implementation of non_max_suppression op helper.
* Refactored cuda implementation of image.non_max_suppression_overlaps op helper.
* Improved cuda implementation of image.non_max_suppression_overlaps op helper.
* Added modifications into cuda implementation for image suppression overlaps op.
* Correct queue emulation with cuda implementation of non_max_suppression_overlaps op.
* Prefinal stage of cuda implementation of non_max_suppression_overlaps.
* Working cuda implementation of non_max_suppression_overlaps helper.
* Fixed return to proper thread.
* Improvements for cuda implementation of image.non_max_suppression_overlaps op helper.
* Fixed implementation issues with non_max_suppression_overlaps on cuda platform.
* Fixed skip for non_max_suppression_overlaps on cuda platform.
* Finalize implementation of image_suppression helper and tests.
* Cosmetic changes only.
* - write code for new batchnorm backprop
Signed-off-by: Yurii <iuriish@yahoo.com>
* - testing batchnorm backprop
Signed-off-by: Yurii <iuriish@yahoo.com>
* - write code for batchnorm backprop based on mkl dnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - testing and fixing bugs in batchnorm_bp mkl dnn
Signed-off-by: Yurii <iuriish@yahoo.com>
* - made corrections required by reviewer
Signed-off-by: Yurii <iuriish@yahoo.com>
* - change name in java wrapper for batchnorm op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - provide correct call NDArray::applyBroadcast inside of NDArray::applyTrueBroadcast
Signed-off-by: Yurii <yurii@skymind.io>
* - provide new trueBroadcast helper
Signed-off-by: Yurii <yurii@skymind.io>
* example for yurii
Signed-off-by: raver119 <raver119@gmail.com>
* - provide new trueBroadcast helper for cpu
Signed-off-by: Yurii <yurii@skymind.io>
* - start working on new trueBroadcast helper for cuda
Signed-off-by: Yurii <yurii@skymind.io>
* - further work on trueBroadcast for cuda
Signed-off-by: Yurii <yurii@skymind.io>
* - fix bugs in cuda helper trueBroadcast
Signed-off-by: Yurii <yurii@skymind.io>
* Added comments to tileKernel routine.
* Refactored kernel and added doc to it.
* Refactored setDiagonal kernel and added doc for it.
* Added doc for tsne cuda helpers.
* Added doc for diag kernels.
* Added doc for kernel.
* Refactored code with fake quantization.
* Added docs for image resize and crop kernels.
* Added docs for image suppression helpers.
* Added docs to matrix_band helpers.
* Added docs for matrix_diag_part and nth_element helpers.
* Fixed syntax error and refactored getIndexOffset usage.
* - profiling bias_add op
- add some documentation
Signed-off-by: Yurii <yurii@skymind.io>
* - minor change
Signed-off-by: Yurii <yurii@skymind.io>
* - provide addBias cuda kernel
Signed-off-by: Yurii <yurii@skymind.io>
* - improve shape::getIndexOffset and change its signature
Signed-off-by: Yurii <yurii@skymind.io>
* - same as previous
Signed-off-by: Yurii <yurii@skymind.io>
* - improve and change signature in some shape:: stuff which has to do with calculation of offsets for array elements
Signed-off-by: Yurii <yurii@skymind.io>
* - minor changes in flatten
Signed-off-by: Yurii <shyrma@skymind.io>
* - add function shape::getIndexOffsetOrdered
Signed-off-by: Yurii <shyrma@skymind.io>
* - correct shape::getIndexOffsetOrdered()
Signed-off-by: Yurii <shyrma@skymind.io>
* - move getIndexOffsetOrdered to flatten.h header in order to isolate this function
Signed-off-by: Yurii <shyrma@skymind.io>
* Fix repo links and clean up old github templates
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More link updates
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Comments axis shifts.
* Fixed LUP solver usage. Added helpers doc.
* Switch off OMP for roll and lup. Fixed omp usage for ClipByGlobalNorm.
* Switch off omp for ClipByGlobalNorm to reduce omp ambiguity.
* Updated doc for tsne ops.
* Added comments for dynamic_stitch op.
* Added comments to dynamic_stitch op implementation.
* Modified comment for unstack_list op.
* Added doc for space_to_depth and depth_to_space ops.
* Added doc for space_to_batch op.
* Enlarge test type for adjustSaturation.
* Added doc for runner.
* Rename flatbuffers DataType enum to DType
Signed-off-by: Alex Black <blacka101@gmail.com>
* Rename flatbuffers DataType enum to DType
Signed-off-by: Alex Black <blacka101@gmail.com>
* Updates for flatbuffers datatype enum renaming
Signed-off-by: Alex Black <blacka101@gmail.com>
* - documenting and profiling matrix_set_diag cuda kernel
Signed-off-by: Yurii <yurii@skymind.io>
* - correct formula of pnorm pooling in cuda 2d/3d kernels
- remove helper matrix_diag which duplicates work of helper matrix_set_diag
Signed-off-by: Yurii <yurii@skymind.io>
* one test for alex
Signed-off-by: raver119 <raver119@gmail.com>
* fix
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of safety offset in cpp
Signed-off-by: raver119 <raver119@gmail.com>
* bfloat16
Signed-off-by: raver119 <raver119@gmail.com>
* minor test rearrangement to fastpath launch
Signed-off-by: raver119 <raver119@gmail.com>
* - atomicAdd/Mul/Div fix for float16/bfloat16 misalignment
- one special test for maxpoolbp java
- safety offset of 8 bytes is back to libnd4j legacy
Signed-off-by: raver119 <raver119@gmail.com>
* - provide new cuda kernel for softmax
Signed-off-by: Yurii <yurii@skymind.io>
* - further work on cuda kernel for softmax
Signed-off-by: Yurii <yurii@skymind.io>
* - correction cuda kernel for softmax
Signed-off-by: Yurii <yurii@skymind.io>
* - add one additional test for svd
* - provide float argument in eye op to be a type of output array
Signed-off-by: Yurii <yurii@skymind.io>
* - add cuda capability check to mmulHelper
Signed-off-by: Yurii <yurii@skymind.io>
* - use another method for device id evaluation
Signed-off-by: Yurii <yurii@skymind.io>
* Eye data type as T argument
Signed-off-by: raver119 <raver119@gmail.com>
* Refactored kernels for segment_max/min/sum ops.
* Refactored segment_prod kernels.
* Refactored segment_prod kernels.
* DynamicPartition test
Signed-off-by: raver119 <raver119@gmail.com>
* Added linear test for dynamic_partition op.
* Refactored test with int datatype.
* some logging
Signed-off-by: raver119 <raver119@gmail.com>
* some logging
Signed-off-by: raver119 <raver119@gmail.com>
* some logging
Signed-off-by: raver119 <raver119@gmail.com>
* dynamicPartition fix
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of some logging
Signed-off-by: raver119 <raver119@gmail.com>
* one more test for dynamic_stitch
Signed-off-by: raver119 <raver119@gmail.com>
* one more test for dynamic_stitch
Signed-off-by: raver119 <raver119@gmail.com>
* empty check for stitch
Signed-off-by: raver119 <raver119@gmail.com>
* minor print changes
Signed-off-by: raver119 <raver119@gmail.com>
* one noop test
Signed-off-by: raver119 <raver119@gmail.com>
* skip input validation for no-input ops
Signed-off-by: raver119 <raver119@gmail.com>
* - one more noop empty test
- one more validation before sync
Signed-off-by: raver119 <raver119@gmail.com>
* typo
Signed-off-by: raver119 <raver119@gmail.com>
* one more validation fix
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA empty reductions java side
Signed-off-by: raver119 <raver119@gmail.com>
* one svd test
Signed-off-by: raver119 <raver119@gmail.com>
* Corrected segment_mean helpers and added another test.
* Refactored segment_mean kernels to avoid race_condition.
* - further work on layer_norm
Signed-off-by: Yurii <yurii@skymind.io>
* - further work on layer_norm 2
Signed-off-by: Yurii <yurii@skymind.io>
* - correct helpers for svd cuda
Signed-off-by: Yurii <yurii@skymind.io>
* one test for gather_nd
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of old concat tests
Signed-off-by: raver119 <raver119@gmail.com>
* one printf
Signed-off-by: raver119 <raver119@gmail.com>
* one more legacy test removed
Signed-off-by: raver119 <raver119@gmail.com>
* gatherNd launch params fix
Signed-off-by: raver119 <raver119@gmail.com>
* gatherNd launch params fix
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA empty reduction
Signed-off-by: raver119 <raver119@gmail.com>
* - listdiff synchronization fix for CUDA
- listdiff test
Signed-off-by: raver119 <raver119@gmail.com>
* - IndexReduce ops now allow INDEXING_TYPES output
- topK op accepts only INDEXING_TYPES as output
Signed-off-by: raver119 <raver119@gmail.com>
* one test for maxpool2d_bp
Signed-off-by: raver119 <raver119@gmail.com>
* - maxpool2d_bp cuda fix for NaNs
- streamSync after each custom op execution
Signed-off-by: raver119 <raver119@gmail.com>
* one test for size
Signed-off-by: raver119 <raver119@gmail.com>
* - few tests for size op
- size/rank/size_at ops now use p instead of assign
Signed-off-by: raver119 <raver119@gmail.com>
* throw exception if op execution failed
Signed-off-by: raver119 <raver119@gmail.com>
* expected for test
Signed-off-by: raver119 <raver119@gmail.com>
* one more ismax test
Signed-off-by: raver119 <raver119@gmail.com>
* ismax view fix
Signed-off-by: raver119 <raver119@gmail.com>
* Small batch norm fix (cuda/no-mkldnn)
Signed-off-by: Alex Black <blacka101@gmail.com>
* Dropout fix for RnnOutputLayer
Signed-off-by: Alex Black <blacka101@gmail.com>
* Allow block size < 2 in batch_to_space_nd and space_to_batch_nd for import, in spite of what TF docs say
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* - start working on space_to_batch_nd
Signed-off-by: Yurii <yurii@skymind.io>
* - provide cpu helper for space_to_batch_nd op
Signed-off-by: Yurii <yurii@skymind.io>
* few typos fixed
Signed-off-by: raver119 <raver119@gmail.com>
* - add tests for space_to_batch and correct bugs
Signed-off-by: Yurii <yurii@skymind.io>
* - write cuda kernel for space_to_batch op
Signed-off-by: Yurii <yurii@skymind.io>
* - add order argument to shape::index2coords method in convolution cuda ops
Signed-off-by: Yurii <yurii@skymind.io>
* - restore some previous code
Signed-off-by: Yurii <yurii@skymind.io>
* old col2im kernel activated
Signed-off-by: raver119 <raver119@gmail.com>
* - change coords calculation in col2im kernel
Signed-off-by: Yurii <yurii@skymind.io>
* - restore old col2im kernel
Signed-off-by: Yurii <yurii@skymind.io>
* - add custom op for batch_to_space
Signed-off-by: Yurii <yurii@skymind.io>
* - provide cpu version for batch_to_space_nd op
Signed-off-by: Yurii <yurii@skymind.io>
* - provide cuda kernel for batch_to_space_nd op
Signed-off-by: Yurii <yurii@skymind.io>
* Added tests for get_seed/set_seed ops.
* Added missed tests for scatter_sub/mul/div ops.
* Added tests for hardsigmoid and hardsigmoid_bp.
* Added tests for hardtanh and hardtanh_bp ops.
* Added test for histogram op.
* Added tests for identity op.
* Refactored mergemaxindex op. Added tests for log1p, mergemaxindex, mod and mod_bp ops.
* Fixed tests for FloorDiv.
* Added test for rank op.
* Added tests for rationaltanh/rationaltanh_bp ops.
* Added tests for realdiv/realdiv_bp.
* Added tests for rectifiedtanh/_bp ops.
* Added tests for shapes_of op.
* Added tests for shapes_of op.
* Added tests for size op.
* Added tests for softplus/_bp ops.
* Added tests for softsign/_bp ops.
* Added tests for toggle_bits op. Fixed processing of OP_IMPL and so on definitions.
* Added test for truncatediv op.
* Added another test for truncatediv op.
* Added another test for histogram.
* Added tests for unstack_list op.
* Refactored to_int32/uint32/float16/float32/double/int64/uint64 ops and tests.
* Refactored mergemaxindex op helper for cuda platform and tests.
* Fixed cuda kernel for histogram op helper.
* Refactor skipgram to avoid early buffers shift.
* Fixed check up with non_max_suppression op cuda helper. Added cuda kernel implementation for skipgram op helpers.
* Added implementation of skipgram op helper for cuda platform. Working revision
* Fixed mergeMaxIndex kernel and move it to separate source file.
* - correct cuda concat
Signed-off-by: Yurii <yurii@skymind.io>
* - pooling 2d/3d : take into account possible case when input and gradI have different strides
Signed-off-by: Yurii <yurii@skymind.io>
* master pulled in
Signed-off-by: raver119 <raver119@gmail.com>
* floordiv_bp test reverted
Signed-off-by: raver119 <raver119@gmail.com>
* - add NDArray::printLinearBuffer method
Signed-off-by: Yurii <yurii@skymind.io>