Commit Graph

67 Commits (00cd61f32dfdf97d498ce4e42b7ce20b52237e0d)

Author SHA1 Message Date
raver119 1ab86d1306
Range op data type (#204)
* - range op now accepts dargs
- dargs now can be in signature

Signed-off-by: raver119 <raver119@gmail.com>

* range dtype java side

Signed-off-by: raver119 <raver119@gmail.com>

* linspace fix

Signed-off-by: raver119 <raver119@gmail.com>

* lin_space fix for scalar outputs

Signed-off-by: raver119 <raver119@gmail.com>
2020-01-31 10:45:40 +03:00
raver119 5d98cfcf47
Configurable DataType for ops (#201)
* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* - one more test for OneHot with dtype
- one more signature in Nd4j

Signed-off-by: raver119 <raver119@gmail.com>

* ones_as/zeros_as now accept dtype

Signed-off-by: raver119 <raver119@gmail.com>

* one more test

Signed-off-by: raver119 <raver119@gmail.com>

* - more updates for configurable data types
- ones_as/zeros_as java side + tests

Signed-off-by: raver119 <raver119@gmail.com>

* few c++ tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* few more changes around DArgs

Signed-off-by: raver119 <raver119@gmail.com>
2020-01-30 18:46:12 +03:00
raver119 ba961c7601
DataTypes & FlatBuffers (#197)
* flatbuffers version upgrade

Signed-off-by: raver119 <raver119@gmail.com>

* flatbuffers version upgrade java side

Signed-off-by: raver119 <raver119@gmail.com>

* flatbuffers dependency version upgrade java side

Signed-off-by: raver119 <raver119@gmail.com>

* MKLDNN version upgrade

Signed-off-by: raver119 <raver119@gmail.com>

* DArgs first pass

Signed-off-by: raver119 <raver119@gmail.com>

* signatures first pass

Signed-off-by: raver119 <raver119@gmail.com>

* signatures second pass

Signed-off-by: raver119 <raver119@gmail.com>

* signatures third pass

Signed-off-by: raver119 <raver119@gmail.com>

* signatures third pass

Signed-off-by: raver119 <raver119@gmail.com>

* signatures fourth pass

Signed-off-by: raver119 <raver119@gmail.com>

* signatures fifth pass

Signed-off-by: raver119 <raver119@gmail.com>

* flatbuffers UI version upgrade java side

Signed-off-by: raver119 <raver119@gmail.com>

* flatbuffers ui update

Signed-off-by: raver119 <raver119@gmail.com>

* flatbuffers downgrade

Signed-off-by: raver119 <raver119@gmail.com>

* flatbuffers downgrade java side

Signed-off-by: raver119 <raver119@gmail.com>
2020-01-30 10:07:24 +03:00
shugeo 2717b25931 Shugeo qr (#153)
* Added qr op implementation. Initial version.

* Fixed doc for qr op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Implementation of QR decomposition. CPU platform version.

* Added a pair of tests for qr op testing.

Signed-off-by: shugeo <sgazeos@gmail.com>

* QR implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected norm using.

* Properly calculated intermediate results with QR decomposition.

* Another step to implement QR algorithm by householder.

* Cpu implementatio for QR decomposition. The first working edition.

* Corrected test to QR decomposition.

* Added tad multithreading with QR implementation.

* Finished cpu implementation for QR decomposition helpers.

* Refactored tests and improved multithreading.

* Refactored QR cpu implementation and update cuda implementation helpers.

* Cuda QR helper implementation. The first working edition.

* Eliminated waste prints.

* Restore multithreading with cuda implementation.

* Ops names corrected

* Refactored qr op helpers to optimize.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Eliminated waste manual ticking.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored memory allocation to avoid waste memory usage.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored matrixMinor method both for cuda and cpu platforms.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored method of vmul to use raw buffers instead type conversion.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored temporary array of matricies.

Signed-off-by: shugeo <sgazeos@gmail.com>

Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
Co-authored-by: raver119 <raver119@gmail.com>
2020-01-22 13:59:36 +03:00
shugeo 815a2908af Shugeo solve triangular (#173)
* Added implementation of the triangular_solve op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed compilation issues.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added verification of input data and helpers facilities for triangular_solve op.'

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added cpu implementation for triangular_solve helpers.

* Added tests and implementation for upper triangular equations.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added a pair of cases to tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added multithreading with cpu helpers for triangular_solve op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added cuda implementation of triangular_solve op helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Finished cuda implementation of triangular_solve helpers and tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed copyright marks.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected grammar errors with doc and error messages.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored matricies processing with triangular_solve cuda helper implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added triangular_solve wrapper

* Fixed mapping

* Added processing for adjoint with cpu helpers of triangular_solve op implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added implementation for adjoint routine with cuda platform.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added multithreading with adjoint routine for cpu platform.

Signed-off-by: shugeo <sgazeos@gmail.com>

Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
2020-01-22 10:48:03 +03:00
shugeo e50b285c2c Shugeo resize area (#162)
* Added implementation for resize_area op. Initial commit.

* Added implementation of resize_area op. Initial revision.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected resizeArea functor call.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Implementation of resize_area. Cpu platform helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Implementation for resize_area helpers. The first part revision.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added a set of tests for resize_area op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Cuda implementation for resize_area. Initial approach.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adding multithreading for resize_area algorithm.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Cuda implementation of resize_area helpers. Shared memory approach.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored resizeAreaKernel with cuda implementation.

* Eliminated compilation errors.

* ResizeArea helpers for cuda platform. The first working revision.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added test for batched resize_area op testing.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Implementation of resize_are for cuda platform and tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed multithreading with resize_area op helper.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected copyright marks with sources.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected copyright mark for resize_area op implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected copyright mark for parity ops header.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected typo in strings and so on with image resize ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored resize_area helpers and multithreading.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added ResizeArea wrapper

* Added test with align_corners and fixed shape processing with only int args given for output size.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added test

* TF mapping for ResizeArea

* Fixed implementation issues with resize_area op for both platforms.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored image resizer struct to use flexible types for ints and floats.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Improved multithreading with resizeAreaKernel launch.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Use asynchronical memory copying with cuda platform image resize allocations.

Signed-off-by: shugeo <sgazeos@gmail.com>

Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
2020-01-22 10:46:33 +03:00
shugeo 6943a5f57a Shugeo lgamma (#170)
* lgamma op. Initial version.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored lgamma op and test.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Lgamma wrapper

* Added TF mapping

Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
2020-01-20 12:29:36 +03:00
shugeo fbf7c9d38b Fixed lu for cuda platform and tests. (#158)
Signed-off-by: shugeo <sgazeos@gmail.com>
2020-01-02 23:25:41 +03:00
Yurii Shyrma 5d9b2a16e5 Shyrma temp (#131)
* - specifying template instantiation for certain types in float16 and bloat16

Signed-off-by: Yurii <iuriish@yahoo.com>

* - polishing bfloat16 and float16 member functions template specialization

Signed-off-by: Yurii <iuriish@yahoo.com>

* - rewrite and overload array +-*/ scalar and scalar +-*/ arr in NDAray class

Signed-off-by: Yurii <iuriish@yahoo.com>

* - make corrections which have to do with and rvalue lvalue conversions

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide move semantic in NDArray operators array +-/* array

Signed-off-by: Yurii <iuriish@yahoo.com>

* float16/bfloat16 tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* one more tweak

Signed-off-by: raver119 <raver119@gmail.com>

* - make float16 and bfloat16 to compile successfully on cuda

Signed-off-by: Yurii <iuriish@yahoo.com>

* - do not use resources of view-like arrays when move semantics is applied

Signed-off-by: Yurii <iuriish@yahoo.com>

* - get rid of pointers in signatures NDArray methods 1

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correction of signature of NDArray::dup method

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correction of signature of NDArray::reduceAlongDimension method

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::applyIndexReduce and applyTrueBroadcast methods

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::applyReduce3 and varianceAlongDimension methods

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::tensorsAlongDimension and diagonal methods

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::allTensorsAlongDimension

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::reduceAlongDimension 2

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::applyTransform 2

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::applyPairwiseTransform 2

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::applyBroadcast 2

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::applyTrueBroadcast 2

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::applyScalar and applyScalarArr

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::lambda methods

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::reduce3 methods 2

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of following NDArray methods: add/sub/mul/div row/column and fillAsTriangular

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::tileToShape methods

Signed-off-by: Yurii <iuriish@yahoo.com>

* - signature correction of NDArray::isShapeSameStrict method

Signed-off-by: Yurii <iuriish@yahoo.com>

* minor corrections in tests

Signed-off-by: Yurii <iuriish@yahoo.com>

* - replace reduce op in batchnorm mkldnn

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add explicit templates instantiations for operator+(NDArray&&. const scalar)

Signed-off-by: Yurii <iuriish@yahoo.com>

* - corrections of casts in float16/bfloat16

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide move semantics in following NDArray methods: transform, applyTrueBroadcast, transpose, reshape, permute

Signed-off-by: Yurii <iuriish@yahoo.com>

* - get rid of input array A duplicate in svd cuda op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - avoid available bug in svd cuda API

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add temporary global memory buffer in svd cuda when calcUV = false and  m != n

Signed-off-by: Yurii <iuriish@yahoo.com>

* - remove test with blfoat16 type for betainC

Signed-off-by: Yurii <iuriish@yahoo.com>

* - resolve conflicts after master has been merged in

Signed-off-by: Yurii <iuriish@yahoo.com>

* - changed type of affected input array in fused_batch_norm

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add several explicit type castings

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add ND4J_EXPORT to operators

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add explicit template types in instantiations of template arithm operators of NDArray class

Signed-off-by: Yurii <iuriish@yahoo.com>

* - one more test fix

Signed-off-by: Yurii <iuriish@yahoo.com>

Co-authored-by: raver119 <raver119@gmail.com>
2019-12-20 22:35:39 +03:00
shugeo 67d8199165 [WIP] Shugeo lup (#126)
* Added infrastructure for implementation op lu for both cuda and cpu platforms.

* Added implementation of helpers with lu op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored LU decomposition to use vector of permutations instead.

* Refactored helpers for lu op.

* Fixed crash with determinant op.

* Refactored cpu LU op heleper.

* Added implementation for lu op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed issue with argmax on column.

* Added multithreaded behaviour for lu op helper.

* Fixed multithreaded cpu implementation helpers for lu op.

* Added cuda implementation for lu op helper.

* Finished lu helper implementation for cuda platform.

* Eliminated waste prints and comments.

* Fixed race condition and multithreading issues.

* Fixed memory leak with shape construction.

* Corrected test for lu op to avoid near zero elements on the main diagonal."

Signed-off-by: shugeo <sgazeos@gmail.com>

* Improved test for adjust_constast op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed issues with cuda implementation of resize_bicubic helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>
2019-12-20 17:56:28 +03:00
shugeo fc7c6d4e82 Shugeo roll fix3 (#127)
* Added tests for roll with scalar shift and axis.

* Fixed problem with roll on 1D input with scalar axis and test.

* Only cosmetic changes.
2019-12-19 13:10:06 +03:00
Abdelrauf e0a9cb6c08 [WIP] HSV,RGB color model conversions (#125)
* CUDA implementation for hsv_to_rgb and rgb_to_hsv

Signed-off-by: raver119 <raver119@gmail.com>

* hsv_to_rgb and rgb_to_hsv operations
Test coverage: c order 1d, 2d, 3d array

Signed-off-by: Abdelrauf <rauf@konduit.ai>

* Index check

Signed-off-by: Abdelrauf <rauf@konduit.ai>

* Suppress Msvc floating point errors

Signed-off-by: Abdelrauf <rauf@konduit.ai>

* Added Index Check for adjust_saturation and adjust_hue

Signed-off-by: Abdelrauf <rauf@konduit.ai>

* minor fix

Signed-off-by: raver119 <raver119@gmail.com>

* Fixes missed Msvc floating narrowing errors

Signed-off-by: Abdelrauf <rauf@konduit.ai>
2019-12-17 09:42:09 +03:00
Alexander Stoyakin 927d591421 ResizeBicubic added (#117)
* ResizeBicubic added
Some fixes.

* Test fixed

* Narrowed argument type changed to boolean

* Clean up
2019-12-09 18:25:39 +11:00
raver119 b32dd1bf92
[WIP] resize_bicubic types (#116)
* resize_bicubic: allow more dtypes

Signed-off-by: raver119 <raver119@gmail.com>

* resize_bicubic: allow less dtypes

Signed-off-by: raver119 <raver119@gmail.com>

* Refactored resize_bicubic op to full conform with TF1.5 and tests.

* Corrected test to proper data type output.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected double input test to float constant outputs.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Finished with correction of tests for bicubic interpolated resizes expected.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed adjust_contrast ops to allow non-RGB inputs.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored adjust_contrast_v2 to conform with TF one.

Signed-off-by: shugeo <sgazeos@gmail.com>

* AdjustContrast tests activated

* two typos fixed

Signed-off-by: raver119 <raver119@gmail.com>
2019-12-06 18:58:37 +03:00
shugeo e09a785232 Shugeo resize fix5 (#102)
* Refactored resize images ops to use TF-like bool args as input.

* Refactored helpers for cpu implementation of resize_bilinear and resize_nearest_neighbor ops.

* Refactored cuda implementation for image.resize_bilinear and image.resize_nearest_neighbor ops helpers.

* Refactored nearest_neighbor resize op.

* Added a pair of tests for special case of resize_bilinear algorithm.

* Fixed issue with resize_bilinear op.

* Refactored cpu implementation for helpers with resize_nearest_neighbor op.

* Final fixed for resize ops to conform TF v.1.5

* Refactored cuda helpers for resize_neares_neighbor op.

* Fixed resize_bilinear to accept proper data.

* Fixed issue with non-float input for resize_bilinear op.

* Refactored cuda helper for resize_bilinear to proper process non-float inputs.

* Added tests for resize_bilinear to int inputs.

* Fixed ResizeBilinear wrapper

* Tests fixed

* Fixed float and bool constant to avoid overflow for some kind of compilers.

* Corrected float constants with float data type.

* Added f suffix for float constants.

* Corrected float constant to avoid overflow with initializing lists.

* Corrected float initializing list with float input.

* Corrected bool constant with initalizing list.

* Corrected float and bool values with initializing lists.

* Fixed wrong constant.

* Fixed issue with 1x1 input picture for resize.

* ResizeBilinear default values on import fix

Signed-off-by: raver119 <raver119@gmail.com>
2019-12-05 22:05:33 +03:00
Alex Black 578a5abb68 DNNL/MKLDNN dilated causal conv1d + betainc (#103)
* - add padding calculation in same mode in causal conv1d op for right mkl paddings

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct causal condition in mkldnnUtils.cpp

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct some code which caused additional round errors is betainc op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - put float in place of template parameter in nan assign in betainc op

Signed-off-by: Yurii <iuriish@yahoo.com>
2019-12-04 14:50:17 +03:00
Yurii Shyrma 1f5e15b541 Shyrma adjust (#98)
* - add possibility of passing scalar-array as input parameter for scale factor in adjust hue/contrast/saturation ops
- correct typo in function which calculates regularized incomplete beta integral

Signed-off-by: Yurii <iuriish@yahoo.com>

* - fix bug in betainc cuda kernel

Signed-off-by: Yurii <iuriish@yahoo.com>

* - start working on implementation of digamma function

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on digamma function (cpu)

Signed-off-by: Yurii <iuriish@yahoo.com>

* - testing and fixing bugs in digamma op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - make correction n cuda kernel for polyGamma

Signed-off-by: Yurii <iuriish@yahoo.com>

* - remove unnecessary stuff from betaInc cuda kernel

Signed-off-by: Yurii <iuriish@yahoo.com>

* - resolve conflicts in DeclarableOpsTests3.cpp after master branch has been merged

Signed-off-by: Yurii <iuriish@yahoo.com>

* - restore id number of Not opertion in legacy_ops.h

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct padding calculation in mkl dnn conv1d causal

Signed-off-by: Yurii <iuriish@yahoo.com>

* restore empty check in adjust_contrast_v2

Signed-off-by: raver119 <raver119@gmail.com>
2019-12-03 09:40:45 +03:00
shugeo 1e9ff114aa Shugeo atomic tests (#97)
* Added atomic tests for atomicAdd, atomicSub and atomicDiv.

* Fixed atomicAdd for 16bit ints.

* Fixed atomicMul for 16 floats.

* Eliminated waste prints.

* Fixed problems with double type on matrix inverse helepers.

* Eliminated commented wrong code.

* Refactored atomicMul for 16bit types.

* few more minor tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed fake_quant_with_min_max_vars_per_channel args processing.
2019-12-02 21:40:54 +03:00
raver119 25b3cd9b80
[WIP] CUDA tests (#95)
* one more CI test

Signed-off-by: raver119 <raver119@gmail.com>

* export additional symbols

Signed-off-by: raver119 <raver119@gmail.com>

* few more tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* one more tweak for linux

Signed-off-by: raver119 <raver119@gmail.com>

* fix dtype in few tests

Signed-off-by: raver119 <raver119@gmail.com>

* missing sync and memset in couple of tests

Signed-off-by: raver119 <raver119@gmail.com>

* copy step for libnd4j cuda

Signed-off-by: raver119 <raver119@gmail.com>

* no-op on empty for adjust hue/contrast/saturation

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA_VERBOSE Off

Signed-off-by: raver119 <raver119@gmail.com>

* BroadcastBool fix + few tests

Signed-off-by: raver119 <raver119@gmail.com>

* trigger jenkins

Signed-off-by: raver119 <raver119@gmail.com>

* trigger jenkins

Signed-off-by: raver119 <raver119@gmail.com>

* - ignore couple of warnings
- remove redundant compiler options

Signed-off-by: raver119 <raver119@gmail.com>
2019-12-02 21:37:21 +03:00
shugeo 009007120b Shugeo_release_fixes3 (#81)
* Implementation for non_max_suppression_v3 was added. Initial version

* Added check for overcome threshold.

* Added definition for V3 method.

* java remapping for NonMaxSuppressionV3

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed proporly processing of an empty output and test.

* Refactored op to less threshold data to float.

* Implemented cuda-based helper for non_max_suppression_v3 op.

* Fixed fake_quant_with_min_max_vars op.

* Fixed tests with float numbers.

* - assert now stops execution
- sortByKey/sortByValue now have input validation

Signed-off-by: raver119 <raver119@gmail.com>

* missing var

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed proper processing for zero max_size inputs.

* Refactored kernel callers.

* Fixed return statement for logdet op helper.

* Refactored unsorted segment SqrtN op.

* get back 8 tail bytes on CUDA

Signed-off-by: raver119 <raver119@gmail.com>

* Refactored segment prod ops and helpers for cuda and tests.

* Additional test.

* CudaWorkspace tests updated for 8 tail bytes

Signed-off-by: raver119 <raver119@gmail.com>

* special atomic test

Signed-off-by: raver119 <raver119@gmail.com>

* atomicMul/atomicDiv fix for 16bit values

Signed-off-by: raver119 <raver119@gmail.com>

* Eliminated waste prints.
2019-11-28 21:08:51 +03:00
Yurii Shyrma a8dd6713aa Shyrma scatter (#84)
* - improve performance of scatter (no lock) ops for 1D case

Signed-off-by: Yurii <iuriish@yahoo.com>

* - improve scatter lock op performance for 1D case

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add kernel for verification of input indices-array elements in scatter and scatter_nd ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide fast indices checking on cpu side for scatter and gather osp

Signed-off-by: Yurii <iuriish@yahoo.com>

* - apply corrections requested by pr reviewer

Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-26 20:29:09 +03:00
shugeo 4187190609 Shugeo release fix2 (#70)
* Corrected input checking and tests for bitcast op.

* Fixed an issue with non_max_suppression form generation and processing with score threshold given.

* Fixed bilinear resize kernel and tests.

* push for Serhii

Signed-off-by: raver119 <raver119@gmail.com>

* Added test for nearest_neighbor resize with int input.

* Added data type check for input/output match.

* Eliminate error in macros.

* Improved output message for type checking.

* Fixed input/output types for op.

* Eliminated waste logging.

* Refactored resize_bilinear helper for multithreading for cpu platform.

* Cosmetic changes only.

* Fixed error for string substitution.

* Skip test for cbow_batch with cuda.

* fix for resizeNearestNeighbor output dtype

Signed-off-by: raver119 <raver119@gmail.com>

* Refactored non_max_suppression helper.

* Refactored shape generation and input handling.

* Added additional test.
2019-11-22 22:42:44 +03:00
raver119 83cb0d9329
[WIP] Create and small fix (#67)
* - create op
- skip exec for empty inputs for non_max_suppression
- EmptyHandling idea

Signed-off-by: raver119 <raver119@gmail.com>

* Create op and mapping for it

Signed-off-by: raver119 <raver119@gmail.com>
2019-11-21 13:31:20 +03:00
shugeo dc0036f2c6
Shugeo image resize bicubic (#56)
* Added implementation files for image_resize and resize_bicubic ops.

* Image resize and image.resize_bicubic ops implementation. Initial revision.

* Finished with infrastructure development for image.resize_bilinear op and image_resizo op implementation.

* Refactored resize methods.

* Added processing for Mitchelcubic algorithm.

* Added check for input/output sizes.

* Added int and float types for crop_and_resize op.

* Refactored crop_and_resize output type check.

* Added helper for bicubic interpolation as TF v.1 does.

* Added TF v.1 bicubic helper for cuda platform.

* Added cached class for bicubic algorithm.

* Refactored cuda implementation for crop_and_resize helper to use proper output type.

* Added facilities for bicubic interpolation.

* Portion bicubic interpolation from TF.

* Added tests for resize_bilinear testing.

* Working implementation of bicubic interpolation and tests.

* Refactored routines with image_resize bicubic op helper.

* Refactored code with coding standards.

* Refactored cpu helpers for resize_bicubic op.

* Refactored bicubic helpers.

* Added bicubic resize facilities.

* Implementing cuda kernels for bicubic interpolation. Implementation step.

* Cuda implementation of resize_bicubic op helper.

* Refactor image.resize_bicubic op helpers.

* Refactored helpers for resize_bicubic. Added error checking with cuda implementation.

* Refactored cuda implementation of resize_bicubic op helper. The first working revision.

* Cuda arch implementation for resize_bicubic op helper. Full working single-threaded revision.

* Intermediate bicubic interpolation helper for cuda.

* Refactored cpu helper for resize_bicubic.

* Multithreaded cuda implementation for resize_bicubic.

* Fixed merge issues.

* Refactored nlp helpers.

* Replicated resize_bicubic for 3D also.

* Eliminated waste comments of unused code.

* Eliminated waste comments with unused code.

* Eliminated waste template definitions.

* Eliminated waste debug code.

* Eliminated waste comments.

* Fixed multithreading with helpers.

* Fixed test suites for float and double in float point input lists.

* Fixed usage of reshape with 3D/4D on resizes.

* Final fixes.

* Fixed resize_neighbor op problem.
2019-11-20 21:11:04 +02:00
shugeo 13e5c0a280
Shugeo release fix1 (#61)
* Added a pair of tests for failed ops.

* Fixed cpu helper for draw_bounding_boxes op.

* Refactored implementation of draw_bounding_boxes op to full conform with TF.

* Improved multithreading with draw_bounding_boxes op cuda helper.

* Eliminated log messages.

* Changed logging with draw_bounding_boxes op helper and tests.

* Resize_biliear with 3D input allowed.

* Refactored 3D input acception with resize_bilinear op.

* And another improvement.

* Refactored reshape of input/output for resize_bilinear.

* Improvements final.

* Finished with 3D replication for image.resize_bilinear/_nearest_neighbor.

* Added copyrights for TF code.

* Using new form of multithreading for cpu implementation.

* Fixed shape error.

* Added multithreaded with batches on crop_and_resize functor.

* Refactored multithreading with crop_and_resize and draw_bounding_boxes.
2019-11-20 13:37:48 +02:00
Alex Black da1944e8e1
SameDiff TF import (#49)
* Added implementation files for image_resize and resize_bicubic ops.

* Image resize and image.resize_bicubic ops implementation. Initial revision.

* Minor fix

* Some TF imports disabled.

* Finished with infrastructure development for image.resize_bilinear op and image_resizo op implementation.

* Refactored resize methods.

* Added processing for Mitchelcubic algorithm.

* adjust_contrast

* Small fix for TF import expected value loading when variable name starts with the test name

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Tests

* Tests added.

* Removed tf names absent in mapping.

* Some fixes.

* Small fixes

* Minor change

* Some failing tests.

* Disable failed test

* Ignore some tests

* Fix import class mapping

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fix float property mapping (flatbuffers)

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Override equality function for model 'dropout'

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fail tests

* Failed tests ignored temporarily.

* Minor fixes

* Small fix

* Conflict resolved

* Default implementations of tensorflowName and onnxName
2019-11-19 22:44:29 +11:00
raver119 bbd59a3537
fake quant dtype validation fix (#60)
Signed-off-by: raver119 <raver119@gmail.com>
2019-11-19 12:53:52 +03:00
raver119 1780dcc883
[WIP] Small fixes here and there (#50)
* one range test

Signed-off-by: raver119 <raver119@gmail.com>

* few Context convenience singatures

Signed-off-by: raver119 <raver119@gmail.com>

* one more range test

Signed-off-by: raver119 <raver119@gmail.com>

* "range" "fix"

Signed-off-by: raver119 <raver119@gmail.com>

* adjuct_contrast_v2 now allows scale factor to be provided via input_variable

Signed-off-by: raver119 <raver119@gmail.com>

* adjust_contrast now allows scale factor as variable too

Signed-off-by: raver119 <raver119@gmail.com>

* bitcast shape tests

Signed-off-by: raver119 <raver119@gmail.com>

* BitCast import dtype added

Signed-off-by: raver119 <raver119@gmail.com>

* few more BitCast signatures

Signed-off-by: raver119 <raver119@gmail.com>
2019-11-15 17:04:29 +03:00
raver119 48df1acdfb
[WIP] ThreadPool (#8)
This PR removes OpenMP use in 95% of cases
2019-11-13 17:04:59 +03:00
shugeo 679e42199a Shugeo strided slice bp fix2 (#33)
* Fixed crash and restored brocken functionality for strided slice.

* Added comments for strided_slice_bp main step.
2019-11-07 13:44:02 +03:00
shugeo 9124974e3b
Fixed crash with strided_slice_bp op and tests. (#29) 2019-11-05 12:49:15 +02:00
Yurii Shyrma 0cdb5750e0
Shyrma concat (#24)
* - provide possibility to pass axis as last input array in concat op
- corrcect sumation in bias_add_bp op for NHWC case

Signed-off-by: Yurii <iuriish@yahoo.com>

* - write code for deconv2d op based on mkl dnn api

* no unsafe math

Signed-off-by: raver119 <raver119@gmail.com>

* no unsafe math

Signed-off-by: raver119 <raver119@gmail.com>

* - get rid of e<> and p<> methods in svd helper

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide mkl api support for deconvolution 3d

Signed-off-by: Yurii <iuriish@yahoo.com>

* - write deconv2d_bp based on mkl api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - write deconv3d_bp based on mkl api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - testing and fixing deconv based on mkl api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - remove dilation form conv2d/3d mkl

Signed-off-by: Yurii <iuriish@yahoo.com>

* - minor changes

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further corrections of deconv ops based on mkl dnn api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide deconv2d_tf based on mkl dnn api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add minor corrections required by reviewer

Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-03 12:37:19 +02:00
shugeo 95f7ad7b94
Shugeo suppression overlaps (#9)
* Added non_max_suppression_overlaps op and tests.

* Refactored implementation of non_max_suppression_overlaps.

* Refactoring of implementation of non_max_suppression_overlaps op.

* Refactoring of implementation of non_max_suppression op.

* Fixed portion error.

* Added cuda frontends for image suppression ops.

* Eliminated crash with cuda arch on image.non_max_suppression_overlaps op.

* Improved implementation of image_suppression helper for cpu platform.

* The generic approach of non_max_suppression_overlaps op helper with cuda platform.

* Working cuda implementation of helper non_max_suppression_overlaps op.

* Eliminated waste comments.

* Improved implementations for both platforms

* Refactored cuda implementation of image.non_max_suppression_overlaps op helper.

* Improved cuda implementation of non_max_suppression op helper.

* Refactored cuda implementation of image.non_max_suppression_overlaps op helper.

* Improved cuda implementation of image.non_max_suppression_overlaps op helper.

* Added modifications into cuda implementation for image suppression overlaps op.

* Correct queue emulation with cuda implementation of non_max_suppression_overlaps op.

* Prefinal stage of cuda implementation of non_max_suppression_overlaps.

* Worked cuda implementation of non_max_suppresion_overlaps helper.

* Fixed return to proper thread.

* Improvements for cuda implementation of image.non_max_suppression_overlaps op helper.

* Fixed implementation issues with non_max_suppression_overlaps on cuda platform.

* Fixed skip for non_max_suppression_overlaps on cuda platform.

* Finalize implementation of image_suppression helper and tests.

* Cosmetic changes only.
2019-10-30 13:43:45 +02:00
Alexander Stoyakin 96a9a1a733 Fixed output from operation. 2019-10-16 18:07:52 +03:00
shugeo 92636b0b86 Eliminated waste operator. 2019-10-10 17:08:59 +03:00
shugeo 02d8616692 Implementation of cuda kernel for fake_quant_with_min_max_vars_per_channels op. 2019-10-10 16:40:56 +03:00
shugeo d0cbd33b0e Added input checks for op. 2019-10-09 15:52:13 +03:00
shugeo cb56b0b06a The first approach for fake_quant_with_min_max_vars_per_channel op implementation. 2019-10-08 19:00:41 +03:00
shugeo 53a2ebddbe Added test and helpers for draw_bounding_boxes op both cpu and cuda related. 2019-10-04 20:46:26 +03:00
shugeo 8f70b4441f draw_bounding_boxes op implementation. Inital revision. 2019-10-04 18:32:21 +03:00
shugeo 130ee25682 Implemented compare_and_bitpack op. 2019-10-03 10:57:48 +03:00
shugeo afeb524238 Refactored implementation for adjust_contrast ops. 2019-10-01 14:13:09 +03:00
shugeo 1575c704ae Added implementation for adjust_contrast_v2 op and tests. 2019-10-01 11:44:27 +03:00
shugeo e06dfb5dcc Implementation of adjust_contrast op. 2019-09-30 18:24:12 +03:00
raver119 589401477d
[WIP] bunch of improvements (#257)
* - profiling bias_add op
- add some docementation

Signed-off-by: Yurii <yurii@skymind.io>

* - minor change

Signed-off-by: Yurii <yurii@skymind.io>

* - provide addBias cuda kernel

Signed-off-by: Yurii <yurii@skymind.io>

* - improve shape::getIndexOfffset and change its signature

Signed-off-by: Yurii <yurii@skymind.io>

* - same as previous

Signed-off-by: Yurii <yurii@skymind.io>

* - improve and change signature in some shape:: stuff which has to do with calculation of offsets for array elements

Signed-off-by: Yurii <yurii@skymind.io>

* - minor changes in flatten

Signed-off-by: Yurii <shyrma@skymind.io>

* - add function shape::getIndexOffsetOrdered

Signed-off-by: Yurii <shyrma@skymind.io>

* - correct shape::getIndexOffsetOrdered()

Signed-off-by: Yurii <shyrma@skymind.io>

* - move getIndexOffsetOrdered to flatten.h header in order to isolate this function

Signed-off-by: Yurii <shyrma@skymind.io>
2019-09-11 20:12:09 +03:00
shugeo 548044a1e2 Shugeo doc (#235)
* Actualized doc to tnse ops.

* Added comments for dynamic_stitch op.

* Added comments to dynamic_stitch op implementation.

* Modified comment for unstack_list op.

* Added doc for space_to_depth and depth_to_space ops.

* Added doc for space_to_batch op.

* Enlarge test type for adjustSaturation.

* Added doc for runner.
2019-09-04 14:57:59 +03:00
Yurii Shyrma cb4c9377b1 Shyrma docs (#222)
* - documenting and profiling matrix_set_diag cuda kernel

Signed-off-by: Yurii <yurii@skymind.io>

* - correct formula of pnorm pooling in cuda 2d/3d kernels
- remove helper matrix_diag which duplicates work of helper matrix_set_diag

Signed-off-by: Yurii <yurii@skymind.io>
2019-09-02 16:25:58 +03:00
raver119 70a9ae5068
[WIP] few tweaks (#206)
* scatter empty check

Signed-off-by: raver119 <raver119@gmail.com>

* scatter empty test

Signed-off-by: raver119 <raver119@gmail.com>

* one more test

Signed-off-by: raver119 <raver119@gmail.com>

* two tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* dup tweak

Signed-off-by: raver119 <raver119@gmail.com>

* - put empty checking of indices array immediately prior  helper run

Signed-off-by: Yurii <yurii@skymind.io>

* minor tests fix

Signed-off-by: raver119 <raver119@gmail.com>

* minor tests fix

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-30 16:32:01 +03:00
Yurii Shyrma 5395d4fbe5 - rewrite broadcast_dynamic_shape and delete corresponding helpers (#194)
Signed-off-by: Yurii <yurii@skymind.io>
2019-08-29 20:38:02 +03:00
raver119 df84bc7255
[WIP] More tweaks (#173)
* CUDA empty reduction

Signed-off-by: raver119 <raver119@gmail.com>

* - listdiff synchronization fix for CUDA
- listdiff test

Signed-off-by: raver119 <raver119@gmail.com>

* - IndexReduce ops now allow INDEXING_TYPES output
- topK op accepts only INDEXING_TYPES as output

Signed-off-by: raver119 <raver119@gmail.com>
2019-08-27 10:37:10 +03:00