Commit Graph

264 Commits (19d5a8d49d9257c975e232703d8132cbc554e3dd)

Author SHA1 Message Date
raver119 784a2d13f8
separate omp impl for softmax (#289)
Signed-off-by: raver119 <raver119@gmail.com>
2020-03-05 11:14:22 +03:00
raver119 3bb22a6ff8
strided_slice without view (#288)
Signed-off-by: raver119 <raver119@gmail.com>
2020-03-05 09:56:52 +03:00
raver119 ca96a13ed0 softmax as standalone compilation unit
Signed-off-by: raver119 <raver119@gmail.com>
2020-03-05 08:45:10 +03:00
Oleh 4d81af9fe9
Softmax operation implementation for mkldnn (#286)
* libnd4j first step of softmax mkldnn implementation

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j raw implementation of mkldnn softmax

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j merge master and added softmax to MklDnnTests

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j some corrections for softmax mkldnn

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j merge branch, fixed problem with negative axis, fixed dnnl::memory::format_tag selection, test cases added

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j minor corrections to avoid risk connected with negative axis usage

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed windows builds, added switcher to use mkldnn sofmax version only for 3D, 4D, 5D, 6D arrays

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed dataType selection per request

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fix for mac and windows builds

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j builds fix

Signed-off-by: Oleg <oleg.semeniv@gmail.com>
2020-03-04 19:36:42 +03:00
raver119 f990b2486d
simplified addBias2D for CUDA (#285)
Signed-off-by: raver119 <raver119@gmail.com>
2020-03-04 09:50:55 +03:00
Yurii Shyrma 78934c17ad
profiling of stack and unstack ops (#261)
* - profiling of stack and unstack ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - fix bug in cpu concat op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correction of cuda stack and unstack

Signed-off-by: Yurii <iuriish@yahoo.com>

* - change shape.h method which operates with unity dimensions strides

Signed-off-by: Yurii <iuriish@yahoo.com>

* - rearrange stack tests

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct evaluation of smallest stride for moving through contiguous axis

Signed-off-by: Yurii <iuriish@yahoo.com>

* - forgot to update signature of function strideOverContigAxis in cuda concat and split ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - remove ShapeUtils::shapeAsString method applied before input arrays validations

Signed-off-by: Yurii <iuriish@yahoo.com>

* -  further removing of ShapeUtils::shapeAsString

Signed-off-by: Yurii <iuriish@yahoo.com>

* - take sub-array shapeIndo/offset calculation out of NDArray class
- add possibility of contiguous memory copy in execTransformAny op if opNum == assign

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct test_empty_scatter_2 in EmptyTests.cpp

Signed-off-by: Yurii <iuriish@yahoo.com>

* - profiling of slice op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - get rid of contiguous memcpy for some cases in concat and split ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - forgot to declare oid nd4j::SpecialMethods<T>::splitCpuGeneric

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct typo in calculation of threads in cuda split op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - forgot to correct another set of threads variables in split cuda ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further conflicts resolving

Signed-off-by: Yurii <iuriish@yahoo.com>

Co-authored-by: raver119 <raver119@gmail.com>
2020-03-03 07:32:37 +03:00
raver119 63fa3c2ef3
libnd4j polishing (#273)
* initial set of include changes

Signed-off-by: raver119 <raver119@gmail.com>

* one more tweak

Signed-off-by: raver119 <raver119@gmail.com>

* few more rearrangements

Signed-off-by: raver119 <raver119@gmail.com>

* few more rearrangements

Signed-off-by: raver119 <raver119@gmail.com>

* few more rearrangements

Signed-off-by: raver119 <raver119@gmail.com>

* cuda includes rearrangements

Signed-off-by: raver119 <raver119@gmail.com>

* java update

Signed-off-by: raver119 <raver119@gmail.com>

* = namespace changed to sd
- few CMake variables renamed with SD_ prefix

Signed-off-by: raver119 <raver119@gmail.com>

* java update

Signed-off-by: raver119 <raver119@gmail.com>

* LoopKind minor fix

Signed-off-by: raver119 <raver119@gmail.com>

* few more changes

Signed-off-by: raver119 <raver119@gmail.com>

* few more changes

Signed-off-by: raver119 <raver119@gmail.com>

* few more changes

Signed-off-by: raver119 <raver119@gmail.com>

* sanitizer is optional now

Signed-off-by: raver119 <raver119@gmail.com>

* dev tests updated

Signed-off-by: raver119 <raver119@gmail.com>

* few more changes

Signed-off-by: raver119 <raver119@gmail.com>

* last update

Signed-off-by: raver119 <raver119@gmail.com>

* java update

Signed-off-by: raver119 <raver119@gmail.com>
2020-03-02 12:49:41 +03:00
Oleh f116f53d61
Loops auto-vectorization problem fix (#277)
* libnd4j cast loop types

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j more type castination added to loops

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j sync casting types of iterated variable in loops

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j more loops reviewed for vectorization problem fix

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed several typos

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j several more files reviewed to fix auto-vectorization problem in loops

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j merge master and reviewed more files to fix auto-vectorization problem in loops

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j several type casting added in broadcasting that were missed, fixed mac builds

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j double check all files and fix several more places in loops

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed builds

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j revert changes for lup.cpp

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j more files reviewed for auto-vectorization problem fix

Signed-off-by: Oleg <oleg.semeniv@gmail.com>
2020-02-28 17:04:45 +03:00
raver119 5332ace32b
better inplace exec with FastPath (#280)
Signed-off-by: raver119 <raver119@gmail.com>
2020-02-28 12:06:30 +03:00
shugeo 330a69d4e2
Shugeo solve ls (#203)
* lstsq op. Initial commit.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Least squares linear problem solve op (lstsq). Cpu draft implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed shape routine and tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added test for lstsq op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Rectification for lstsq op implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected test to avoid numerical inconsistensy.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added prints for check computing.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected tests to use evalueate facility instead.

Signed-off-by: shugeo <sgazeos@gmail.com>

* CPU implementation of MatrixSolveLs op and tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added cuda implementation for helpers with lstsq op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored tests for lstsq op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added processing for empty inputs.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Merged tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored lstsq op for fast case.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed test.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored lstsq op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed some issues with solve.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed lstsq op to avoid erros.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added kernel for giagonal factor

Signed-off-by: shugeo <sgazeos@gmail.com>

* lstsq wrapper and triangular_solve fixed

* Added proper processing empty inputs and test.

Signed-off-by: shugeo <sgazeos@gmail.com>

* SequenceMask test

* Build fixed

* Added proper processing of empty inputs with solve op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Mapping added

* Added check of input shapes with solve op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added a couple of tests for lstsq op and minor changes with cuda helper for one.'

Signed-off-by: shugeo <sgazeos@gmail.com>

* Tests on

* Refactored test for lstsq op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed test

* Added another approach for lstsq op aka solve_ls.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Finished cpu part for solve_ls op helpers.

* Added helper for low triangular matrix inversion.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored alternate solve_ls cpu implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Removed alternate approach for solve_ls op. Added multithreading with matrix inversion.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Assert fixed

* Refactored multithreading for inverse matricies.

Signed-off-by: shugeo <sgazeos@gmail.com>

Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
2020-02-28 11:37:26 +03:00
raver119 358c650b62 one micro fix
Signed-off-by: raver119 <raver119@gmail.com>
2020-02-27 19:28:26 +03:00
raver119 31e3a2f7a5
transparent conversion to FastPath execution within Graph (#278)
Signed-off-by: raver119 <raver119@gmail.com>
2020-02-27 16:10:38 +03:00
Oleh b4575d11e9
Loops auto-vectorization problem fix (#274)
* libnd4j cast loop types

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j more type castination added to loops

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j sync casting types of iterated variable in loops

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j more loops reviewed for vectorization problem fix

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed several typos

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j several more files reviewed to fix auto-vectorization problem in loops

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j merge master and reviewed more files to fix auto-vectorization problem in loops

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j several type casting added in broadcasting that were missed, fixed mac builds

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j double check all files and fix several more places in loops

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j fixed builds

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j revert changes for lup.cpp

Signed-off-by: Oleg <oleg.semeniv@gmail.com>
2020-02-26 21:12:19 +03:00
raver119 5c806d2fb5
reshape tweak (#275)
* - expand dims tweak
- reshape memcpy

Signed-off-by: raver119 <raver119@gmail.com>

* validation fix

Signed-off-by: raver119 <raver119@gmail.com>
2020-02-26 14:05:32 +03:00
Oleh b686368b82
Refactoring split operation (#266)
* libnd4j moved split operation implementation to helpers before special case adding

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j minor fixes for general split operation move, merge master

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libndj4 split cpu implementation

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* - provide cuda helper for split op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - minor correction

Signed-off-by: Yurii <iuriish@yahoo.com>

* - minor correction 2

Signed-off-by: Yurii <iuriish@yahoo.com>

* libnd4j moved split implementation from specials to split.cpp

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j update loopkind selections for 3D, 4D and 5D cases

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j removed unnecessary BUILD_SINGLE_TEMPLATE

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>
2020-02-26 10:20:39 +03:00
raver119 cf67c7165a nano fix
Signed-off-by: raver119 <raver119@gmail.com>
2020-02-25 15:20:51 +03:00
raver119 f6442b6724
few minor tweaks (#272)
Signed-off-by: raver119 <raver119@gmail.com>
2020-02-25 11:13:23 +03:00
Oleh f0706b21aa
Split operation improvement (#262)
* libnd4j moved split operation implementation to helpers before special case adding

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libnd4j minor fixes for general split operation move, merge master

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* libndj4 split cpu implementation

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* - provide cuda helper for split op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - minor correction

Signed-off-by: Yurii <iuriish@yahoo.com>

* - minor correction 2

Signed-off-by: Yurii <iuriish@yahoo.com>

Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>
Co-authored-by: raver119 <raver119@gmail.com>
2020-02-24 08:22:41 +03:00
shugeo 1bb3ae4b03
Shugeo unordered map (#256)
* Refactored usage of std::map to std::unordered_map instead.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Eliminated crash with wrong ShapeDescriptor hash.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Eliminated crash with TadDescriptor hash.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored Stash hash.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored hashes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored TadDescriptor hash and top_k mapping.

* Refactored hashes for ShapeDescriptor and TadDescriptor classes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored hash for ConstantDescriptor and ShapeDescriptor classes.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed map using with cuda platform.

Signed-off-by: shugeo <sgazeos@gmail.com>

* - few rearrangements for hash functions
- shared openblas allowed

Signed-off-by: raver119 <raver119@gmail.com>

* exports

Signed-off-by: raver119 <raver119@gmail.com>

* exports

Signed-off-by: raver119 <raver119@gmail.com>

* Stash reverted to std::map

Signed-off-by: raver119@gmail.com <raver119@gmail.com>

* Added additional test.

Signed-off-by: shugeo <sgazeos@gmail.com>

* different maps for different compilers

Signed-off-by: raver119 <raver119@gmail.com>

* missing include

Signed-off-by: raver119 <raver119@gmail.com>

* fix the leak

Signed-off-by: raver119 <raver119@gmail.com>

Co-authored-by: raver119 <raver119@gmail.com>
2020-02-24 07:51:01 +03:00
raver119 e78be14cc1
arm fix (#260)
* range check for scalar_int

Signed-off-by: raver119 <raver119@gmail.com>

* no simd

Signed-off-by: raver119 <raver119@gmail.com>

* no ops

Signed-off-by: raver119 <raver119@gmail.com>

* cyclic shift?

Signed-off-by: raver119 <raver119@gmail.com>

* left split

Signed-off-by: raver119 <raver119@gmail.com>

* left split

Signed-off-by: raver119 <raver119@gmail.com>

* rot ops unrolled templates

Signed-off-by: raver119 <raver119@gmail.com>

* no rotl/rotr for uint64

Signed-off-by: raver119 <raver119@gmail.com>

* no rotl/rotr for uint64 2

Signed-off-by: raver119 <raver119@gmail.com>

* no rotl/rotr for uint64 3

Signed-off-by: raver119 <raver119@gmail.com>

* ARM_BUILD declared

Signed-off-by: raver119 <raver119@gmail.com>
2020-02-21 14:31:00 +03:00
Yurii Shyrma f7a9190407
profiling of concat op (both cuda and cpu) (#151)
* - profiling of concat op (both cuda and cpu)

Signed-off-by: Yurii <iuriish@yahoo.com>

* better comparison for large concat

Signed-off-by: raver119 <raver119@gmail.com>

* - further improving of concat op

Signed-off-by: Yurii <iuriish@yahoo.com>

* some loggin

Signed-off-by: raver119 <raver119@gmail.com>

* - add possibility to verify presence of trailing unities in shape and set strides/ews correspondingly
- restrict second simple case in concat op to c order only

Signed-off-by: Yurii <iuriish@yahoo.com>

* - move concat op to specials_single.cpp file

Signed-off-by: Yurii <iuriish@yahoo.com>

* - get rid of second concat op declaration in transforms.cpp file

Signed-off-by: Yurii <iuriish@yahoo.com>

Co-authored-by: raver119 <raver119@gmail.com>
2020-02-20 21:19:01 +03:00
raver119 215641ea9e
Minor improvements (#255)
* static increments in loops

Signed-off-by: raver119 <raver119@gmail.com>

* specials and concat split into separate units

Signed-off-by: raver119 <raver119@gmail.com>
2020-02-20 11:43:26 +03:00
Yurii Shyrma c5193ecb81
Shyrma gather (#254)
* - profiling gather op for aurora

Signed-off-by: Yurii <iuriish@yahoo.com>

* - include contiguous memcpy in gather op

Signed-off-by: Yurii <iuriish@yahoo.com>
2020-02-19 09:35:52 +03:00
Yurii Shyrma 22c7aa9acf
Shyrma mkl matmul (#250)
* - provide matmul code based on mkl api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct typo in mkl matmul op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - take into account empty arrays in mkl matmul op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - fix bug in mkl matmul and group all matmul tests in one file

Signed-off-by: Yurii <iuriish@yahoo.com>
2020-02-18 08:58:01 +03:00
raver119 f9d51b7278
More compilation units (#246)
* weird edge case

Signed-off-by: raver119 <raver119@gmail.com>

* weird edge case

Signed-off-by: raver119 <raver119@gmail.com>

* get rid of it

Signed-off-by: raver119 <raver119@gmail.com>

* crop and resize reorganized

Signed-off-by: raver119 <raver119@gmail.com>

* restore test

Signed-off-by: raver119 <raver119@gmail.com>

* remove unwanted unit refs in cmale

Signed-off-by: raver119 <raver119@gmail.com>
2020-02-17 10:23:05 +03:00
Yurii Shyrma 011c272fde
Shyrma transpose (#244)
* - provide contiguous strides for ouput in transpose op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide contiguous strides for output in permute op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - take into account empty shapes properly in transpose/permute op

Signed-off-by: Yurii <iuriish@yahoo.com>
2020-02-17 08:04:28 +03:00
raver119 9e3c1b02b1
Perf improvements (#242)
* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* meh

Signed-off-by: raver119 <raver119@gmail.com>

* better ExpandDims impl

Signed-off-by: raver119 <raver119@gmail.com>

* better Squeeze impl

Signed-off-by: raver119 <raver119@gmail.com>

* better Softmax impl

Signed-off-by: raver119 <raver119@gmail.com>

* one test disabled

Signed-off-by: raver119 <raver119@gmail.com>

* more accurate impl

Signed-off-by: raver119 <raver119@gmail.com>

* - GraphProfiler now prints full shapeInfo instead of shape
- softmax typo fix

Signed-off-by: raver119 <raver119@gmail.com>
2020-02-14 16:20:31 +03:00
raver119 3de3cd8277
R119 tests (#238)
* one small test

Signed-off-by: raver119 <raver119@gmail.com>

* one small test

Signed-off-by: raver119 <raver119@gmail.com>

* bert test

Signed-off-by: raver119 <raver119@gmail.com>

* Graph FlowPath fix

Signed-off-by: raver119 <raver119@gmail.com>

* - GraphProfiler tweaks
- NodeProfile now includes shapes

Signed-off-by: raver119 <raver119@gmail.com>

* RELU_layer inplace tweak

Signed-off-by: raver119 <raver119@gmail.com>

* meh

Signed-off-by: raver119 <raver119@gmail.com>

* identity tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* bert result validation

Signed-off-by: raver119 <raver119@gmail.com>

* - bunch of Shape ops have inplace exec forbidden now
- Legacy ops have inplace exec disabled by default now

Signed-off-by: raver119 <raver119@gmail.com>

* ffast-math enabled

Signed-off-by: raver119 <raver119@gmail.com>

* ffast-math enabled

Signed-off-by: raver119 <raver119@gmail.com>

* allow some legacy ops to be inplace

Signed-off-by: raver119 <raver119@gmail.com>

* disable -fast_math

Signed-off-by: raver119 <raver119@gmail.com>

* disable expensive test for cuda

Signed-off-by: raver119 <raver119@gmail.com>
2020-02-13 20:59:35 +03:00
Yurii Shyrma fe47f52896
Oleh tenzor mmul (#231)
* Libnd4j: TensorMMul backprop op #8174, raw implementation

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: TensorMMul backprop op #8174 merge master and some corrections

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: TensorMMul backprop op #8174 algorithm update, need testing, sync with  master

* Libnd4j: TensorMMul backprop op #8174 fixed incorrect B axes calculation

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: TensorMMul backprop op #8174 optimize axes identification and fix bug of indeces overlapping, added first test. need testing with different shapes

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: TensorMMul backprop op #8174 some fixes and improvements need more testing

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: TensorMMul backprop op #8174 fixed order of matrix multiply

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: TensorMMul backprop op #8174 fixed issue of incorrect axes definition, add tests based on TF, need additional testing for case dLdC not equal 1

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: TensorMMul backprop op #8174 fixed scalar case add test

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: TensorMMul backprop op #8174 fixed bp algorithm, axes definition, need some mode testing with different orders combination f,c; c,f f,f and add some checks for inputs

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: TensorMMul backprop op #8174 some checks and corrections added tests, exists the problem with different input orders support A-f B-c and A-f B-f

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: TensorMMul backprop op #8174 sync master

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* - correct bug in MmulHelper::tensorDot(a, b, c, axes_a, axes_b,permutForC)

Signed-off-by: Yurii <iuriish@yahoo.com>

* Libnd4j: TensorMMul backprop op #8174 code clean up and refactoring

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* - add check for linspase ordered permutations in ShapeUtils::evalShapeForTensorDot

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide additional code in shape::reshape stuff in order to reduce amount of allocation/copy operations during reshaping procedure

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on problem of wrong shape evaluation during permute/reshape procedures

Signed-off-by: Yurii <iuriish@yahoo.com>

* - still looking for bug reason in reshape/permute stuff

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct bug in transform cuda native ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct bug in NDArray::assign

Signed-off-by: Yurii <iuriish@yahoo.com>

* - remove old shape::reshape stuff

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add possibility to disable copy of old buffer to new buffer during reshape operation in NDArray class

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct bug in tensorDot which had to do with wrong pointers assigments

Signed-off-by: Yurii <iuriish@yahoo.com>

Co-authored-by: Oleh <oleg.semeniv@gmail.com>
2020-02-13 20:33:54 +03:00
shugeo f0c684020f
Shugeo resize area fix4 (#229)
* Fixed a couple of issues with resize_area op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added additional test for alternate params for resize_area testing.

Signed-off-by: shugeo <sgazeos@gmail.com>
2020-02-12 19:02:42 +03:00
raver119 237c137166 few more smaller compilation units (#226)
Signed-off-by: raver119 <raver119@gmail.com>
2020-02-10 10:57:18 +03:00
raver119 8a0d5e3b97
Compilation units (#224)
* - TrueBroadcastHelper split into multiple compilation units
- legacy gemm.cpp disabled

Signed-off-by: raver119 <raver119@gmail.com>

* - IndexReduce int32/int64 split into multiple compilation units

Signed-off-by: raver119 <raver119@gmail.com>

* - Reduce3 ops split into multiple compilation units

Signed-off-by: raver119 <raver119@gmail.com>
2020-02-09 19:48:32 +03:00
Abdelrauf bead656feb
Initial performance improvement for Bias Add and etc #8556 (#217)
* Initial  performance improvement for Bias Add, loop coords helpers and increment aligned parallel threading

Signed-off-by: AbdelRauf <rauf@konduit.ai>

* One more test for Rauf

Signed-off-by: raver119 <raver119@gmail.com>

* disable couple of perf tests

Signed-off-by: raver119 <raver119@gmail.com>

Co-authored-by: raver119 <raver119@gmail.com>
2020-02-08 15:31:30 +03:00
Yurii Shyrma 948646b32d
Shyrma mkl test (#211)
* - provide nhwc format in mkl conv ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - corrections in mkl conv3d

Signed-off-by: Yurii <iuriish@yahoo.com>

* - corrections in mkl batchnorm

Signed-off-by: Yurii <iuriish@yahoo.com>

* - corrections in mkl maxpooling2d

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add format format_tag::any to outputs in mkl conv ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - complete corrections in mkl conv ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add test for comparison of execution speeds of mkl conv2d op with different weights format

Signed-off-by: Yurii <iuriish@yahoo.com>

* - take into account order f in mkl conv ops

Signed-off-by: Yurii <iuriish@yahoo.com>
2020-02-06 21:12:54 +03:00
shugeo 5ae40f6e38
Shugeo sequence mask fix2 (#216)
* Fixed sequence_mask op and tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Cuda fix for sequence_mask op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed sequence_mask op for both platforms and tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed solve and triangular_solve for more than 2D for adjoint cases.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added adjoint solve test again.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added a set of tests for triangual_solve and generic solve ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added a pair tests for triangular_solve

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added tests for triangular_solve op.

Signed-off-by: shugeo <sgazeos@gmail.com>
2020-02-06 21:06:50 +03:00
raver119 5d28e6143d
OpContext handling (#214)
* nano tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* OpContext tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* OpContext deallocators

Signed-off-by: raver119 <raver119@gmail.com>

* get rid of few mkldnn safety checks

Signed-off-by: raver119 <raver119@gmail.com>

* databuffer setSpecial fix

Signed-off-by: raver119 <raver119@gmail.com>
2020-02-05 07:27:24 +03:00
shugeo 41ff907bc6
Shugeo solve linear (#191)
* linear equations systems solve op. Initial commit.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed compiling issues.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Linear equations systems solve. The next stage commit.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added test for linear equations systems solve operation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added additional test and fixed lower matrix retrievance.

* Implementation for solve of the systems of linear equations."

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored permutation generation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added restore for permutations batched with cuda helper for solve op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Finished cuda implementation for solve op helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored cpu helpers for solve op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fix gtest output on Windows

* Fixed issue with permutation matrix for cuda implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed issue with permutation matrix for cpu implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Eliminated waste comments.

Signed-off-by: shugeo <sgazeos@gmail.com>

* LinearSolve added

* Mapping added

* Javadoc added

* Refactored implementation of triangular_solve helpers and tests for solve matrix equations generally.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added a test for solve op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Solve test added

* Fix for TF import

Co-authored-by: Serhii Shepel <9946053+sshepel@users.noreply.github.com>
Co-authored-by: raver119 <raver119@gmail.com>
Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
2020-02-04 08:59:11 +03:00
raver119 9bb5798cac
Null arrays fix (#208)
* don't skip null arrays

Signed-off-by: raver119 <raver119@gmail.com>

* one test tweak

Signed-off-by: raver119 <raver119@gmail.com>
2020-02-02 23:14:00 +03:00
Oleh d52e67209e
Oleh convert (#200)
* StringUtils for utf convertor raw implementation of all possible combinations, need to be add counter of bytes per symbol for any type and add api to call convertors and store data

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor more corrections to support convertors

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor some corrections and bug fixes, need review to discuss how to add multi-threading

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 some corrections to move to multi-threading, add one test need discussion data inputs/outputs array presentation, need discussion the way of multi-threading

* StringUtils for utf convertor #8613 tests added some corrections to optimize build

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 some corrections and code clean up

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 code clean up and optimize usage, need update ndarray factory before replace std usage

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 some staff to integrate converters into NDArrayFactory, update tests and add some functionality

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 minor corrections and bug fix before discussion

* StringUtils for utf convertor #8613 some fixes and tets

* StringUtils for utf convertor #8613 some more staff to support different unicode

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 fix linking bug

* StringUtils for utf convertor #8613 corrected several tests as defaults for string ndarray changed

* StringUtils for utf convertor #8613 replace some incorrect implementation, revert some test changes, need sync before testing

* StringUtils for utf convertor #8613 fixed several thing that were badly implemented yesterday, need optimization, testing (before testing have to be add support of u32 and u16 buffer visualization)

* StringUtils for utf convertor #8613 fixed to support u16 and u32, and convertor in ndarray, fix buffer print, etc

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 merge master and sync with server

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 some correction for string cast, need print check only asci support

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 merge master, remove copies and add cast, need test, refactoring according review and clean up

* StringUtils for utf convertor #8613 fixed cast and copy issues

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 fixed cuda and update tests

* StringUtils for utf convertor #8613 integration into NdArray, fix several tests for build pass, refactoring, etc

* - avoid ambiguity of NDArray ctrs overloading in some tests

Signed-off-by: Yurii <iuriish@yahoo.com>

* StringUtils for utf convertor #8613 NDArray string constructors added, updated NDArrayFactory, refactoring unicode and tests, etc

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 fixed cuda build and test, refactoring and void* added to some functions

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613  void* integration, removed copy operation, refactoring, added tests for NDArray string constructors, etc

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 several more fixes, improvements and updates

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 master merge, code clean up and optimization before review

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 minor fixes string element size define

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 revert last changes as mistake

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 fixed NDArray constructor build problem, remove order from string factory, fixed order use for factory via project, added catch of incorrect sync in cast of arrays to data types, fixed e method for strings, etc

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 added javacpp hack, added multi-threading, minor corrections in license agreement

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* StringUtils for utf convertor #8613 windows builds fix, as "sting" is not treated as utf8

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>
2020-01-31 16:30:49 +03:00
raver119 1ab86d1306
Range op data type (#204)
* - range op now accepts dargs
- dargs now can be in signature

Signed-off-by: raver119 <raver119@gmail.com>

* range dtype java side

Signed-off-by: raver119 <raver119@gmail.com>

* linspace fix

Signed-off-by: raver119 <raver119@gmail.com>

* lin_space fix for scalar outputs

Signed-off-by: raver119 <raver119@gmail.com>
2020-01-31 10:45:40 +03:00
raver119 5d98cfcf47
Configurable DataType for ops (#201)
* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* - one more test for OneHot with dtype
- one more signature in Nd4j

Signed-off-by: raver119 <raver119@gmail.com>

* ones_as/zeros_as now accept dtype

Signed-off-by: raver119 <raver119@gmail.com>

* one more test

Signed-off-by: raver119 <raver119@gmail.com>

* - more updates for configurable data types
- ones_as/zeros_as java side + tests

Signed-off-by: raver119 <raver119@gmail.com>

* few c++ tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* few more changes around DArgs

Signed-off-by: raver119 <raver119@gmail.com>
2020-01-30 18:46:12 +03:00
raver119 ba961c7601
DataTypes & FlatBuffers (#197)
* flatbuffers version upgrade

Signed-off-by: raver119 <raver119@gmail.com>

* flatbuffers version upgrade java side

Signed-off-by: raver119 <raver119@gmail.com>

* flatbuffers dependency version upgrade java side

Signed-off-by: raver119 <raver119@gmail.com>

* MKLDNN version upgrade

Signed-off-by: raver119 <raver119@gmail.com>

* DArgs first pass

Signed-off-by: raver119 <raver119@gmail.com>

* signatures first pass

Signed-off-by: raver119 <raver119@gmail.com>

* signatures second pass

Signed-off-by: raver119 <raver119@gmail.com>

* signatures third pass

Signed-off-by: raver119 <raver119@gmail.com>

* signatures third pass

Signed-off-by: raver119 <raver119@gmail.com>

* signatures fourth pass

Signed-off-by: raver119 <raver119@gmail.com>

* signatures fifth pass

Signed-off-by: raver119 <raver119@gmail.com>

* flatbuffers UI version upgrade java side

Signed-off-by: raver119 <raver119@gmail.com>

* flatbuffers ui update

Signed-off-by: raver119 <raver119@gmail.com>

* flatbuffers downgrade

Signed-off-by: raver119 <raver119@gmail.com>

* flatbuffers downgrade java side

Signed-off-by: raver119 <raver119@gmail.com>
2020-01-30 10:07:24 +03:00
Yurii Shyrma 7a7ee4b021 Shyrma cudnn (#192)
* - implementation of cudnn batchnorm_bp op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - testing and fixing bugs in batchnorm_bp based on cudnn api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - move pooling mkl code and delete some unnecessary files

Signed-off-by: Yurii <iuriish@yahoo.com>

* - implementation and testing cudnn pooling2d ops (avg/max, ff/bp)

Signed-off-by: Yurii <iuriish@yahoo.com>

* - implementation and testing cudnn pooling 3d (ff/bp) ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - provide ff step in case of cudnn maxpool3d_bp op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - remove half type from set of supported types in mkl dpethwise conv op

Signed-off-by: Yurii <iuriish@yahoo.com>

* - bring back cudaStreamSynchronize in batchnorm and pooling cudnn ops

Signed-off-by: Yurii <iuriish@yahoo.com>

Co-authored-by: raver119 <raver119@gmail.com>
2020-01-28 18:23:07 +03:00
shugeo 99a54829c2 Shugeo resize area fix2 (#181)
* Added test for issue with resize_area op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added a pair of tests for resize_are op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored resize_area kernel to avoid shared memory overflow.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Eliminated prints with tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* ignore bad test

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed test with resize_area.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed test for float constants.

Signed-off-by: shugeo <sgazeos@gmail.com>

Co-authored-by: raver119 <raver119@gmail.com>
2020-01-24 20:55:25 +03:00
raver119 5d69069177
[WIP] Memory limits (#167)
* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* one more initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* additional initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* subsequent initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* initial commit testing

Signed-off-by: raver119 <raver119@gmail.com>

* initial commit per device

Signed-off-by: raver119 <raver119@gmail.com>

* initial commit per group

Signed-off-by: raver119 <raver119@gmail.com>

* initial commit for cuda

Signed-off-by: raver119 <raver119@gmail.com>

* initial commit for cuda + few missed lines

Signed-off-by: raver119 <raver119@gmail.com>

* initial commit for cuda + missed includes

Signed-off-by: raver119 <raver119@gmail.com>

* initial commit for cuda + one more missed include

Signed-off-by: raver119 <raver119@gmail.com>

* initial commit shouldn't count host mem as dev0 in cuda

Signed-off-by: raver119 <raver119@gmail.com>

* initial commit that tracks HOST group limits for CUDA

Signed-off-by: raver119 <raver119@gmail.com>

* initial commit with some Environment changes

Signed-off-by: raver119 <raver119@gmail.com>

* initial commit with more Environment changes

Signed-off-by: raver119 <raver119@gmail.com>

* initial commit with maxMasterThreads fix

Signed-off-by: raver119 <raver119@gmail.com>

* initial commit with maxMasterThreads fix

Signed-off-by: raver119 <raver119@gmail.com>

* initial commit without maxMasterThreads exception

Signed-off-by: raver119 <raver119@gmail.com>

* initial commit without Nd4jULong in Environment

Signed-off-by: raver119 <raver119@gmail.com>

* add sleep and more iterations for OOM cases

Signed-off-by: raver119 <raver119@gmail.com>

* limits propagation from java side

Signed-off-by: raver119 <raver119@gmail.com>

* - consume ErrorCode every time
- one test for memory limits

Signed-off-by: raver119 <raver119@gmail.com>

* unordered_map

Signed-off-by: raver119 <raver119@gmail.com>

* unordered_map

Signed-off-by: raver119 <raver119@gmail.com>

* unordered_map

Signed-off-by: raver119 <raver119@gmail.com>

* RSub op mapping fixed

Signed-off-by: raver119 <raver119@gmail.com>

* typo fixed

Signed-off-by: raver119 <raver119@gmail.com>

* one bad test fixed

Signed-off-by: raver119 <raver119@gmail.com>
2020-01-24 10:11:09 +03:00
shugeo 2717b25931 Shugeo qr (#153)
* Added qr op implementation. Initial version.

* Fixed doc for qr op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Implementation of QR decomposition. CPU platform version.

* Added a pair of tests for qr op testing.

Signed-off-by: shugeo <sgazeos@gmail.com>

* QR implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected norm using.

* Properly calculated intermediate results with QR decomposition.

* Another step to implement QR algorithm by householder.

* Cpu implementatio for QR decomposition. The first working edition.

* Corrected test to QR decomposition.

* Added tad multithreading with QR implementation.

* Finished cpu implementation for QR decomposition helpers.

* Refactored tests and improved multithreading.

* Refactored QR cpu implementation and update cuda implementation helpers.

* Cuda QR helper implementation. The first working edition.

* Eliminated waste prints.

* Restore multithreading with cuda implementation.

* Ops names corrected

* Refactored qr op helpers to optimize.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Eliminated waste manual ticking.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored memory allocation to avoid waste memory usage.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored matrixMinor method both for cuda and cpu platforms.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored method of vmul to use raw buffers instead type conversion.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored temporary array of matricies.

Signed-off-by: shugeo <sgazeos@gmail.com>

Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
Co-authored-by: raver119 <raver119@gmail.com>
2020-01-22 13:59:36 +03:00
shugeo 815a2908af Shugeo solve triangular (#173)
* Added implementation of the triangular_solve op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed compilation issues.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added verification of input data and helpers facilities for triangular_solve op.'

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added cpu implementation for triangular_solve helpers.

* Added tests and implementation for upper triangular equations.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added a pair of cases to tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added multithreading with cpu helpers for triangular_solve op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added cuda implementation of triangular_solve op helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Finished cuda implementation of triangular_solve helpers and tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed copyright marks.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected grammar errors with doc and error messages.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored matricies processing with triangular_solve cuda helper implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added triangular_solve wrapper

* Fixed mapping

* Added processing for adjoint with cpu helpers of triangular_solve op implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added implementation for adjoint routine with cuda platform.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added multithreading with adjoint routine for cpu platform.

Signed-off-by: shugeo <sgazeos@gmail.com>

Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
2020-01-22 10:48:03 +03:00
shugeo e50b285c2c Shugeo resize area (#162)
* Added implementation for resize_area op. Initial commit.

* Added implementation of resize_area op. Initial revision.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected resizeArea functor call.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Implementation of resize_area. Cpu platform helpers.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Implementation for resize_area helpers. The first part revision.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added a set of tests for resize_area op.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Cuda implementation for resize_area. Initial approach.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Adding multithreading for resize_area algorithm.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Cuda implementation of resize_area helpers. Shared memory approach.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored resizeAreaKernel with cuda implementation.

* Eliminated compilation errors.

* ResizeArea helpers for cuda platform. The first working revision.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added test for batched resize_area op testing.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Implementation of resize_are for cuda platform and tests.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Fixed multithreading with resize_area op helper.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected copyright marks with sources.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected copyright mark for resize_area op implementation.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected copyright mark for parity ops header.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Corrected typo in strings and so on with image resize ops.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored resize_area helpers and multithreading.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added ResizeArea wrapper

* Added test with align_corners and fixed shape processing with only int args given for output size.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Added test

* TF mapping for ResizeArea

* Fixed implementation issues with resize_area op for both platforms.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Refactored image resizer struct to use flexible types for ints and floats.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Improved multithreading with resizeAreaKernel launch.

Signed-off-by: shugeo <sgazeos@gmail.com>

* Use asynchronical memory copying with cuda platform image resize allocations.

Signed-off-by: shugeo <sgazeos@gmail.com>

Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
2020-01-22 10:46:33 +03:00
raver119 7783012f39
cuDNN integration (#150)
* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* one file

Signed-off-by: raver119 <raver119@gmail.com>

* few more includes

Signed-off-by: raver119 <raver119@gmail.com>

* m?

Signed-off-by: raver119 <raver119@gmail.com>

* const

Signed-off-by: raver119 <raver119@gmail.com>

* cudnn linkage in tests

Signed-off-by: raver119 <raver119@gmail.com>

* culibos

Signed-off-by: raver119 <raver119@gmail.com>

* static reminder

Signed-off-by: raver119 <raver119@gmail.com>

* platform engine tag

Signed-off-by: raver119 <raver119@gmail.com>

* HAVE_CUDNN moved to config.h.in

Signed-off-by: raver119 <raver119@gmail.com>

* include

Signed-off-by: raver119 <raver119@gmail.com>

* include

Signed-off-by: raver119 <raver119@gmail.com>

* skip cudnn handle creation if there's not cudnn

Signed-off-by: raver119 <raver119@gmail.com>

* meh

Signed-off-by: raver119 <raver119@gmail.com>

* target device in context

Signed-off-by: raver119 <raver119@gmail.com>

* platform engines

Signed-off-by: raver119 <raver119@gmail.com>

* platform engines

Signed-off-by: raver119 <raver119@gmail.com>

* allow multiple -h args

Signed-off-by: raver119 <raver119@gmail.com>

* allow multiple -h args

Signed-off-by: raver119 <raver119@gmail.com>

* move mkldnn out of CPU block

Signed-off-by: raver119 <raver119@gmail.com>

* link to mkldnn on cuda

Signed-off-by: raver119 <raver119@gmail.com>

* less prints

Signed-off-by: raver119 <raver119@gmail.com>

* minor tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* next step

Signed-off-by: raver119 <raver119@gmail.com>

* conv2d NCHW draft

Signed-off-by: raver119 <raver119@gmail.com>

* conv2d biasAdd

Signed-off-by: raver119 <raver119@gmail.com>

* test for MKL/CUDNN combined use

Signed-off-by: raver119 <raver119@gmail.com>

* - provide additional code for conv2d ff based on cudnn api, not tested yet

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on conv2d helper based on using cudnn api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - fixing several cuda bugs which appeared after cudnn lib had been started to use

Signed-off-by: Yurii <iuriish@yahoo.com>

* - implementation of conv2d backprop op based on cudnn api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - implementaion of conv3d and conv3d_bp ops based on cudnn api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - bugs fixing in conv3d/conv3d_bp ops (cudnn in use)

Signed-off-by: Yurii <iuriish@yahoo.com>

* - implementation of depthwiseConv2d (ff/bp) op based on cudnn api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - implementation of batchnorm ff op based on cudnn api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - disable cudnn batchnorm temporary

Signed-off-by: Yurii <iuriish@yahoo.com>

* - add minor change in cmake

Signed-off-by: Yurii <iuriish@yahoo.com>

* engine for depthwise mkldnn

Signed-off-by: raver119 <raver119@gmail.com>

* couple of includes

Signed-off-by: raver119 <raver119@gmail.com>

* - provide permutation to cudnn batchnorm ff when format is NHWC

Signed-off-by: Yurii <iuriish@yahoo.com>

* lgamma fix

Signed-off-by: raver119 <raver119@gmail.com>

* - eliminate memory leak in two tests

Signed-off-by: Yurii <iuriish@yahoo.com>

Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>
2020-01-20 21:32:46 +03:00
Oleh 8fc0e63ce7 Oleh powderev (#171)
* Libnd4j: Add broadcastable elementwise power derivative #7461 first step of Pow_bp operation implementation

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: Add broadcastable elementwise power derivative #7461 some corrections of calculation steps

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: Add broadcastable elementwise power derivative #7461 some bug fixes, the PowDerevative op made broadcastable, add the raw tests for op, need refactoring to use broadcast ops

* Libnd4j: Add broadcastable elementwise power derivative #7461 fixed several bugs add broadcast support and tests, need to fix scalar+array and array+scalar

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: Add broadcastable elementwise power derivative #7461 fixed bugs for scalar inputs, fixed multinomial tests, added tests

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: Add broadcastable elementwise power derivative #7461 fised bugs for different shapes support, tests updated

* Libnd4j: Add broadcastable elementwise power derivative #7461 applied all possible variants via tiled arrays, add support of broadcast for Pow and PowDerivative ops, covered by tests, before review have to be replaced tiled implementation by applyTrueBroadcast

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: Add broadcastable elementwise power derivative #7461 replaced tile by broadcast implementation, fixed issue with negative x input, corrected tests, need additional testing

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: Add broadcastable elementwise power derivative #7461 added and corrected test cases, corrected implementation need review

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: Add broadcastable elementwise power derivative #7461 code clean up

* Libnd4j: Add broadcastable elementwise power derivative #7461 code clean up, removed some tests, add tests with scalar

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: Add broadcastable elementwise power derivative #7461 code improvement and clean up, split tests

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: Add broadcastable elementwise power derivative #7461 some code clean up

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* Libnd4j: Add broadcastable elementwise power derivative replace __isnanf by internal realization

Signed-off-by: Oleg <oleg.semeniv@gmail.com>

* pow_bp wrapper

* Fixed PowBp wrapper

* Tests added

* Test fixed

* Fix return type

* Disable powBp usage

* Pow backprop changed

Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
2020-01-20 12:59:12 +03:00