* Added comments to tileKernel routine.
* Refactored kernel and added doc to it.
* Refactored setDiagonal kernel and added doc for it.
* Added doc for tnse cuda helpers.
* Added doc for diag kernels.
* Added doc for kernel.
* Refactored code with fake quantization.
* Added docs for image resize and crop kernels.
* Added docs for image suppression helpers.
* Added docs to matrix_band helpers.
* Added docs for matrix_diag_part and nth_element helpers.
* Fixed syntax error and refactored getIndexOffset usage.
* - profiling bias_add op
- add some docementation
Signed-off-by: Yurii <yurii@skymind.io>
* - minor change
Signed-off-by: Yurii <yurii@skymind.io>
* - provide addBias cuda kernel
Signed-off-by: Yurii <yurii@skymind.io>
* - improve shape::getIndexOfffset and change its signature
Signed-off-by: Yurii <yurii@skymind.io>
* - same as previous
Signed-off-by: Yurii <yurii@skymind.io>
* - improve and change signature in some shape:: stuff which has to do with calculation of offsets for array elements
Signed-off-by: Yurii <yurii@skymind.io>
* - minor changes in flatten
Signed-off-by: Yurii <shyrma@skymind.io>
* - add function shape::getIndexOffsetOrdered
Signed-off-by: Yurii <shyrma@skymind.io>
* - correct shape::getIndexOffsetOrdered()
Signed-off-by: Yurii <shyrma@skymind.io>
* - move getIndexOffsetOrdered to flatten.h header in order to isolate this function
Signed-off-by: Yurii <shyrma@skymind.io>
* Refactored extract_image_patches op helpers.
* Eliminated compliler errors with helper implementation.
* Finished implementation for extract_image_patches both cpu and cuda helpers.
* Improved cpu implementation.
* Improved cuda implementation for extract_image_patches helper.
* Added omp to ClipByGlobalNorm helpers implementation.
* Added implementation for thresholedrelu_bp op.
* Fixed cuda kernel with F order.
* Fixed tests for subarray.
* Refactored tests for Gaussian_3 and Truncated_22.
* Added tests for GaussianDistribution with native ops.
* Modified tests for Gaussian distribution.
* Fixed random tests.
* Fixed atomicMin/atomicMax for 64bit cases.
* Fixed tests for execReduce3TAD tests.
* Eliminated waste comments.