* libnd4j cast loop types
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j more type casting added to loops
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j sync casting types of iterated variable in loops
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j more loops reviewed for vectorization problem fix
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j fixed several typos
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j several more files reviewed to fix auto-vectorization problem in loops
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j merged master and reviewed more files to fix auto-vectorization problem in loops
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j added several type casts in broadcasting that were missed, fixed Mac builds
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j double check all files and fix several more places in loops
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j fixed builds
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j revert changes for lup.cpp
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* - profiling bias_add op
- add some documentation
Signed-off-by: Yurii <yurii@skymind.io>
* - minor change
Signed-off-by: Yurii <yurii@skymind.io>
* - provide addBias cuda kernel
Signed-off-by: Yurii <yurii@skymind.io>
* - improve shape::getIndexOffset and change its signature
Signed-off-by: Yurii <yurii@skymind.io>
* - same as previous
Signed-off-by: Yurii <yurii@skymind.io>
* - improve and change the signatures of some shape:: functions that calculate offsets for array elements
Signed-off-by: Yurii <yurii@skymind.io>
* - minor changes in flatten
Signed-off-by: Yurii <shyrma@skymind.io>
* - add function shape::getIndexOffsetOrdered
Signed-off-by: Yurii <shyrma@skymind.io>
* - correct shape::getIndexOffsetOrdered()
Signed-off-by: Yurii <shyrma@skymind.io>
* - move getIndexOffsetOrdered to flatten.h header in order to isolate this function
Signed-off-by: Yurii <shyrma@skymind.io>
* CUDA empty reduction
Signed-off-by: raver119 <raver119@gmail.com>
* - listdiff synchronization fix for CUDA
- listdiff test
Signed-off-by: raver119 <raver119@gmail.com>
* - IndexReduce ops now allow INDEXING_TYPES output
- topK op accepts only INDEXING_TYPES as output
Signed-off-by: raver119 <raver119@gmail.com>