cavis/libnd4j/include
Yurii Shyrma 66b84b38cf
Shyrma mmul (#58)
* - get rid of some copy procedures in mmulHelper ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on embedding cuda api for batched gemm (cublasGemmBatchedEx) in our mmulHelper class

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on cuda batched gemm api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - write own cuda kernel performing batched gemm

Signed-off-by: Yurii <iuriish@yahoo.com>

* missing include in MmulHelper

Signed-off-by: raver119 <raver119@gmail.com>

* - restore the previous correct kernels for mmulNxN in the code, since the new one may fail for some reason in the future

Signed-off-by: Yurii <iuriish@yahoo.com>

* disable old tensordot

Signed-off-by: raver119 <raver119@gmail.com>

* - rewrite cuda kernels for usualGemm and usualGemv

Signed-off-by: Yurii <iuriish@yahoo.com>

* - profiling mmul helpers

Signed-off-by: Yurii <iuriish@yahoo.com>

* - prints to check shapes were added

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct type of output array C in mmulNxN

Signed-off-by: Yurii <iuriish@yahoo.com>

* - take into account possible NaNs in C array

Signed-off-by: Yurii <iuriish@yahoo.com>

* slightly change numThreads message

Signed-off-by: raver119 <raver119@gmail.com>

* - make corrections in accordance with the notes given in PR review

Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-19 15:39:36 +02:00
array [WIP] ThreadPool (#8) 2019-11-13 17:04:59 +03:00
cnpy [WIP] ThreadPool (#8) 2019-11-13 17:04:59 +03:00
exceptions Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
execution [WIP] Platform helpers switches (#44) 2019-11-14 14:35:02 +03:00
graph build fix for clang 2019-11-16 22:18:50 +03:00
helpers Shyrma mmul (#58) 2019-11-19 15:39:36 +02:00
indexing Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
loops [WIP] Mish (#55) 2019-11-18 13:21:26 +03:00
memory [WIP] More of CUDA (#95) 2019-08-05 11:27:05 +10:00
ops Shyrma mmul (#58) 2019-11-19 15:39:36 +02:00
performance/benchmarking [WIP] ThreadPool (#8) 2019-11-13 17:04:59 +03:00
types Platform helpers (#8216) 2019-09-11 21:50:28 +03:00
Status.h Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
buffer.h [WIP] ThreadPool (#8) 2019-11-13 17:04:59 +03:00
cblas.h Platform helpers (#8216) 2019-09-11 21:50:28 +03:00
cblas_enum_conversion.h Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
config.h.in Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
dll.h [WIP] ThreadPool (#8) 2019-11-13 17:04:59 +03:00
enum_boilerplate.h Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
msvc.h [WIP] ThreadPool (#8) 2019-11-13 17:04:59 +03:00
nd4jmalloc.h Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
nd4jmemset.h Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
op_boilerplate.h [WIP] ThreadPool (#8) 2019-11-13 17:04:59 +03:00
op_enums.h [WIP] Int broadcastables (#195) 2019-08-30 10:12:40 +03:00
openmp_pragmas.h [WIP] ThreadPool (#8) 2019-11-13 17:04:59 +03:00
optype.h Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
pairwise_util.h Shugeo random uniform int (#30) 2019-11-06 12:49:27 +02:00
platform_boilerplate.h Platform helpers (#8216) 2019-09-11 21:50:28 +03:00
platformmath.h Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
play.h [WIP] multi-device support (#80) 2019-08-14 16:52:34 +03:00
pointercast.h [WIP] ThreadPool (#8) 2019-11-13 17:04:59 +03:00
templatemath.h [WIP] Mish (#55) 2019-11-18 13:21:26 +03:00
type_boilerplate.h [WIP] multi-device support (#80) 2019-08-14 16:52:34 +03:00
util.h Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00