66b84b38cf
* get rid of some copy procedures in mmulHelper ops
  Signed-off-by: Yurii <iuriish@yahoo.com>
* further work on embedding the CUDA API for batched gemm (cublasGemmBatchedEx) in our MmulHelper class
  Signed-off-by: Yurii <iuriish@yahoo.com>
* further work on the CUDA batched gemm API
  Signed-off-by: Yurii <iuriish@yahoo.com>
* write our own CUDA kernel performing batched gemm
  Signed-off-by: Yurii <iuriish@yahoo.com>
* add missing include in MmulHelper
  Signed-off-by: raver119 <raver119@gmail.com>
* keep the previous correct kernels for mmulNxN in the code, since the new one may fail for some reason in the future
  Signed-off-by: Yurii <iuriish@yahoo.com>
* disable old tensordot
  Signed-off-by: raver119 <raver119@gmail.com>
* rewrite CUDA kernels for usualGemm and usualGemv
  Signed-off-by: Yurii <iuriish@yahoo.com>
* profile mmul helpers
  Signed-off-by: Yurii <iuriish@yahoo.com>
* add prints to check shapes
  Signed-off-by: Yurii <iuriish@yahoo.com>
* correct the type of the output array C in mmulNxN
  Signed-off-by: Yurii <iuriish@yahoo.com>
* take into account possible NaNs in the C array
  Signed-off-by: Yurii <iuriish@yahoo.com>
* slightly change the numThreads message
  Signed-off-by: raver119 <raver119@gmail.com>
* make corrections in accordance with the notes given in PR review
  Signed-off-by: Yurii <iuriish@yahoo.com>
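The commits above mention embedding cuBLAS's cublasGemmBatchedEx into MmulHelper, writing a hand-rolled CUDA kernel for batched gemm as a fallback, and handling possible NaNs in the output array C. The sketch below is illustrative only and is not the libnd4j code: the names naiveBatchedGemm and cublasBatchedGemm are hypothetical, column-major float32 matrices are assumed, and the compute-type argument of cublasGemmBatchedEx differs between CUDA versions.

```cpp
// Illustrative sketch only (hypothetical names), assuming column-major float32
// matrices, as cuBLAS expects. Not the libnd4j MmulHelper implementation.
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Hand-rolled batched GEMM: grid.z indexes the batch entry, each thread
// computes one element of C[b] = alpha * A[b] * B[b] + beta * C[b].
// A, B, C are device-resident arrays of device pointers, one per batch entry.
__global__ void naiveBatchedGemm(const float* const* A, const float* const* B,
                                 float* const* C, int M, int N, int K,
                                 float alpha, float beta) {
    const float* a = A[blockIdx.z];
    const float* b = B[blockIdx.z];
    float*       c = C[blockIdx.z];
    const int row = blockIdx.x * blockDim.x + threadIdx.x;   // 0 .. M-1
    const int col = blockIdx.y * blockDim.y + threadIdx.y;   // 0 .. N-1
    if (row >= M || col >= N) return;
    float acc = 0.f;
    for (int k = 0; k < K; ++k)
        acc += a[k * M + row] * b[col * K + k];
    // when beta == 0 skip reading C, so pre-existing NaNs in C do not leak into the result
    c[col * M + row] = (beta == 0.f) ? alpha * acc
                                     : alpha * acc + beta * c[col * M + row];
}

// The same operation through the cuBLAS batched API. dA, dB, dC must likewise be
// device arrays of device pointers. On CUDA < 11 the compute-type argument is
// a cudaDataType_t (CUDA_R_32F) instead of CUBLAS_COMPUTE_32F.
void cublasBatchedGemm(cublasHandle_t handle,
                       const void* const* dA, const void* const* dB,
                       void* const* dC, int M, int N, int K, int batchCount) {
    const float alpha = 1.f, beta = 0.f;
    cublasGemmBatchedEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                        M, N, K,
                        &alpha,
                        dA, CUDA_R_32F, M,     // lda = M (column-major, no transpose)
                        dB, CUDA_R_32F, K,     // ldb = K
                        &beta,
                        dC, CUDA_R_32F, M,     // ldc = M
                        batchCount,
                        CUBLAS_COMPUTE_32F,
                        CUBLAS_GEMM_DEFAULT);
}
```

A typical launch of the hand-rolled kernel would use one grid z-slice per batch entry, e.g. `naiveBatchedGemm<<<dim3((M+15)/16, (N+15)/16, batchCount), dim3(16,16)>>>(...)`.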
Repository contents at this commit: src, pom.xml, valgrindCudaJava, valgrindJava