Commit Graph

2 Commits (483c3d7b8cb69d1d7e436b450e2c7b4218f5f2cb)

Author SHA1 Message Date
Yurii Shyrma 66b84b38cf
Shyrma mmul (#58)
* - get rid of some copy procedures in mmulHelper ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on embedding cuda api for batched gemm (cublasGemmBatchedEx) in our mmulHelper class

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on cuda batched gamm api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - write own cuda kernel performing batched gemm

Signed-off-by: Yurii <iuriish@yahoo.com>

* missing include in MmulHelper

Signed-off-by: raver119 <raver119@gmail.com>

* - forgot to keep in code previous correct kernels for mmulNxN, since it may happen that new onw will fail for some reason in future

Signed-off-by: Yurii <iuriish@yahoo.com>

* disable old tensordot

Signed-off-by: raver119 <raver119@gmail.com>

* - rewrite cuda kernels for usualGemm and usualGemv

Signed-off-by: Yurii <iuriish@yahoo.com>

* - profiling mmul helpers

Signed-off-by: Yurii <iuriish@yahoo.com>

* - prints to check shapes were added

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct type of output array Cin mmulNxN

Signed-off-by: Yurii <iuriish@yahoo.com>

* - take into account possible nans in C array

Signed-off-by: Yurii <iuriish@yahoo.com>

* slightly change numThreads message

Signed-off-by: raver119 <raver119@gmail.com>

* - make corrections in accordance to given notes in pr review

Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-19 15:39:36 +02:00
raver119 6de00bf75f
[WIP] Weekly update of repo (#8390)
* [WIP] Fix compilation after nd4j changes (#37)

* Fix compilation.

* Some tests fixed

* Disable tests temporarily.

* Restored test

* Tests restored.

* Test restored.

* [WIP] perf tests (#40)

* special maxpool test

Signed-off-by: raver119 <raver119@gmail.com>

* special maxpool test

Signed-off-by: raver119 <raver119@gmail.com>

* Shyrma bnorm bp (#41)

Batchnorm backprop mkldnn

* Add SameDiff memory reuse memory manager (array cache) (#39)

* Attention op comments

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* ArrayCacheMemoryMgr - first pass

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Tweak array cache for use with SameDiff identity arrays

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* ArrayCacheMemoryMgr javadoc and properly get max memory

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* LRU cache policy + add tests

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Fixes

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Resize arrays internally if required for ArrayCacheMemoryMgr

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Test improvement

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* Small polish

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* SameDiff op runtime benchmarking listener (#42)

Signed-off-by: AlexDBlack <blacka101@gmail.com>

* INLINE_LOOPS for windows

Signed-off-by: raver119 <raver119@gmail.com>

* [WIP] ThreadPool (#8)

This PR removes OpenMP use in 95% of cases
2019-11-13 17:15:18 +03:00