cavis/libnd4j/include/ops/declarable/helpers/cuda
Yurii Shyrma 66b84b38cf
Shyrma mmul (#58)
* - get rid of some copy procedures in mmulHelper ops

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on embedding cuda api for batched gemm (cublasGemmBatchedEx) in our mmulHelper class

Signed-off-by: Yurii <iuriish@yahoo.com>

* - further work on cuda batched gemm api

Signed-off-by: Yurii <iuriish@yahoo.com>

* - write own cuda kernel performing batched gemm

Signed-off-by: Yurii <iuriish@yahoo.com>

* missing include in MmulHelper

Signed-off-by: raver119 <raver119@gmail.com>

* - forgot to keep the previous correct kernels for mmulNxN in the code, since the new one may fail for some reason in the future

Signed-off-by: Yurii <iuriish@yahoo.com>

* disable old tensordot

Signed-off-by: raver119 <raver119@gmail.com>

* - rewrite cuda kernels for usualGemm and usualGemv

Signed-off-by: Yurii <iuriish@yahoo.com>

* - profiling mmul helpers

Signed-off-by: Yurii <iuriish@yahoo.com>

* - prints to check shapes were added

Signed-off-by: Yurii <iuriish@yahoo.com>

* - correct type of output array C in mmulNxN

Signed-off-by: Yurii <iuriish@yahoo.com>

* - take into account possible nans in C array

Signed-off-by: Yurii <iuriish@yahoo.com>

* slightly change numThreads message

Signed-off-by: raver119 <raver119@gmail.com>

* - make corrections in accordance with the notes given in PR review

Signed-off-by: Yurii <iuriish@yahoo.com>
2019-11-19 15:39:36 +02:00
legacy [WIP] ThreadPool (#8) 2019-11-13 17:04:59 +03:00
BarnesHutTsne.cu Shugeo cuda doc2 (#255) 2019-09-11 21:04:43 +03:00
README.md Merge master to upstream (#7945) 2019-06-27 18:37:04 +03:00
activations.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
addBias.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
adjust_hue.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
adjust_saturation.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
axis.cu Shugeo cuda docs1 (#249) 2019-09-09 16:27:45 +03:00
batched_gemm.cu Merge master to upstream (#7945) 2019-06-27 18:37:04 +03:00
batchnorm.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
betaInc.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
col2im.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
compare_elem.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
concat.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
confusion.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
convolutions.cu Shyrma mmul (#58) 2019-11-19 15:39:36 +02:00
cross.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
d_t_s.cu Merge master to upstream (#7945) 2019-06-27 18:37:04 +03:00
diag.cu Shugeo cuda doc2 (#255) 2019-09-11 21:04:43 +03:00
dilation2d.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
dropout.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
dynamic.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
extract_patches.cu Shugeo cuda doc2 (#255) 2019-09-11 21:04:43 +03:00
fake_quantization.cu Added doc for fake_quant_with_min_max* op helpers cuda implementations. 2019-10-10 18:35:28 +03:00
flatten.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
gather.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
gather_nd.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
gradient.cu [WIP] minor (#218) 2019-09-02 11:25:48 +03:00
gru.cu Various fixes (#43) 2019-11-14 19:38:20 +11:00
hamming.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
hashcode.cu syncthreads (#136) 2019-08-20 18:28:43 +03:00
histogram.cu [WIP] minor (#218) 2019-09-02 11:25:48 +03:00
histogramFixedWidth.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
im2col.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
image_draw_bounding_boxes.cu The working implementation of draw_bounding_boxes op. 2019-10-08 15:42:27 +03:00
image_resize.cu SameDiff TF import (#49) 2019-11-19 22:44:29 +11:00
image_suppression.cu Shugeo suppression overlaps (#9) 2019-10-30 13:43:45 +02:00
ismax.cu [WIP] minor (#218) 2019-09-02 11:25:48 +03:00
legacy_helper.cu [WIP] ThreadPool (#8) 2019-11-13 17:04:59 +03:00
lrn.cu [WIP] minor (#218) 2019-09-02 11:25:48 +03:00
lstm.cu [WIP] More of CUDA (#95) 2019-08-05 11:27:05 +10:00
lup.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
matrixSetDiag.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
matrix_band.cu Shugeo cuda doc2 (#255) 2019-09-11 21:04:43 +03:00
matrix_diag_part.cu Shugeo cuda doc2 (#255) 2019-09-11 21:04:43 +03:00
max_pooling.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
maximum.cu [WIP] multi-device support (#80) 2019-08-14 16:52:34 +03:00
merge.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
meshgrid.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
minimum.cu [WIP] multi-device support (#80) 2019-08-14 16:52:34 +03:00
nth_element.cu Shugeo cuda doc2 (#255) 2019-09-11 21:04:43 +03:00
one_hot.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
pad.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
percentile.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
polyGamma.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
prefix.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
random.cu Shugeo random uniform int (#30) 2019-11-06 12:49:27 +02:00
random_crop.cu Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
range.cu [WIP] minor (#218) 2019-09-02 11:25:48 +03:00
reverse.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
roll.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
s_t_b.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
s_t_d.cu Merge master to upstream (#7945) 2019-06-27 18:37:04 +03:00
scatter.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
scatter_simple.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
scatter_update.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
segment.cu [WIP] multi-device support (#80) 2019-08-14 16:52:34 +03:00
segment_max.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
segment_mean.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
segment_min.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
segment_prod.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
segment_sqrtn.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
segment_sum.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
sequence_mask.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
sg_cb.cu syncthreads (#136) 2019-08-20 18:28:43 +03:00
shift.cu [WIP] right shift ops (#118) 2019-08-15 20:35:15 +03:00
sru.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
stack.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
svd.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
toggle_bits.cu [WIP] minor (#218) 2019-09-02 11:25:48 +03:00
top_k.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
transforms.cu Shyrma mmul (#58) 2019-11-19 15:39:36 +02:00
weights.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00
zeta.cu [WIP] bunch of improvements (#257) 2019-09-11 20:12:09 +03:00

README.md

This folder contains CUDA-specific implementations for operations.