* initial commit Signed-off-by: raver119 <raver119@gmail.com>
* one file Signed-off-by: raver119 <raver119@gmail.com>
* few more includes Signed-off-by: raver119 <raver119@gmail.com>
* m? Signed-off-by: raver119 <raver119@gmail.com>
* const Signed-off-by: raver119 <raver119@gmail.com>
* cudnn linkage in tests Signed-off-by: raver119 <raver119@gmail.com>
* culibos Signed-off-by: raver119 <raver119@gmail.com>
* static reminder Signed-off-by: raver119 <raver119@gmail.com>
* platform engine tag Signed-off-by: raver119 <raver119@gmail.com>
* HAVE_CUDNN moved to config.h.in Signed-off-by: raver119 <raver119@gmail.com>
* include Signed-off-by: raver119 <raver119@gmail.com>
* include Signed-off-by: raver119 <raver119@gmail.com>
* skip cudnn handle creation if there's no cudnn Signed-off-by: raver119 <raver119@gmail.com>
* meh Signed-off-by: raver119 <raver119@gmail.com>
* target device in context Signed-off-by: raver119 <raver119@gmail.com>
* platform engines Signed-off-by: raver119 <raver119@gmail.com>
* platform engines Signed-off-by: raver119 <raver119@gmail.com>
* allow multiple -h args Signed-off-by: raver119 <raver119@gmail.com>
* allow multiple -h args Signed-off-by: raver119 <raver119@gmail.com>
* move mkldnn out of CPU block Signed-off-by: raver119 <raver119@gmail.com>
* link to mkldnn on cuda Signed-off-by: raver119 <raver119@gmail.com>
* less prints Signed-off-by: raver119 <raver119@gmail.com>
* minor tweaks Signed-off-by: raver119 <raver119@gmail.com>
* next step Signed-off-by: raver119 <raver119@gmail.com>
* conv2d NCHW draft Signed-off-by: raver119 <raver119@gmail.com>
* conv2d biasAdd Signed-off-by: raver119 <raver119@gmail.com>
* test for MKL/CUDNN combined use Signed-off-by: raver119 <raver119@gmail.com>
* provide additional code for conv2d ff based on cudnn api, not tested yet Signed-off-by: Yurii <iuriish@yahoo.com>
* further work on conv2d helper based on using cudnn api Signed-off-by: Yurii <iuriish@yahoo.com>
* fixing several cuda bugs which appeared after the cudnn lib came into use Signed-off-by: Yurii <iuriish@yahoo.com>
* implementation of conv2d backprop op based on cudnn api Signed-off-by: Yurii <iuriish@yahoo.com>
* implementation of conv3d and conv3d_bp ops based on cudnn api Signed-off-by: Yurii <iuriish@yahoo.com>
* bug fixes in conv3d/conv3d_bp ops (cudnn in use) Signed-off-by: Yurii <iuriish@yahoo.com>
* implementation of depthwiseConv2d (ff/bp) op based on cudnn api Signed-off-by: Yurii <iuriish@yahoo.com>
* implementation of batchnorm ff op based on cudnn api Signed-off-by: Yurii <iuriish@yahoo.com>
* disable cudnn batchnorm temporarily Signed-off-by: Yurii <iuriish@yahoo.com>
* add minor change in cmake Signed-off-by: Yurii <iuriish@yahoo.com>
* engine for depthwise mkldnn Signed-off-by: raver119 <raver119@gmail.com>
* couple of includes Signed-off-by: raver119 <raver119@gmail.com>
* provide permutation to cudnn batchnorm ff when format is NHWC Signed-off-by: Yurii <iuriish@yahoo.com>
* lgamma fix Signed-off-by: raver119 <raver119@gmail.com>
* eliminate memory leak in two tests Signed-off-by: Yurii <iuriish@yahoo.com>

Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>

Name | ||
---|---|---|
legacy | ||
BarnesHutTsne.cu | ||
README.md | ||
activations.cu | ||
addBias.cu | ||
adjust_hue.cu | ||
adjust_saturation.cu | ||
axis.cu | ||
batched_gemm.cu | ||
batchnorm.cu | ||
betaInc.cu | ||
col2im.cu | ||
compare_elem.cu | ||
concat.cu | ||
confusion.cu | ||
convolutions.cu | ||
cross.cu | ||
d_t_s.cu | ||
diGamma.cu | ||
diag.cu | ||
dilation2d.cu | ||
dropout.cu | ||
dynamic.cu | ||
extract_patches.cu | ||
fake_quantization.cu | ||
flatten.cu | ||
gather.cu | ||
gather_nd.cu | ||
gradient.cu | ||
gru.cu | ||
hamming.cu | ||
hashcode.cu | ||
histogram.cu | ||
histogramFixedWidth.cu | ||
im2col.cu | ||
image_draw_bounding_boxes.cu | ||
image_resize.cu | ||
image_suppression.cu | ||
imagesHelpers.cu | ||
ismax.cu | ||
legacy_helper.cu | ||
lgamma.cu | ||
lrn.cu | ||
lstm.cu | ||
lup.cu | ||
matrixSetDiag.cu | ||
matrix_band.cu | ||
matrix_diag_part.cu | ||
max_pooling.cu | ||
maximum.cu | ||
merge.cu | ||
meshgrid.cu | ||
minimum.cu | ||
nth_element.cu | ||
one_hot.cu | ||
pad.cu | ||
percentile.cu | ||
polyGamma.cu | ||
prefix.cu | ||
print_variable.cu | ||
random.cu | ||
random_crop.cu | ||
range.cu | ||
reverse.cu | ||
roll.cu | ||
s_t_b.cu | ||
s_t_d.cu | ||
scatter.cu | ||
scatter_simple.cu | ||
scatter_update.cu | ||
segment.cu | ||
segment_max.cu | ||
segment_mean.cu | ||
segment_min.cu | ||
segment_prod.cu | ||
segment_sqrtn.cu | ||
segment_sum.cu | ||
sequence_mask.cu | ||
sg_cb.cu | ||
shift.cu | ||
sru.cu | ||
stack.cu | ||
svd.cu | ||
toggle_bits.cu | ||
top_k.cu | ||
transforms.cu | ||
weights.cu | ||
zeta.cu |
README.md
This folder contains CUDA-specific implementations for operations.