* fix double consumption of rng on cpu Signed-off-by: raver119 <raver119@gmail.com>
* Shyrma docs (#222)
* documenting and profiling matrix_set_diag cuda kernel Signed-off-by: Yurii <yurii@skymind.io>
* correct formula of pnorm pooling in cuda 2d/3d kernels; remove helper matrix_diag which duplicates work of helper matrix_set_diag Signed-off-by: Yurii <yurii@skymind.io>
* cublasHandle sharing + lock Signed-off-by: raver119 <raver119@gmail.com>
* cublasHandle sharing + lock Signed-off-by: raver119 <raver119@gmail.com>
* Documentation from serialization/deserialization in NLP (#221)
* refactoring Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
* Javadocs Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
* Javadoc fixed Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
* Cleanup Signed-off-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
* dedicated lock for getCudaCublasHandle Signed-off-by: raver119 <raver119@gmail.com>
* Small fixes (#223) Signed-off-by: AlexDBlack <blacka101@gmail.com>
* ELU DL4J fixes (#224) Signed-off-by: AlexDBlack <blacka101@gmail.com>
* javadoc (#225) Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Small test compilation fix (#226) Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8182 remove spark version suffix (#227) Signed-off-by: AlexDBlack <blacka101@gmail.com>
* [WIP] Thread safety (#229)
* sync after cublas*gemm Signed-off-by: raver119 <raver119@gmail.com>
* mutex for CublasHelper Signed-off-by: raver119 <raver119@gmail.com>
* don't store cublasHandle in LaunchContext, it's per-device anyway Signed-off-by: raver119 <raver119@gmail.com>
* some printout Signed-off-by: raver119 <raver119@gmail.com>
* check for field instead Signed-off-by: raver119 <raver119@gmail.com>
* pew-pew Signed-off-by: raver119 <raver119@gmail.com>
* don't release ContextBuffers until device changed Signed-off-by: raver119 <raver119@gmail.com>
* small tweak Signed-off-by: raver119 <raver119@gmail.com>
* some logging in sgemm Signed-off-by: raver119 <raver119@gmail.com>
* stream sync Signed-off-by: raver119 <raver119@gmail.com>
* some more logging Signed-off-by: raver119 <raver119@gmail.com>
* some more error checks Signed-off-by: raver119 <raver119@gmail.com>
* one fancy test Signed-off-by: raver119 <raver119@gmail.com>
* one fancy test Signed-off-by: raver119 <raver119@gmail.com>
* minor AffinityManager fix Signed-off-by: raver119 <raver119@gmail.com>
* cudaEvent error logging improvement Signed-off-by: raver119 <raver119@gmail.com>
* ConstantHelper thread safety Signed-off-by: raver119 <raver119@gmail.com>
* minor corrections in ConstantTadHelper Signed-off-by: Yurii <yurii@skymind.io>
* ConstantShapeHelper thread safety Signed-off-by: raver119 <raver119@gmail.com>
* ConstantTadHelper.cu updated Signed-off-by: raver119 <raver119@gmail.com>
* logging off Signed-off-by: raver119 <raver119@gmail.com>
* logging off Signed-off-by: raver119 <raver119@gmail.com>

Name | ||
---|---|---|
.. | ||
legacy | ||
BarnesHutTsne.cu | ||
README.md | ||
activations.cu | ||
adjust_hue.cu | ||
adjust_saturation.cu | ||
axis.cu | ||
batched_gemm.cu | ||
batchnorm.cu | ||
betaInc.cu | ||
col2im.cppc | ||
col2im.cu | ||
compare_elem.cu | ||
concat.cu | ||
confusion.cu | ||
convolutions.cu | ||
cross.cu | ||
d_t_s.cu | ||
diag.cu | ||
dilation2d.cu | ||
dropout.cu | ||
dynamic.cu | ||
extract_patches.cu | ||
fake_quantization.cu | ||
flatten.cu | ||
gather.cu | ||
gather_nd.cu | ||
gradient.cu | ||
gru.cu | ||
hamming.cu | ||
hashcode.cu | ||
histogram.cu | ||
histogramFixedWidth.cu | ||
im2col.cppc | ||
im2col.cu | ||
image_resize.cu | ||
image_suppression.cu | ||
ismax.cu | ||
legacy_helper.cu | ||
lrn.cu | ||
lstm.cu | ||
lup.cu | ||
matrixSetDiag.cu | ||
matrix_band.cu | ||
matrix_diag_part.cu | ||
max_pooling.cu | ||
maximum.cu | ||
merge.cu | ||
meshgrid.cu | ||
minimum.cu | ||
nth_element.cu | ||
one_hot.cu | ||
pad.cu | ||
percentile.cu | ||
polyGamma.cu | ||
prefix.cu | ||
random_crop.cu | ||
range.cu | ||
reverse.cu | ||
roll.cu | ||
s_t_b.cu | ||
s_t_d.cu | ||
scatter.cu | ||
scatter_simple.cu | ||
scatter_update.cu | ||
segment.cu | ||
segment_max.cu | ||
segment_mean.cu | ||
segment_min.cu | ||
segment_prod.cu | ||
segment_sqrtn.cu | ||
segment_sum.cu | ||
sequence_mask.cu | ||
sg_cb.cu | ||
shift.cu | ||
sru.cu | ||
stack.cu | ||
svd.cu | ||
toggle_bits.cu | ||
top_k.cu | ||
transforms.cu | ||
weights.cu | ||
zeta.cu |
README.md
This folder contains CUDA-specific implementations for operations.