raver119 c969b724bb [WIP] more CUDA stuff (#57)
* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* Added gradcheck test for dynamic_partition_bp op.

* - implementation of dilation op (cpu and cuda)

Signed-off-by: Yurii <yurii@skymind.io>

* Fixed broadcast_dynamic_shape 1D case and tests.

* Fixed usage of default integer arguments.

* Fixed dynamic_partition_bp op and tests.

* Eliminated test with grad check for dynamic_partition_bp op.

* start working on cuda svd - porting available corresponding api from cuSOLVER library

Signed-off-by: Yurii <yurii@skymind.io>

* provide prelu_bp

Signed-off-by: Yurii <yurii@skymind.io>

* - provide gruCell_bp (old version ??)

Signed-off-by: Yurii <yurii@skymind.io>

* - polishing cumsum_bp and cumprod_bp tests

Signed-off-by: Yurii <yurii@skymind.io>

* provide sparseSoftmaxCrossEntropyWithLogits and sparseSoftmaxCrossEntropyWithLogits_grad

Signed-off-by: Yurii <yurii@skymind.io>

* Fixed atomicMul with float input/output

* implementation of cuda kernel for triu_bp operation

Signed-off-by: Yurii <yurii@skymind.io>

* Refactored lup helper to add parrallel computing.

* cusolver libraries

Signed-off-by: raver119 <raver119@gmail.com>

* uncomment cuSolver APIs in svd.cu

Signed-off-by: Yurii <yurii@skymind.io>

* cusolver var

Signed-off-by: raver119 <raver119@gmail.com>

* - further work on cuSolver svd

Signed-off-by: Yurii <yurii@skymind.io>

* Implement usage of cuda solver to LUP decomposition.

* - correct naames in lup functions

Signed-off-by: Yurii <yurii@skymind.io>

* correct svdQR cuda

Signed-off-by: Yurii <yurii@skymind.io>

* - provide transpositions of input matrices in case of c order in svdCudaQR

Signed-off-by: Yurii <yurii@skymind.io>

* Fixed implementation issues with LUP usign cuda solver.

* Implementation of matrix_determinant helper with cuda kernels. Working revision.

* Implemented log_matrix_determinant helper with cuda kernels.

* - implementation of batched cuda svd

Signed-off-by: Yurii <yurii@skymind.io>

* Refactored cholesky helper and implementation of cuda solver cholesky batch.

* - implementation of cuda kernel for tile bp

Signed-off-by: Yurii <yurii@skymind.io>

* Implementation of cholesky and logdet with cuda kernels.

* - implementation of cuda kernel for sru_bidirectional

Signed-off-by: Yurii <yurii@skymind.io>

* Fixed cholesky helper.

* Cholesky op helper implementation. Working double-based cublas implementation.

* bad import excluded

Signed-off-by: raver119 <raver119@gmail.com>

* Finished with cuda implementation of cholesky helper and tests.

* - implementation of cuda kernel for sru_bidirectional_backprop operation

Signed-off-by: Yurii <yurii@skymind.io>

* Implementation of matrix_inverse op helper with cuda kernels. The first revision.

* - start working on gruCell_bp

Signed-off-by: Yurii <yurii@skymind.io>

* Implementation of matrix_inverse helper.

* - further work on new gruCell_bp

Signed-off-by: Yurii <yurii@skymind.io>

* cuBLAS related fixes

Signed-off-by: raver119 <raver119@gmail.com>

* calculateOutputShapes() now passes device buffers as well

Signed-off-by: raver119 <raver119@gmail.com>

* special concat/average/accumulate init host pointers now

Signed-off-by: raver119 <raver119@gmail.com>

* few more tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* additional CudaDataBufferFactory signatures certain for data types

Signed-off-by: raver119 <raver119@gmail.com>

* cuSolver host buffer

Signed-off-by: raver119 <raver119@gmail.com>

* buffer to buffer memcpy host ptr allocation

Signed-off-by: raver119 <raver119@gmail.com>
2019-07-20 23:05:21 +10:00

87 lines
3.2 KiB
C++

/*******************************************************************************
* Copyright (c) 2015-2018 Skymind, Inc.
*
* This program and the accompanying materials are made available under the
* terms of the Apache License, Version 2.0 which is available at
* https://www.apache.org/licenses/LICENSE-2.0.
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
* License for the specific language governing permissions and limitations
* under the License.
*
* SPDX-License-Identifier: Apache-2.0
******************************************************************************/
//
// @author raver119@gmail.com
//
#include <ops/declarable/helpers/helpers.h>
namespace nd4j {
namespace ops {
namespace helpers {
//////////////////////////////////////////////////////////////////////
void dilation2d(nd4j::LaunchContext* context, NDArray *input, NDArray *weights, NDArray *output, const int sH, const int sW, const int pH, const int pW, const int dH, const int dW);
//////////////////////////////////////////////////////////////////////
FORCEINLINE Nd4jStatus outputSize(nd4j::LaunchContext * context, const int inSize, const int k, const int d, const int s, bool isSameMode, int *outSize, int *padding_before, int *padding_after) {
if (s <= 0)
return Status::THROW("Dilation2D: Stride must be > 0");
if (d < 1)
return Status::THROW("Dilation2D: Dilation rate must be >= 1");
int kEff = (k - 1) * d + 1;
if (isSameMode) {
*outSize = (inSize + s - 1) / s;
const int padding_needed = nd4j::math::nd4j_max<int>(0, (*outSize - 1) * s + kEff -inSize);
*padding_before = padding_needed / 2;
*padding_after = padding_needed - *padding_before;
} else {
*outSize = (inSize - kEff + s) / s;
*padding_before = *padding_after = 0;
}
if (*outSize < 0)
return Status::THROW("Dilation2D: outSize has negative value");
return Status::OK();
}
//////////////////////////////////////////////////////////////////////
FORCEINLINE Nd4jStatus dilation_hw(nd4j::LaunchContext * context, Nd4jLong *in, Nd4jLong *wh, std::vector<int> &strides, std::vector<int> &rates, bool isSameMode, int *sH, int *sW, int *pH, int *pW, int *dH, int *dW, int *oH, int *oW) {
const int iH = shape::sizeAt(in, 1);
const int iW = shape::sizeAt(in, 2);
const int iC = shape::sizeAt(in, 3);
*sH = strides[1];
*sW = strides[2];
*dH = rates[1];
*dW = rates[2];
const int kH = shape::sizeAt(wh, 0);
const int kW = shape::sizeAt(wh, 1);
const int kHeff = kH + (kH - 1) * (*dH - 1);
const int kWeff = kW + (kW - 1) * (*dW - 1);
int padding_after_unusedA, padding_after_unusedB;
if (outputSize(context, iH, kHeff, 1, *sH, isSameMode, oH, pH, &padding_after_unusedA) != Status::OK())
return Status::THROW("Dilation2D: bad height");
if (outputSize(context, iW, kWeff, 1, *sW, isSameMode, oW, pW, &padding_after_unusedA) != Status::OK())
return Status::THROW("Dilation2D: bad width");
return Status::OK();
}
}
}
}