raver119 3c4e959e21 [WIP] More of CUDA (#95)
* initial commit

Signed-off-by: raver119 <raver119@gmail.com>

* Implementation of hashcode cuda helper. Working edition.

* Fixed parallel test input arrangements.

* Fixed tests for hashcode op.

* Fixed shape calculation for image:crop_and_resize op and test.

* NativeOps tests. Initial test suite.

* Added tests for indexReduce methods.

* Added test on execBroadcast with NDArray as dimensions.

* Added test on execBroadcastBool with NDArray as dimensions.

* Added tests on execPairwiseTransform and execPairwiseTransformBool.

* Added tests for execReduce with scalar results.

* Added reduce tests for non-empty dims array.

* Added tests for reduce3.

* Added tests for execScalar.

* Added tests for execSummaryStats.

* - provide cpu/cuda code for batch_to_space
- test it

Signed-off-by: Yurii <yurii@skymind.io>

* - remove old test for batch_to_space (had wrong format and numbers were not checked)

Signed-off-by: Yurii <yurii@skymind.io>

* Fixed compilation errors with test.

* Added test for execTransformFloat.

* Added test for execTransformSame.

* Added test for execTransformBool.

* Added test for execTransformStrict.

* Added tests for execScalar/execScalarBool with TADs.

* Added test for flatten.

* - provide cpu/cuda code for space_to_batch operation

Signed-off-by: Yurii <yurii@skymind.io>

* Added test for concat.

* comment out unnecessary code in s_t_b

Signed-off-by: Yurii <yurii@skymind.io>

* Added test for specialConcat.

* Added tests for memcpy/set routines.

* Fixed pullRow cuda test.

* Added pullRow test.

* Added average test.

* - correct typo in NDArray::applyPairwiseTransform(nd4j::pairwise::BoolOps op...)

Signed-off-by: Yurii <yurii@skymind.io>

* - debugging and fixing cuda tests in JavaInteropTests file

Signed-off-by: Yurii <yurii@skymind.io>

* - correct some tests

Signed-off-by: Yurii <yurii@skymind.io>

* Added test for shuffle.

* Fixed ops declarations.

* Restored omp and added shuffle test.

* Added convertTypes test.

* Added tests for execRandom. Eliminated usage of RandomBuffer with NativeOps.

* Added sort tests.

* Added tests for execCustomOp.

* - further debugging and fixing of tests that terminated with a crash

Signed-off-by: Yurii <yurii@skymind.io>

* Added tests for calculateOutputShapes.

* Added Benchmarks test.

* Commented out benchmark tests.

* change assertion

Signed-off-by: raver119 <raver119@gmail.com>

* Added tests for apply_sgd op. Added cpu helper for that op.

* Implemented cuda helper for apply_sgd op. Fixed tests for NativeOps.

* Added test for assign broadcastable.

* Added tests for assign_bp op.

* Added tests for axpy op.

* - assign/execScalar/execTransformAny signature change
- minor test fix

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed axpy op.

* meh

Signed-off-by: raver119 <raver119@gmail.com>

* - fix tests for nativeOps::concat

Signed-off-by: Yurii <yurii@skymind.io>

* sequential transform/scalar

Signed-off-by: raver119 <raver119@gmail.com>

* allow nested parallelism

Signed-off-by: raver119 <raver119@gmail.com>

* assign_bp leak fix

Signed-off-by: raver119 <raver119@gmail.com>

* block setRNG fix

Signed-off-by: raver119 <raver119@gmail.com>

* enable parallelism by default

Signed-off-by: raver119 <raver119@gmail.com>

* enable nested parallelism by default

Signed-off-by: raver119 <raver119@gmail.com>

* Added cuda implementation for row_count helper.

* Added implementation for tsne gains op helper.

* - take into account possible empty input arrays in reduce_ cuda stuff

Signed-off-by: Yurii <yurii@skymind.io>

* Implemented tsne/edge_forces op cuda-based helper. Parallelized cpu-based helper for edge_forces.

* Added kernel for tsne/symmetrized op helper.

* Implementation of tsne/symmetrized op cuda helper. Working edition.

* Eliminated unneeded printfs.

* Added test for broadcastgradientargs op.

* host-only fallback for empty reduce float

Signed-off-by: raver119 <raver119@gmail.com>

* - some test fixes

Signed-off-by: Yurii <yurii@skymind.io>

* - correct the rest of reduce_ stuff

Signed-off-by: Yurii <yurii@skymind.io>

* - further correction of reduce_ stuff

Signed-off-by: Yurii <yurii@skymind.io>

* Added test for Cbow op. Also added cuda implementation for cbow helpers.

* - improve code of stack operation for scalar case

Signed-off-by: Yurii <yurii@skymind.io>

* - provide cuda kernel for gatherND operation

Signed-off-by: Yurii <yurii@skymind.io>

* Implementation of cbow helpers with cuda kernels.

* minor test tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* minor test tweaks

Signed-off-by: raver119 <raver119@gmail.com>

* - further correction of cuda stuff

Signed-off-by: Yurii <yurii@skymind.io>

* Implementation of cbow op helper with cuda kernels. Working edition.

* Skip random testing for cudablas case.

* lstmBlockCell context fix

Signed-off-by: raver119 <raver119@gmail.com>

* Added tests for ELU and ELU_BP ops.

* Added tests for eq_scalar, gt_scalar, gte_scalar and lte_scalar ops.

* Added tests for neq_scalar.

* Added test for noop.

* - further work on clipbynorm_bp

Signed-off-by: Yurii <yurii@skymind.io>

* - get rid of concat op call, use direct concat helper call instead

Signed-off-by: Yurii <yurii@skymind.io>

* lstmBlockCell context fix

Signed-off-by: raver119 <raver119@gmail.com>

* Added tests for lrelu and lrelu_bp.

* Added tests for selu and selu_bp.

* Fixed lrelu derivative helpers.

* - some corrections in lstm

Signed-off-by: Yurii <yurii@skymind.io>

* operator * result shape fix

Signed-off-by: raver119 <raver119@gmail.com>

* - correct typo in lstmCell

Signed-off-by: Yurii <yurii@skymind.io>

* few tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* CUDA inverse broadcast bool fix

Signed-off-by: raver119 <raver119@gmail.com>

* disable MMAP test for CUDA

Signed-off-by: raver119 <raver119@gmail.com>

* BooleanOp syncToDevice

Signed-off-by: raver119 <raver119@gmail.com>

* meh

Signed-off-by: raver119 <raver119@gmail.com>

* additional data types for im2col/col2im

Signed-off-by: raver119 <raver119@gmail.com>

* Added test for firas_sparse op.

* one more RandomBuffer test excluded

Signed-off-by: raver119 <raver119@gmail.com>

* Added tests for flatten op.

* Added test for Floor op.

* bunch of tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* mmulDot tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* more tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* Implemented floordiv_bp op and tests.

* Fixed scalar case with cuda implementation for bds.

* - work on cuda kernel for clip_by_norm backprop op is complete

Signed-off-by: Yurii <yurii@skymind.io>

* Eliminated cbow crash.

* more tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* more tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* Eliminated abort in batched nlp test.

* more tests fixed

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed shared flag initialization.

* disabled bunch of cpu workspaces tests

Signed-off-by: raver119 <raver119@gmail.com>

* scalar operators fix: missing registerSpecialUse call

Signed-off-by: raver119 <raver119@gmail.com>

* Fixed logdet for cuda and tests.

* - correct clipBynorm_bp

Signed-off-by: Yurii <yurii@skymind.io>

* Fixed crop_and_resize shape datatype.

* - correct some mmul tests

Signed-off-by: Yurii <yurii@skymind.io>

/*******************************************************************************
* Copyright (c) 2015-2018 Skymind, Inc.
*
* This program and the accompanying materials are made available under the
* terms of the Apache License, Version 2.0 which is available at
* https://www.apache.org/licenses/LICENSE-2.0.
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
* License for the specific language governing permissions and limitations
* under the License.
*
* SPDX-License-Identifier: Apache-2.0
******************************************************************************/
//
// Created by raver119 on 15.10.2017.
//
#include "testlayers.h"
#include <Graph.h>
#include <Node.h>
#include <ops/declarable/CustomOperations.h>
using namespace nd4j;
using namespace nd4j::graph;
class ScopeTests : public testing::Test {
public:
};
TEST_F(ScopeTests, BasicTests_1) {
    Graph graph;

    auto x = NDArrayFactory::create_<float>('c', {2, 2});
    x->assign(0.0f);

    auto variableSpace = graph.getVariableSpace();
    variableSpace->putVariable(-1, x);

    nd4j::ops::Scope opScope;

    // register the scope node itself; it's a regular top-level node
    auto scopeBody = new Node(OpType_LOGIC, 10, 1);
    scopeBody->setName("scopeBody");
    scopeBody->setCustomOp(&opScope);
    graph.addNode(scopeBody);

    ASSERT_EQ(1, graph.totalNodes());

    // this node is attached to the "scopeBody" scope, so addNode() routes it into
    // the scope instead of the top-level execution sequence...
    auto scopedB0 = new Node(OpType_SCALAR, 0, 6, {-1}, {}, {}, 1.0f);
    scopedB0->markInplace(true);
    scopedB0->setScopeInfo(1, "scopeBody");
    graph.addNode(scopedB0);

    // ...which is why the top-level node counter stays at 1
    ASSERT_EQ(1, graph.totalNodes());
}
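
// For contrast - a minimal sketch, not an active test: the same scalar node without
// setScopeInfo() would land in the top-level execution sequence, and totalNodes()
// would grow, as RealTests_1 below suggests with its root nodes. The node id 7 here
// is a hypothetical, unused id chosen purely for illustration.
/*
    auto plain = new Node(OpType_SCALAR, 0, 7, {-1}, {}, {}, 1.0f);   // no scope info attached
    graph.addNode(plain);
    ASSERT_EQ(2, graph.totalNodes());                                 // scopeBody + plain
*/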
/*
TEST_F(ScopeTests, RealTests_1) {
    Graph graph;

    auto x = NDArrayFactory::create_<float>('c', {2, 2});
    x->assign(0.0f);

    auto y = NDArrayFactory::create_<float>('c', {2, 2});
    y->assign(0.0);

    // auto scalar = NDArrayFactory::create_<float>('c', {1, 1});
    auto scalar = NDArrayFactory::create_<float>(10.f);
    // scalar->p(0, 10);

    auto variableSpace = graph.getVariableSpace();
    variableSpace->putVariable(-1, x);
    variableSpace->putVariable(-2, y);
    variableSpace->putVariable(-3, scalar);

    // just a few ops coming before the while loop
    auto nodeA = new Node(OpType_TRANSFORM_SAME, transform::OneMinus, 1, {-1});
    auto nodeB = new Node(OpType_SCALAR, scalar::Add, 2, {1}, {}, {}, 1.0);

    auto scopeCondition = new Node(OpType_LOGIC, logic::Scope, 3);
    scopeCondition->setName("scopeCondition");
    nd4j::ops::Scope opScope;
    scopeCondition->setCustomOp(&opScope);

    // this is the scope of the loop body; it'll be executed multiple times
    auto scopeBody = new Node(OpType_LOGIC, logic::Scope, 10);
    scopeBody->setName("scopeBody");
    scopeBody->setCustomOp(&opScope);

    ////////////////////////////////////////////////////////////////////////////////////////////////////
    //// filling out the condition scope
    ////////////////////////////////////////////////////////////////////////////////////////////////////

    // this is the Sum accumulation, which feeds the loop condition below
    auto scopedA0 = new Node(OpType_REDUCE_SAME, reduce::Sum, 4, {12});
    scopedA0->setScopeInfo(3, "scopeCondition");

    // this op compares (LT) the A0 result with the variable `scalar`, which is 10
    nd4j::ops::lt_scalar op;
    auto scopedA1 = new Node(&op, 5, {4, -3});
    scopedA1->setScopeInfo(3, "scopeCondition");

    ////////////////////////////////////////////////////////////////////////////////////////////////////
    //// filling out the body scope
    ////////////////////////////////////////////////////////////////////////////////////////////////////
    auto scopedB0 = new Node(OpType_SCALAR, scalar::Add, 6, {12}, {}, {}, 1.0f);
    scopedB0->markInplace(false);
    scopedB0->setScopeInfo(10, "scopeBody");

    auto nodeReturn = new Node(OpType_LOGIC, logic::Return, 7, {6}, {12});
    nd4j::ops::Return opReturn;
    nodeReturn->setCustomOp(&opReturn);
    nodeReturn->setScopeInfo(10, "scopeBody");

    // the WHILE operation takes 2 scopes: :0 is the condition scope, and :1 is the loop body scope
    auto nodeWhile = new Node(OpType_LOGIC, logic::While, 12, {-2, 3, 10});
    nd4j::ops::While opWhile;
    nodeWhile->setCustomOp(&opWhile);

    // adding root nodes first, nothing unusual expected here
    graph.addNode(nodeA);
    graph.addNode(nodeB);

    // now we're registering our scopes
    graph.addNode(scopeCondition);
    graph.addNode(scopeBody);

    // at this moment the graph should have 4 (four) nodes registered
    ASSERT_EQ(4, graph.totalNodes());

    // adding a node that's attached to a scope, so it should be pushed into that specific scope
    graph.addNode(scopedA0);

    // we should still have 4 ops in the graph, because the node added above goes directly into the scope
    // and thus falls out of direct graph execution - it can only be executed via its Scope
    ASSERT_EQ(4, graph.totalNodes());

    graph.addNode(scopedA1);
    graph.addNode(scopedB0);
    graph.addNode(nodeReturn);

    // should still be 4 - nothing else is possible here
    ASSERT_EQ(4, graph.totalNodes());

    // WHILE is a valid top-level node, so we expect the node counter to go up
    graph.addNode(nodeWhile);
    ASSERT_EQ(5, graph.totalNodes());

    // now, let's try to execute the graph
    Nd4jStatus status = GraphExecutioner::execute(&graph);
    ASSERT_EQ(ND4J_STATUS_OK, status);

    auto w = variableSpace->getVariable(12, 0)->getNDArray();
    w->printShapeInfo("w shape");

    ASSERT_NEAR(12.f, w->sumNumber().e<float>(0), 1e-5f);
}
*/
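
// Why RealTests_1 expects 12 - a reading of the commented-out test above, not an
// authoritative spec: y starts as a 2x2 array of zeros, each body-scope pass adds 1.0
// to all 4 elements (+4 to the sum), and the condition scope keeps looping while the
// sum is below 10. A minimal sketch of that arithmetic, independent of the Graph API:
/*
    float sum = 0.f;        // sum over the 2x2 array y, initially all zeros
    while (sum < 10.f)      // condition scope: reduce::Sum feeding lt_scalar(10)
        sum += 4.f;         // body scope: scalar::Add(1.0) over all 4 elements
    // sum is now 12.f, matching ASSERT_NEAR(12.f, w->sumNumber().e<float>(0), 1e-5f)
*/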