2019-06-06 14:21:15 +02:00
/*******************************************************************************
* Copyright ( c ) 2015 - 2018 Skymind , Inc .
*
* This program and the accompanying materials are made available under the
* terms of the Apache License , Version 2.0 which is available at
* https : //www.apache.org/licenses/LICENSE-2.0.
*
* Unless required by applicable law or agreed to in writing , software
* distributed under the License is distributed on an " AS IS " BASIS , WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND , either express or implied . See the
* License for the specific language governing permissions and limitations
* under the License .
*
* SPDX - License - Identifier : Apache - 2.0
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
//
// @author raver119@gmail.com
// @author Yurii Shyrma (iuriish@yahoo.com)
//
# include "../MmulHelper.h"
2020-03-02 10:49:41 +01:00
# include <array/NDArrayFactory.h>
2019-06-06 14:21:15 +02:00
# include <helpers/BlasHelper.h>
2019-11-19 14:39:36 +01:00
# include <helpers/ShapeUtils.h>
[WIP] multi-device support (#80)
* fix pad javadoc and @see links. (#72)
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* [WIP] More fixes (#73)
* special tests for ConstantTadHelper/ConstantShapeHelper
Signed-off-by: raver119 <raver119@gmail.com>
* release methods for data buffers
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary buffer Java side
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary buffer Java side
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary TadPack C++/Java side (#74)
Signed-off-by: raver119 <raver119@gmail.com>
* Zoo model TF import test updates (#75)
* argLine fix, update compression_gru comment
* updated comment for xception
* undid but commented argLine change
* updated xlnet comment
* copyright headers
* - new NDArray methods like()/ulike() (#77)
- fix for depthwise_conv2d_bp + special test
Signed-off-by: raver119 <raver119@gmail.com>
* upsampling2d fix CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* DL4J trace logging (#79)
* MLN/CG trace logging for debugging
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Tiny tweak
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* strided_slice_bp shape fn leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* SameDiff fixes and naming (#78)
* remove SDVariable inplace methods
* import methods
* npe fix in OpVal
* removed SameDiff inplace ops from tests
* Naming updates, moved to centralized methods in SameDiff, should use op_#:# for everything
* quick fixes
* javadoc
* SDVariable eval with placeholders
* use regex match
* better matching
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* fix javadoc. (#76)
* fix javadoc.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* replace most @see with @link s.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* 4 additional tests
Signed-off-by: raver119 <raver119@gmail.com>
* launch context reorganization
Signed-off-by: raver119 <raver119@gmail.com>
* LaunchContext reorganization
Signed-off-by: raver119 <raver119@gmail.com>
* per-device LaunchContext
Signed-off-by: raver119 <raver119@gmail.com>
* Various DL4J/ND4J fixes (#81)
* #7954 Force refresh of UI when switching tabs on overview page
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8017 Concurrent modification exception (synchronize) fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8033 Don't initialize updater in middle of writing memory crash dump
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8208 Fix shape checks for ND4J int[] creator methods
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #6385 #7992 Keras import naming fixes + cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8016 Upsampling3D - add NDHWC format support
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* Refactor NativeOps.h to export C functions
* Actually export functions from NativeOps.h
* Adapt the Java wrappers in ND4J generated with JavaCPP
* Create C wrappers for some of the C++ classes currently used by ND4J
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* remove duplicate code in createBufferDetached. (#83)
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Keras model import - updater lr fix (#84)
* Keras model import - updater lr fix
Signed-off-by: eraly <susan.eraly@gmail.com>
* Keras model import - updater lr fix, cleanup
Signed-off-by: eraly <susan.eraly@gmail.com>
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* Fix functions of OpaqueVariablesSet
* thread-local buffers/affinity
Signed-off-by: raver119 <raver119@gmail.com>
* thread safety for LaunchContext
Signed-off-by: raver119 <raver119@gmail.com>
* more of thread safety
Signed-off-by: raver119 <raver119@gmail.com>
* one more multi threaded test
Signed-off-by: raver119 <raver119@gmail.com>
* SameDiff Convolution Config validation, better output methods (#82)
* Conv Config validation & tests
Signed-off-by: Ryan Nett <rnett@skymind.io>
* stackOutputs utility method
Signed-off-by: Ryan Nett <rnett@skymind.io>
* use constructor for validation, support negative kernel sizes (infered from weights)
Signed-off-by: Ryan Nett <rnett@skymind.io>
* better output methods
Signed-off-by: Ryan Nett <rnett@skymind.io>
* move output to be with fit and evaluate
Signed-off-by: Ryan Nett <rnett@skymind.io>
* fixes
Signed-off-by: Ryan Nett <rnett@skymind.io>
* more fixes
Signed-off-by: Ryan Nett <rnett@skymind.io>
* refactor duplicate code from pad methods. (#86)
* refactor duplicate code from pad methods.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* replace switch with if.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Various ND4J/DL4J fixes and improvements (#87)
* Reshape and reallocate - small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Reshape and reallocate - small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #6488 ElementWiseVertex broadcast support
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Constructors and broadcast supported it Transforms.max/min
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8054 ElementWiseVertex now supports broadcast inputs
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8057 Nd4j.create overload dtype fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #7551 ND4J Shape validation fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* [WIP] Numpy boolean import (#91)
* numpy bool type
Signed-off-by: raver119 <raver119@gmail.com>
* numpy bool java side
Signed-off-by: raver119 <raver119@gmail.com>
* remove create method with unused parameter. (#89)
* remove create method with unused parameter.
* removed more unused methods.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* removing more unused code.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* last removal of unused code.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* remove createSparse methods. (#92)
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Various ND4J/DL4J fixes (#90)
* Deprecate Old*Op instances
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8063 #8054 Broadcast exceptions + cleanup inplace ops
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Remove bad test condition
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #7993 Fix shape function issue in crop_and_resize op
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* DL4J SameDiff lambda layer fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8029 Fix for pnorm backprop math
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8038 Fix Op profiler NaN/Inf triggering + add tests (#93)
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* createUninitializedDetached refactoring. (#94)
* wip
* update interface, add null implementations.
* Breaking one test in a weird way.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* createUninitializedDetached refactored.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* cuda build fix for issues introduced by recent refactoring
Signed-off-by: raver119 <raver119@gmail.com>
* [WIP] More of CUDA (#95)
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* Implementation of hashcode cuda helper. Working edition.
* Fixed parallel test input arangements.
* Fixed tests for hashcode op.
* Fixed shape calculation for image:crop_and_resize op and test.
* NativeOps tests. Initial test suite.
* Added tests for indexReduce methods.
* Added test on execBroadcast with NDArray as dimensions.
* Added test on execBroadcastBool with NDArray as dimensions.
* Added tests on execPairwiseTransform and execPairwiseTransofrmBool.
* Added tests for execReduce with scalar results.
* Added reduce tests for non-empty dims array.
* Added tests for reduce3.
* Added tests for execScalar.
* Added tests for execSummaryStats.
* - provide cpu/cuda code for batch_to_space
- testing it
Signed-off-by: Yurii <yurii@skymind.io>
* - remove old test for batch_to_space (had wrong format and numbers were not checked)
Signed-off-by: Yurii <yurii@skymind.io>
* Fixed complilation errors with test.
* Added test for execTransformFloat.
* Added test for execTransformSame.
* Added test for execTransformBool.
* Added test for execTransformStrict.
* Added tests for execScalar/execScalarBool with TADs.
* Added test for flatten.
* - provide cpu/cuda code for space_to_Batch operaion
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for concat.
* comment unnecessary stuff in s_t_b
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for specialConcat.
* Added tests for memcpy/set routines.
* Fixed pullRow cuda test.
* Added pullRow test.
* Added average test.
* - correct typo in NDArray::applyPairwiseTransform(nd4j::pairwise::BoolOps op...)
Signed-off-by: Yurii <yurii@skymind.io>
* - debugging and fixing cuda tests in JavaInteropTests file
Signed-off-by: Yurii <yurii@skymind.io>
* - correct some tests
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for shuffle.
* Fixed ops declarations.
* Restored omp and added shuffle test.
* Added convertTypes test.
* Added tests for execRandom. Eliminated usage of RandomBuffer with NativeOps.
* Added sort tests.
* Added tests for execCustomOp.
* - further debuging and fixing tests terminated with crash
Signed-off-by: Yurii <yurii@skymind.io>
* Added tests for calculateOutputShapes.
* Addded Benchmarks test.
* Commented benchmark tests.
* change assertion
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for apply_sgd op. Added cpu helper for that op.
* Implement cuda helper for aplly_sgd op. Fixed tests for NativeOps.
* Added test for assign broadcastable.
* Added tests for assign_bp op.
* Added tests for axpy op.
* - assign/execScalar/execTransformAny signature change
- minor test fix
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed axpy op.
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* - fix tests for nativeOps::concat
Signed-off-by: Yurii <yurii@skymind.io>
* sequential transform/scalar
Signed-off-by: raver119 <raver119@gmail.com>
* allow nested parallelism
Signed-off-by: raver119 <raver119@gmail.com>
* assign_bp leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* block setRNG fix
Signed-off-by: raver119 <raver119@gmail.com>
* enable parallelism by default
Signed-off-by: raver119 <raver119@gmail.com>
* enable nested parallelism by default
Signed-off-by: raver119 <raver119@gmail.com>
* Added cuda implementation for row_count helper.
* Added implementation for tnse gains op helper.
* - take into account possible situations when input arrays are empty in reduce_ cuda stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Implemented tsne/edge_forces op cuda-based helper. Parallelized cpu-based helper for edge_forces.
* Added kernel for tsne/symmetrized op heleper.
* Implementation of tsne/symmetrized op cuda helper. Working edition.
* Eliminated waste printfs.
* Added test for broadcastgradientargs op.
* host-only fallback for empty reduce float
Signed-off-by: raver119 <raver119@gmail.com>
* - some tests fixes
Signed-off-by: Yurii <yurii@skymind.io>
* - correct the rest of reduce_ stuff
Signed-off-by: Yurii <yurii@skymind.io>
* - further correction of reduce_ stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for Cbow op. Also added cuda implementation for cbow helpers.
* - improve code of stack operation for scalar case
Signed-off-by: Yurii <yurii@skymind.io>
* - provide cuda kernel for gatherND operation
Signed-off-by: Yurii <yurii@skymind.io>
* Implementation of cbow helpers with cuda kernels.
* minor tests tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* minor tests tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* - further correction of cuda stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Implementatation of cbow op helper with cuda kernels. Working edition.
* Skip random testing for cudablas case.
* lstmBlockCell context fix
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for ELU and ELU_BP ops.
* Added tests for eq_scalar, gt_scalar, gte_scalar and lte_scalar ops.
* Added tests for neq_scalar.
* Added test for noop.
* - further work on clipbynorm_bp
Signed-off-by: Yurii <yurii@skymind.io>
* - get rid of concat op call, use instead direct concat helper call
Signed-off-by: Yurii <yurii@skymind.io>
* lstmBlockCell context fix
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for lrelu and lrelu_bp.
* Added tests for selu and selu_bp.
* Fixed lrelu derivative helpers.
* - some corrections in lstm
Signed-off-by: Yurii <yurii@skymind.io>
* operator * result shape fix
Signed-off-by: raver119 <raver119@gmail.com>
* - correct typo in lstmCell
Signed-off-by: Yurii <yurii@skymind.io>
* few tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA inverse broadcast bool fix
Signed-off-by: raver119 <raver119@gmail.com>
* disable MMAP test for CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* BooleanOp syncToDevice
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* additional data types for im2col/col2im
Signed-off-by: raver119 <raver119@gmail.com>
* Added test for firas_sparse op.
* one more RandomBuffer test excluded
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for flatten op.
* Added test for Floor op.
* bunch of tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* mmulDot tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Implemented floordiv_bp op and tests.
* Fixed scalar case with cuda implementation for bds.
* - work on cuda kernel for clip_by_norm backprop op is completed
Signed-off-by: Yurii <yurii@skymind.io>
* Eliminate cbow crach.
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Eliminated abortion with batched nlp test.
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed shared flag initializing.
* disabled bunch of cpu workspaces tests
Signed-off-by: raver119 <raver119@gmail.com>
* scalar operators fix: missing registerSpecialUse call
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed logdet for cuda and tests.
* - correct clipBynorm_bp
Signed-off-by: Yurii <yurii@skymind.io>
* Fixed crop_and_resize shape datatype.
* - correct some mmul tests
Signed-off-by: Yurii <yurii@skymind.io>
* build fix
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI (#97)
Signed-off-by: raver119 <raver119@gmail.com>
* temporary stack fix
Signed-off-by: raver119 <raver119@gmail.com>
* round robin affinity test
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of legacy CudaContext methods
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of legacy ContextPool classes/methods
Signed-off-by: raver119 <raver119@gmail.com>
* one legacy test removed
Signed-off-by: raver119 <raver119@gmail.com>
* few more fields rearranged
Signed-off-by: raver119 <raver119@gmail.com>
* OpaqueLaunchContext
Signed-off-by: raver119 <raver119@gmail.com>
* OpaqueLaunchContext++
Signed-off-by: raver119 <raver119@gmail.com>
* more of OpaqueLaunchContext methods
Signed-off-by: raver119 <raver119@gmail.com>
* LaunchContext -> CudaContext
Signed-off-by: raver119 <raver119@gmail.com>
* AffinityManger changes
Signed-off-by: raver119 <raver119@gmail.com>
* AffinityManger changes
Signed-off-by: raver119 <raver119@gmail.com>
* cusolver handles
Signed-off-by: raver119 <raver119@gmail.com>
* typo
Signed-off-by: raver119 <raver119@gmail.com>
* cusolver method
Signed-off-by: raver119 <raver119@gmail.com>
* cusolver handle propagated
Signed-off-by: raver119 <raver119@gmail.com>
* blas/solver handles
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* legacy concat implementations replaced with new CustomOp
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* concat now uses way more blocks
Signed-off-by: raver119 <raver119@gmail.com>
* print
Signed-off-by: raver119 <raver119@gmail.com>
* no more triple template mmul
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of kernels have dtypes reconsidered
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of kernels have dtypes reconsidered
Signed-off-by: raver119 <raver119@gmail.com>
* bitonic sort reorganized
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of cpu stuff removed from cuda scope
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of cpu stuff removed from cuda scope
Signed-off-by: raver119 <raver119@gmail.com>
* type conversions moved to generic impl
Signed-off-by: raver119 <raver119@gmail.com>
* cpu data types pass
Signed-off-by: raver119 <raver119@gmail.com>
* non_max_suppression
Signed-off-by: raver119 <raver119@gmail.com>
* sortByValue fix
Signed-off-by: raver119 <raver119@gmail.com>
* ignore all mixed datatype tests for mmul
Signed-off-by: raver119 <raver119@gmail.com>
* special handling of OpProfiler exceptions
Signed-off-by: raver119 <raver119@gmail.com>
* - one failing concat test in cpp
- Nd4j.tile now uses op internally
Signed-off-by: raver119 <raver119@gmail.com>
* get back dtype exception for legacy arrays deserialization
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-14 15:52:34 +02:00
# include <exceptions/datatype_exception.h>
2019-11-13 15:15:18 +01:00
# include <execution/Threads.h>
2019-06-06 14:21:15 +02:00
2020-03-02 10:49:41 +01:00
namespace sd {
2019-06-06 14:21:15 +02:00
//////////////////////////////////////////////////////////////////////////////
2019-11-19 14:39:36 +01:00
// MXK x KxN = MxN -> actual sequence of axes doesn't matter
2019-06-06 14:21:15 +02:00
template < typename T1 , typename T2 , typename T3 >
2019-11-19 14:39:36 +01:00
static void usualGemm ( const NDArray * vA , const NDArray * vB , NDArray * vC ,
const int aMaxis , const int aKaxis , const int bKaxis , const int bNaxis , const int cMaxis , const int cNaxis ,
const double alpha , const double beta ) {
2019-06-06 14:21:15 +02:00
2019-11-19 14:39:36 +01:00
const T1 * A = vA - > bufferAsT < T1 > ( ) ;
const T2 * B = vB - > bufferAsT < T2 > ( ) ;
T3 * C = vC - > bufferAsT < T3 > ( ) ;
2019-06-06 14:21:15 +02:00
2019-11-19 14:39:36 +01:00
const T3 alphaZ = alpha ;
const T3 betaZ = beta ;
2019-06-06 14:21:15 +02:00
2019-11-19 14:39:36 +01:00
const bool betaPersent = beta ;
2019-06-06 14:21:15 +02:00
2020-05-09 07:06:14 +02:00
const Nd4jLong * aShapeInfo = vA - > shapeInfo ( ) ;
const Nd4jLong * bShapeInfo = vB - > shapeInfo ( ) ;
const Nd4jLong * cShapeInfo = vC - > shapeInfo ( ) ;
2019-06-06 14:21:15 +02:00
2019-11-19 14:39:36 +01:00
const int aRank = vA - > rankOf ( ) ;
const int bRank = vB - > rankOf ( ) ;
const int cRank = vC - > rankOf ( ) ;
2019-06-06 14:21:15 +02:00
2019-11-19 14:39:36 +01:00
const Nd4jLong cLen = vC - > lengthOf ( ) ;
2019-06-06 14:21:15 +02:00
2019-11-19 14:39:36 +01:00
const int K = vA - > sizeAt ( aKaxis ) ;
2019-11-13 15:15:18 +01:00
2019-11-19 14:39:36 +01:00
auto func = PRAGMA_THREADS_FOR {
2019-11-13 15:15:18 +01:00
2019-11-19 14:39:36 +01:00
std : : vector < Nd4jLong > aCoords ( 2 ) , bCoords ( 2 ) , cCoords ( 2 ) ;
for ( auto i = start ; i < stop ; + + i ) {
// evaluate C coordinates
2020-03-11 14:21:59 +01:00
shape : : index2coordsCPU ( start , i , cShapeInfo , cCoords . data ( ) ) ;
2019-11-19 14:39:36 +01:00
// evaluate A coordinates
aCoords [ aMaxis ] = cCoords [ cMaxis ] ;
aCoords [ aKaxis ] = 0 ;
// evaluate B coordinates
bCoords [ bKaxis ] = 0 ;
bCoords [ bNaxis ] = cCoords [ cNaxis ] ;
auto aOffset = shape : : getOffset ( aShapeInfo , aCoords . data ( ) ) ;
auto bOffset = shape : : getOffset ( bShapeInfo , bCoords . data ( ) ) ;
T3 val = A [ aOffset ] * B [ bOffset ] ; // first iteration
2020-02-28 15:04:45 +01:00
for ( int j = 1 ; j < K ; + + j ) { // rest iterations
2019-11-19 14:39:36 +01:00
aOffset + = shape : : stride ( aShapeInfo ) [ aKaxis ] ;
bOffset + = shape : : stride ( bShapeInfo ) [ bKaxis ] ;
val = val + A [ aOffset ] * B [ bOffset ] ;
2019-06-06 14:21:15 +02:00
}
2019-11-19 14:39:36 +01:00
auto cOffset = shape : : getOffset ( cShapeInfo , cCoords . data ( ) ) ;
if ( betaPersent )
C [ cOffset ] = alphaZ * val + betaZ * C [ cOffset ] ;
else
C [ cOffset ] = alphaZ * val ;
2019-11-13 15:15:18 +01:00
}
} ;
2020-03-09 06:22:49 +01:00
samediff : : Threads : : parallel_tad ( func , 0 , cLen ) ;
2019-06-06 14:21:15 +02:00
}
2019-11-19 14:39:36 +01:00
2019-06-06 14:21:15 +02:00
//////////////////////////////////////////////////////////////////////////////
2019-11-19 14:39:36 +01:00
// MXN x N = M -> actual sequence of {M,N} axes doesn't matter
2019-06-06 14:21:15 +02:00
template < typename T1 , typename T2 , typename T3 >
2019-11-19 14:39:36 +01:00
static void usualGemv ( const NDArray * vA , const NDArray * vX , NDArray * vY , const int incx , const int incy , const int aMaxis , const double alpha , const double beta ) {
2019-06-06 14:21:15 +02:00
2019-11-19 14:39:36 +01:00
const T1 * A = vA - > bufferAsT < T1 > ( ) ;
const T2 * X = vX - > bufferAsT < T2 > ( ) ;
T3 * Y = vY - > bufferAsT < T3 > ( ) ;
const T3 alphaZ = alpha ;
const T3 betaZ = beta ;
const bool betaPersent = beta ;
2020-05-09 07:06:14 +02:00
const Nd4jLong * aShapeInfo = vA - > shapeInfo ( ) ;
const Nd4jLong * xShapeInfo = vX - > shapeInfo ( ) ;
const Nd4jLong * yShapeInfo = vY - > shapeInfo ( ) ;
2019-11-19 14:39:36 +01:00
const int N = vX - > lengthOf ( ) ;
const int M = vY - > lengthOf ( ) ;
const auto aMstride = vA - > strideAt ( aMaxis ) ;
const auto aNstride = vA - > strideAt ( aMaxis = = 0 ? 1 : 0 ) ;
2019-06-06 14:21:15 +02:00
2019-11-13 15:15:18 +01:00
auto func = PRAGMA_THREADS_FOR {
2019-11-19 14:39:36 +01:00
for ( auto i = start ; i < stop ; + + i ) {
2019-11-13 15:15:18 +01:00
2019-11-19 14:39:36 +01:00
// evaluate offsets
auto aOffset = i * aMstride ;
auto xOffset = 0 ;
T3 val = A [ aOffset ] * X [ xOffset ] ; // first iteration
2020-02-28 15:04:45 +01:00
for ( int j = 1 ; j < N ; + + j ) { // rest iterations
2019-11-19 14:39:36 +01:00
aOffset + = aNstride ;
xOffset + = incx ;
val = val + A [ aOffset ] * X [ xOffset ] ;
2019-11-13 15:15:18 +01:00
}
2019-11-19 14:39:36 +01:00
auto yOffset = i * incy ;
if ( betaPersent )
Y [ yOffset ] = alphaZ * val + betaZ * Y [ yOffset ] ;
2019-11-13 15:15:18 +01:00
else
2019-11-19 14:39:36 +01:00
Y [ yOffset ] = alphaZ * val ;
2019-06-06 14:21:15 +02:00
}
2019-11-13 15:15:18 +01:00
} ;
2020-03-09 06:22:49 +01:00
samediff : : Threads : : parallel_tad ( func , 0 , M ) ;
2019-06-06 14:21:15 +02:00
}
//////////////////////////////////////////////////////////////////////////////
// (X*Y) = Z[0]
template < typename T1 , typename T2 , typename T3 >
static void usualDot ( const Nd4jLong length , const double alpha , const void * vX , const Nd4jLong incx , const void * vY , const Nd4jLong incy , const double beta , void * vZ ) {
T1 * X = reinterpret_cast < T1 * > ( const_cast < void * > ( vX ) ) ;
T2 * Y = reinterpret_cast < T2 * > ( const_cast < void * > ( vY ) ) ;
T3 * Z = reinterpret_cast < T3 * > ( vZ ) ;
T3 alphaZ ( alpha ) , betaZ ( beta ) ;
2019-11-19 14:39:36 +01:00
const bool betaPersent = beta ;
2019-06-06 14:21:15 +02:00
T3 sum = 0 ;
2020-06-06 14:26:55 +02:00
PRAGMA_OMP_PARALLEL_FOR_ARGS ( OMP_IF ( length > Environment : : getInstance ( ) . elementwiseThreshold ( ) ) schedule ( guided ) reduction ( OMP_SUMT : sum ) )
2020-02-28 15:04:45 +01:00
for ( Nd4jLong i = 0 ; i < length ; + + i )
2019-11-13 15:15:18 +01:00
sum + = X [ i * incx ] * Y [ i * incy ] ;
2019-11-19 14:39:36 +01:00
if ( betaPersent )
* Z = alphaZ * sum + betaZ * * Z ;
else
* Z = alphaZ * sum ;
2019-06-06 14:21:15 +02:00
}
//////////////////////////////////////////////////////////////////////////////
// MXK x KxN = MxN
[WIP] multi-device support (#80)
* fix pad javadoc and @see links. (#72)
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* [WIP] More fixes (#73)
* special tests for ConstantTadHelper/ConstantShapeHelper
Signed-off-by: raver119 <raver119@gmail.com>
* release methods for data buffers
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary buffer Java side
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary buffer Java side
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary TadPack C++/Java side (#74)
Signed-off-by: raver119 <raver119@gmail.com>
* Zoo model TF import test updates (#75)
* argLine fix, update compression_gru comment
* updated comment for xception
* undid but commented argLine change
* updated xlnet comment
* copyright headers
* - new NDArray methods like()/ulike() (#77)
- fix for depthwise_conv2d_bp + special test
Signed-off-by: raver119 <raver119@gmail.com>
* upsampling2d fix CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* DL4J trace logging (#79)
* MLN/CG trace logging for debugging
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Tiny tweak
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* strided_slice_bp shape fn leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* SameDiff fixes and naming (#78)
* remove SDVariable inplace methods
* import methods
* npe fix in OpVal
* removed SameDiff inplace ops from tests
* Naming updates, moved to centralized methods in SameDiff, should use op_#:# for everything
* quick fixes
* javadoc
* SDVariable eval with placeholders
* use regex match
* better matching
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* fix javadoc. (#76)
* fix javadoc.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* replace most @see with @link s.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* 4 additional tests
Signed-off-by: raver119 <raver119@gmail.com>
* launch context reorganization
Signed-off-by: raver119 <raver119@gmail.com>
* LaunchContext reorganization
Signed-off-by: raver119 <raver119@gmail.com>
* per-device LaunchContext
Signed-off-by: raver119 <raver119@gmail.com>
* Various DL4J/ND4J fixes (#81)
* #7954 Force refresh of UI when switching tabs on overview page
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8017 Concurrent modification exception (synchronize) fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8033 Don't initialize updater in middle of writing memory crash dump
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8208 Fix shape checks for ND4J int[] creator methods
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #6385 #7992 Keras import naming fixes + cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8016 Upsampling3D - add NDHWC format support
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* Refactor NativeOps.h to export C functions
* Actually export functions from NativeOps.h
* Adapt the Java wrappers in ND4J generated with JavaCPP
* Create C wrappers for some of the C++ classes currently used by ND4J
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* remove duplicate code in createBufferDetached. (#83)
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Keras model import - updater lr fix (#84)
* Keras model import - updater lr fix
Signed-off-by: eraly <susan.eraly@gmail.com>
* Keras model import - updater lr fix, cleanup
Signed-off-by: eraly <susan.eraly@gmail.com>
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* Fix functions of OpaqueVariablesSet
* thread-local buffers/affinity
Signed-off-by: raver119 <raver119@gmail.com>
* thread safety for LaunchContext
Signed-off-by: raver119 <raver119@gmail.com>
* more of thread safety
Signed-off-by: raver119 <raver119@gmail.com>
* one more multi threaded test
Signed-off-by: raver119 <raver119@gmail.com>
* SameDiff Convolution Config validation, better output methods (#82)
* Conv Config validation & tests
Signed-off-by: Ryan Nett <rnett@skymind.io>
* stackOutputs utility method
Signed-off-by: Ryan Nett <rnett@skymind.io>
* use constructor for validation, support negative kernel sizes (infered from weights)
Signed-off-by: Ryan Nett <rnett@skymind.io>
* better output methods
Signed-off-by: Ryan Nett <rnett@skymind.io>
* move output to be with fit and evaluate
Signed-off-by: Ryan Nett <rnett@skymind.io>
* fixes
Signed-off-by: Ryan Nett <rnett@skymind.io>
* more fixes
Signed-off-by: Ryan Nett <rnett@skymind.io>
* refactor duplicate code from pad methods. (#86)
* refactor duplicate code from pad methods.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* replace switch with if.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Various ND4J/DL4J fixes and improvements (#87)
* Reshape and reallocate - small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Reshape and reallocate - small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #6488 ElementWiseVertex broadcast support
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Constructors and broadcast supported it Transforms.max/min
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8054 ElementWiseVertex now supports broadcast inputs
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8057 Nd4j.create overload dtype fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #7551 ND4J Shape validation fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* [WIP] Numpy boolean import (#91)
* numpy bool type
Signed-off-by: raver119 <raver119@gmail.com>
* numpy bool java side
Signed-off-by: raver119 <raver119@gmail.com>
* remove create method with unused parameter. (#89)
* remove create method with unused parameter.
* removed more unused methods.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* removing more unused code.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* last removal of unused code.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* remove createSparse methods. (#92)
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Various ND4J/DL4J fixes (#90)
* Deprecate Old*Op instances
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8063 #8054 Broadcast exceptions + cleanup inplace ops
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Remove bad test condition
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #7993 Fix shape function issue in crop_and_resize op
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* DL4J SameDiff lambda layer fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8029 Fix for pnorm backprop math
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8038 Fix Op profiler NaN/Inf triggering + add tests (#93)
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* createUninitializedDetached refactoring. (#94)
* wip
* update interface, add null implementations.
* Breaking one test in a weird way.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* createUninitializedDetached refactored.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* cuda build fix for issues introduced by recent refactoring
Signed-off-by: raver119 <raver119@gmail.com>
* [WIP] More of CUDA (#95)
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* Implementation of hashcode cuda helper. Working edition.
* Fixed parallel test input arangements.
* Fixed tests for hashcode op.
* Fixed shape calculation for image:crop_and_resize op and test.
* NativeOps tests. Initial test suite.
* Added tests for indexReduce methods.
* Added test on execBroadcast with NDArray as dimensions.
* Added test on execBroadcastBool with NDArray as dimensions.
* Added tests on execPairwiseTransform and execPairwiseTransofrmBool.
* Added tests for execReduce with scalar results.
* Added reduce tests for non-empty dims array.
* Added tests for reduce3.
* Added tests for execScalar.
* Added tests for execSummaryStats.
* - provide cpu/cuda code for batch_to_space
- testing it
Signed-off-by: Yurii <yurii@skymind.io>
* - remove old test for batch_to_space (had wrong format and numbers were not checked)
Signed-off-by: Yurii <yurii@skymind.io>
* Fixed complilation errors with test.
* Added test for execTransformFloat.
* Added test for execTransformSame.
* Added test for execTransformBool.
* Added test for execTransformStrict.
* Added tests for execScalar/execScalarBool with TADs.
* Added test for flatten.
* - provide cpu/cuda code for space_to_Batch operaion
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for concat.
* comment unnecessary stuff in s_t_b
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for specialConcat.
* Added tests for memcpy/set routines.
* Fixed pullRow cuda test.
* Added pullRow test.
* Added average test.
* - correct typo in NDArray::applyPairwiseTransform(nd4j::pairwise::BoolOps op...)
Signed-off-by: Yurii <yurii@skymind.io>
* - debugging and fixing cuda tests in JavaInteropTests file
Signed-off-by: Yurii <yurii@skymind.io>
* - correct some tests
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for shuffle.
* Fixed ops declarations.
* Restored omp and added shuffle test.
* Added convertTypes test.
* Added tests for execRandom. Eliminated usage of RandomBuffer with NativeOps.
* Added sort tests.
* Added tests for execCustomOp.
* - further debuging and fixing tests terminated with crash
Signed-off-by: Yurii <yurii@skymind.io>
* Added tests for calculateOutputShapes.
* Addded Benchmarks test.
* Commented benchmark tests.
* change assertion
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for apply_sgd op. Added cpu helper for that op.
* Implement cuda helper for aplly_sgd op. Fixed tests for NativeOps.
* Added test for assign broadcastable.
* Added tests for assign_bp op.
* Added tests for axpy op.
* - assign/execScalar/execTransformAny signature change
- minor test fix
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed axpy op.
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* - fix tests for nativeOps::concat
Signed-off-by: Yurii <yurii@skymind.io>
* sequential transform/scalar
Signed-off-by: raver119 <raver119@gmail.com>
* allow nested parallelism
Signed-off-by: raver119 <raver119@gmail.com>
* assign_bp leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* block setRNG fix
Signed-off-by: raver119 <raver119@gmail.com>
* enable parallelism by default
Signed-off-by: raver119 <raver119@gmail.com>
* enable nested parallelism by default
Signed-off-by: raver119 <raver119@gmail.com>
* Added cuda implementation for row_count helper.
* Added implementation for tnse gains op helper.
* - take into account possible situations when input arrays are empty in reduce_ cuda stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Implemented tsne/edge_forces op cuda-based helper. Parallelized cpu-based helper for edge_forces.
* Added kernel for tsne/symmetrized op heleper.
* Implementation of tsne/symmetrized op cuda helper. Working edition.
* Eliminated waste printfs.
* Added test for broadcastgradientargs op.
* host-only fallback for empty reduce float
Signed-off-by: raver119 <raver119@gmail.com>
* - some tests fixes
Signed-off-by: Yurii <yurii@skymind.io>
* - correct the rest of reduce_ stuff
Signed-off-by: Yurii <yurii@skymind.io>
* - further correction of reduce_ stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for Cbow op. Also added cuda implementation for cbow helpers.
* - improve code of stack operation for scalar case
Signed-off-by: Yurii <yurii@skymind.io>
* - provide cuda kernel for gatherND operation
Signed-off-by: Yurii <yurii@skymind.io>
* Implementation of cbow helpers with cuda kernels.
* minor tests tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* minor tests tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* - further correction of cuda stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Implementatation of cbow op helper with cuda kernels. Working edition.
* Skip random testing for cudablas case.
* lstmBlockCell context fix
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for ELU and ELU_BP ops.
* Added tests for eq_scalar, gt_scalar, gte_scalar and lte_scalar ops.
* Added tests for neq_scalar.
* Added test for noop.
* - further work on clipbynorm_bp
Signed-off-by: Yurii <yurii@skymind.io>
* - get rid of concat op call, use instead direct concat helper call
Signed-off-by: Yurii <yurii@skymind.io>
* lstmBlockCell context fix
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for lrelu and lrelu_bp.
* Added tests for selu and selu_bp.
* Fixed lrelu derivative helpers.
* - some corrections in lstm
Signed-off-by: Yurii <yurii@skymind.io>
* operator * result shape fix
Signed-off-by: raver119 <raver119@gmail.com>
* - correct typo in lstmCell
Signed-off-by: Yurii <yurii@skymind.io>
* few tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA inverse broadcast bool fix
Signed-off-by: raver119 <raver119@gmail.com>
* disable MMAP test for CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* BooleanOp syncToDevice
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* additional data types for im2col/col2im
Signed-off-by: raver119 <raver119@gmail.com>
* Added test for firas_sparse op.
* one more RandomBuffer test excluded
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for flatten op.
* Added test for Floor op.
* bunch of tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* mmulDot tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Implemented floordiv_bp op and tests.
* Fixed scalar case with cuda implementation for bds.
* - work on cuda kernel for clip_by_norm backprop op is completed
Signed-off-by: Yurii <yurii@skymind.io>
* Eliminate cbow crach.
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Eliminated abortion with batched nlp test.
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed shared flag initializing.
* disabled bunch of cpu workspaces tests
Signed-off-by: raver119 <raver119@gmail.com>
* scalar operators fix: missing registerSpecialUse call
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed logdet for cuda and tests.
* - correct clipBynorm_bp
Signed-off-by: Yurii <yurii@skymind.io>
* Fixed crop_and_resize shape datatype.
* - correct some mmul tests
Signed-off-by: Yurii <yurii@skymind.io>
* build fix
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI (#97)
Signed-off-by: raver119 <raver119@gmail.com>
* temporary stack fix
Signed-off-by: raver119 <raver119@gmail.com>
* round robin affinity test
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of legacy CudaContext methods
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of legacy ContextPool classes/methods
Signed-off-by: raver119 <raver119@gmail.com>
* one legacy test removed
Signed-off-by: raver119 <raver119@gmail.com>
* few more fields rearranged
Signed-off-by: raver119 <raver119@gmail.com>
* OpaqueLaunchContext
Signed-off-by: raver119 <raver119@gmail.com>
* OpaqueLaunchContext++
Signed-off-by: raver119 <raver119@gmail.com>
* more of OpaqueLaunchContext methods
Signed-off-by: raver119 <raver119@gmail.com>
* LaunchContext -> CudaContext
Signed-off-by: raver119 <raver119@gmail.com>
* AffinityManger changes
Signed-off-by: raver119 <raver119@gmail.com>
* AffinityManger changes
Signed-off-by: raver119 <raver119@gmail.com>
* cusolver handles
Signed-off-by: raver119 <raver119@gmail.com>
* typo
Signed-off-by: raver119 <raver119@gmail.com>
* cusolver method
Signed-off-by: raver119 <raver119@gmail.com>
* cusolver handle propagated
Signed-off-by: raver119 <raver119@gmail.com>
* blas/solver handles
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* legacy concat implementations replaced with new CustomOp
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* concat now uses way more blocks
Signed-off-by: raver119 <raver119@gmail.com>
* print
Signed-off-by: raver119 <raver119@gmail.com>
* no more triple template mmul
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of kernels have dtypes reconsidered
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of kernels have dtypes reconsidered
Signed-off-by: raver119 <raver119@gmail.com>
* bitonic sort reorganized
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of cpu stuff removed from cuda scope
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of cpu stuff removed from cuda scope
Signed-off-by: raver119 <raver119@gmail.com>
* type conversions moved to generic impl
Signed-off-by: raver119 <raver119@gmail.com>
* cpu data types pass
Signed-off-by: raver119 <raver119@gmail.com>
* non_max_suppression
Signed-off-by: raver119 <raver119@gmail.com>
* sortByValue fix
Signed-off-by: raver119 <raver119@gmail.com>
* ignore all mixed datatype tests for mmul
Signed-off-by: raver119 <raver119@gmail.com>
* special handling of OpProfiler exceptions
Signed-off-by: raver119 <raver119@gmail.com>
* - one failing concat test in cpp
- Nd4j.tile now uses op internally
Signed-off-by: raver119 <raver119@gmail.com>
* get back dtype exception for legacy arrays deserialization
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-14 15:52:34 +02:00
NDArray * MmulHelper : : mmulMxM ( const NDArray * A , const NDArray * B , NDArray * C , const double alpha , const double beta , const char outOrder ) {
if ( A - > dataType ( ) ! = B - > dataType ( ) )
throw datatype_exception : : build ( " mmulMxM expects all data types to be the same " , A - > dataType ( ) , B - > dataType ( ) ) ;
if ( C ! = nullptr & & A - > dataType ( ) ! = C - > dataType ( ) )
throw datatype_exception : : build ( " mmulMxM expects all data types to be the same " , A - > dataType ( ) , C - > dataType ( ) ) ;
2019-06-06 14:21:15 +02:00
if ( A - > rankOf ( ) ! = 2 )
throw std : : runtime_error ( " MmulHelper::mmulMxM: rank of A array is not equal 2 ! " ) ;
if ( B - > rankOf ( ) ! = 2 )
2019-11-19 14:39:36 +01:00
throw std : : runtime_error ( " MmulHelper::mmulMxM: rank of B array is not equal 2 ! " ) ;
2019-06-06 14:21:15 +02:00
2019-11-21 20:17:30 +01:00
const auto M = A - > sizeAt ( 0 ) ;
const auto K = A - > sizeAt ( 1 ) ;
const auto N = B - > sizeAt ( 1 ) ;
2019-11-19 14:39:36 +01:00
2019-06-06 14:21:15 +02:00
if ( C ! = nullptr & & C - > rankOf ( ) ! = 2 )
throw std : : runtime_error ( " MmulHelper::mmulMxM: rank of C array is not equal 2 ! " ) ;
2019-11-19 14:39:36 +01:00
if ( B - > sizeAt ( 0 ) ! = K )
2019-06-06 14:21:15 +02:00
throw std : : runtime_error ( " MmulHelper::mmulMxM: B array has wrong number of rows ! " ) ;
if ( C ! = nullptr & & C - > sizeAt ( 0 ) ! = M )
throw std : : runtime_error ( " MmulHelper::mmulMxM: C array has wrong number of rows ! " ) ;
if ( C ! = nullptr & & C - > sizeAt ( 1 ) ! = N )
throw std : : runtime_error ( " MmulHelper::mmulMxM: C array has wrong number of columns ! " ) ;
if ( C = = nullptr )
2019-11-19 14:39:36 +01:00
C = new NDArray ( outOrder , { M , N } , DataTypeUtils : : pickPairwiseResultType ( A - > dataType ( ) , B - > dataType ( ) ) , A - > getContext ( ) ) ;
2019-12-30 13:06:12 +01:00
if ( C - > isEmpty ( ) )
return C ;
2019-11-19 14:39:36 +01:00
const auto aType = A - > dataType ( ) ;
const auto bType = B - > dataType ( ) ;
const auto cType = C - > dataType ( ) ;
const bool AB ( aType = = bType ) , AC ( aType = = cType ) , ABC ( AB & & AC ) ;
2020-06-06 14:26:55 +02:00
const bool hasGemm = BlasHelper : : getInstance ( ) . hasGEMM ( aType ) ;
2019-06-06 14:21:15 +02:00
2019-11-19 14:39:36 +01:00
const bool typeDouble = hasGemm & & ABC & & aType = = DataType : : DOUBLE ;
const bool typeFloat = hasGemm & & ABC & & aType = = DataType : : FLOAT32 ;
2019-06-06 14:21:15 +02:00
2019-11-19 14:39:36 +01:00
if ( ! typeFloat & & ! typeDouble ) {
BUILD_SINGLE_SELECTOR_THRICE ( aType , usualGemm , ( A , B , C , 0 , 1 , 0 , 1 , 0 , 1 , alpha , beta ) , NUMERIC_TYPES ) ;
// BUILD_TRIPLE_SELECTOR(aType, bType, cType, usualGemm, (A, B, C, 0, 1, 0, 1, 0, 1, alpha, beta), LIBND4J_TYPES, FLOAT_TYPES, FLOAT_TYPES);
}
else {
2019-06-06 14:21:15 +02:00
2019-11-19 14:39:36 +01:00
std : : vector < NDArray * > toDelete ;
2019-06-06 14:21:15 +02:00
2019-11-19 14:39:36 +01:00
NDArray * pA ( const_cast < NDArray * > ( A ) ) , * pB ( const_cast < NDArray * > ( B ) ) , * pC ( const_cast < NDArray * > ( C ) ) ;
2019-06-06 14:21:15 +02:00
2019-11-19 14:39:36 +01:00
bool aMcont = M = = 1 | | A - > strideAt ( 0 ) = = 1 ;
bool aKcont = K = = 1 | | A - > strideAt ( 1 ) = = 1 ;
bool bKcont = K = = 1 | | B - > strideAt ( 0 ) = = 1 ;
bool bNcont = N = = 1 | | B - > strideAt ( 1 ) = = 1 ;
bool cMcont = M = = 1 | | C - > strideAt ( 0 ) = = 1 ;
bool cNcont = N = = 1 | | C - > strideAt ( 1 ) = = 1 ;
2019-06-06 14:21:15 +02:00
2019-11-19 14:39:36 +01:00
if ( ! aMcont & & ! aKcont ) {
2019-12-20 20:35:39 +01:00
pA = new NDArray ( A - > dup ( ' f ' ) ) ;
2019-11-19 14:39:36 +01:00
toDelete . push_back ( pA ) ;
aMcont = true ;
}
if ( ! bKcont & & ! bNcont ) {
2019-12-20 20:35:39 +01:00
pB = new NDArray ( B - > dup ( ' f ' ) ) ;
2019-11-19 14:39:36 +01:00
toDelete . push_back ( pB ) ;
bKcont = true ;
}
if ( ! cMcont & & ! cNcont ) {
2019-12-20 20:35:39 +01:00
pC = new NDArray ( C - > dup ( ' f ' ) ) ;
2019-11-19 14:39:36 +01:00
toDelete . push_back ( pC ) ;
cMcont = true ;
}
2019-06-06 14:21:15 +02:00
2019-11-19 14:39:36 +01:00
const CBLAS_ORDER blasOrder = cMcont ? CblasColMajor : CblasRowMajor ;
2019-06-06 14:21:15 +02:00
2019-11-19 14:39:36 +01:00
const bool transA = ( ! aMcont & & cMcont ) | | ( aMcont & & ! cMcont ) ;
const bool transB = ( ! bKcont & & cMcont ) | | ( bKcont & & ! cMcont ) ;
const CBLAS_TRANSPOSE transAblas = transA ? CblasTrans : CblasNoTrans ;
const CBLAS_TRANSPOSE transBblas = transB ? CblasTrans : CblasNoTrans ;
2019-06-06 14:21:15 +02:00
2019-11-19 14:39:36 +01:00
const int lda = ( aMcont & & aKcont ) ? M : ! aMcont ? pA - > strideAt ( 0 ) : pA - > strideAt ( 1 ) ;
const int ldb = ( bKcont & & bNcont ) ? K : ! bKcont ? pB - > strideAt ( 0 ) : pB - > strideAt ( 1 ) ;
const int ldc = ( cMcont & & cNcont ) ? M : ! cMcont ? pC - > strideAt ( 0 ) : pC - > strideAt ( 1 ) ;
if ( typeFloat ) {
2020-06-06 14:26:55 +02:00
BlasHelper : : getInstance ( ) . sgemm ( ) ( blasOrder , transAblas , transBblas , M , N , K , ( float ) alpha , pA - > bufferAsT < float > ( ) , lda , pB - > bufferAsT < float > ( ) , ldb , ( float ) beta , pC - > bufferAsT < float > ( ) , ldc ) ;
2019-11-19 14:39:36 +01:00
}
else if ( typeDouble ) {
2020-06-06 14:26:55 +02:00
BlasHelper : : getInstance ( ) . dgemm ( ) ( blasOrder , transAblas , transBblas , M , N , K , ( double ) alpha , pA - > bufferAsT < double > ( ) , lda , pB - > bufferAsT < double > ( ) , ldb , ( double ) beta , pC - > bufferAsT < double > ( ) , ldc ) ;
2019-11-19 14:39:36 +01:00
}
if ( pC ! = C ) {
C - > assign ( pC ) ;
delete pC ;
}
if ( pA ! = A )
delete pA ;
if ( pB ! = B )
delete pB ;
2019-06-06 14:21:15 +02:00
}
return C ;
}
////////////////////////////////////////////////////////////////////////////
// MXN x N = M
2020-03-02 10:49:41 +01:00
NDArray * MmulHelper : : mmulMxV ( const NDArray * A , const NDArray * X , sd : : NDArray * Y , const double alpha , const double beta , const char outOrder ) {
2019-11-19 14:39:36 +01:00
[WIP] multi-device support (#80)
* fix pad javadoc and @see links. (#72)
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* [WIP] More fixes (#73)
* special tests for ConstantTadHelper/ConstantShapeHelper
Signed-off-by: raver119 <raver119@gmail.com>
* release methods for data buffers
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary buffer Java side
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary buffer Java side
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary TadPack C++/Java side (#74)
Signed-off-by: raver119 <raver119@gmail.com>
* Zoo model TF import test updates (#75)
* argLine fix, update compression_gru comment
* updated comment for xception
* undid but commented argLine change
* updated xlnet comment
* copyright headers
* - new NDArray methods like()/ulike() (#77)
- fix for depthwise_conv2d_bp + special test
Signed-off-by: raver119 <raver119@gmail.com>
* upsampling2d fix CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* DL4J trace logging (#79)
* MLN/CG trace logging for debugging
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Tiny tweak
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* strided_slice_bp shape fn leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* SameDiff fixes and naming (#78)
* remove SDVariable inplace methods
* import methods
* npe fix in OpVal
* removed SameDiff inplace ops from tests
* Naming updates, moved to centralized methods in SameDiff, should use op_#:# for everything
* quick fixes
* javadoc
* SDVariable eval with placeholders
* use regex match
* better matching
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* fix javadoc. (#76)
* fix javadoc.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* replace most @see with @link s.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* 4 additional tests
Signed-off-by: raver119 <raver119@gmail.com>
* launch context reorganization
Signed-off-by: raver119 <raver119@gmail.com>
* LaunchContext reorganization
Signed-off-by: raver119 <raver119@gmail.com>
* per-device LaunchContext
Signed-off-by: raver119 <raver119@gmail.com>
* Various DL4J/ND4J fixes (#81)
* #7954 Force refresh of UI when switching tabs on overview page
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8017 Concurrent modification exception (synchronize) fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8033 Don't initialize updater in middle of writing memory crash dump
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8208 Fix shape checks for ND4J int[] creator methods
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #6385 #7992 Keras import naming fixes + cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8016 Upsampling3D - add NDHWC format support
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* Refactor NativeOps.h to export C functions
* Actually export functions from NativeOps.h
* Adapt the Java wrappers in ND4J generated with JavaCPP
* Create C wrappers for some of the C++ classes currently used by ND4J
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* remove duplicate code in createBufferDetached. (#83)
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Keras model import - updater lr fix (#84)
* Keras model import - updater lr fix
Signed-off-by: eraly <susan.eraly@gmail.com>
* Keras model import - updater lr fix, cleanup
Signed-off-by: eraly <susan.eraly@gmail.com>
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* Fix functions of OpaqueVariablesSet
* thread-local buffers/affinity
Signed-off-by: raver119 <raver119@gmail.com>
* thread safety for LaunchContext
Signed-off-by: raver119 <raver119@gmail.com>
* more of thread safety
Signed-off-by: raver119 <raver119@gmail.com>
* one more multi threaded test
Signed-off-by: raver119 <raver119@gmail.com>
* SameDiff Convolution Config validation, better output methods (#82)
* Conv Config validation & tests
Signed-off-by: Ryan Nett <rnett@skymind.io>
* stackOutputs utility method
Signed-off-by: Ryan Nett <rnett@skymind.io>
* use constructor for validation, support negative kernel sizes (infered from weights)
Signed-off-by: Ryan Nett <rnett@skymind.io>
* better output methods
Signed-off-by: Ryan Nett <rnett@skymind.io>
* move output to be with fit and evaluate
Signed-off-by: Ryan Nett <rnett@skymind.io>
* fixes
Signed-off-by: Ryan Nett <rnett@skymind.io>
* more fixes
Signed-off-by: Ryan Nett <rnett@skymind.io>
* refactor duplicate code from pad methods. (#86)
* refactor duplicate code from pad methods.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* replace switch with if.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Various ND4J/DL4J fixes and improvements (#87)
* Reshape and reallocate - small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Reshape and reallocate - small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #6488 ElementWiseVertex broadcast support
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Constructors and broadcast supported it Transforms.max/min
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8054 ElementWiseVertex now supports broadcast inputs
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8057 Nd4j.create overload dtype fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #7551 ND4J Shape validation fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* [WIP] Numpy boolean import (#91)
* numpy bool type
Signed-off-by: raver119 <raver119@gmail.com>
* numpy bool java side
Signed-off-by: raver119 <raver119@gmail.com>
* remove create method with unused parameter. (#89)
* remove create method with unused parameter.
* removed more unused methods.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* removing more unused code.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* last removal of unused code.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* remove createSparse methods. (#92)
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Various ND4J/DL4J fixes (#90)
* Deprecate Old*Op instances
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8063 #8054 Broadcast exceptions + cleanup inplace ops
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Remove bad test condition
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #7993 Fix shape function issue in crop_and_resize op
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* DL4J SameDiff lambda layer fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8029 Fix for pnorm backprop math
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8038 Fix Op profiler NaN/Inf triggering + add tests (#93)
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* createUninitializedDetached refactoring. (#94)
* wip
* update interface, add null implementations.
* Breaking one test in a weird way.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* createUninitializedDetached refactored.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* cuda build fix for issues introduced by recent refactoring
Signed-off-by: raver119 <raver119@gmail.com>
* [WIP] More of CUDA (#95)
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* Implementation of hashcode cuda helper. Working edition.
* Fixed parallel test input arangements.
* Fixed tests for hashcode op.
* Fixed shape calculation for image:crop_and_resize op and test.
* NativeOps tests. Initial test suite.
* Added tests for indexReduce methods.
* Added test on execBroadcast with NDArray as dimensions.
* Added test on execBroadcastBool with NDArray as dimensions.
* Added tests on execPairwiseTransform and execPairwiseTransofrmBool.
* Added tests for execReduce with scalar results.
* Added reduce tests for non-empty dims array.
* Added tests for reduce3.
* Added tests for execScalar.
* Added tests for execSummaryStats.
* - provide cpu/cuda code for batch_to_space
- testing it
Signed-off-by: Yurii <yurii@skymind.io>
* - remove old test for batch_to_space (had wrong format and numbers were not checked)
Signed-off-by: Yurii <yurii@skymind.io>
* Fixed complilation errors with test.
* Added test for execTransformFloat.
* Added test for execTransformSame.
* Added test for execTransformBool.
* Added test for execTransformStrict.
* Added tests for execScalar/execScalarBool with TADs.
* Added test for flatten.
* - provide cpu/cuda code for space_to_Batch operaion
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for concat.
* comment unnecessary stuff in s_t_b
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for specialConcat.
* Added tests for memcpy/set routines.
* Fixed pullRow cuda test.
* Added pullRow test.
* Added average test.
* - correct typo in NDArray::applyPairwiseTransform(nd4j::pairwise::BoolOps op...)
Signed-off-by: Yurii <yurii@skymind.io>
* - debugging and fixing cuda tests in JavaInteropTests file
Signed-off-by: Yurii <yurii@skymind.io>
* - correct some tests
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for shuffle.
* Fixed ops declarations.
* Restored omp and added shuffle test.
* Added convertTypes test.
* Added tests for execRandom. Eliminated usage of RandomBuffer with NativeOps.
* Added sort tests.
* Added tests for execCustomOp.
* - further debuging and fixing tests terminated with crash
Signed-off-by: Yurii <yurii@skymind.io>
* Added tests for calculateOutputShapes.
* Addded Benchmarks test.
* Commented benchmark tests.
* change assertion
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for apply_sgd op. Added cpu helper for that op.
* Implement cuda helper for aplly_sgd op. Fixed tests for NativeOps.
* Added test for assign broadcastable.
* Added tests for assign_bp op.
* Added tests for axpy op.
* - assign/execScalar/execTransformAny signature change
- minor test fix
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed axpy op.
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* - fix tests for nativeOps::concat
Signed-off-by: Yurii <yurii@skymind.io>
* sequential transform/scalar
Signed-off-by: raver119 <raver119@gmail.com>
* allow nested parallelism
Signed-off-by: raver119 <raver119@gmail.com>
* assign_bp leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* block setRNG fix
Signed-off-by: raver119 <raver119@gmail.com>
* enable parallelism by default
Signed-off-by: raver119 <raver119@gmail.com>
* enable nested parallelism by default
Signed-off-by: raver119 <raver119@gmail.com>
* Added cuda implementation for row_count helper.
* Added implementation for tnse gains op helper.
* - take into account possible situations when input arrays are empty in reduce_ cuda stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Implemented tsne/edge_forces op cuda-based helper. Parallelized cpu-based helper for edge_forces.
* Added kernel for tsne/symmetrized op heleper.
* Implementation of tsne/symmetrized op cuda helper. Working edition.
* Eliminated waste printfs.
* Added test for broadcastgradientargs op.
* host-only fallback for empty reduce float
Signed-off-by: raver119 <raver119@gmail.com>
* - some tests fixes
Signed-off-by: Yurii <yurii@skymind.io>
* - correct the rest of reduce_ stuff
Signed-off-by: Yurii <yurii@skymind.io>
* - further correction of reduce_ stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for Cbow op. Also added cuda implementation for cbow helpers.
* - improve code of stack operation for scalar case
Signed-off-by: Yurii <yurii@skymind.io>
* - provide cuda kernel for gatherND operation
Signed-off-by: Yurii <yurii@skymind.io>
* Implementation of cbow helpers with cuda kernels.
* minor tests tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* minor tests tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* - further correction of cuda stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Implementatation of cbow op helper with cuda kernels. Working edition.
* Skip random testing for cudablas case.
* lstmBlockCell context fix
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for ELU and ELU_BP ops.
* Added tests for eq_scalar, gt_scalar, gte_scalar and lte_scalar ops.
* Added tests for neq_scalar.
* Added test for noop.
* - further work on clipbynorm_bp
Signed-off-by: Yurii <yurii@skymind.io>
* - get rid of concat op call, use instead direct concat helper call
Signed-off-by: Yurii <yurii@skymind.io>
* lstmBlockCell context fix
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for lrelu and lrelu_bp.
* Added tests for selu and selu_bp.
* Fixed lrelu derivative helpers.
* - some corrections in lstm
Signed-off-by: Yurii <yurii@skymind.io>
* operator * result shape fix
Signed-off-by: raver119 <raver119@gmail.com>
* - correct typo in lstmCell
Signed-off-by: Yurii <yurii@skymind.io>
* few tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA inverse broadcast bool fix
Signed-off-by: raver119 <raver119@gmail.com>
* disable MMAP test for CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* BooleanOp syncToDevice
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* additional data types for im2col/col2im
Signed-off-by: raver119 <raver119@gmail.com>
* Added test for firas_sparse op.
* one more RandomBuffer test excluded
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for flatten op.
* Added test for Floor op.
* bunch of tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* mmulDot tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Implemented floordiv_bp op and tests.
* Fixed scalar case with cuda implementation for bds.
* - work on cuda kernel for clip_by_norm backprop op is completed
Signed-off-by: Yurii <yurii@skymind.io>
* Eliminate cbow crach.
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Eliminated abortion with batched nlp test.
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed shared flag initializing.
* disabled bunch of cpu workspaces tests
Signed-off-by: raver119 <raver119@gmail.com>
* scalar operators fix: missing registerSpecialUse call
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed logdet for cuda and tests.
* - correct clipBynorm_bp
Signed-off-by: Yurii <yurii@skymind.io>
* Fixed crop_and_resize shape datatype.
* - correct some mmul tests
Signed-off-by: Yurii <yurii@skymind.io>
* build fix
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI (#97)
Signed-off-by: raver119 <raver119@gmail.com>
* temporary stack fix
Signed-off-by: raver119 <raver119@gmail.com>
* round robin affinity test
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of legacy CudaContext methods
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of legacy ContextPool classes/methods
Signed-off-by: raver119 <raver119@gmail.com>
* one legacy test removed
Signed-off-by: raver119 <raver119@gmail.com>
* few more fields rearranged
Signed-off-by: raver119 <raver119@gmail.com>
* OpaqueLaunchContext
Signed-off-by: raver119 <raver119@gmail.com>
* OpaqueLaunchContext++
Signed-off-by: raver119 <raver119@gmail.com>
* more of OpaqueLaunchContext methods
Signed-off-by: raver119 <raver119@gmail.com>
* LaunchContext -> CudaContext
Signed-off-by: raver119 <raver119@gmail.com>
* AffinityManger changes
Signed-off-by: raver119 <raver119@gmail.com>
* AffinityManger changes
Signed-off-by: raver119 <raver119@gmail.com>
* cusolver handles
Signed-off-by: raver119 <raver119@gmail.com>
* typo
Signed-off-by: raver119 <raver119@gmail.com>
* cusolver method
Signed-off-by: raver119 <raver119@gmail.com>
* cusolver handle propagated
Signed-off-by: raver119 <raver119@gmail.com>
* blas/solver handles
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* legacy concat implementations replaced with new CustomOp
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* concat now uses way more blocks
Signed-off-by: raver119 <raver119@gmail.com>
* print
Signed-off-by: raver119 <raver119@gmail.com>
* no more triple template mmul
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of kernels have dtypes reconsidered
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of kernels have dtypes reconsidered
Signed-off-by: raver119 <raver119@gmail.com>
* bitonic sort reorganized
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of cpu stuff removed from cuda scope
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of cpu stuff removed from cuda scope
Signed-off-by: raver119 <raver119@gmail.com>
* type conversions moved to generic impl
Signed-off-by: raver119 <raver119@gmail.com>
* cpu data types pass
Signed-off-by: raver119 <raver119@gmail.com>
* non_max_suppression
Signed-off-by: raver119 <raver119@gmail.com>
* sortByValue fix
Signed-off-by: raver119 <raver119@gmail.com>
* ignore all mixed datatype tests for mmul
Signed-off-by: raver119 <raver119@gmail.com>
* special handling of OpProfiler exceptions
Signed-off-by: raver119 <raver119@gmail.com>
* - one failing concat test in cpp
- Nd4j.tile now uses op internally
Signed-off-by: raver119 <raver119@gmail.com>
* get back dtype exception for legacy arrays deserialization
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-14 15:52:34 +02:00
if ( X - > dataType ( ) ! = A - > dataType ( ) )
throw datatype_exception : : build ( " mmulMxV expects all data types to be the same " , A - > dataType ( ) , X - > dataType ( ) ) ;
if ( Y ! = nullptr & & X - > dataType ( ) ! = Y - > dataType ( ) )
throw datatype_exception : : build ( " mmulMxV expects all data types to be the same " , A - > dataType ( ) , Y - > dataType ( ) ) ;
2019-06-06 14:21:15 +02:00
int xLenDim , yLenDim ( 0 ) ;
if ( A - > rankOf ( ) ! = 2 )
throw std : : runtime_error ( " MmulHelper::mmulMxV: rank of A array is not equal 2 ! " ) ;
2020-05-09 07:06:14 +02:00
if ( ! shape : : isCommonVector ( X - > shapeInfo ( ) , xLenDim ) )
2019-11-19 14:39:36 +01:00
throw std : : runtime_error ( " MmulHelper::mmulMxV: X array must be vector ! " ) ;
2019-06-06 14:21:15 +02:00
2019-11-19 14:39:36 +01:00
const auto M = A - > sizeAt ( 0 ) ;
2019-06-06 14:21:15 +02:00
const auto N = A - > sizeAt ( 1 ) ;
2019-11-19 14:39:36 +01:00
2020-05-09 07:06:14 +02:00
if ( Y ! = nullptr & & ! shape : : isCommonVector ( Y - > shapeInfo ( ) , yLenDim ) )
2019-06-06 14:21:15 +02:00
throw std : : runtime_error ( " MmulHelper::mmulMxV: Y array must be vector ! " ) ;
if ( X - > lengthOf ( ) ! = N )
throw std : : runtime_error ( " MmulHelper::mmulMxV: X vector has wrong length ! " ) ;
if ( Y ! = nullptr & & Y - > lengthOf ( ) ! = M )
2019-11-19 14:39:36 +01:00
throw std : : runtime_error ( " MmulHelper::mmulMxV: Y array has wrong length ! " ) ;
2019-06-06 14:21:15 +02:00
2019-11-19 14:39:36 +01:00
if ( Y = = nullptr )
2019-06-06 14:21:15 +02:00
Y = new NDArray ( outOrder , { M } , DataTypeUtils : : pickPairwiseResultType ( A - > dataType ( ) , X - > dataType ( ) ) , A - > getContext ( ) ) ;
2019-11-19 14:39:36 +01:00
2019-12-30 13:06:12 +01:00
if ( Y - > isEmpty ( ) )
return Y ;
2019-06-06 14:21:15 +02:00
const int incx = X - > stridesOf ( ) [ xLenDim ] ;
const int incy = Y - > stridesOf ( ) [ yLenDim ] ;
2019-11-19 14:39:36 +01:00
const auto aType = A - > dataType ( ) ;
2019-06-06 14:21:15 +02:00
const auto xType = X - > dataType ( ) ;
const auto yType = Y - > dataType ( ) ;
const bool AX ( aType = = xType ) , AY ( aType = = yType ) , AXY ( AX & & AY ) ;
2020-06-06 14:26:55 +02:00
const bool hasGemv = BlasHelper : : getInstance ( ) . hasGEMV ( aType ) ;
2019-11-19 14:39:36 +01:00
const bool typeDouble = hasGemv & & AXY & & aType = = DataType : : DOUBLE ;
const bool typeFloat = hasGemv & & AXY & & aType = = DataType : : FLOAT32 ;
if ( ! typeDouble & & ! typeFloat ) {
BUILD_SINGLE_SELECTOR_THRICE ( aType , usualGemv , ( A , X , Y , incx , incy , 0 , alpha , beta ) , NUMERIC_TYPES ) ;
// BUILD_TRIPLE_SELECTOR(aType, xType, yType, usualGemv, (A, X, Y, incx, incy, 0, alpha, beta), LIBND4J_TYPES, FLOAT_TYPES, FLOAT_TYPES);
2019-06-06 14:21:15 +02:00
}
else {
2019-11-19 14:39:36 +01:00
NDArray * pA ( const_cast < NDArray * > ( A ) ) ;
bool aMcont = M = = 1 | | A - > strideAt ( 0 ) = = 1 ;
bool aNcont = N = = 1 | | A - > strideAt ( 1 ) = = 1 ;
if ( ! aMcont & & ! aNcont ) {
2019-12-20 20:35:39 +01:00
pA = new NDArray ( A - > dup ( ' f ' ) ) ;
2019-11-19 14:39:36 +01:00
aMcont = true ;
}
const CBLAS_ORDER blasOrder = aMcont ? CblasColMajor : CblasRowMajor ;
const int lda = ( aMcont & & aNcont ) ? M : ! aMcont ? pA - > strideAt ( 0 ) : pA - > strideAt ( 1 ) ;
// choose appropriate cuda gemm api depending on data types
if ( typeDouble ) {
2020-06-06 14:26:55 +02:00
BlasHelper : : getInstance ( ) . dgemv ( ) ( blasOrder , CblasNoTrans , M , N , alpha , ( double * ) pA - > buffer ( ) , lda , ( double * ) X - > buffer ( ) , incx , beta , ( double * ) Y - > buffer ( ) , incy ) ;
2019-11-19 14:39:36 +01:00
}
else if ( typeFloat ) {
2020-06-06 14:26:55 +02:00
BlasHelper : : getInstance ( ) . sgemv ( ) ( blasOrder , CblasNoTrans , M , N , ( float ) alpha , ( float * ) pA - > buffer ( ) , lda , ( float * ) X - > buffer ( ) , incx , ( float ) beta , ( float * ) Y - > buffer ( ) , incy ) ;
2019-11-19 14:39:36 +01:00
}
if ( pA ! = A )
delete pA ;
2019-06-06 14:21:15 +02:00
}
return Y ;
}
////////////////////////////////////////////////////////////////////////////
// (X * Y) = Z[0]
2020-03-02 10:49:41 +01:00
NDArray * MmulHelper : : dot ( const NDArray * X , const NDArray * Y , sd : : NDArray * Z , const double alpha , const double beta ) {
[WIP] multi-device support (#80)
* fix pad javadoc and @see links. (#72)
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* [WIP] More fixes (#73)
* special tests for ConstantTadHelper/ConstantShapeHelper
Signed-off-by: raver119 <raver119@gmail.com>
* release methods for data buffers
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary buffer Java side
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary buffer Java side
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary TadPack C++/Java side (#74)
Signed-off-by: raver119 <raver119@gmail.com>
* Zoo model TF import test updates (#75)
* argLine fix, update compression_gru comment
* updated comment for xception
* undid but commented argLine change
* updated xlnet comment
* copyright headers
* - new NDArray methods like()/ulike() (#77)
- fix for depthwise_conv2d_bp + special test
Signed-off-by: raver119 <raver119@gmail.com>
* upsampling2d fix CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* DL4J trace logging (#79)
* MLN/CG trace logging for debugging
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Tiny tweak
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* strided_slice_bp shape fn leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* SameDiff fixes and naming (#78)
* remove SDVariable inplace methods
* import methods
* npe fix in OpVal
* removed SameDiff inplace ops from tests
* Naming updates, moved to centralized methods in SameDiff, should use op_#:# for everything
* quick fixes
* javadoc
* SDVariable eval with placeholders
* use regex match
* better matching
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* fix javadoc. (#76)
* fix javadoc.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* replace most @see with @link s.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* 4 additional tests
Signed-off-by: raver119 <raver119@gmail.com>
* launch context reorganization
Signed-off-by: raver119 <raver119@gmail.com>
* LaunchContext reorganization
Signed-off-by: raver119 <raver119@gmail.com>
* per-device LaunchContext
Signed-off-by: raver119 <raver119@gmail.com>
* Various DL4J/ND4J fixes (#81)
* #7954 Force refresh of UI when switching tabs on overview page
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8017 Concurrent modification exception (synchronize) fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8033 Don't initialize updater in middle of writing memory crash dump
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8208 Fix shape checks for ND4J int[] creator methods
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #6385 #7992 Keras import naming fixes + cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8016 Upsampling3D - add NDHWC format support
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* Refactor NativeOps.h to export C functions
* Actually export functions from NativeOps.h
* Adapt the Java wrappers in ND4J generated with JavaCPP
* Create C wrappers for some of the C++ classes currently used by ND4J
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* remove duplicate code in createBufferDetached. (#83)
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Keras model import - updater lr fix (#84)
* Keras model import - updater lr fix
Signed-off-by: eraly <susan.eraly@gmail.com>
* Keras model import - updater lr fix, cleanup
Signed-off-by: eraly <susan.eraly@gmail.com>
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* Fix functions of OpaqueVariablesSet
* thread-local buffers/affinity
Signed-off-by: raver119 <raver119@gmail.com>
* thread safety for LaunchContext
Signed-off-by: raver119 <raver119@gmail.com>
* more of thread safety
Signed-off-by: raver119 <raver119@gmail.com>
* one more multi threaded test
Signed-off-by: raver119 <raver119@gmail.com>
* SameDiff Convolution Config validation, better output methods (#82)
* Conv Config validation & tests
Signed-off-by: Ryan Nett <rnett@skymind.io>
* stackOutputs utility method
Signed-off-by: Ryan Nett <rnett@skymind.io>
* use constructor for validation, support negative kernel sizes (infered from weights)
Signed-off-by: Ryan Nett <rnett@skymind.io>
* better output methods
Signed-off-by: Ryan Nett <rnett@skymind.io>
* move output to be with fit and evaluate
Signed-off-by: Ryan Nett <rnett@skymind.io>
* fixes
Signed-off-by: Ryan Nett <rnett@skymind.io>
* more fixes
Signed-off-by: Ryan Nett <rnett@skymind.io>
* refactor duplicate code from pad methods. (#86)
* refactor duplicate code from pad methods.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* replace switch with if.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Various ND4J/DL4J fixes and improvements (#87)
* Reshape and reallocate - small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Reshape and reallocate - small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #6488 ElementWiseVertex broadcast support
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Constructors and broadcast supported it Transforms.max/min
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8054 ElementWiseVertex now supports broadcast inputs
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8057 Nd4j.create overload dtype fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #7551 ND4J Shape validation fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* [WIP] Numpy boolean import (#91)
* numpy bool type
Signed-off-by: raver119 <raver119@gmail.com>
* numpy bool java side
Signed-off-by: raver119 <raver119@gmail.com>
* remove create method with unused parameter. (#89)
* remove create method with unused parameter.
* removed more unused methods.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* removing more unused code.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* last removal of unused code.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* remove createSparse methods. (#92)
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Various ND4J/DL4J fixes (#90)
* Deprecate Old*Op instances
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8063 #8054 Broadcast exceptions + cleanup inplace ops
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Remove bad test condition
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #7993 Fix shape function issue in crop_and_resize op
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* DL4J SameDiff lambda layer fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8029 Fix for pnorm backprop math
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8038 Fix Op profiler NaN/Inf triggering + add tests (#93)
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* createUninitializedDetached refactoring. (#94)
* wip
* update interface, add null implementations.
* Breaking one test in a weird way.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* createUninitializedDetached refactored.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* cuda build fix for issues introduced by recent refactoring
Signed-off-by: raver119 <raver119@gmail.com>
* [WIP] More of CUDA (#95)
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* Implementation of hashcode cuda helper. Working edition.
* Fixed parallel test input arangements.
* Fixed tests for hashcode op.
* Fixed shape calculation for image:crop_and_resize op and test.
* NativeOps tests. Initial test suite.
* Added tests for indexReduce methods.
* Added test on execBroadcast with NDArray as dimensions.
* Added test on execBroadcastBool with NDArray as dimensions.
* Added tests on execPairwiseTransform and execPairwiseTransofrmBool.
* Added tests for execReduce with scalar results.
* Added reduce tests for non-empty dims array.
* Added tests for reduce3.
* Added tests for execScalar.
* Added tests for execSummaryStats.
* - provide cpu/cuda code for batch_to_space
- testing it
Signed-off-by: Yurii <yurii@skymind.io>
* - remove old test for batch_to_space (had wrong format and numbers were not checked)
Signed-off-by: Yurii <yurii@skymind.io>
* Fixed complilation errors with test.
* Added test for execTransformFloat.
* Added test for execTransformSame.
* Added test for execTransformBool.
* Added test for execTransformStrict.
* Added tests for execScalar/execScalarBool with TADs.
* Added test for flatten.
* - provide cpu/cuda code for space_to_Batch operaion
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for concat.
* comment unnecessary stuff in s_t_b
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for specialConcat.
* Added tests for memcpy/set routines.
* Fixed pullRow cuda test.
* Added pullRow test.
* Added average test.
* - correct typo in NDArray::applyPairwiseTransform(nd4j::pairwise::BoolOps op...)
Signed-off-by: Yurii <yurii@skymind.io>
* - debugging and fixing cuda tests in JavaInteropTests file
Signed-off-by: Yurii <yurii@skymind.io>
* - correct some tests
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for shuffle.
* Fixed ops declarations.
* Restored omp and added shuffle test.
* Added convertTypes test.
* Added tests for execRandom. Eliminated usage of RandomBuffer with NativeOps.
* Added sort tests.
* Added tests for execCustomOp.
* - further debuging and fixing tests terminated with crash
Signed-off-by: Yurii <yurii@skymind.io>
* Added tests for calculateOutputShapes.
* Addded Benchmarks test.
* Commented benchmark tests.
* change assertion
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for apply_sgd op. Added cpu helper for that op.
* Implement cuda helper for aplly_sgd op. Fixed tests for NativeOps.
* Added test for assign broadcastable.
* Added tests for assign_bp op.
* Added tests for axpy op.
* - assign/execScalar/execTransformAny signature change
- minor test fix
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed axpy op.
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* - fix tests for nativeOps::concat
Signed-off-by: Yurii <yurii@skymind.io>
* sequential transform/scalar
Signed-off-by: raver119 <raver119@gmail.com>
* allow nested parallelism
Signed-off-by: raver119 <raver119@gmail.com>
* assign_bp leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* block setRNG fix
Signed-off-by: raver119 <raver119@gmail.com>
* enable parallelism by default
Signed-off-by: raver119 <raver119@gmail.com>
* enable nested parallelism by default
Signed-off-by: raver119 <raver119@gmail.com>
* Added cuda implementation for row_count helper.
* Added implementation for tnse gains op helper.
* - take into account possible situations when input arrays are empty in reduce_ cuda stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Implemented tsne/edge_forces op cuda-based helper. Parallelized cpu-based helper for edge_forces.
* Added kernel for tsne/symmetrized op heleper.
* Implementation of tsne/symmetrized op cuda helper. Working edition.
* Eliminated waste printfs.
* Added test for broadcastgradientargs op.
* host-only fallback for empty reduce float
Signed-off-by: raver119 <raver119@gmail.com>
* - some tests fixes
Signed-off-by: Yurii <yurii@skymind.io>
* - correct the rest of reduce_ stuff
Signed-off-by: Yurii <yurii@skymind.io>
* - further correction of reduce_ stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for Cbow op. Also added cuda implementation for cbow helpers.
* - improve code of stack operation for scalar case
Signed-off-by: Yurii <yurii@skymind.io>
* - provide cuda kernel for gatherND operation
Signed-off-by: Yurii <yurii@skymind.io>
* Implementation of cbow helpers with cuda kernels.
* minor tests tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* minor tests tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* - further correction of cuda stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Implementatation of cbow op helper with cuda kernels. Working edition.
* Skip random testing for cudablas case.
* lstmBlockCell context fix
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for ELU and ELU_BP ops.
* Added tests for eq_scalar, gt_scalar, gte_scalar and lte_scalar ops.
* Added tests for neq_scalar.
* Added test for noop.
* - further work on clipbynorm_bp
Signed-off-by: Yurii <yurii@skymind.io>
* - get rid of concat op call, use instead direct concat helper call
Signed-off-by: Yurii <yurii@skymind.io>
* lstmBlockCell context fix
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for lrelu and lrelu_bp.
* Added tests for selu and selu_bp.
* Fixed lrelu derivative helpers.
* - some corrections in lstm
Signed-off-by: Yurii <yurii@skymind.io>
* operator * result shape fix
Signed-off-by: raver119 <raver119@gmail.com>
* - correct typo in lstmCell
Signed-off-by: Yurii <yurii@skymind.io>
* few tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA inverse broadcast bool fix
Signed-off-by: raver119 <raver119@gmail.com>
* disable MMAP test for CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* BooleanOp syncToDevice
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* additional data types for im2col/col2im
Signed-off-by: raver119 <raver119@gmail.com>
* Added test for firas_sparse op.
* one more RandomBuffer test excluded
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for flatten op.
* Added test for Floor op.
* bunch of tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* mmulDot tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Implemented floordiv_bp op and tests.
* Fixed scalar case with cuda implementation for bds.
* - work on cuda kernel for clip_by_norm backprop op is completed
Signed-off-by: Yurii <yurii@skymind.io>
* Eliminate cbow crach.
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Eliminated abortion with batched nlp test.
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed shared flag initializing.
* disabled bunch of cpu workspaces tests
Signed-off-by: raver119 <raver119@gmail.com>
* scalar operators fix: missing registerSpecialUse call
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed logdet for cuda and tests.
* - correct clipBynorm_bp
Signed-off-by: Yurii <yurii@skymind.io>
* Fixed crop_and_resize shape datatype.
* - correct some mmul tests
Signed-off-by: Yurii <yurii@skymind.io>
* build fix
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI (#97)
Signed-off-by: raver119 <raver119@gmail.com>
* temporary stack fix
Signed-off-by: raver119 <raver119@gmail.com>
* round robin affinity test
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of legacy CudaContext methods
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of legacy ContextPool classes/methods
Signed-off-by: raver119 <raver119@gmail.com>
* one legacy test removed
Signed-off-by: raver119 <raver119@gmail.com>
* few more fields rearranged
Signed-off-by: raver119 <raver119@gmail.com>
* OpaqueLaunchContext
Signed-off-by: raver119 <raver119@gmail.com>
* OpaqueLaunchContext++
Signed-off-by: raver119 <raver119@gmail.com>
* more of OpaqueLaunchContext methods
Signed-off-by: raver119 <raver119@gmail.com>
* LaunchContext -> CudaContext
Signed-off-by: raver119 <raver119@gmail.com>
* AffinityManger changes
Signed-off-by: raver119 <raver119@gmail.com>
* AffinityManger changes
Signed-off-by: raver119 <raver119@gmail.com>
* cusolver handles
Signed-off-by: raver119 <raver119@gmail.com>
* typo
Signed-off-by: raver119 <raver119@gmail.com>
* cusolver method
Signed-off-by: raver119 <raver119@gmail.com>
* cusolver handle propagated
Signed-off-by: raver119 <raver119@gmail.com>
* blas/solver handles
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* legacy concat implementations replaced with new CustomOp
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* concat now uses way more blocks
Signed-off-by: raver119 <raver119@gmail.com>
* print
Signed-off-by: raver119 <raver119@gmail.com>
* no more triple template mmul
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of kernels have dtypes reconsidered
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of kernels have dtypes reconsidered
Signed-off-by: raver119 <raver119@gmail.com>
* bitonic sort reorganized
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of cpu stuff removed from cuda scope
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of cpu stuff removed from cuda scope
Signed-off-by: raver119 <raver119@gmail.com>
* type conversions moved to generic impl
Signed-off-by: raver119 <raver119@gmail.com>
* cpu data types pass
Signed-off-by: raver119 <raver119@gmail.com>
* non_max_suppression
Signed-off-by: raver119 <raver119@gmail.com>
* sortByValue fix
Signed-off-by: raver119 <raver119@gmail.com>
* ignore all mixed datatype tests for mmul
Signed-off-by: raver119 <raver119@gmail.com>
* special handling of OpProfiler exceptions
Signed-off-by: raver119 <raver119@gmail.com>
* - one failing concat test in cpp
- Nd4j.tile now uses op internally
Signed-off-by: raver119 <raver119@gmail.com>
* get back dtype exception for legacy arrays deserialization
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-14 15:52:34 +02:00
if ( X - > dataType ( ) ! = Y - > dataType ( ) )
throw datatype_exception : : build ( " Dot expects all data types to be the same " , X - > dataType ( ) , Y - > dataType ( ) ) ;
if ( Z ! = nullptr & & X - > dataType ( ) ! = Z - > dataType ( ) )
throw datatype_exception : : build ( " Dot expects all data types to be the same " , X - > dataType ( ) , Z - > dataType ( ) ) ;
2019-06-06 14:21:15 +02:00
int xLenDim ( 0 ) , yLenDim ( 0 ) ;
2020-05-09 07:06:14 +02:00
if ( ! shape : : isCommonVector ( X - > shapeInfo ( ) , xLenDim ) )
2020-04-13 12:21:51 +02:00
throw std : : runtime_error ( " MmulHelper::dot: X array must be vector ! " ) ;
2020-05-09 07:06:14 +02:00
if ( ! shape : : isCommonVector ( Y - > shapeInfo ( ) , yLenDim ) )
2020-04-13 12:21:51 +02:00
throw std : : runtime_error ( " MmulHelper::dot: Y array must be vector ! " ) ;
2019-06-06 14:21:15 +02:00
if ( Z ! = nullptr & & ! Z - > isScalar ( ) )
2020-04-13 12:21:51 +02:00
throw std : : runtime_error ( " MmulHelper::dot: Z array must be scalar ! " ) ;
2019-06-06 14:21:15 +02:00
const auto length = X - > lengthOf ( ) ;
if ( Y - > lengthOf ( ) ! = length )
2020-04-13 12:21:51 +02:00
throw std : : runtime_error ( " MmulHelper::dot: lengths of input vectors are different ! " ) ;
2019-06-06 14:21:15 +02:00
2019-11-19 14:39:36 +01:00
if ( Z = = nullptr )
2019-06-06 14:21:15 +02:00
Z = new NDArray ( DataTypeUtils : : pickPairwiseResultType ( X - > dataType ( ) , Y - > dataType ( ) ) , X - > getContext ( ) ) ;
2019-11-19 14:39:36 +01:00
2019-06-06 14:21:15 +02:00
const Nd4jLong incx = X - > stridesOf ( ) [ xLenDim ] ;
const Nd4jLong incy = Y - > stridesOf ( ) [ yLenDim ] ;
const auto xType = X - > dataType ( ) ;
const auto yType = Y - > dataType ( ) ;
const auto zType = Z - > dataType ( ) ;
2019-11-19 14:39:36 +01:00
2020-05-09 07:06:14 +02:00
BUILD_SINGLE_SELECTOR_THRICE ( xType , usualDot , ( length , alpha , X - > buffer ( ) , incx , Y - > buffer ( ) , incy , beta , Z - > buffer ( ) ) , NUMERIC_TYPES ) ;
//BUILD_TRIPLE_SELECTOR(xType, yType, zType, usualDot, (length, alpha, X->buffer(), incx, Y->buffer(), incy, beta, Z->buffer()), LIBND4J_TYPES, FLOAT_TYPES, FLOAT_TYPES);
2019-06-06 14:21:15 +02:00
return Z ;
}
2019-11-19 14:39:36 +01:00
//////////////////////////////////////////////////////////////////////////////
// [bS,M,K] x [bS,K,N] = [bS,M,N]
// [bS,M,K] x [K,N] = [bS,M,N]
// [M,K] x [bS,K,N] = [bS,M,N]
// bS could stand for several axes
template < typename T1 , typename T2 , typename T3 >
static void batchedGemm ( const NDArray * vA , const NDArray * vB , NDArray * vC ,
const int * aBatchDims , const int * bBatchDims , const int * cBatchDims ,
const int aMaxis , const int aKaxis , const int bKaxis , const int bNaxis , const int cMaxis , const int cNaxis ,
const double alpha , const double beta ) {
const T1 * A = vA - > bufferAsT < T1 > ( ) ;
const T2 * B = vB - > bufferAsT < T2 > ( ) ;
T3 * C = vC - > bufferAsT < T3 > ( ) ;
const T3 alphaZ = alpha ;
const T3 betaZ = beta ;
const bool betaPersent = beta ;
2020-05-09 07:06:14 +02:00
const Nd4jLong * aShapeInfo = vA - > shapeInfo ( ) ;
const Nd4jLong * bShapeInfo = vB - > shapeInfo ( ) ;
const Nd4jLong * cShapeInfo = vC - > shapeInfo ( ) ;
2019-11-19 14:39:36 +01:00
const int aRank = vA - > rankOf ( ) ;
const int bRank = vB - > rankOf ( ) ;
const int cRank = vC - > rankOf ( ) ;
const Nd4jLong cLen = vC - > lengthOf ( ) ;
const int K = vA - > sizeAt ( aKaxis ) ;
auto func = PRAGMA_THREADS_FOR {
2020-03-11 14:21:59 +01:00
std : : vector < int > aCoords ( aRank ) , bCoords ( bRank ) , cCoords ( cRank ) ;
2019-11-19 14:39:36 +01:00
for ( auto i = start ; i < stop ; + + i ) {
// evaluate C coordinates
2020-03-11 14:21:59 +01:00
shape : : index2coordsCPU ( start , i , cShapeInfo , cCoords . data ( ) ) ;
2019-11-19 14:39:36 +01:00
// calculate index of current batch
Nd4jLong batchInd ;
if ( cRank > 2 )
2020-07-26 14:59:27 +02:00
batchInd = shape : : coords2index ( cShapeInfo , cBatchDims , cRank - 2 , cCoords . data ( ) ) ;
2019-11-19 14:39:36 +01:00
// evaluate A coordinates
if ( aRank > 2 )
2020-07-26 14:59:27 +02:00
shape : : index2coords ( batchInd , aShapeInfo , aBatchDims , aRank - 2 , aCoords . data ( ) ) ;
2019-11-19 14:39:36 +01:00
aCoords [ aMaxis ] = cCoords [ cMaxis ] ;
aCoords [ aKaxis ] = 0 ;
// evaluate B coordinates
if ( bRank > 2 )
2020-07-26 14:59:27 +02:00
shape : : index2coords ( batchInd , bShapeInfo , bBatchDims , bRank - 2 , bCoords . data ( ) ) ;
2019-11-19 14:39:36 +01:00
bCoords [ bKaxis ] = 0 ;
bCoords [ bNaxis ] = cCoords [ cNaxis ] ;
auto aOffset = shape : : getOffset ( aShapeInfo , aCoords . data ( ) ) ;
auto bOffset = shape : : getOffset ( bShapeInfo , bCoords . data ( ) ) ;
T3 val = A [ aOffset ] * B [ bOffset ] ; // first iteration
2020-02-28 15:04:45 +01:00
for ( int j = 1 ; j < K ; + + j ) { // rest iterations
2019-11-19 14:39:36 +01:00
aOffset + = shape : : stride ( aShapeInfo ) [ aKaxis ] ;
bOffset + = shape : : stride ( bShapeInfo ) [ bKaxis ] ;
val = val + A [ aOffset ] * B [ bOffset ] ;
}
auto cOffset = shape : : getOffset ( cShapeInfo , cCoords . data ( ) ) ;
if ( betaPersent )
C [ cOffset ] = alphaZ * val + betaZ * C [ cOffset ] ;
else
C [ cOffset ] = alphaZ * val ;
}
} ;
2020-03-09 06:22:49 +01:00
samediff : : Threads : : parallel_tad ( func , 0 , cLen ) ;
2019-11-19 14:39:36 +01:00
}
//////////////////////////////////////////////////////////////////////////
// [bS,M,K] x [bS,K,N] = [bS,M,N]
// [bS,M,K] x [K,N] = [bS,M,N]
// [M,K] x [bS,K,N] = [bS,M,N]
// bS could stand for several axes
NDArray * MmulHelper : : mmulNxN ( const NDArray * A , const NDArray * B , NDArray * C , const double alpha , const double beta , const char outOrder ) {
const int aRank = A - > rankOf ( ) ;
const int bRank = B - > rankOf ( ) ;
// input ranks validation
if ( aRank > bRank & & bRank ! = 2 )
throw std : : runtime_error ( " MmulHelper::mmulNxN: rank of B array should be equal 2 ! " ) ;
else if ( bRank > aRank & & aRank ! = 2 )
throw std : : runtime_error ( " MmulHelper::mmulNxN: rank of A array should be equal 2 ! " ) ;
else if ( aRank = = bRank ) {
for ( int i = 0 ; i < aRank - 2 ; + + i )
if ( A - > sizeAt ( i ) ! = B - > sizeAt ( i ) )
throw std : : runtime_error ( " MmulHelper::mmulNxN: shapes of A and B arrays are not suitable for matrix multiplication ! " ) ;
}
if ( A - > sizeAt ( - 1 ) ! = B - > sizeAt ( - 2 ) )
throw std : : runtime_error ( " MmulHelper::mmulNxN: shapes of A and B arrays are not suitable for matrix multiplication ! " ) ;
// validation of C array
std : : vector < Nd4jLong > cExpectedShape = aRank > bRank ? A - > getShapeAsVector ( ) : B - > getShapeAsVector ( ) ;
cExpectedShape [ cExpectedShape . size ( ) - 2 ] = A - > sizeAt ( - 2 ) ;
cExpectedShape [ cExpectedShape . size ( ) - 1 ] = B - > sizeAt ( - 1 ) ;
if ( C ! = nullptr ) {
if ( ! C - > isSameShape ( cExpectedShape ) )
throw std : : runtime_error ( " MmulHelper::mmulNxN: shape of C array is not suitable for AxB matrix multiplication ! " ) ;
}
else {
C = new NDArray ( outOrder , cExpectedShape , B - > dataType ( ) ) ;
}
2019-12-30 13:06:12 +01:00
if ( C - > isEmpty ( ) )
return C ;
2019-11-19 14:39:36 +01:00
const int cRank = C - > rankOf ( ) ;
const int aMaxis ( aRank - 2 ) , aKaxis ( aRank - 1 ) , bKaxis ( bRank - 2 ) , bNaxis ( bRank - 1 ) , cMaxis ( cRank - 2 ) , cNaxis ( cRank - 1 ) ;
std : : vector < int > aBatchDims , bBatchDims , cBatchDims ;
if ( aRank > 2 )
aBatchDims = ShapeUtils : : evalDimsToExclude ( aRank , { aMaxis , aKaxis } ) ;
if ( bRank > 2 )
bBatchDims = ShapeUtils : : evalDimsToExclude ( bRank , { bKaxis , bNaxis } ) ;
if ( cRank > 2 )
cBatchDims = ShapeUtils : : evalDimsToExclude ( cRank , { cMaxis , cNaxis } ) ;
// BUILD_TRIPLE_SELECTOR(A->dataType(), B->dataType(), C->dataType(), batchedGemm, (A, B, C, aBatchDims.data(), bBatchDims.data(), cBatchDims.data(), aMaxis, aKaxis, bKaxis, bNaxis, cMaxis, cNaxis, alpha, beta), LIBND4J_TYPES, FLOAT_TYPES, FLOAT_TYPES);
BUILD_SINGLE_SELECTOR_THRICE ( A - > dataType ( ) , batchedGemm , ( A , B , C , aBatchDims . data ( ) , bBatchDims . data ( ) , cBatchDims . data ( ) , aMaxis , aKaxis , bKaxis , bNaxis , cMaxis , cNaxis , alpha , beta ) , NUMERIC_TYPES ) ;
return C ;
}
/*
//////////////////////////////////////////////////////////////////////////
NDArray * MmulHelper : : mmulNxN ( const NDArray * A , const NDArray * B , NDArray * C , const double alpha , const double beta , const char outOrder ) {
const int aRank = A - > rankOf ( ) ;
const int bRank = B - > rankOf ( ) ;
// input ranks validation
if ( aRank > bRank & & bRank ! = 2 )
throw std : : runtime_error ( " MmulHelper::mmulNxN: rank of B array should be equal 2 ! " ) ;
else if ( bRank > aRank & & aRank ! = 2 )
throw std : : runtime_error ( " MmulHelper::mmulNxN: rank of A array should be equal 2 ! " ) ;
else if ( aRank = = bRank ) {
for ( int i = 0 ; i < aRank - 2 ; + + i )
if ( A - > sizeAt ( i ) ! = B - > sizeAt ( i ) )
throw std : : runtime_error ( " MmulHelper::mmulNxN: shapes of A and B arrays are not suitable for matrix multiplication ! " ) ;
}
if ( A - > sizeAt ( - 1 ) ! = B - > sizeAt ( - 2 ) )
throw std : : runtime_error ( " MmulHelper::mmulNxN: shapes of A and B arrays are not suitable for matrix multiplication ! " ) ;
// validation of C array
std : : vector < Nd4jLong > cExpectedShape = aRank > bRank ? A - > getShapeAsVector ( ) : B - > getShapeAsVector ( ) ;
cExpectedShape [ cExpectedShape . size ( ) - 2 ] = A - > sizeAt ( - 2 ) ;
cExpectedShape [ cExpectedShape . size ( ) - 1 ] = B - > sizeAt ( - 1 ) ;
if ( C ! = nullptr ) {
if ( ! C - > isSameShape ( cExpectedShape ) )
throw std : : runtime_error ( " MmulHelper::mmulNxN: shape of C array is not suitable for AxB matrix multiplication ! " ) ;
}
else {
C = new NDArray ( outOrder , cExpectedShape , B - > dataType ( ) ) ;
}
// multiplication
const std : : vector < int > dimsToExclude = ShapeUtils : : evalDimsToExclude ( C - > rankOf ( ) , { - 2 , - 1 } ) ;
2020-05-09 07:06:14 +02:00
const Nd4jLong numOfSubArrs = ShapeUtils : : getNumOfSubArrs ( C - > shapeInfo ( ) , dimsToExclude ) ;
2019-11-19 14:39:36 +01:00
std : : vector < Nd4jLong > idxRanges ( 2 * C - > rankOf ( ) ) ;
// #pragma omp parallel for schedule(guided) firstprivate(idxRanges)
for ( Nd4jLong i = 0 ; i < numOfSubArrs ; + + i ) {
2020-05-09 07:06:14 +02:00
ShapeUtils : : evalIdxRangesForSubArr ( i , C - > shapeInfo ( ) , dimsToExclude , idxRanges . data ( ) ) ;
2019-11-19 14:39:36 +01:00
NDArray cSubArr = ( * C ) ( idxRanges ) ;
if ( aRank > bRank ) {
NDArray aSubArr = ( * A ) ( idxRanges ) ;
mmulMxM ( & aSubArr , B , & cSubArr , 1. , 0. , outOrder ) ;
}
else if ( bRank > aRank ) {
NDArray bSubArr = ( * B ) ( idxRanges ) ;
mmulMxM ( A , & bSubArr , & cSubArr , 1. , 0 , outOrder ) ;
}
else {
NDArray aSubArr = ( * A ) ( idxRanges ) ;
NDArray bSubArr = ( * B ) ( idxRanges ) ;
mmulMxM ( & aSubArr , & bSubArr , & cSubArr , 1. , 0. , outOrder ) ;
}
}
return C ;
}
//////////////////////////////////////////////////////////////////////////////
// MXK x KxN = MxN
template < typename T1 , typename T2 , typename T3 >
static void usualGemm ( const char cOrder , const bool transA , const bool transB , const int M , const int N , const int K , const double alpha , const void * vA , const int lda , const void * vB , const int ldb , const double beta , void * vC , const int ldc ) {
T1 * A = reinterpret_cast < T1 * > ( const_cast < void * > ( vA ) ) ;
T2 * B = reinterpret_cast < T2 * > ( const_cast < void * > ( vB ) ) ;
T3 * C = reinterpret_cast < T3 * > ( vC ) ;
T3 alphaZ ( alpha ) , betaZ ( beta ) ;
const bool flagC = cOrder = = ' f ' ;
const bool flagA = ( flagC & & transA ) | | ( ! flagC & & ! transA ) ;
const bool flagB = ( flagC & & transB ) | | ( ! flagC & & ! transB ) ;
2020-06-06 14:26:55 +02:00
// PRAGMA_OMP_PARALLEL_FOR_ARGS(OMP_IF(M*N > Environment::getInstance().elementwiseThreshold()) schedule(guided))
2019-11-19 14:39:36 +01:00
// for(uint row = 0; row < M; ++row) {
// T3* c = flagC ? (C + row) : (C + row * ldc);
// for(uint col = 0; col < N; ++col)
// c[flagC ? col * ldc : col] = 0;
// for(uint i = 0; i < K; ++i) {
// T3* b = flagB ? (B + i * ldb) : (B + i);
// T3* a = flagA ? (A + row * lda + i) : (A + row + i * lda);
// if(flagC) {
// for(uint col = 0; col < N; ++col) {
// if(betaZ)
// c[col * ldc] += a * b[flagB ? col : col * ldb] + betaZ * c[col * ldc];
// else
// c[col * ldc] += a * b[flagB ? col : col * ldb];
// }
// }
// else {
// for(uint col = 0; col < N; ++col) {
// if(betaZ)
// c[col] += a * b[flagB ? col : col * ldb] + betaZ * c[col];
// else
// c[col] += a * b[flagB ? col : col * ldb];
// }
// }
// }
// }
auto func = PRAGMA_THREADS_FOR_2D { ;
for ( auto row = start_x ; row < stop_x ; row + = inc_x ) {
for ( auto col = start_y ; col < stop_y ; col + = inc_y ) {
T3 * c = flagC ? ( C + row + col * ldc ) : ( C + row * ldc + col ) ;
T3 val = 0 ;
for ( uint i = 0 ; i < K ; + + i ) {
T3 a = flagA ? * ( A + row * lda + i ) : * ( A + row + i * lda ) ;
T3 b = flagB ? * ( B + col + i * ldb ) : * ( B + col * ldb + i ) ;
val + = alphaZ * a * b ;
}
if ( betaZ )
* c = val + betaZ * * c ;
else
* c = val ;
}
}
} ;
2020-03-09 06:22:49 +01:00
samediff : : Threads : : parallel_tad ( func , 0 , M , 1 , 0 , N , 1 ) ;
2019-11-19 14:39:36 +01:00
}
//////////////////////////////////////////////////////////////////////////////
// MXN x N = M
template < typename T1 , typename T2 , typename T3 >
static void usualGemv ( const char aOrder , const int M , const int N , const double alpha , const void * vA , const int lda , const void * vX , const int incx , const double beta , void * vY , const int incy ) {
T1 * A = reinterpret_cast < T1 * > ( const_cast < void * > ( vA ) ) ;
T2 * X = reinterpret_cast < T2 * > ( const_cast < void * > ( vX ) ) ;
T3 * Y = reinterpret_cast < T3 * > ( vY ) ;
T3 alphaZ ( alpha ) , betaZ ( beta ) ;
const bool flagA = aOrder = = ' f ' ;
auto func = PRAGMA_THREADS_FOR {
for ( auto row = start ; row < stop ; row + = increment ) {
T3 * y = Y + row * incy ;
T3 val = 0 ;
for ( int i = 0 ; i < N ; + + i ) {
T3 a = flagA ? * ( A + row + i * lda ) : * ( A + row * lda + i ) ;
T3 x = * ( X + i * incx ) ;
val + = alphaZ * a * x ;
}
if ( betaZ )
* y = val + betaZ * * y ;
else
* y = val ;
}
} ;
2020-03-09 06:22:49 +01:00
samediff : : Threads : : parallel_tad ( func , 0 , M ) ;
2019-11-19 14:39:36 +01:00
}
*/
[WIP] multi-device support (#80)
* fix pad javadoc and @see links. (#72)
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* [WIP] More fixes (#73)
* special tests for ConstantTadHelper/ConstantShapeHelper
Signed-off-by: raver119 <raver119@gmail.com>
* release methods for data buffers
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary buffer Java side
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary buffer Java side
Signed-off-by: raver119 <raver119@gmail.com>
* delete temporary TadPack C++/Java side (#74)
Signed-off-by: raver119 <raver119@gmail.com>
* Zoo model TF import test updates (#75)
* argLine fix, update compression_gru comment
* updated comment for xception
* undid but commented argLine change
* updated xlnet comment
* copyright headers
* - new NDArray methods like()/ulike() (#77)
- fix for depthwise_conv2d_bp + special test
Signed-off-by: raver119 <raver119@gmail.com>
* upsampling2d fix CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* DL4J trace logging (#79)
* MLN/CG trace logging for debugging
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Tiny tweak
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* strided_slice_bp shape fn leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* SameDiff fixes and naming (#78)
* remove SDVariable inplace methods
* import methods
* npe fix in OpVal
* removed SameDiff inplace ops from tests
* Naming updates, moved to centralized methods in SameDiff, should use op_#:# for everything
* quick fixes
* javadoc
* SDVariable eval with placeholders
* use regex match
* better matching
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* fix javadoc. (#76)
* fix javadoc.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* replace most @see with @link s.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* 4 additional tests
Signed-off-by: raver119 <raver119@gmail.com>
* launch context reorganization
Signed-off-by: raver119 <raver119@gmail.com>
* LaunchContext reorganization
Signed-off-by: raver119 <raver119@gmail.com>
* per-device LaunchContext
Signed-off-by: raver119 <raver119@gmail.com>
* Various DL4J/ND4J fixes (#81)
* #7954 Force refresh of UI when switching tabs on overview page
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8017 Concurrent modification exception (synchronize) fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8033 Don't initialize updater in middle of writing memory crash dump
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8208 Fix shape checks for ND4J int[] creator methods
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #6385 #7992 Keras import naming fixes + cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8016 Upsampling3D - add NDHWC format support
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* Refactor NativeOps.h to export C functions
* Actually export functions from NativeOps.h
* Adapt the Java wrappers in ND4J generated with JavaCPP
* Create C wrappers for some of the C++ classes currently used by ND4J
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* remove duplicate code in createBufferDetached. (#83)
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Keras model import - updater lr fix (#84)
* Keras model import - updater lr fix
Signed-off-by: eraly <susan.eraly@gmail.com>
* Keras model import - updater lr fix, cleanup
Signed-off-by: eraly <susan.eraly@gmail.com>
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* ContextBuffers as separate entity
Signed-off-by: raver119 <raver119@gmail.com>
* Fix functions of OpaqueVariablesSet
* thread-local buffers/affinity
Signed-off-by: raver119 <raver119@gmail.com>
* thread safety for LaunchContext
Signed-off-by: raver119 <raver119@gmail.com>
* more of thread safety
Signed-off-by: raver119 <raver119@gmail.com>
* one more multi threaded test
Signed-off-by: raver119 <raver119@gmail.com>
* SameDiff Convolution Config validation, better output methods (#82)
* Conv Config validation & tests
Signed-off-by: Ryan Nett <rnett@skymind.io>
* stackOutputs utility method
Signed-off-by: Ryan Nett <rnett@skymind.io>
* use constructor for validation, support negative kernel sizes (infered from weights)
Signed-off-by: Ryan Nett <rnett@skymind.io>
* better output methods
Signed-off-by: Ryan Nett <rnett@skymind.io>
* move output to be with fit and evaluate
Signed-off-by: Ryan Nett <rnett@skymind.io>
* fixes
Signed-off-by: Ryan Nett <rnett@skymind.io>
* more fixes
Signed-off-by: Ryan Nett <rnett@skymind.io>
* refactor duplicate code from pad methods. (#86)
* refactor duplicate code from pad methods.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* replace switch with if.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Various ND4J/DL4J fixes and improvements (#87)
* Reshape and reallocate - small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Reshape and reallocate - small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #6488 ElementWiseVertex broadcast support
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Constructors and broadcast supported it Transforms.max/min
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8054 ElementWiseVertex now supports broadcast inputs
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8057 Nd4j.create overload dtype fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #7551 ND4J Shape validation fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* [WIP] Numpy boolean import (#91)
* numpy bool type
Signed-off-by: raver119 <raver119@gmail.com>
* numpy bool java side
Signed-off-by: raver119 <raver119@gmail.com>
* remove create method with unused parameter. (#89)
* remove create method with unused parameter.
* removed more unused methods.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* removing more unused code.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* last removal of unused code.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* remove createSparse methods. (#92)
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Various ND4J/DL4J fixes (#90)
* Deprecate Old*Op instances
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8063 #8054 Broadcast exceptions + cleanup inplace ops
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Remove bad test condition
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #7993 Fix shape function issue in crop_and_resize op
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* DL4J SameDiff lambda layer fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8029 Fix for pnorm backprop math
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* #8038 Fix Op profiler NaN/Inf triggering + add tests (#93)
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* createUninitializedDetached refactoring. (#94)
* wip
* update interface, add null implementations.
* Breaking one test in a weird way.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* createUninitializedDetached refactored.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* cuda build fix for issues introduced by recent refactoring
Signed-off-by: raver119 <raver119@gmail.com>
* [WIP] More of CUDA (#95)
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* Implementation of hashcode cuda helper. Working edition.
* Fixed parallel test input arangements.
* Fixed tests for hashcode op.
* Fixed shape calculation for image:crop_and_resize op and test.
* NativeOps tests. Initial test suite.
* Added tests for indexReduce methods.
* Added test on execBroadcast with NDArray as dimensions.
* Added test on execBroadcastBool with NDArray as dimensions.
* Added tests on execPairwiseTransform and execPairwiseTransofrmBool.
* Added tests for execReduce with scalar results.
* Added reduce tests for non-empty dims array.
* Added tests for reduce3.
* Added tests for execScalar.
* Added tests for execSummaryStats.
* - provide cpu/cuda code for batch_to_space
- testing it
Signed-off-by: Yurii <yurii@skymind.io>
* - remove old test for batch_to_space (had wrong format and numbers were not checked)
Signed-off-by: Yurii <yurii@skymind.io>
* Fixed complilation errors with test.
* Added test for execTransformFloat.
* Added test for execTransformSame.
* Added test for execTransformBool.
* Added test for execTransformStrict.
* Added tests for execScalar/execScalarBool with TADs.
* Added test for flatten.
* - provide cpu/cuda code for space_to_Batch operaion
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for concat.
* comment unnecessary stuff in s_t_b
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for specialConcat.
* Added tests for memcpy/set routines.
* Fixed pullRow cuda test.
* Added pullRow test.
* Added average test.
* - correct typo in NDArray::applyPairwiseTransform(nd4j::pairwise::BoolOps op...)
Signed-off-by: Yurii <yurii@skymind.io>
* - debugging and fixing cuda tests in JavaInteropTests file
Signed-off-by: Yurii <yurii@skymind.io>
* - correct some tests
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for shuffle.
* Fixed ops declarations.
* Restored omp and added shuffle test.
* Added convertTypes test.
* Added tests for execRandom. Eliminated usage of RandomBuffer with NativeOps.
* Added sort tests.
* Added tests for execCustomOp.
* - further debuging and fixing tests terminated with crash
Signed-off-by: Yurii <yurii@skymind.io>
* Added tests for calculateOutputShapes.
* Addded Benchmarks test.
* Commented benchmark tests.
* change assertion
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for apply_sgd op. Added cpu helper for that op.
* Implement cuda helper for aplly_sgd op. Fixed tests for NativeOps.
* Added test for assign broadcastable.
* Added tests for assign_bp op.
* Added tests for axpy op.
* - assign/execScalar/execTransformAny signature change
- minor test fix
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed axpy op.
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* - fix tests for nativeOps::concat
Signed-off-by: Yurii <yurii@skymind.io>
* sequential transform/scalar
Signed-off-by: raver119 <raver119@gmail.com>
* allow nested parallelism
Signed-off-by: raver119 <raver119@gmail.com>
* assign_bp leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* block setRNG fix
Signed-off-by: raver119 <raver119@gmail.com>
* enable parallelism by default
Signed-off-by: raver119 <raver119@gmail.com>
* enable nested parallelism by default
Signed-off-by: raver119 <raver119@gmail.com>
* Added cuda implementation for row_count helper.
* Added implementation for tnse gains op helper.
* - take into account possible situations when input arrays are empty in reduce_ cuda stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Implemented tsne/edge_forces op cuda-based helper. Parallelized cpu-based helper for edge_forces.
* Added kernel for tsne/symmetrized op heleper.
* Implementation of tsne/symmetrized op cuda helper. Working edition.
* Eliminated waste printfs.
* Added test for broadcastgradientargs op.
* host-only fallback for empty reduce float
Signed-off-by: raver119 <raver119@gmail.com>
* - some tests fixes
Signed-off-by: Yurii <yurii@skymind.io>
* - correct the rest of reduce_ stuff
Signed-off-by: Yurii <yurii@skymind.io>
* - further correction of reduce_ stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Added test for Cbow op. Also added cuda implementation for cbow helpers.
* - improve code of stack operation for scalar case
Signed-off-by: Yurii <yurii@skymind.io>
* - provide cuda kernel for gatherND operation
Signed-off-by: Yurii <yurii@skymind.io>
* Implementation of cbow helpers with cuda kernels.
* minor tests tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* minor tests tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* - further correction of cuda stuff
Signed-off-by: Yurii <yurii@skymind.io>
* Implementatation of cbow op helper with cuda kernels. Working edition.
* Skip random testing for cudablas case.
* lstmBlockCell context fix
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for ELU and ELU_BP ops.
* Added tests for eq_scalar, gt_scalar, gte_scalar and lte_scalar ops.
* Added tests for neq_scalar.
* Added test for noop.
* - further work on clipbynorm_bp
Signed-off-by: Yurii <yurii@skymind.io>
* - get rid of concat op call, use instead direct concat helper call
Signed-off-by: Yurii <yurii@skymind.io>
* lstmBlockCell context fix
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for lrelu and lrelu_bp.
* Added tests for selu and selu_bp.
* Fixed lrelu derivative helpers.
* - some corrections in lstm
Signed-off-by: Yurii <yurii@skymind.io>
* operator * result shape fix
Signed-off-by: raver119 <raver119@gmail.com>
* - correct typo in lstmCell
Signed-off-by: Yurii <yurii@skymind.io>
* few tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA inverse broadcast bool fix
Signed-off-by: raver119 <raver119@gmail.com>
* disable MMAP test for CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* BooleanOp syncToDevice
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* additional data types for im2col/col2im
Signed-off-by: raver119 <raver119@gmail.com>
* Added test for firas_sparse op.
* one more RandomBuffer test excluded
Signed-off-by: raver119 <raver119@gmail.com>
* Added tests for flatten op.
* Added test for Floor op.
* bunch of tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* mmulDot tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Implemented floordiv_bp op and tests.
* Fixed scalar case with cuda implementation for bds.
* - work on cuda kernel for clip_by_norm backprop op is completed
Signed-off-by: Yurii <yurii@skymind.io>
* Eliminate cbow crach.
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Eliminated abortion with batched nlp test.
* more tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed shared flag initializing.
* disabled bunch of cpu workspaces tests
Signed-off-by: raver119 <raver119@gmail.com>
* scalar operators fix: missing registerSpecialUse call
Signed-off-by: raver119 <raver119@gmail.com>
* Fixed logdet for cuda and tests.
* - correct clipBynorm_bp
Signed-off-by: Yurii <yurii@skymind.io>
* Fixed crop_and_resize shape datatype.
* - correct some mmul tests
Signed-off-by: Yurii <yurii@skymind.io>
* build fix
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI
Signed-off-by: raver119 <raver119@gmail.com>
* exclude two methods for JNI (#97)
Signed-off-by: raver119 <raver119@gmail.com>
* temporary stack fix
Signed-off-by: raver119 <raver119@gmail.com>
* round robin affinity test
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of legacy CudaContext methods
Signed-off-by: raver119 <raver119@gmail.com>
* get rid of legacy ContextPool classes/methods
Signed-off-by: raver119 <raver119@gmail.com>
* one legacy test removed
Signed-off-by: raver119 <raver119@gmail.com>
* few more fields rearranged
Signed-off-by: raver119 <raver119@gmail.com>
* OpaqueLaunchContext
Signed-off-by: raver119 <raver119@gmail.com>
* OpaqueLaunchContext++
Signed-off-by: raver119 <raver119@gmail.com>
* more of OpaqueLaunchContext methods
Signed-off-by: raver119 <raver119@gmail.com>
* LaunchContext -> CudaContext
Signed-off-by: raver119 <raver119@gmail.com>
* AffinityManger changes
Signed-off-by: raver119 <raver119@gmail.com>
* AffinityManger changes
Signed-off-by: raver119 <raver119@gmail.com>
* cusolver handles
Signed-off-by: raver119 <raver119@gmail.com>
* typo
Signed-off-by: raver119 <raver119@gmail.com>
* cusolver method
Signed-off-by: raver119 <raver119@gmail.com>
* cusolver handle propagated
Signed-off-by: raver119 <raver119@gmail.com>
* blas/solver handles
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* legacy concat implementations replaced with new CustomOp
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* concat now uses way more blocks
Signed-off-by: raver119 <raver119@gmail.com>
* print
Signed-off-by: raver119 <raver119@gmail.com>
* no more triple template mmul
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of kernels have dtypes reconsidered
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of kernels have dtypes reconsidered
Signed-off-by: raver119 <raver119@gmail.com>
* bitonic sort reorganized
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of cpu stuff removed from cuda scope
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of cpu stuff removed from cuda scope
Signed-off-by: raver119 <raver119@gmail.com>
* type conversions moved to generic impl
Signed-off-by: raver119 <raver119@gmail.com>
* cpu data types pass
Signed-off-by: raver119 <raver119@gmail.com>
* non_max_suppression
Signed-off-by: raver119 <raver119@gmail.com>
* sortByValue fix
Signed-off-by: raver119 <raver119@gmail.com>
* ignore all mixed datatype tests for mmul
Signed-off-by: raver119 <raver119@gmail.com>
* special handling of OpProfiler exceptions
Signed-off-by: raver119 <raver119@gmail.com>
* - one failing concat test in cpp
- Nd4j.tile now uses op internally
Signed-off-by: raver119 <raver119@gmail.com>
* get back dtype exception for legacy arrays deserialization
Signed-off-by: raver119 <raver119@gmail.com>
2019-08-14 15:52:34 +02:00
//BUILD_TRIPLE_TEMPLATE(template void usualGemm, (const char cOrder, const bool transA, const bool transB, const int M, const int N, const int K, const double alpha, const void* A, const int lda, const void* B, const int ldb, const double beta, void* C, const int ldc), LIBND4J_TYPES, FLOAT_TYPES, FLOAT_TYPES);
//BUILD_TRIPLE_TEMPLATE(template void usualGemv, (const char aOrder, const int M, const int N, const double alpha, const void* A, const int lda, const void* B, const int incx, const double beta, void* C, const int incy), LIBND4J_TYPES, FLOAT_TYPES, FLOAT_TYPES);
//BUILD_TRIPLE_TEMPLATE(template void usualDot, (const Nd4jLong length, const double alpha, const void* vX, const Nd4jLong incx, const void* vY, const Nd4jLong incy, const double beta, void* vZ), LIBND4J_TYPES, FLOAT_TYPES, FLOAT_TYPES);
2019-06-06 14:21:15 +02:00
2019-07-18 13:13:56 +02:00
}