* codegen for SDLoss. WIP.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* first pass of SDLoss.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* WIP. First cut of new op constructors. UNTESTED, NOT COMPILED YET.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* updated op signatures.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* add NDLoss tests.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* fix test.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* adds loss default params and factory.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Regenerate NDLoss
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* adds tests for null weights.
Signed-off-by: Robert Altena <Rob@Ra-ai.com>
* Last few tweaks
Signed-off-by: Alex Black <blacka101@gmail.com>
Co-authored-by: Robert Altena <Rob@Ra-ai.com>
* Add check to ensure ALL tests extend BaseND4JTest for proper timeouts + logging
Signed-off-by: Alex Black <blacka101@gmail.com>
* Add 'must extend BaseDL4JTest' check for deeplearning4j-core
Signed-off-by: Alex Black <blacka101@gmail.com>
* Flush logging on workspace exit during tests
Signed-off-by: Alex Black <blacka101@gmail.com>
* Add Maven profiles for ARM builds to pom.xml files
Signed-off-by: Samuel Audet <samuel.audet@gmail.com>
* Remove mkl from dependencies when running on non intel/amd platforms
* Downgrade openblas for now
* Change back to 0.3.8
Co-authored-by: Adam Gibson <1144306+agibsonccc@users.noreply.github.com>
* initial set of include changes
Signed-off-by: raver119 <raver119@gmail.com>
* one more tweak
Signed-off-by: raver119 <raver119@gmail.com>
* few more rearrangements
Signed-off-by: raver119 <raver119@gmail.com>
* few more rearrangements
Signed-off-by: raver119 <raver119@gmail.com>
* few more rearrangements
Signed-off-by: raver119 <raver119@gmail.com>
* cuda includes rearrangements
Signed-off-by: raver119 <raver119@gmail.com>
* java update
Signed-off-by: raver119 <raver119@gmail.com>
* - namespace changed to sd
- few CMake variables renamed with SD_ prefix
Signed-off-by: raver119 <raver119@gmail.com>
* java update
Signed-off-by: raver119 <raver119@gmail.com>
* LoopKind minor fix
Signed-off-by: raver119 <raver119@gmail.com>
* few more changes
Signed-off-by: raver119 <raver119@gmail.com>
* few more changes
Signed-off-by: raver119 <raver119@gmail.com>
* few more changes
Signed-off-by: raver119 <raver119@gmail.com>
* sanitizer is optional now
Signed-off-by: raver119 <raver119@gmail.com>
* dev tests updated
Signed-off-by: raver119 <raver119@gmail.com>
* few more changes
Signed-off-by: raver119 <raver119@gmail.com>
* last update
Signed-off-by: raver119 <raver119@gmail.com>
* java update
Signed-off-by: raver119 <raver119@gmail.com>
* #8565 Normalizer toString/hashcode
Signed-off-by: Alex Black <blacka101@gmail.com>
* #8731 ImagePreProcessingScaler labels/segmentation fix
Signed-off-by: Alex Black <blacka101@gmail.com>
* #8691 Fix SameDiffLayer/Vertex finetuning and parameter setting support
Signed-off-by: Alex Black <blacka101@gmail.com>
* #8663 DL4J embedding layer weight init - don't depend on vocab size
Signed-off-by: Alex Black <blacka101@gmail.com>
* EmbeddingLayer test tweak
Signed-off-by: Alex Black <blacka101@gmail.com>
* - profiling of concat op (both cuda and cpu)
Signed-off-by: Yurii <iuriish@yahoo.com>
* better comparison for large concat
Signed-off-by: raver119 <raver119@gmail.com>
* - further improving of concat op
Signed-off-by: Yurii <iuriish@yahoo.com>
* some logging
Signed-off-by: raver119 <raver119@gmail.com>
* - add possibility to verify presence of trailing unit dimensions (1s) in shape and set strides/ews accordingly
- restrict second simple case in concat op to c order only
Signed-off-by: Yurii <iuriish@yahoo.com>
* - move concat op to specials_single.cpp file
Signed-off-by: Yurii <iuriish@yahoo.com>
* - get rid of second concat op declaration in transforms.cpp file
Signed-off-by: Yurii <iuriish@yahoo.com>
Co-authored-by: raver119 <raver119@gmail.com>
* Libnd4j: TensorMMul backprop op #8174, raw implementation
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 merge master and some corrections
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 algorithm update, need testing, sync with master
* Libnd4j: TensorMMul backprop op #8174 fixed incorrect B axes calculation
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 optimized axes identification and fixed bug with overlapping indices, added first test; needs testing with different shapes
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 some fixes and improvements need more testing
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 fixed order of matrix multiply
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 fixed issue with incorrect axes definition, added tests based on TF; needs additional testing for the case dLdC not equal to 1
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 fixed scalar case, added test
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 fixed bp algorithm and axes definition; needs more testing with different order combinations (f,c), (c,f), (f,f) and some input checks
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 added some checks and corrections, added tests; a problem remains with support for mixed input orders (A-f B-c and A-f B-f)
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: TensorMMul backprop op #8174 sync master
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* - correct bug in MmulHelper::tensorDot(a, b, c, axes_a, axes_b,permutForC)
Signed-off-by: Yurii <iuriish@yahoo.com>
* Libnd4j: TensorMMul backprop op #8174 code clean up and refactoring
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* - add check for linspace-ordered permutations in ShapeUtils::evalShapeForTensorDot
Signed-off-by: Yurii <iuriish@yahoo.com>
* - provide additional code in shape::reshape stuff in order to reduce the number of allocation/copy operations during the reshaping procedure
Signed-off-by: Yurii <iuriish@yahoo.com>
* - further work on problem of wrong shape evaluation during permute/reshape procedures
Signed-off-by: Yurii <iuriish@yahoo.com>
* - still looking for bug reason in reshape/permute stuff
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct bug in transform cuda native ops
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct bug in NDArray::assign
Signed-off-by: Yurii <iuriish@yahoo.com>
* - remove old shape::reshape stuff
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add possibility to disable copy of old buffer to new buffer during reshape operation in NDArray class
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct bug in tensorDot which had to do with wrong pointer assignments
Signed-off-by: Yurii <iuriish@yahoo.com>
Co-authored-by: Oleh <oleg.semeniv@gmail.com>
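For reference, the gradients a tensordot backprop has to produce are easiest to see in the matrix case. A general tensordot can be viewed as permuting A so its free axes come first and its contracted axes last (and B the other way round), reshaping both to matrices and multiplying them; with upstream gradient G = dL/dC, the standard matrix-product rules below then apply to those reshaped matrices, and the results are reshaped and inverse-permuted back to the shapes of A and B. This is the textbook derivation, not a description of the exact libnd4j code path.

```latex
C = A\,B, \qquad G = \frac{\partial L}{\partial C}, \qquad
\frac{\partial L}{\partial A} = G\,B^{\top}, \qquad
\frac{\partial L}{\partial B} = A^{\top}\,G
```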
* Gradients tests added
* Fix for Standard deviation serialization + test
Signed-off-by: Alex Black <blacka101@gmail.com>
* More fixes
Signed-off-by: Alex Black <blacka101@gmail.com>
* Test fixed
* Spark config driver host config for CI
Signed-off-by: Alex Black <blacka101@gmail.com>
* Op validation timeout increase
Signed-off-by: Alex Black <blacka101@gmail.com>
* Gradient check - fix for low probability test failure due to randomly all 0s mask
Signed-off-by: AlexDBlack <blacka101@gmail.com>
Co-authored-by: Alex Black <blacka101@gmail.com>
* special workaround methods for DataBuffer.write
Signed-off-by: raver119 <raver119@gmail.com>
* one test removed
Signed-off-by: raver119 <raver119@gmail.com>
* more of unsynced
Signed-off-by: raver119 <raver119@gmail.com>
* missing asLong for BaseCudaDataBuffer
Signed-off-by: raver119 <raver119@gmail.com>
* Solve op for systems of linear equations. Initial commit.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Fixed compiling issues.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Solve for systems of linear equations. Next-stage commit.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Added test for the solve operation on systems of linear equations.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Added additional test and fixed retrieval of the lower matrix.
* Implementation for solving systems of linear equations.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Refactored permutation generation.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Added batched permutation restore to the cuda helper for the solve op.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Finished cuda implementation for solve op helpers.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Refactored cpu helpers for solve op.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Fix gtest output on Windows
* Fixed issue with permutation matrix for cuda implementation.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Fixed issue with permutation matrix for cpu implementation.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Removed unneeded comments.
Signed-off-by: shugeo <sgazeos@gmail.com>
* LinearSolve added
* Mapping added
* Javadoc added
* Refactored implementation of triangular_solve helpers and general tests for solving matrix equations.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Added a test for solve op.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Solve test added
* Fix for TF import
Co-authored-by: Serhii Shepel <9946053+sshepel@users.noreply.github.com>
Co-authored-by: raver119 <raver119@gmail.com>
Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
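The solve commits above mention permutation generation, permutation matrices and retrieval of the lower matrix, which is the vocabulary of an LU factorization with partial pivoting. As a hedged illustration of that technique only (pure Python, square matrix, single right-hand side, hypothetical function name `lu_solve`; the libnd4j helpers are batched C++/CUDA code and may differ in detail), a solve can look like this:

```python
def lu_solve(a, b):
    """Solve a @ x = b for a square matrix `a` via LU with partial pivoting."""
    n = len(a)
    lu = [list(map(float, row)) for row in a]  # becomes unit-lower L (below diag) and U
    perm = list(range(n))                       # perm[i] = original row now at position i
    for k in range(n):
        # Partial pivoting: bring the row with the largest |value| in column k up.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]                # store the multiplier in L
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    # Forward substitution: L y = P b (L has an implicit unit diagonal).
    y = [0.0] * n
    for i in range(n):
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    # Back substitution: U x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x
```

For example, `lu_solve([[0.0, 1.0], [1.0, 0.0]], [2.0, 3.0])` cannot proceed without the row swap and returns `[3.0, 2.0]`.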
* - range op now accepts dargs
- dargs now can be in signature
Signed-off-by: raver119 <raver119@gmail.com>
* range dtype java side
Signed-off-by: raver119 <raver119@gmail.com>
* linspace fix
Signed-off-by: raver119 <raver119@gmail.com>
* lin_space fix for scalar outputs
Signed-off-by: raver119 <raver119@gmail.com>
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* - one more test for OneHot with dtype
- one more signature in Nd4j
Signed-off-by: raver119 <raver119@gmail.com>
* ones_as/zeros_as now accept dtype
Signed-off-by: raver119 <raver119@gmail.com>
* one more test
Signed-off-by: raver119 <raver119@gmail.com>
* - more updates for configurable data types
- ones_as/zeros_as java side + tests
Signed-off-by: raver119 <raver119@gmail.com>
* few c++ tests fixed
Signed-off-by: raver119 <raver119@gmail.com>
* few more changes around DArgs
Signed-off-by: raver119 <raver119@gmail.com>
* Cleanup modules
* Moving subprojects to nd4j-api
* Project cleanup
* Dropped AWS sub-project
* dl4j-util moved to core
* dl4j-perf moved to core
* Tests coverage
* Revert "Moving subprojects to nd4j-api"
This reverts commit bc6eb573c6b60c407ade47172c5d204725077e6b.
* Moved nd4j-buffer and nd4j-context to nd4j-api
* Rolled back change
* Revert "Project cleanup"
This reverts commit 64ac7f369b2d968f7be437718034f093fc886ffc.
* Datavec cleaned up
* Revert "Moved nd4j-buffer and nd4j-context to nd4j-api"
This reverts commit 75f4e8da80d2551e44e1251dd6c5923289fff8e1.
# Conflicts:
# nd4j/nd4j-backends/nd4j-tests/src/test/java/org/nd4j/autodiff/opvalidation/ReductionBpOpValidation.java
* Resolve conflict
* Compilation fixed.
* nd4j-context and nd4j-buffer moved to nd4j-api
* Fixed TF mapping for mmul
* Fix for dl4j-cuda tests
Signed-off-by: Alex Black <blacka101@gmail.com>
* Move last few tests from deeplearning4j-nn to -core
Signed-off-by: Alex Black <blacka101@gmail.com>
* Remove incorrect TF import mapping for TensorMmul op
Signed-off-by: Alex Black <blacka101@gmail.com>
* Cleaned TF mapping
* Fix path for test results on windows
* Remove old dependency
Signed-off-by: Alex Black <blacka101@gmail.com>
* One more attempt to fix path for test results on windows
* fixup! One more attempt to fix path for test results on windows
* fixup! One more attempt to fix path for test results on windows
Co-authored-by: Alex Black <blacka101@gmail.com>
Co-authored-by: Serhii Shepel <9946053+sshepel@users.noreply.github.com>
Co-authored-by: raver119 <raver119@gmail.com>
* missing alloc validation in RandomGenerator for CUDA
Signed-off-by: raver119 <raver119@gmail.com>
* set error message if rng alloc failed
Signed-off-by: raver119 <raver119@gmail.com>
* check for error code during RNG creation in java
Signed-off-by: raver119 <raver119@gmail.com>
* nd4j-aeron profiles
Signed-off-by: raver119 <raver119@gmail.com>
* nd4j-aeron profiles
Signed-off-by: raver119 <raver119@gmail.com>
* skip one long test
Signed-off-by: raver119 <raver119@gmail.com>
* skip one long test
Signed-off-by: raver119 <raver119@gmail.com>
* kryo profile
Signed-off-by: raver119 <raver119@gmail.com>
* few more profiles
Signed-off-by: raver119 <raver119@gmail.com>
* few more profiles
Signed-off-by: raver119 <raver119@gmail.com>
* few more profiles
Signed-off-by: raver119 <raver119@gmail.com>
* Add maven profile + base tests methods for integration tests
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Switch from system property to environment variable; seems more reliable in IntelliJ
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Add nd4j-common-tests module, and common base test; cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Ensure all ND4J tests extend BaseND4JTest
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Test spam reduction, import fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Add test logging to nd4j-aeron
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix unintended change
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Reduce Spark test log spam
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More test spam cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Significantly speed up TSNE tests
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* W2V iterator test unit/integration split
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More NLP test speedups
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Avoid debug/verbose mode leaking between tests
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* test tweak
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Arbiter extends base DL4J test
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Arbiter test speedup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* nlp-uima test speedup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More test speedups
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix ND4J base test
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Few small ND4J test speed improvements
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* DL4J tests speedup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More tweaks
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Even more test speedups
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More tweaks
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Various test fixes
Signed-off-by: Alex Black <blacka101@gmail.com>
* More test fixes
Signed-off-by: Alex Black <blacka101@gmail.com>
* Add ability to specify number of threads for C++ ops in BaseDL4JTest and BaseND4JTest
Signed-off-by: Alex Black <blacka101@gmail.com>
* nd4j-aeron test profile fix for CUDA
Signed-off-by: Alex Black <blacka101@gmail.com>
* Added qr op implementation. Initial version.
* Fixed doc for qr op.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Implementation of QR decomposition. CPU platform version.
* Added a pair of tests for qr op testing.
Signed-off-by: shugeo <sgazeos@gmail.com>
* QR implementation.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Corrected norm usage.
* Properly calculated intermediate results with QR decomposition.
* Another step towards implementing the QR algorithm via Householder reflections.
* CPU implementation of QR decomposition. The first working edition.
* Corrected test for QR decomposition.
* Added TAD multithreading to QR implementation.
* Finished cpu implementation for QR decomposition helpers.
* Refactored tests and improved multithreading.
* Refactored QR cpu implementation and updated cuda implementation helpers.
* Cuda QR helper implementation. The first working edition.
* Removed leftover debug prints.
* Restored multithreading in cuda implementation.
* Ops names corrected
* Refactored qr op helpers to optimize.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Removed unnecessary manual ticking.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Refactored memory allocation to avoid wasteful memory usage.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Refactored matrixMinor method both for cuda and cpu platforms.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Refactored vmul method to use raw buffers instead of type conversion.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Refactored temporary array of matrices.
Signed-off-by: shugeo <sgazeos@gmail.com>
Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
Co-authored-by: raver119 <raver119@gmail.com>
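The QR commits describe a Householder-based decomposition. As a minimal, unbatched illustration of that method (pure Python, hypothetical function name `householder_qr`; not the libnd4j helper, which operates on TADs in C++/CUDA), each step builds a reflection H = I - 2 v v^T / (v^T v) that zeroes one column below the diagonal, applies it to R and accumulates it into Q, so that A = Q R:

```python
import math


def householder_qr(a):
    """Return (q, r) with a == q @ r, q orthogonal (m x m), r upper-triangular (m x n)."""
    m, n = len(a), len(a[0])
    r = [list(map(float, row)) for row in a]
    q = [[float(i == j) for j in range(m)] for i in range(m)]
    for k in range(min(m - 1, n)):
        # Column k from the diagonal down; reflect it onto alpha * e_k.
        x = [r[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column already zero below the diagonal
        # Choose alpha with the opposite sign of x[0] to avoid cancellation in v[0].
        alpha = -norm_x if x[0] >= 0 else norm_x
        v = x[:]
        v[0] -= alpha
        v_norm2 = sum(t * t for t in v)
        # r <- H r, touching only rows k..m-1.
        for j in range(n):
            f = 2.0 * sum(v[i] * r[k + i][j] for i in range(len(v))) / v_norm2
            for i in range(len(v)):
                r[k + i][j] -= f * v[i]
        # q <- q H, touching only columns k..m-1.
        for row in q:
            f = 2.0 * sum(v[i] * row[k + i] for i in range(len(v))) / v_norm2
            for i in range(len(v)):
                row[k + i] -= f * v[i]
    return q, r
```

Because of the sign choice for `alpha`, the diagonal of `r` may be negative: for the input `[[12, -51, 4], [6, 167, -68], [-4, 24, -41]]`, `r[0][0]` comes out as -14.0 up to rounding.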
* Added implementation of the triangular_solve op.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Fixed compilation issues.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Added verification of input data and helper facilities for triangular_solve op.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Added cpu implementation for triangular_solve helpers.
* Added tests and implementation for upper triangular equations.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Added a pair of cases to tests.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Added multithreading with cpu helpers for triangular_solve op.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Added cuda implementation of triangular_solve op helpers.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Finished cuda implementation of triangular_solve helpers and tests.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Fixed copyright marks.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Corrected grammar errors with doc and error messages.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Refactored matrices processing in triangular_solve cuda helper implementation.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Added triangular_solve wrapper
* Fixed mapping
* Added adjoint processing to the cpu helpers of the triangular_solve op implementation.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Added implementation for adjoint routine with cuda platform.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Added multithreading to the adjoint routine for cpu platform.
Signed-off-by: shugeo <sgazeos@gmail.com>
Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
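Forward and back substitution are the core of a triangular solve. The sketch below follows the meaning of the `lower` and `adjoint` arguments of TensorFlow's `tf.linalg.triangular_solve`, but it is only a real-valued, single-vector pure-Python illustration with a hypothetical signature, not the libnd4j helper:

```python
def triangular_solve(matrix, rhs, lower=True, adjoint=False):
    """Solve matrix @ x = rhs (or matrix^T @ x = rhs if adjoint) for a real
    triangular `matrix`; only the triangle selected by `lower` is read."""
    n = len(matrix)
    if adjoint:
        # For real inputs the adjoint is the transpose, which swaps lower/upper.
        matrix = [[matrix[j][i] for j in range(n)] for i in range(n)]
        lower = not lower
    x = [0.0] * n
    if lower:
        for i in range(n):  # forward substitution, top row first
            s = sum(matrix[i][j] * x[j] for j in range(i))
            x[i] = (rhs[i] - s) / matrix[i][i]
    else:
        for i in reversed(range(n)):  # back substitution, bottom row first
            s = sum(matrix[i][j] * x[j] for j in range(i + 1, n))
            x[i] = (rhs[i] - s) / matrix[i][i]
    return x
```

Transposing a lower-triangular matrix yields an upper-triangular one, which is why the adjoint path simply flips `lower` and reuses the other substitution loop.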
* Added implementation for resize_area op. Initial commit.
* Added implementation of resize_area op. Initial revision.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Corrected resizeArea functor call.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Implementation of resize_area. Cpu platform helpers.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Implementation for resize_area helpers. The first part revision.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Added a set of tests for resize_area op.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Cuda implementation for resize_area. Initial approach.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Adding multithreading for resize_area algorithm.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Cuda implementation of resize_area helpers. Shared memory approach.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Refactored resizeAreaKernel with cuda implementation.
* Eliminated compilation errors.
* ResizeArea helpers for cuda platform. The first working revision.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Added test for batched resize_area op testing.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Implementation of resize_area for cuda platform and tests.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Fixed multithreading in resize_area op helper.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Corrected copyright marks in sources.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Corrected copyright mark for resize_area op implementation.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Corrected copyright mark for parity ops header.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Corrected typos in strings and elsewhere in image resize ops.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Refactored resize_area helpers and multithreading.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Added ResizeArea wrapper
* Added test with align_corners and fixed shape processing with only int args given for output size.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Added test
* TF mapping for ResizeArea
* Fixed implementation issues with resize_area op for both platforms.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Refactored image resizer struct to use flexible types for ints and floats.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Improved multithreading for resizeAreaKernel launch.
Signed-off-by: shugeo <sgazeos@gmail.com>
* Use asynchronous memory copying for cuda platform image resize allocations.
Signed-off-by: shugeo <sgazeos@gmail.com>
Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
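The idea behind area resizing is that each output cell is the average of the input cells it covers, weighted by how much of each input cell falls inside it. A one-dimensional pure-Python sketch of that idea (hypothetical name `resize_area_1d`; it ignores `align_corners`, batching and channels, and is not the libnd4j or TensorFlow kernel):

```python
def resize_area_1d(values, out_size):
    """Area-weighted resampling of a 1-D signal: each output cell is the
    overlap-weighted average of the input cells it covers."""
    in_size = len(values)
    scale = in_size / out_size  # width of one output cell, in input cells
    out = []
    for o in range(out_size):
        start, end = o * scale, (o + 1) * scale  # span in input coordinates
        acc = 0.0
        i = int(start)
        while i < end and i < in_size:
            overlap = min(end, i + 1) - max(start, i)  # length inside [start, end)
            acc += values[i] * overlap
            i += 1
        out.append(acc / scale)  # divide by the total covered length
    return out
```

Since the overlap area of two rectangles is the product of their 1-D overlaps, applying the same weighting independently along height and width gives the 2-D case. For example, `resize_area_1d([1.0, 3.0, 5.0, 7.0], 2)` returns `[2.0, 6.0]`.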
* initial commit
Signed-off-by: raver119 <raver119@gmail.com>
* one file
Signed-off-by: raver119 <raver119@gmail.com>
* few more includes
Signed-off-by: raver119 <raver119@gmail.com>
* m?
Signed-off-by: raver119 <raver119@gmail.com>
* const
Signed-off-by: raver119 <raver119@gmail.com>
* cudnn linkage in tests
Signed-off-by: raver119 <raver119@gmail.com>
* culibos
Signed-off-by: raver119 <raver119@gmail.com>
* static reminder
Signed-off-by: raver119 <raver119@gmail.com>
* platform engine tag
Signed-off-by: raver119 <raver119@gmail.com>
* HAVE_CUDNN moved to config.h.in
Signed-off-by: raver119 <raver119@gmail.com>
* include
Signed-off-by: raver119 <raver119@gmail.com>
* include
Signed-off-by: raver119 <raver119@gmail.com>
* skip cudnn handle creation if there's no cudnn
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* target device in context
Signed-off-by: raver119 <raver119@gmail.com>
* platform engines
Signed-off-by: raver119 <raver119@gmail.com>
* platform engines
Signed-off-by: raver119 <raver119@gmail.com>
* allow multiple -h args
Signed-off-by: raver119 <raver119@gmail.com>
* allow multiple -h args
Signed-off-by: raver119 <raver119@gmail.com>
* move mkldnn out of CPU block
Signed-off-by: raver119 <raver119@gmail.com>
* link to mkldnn on cuda
Signed-off-by: raver119 <raver119@gmail.com>
* less prints
Signed-off-by: raver119 <raver119@gmail.com>
* minor tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* next step
Signed-off-by: raver119 <raver119@gmail.com>
* conv2d NCHW draft
Signed-off-by: raver119 <raver119@gmail.com>
* conv2d biasAdd
Signed-off-by: raver119 <raver119@gmail.com>
* test for MKL/CUDNN combined use
Signed-off-by: raver119 <raver119@gmail.com>
* - provide additional code for conv2d ff based on cudnn api, not tested yet
Signed-off-by: Yurii <iuriish@yahoo.com>
* - further work on conv2d helper based on using cudnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - fixing several cuda bugs which appeared once the cudnn lib started being used
Signed-off-by: Yurii <iuriish@yahoo.com>
* - implementation of conv2d backprop op based on cudnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - implementation of conv3d and conv3d_bp ops based on cudnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - bugs fixing in conv3d/conv3d_bp ops (cudnn in use)
Signed-off-by: Yurii <iuriish@yahoo.com>
* - implementation of depthwiseConv2d (ff/bp) op based on cudnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - implementation of batchnorm ff op based on cudnn api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - temporarily disable cudnn batchnorm
Signed-off-by: Yurii <iuriish@yahoo.com>
* - add minor change in cmake
Signed-off-by: Yurii <iuriish@yahoo.com>
* engine for depthwise mkldnn
Signed-off-by: raver119 <raver119@gmail.com>
* couple of includes
Signed-off-by: raver119 <raver119@gmail.com>
* - provide permutation to cudnn batchnorm ff when format is NHWC
Signed-off-by: Yurii <iuriish@yahoo.com>
* lgamma fix
Signed-off-by: raver119 <raver119@gmail.com>
* - eliminate memory leak in two tests
Signed-off-by: Yurii <iuriish@yahoo.com>
Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>
* Libnd4j: Add broadcastable elementwise power derivative #7461 first step of Pow_bp operation implementation
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: Add broadcastable elementwise power derivative #7461 some corrections of calculation steps
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: Add broadcastable elementwise power derivative #7461 some bug fixes, made the PowDerivative op broadcastable, added raw tests for the op; needs refactoring to use broadcast ops
* Libnd4j: Add broadcastable elementwise power derivative #7461 fixed several bugs, added broadcast support and tests; still need to fix scalar+array and array+scalar
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: Add broadcastable elementwise power derivative #7461 fixed bugs for scalar inputs, fixed multinomial tests, added tests
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: Add broadcastable elementwise power derivative #7461 fixed bugs in support for different shapes, tests updated
* Libnd4j: Add broadcastable elementwise power derivative #7461 applied all possible variants via tiled arrays, added broadcast support for Pow and PowDerivative ops, covered by tests; before review, the tiled implementation has to be replaced by applyTrueBroadcast
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: Add broadcastable elementwise power derivative #7461 replaced tile by broadcast implementation, fixed issue with negative x input, corrected tests, need additional testing
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: Add broadcastable elementwise power derivative #7461 added and corrected test cases, corrected implementation need review
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: Add broadcastable elementwise power derivative #7461 code clean up
* Libnd4j: Add broadcastable elementwise power derivative #7461 code clean up, removed some tests, add tests with scalar
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: Add broadcastable elementwise power derivative #7461 code improvement and clean up, split tests
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: Add broadcastable elementwise power derivative #7461 some code clean up
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* Libnd4j: Add broadcastable elementwise power derivative: replaced __isnanf with an internal implementation
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* pow_bp wrapper
* Fixed PowBp wrapper
* Tests added
* Test fixed
* Fix return type
* Disable powBp usage
* Pow backprop changed
Co-authored-by: Alexander Stoyakin <alexander.stoyakin@gmail.com>
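For reference, the quantities a pow backprop has to compute follow from the elementwise derivative of z = x^y with upstream gradient dL/dz. When x or y was broadcast in the forward pass, each gradient is additionally summed over the broadcast axes so that it ends up with the shape of the corresponding input. Note that ln x is undefined for x <= 0 in real arithmetic, so the gradient with respect to y needs special treatment for non-positive x; the exact convention libnd4j adopts is not stated in these commits.

```latex
\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z}\, y\, x^{y-1},
\qquad
\frac{\partial L}{\partial y} = \frac{\partial L}{\partial z}\, x^{y} \ln x
```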
* SameDiff exec: Fix for switch op when predicate is constant, and op is inside loop
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Update ignores for failing zoo models
Signed-off-by: AlexDBlack <blacka101@gmail.com>