* Add Maven profiles for ARM builds to pom.xml files
Signed-off-by: Samuel Audet <samuel.audet@gmail.com>
* Remove mkl from dependencies when running on non intel/amd platforms
* Downgrade openblas for now
* Change back to 0.3.8
Co-authored-by: Adam Gibson <1144306+agibsonccc@users.noreply.github.com>
* - profiling of stack and unstack ops
Signed-off-by: Yurii <iuriish@yahoo.com>
* - fix bug in cpu concat op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correction of cuda stack and unstack
Signed-off-by: Yurii <iuriish@yahoo.com>
* - change shape.h method which operates with unity dimensions strides
Signed-off-by: Yurii <iuriish@yahoo.com>
* - rearrange stack tests
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct evaluation of smallest stride for moving through contiguous axis
Signed-off-by: Yurii <iuriish@yahoo.com>
* - forgot to update signature of function strideOverContigAxis in cuda concat and split ops
Signed-off-by: Yurii <iuriish@yahoo.com>
* - remove ShapeUtils::shapeAsString method applied before input arrays validations
Signed-off-by: Yurii <iuriish@yahoo.com>
* - further removing of ShapeUtils::shapeAsString
Signed-off-by: Yurii <iuriish@yahoo.com>
* - take sub-array shapeInfo/offset calculation out of NDArray class
- add possibility of contiguous memory copy in execTransformAny op if opNum == assign
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct test_empty_scatter_2 in EmptyTests.cpp
Signed-off-by: Yurii <iuriish@yahoo.com>
* - profiling of slice op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - get rid of contiguous memcpy for some cases in concat and split ops
Signed-off-by: Yurii <iuriish@yahoo.com>
* - forgot to declare void nd4j::SpecialMethods<T>::splitCpuGeneric
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct typo in calculation of threads in cuda split op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - forgot to correct another set of threads variables in split cuda ops
Signed-off-by: Yurii <iuriish@yahoo.com>
* - further conflict resolution
Signed-off-by: Yurii <iuriish@yahoo.com>
Co-authored-by: raver119 <raver119@gmail.com>
* initial set of include changes
Signed-off-by: raver119 <raver119@gmail.com>
* one more tweak
Signed-off-by: raver119 <raver119@gmail.com>
* few more rearrangements
Signed-off-by: raver119 <raver119@gmail.com>
* few more rearrangements
Signed-off-by: raver119 <raver119@gmail.com>
* few more rearrangements
Signed-off-by: raver119 <raver119@gmail.com>
* cuda includes rearrangements
Signed-off-by: raver119 <raver119@gmail.com>
* java update
Signed-off-by: raver119 <raver119@gmail.com>
* - namespace changed to sd
- few CMake variables renamed with SD_ prefix
Signed-off-by: raver119 <raver119@gmail.com>
* java update
Signed-off-by: raver119 <raver119@gmail.com>
* LoopKind minor fix
Signed-off-by: raver119 <raver119@gmail.com>
* few more changes
Signed-off-by: raver119 <raver119@gmail.com>
* few more changes
Signed-off-by: raver119 <raver119@gmail.com>
* few more changes
Signed-off-by: raver119 <raver119@gmail.com>
* sanitizer is optional now
Signed-off-by: raver119 <raver119@gmail.com>
* dev tests updated
Signed-off-by: raver119 <raver119@gmail.com>
* few more changes
Signed-off-by: raver119 <raver119@gmail.com>
* last update
Signed-off-by: raver119 <raver119@gmail.com>
* java update
Signed-off-by: raver119 <raver119@gmail.com>
* #8565 Normalizer toString/hashcode
Signed-off-by: Alex Black <blacka101@gmail.com>
* #8731 ImagePreProcessingScaler labels/segmentation fix
Signed-off-by: Alex Black <blacka101@gmail.com>
* #8691 Fix SameDiffLayer/Vertex finetuning and parameter setting support
Signed-off-by: Alex Black <blacka101@gmail.com>
* #8663 DL4J embedding layer weight init - don't depend on vocab size
Signed-off-by: Alex Black <blacka101@gmail.com>
* EmbeddingLayer test tweak
Signed-off-by: Alex Black <blacka101@gmail.com>
* libnd4j cast loop types
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j more type casting added to loops
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j sync casting types of iterated variable in loops
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j more loops reviewed for vectorization problem fix
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j fixed several typos
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j several more files reviewed to fix auto-vectorization problem in loops
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j merge master and reviewed more files to fix auto-vectorization problem in loops
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j added several type casts in broadcasting that were missed, fixed mac builds
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j double check all files and fix several more places in loops
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j fixed builds
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j revert changes for lup.cpp
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j more files reviewed for auto-vectorization problem fix
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j moved split operation implementation to helpers before special case adding
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j minor fixes for general split operation move, merge master
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j split cpu implementation
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* - provide cuda helper for split op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - minor correction
Signed-off-by: Yurii <iuriish@yahoo.com>
* - minor correction 2
Signed-off-by: Yurii <iuriish@yahoo.com>
Co-authored-by: Yurii Shyrma <iuriish@yahoo.com>
Co-authored-by: raver119 <raver119@gmail.com>
* libnd4j raw implementation of native broadcast for special cases
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j fixed bugs for special case of 4D loop broadcast, add some tests, need more testing and discussion
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j added 3D and 5D cases support and tests, need testing with different orders
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j corrected case selection for broadcast 3,4,5D loops, fixed several places for more stable behavior, clean up
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j minor corrections to avoid some risks in strides selection, added tests and rename some variables
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j optimize stride selection for all loops via separate ShapeUtils method copyCertainStridesFromShapeInfo, merge master
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j remove per request several tests for 3D, 4D and 5D broadcast loops
Signed-off-by: Oleg <oleg.semeniv@gmail.com>
* libnd4j removed some local changes that had not been synced with server playground, turn on new loops usage
* - profiling of concat op (both cuda and cpu)
Signed-off-by: Yurii <iuriish@yahoo.com>
* better comparison for large concat
Signed-off-by: raver119 <raver119@gmail.com>
* - further improving of concat op
Signed-off-by: Yurii <iuriish@yahoo.com>
* some logging
Signed-off-by: raver119 <raver119@gmail.com>
* - add possibility to verify presence of trailing unities in shape and set strides/ews correspondingly
- restrict second simple case in concat op to c order only
Signed-off-by: Yurii <iuriish@yahoo.com>
* - move concat op to specials_single.cpp file
Signed-off-by: Yurii <iuriish@yahoo.com>
* - get rid of second concat op declaration in transforms.cpp file
Signed-off-by: Yurii <iuriish@yahoo.com>
Co-authored-by: raver119 <raver119@gmail.com>
* static increments in loops
Signed-off-by: raver119 <raver119@gmail.com>
* specials and concat split into separate units
Signed-off-by: raver119 <raver119@gmail.com>
* - profiling gather op for aurora
Signed-off-by: Yurii <iuriish@yahoo.com>
* - include contiguous memcpy in gather op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - provide matmul code based on mkl api
Signed-off-by: Yurii <iuriish@yahoo.com>
* - correct typo in mkl matmul op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - take into account empty arrays in mkl matmul op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - fix bug in mkl matmul and group all matmul tests in one file
Signed-off-by: Yurii <iuriish@yahoo.com>
* Test spam reduction
Signed-off-by: Alex Black <blacka101@gmail.com>
* Arbiter bad import fixes
Signed-off-by: Alex Black <blacka101@gmail.com>
* Small spark test tweak
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Arbiter test log spam reduction
Signed-off-by: Alex Black <blacka101@gmail.com>
* More test spam reduction
Signed-off-by: Alex Black <blacka101@gmail.com>
* broadcast as scalar edge case
Signed-off-by: raver119 <raver119@gmail.com>
* missing return
Signed-off-by: raver119 <raver119@gmail.com>
* few fixes
Signed-off-by: raver119 <raver119@gmail.com>
* one more fix
Signed-off-by: raver119 <raver119@gmail.com>
* no need for lambdas
Signed-off-by: raver119 <raver119@gmail.com>
* - provide contiguous strides for output in transpose op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - provide contiguous strides for output in permute op
Signed-off-by: Yurii <iuriish@yahoo.com>
* - take into account empty shapes properly in transpose/permute op
Signed-off-by: Yurii <iuriish@yahoo.com>