Alex Black
3f0b4a2d4c
SameDiff execution, TF and memory management overhaul ( #10 )
...
* SameDiff execution memory management improvements, round 1
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Round 2
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Round 3
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Clear node outputs closed array references; Slight change to OpValidation internals to not rely on cached op outputs
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Next steps
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Next step
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More polish
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Add WeakIdentityHashmap
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Session fixes for control ops and next steps
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* First steps for training session + in-line updating
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Next steps
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix losses and history during training
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* BiasAdd and other fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Don't use SDVariable.getArr() in TFGraphTestAllHelper (import tests)
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* First steps for new dependency tracking approach
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Start integrating dependency tracking for memory management
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Non-control op dependency tracking works/passes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Switch/merge
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Next steps
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Cleanup and next steps
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix issue dependency tracking for initial variables/constants
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Add check for aliases when determining if safe to close array
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* First pass on new TF graph import class
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Import fixes, op fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Cleanup and fixes for new TF import mapper
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Partial implementation of new dependency tracker
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Next steps
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* AbstractDependencyTracker for shared code
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Overhaul SameDiff graph execution (dependency tracking)
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More fixes, cleanup, next steps
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Ad no-op memory manager, cleanup, fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix switch dependency tracking
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* INDArray.toString: no exception on closed arrays, just note closed
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix enter and exit dependency tracking
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* TensorArray memory management fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Add unique ID for INDArray instances
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix memory management for NextIteration outputs in multi-iteration loops
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Remove (now unnecessary) special case handling for nested enters
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Cleanup
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Handle control dependencies during execution; javadoc for memory managers
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Cleanup, polish, code comments, javadoc
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Cleanup and more javadoc
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Add memory validation for all TF import tests - ensure all arrays (except outputs) are released
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Clean up arrays waiting on unexecuted ops at the end of execution
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fixes for enter op memory managent in the context of multiple non-nested loops/frames
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix order of operation issues for dependency tracker
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Always clear op fields after execution to avoid leaks or unintended array reuse
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Re-implement dtype conversion
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix for control dependencies execution (dependency tracking)
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix TF import overrides and filtering
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix for constant enter array dependency tracking
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* DL4J Fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More DL4J fixes
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Cleanup and polish
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More polish and javadoc
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More logging level tweaks, small DL4J fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fix to DL4J SameDiffLayer
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix empty array deserialization, add extra deserialization checks
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* FlatBuffers control dep serialization fixes; test serialization as part of all TF import tests
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Variable control dependencies serialization fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix issue with removing inputs for ops
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* FlatBuffers NDArray deserialization fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* FlatBuffers NDArray deserialization fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Small fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Final cleanup/polish
Signed-off-by: AlexDBlack <blacka101@gmail.com>
2019-10-23 21:19:50 +11:00
Alexander Stoyakin
f31661e13b
Merge pull request #7 from KonduitAI/asto_nd4s_10172019
...
KDTree optimization
2019-10-23 12:11:25 +03:00
Yurii
99be467f76
- minor change in recurrent.h
...
Signed-off-by: Yurii <iuriish@yahoo.com>
2019-10-17 20:46:51 +03:00
Yurii
70bd925abd
- write 2 versions of new lstmLayer: one is based on own code, second uses mkl dnn api
2019-10-17 20:44:52 +03:00
Alexander Stoyakin
630bb3c9b6
Merge pull request #2 from KonduitAI/asto_ops_wrapper
...
[WIP] New ops wrapper
2019-10-16 20:21:50 +03:00
shugeo
3662657d5c
Merge pull request #1 from KonduitAI/shugeo_gamma
...
Shugeo gamma
2019-10-16 18:49:33 +03:00
shugeo
24a2b2933f
Added gamma and lgamma functions.
2019-10-16 18:22:18 +03:00
Alexander Stoyakin
96a9a1a733
Fixed output from operation.
2019-10-16 18:07:52 +03:00
shugeo
7617682a46
Added declarations for igamma and igammac ops.
2019-10-16 14:45:10 +03:00
shugeo
478a0c1f97
Added igamma and igammac broadcastable ops implementations and tests.
2019-10-16 14:02:53 +03:00
shugeo
7103aca8c5
Added broadcastable IGamma and IGammac ops.
2019-10-16 13:58:32 +03:00
shugeo
f90e6da97e
Added nd4j_gamma, nd4j_igamma and nd4j_igammac functions.
2019-10-16 13:53:31 +03:00
shugeo
df2448613e
Added gamma distribution functions.
2019-10-15 20:00:07 +03:00
AlexDBlack
2d750b69e5
Merge remote-tracking branch 'konduit/master'
2019-10-14 17:21:23 +11:00
shugeo
ace65355c5
Added doc for fake_quant_with_min_max* op helpers cuda implementations.
2019-10-10 18:35:28 +03:00
shugeo
c890de5a7b
Added doc for fake_quant_with_min_max* op helpers implementations.
2019-10-10 18:31:17 +03:00
shugeo
c3f755d975
Refactored helpers both for cuda and cpu platforms.
2019-10-10 18:02:49 +03:00
shugeo
a09cb5e2be
Added doc for fake_quant_with_min_max_per_channel op declaration.
2019-10-10 17:13:33 +03:00
shugeo
92636b0b86
Eliminated waste operator.
2019-10-10 17:08:59 +03:00
shugeo
d5b352273d
Implementation of cuda kernel for fake_quant_with_min_max_vars_per_channels op. Final revision.
2019-10-10 16:51:29 +03:00
shugeo
02d8616692
Implementation of cuda kernel for fake_quant_with_min_max_vars_per_channels op.
2019-10-10 16:40:56 +03:00
shugeo
3504b0cda9
Implemented fake_quant_with_min_max_vars_per_channel fop cuda helper. The first working revision.
2019-10-10 15:44:50 +03:00
shugeo
753565145c
Refactored fake_quant_with_min_max_vars op cuda implementation.
2019-10-10 14:00:49 +03:00
shugeo
c13e945a96
Fixed fake_quant_with_min_max_vars op and tests.
2019-10-10 13:23:11 +03:00
shugeo
3c0c59ab88
Refactored fake_quant_with_min_max_vars op.
2019-10-09 22:09:33 +03:00
shugeo
352f1eee80
Implemented fake_quant_with_min_max_per_channel helper for cpu platform. The first approach.
2019-10-09 21:39:59 +03:00
shugeo
d0cbd33b0e
Added input checks for op.
2019-10-09 15:52:13 +03:00
shugeo
cb56b0b06a
The first approach for fake_quant_with_min_max_vars_per_channel op implementation.
2019-10-08 19:00:41 +03:00
shugeo
8fe5a1fa96
The working implementation of draw_bounding_boxes op.
2019-10-08 15:42:27 +03:00
shugeo
30a8af566c
The first working implementation of cuda kernel for draw_bounding_boxes op helper.
2019-10-08 13:45:18 +03:00
shugeo
ae09cfee32
Next approach of cuda imlementation for draw_bounding_boxes op helper.
2019-10-08 00:09:46 +03:00
shugeo
6cf3a8fa9c
Refactored cpu implementatio and added cuda aproach.
2019-10-07 17:51:07 +03:00
shugeo
78443ffebf
Working implementation of draw_bounding_boxes op for cpu.
2019-10-07 15:04:44 +03:00
shugeo
16a66a65e3
Added helper declaration for draw_bounding_boxes op.
2019-10-04 21:16:34 +03:00
shugeo
53a2ebddbe
Added test and helpers for draw_bounding_boxes op both cpu and cuda related.
2019-10-04 20:46:26 +03:00
shugeo
8f70b4441f
draw_bounding_boxes op implementation. Inital revision.
2019-10-04 18:32:21 +03:00
shugeo
908e4c4912
Added implementation for divide_no_nan op and tests.
2019-10-04 10:29:15 +03:00
raver119
cff26f13c5
Revert "Implement divide_no_nan op."
2019-10-03 20:25:52 +03:00
shugeo
6eaca179d6
Implement divide_no_nan op.
2019-10-03 18:22:17 +03:00
shugeo
130ee25682
Implemented compare_and_bitpack op.
2019-10-03 10:57:48 +03:00
shugeo
f3e42173ef
Refactored buffer copying to avoid wrong usage of buffers.
2019-10-02 16:51:09 +03:00
shugeo
1c6173d218
Added implementation of bitcast op.
2019-10-02 15:04:59 +03:00
shugeo
a27e61553a
Added tests and fixed op name.
2019-10-02 15:04:28 +03:00
shugeo
863ff76878
Added declaration for bincast op.
2019-10-02 12:17:00 +03:00
shugeo
afeb524238
Refactored implementation for adjust_contrast ops.
2019-10-01 14:13:09 +03:00
shugeo
1575c704ae
Added implementation for adjust_contrast_v2 op and tests.
2019-10-01 11:44:27 +03:00
raver119
44a8d19ac6
[WIP] Broadcast changes ( #8257 )
...
* - provide correct call NDArray::applyBroadcast inside of NDArray::applyTrueBroadcast
Signed-off-by: Yurii <yurii@skymind.io>
* - provide new trueBroadcast helper
Signed-off-by: Yurii <yurii@skymind.io>
* example for yurii
Signed-off-by: raver119 <raver119@gmail.com>
* - provide new trueBroadcast helper for cpu
Signed-off-by: Yurii <yurii@skymind.io>
* - start working on new trueBroadcat helper for cuda
Signed-off-by: Yurii <yurii@skymind.io>
* - further work on trueBroadcast for cuda
Signed-off-by: Yurii <yurii@skymind.io>
* - fix bugs in cuda helper trueBroadcast
Signed-off-by: Yurii <yurii@skymind.io>
2019-10-01 09:10:19 +03:00
shugeo
e06dfb5dcc
Implementation of adjust_contrast op.
2019-09-30 18:24:12 +03:00
AlexDBlack
a66e03355e
Merge remote-tracking branch 'fork/master'
2019-09-12 12:20:57 +10:00
raver119
98e2814879
Platform helpers ( #8216 )
...
* platform helpers draft
Signed-off-by: raver119 <raver119@gmail.com>
* typo
Signed-off-by: raver119 <raver119@gmail.com>
* disable platform cmake
Signed-off-by: raver119 <raver119@gmail.com>
* another draft
Signed-off-by: raver119 <raver119@gmail.com>
* mkldnn convolution refactored
Signed-off-by: raver119 <raver119@gmail.com>
* minor tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* one more safety check
Signed-off-by: raver119 <raver119@gmail.com>
* prototype works
Signed-off-by: raver119 <raver119@gmail.com>
* meh
Signed-off-by: raver119 <raver119@gmail.com>
* force static library mode for mkldnn
Signed-off-by: raver119 <raver119@gmail.com>
* - ismax fix
- experimental arg fix
- don't enforce openblas on Apple hardware
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of small fixes
Signed-off-by: raver119@gmail.com <raver119@gmail.com>
* declare concurrent
Signed-off-by: raver119@gmail.com <raver119@gmail.com>
* - MKLDNN version upgrade to 1.0.2
- avgpool2d/maxpool2d APIs update
Signed-off-by: raver119 <raver119@gmail.com>
* - avgpool2d_bp/maxpool2d_bp APIs update
Signed-off-by: raver119 <raver119@gmail.com>
* - conv2d/batchnorm APIs update
Signed-off-by: raver119 <raver119@gmail.com>
* - lrn/conv2d_bp/conv3d/conv3d_bp APIs update
Signed-off-by: raver119 <raver119@gmail.com>
* all ops converted to MKLDNN 1.x
Signed-off-by: raver119 <raver119@gmail.com>
* bunch of tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* namespace for platform helpers
Signed-off-by: raver119 <raver119@gmail.com>
* make sure platform helpers aren't opimized out
Signed-off-by: raver119 <raver119@gmail.com>
* build cpu_features on x86 systems
Signed-off-by: raver119 <raver119@gmail.com>
* build cpu_features on x86 systems
Signed-off-by: raver119 <raver119@gmail.com>
* more of cpu_features
Signed-off-by: raver119 <raver119@gmail.com>
* - mkldnn removed from java
- cpu_features checks in CpuNDArrayFactory
Signed-off-by: raver119 <raver119@gmail.com>
* F16C definition renamed
Signed-off-by: raver119 <raver119@gmail.com>
* some mkldnn rearrangements
Signed-off-by: raver119 <raver119@gmail.com>
* check supported instructions before doing anything
Signed-off-by: raver119 <raver119@gmail.com>
* typo
Signed-off-by: raver119 <raver119@gmail.com>
* missied impl
Signed-off-by: raver119 <raver119@gmail.com>
* BUILD_PIC option
Signed-off-by: raver119 <raver119@gmail.com>
* conv2d fix
Signed-off-by: raver119 <raver119@gmail.com>
* avgpool3d fix
Signed-off-by: raver119 <raver119@gmail.com>
* avgpool3d_bp fix
Signed-off-by: raver119 <raver119@gmail.com>
* avgpool2d_bp leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* avgpool3d_bp leak fix
Signed-off-by: raver119 <raver119@gmail.com>
* maxpool bp leaks fixed
Signed-off-by: raver119 <raver119@gmail.com>
* printf removed
Signed-off-by: raver119 <raver119@gmail.com>
* batchnorm fix
Signed-off-by: raver119 <raver119@gmail.com>
* AVX warning/error polishing
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Fix
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* More polish
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* Polish
Signed-off-by: AlexDBlack <blacka101@gmail.com>
* remove previous MKL-DNN support layer
Signed-off-by: raver119 <raver119@gmail.com>
* avx2 tweak
Signed-off-by: raver119 <raver119@gmail.com>
* allow static for apple
Signed-off-by: raver119@gmail.com <raver119@gmail.com>
* exclude mkldnn in one more place
Signed-off-by: raver119 <raver119@gmail.com>
* exclude mkldnn in one more place
Signed-off-by: raver119 <raver119@gmail.com>
* restore OPENBLAS_PATH use
Signed-off-by: raver119 <raver119@gmail.com>
* add runtime check for avx/avx2 support
Signed-off-by: raver119 <raver119@gmail.com>
* convolution_auto
Signed-off-by: raver119 <raver119@gmail.com>
* Add logic for helper argument
* minor test fix
Signed-off-by: raver119 <raver119@gmail.com>
* few tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* few tweaks
Signed-off-by: raver119 <raver119@gmail.com>
* skip OpTracker props for non-x86 builds
Signed-off-by: raver119 <raver119@gmail.com>
* linux arm isn't x86 :)
Signed-off-by: raver119 <raver119@gmail.com>
* avx-512
Signed-off-by: raver119 <raver119@gmail.com>
* CUDA presets fix
Signed-off-by: raver119 <raver119@gmail.com>
* BUILD_PIC
Signed-off-by: raver119 <raver119@gmail.com>
* prefetchw for avx2
Signed-off-by: raver119 <raver119@gmail.com>
* BUILD_PIC again
Signed-off-by: raver119 <raver119@gmail.com>
2019-09-11 21:50:28 +03:00