cavis/datavec
raver119 ec847e034b
[WIP] Remote inference (#96)
2019-08-14 12:11:09 +03:00
.github Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
ci Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
contrib Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
datavec-api Small number of fixes + cleanup + some missing op methods + constructors (#100) 2019-08-05 22:31:46 +10:00
datavec-arrow Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
datavec-camel Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
datavec-data Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
datavec-excel Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
datavec-geo Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
datavec-hadoop Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
datavec-jdbc Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
datavec-local Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
datavec-perf Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
datavec-python Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
datavec-spark Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
datavec-spark-inference-parent [WIP] Remote inference (#96) 2019-08-14 12:11:09 +03:00
.travis.yml Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
LICENSE Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
README.md Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
buildmultiplescalaversions.sh Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00
pom.xml Eclipse Migration Initial Commit 2019-06-06 15:21:15 +03:00

README.md

DataVec

Join the chat at https://gitter.im/deeplearning4j/deeplearning4j

DataVec is an Apache 2.0-licensed library for machine-learning ETL (Extract, Transform, Load) operations. DataVec's purpose is to transform raw data into usable vector formats that can be fed to machine learning algorithms. By contributing code to this repository, you agree to make your contribution available under an Apache 2.0 license.

Why Would I Use DataVec?

Data handling is sometimes messy, and we believe it should be kept distinct from high-performance algebra libraries (such as ND4J or Deeplearning4j).

DataVec allows a practitioner to take raw data and quickly produce vectorized data that complies with open standards (SVMLight, etc.). Current input data types supported out of the box:

  • CSV Data
  • Raw Text Data (Tweets, Text Documents, etc)
  • Image Data
  • LibSVM
  • SVMLight
  • MatLab (MAT) format
  • JSON, XML, YAML

DataVec draws inspiration from many of the Hadoop ecosystem tools. In particular, it accesses data on disk through the Hadoop API (as Spark does), which makes it compatible with many kinds of records.

DataVec also includes sophisticated functionality for feature engineering, data cleaning and data normalization both for static data and for sequences (time series). Such operations can be executed on Apache Spark using DataVec-Spark.

DataVec's architecture: API, transforms and filters, and schema management

Besides readers for the classic data formats, DataVec provides an interface for custom input. To ingest a custom data format, you do not have to build the whole pipeline; you only have to write the very first step. Once you describe through the API how your data maps onto a common format that complies with the interface, DataVec returns a list of Writables for each record. You'll find more detail on the API in the corresponding module.
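As a rough illustration of "writing only the very first step", the sketch below parses a made-up `key=value;key=value` line format into one list of values per record. `KeyValueLineReader` and the line format are hypothetical names invented for this example, not DataVec classes; a real integration would implement DataVec's RecordReader interface and return Writables rather than plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: NOT a DataVec class. It mimics the "one list of values
// per record" idea, using plain strings in place of DataVec's Writable types.
class KeyValueLineReader implements Iterator<List<String>> {
    private final Iterator<String> lines;

    KeyValueLineReader(List<String> rawLines) {
        this.lines = rawLines.iterator();
    }

    @Override
    public boolean hasNext() {
        return lines.hasNext();
    }

    // Turns "id=7;temp=21.5" into ["7", "21.5"], keeping the field order.
    @Override
    public List<String> next() {
        List<String> record = new ArrayList<>();
        for (String field : lines.next().split(";")) {
            String[] keyValue = field.split("=", 2);
            // A field without '=' yields an empty value instead of failing.
            record.add(keyValue.length == 2 ? keyValue[1].trim() : "");
        }
        return record;
    }

    public static void main(String[] args) {
        KeyValueLineReader reader = new KeyValueLineReader(
                Arrays.asList("id=7;temp=21.5", "id=8;temp=19.0"));
        while (reader.hasNext()) {
            System.out.println(reader.next());
        }
    }
}
```

Everything downstream of such a reader (transforms, filters, vectorization) only sees lists of values, which is why the custom part can stay this small.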

Another thing you can do with DataVec is data cleaning. Instead of having clean, ready-to-go data, let's say you start with data in different forms or from different sources. You might need to do sampling, filtering, or any of the messy ETL tasks needed to prepare real-world data. DataVec offers filters and transformations that help with curating, preparing and massaging your data. It leverages Apache Spark to do this at scale.

Finally, DataVec tracks a schema for your columnar data, across all transformations. This schema is actively checked through probing, and DataVec will raise exceptions if your data does not match the schema. You can specify filters as well: you can attach a regular expression to an input column of type String, for example, and DataVec will only keep data that matches this filter.
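The schema check and the regular-expression filter described above can be sketched in plain Java as follows. `SchemaSketch`, `ColumnType` and `keepMatching` are hypothetical names chosen for this example and are not DataVec's schema API; the sketch only assumes that records are lists of strings and that a mismatch should raise an exception.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: NOT DataVec's Schema API. It shows the two ideas from
// the text: records are checked against declared column types, and a regex
// attached to a String column decides which records are kept.
class SchemaSketch {
    enum ColumnType { STRING, INTEGER }

    private final List<ColumnType> columnTypes;

    SchemaSketch(ColumnType... columnTypes) {
        this.columnTypes = Arrays.asList(columnTypes);
    }

    // Throws if the column count is wrong or an INTEGER column holds a non-integer.
    public void validate(List<String> record) {
        if (record.size() != columnTypes.size()) {
            throw new IllegalArgumentException(
                    "Expected " + columnTypes.size() + " columns, got " + record.size());
        }
        for (int i = 0; i < record.size(); i++) {
            if (columnTypes.get(i) == ColumnType.INTEGER) {
                try {
                    Integer.parseInt(record.get(i));
                } catch (NumberFormatException e) {
                    throw new IllegalArgumentException(
                            "Column " + i + " is not an integer: " + record.get(i), e);
                }
            }
        }
    }

    // Validates every record, then keeps those whose String column fully matches the regex.
    public List<List<String>> keepMatching(List<List<String>> records, int column, String regex) {
        if (columnTypes.get(column) != ColumnType.STRING) {
            throw new IllegalArgumentException("Column " + column + " is not a String column");
        }
        Pattern pattern = Pattern.compile(regex);
        List<List<String>> kept = new ArrayList<>();
        for (List<String> record : records) {
            validate(record);
            if (pattern.matcher(record.get(column)).matches()) {
                kept.add(record);
            }
        }
        return kept;
    }
}
```

For example, with a (STRING, INTEGER) schema, calling `keepMatching(records, 0, "user_[a-z]+")` keeps a record such as `["user_a", "3"]` and drops `["bot-1", "5"]`, while a record like `["user_b", "not-a-number"]` fails validation.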

On Distribution

Distributed processing through Apache Spark is optional. When necessary, you can also run Spark in local mode (where your cluster is emulated with multi-threading). DataVec aims to abstract away the actual execution and to create, at compile time, a logical set of operations to execute. While we have some code that uses Spark, we do not want to be locked into a single tool. Apache Flink and Apache Beam are possible alternatives, and we would welcome collaboration on either.
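The separation between a logical set of operations and its execution can be illustrated with a small plain-Java sketch. `LogicalPlan`, `PlanExecutor`, `SequentialExecutor` and `ParallelExecutor` are hypothetical names for this example and do not correspond to DataVec or Spark classes; the parallel-stream executor merely stands in for a distributed backend.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical sketch: NOT DataVec's API. The plan only describes the steps;
// an executor decides where and how they run.
class LogicalPlan {
    private final List<UnaryOperator<List<String>>> steps = new ArrayList<>();

    // Steps must be stateless so that any executor may run them concurrently.
    public LogicalPlan then(UnaryOperator<List<String>> step) {
        steps.add(step);
        return this;
    }

    public List<String> applyTo(List<String> record) {
        List<String> current = record;
        for (UnaryOperator<List<String>> step : steps) {
            current = step.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        LogicalPlan plan = new LogicalPlan()
                .then(r -> r.subList(0, 2)) // keep the first two columns
                .then(r -> r.stream().map(String::toUpperCase).collect(Collectors.toList()));
        List<List<String>> records = Arrays.asList(
                Arrays.asList("a", "b", "c"), Arrays.asList("d", "e", "f"));
        // The same plan runs unchanged on either executor.
        System.out.println(new SequentialExecutor().execute(plan, records));
        System.out.println(new ParallelExecutor().execute(plan, records));
    }
}

interface PlanExecutor {
    List<List<String>> execute(LogicalPlan plan, List<List<String>> records);
}

// Runs the plan on one thread.
class SequentialExecutor implements PlanExecutor {
    @Override
    public List<List<String>> execute(LogicalPlan plan, List<List<String>> records) {
        return records.stream().map(plan::applyTo).collect(Collectors.toList());
    }
}

// Runs the plan on several threads; a cluster-backed executor would fill the same role.
class ParallelExecutor implements PlanExecutor {
    @Override
    public List<List<String>> execute(LogicalPlan plan, List<List<String>> records) {
        return records.parallelStream().map(plan::applyTo).collect(Collectors.toList());
    }
}
```

Because the plan never refers to an executor, swapping the execution engine does not require touching the description of the operations.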

Examples

Examples for using DataVec are available here: https://github.com/deeplearning4j/dl4j-examples


Contribute

Where to contribute?

We have a lot in the pipeline, and we'd love to receive contributions. We want to support representing data not only as a collection of simple types ("writables") but also as binary data. That will help with GC pressure across our pipelines and fit better with media-based use cases, where columnar data is not essential. We also expect it will streamline a lot of the specialized operations we now do on primitive types.

That being said, an area that could use a first contribution is the implementations of the RecordReader interface, since this is relatively self-contained. Of note, to support most of the distributed file formats of the Hadoop ecosystem, we use Apache Camel. Camel supports a pluggable DataFormat to allow messages to be marshalled to and from binary or text formats to support a kind of Message Translator.

Another area that is relatively self-contained is transformations, where you might find a filter or data munging operation that has not been implemented yet, and provide it in a self-contained way.

Which maintainers to contact?

It's useful to know which maintainers to contact for information on a particular part of the code, whether to review your pull requests or to answer questions on our Gitter channel. For this you can use the following indicative mapping:

  • RecordReader implementations: @saudet and @agibsonccc
  • Transformations and their API: @agibsonccc and @AlexDBlack
  • Spark and distributed processing: @AlexDBlack, @agibsonccc and @huitseeker
  • Native formats, geodata: @saudet

How to contribute

  1. Check for open issues, or open a new issue to start a discussion around a feature idea or a bug.

  2. If you feel uncomfortable or uncertain about an issue or your changes, feel free to contact us on Gitter using the link above.

  3. Fork the repository on GitHub to start making your changes.

  4. Write a test that shows the bug was fixed or that the feature works as expected.

  5. Note that the repository follows the Google Java style with two modifications: 120-character column wrap and 4-space indentation. You can format your code to this style by running mvn formatter:format in the subproject you work on, by using the contrib/formatter.xml at the root of the repository to configure the Eclipse formatter, or by using the IntelliJ plugin.

  6. Send a pull request, and bug us on Gitter until it gets merged and published.

Eclipse Setup

  1. Download the latest Lombok JAR from https://projectlombok.org/download
  2. Double-click the JAR file to install the plugin for Eclipse
  3. Clone DataVec to your system
  4. Import the project as a Maven project
  5. You will also need to clone and build ND4J and libnd4j