DataVec
DataVec is an Apache 2.0-licensed library for machine-learning ETL (Extract, Transform, Load) operations. DataVec's purpose is to transform raw data into usable vector formats that can be fed to machine learning algorithms. By contributing code to this repository, you agree to make your contribution available under an Apache 2.0 license.
Why Would I Use DataVec?
Data handling is sometimes messy, and we believe it should be kept distinct from high-performance algebra libraries (such as ND4J or Deeplearning4j).
DataVec allows a practitioner to take raw data and quickly produce vectorized data in open, standard formats (such as SVMLight). The following input data types are currently supported out of the box:
- CSV data
- Raw text data (tweets, text documents, etc.)
- Image data
- LibSVM
- SVMLight
- MATLAB (MAT) format
- JSON, XML, YAML
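As an illustration of the kind of vectorized output meant here, the following plain-Java sketch converts one numeric CSV row into an SVMLight line. It is not DataVec's API; the class and method names are hypothetical. It assumes the label is the last column, uses SVMLight's 1-based feature indices, and omits zero-valued features, as the sparse format allows.

```java
// Hypothetical sketch (not DataVec's API): one numeric CSV row -> one SVMLight line,
// formatted as "<label> <index>:<value> ...".
// Assumptions: the label is the last column, feature indices are 1-based,
// and zero-valued features are omitted (SVMLight's sparse convention).
final class SvmLightSketch {

    static String csvRowToSvmLight(String csvRow) {
        String[] fields = csvRow.split(",");
        if (fields.length < 2) {
            throw new IllegalArgumentException("Need at least one feature and a label: " + csvRow);
        }
        StringBuilder sb = new StringBuilder(fields[fields.length - 1].trim());
        for (int i = 0; i < fields.length - 1; i++) {
            double value = Double.parseDouble(fields[i].trim());
            if (value != 0.0) {
                // SVMLight feature indices start at 1, not 0
                sb.append(' ').append(i + 1).append(':').append(formatValue(value));
            }
        }
        return sb.toString();
    }

    // Prints whole numbers without a trailing ".0"
    private static String formatValue(double v) {
        return v == Math.rint(v) ? String.valueOf((long) v) : String.valueOf(v);
    }

    public static void main(String[] args) {
        System.out.println(csvRowToSvmLight("0.5,0,3,1"));
    }
}
```

For the row "0.5,0,3,1" this produces "1 1:0.5 3:3": the label comes first and the zero in the second column is simply left out.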
DataVec draws inspiration from many Hadoop ecosystem tools and, in particular, accesses data on disk through the Hadoop API (as Spark does), which makes it compatible with many of the record formats Hadoop can read.
DataVec also includes sophisticated functionality for feature engineering, data cleaning and data normalization both for static data and for sequences (time series). Such operations can be executed on Apache Spark using DataVec-Spark.
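Normalization is one of the operations listed above. To make concrete what a per-column min-max normalization computes, x' = (x - min) / (max - min), here is a minimal plain-Java sketch. It is an illustration under our own assumptions (a single numeric column held in a double[], with a constant column mapped to all zeros), not DataVec's normalization code.

```java
import java.util.Arrays;

// Hypothetical sketch (not DataVec's implementation) of min-max normalization.
final class MinMaxSketch {

    // Scales a numeric column into [0, 1] with x' = (x - min) / (max - min).
    // Assumption: a constant column (max == min) maps to all zeros.
    static double[] minMaxNormalize(double[] column) {
        if (column.length == 0) {
            return new double[0];
        }
        double min = Arrays.stream(column).min().getAsDouble();
        double max = Arrays.stream(column).max().getAsDouble();
        double range = max - min;
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            // Guard against division by zero when every value is identical
            out[i] = range == 0.0 ? 0.0 : (column[i] - min) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(minMaxNormalize(new double[] {2, 4, 6})));
    }
}
```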
DataVec's architecture: API, transforms and filters, and schema management
Apart from providing readers for classic data formats, DataVec also provides an interface, RecordReader, for plugging in your own. If you want to ingest custom data, you don't have to build the whole pipeline; you only have to write the very first step. If you describe through the API how your data maps onto a common format that complies with the interface, DataVec returns a list of Writables for each record. You'll find more detail on the API in the datavec-api module.
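To make the idea concrete without depending on DataVec itself, here is a plain-Java sketch of a reader for a custom "key=value;key=value" line format. The interface SimpleRecordReader and the class KeyValueLineReader are hypothetical stand-ins: DataVec's real RecordReader returns a List<Writable> per record and has a larger contract (initialization from an InputSplit, reset, metadata, and so on), whereas this sketch returns a List<String>. It only illustrates that the custom part is the first, parsing step.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in for DataVec's RecordReader: yields one list of values per record.
interface SimpleRecordReader extends Iterator<List<String>> {
}

// Hypothetical reader for a custom "key=value;key=value" line format.
// Only this parsing step is custom; downstream code just sees one list per record.
final class KeyValueLineReader implements SimpleRecordReader {
    private final BufferedReader in;
    private final List<String> columns; // fixed output column order
    private String nextLine;

    KeyValueLineReader(BufferedReader in, List<String> columns) {
        this.in = in;
        this.columns = columns;
        advance();
    }

    private void advance() {
        try {
            nextLine = in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    @Override
    public List<String> next() {
        if (nextLine == null) {
            throw new NoSuchElementException();
        }
        Map<String, String> fields = new HashMap<>();
        for (String pair : nextLine.split(";")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2) {
                fields.put(kv[0].trim(), kv[1].trim());
            }
        }
        List<String> record = new ArrayList<>();
        for (String column : columns) {
            // Missing keys become empty strings so every record has the same width
            record.add(fields.getOrDefault(column, ""));
        }
        advance();
        return record;
    }

    public static void main(String[] args) {
        String data = "name=Alice;age=30\nage=25;name=Bob\n";
        KeyValueLineReader reader = new KeyValueLineReader(
                new BufferedReader(new StringReader(data)), Arrays.asList("name", "age"));
        while (reader.hasNext()) {
            System.out.println(reader.next());
        }
    }
}
```

Note that the key order inside a line does not matter: the reader always emits values in the column order it was configured with.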
Another thing you can do with DataVec is data cleaning. Suppose that, instead of clean, ready-to-go data, you start with data in different forms or from different sources. You might need sampling, filtering, or any of the messy ETL tasks that real-world data preparation requires. DataVec offers filters and transformations that help with curating, preparing and massaging your data, and it can leverage Apache Spark to do this at scale.
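The cleaning steps mentioned above (filtering, transforming, sampling) can be sketched in plain Java as follows. This is only an illustration with a hypothetical CleaningSketch class, not DataVec's transform API. It assumes records are lists of strings, treats blank fields as missing, and uses simple systematic sampling (every n-th record) rather than random sampling so the result is deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical sketch (not DataVec's API) of filter -> transform -> sample.
final class CleaningSketch {

    static List<List<String>> clean(List<List<String>> records, int categoryColumn, int everyNth) {
        if (everyNth < 1) {
            throw new IllegalArgumentException("everyNth must be >= 1");
        }
        // 1. Filter: drop records that contain a missing (null or blank) field.
        // 2. Transform: trim every field and lower-case the category column.
        List<List<String>> cleaned = records.stream()
                .filter(r -> r.stream().noneMatch(f -> f == null || f.trim().isEmpty()))
                .map(r -> {
                    List<String> out = new ArrayList<>();
                    for (int i = 0; i < r.size(); i++) {
                        String field = r.get(i).trim();
                        out.add(i == categoryColumn ? field.toLowerCase(Locale.ROOT) : field);
                    }
                    return out;
                })
                .collect(Collectors.toList());
        // 3. Sample: keep every n-th surviving record (systematic sampling).
        List<List<String>> sampled = new ArrayList<>();
        for (int i = 0; i < cleaned.size(); i += everyNth) {
            sampled.add(cleaned.get(i));
        }
        return sampled;
    }

    public static void main(String[] args) {
        List<List<String>> raw = Arrays.asList(
                Arrays.asList(" Red ", "1"), Arrays.asList("blue", ""),
                Arrays.asList("GREEN", "3"), Arrays.asList("Blue", "4"));
        System.out.println(clean(raw, 0, 1));
    }
}
```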
Finally, DataVec tracks a schema for your columnar data across all transformations. This schema is actively checked through probing, and DataVec will raise exceptions if your data does not match it. You can specify filters as well: for example, you can attach a regular expression to an input column of type String, and DataVec will keep only the data that matches this filter.
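A minimal plain-Java sketch of the schema check and regex filter described above follows. The class SchemaFilterSketch and its two-type schema are hypothetical simplifications; in DataVec itself, the schema is described with the Schema class and filters are attached in a TransformProcess. This sketch keeps records whose column fully matches the pattern (Matcher.matches), which is one reasonable reading of "matches".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch (not DataVec's Schema API): a schema is an ordered list of column types.
final class SchemaFilterSketch {
    enum ColumnType { STRING, INTEGER }

    // Throws if the record does not match the schema (wrong width or non-integer value).
    static void validate(List<String> record, List<ColumnType> schema) {
        if (record.size() != schema.size()) {
            throw new IllegalStateException(
                    "Expected " + schema.size() + " columns but got " + record.size() + ": " + record);
        }
        for (int i = 0; i < schema.size(); i++) {
            if (schema.get(i) == ColumnType.INTEGER) {
                try {
                    Long.parseLong(record.get(i).trim());
                } catch (NumberFormatException e) {
                    throw new IllegalStateException("Column " + i + " is not an integer: " + record.get(i), e);
                }
            }
        }
    }

    // Validates every record, then keeps only those whose String column fully matches the regex.
    static List<List<String>> keepMatching(List<List<String>> records, List<ColumnType> schema,
                                           int column, String regex) {
        if (schema.get(column) != ColumnType.STRING) {
            throw new IllegalArgumentException("Regex filters apply to String columns only");
        }
        Pattern pattern = Pattern.compile(regex);
        List<List<String>> kept = new ArrayList<>();
        for (List<String> record : records) {
            validate(record, schema);
            if (pattern.matcher(record.get(column)).matches()) {
                kept.add(record);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ColumnType> schema = Arrays.asList(ColumnType.STRING, ColumnType.INTEGER);
        List<List<String>> records = Arrays.asList(
                Arrays.asList("alice", "30"), Arrays.asList("bob", "25"), Arrays.asList("anna", "41"));
        System.out.println(keepMatching(records, schema, 0, "a.*"));
    }
}
```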
On Distribution
Distributed processing through Apache Spark is optional, and Spark can also run in local mode (where your cluster is emulated with multi-threading) when necessary. DataVec aims to abstract away the actual execution engine and to create, at compile time, a logical set of operations to execute. While we have some code that uses Spark, we do not want to be locked into a single tool; Apache Flink and Apache Beam are possible alternatives, and we would welcome collaboration on them.
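The idea of building a logical plan first and choosing the execution engine afterwards can be illustrated in plain Java. This is a minimal sketch with hypothetical class names; in DataVec the plan is a TransformProcess, which can be run locally or on Spark (for example via LocalTransformExecutor in datavec-local or SparkTransformExecutor in datavec-spark). Here a sequential Java stream and a parallel stream play the roles of two interchangeable executors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of "describe the operations first, pick the executor later".
final class LogicalPlanSketch {

    // One logical step, described independently of how it will be executed.
    interface Step {
        Stream<String> apply(Stream<String> in);
    }

    // The plan is plain data: an ordered list of steps built before anything runs.
    static final class Plan {
        private final List<Step> steps = new ArrayList<>();

        Plan filter(Predicate<String> keep) {
            steps.add(s -> s.filter(keep));
            return this;
        }

        Plan map(Function<String, String> f) {
            steps.add(s -> s.map(f));
            return this;
        }

        List<Step> steps() {
            return steps;
        }
    }

    // An executor decides how a plan runs; swapping it does not change the plan.
    interface Executor {
        List<String> execute(Plan plan, List<String> input);
    }

    static final Executor SEQUENTIAL = (plan, input) -> run(plan, input.stream());
    static final Executor PARALLEL = (plan, input) -> run(plan, input.parallelStream());

    private static List<String> run(Plan plan, Stream<String> source) {
        Stream<String> current = source;
        for (Step step : plan.steps()) {
            current = step.apply(current);
        }
        // collect(toList()) keeps encounter order, even for the parallel stream
        return current.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Plan plan = new Plan().filter(x -> !x.isEmpty()).map(x -> x.toUpperCase(Locale.ROOT));
        List<String> input = Arrays.asList("a", "", "b", "c");
        System.out.println(SEQUENTIAL.execute(plan, input));
        System.out.println(PARALLEL.execute(plan, input));
    }
}
```

Because the plan holds neither data nor execution logic, the same Plan object can be handed to either executor, and both should produce the same result.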
Examples
Examples for using DataVec are available here: https://github.com/deeplearning4j/dl4j-examples
Contribute
Where to contribute?
We have a lot in the pipeline, and we'd love to receive contributions. We want to support representing data not only as a collection of simple types ("writables") but also as binary data. That will help reduce GC pressure across our pipelines and fit better with media-based use cases, where columnar data is not essential. We also expect it to streamline many of the specialized operations we currently perform on primitive types.
That said, a good area for a first contribution is implementations of the RecordReader interface, since this is relatively self-contained. Of note, to support most of the distributed file formats of the Hadoop ecosystem, we use Apache Camel. Camel supports a pluggable DataFormat that allows messages to be marshalled to and from binary or text formats, acting as a kind of Message Translator.
Another relatively self-contained area is transformations: if you find a filter or data-munging operation that has not been implemented yet, you can contribute it in a self-contained way.
Which maintainers to contact?
It helps to know which maintainers to contact about a particular part of the code, whether for reviewing your pull requests or for answering questions on our Gitter channel. The following mapping is indicative:
- RecordReader implementations: @saudet and @agibsonccc
- Transformations and their API: @agibsonccc and @AlexDBlack
- Spark and distributed processing: @AlexDBlack, @agibsonccc and @huitseeker
- Native formats, geodata: @saudet
How to contribute
- Check for open issues, or open a new issue to start a discussion around a feature idea or a bug.
- If you feel uncomfortable or uncertain about an issue or your changes, feel free to contact us on Gitter.
- Fork the repository on GitHub to start making your changes.
- Write a test that shows the bug was fixed or that the feature works as expected.
- Note that the repository follows the Google Java Style with two modifications: a 120-character column wrap and 4-space indentation. You can format your code accordingly by running mvn formatter:format in the subproject you work on, by configuring the Eclipse formatter with contrib/formatter.xml at the root of the repository, or by using the IntelliJ plugin.
- Send a pull request, and bug us on Gitter until it gets merged and published.
Eclipse Setup
- Download the latest Lombok JAR from https://projectlombok.org/download
- Double-click the JAR file to install the plugin for Eclipse
- Clone DataVec to your system
- Import the project as a Maven project
- You will also need to clone and build ND4J and libnd4j