cavis/docs/datavec/templates/executors.md

43 lines
1.6 KiB
Markdown
Raw Normal View History

2019-06-06 14:21:15 +02:00
---
title: DataVec Executors
short_title: Executors
description: Execute ETL and vectorization in a local instance.
category: DataVec
weight: 3
---
## Local or remote execution?
Because datasets are commonly large by nature, you can decide on an execution mechanism that best suits your needs. For example, if you are vectorizing a large training dataset, you can process it in a distributed Spark cluster. However, if you need to do real-time inference, DataVec also provides a local executor that doesn't require any additional setup.
## Executing a transform process
Once you've created your `TransformProcess` using your `Schema`, and you've either loaded your dataset into a Apache Spark `JavaRDD` or have a `RecordReader` that load your dataset, you can execute a transform.
Locally this looks like:
```java
import org.datavec.local.transforms.LocalTransformExecutor;
List<List<Writable>> transformed = LocalTransformExecutor.execute(recordReader, transformProcess)
List<List<List<Writable>>> transformedSeq = LocalTransformExecutor.executeToSequence(sequenceReader, transformProcess)
List<List<Writable>> joined = LocalTransformExecutor.executeJoin(join, leftReader, rightReader)
```
When using Spark this looks like:
```java
import org.datavec.spark.transforms.SparkTransformExecutor;
JavaRDD<List<Writable>> transformed = SparkTransformExecutor.execute(inputRdd, transformProcess)
JavaRDD<List<List<Writable>>> transformedSeq = SparkTransformExecutor.executeToSequence(inputSequenceRdd, transformProcess)
JavaRDD<List<Writable>> joined = SparkTransformExecutor.executeJoin(join, leftRdd, rightRdd)
```
## Available executors
{{autogenerated}}