cavis/docs/datavec/templates/transforms.md

64 lines
2.8 KiB
Markdown

---
title: DataVec Transforms
short_title: Transforms
description: Data wrangling and mapping from one schema to another.
category: DataVec
weight: 1
---
## Data wrangling
One of the key tools in DataVec is transformations. DataVec helps the user map a dataset from one schema to another, and provides a list of operations to convert types, format data, and convert a 2D dataset to sequence data.
## Building a transform process
A transform process requires a `Schema` to successfully transform data. Both schema and transform process classes come with a helper `Builder` class which are useful for organizing code and avoiding complex constructors.
When both are combined together they look like the sample code below. Note how `inputDataSchema` is passed into the `Builder` constructor. Your transform process will fail to compile without it.
```java
import org.datavec.api.transform.TransformProcess;
TransformProcess tp = new TransformProcess.Builder(inputDataSchema)
.removeColumns("CustomerID","MerchantID")
.filter(new ConditionFilter(new CategoricalColumnCondition("MerchantCountryCode", ConditionOp.NotInSet, new HashSet<>(Arrays.asList("USA","CAN")))))
.conditionalReplaceValueTransform(
"TransactionAmountUSD", //Column to operate on
new DoubleWritable(0.0), //New value to use, when the condition is satisfied
new DoubleColumnCondition("TransactionAmountUSD",ConditionOp.LessThan, 0.0)) //Condition: amount < 0.0
.stringToTimeTransform("DateTimeString","YYYY-MM-DD HH:mm:ss.SSS", DateTimeZone.UTC)
.renameColumn("DateTimeString", "DateTime")
.transform(new DeriveColumnsFromTimeTransform.Builder("DateTime").addIntegerDerivedColumn("HourOfDay", DateTimeFieldType.hourOfDay()).build())
.removeColumns("DateTime")
.build();
```
## Executing a transformation
Different "backends" for executors are available. Using the `tp` transform process above, here's how you can execute it locally using plain DataVec.
```java
import org.datavec.local.transforms.LocalTransformExecutor;
List<List<Writable>> processedData = LocalTransformExecutor.execute(originalData, tp);
```
## Debugging
Each operation in a transform process represents a "step" in schema changes. Sometimes, the resulting transformation is not the intended result. You can debug this by printing each step in the transform `tp` with the following:
```java
//Now, print the schema after each time step:
int numActions = tp.getActionList().size();
for(int i=0; i<numActions; i++ ){
System.out.println("\n\n==================================================");
System.out.println("-- Schema after step " + i + " (" + tp.getActionList().get(i) + ") --");
System.out.println(tp.getSchemaAfterStep(i));
}
```
## Available transformations and conversions
{{autogenerated}}