---
title: DataVec Transforms
short_title: Transforms
description: Data wrangling and mapping from one schema to another.
category: DataVec
weight: 1
---

## Data wrangling

Transformations are one of DataVec's key tools. DataVec maps a dataset from one schema to another and provides operations to convert types, format data, and convert a 2D dataset to sequence data.

## Building a transform process

A transform process requires a `Schema` to transform data. The `Schema` and `TransformProcess` classes each come with a helper `Builder` class, which helps organize code and avoids complex constructors.

Combined, they look like the sample code below. Note how `inputDataSchema` is passed into the `Builder` constructor; without it, your transform process will fail to compile.

```java
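import java.util.Arrays;
import java.util.HashSet;

import org.datavec.api.transform.TransformProcess;
import org.datavec.api.transform.condition.ConditionOp;
import org.datavec.api.transform.condition.column.CategoricalColumnCondition;
import org.datavec.api.transform.condition.column.DoubleColumnCondition;
import org.datavec.api.transform.filter.ConditionFilter;
import org.datavec.api.transform.transform.time.DeriveColumnsFromTimeTransform;
import org.datavec.api.writable.DoubleWritable;
import org.joda.time.DateTimeFieldType;
import org.joda.time.DateTimeZone;

TransformProcess tp = new TransformProcess.Builder(inputDataSchema)
    .removeColumns("CustomerID","MerchantID")
    //A filter removes the rows that match its condition: here, country code not in {USA, CAN}
    .filter(new ConditionFilter(new CategoricalColumnCondition("MerchantCountryCode", ConditionOp.NotInSet, new HashSet<>(Arrays.asList("USA","CAN")))))
    .conditionalReplaceValueTransform(
        "TransactionAmountUSD",     //Column to operate on
        new DoubleWritable(0.0),    //New value to use, when the condition is satisfied
        new DoubleColumnCondition("TransactionAmountUSD",ConditionOp.LessThan, 0.0)) //Condition: amount < 0.0
    .stringToTimeTransform("DateTimeString","yyyy-MM-dd HH:mm:ss.SSS", DateTimeZone.UTC) //Joda-Time pattern: dd = day of month (DD would be day of year)
    .renameColumn("DateTimeString", "DateTime")
    .transform(new DeriveColumnsFromTimeTransform.Builder("DateTime").addIntegerDerivedColumn("HourOfDay", DateTimeFieldType.hourOfDay()).build())
    .removeColumns("DateTime")
    .build();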
```
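
The example above assumes that `inputDataSchema` already exists. Below is a minimal sketch of how it might be built with `Schema.Builder`. The column types and the category list (`"FR"` and `"MX"` in particular) are assumptions inferred from how the transform process uses each column, not something this page defines. The class name `SchemaSketch` is hypothetical. The sketch needs the DataVec API on the classpath.

```java
import java.util.Arrays;

import org.datavec.api.transform.schema.Schema;

public class SchemaSketch {

    // Builds a schema with the five columns the transform process refers to.
    // Column types are assumed from usage; adjust them to match your data.
    public static Schema buildInputSchema() {
        return new Schema.Builder()
            .addColumnString("DateTimeString")
            .addColumnsString("CustomerID", "MerchantID")
            // Hypothetical category list, for illustration only
            .addColumnCategorical("MerchantCountryCode", Arrays.asList("USA", "CAN", "FR", "MX"))
            .addColumnDouble("TransactionAmountUSD")
            .build();
    }

    public static void main(String[] args) {
        Schema inputDataSchema = buildInputSchema();
        System.out.println(inputDataSchema); // print the schema for inspection
    }
}
```

Columns are declared in the order in which values appear in each record, so the order of the `addColumn...` calls should match the layout of your input data.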

## Executing a transformation

Several executor "backends" are available. Using the `tp` transform process above, here's how you can execute it locally using plain DataVec. Here `originalData` is the input data as a `List<List<Writable>>`, with one inner list per record.

```java
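import java.util.List;
import org.datavec.api.writable.Writable;
import org.datavec.local.transforms.LocalTransformExecutor;

//originalData holds the input records, one inner list of Writable values per record
List<List<Writable>> processedData = LocalTransformExecutor.execute(originalData, tp);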
```
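
To make the shape of `originalData` concrete, here is a small end-to-end sketch that runs a reduced transform process on two in-memory records. The two-column schema, the sample values, and the class name `LocalExecutionSketch` are hypothetical and chosen only for illustration. It assumes the DataVec API and the local executor module are on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.datavec.api.transform.TransformProcess;
import org.datavec.api.transform.condition.ConditionOp;
import org.datavec.api.transform.condition.column.DoubleColumnCondition;
import org.datavec.api.transform.schema.Schema;
import org.datavec.api.writable.DoubleWritable;
import org.datavec.api.writable.Text;
import org.datavec.api.writable.Writable;
import org.datavec.local.transforms.LocalTransformExecutor;

public class LocalExecutionSketch {

    public static List<List<Writable>> run() {
        // Hypothetical two-column schema, a subset of the example above
        Schema schema = new Schema.Builder()
            .addColumnString("CustomerID")
            .addColumnDouble("TransactionAmountUSD")
            .build();

        // Drop the ID column, then replace negative amounts with 0.0
        TransformProcess tp = new TransformProcess.Builder(schema)
            .removeColumns("CustomerID")
            .conditionalReplaceValueTransform(
                "TransactionAmountUSD",
                new DoubleWritable(0.0),
                new DoubleColumnCondition("TransactionAmountUSD", ConditionOp.LessThan, 0.0))
            .build();

        // One inner list per record, with values in schema column order
        List<List<Writable>> originalData = new ArrayList<>();
        originalData.add(Arrays.<Writable>asList(new Text("c1"), new DoubleWritable(12.5)));
        originalData.add(Arrays.<Writable>asList(new Text("c2"), new DoubleWritable(-3.0)));

        return LocalTransformExecutor.execute(originalData, tp);
    }

    public static void main(String[] args) {
        for (List<Writable> record : run()) {
            System.out.println(record);
        }
    }
}
```

Each returned record should contain a single amount value, with the negative amount replaced by `0.0`.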

## Debugging

Each operation in a transform process is a "step" that changes the schema. When the result of a transformation is not what you intended, you can debug it by printing the schema after each step of the transform process `tp`:

```java
//Print the schema after each step of the transform process:
int numActions = tp.getActionList().size();

for (int i = 0; i < numActions; i++) {
    System.out.println("\n\n==================================================");
    System.out.println("-- Schema after step " + i + " (" + tp.getActionList().get(i) + ") --");

    System.out.println(tp.getSchemaAfterStep(i));
}
```

## Available transformations and conversions

{{autogenerated}}