---
title: DataVec Readers
short_title: Readers
description: Read individual records from different formats.
category: DataVec
weight: 2
---

## Why readers?

Readers iterate over the records of a dataset in storage and load the data into DataVec. They are useful beyond reading individual entries one at a time: you might want to train a text generator on a corpus, or programmatically compose two entries into a new record. Reader implementations are also useful for complex file types and distributed storage mechanisms.

Readers return `Writable` objects, one for each column in a `Record`. These objects are used to convert each record to a tensor/ND-Array format.
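
The column-to-number conversion described above can be sketched in plain Java. This is a minimal illustration, not DataVec code: `ColumnValue`, `number`, `text`, and `toRow` are hypothetical names, and the stand-in interface only assumes that each column value can report itself as a `double`.

```java
import java.util.Arrays;
import java.util.List;

class RecordToRowSketch {
    // Hypothetical stand-in for a column value; NOT a DataVec class.
    interface ColumnValue {
        double toDouble();
    }

    // Wraps a column that already holds a number.
    static ColumnValue number(double v) {
        return () -> v;
    }

    // Wraps a text column that holds a number, such as a CSV field.
    static ColumnValue text(String s) {
        return () -> Double.parseDouble(s);
    }

    // Converts one record (a list of column values) into a numeric row.
    static double[] toRow(List<ColumnValue> record) {
        double[] row = new double[record.size()];
        for (int i = 0; i < row.length; i++) {
            row[i] = record.get(i).toDouble();
        }
        return row;
    }

    public static void main(String[] args) {
        List<ColumnValue> record = Arrays.asList(number(5.1), text("3.5"), number(0));
        System.out.println(Arrays.toString(toRow(record)));
    }
}
```

Stacking one such row per record gives the two-dimensional numeric layout that a tensor library expects.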

## Usage

Each reader implementation extends `BaseRecordReader` and provides a simple API for selecting the next record in a dataset, much like an iterator.

Useful methods include:

- `next`: Return the next record as a list of `Writable`, one per column.
- `nextRecord`: Return a single `Record`, optionally with `RecordMetaData`.
- `reset`: Reset the underlying iterator so that reading starts again from the first record.
- `hasNext`: Return whether another record is available.
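
The iterator-style contract behind these methods can be sketched without any DataVec dependency. `InMemoryReader` and `countRecords` below are hypothetical stand-ins, not DataVec classes; they only mirror the `hasNext`, `next`, and `reset` method names, and each record is assumed to be a plain list of strings.

```java
import java.util.Arrays;
import java.util.List;

class ReaderLoopSketch {
    // Hypothetical in-memory reader; NOT a DataVec class.
    static class InMemoryReader {
        private final List<List<String>> records;
        private int position = 0;

        InMemoryReader(List<List<String>> records) {
            this.records = records;
        }

        // True while there are unread records.
        boolean hasNext() {
            return position < records.size();
        }

        // Returns the current record and advances the cursor.
        List<String> next() {
            return records.get(position++);
        }

        // Moves the cursor back to the first record.
        void reset() {
            position = 0;
        }
    }

    // Drains the reader and returns how many records it produced.
    static int countRecords(InMemoryReader reader) {
        int count = 0;
        while (reader.hasNext()) {
            reader.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        InMemoryReader reader = new InMemoryReader(Arrays.asList(
                Arrays.asList("5.1", "3.5", "setosa"),
                Arrays.asList("7.0", "3.2", "versicolor")));

        System.out.println(countRecords(reader)); // first pass over the data
        System.out.println(reader.hasNext());     // the reader is now exhausted
        reader.reset();
        System.out.println(countRecords(reader)); // second pass after reset
    }
}
```

The `reset` call is what makes repeated passes over the same data possible, for example one pass per training epoch.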

## Listeners

You can attach a custom `RecordListener` to a record reader for debugging or visualization. Pass your listener to the `addListener` base method immediately after initializing the reader.
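
The listener hook follows the usual observer pattern, which can be sketched in plain Java. The `Listener` interface and `NotifyingReader` class below are hypothetical stand-ins and do not reproduce DataVec's `RecordListener` signatures; the sketch only assumes that the reader notifies every registered listener after each record is read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListenerHookSketch {
    // Hypothetical callback; NOT DataVec's RecordListener interface.
    interface Listener {
        void recordRead(List<String> record);
    }

    // Hypothetical reader that notifies its listeners after every read.
    static class NotifyingReader {
        private final List<List<String>> records;
        private final List<Listener> listeners = new ArrayList<>();
        private int position = 0;

        NotifyingReader(List<List<String>> records) {
            this.records = records;
        }

        void addListener(Listener listener) {
            listeners.add(listener);
        }

        boolean hasNext() {
            return position < records.size();
        }

        List<String> next() {
            List<String> record = records.get(position++);
            for (Listener listener : listeners) {
                listener.recordRead(record);
            }
            return record;
        }
    }

    public static void main(String[] args) {
        NotifyingReader reader = new NotifyingReader(Arrays.asList(
                Arrays.asList("5.1", "setosa"),
                Arrays.asList("7.0", "versicolor")));

        // Attach the listener right after creating the reader, before any reads.
        List<String> seen = new ArrayList<>();
        reader.addListener(record -> seen.add(String.join(",", record)));

        while (reader.hasNext()) {
            reader.next();
        }
        System.out.println(seen);
    }
}
```

Registering the listener before the first read matters: a listener added later would miss the records already consumed.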

## Types of readers

{{autogenerated}}