
title: DataVec Readers
short_title: Readers
description: Read individual records from different formats.
category: DataVec
weight: 2

Why readers?

Readers iterate over the records of a dataset in storage and load the data into DataVec. Their usefulness goes beyond reading individual entries: what if you wanted to train a text generator on an entire corpus, or programmatically combine two entries into a new record? Reader implementations are particularly useful for complex file types and distributed storage mechanisms.

Readers return Writable objects, one describing each column of a record. These Writables are then used to convert each record into a tensor/NDArray format.
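To make the column-to-tensor step concrete, here is a plain-Java sketch that does not use DataVec itself. `ColumnValue`, `NumberColumn`, `CategoryColumn`, and `RecordToVector` are hypothetical stand-ins for `Writable` and for the conversion step, and a `double[]` stands in for an NDArray; the real DataVec and ND4J classes are not shown here.

```java
import java.util.List;

// Hypothetical stand-in for a Writable: one typed value in a record.
interface ColumnValue {
    double toDouble();
}

// Hypothetical numeric column.
final class NumberColumn implements ColumnValue {
    private final double value;

    NumberColumn(double value) {
        this.value = value;
    }

    @Override
    public double toDouble() {
        return value;
    }
}

// Hypothetical categorical column, encoded as its index in a fixed list of categories.
final class CategoryColumn implements ColumnValue {
    private final String label;
    private final List<String> categories;

    CategoryColumn(String label, List<String> categories) {
        this.label = label;
        this.categories = categories;
    }

    @Override
    public double toDouble() {
        int idx = categories.indexOf(label);
        if (idx < 0) {
            throw new IllegalArgumentException("unknown category: " + label);
        }
        return idx;
    }
}

final class RecordToVector {
    // Flatten one record into a feature vector, one entry per column.
    static double[] toVector(List<ColumnValue> record) {
        double[] out = new double[record.size()];
        for (int i = 0; i < record.size(); i++) {
            out[i] = record.get(i).toDouble();
        }
        return out;
    }
}
```

Each column knows how to turn itself into a number, so the conversion loop stays independent of the column types; stacking many such vectors row by row would give a batch matrix.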

Usage

Each reader implementation extends BaseRecordReader and provides a simple, iterator-like API for retrieving the next record in a dataset.

Useful methods include:

  • next: Return the next record as a list of Writable values.
  • nextRecord: Return a single Record, optionally with RecordMetaData.
  • reset: Reset the underlying iterator.
  • hasNext: Iterator method to determine if another record is available.
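A minimal sketch of this iterator-like contract, assuming an in-memory source of comma-separated lines. `ToyRecordReader`, `ToyRecord`, and `ToyReaderDemo` are hypothetical and do not extend `BaseRecordReader`; values are kept as plain strings rather than `Writable` objects, and the record's position stands in for `RecordMetaData`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical record plus metadata; the source position plays the role of RecordMetaData.
final class ToyRecord {
    final List<String> values;
    final int index;

    ToyRecord(List<String> values, int index) {
        this.values = values;
        this.index = index;
    }
}

// Hypothetical in-memory reader over comma-separated lines (not DataVec's BaseRecordReader).
final class ToyRecordReader {
    private final List<String> lines;
    private int cursor = 0;

    ToyRecordReader(List<String> lines) {
        this.lines = new ArrayList<>(lines);
    }

    // True while at least one unread record remains.
    boolean hasNext() {
        return cursor < lines.size();
    }

    // Next record as a list of column values.
    List<String> next() {
        return nextRecord().values;
    }

    // Next record together with its metadata.
    ToyRecord nextRecord() {
        if (!hasNext()) {
            throw new NoSuchElementException("no more records");
        }
        int idx = cursor++;
        return new ToyRecord(List.of(lines.get(idx).split(",")), idx);
    }

    // Rewind to the first record so the data can be read again.
    void reset() {
        cursor = 0;
    }
}

class ToyReaderDemo {
    public static void main(String[] args) {
        ToyRecordReader reader = new ToyRecordReader(List.of("5.1,3.5,setosa", "6.2,2.9,virginica"));
        while (reader.hasNext()) {
            System.out.println(reader.next());
        }
        reader.reset(); // ready for a second pass
    }
}
```

Calling `reset()` after a full pass lets the same reader be iterated again, which is how a training loop would typically reuse one reader across epochs.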

Listeners

You can attach a custom RecordListener to a record reader for debugging or visualization. Pass your listener to the addListener base method immediately after initializing the reader.
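The listener pattern can be sketched in plain Java as follows. `ReadListener`, `CountingListener`, `ListeningReader`, and `ListenerDemo` are hypothetical stand-ins, not DataVec's `RecordListener` or `BaseRecordReader`; the sketch only shows why registering the listener before the first read matters: every record read afterwards triggers a callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical listener callback, invoked once per record read.
interface ReadListener {
    void recordRead(List<String> record);
}

// Hypothetical listener that counts records and logs them for debugging.
final class CountingListener implements ReadListener {
    int count = 0;

    @Override
    public void recordRead(List<String> record) {
        count++;
        System.out.println("read #" + count + ": " + record);
    }
}

// Hypothetical minimal reader that notifies its listeners on every next() call.
final class ListeningReader {
    private final List<List<String>> records;
    private final List<ReadListener> listeners = new ArrayList<>();
    private int cursor = 0;

    ListeningReader(List<List<String>> records) {
        this.records = records;
    }

    // Register listeners right after construction, before any records are read.
    void addListener(ReadListener listener) {
        listeners.add(listener);
    }

    boolean hasNext() {
        return cursor < records.size();
    }

    List<String> next() {
        if (!hasNext()) {
            throw new NoSuchElementException("no more records");
        }
        List<String> record = records.get(cursor++);
        for (ReadListener l : listeners) {
            l.recordRead(record);
        }
        return record;
    }
}

class ListenerDemo {
    public static void main(String[] args) {
        ListeningReader reader = new ListeningReader(List.of(List.of("a", "1"), List.of("b", "2")));
        reader.addListener(new CountingListener());
        while (reader.hasNext()) {
            reader.next();
        }
    }
}
```

A listener added after some records have already been read would simply miss those earlier callbacks, which is why registration belongs right after the reader is created.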

Types of readers

{{autogenerated}}