Paul Rogers edited this page Aug 22, 2016 · 1 revision

Record Readers

Record readers read data from files in DFS, converting the data into a series of value vectors. A reader is associated with a FormatPlugin and defined by a FormatPluginConfig. Each format plugin is associated with a StoragePlugin, which provides access to the file system that stores the data being read. Each format plugin can also define a RecordWriter to support CTAS (CREATE TABLE AS SELECT) operations.

Each record reader instance is associated with a single file (or portion of a file) in the file system defined by the storage plugin.

Configuration and Creation

Data Flow

The Reader API

The actual RecordReader API is quite simple:

void setup(OperatorContext context, OutputMutator output) throws ExecutionSetupException;
void allocate(Map<String, ValueVector> vectorMap) throws OutOfMemoryException;
int next();

The setup() method ...

The allocate() method ...

The next() method reads a fixed number of records into a previously allocated record batch (a set of value vectors). Each call to next() either introduces a new schema, reuses the existing schema, or signals EOF by returning 0. Note that a schema change must occur at a record batch boundary.
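The calling pattern implied by this contract can be sketched as a loop that keeps calling next() until it returns 0. The sketch below is illustrative only: SimpleReader, CountingReader, and drain() are hypothetical names invented for this example, not part of Drill, and the interface is deliberately stripped of the OperatorContext, OutputMutator, and value vector arguments that the real API takes.

```java
import java.util.ArrayList;
import java.util.List;

public class ReaderLoopSketch {

    /** Hypothetical, simplified stand-in for the reader API (not Drill's actual interface). */
    interface SimpleReader {
        void setup();   // one-time initialization before the first batch
        int next();     // returns the number of records read; 0 signals EOF
    }

    /** Toy reader that hands out a fixed total of records in batches of at most batchSize. */
    static class CountingReader implements SimpleReader {
        private final int batchSize;
        private int remaining;

        CountingReader(int totalRecords, int batchSize) {
            this.remaining = totalRecords;
            this.batchSize = batchSize;
        }

        @Override
        public void setup() {
            // A real reader would open its file and create value vectors here.
        }

        @Override
        public int next() {
            // Assumption of this toy: the final batch may be shorter than batchSize.
            int count = Math.min(batchSize, remaining);
            remaining -= count;
            return count;
        }
    }

    /** Drives a reader to EOF and returns the size of each batch it produced. */
    static List<Integer> drain(SimpleReader reader) {
        reader.setup();
        List<Integer> batchSizes = new ArrayList<>();
        int count;
        while ((count = reader.next()) > 0) {
            batchSizes.add(count);
        }
        return batchSizes;
    }

    public static void main(String[] args) {
        // 10 records in batches of 4
        System.out.println(drain(new CountingReader(10, 4))); // prints [4, 4, 2]
    }
}
```

In this sketch the caller, not the reader, decides when to stop: it treats a return value of 0 as the only EOF signal, which mirrors the description of next() above.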

Schema Handling

Value Vector Creation
