Paul Rogers edited this page Aug 4, 2016 · 1 revision

Operator Creation

Operators are transform nodes within the execution graph; each performs a single operation on an incoming tuple set to produce an outgoing tuple set. Operators are defined using an operator definition, which is materialized in Java from a JSON description of the operator. The following describes how a fragment creates an operator from its operator definition.

  • The operator definition is deserialized from a JSON payload sent from (the foreman?).
  • The class of the operator definition is a key into a map of definition classes to operator creator classes.
  • The operator creator takes a pair of (definition, incoming operator) and produces the operator.
  • The operator instance is created and initialized.
  • Upon receipt of the first record schema, the operator generates the code that performs the actual operator functionality.
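The lookup steps above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not Drill's code: Operator, Creator, FilterDef, ScanDef, and CreatorRegistry are hypothetical stand-ins, and JSON deserialization is assumed to have already produced the definition objects. The point it shows is that the definition's runtime class is the key used to find its creator, which then builds the operator from the definition and its incoming operators.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an operator instance (not Drill's API).
interface Operator {
  String describe();
}

// Hypothetical creator (factory): definition + incoming operators -> operator.
interface Creator<D> {
  Operator create(D definition, List<Operator> children);
}

// Hypothetical definition POJOs, as if deserialized from the fragment's JSON.
class FilterDef {
  final String condition;
  FilterDef(String condition) { this.condition = condition; }
}

class ScanDef {
  final String table;
  ScanDef(String table) { this.table = table; }
}

class CreatorRegistry {
  // Maps a definition class to the creator that knows how to build its operator.
  private final Map<Class<?>, Creator<?>> creators = new HashMap<>();

  <D> void register(Class<D> definitionClass, Creator<D> creator) {
    creators.put(definitionClass, creator);
  }

  // The definition's runtime class is the lookup key for its creator.
  @SuppressWarnings("unchecked")
  <D> Operator create(D definition, List<Operator> children) {
    Creator<D> creator = (Creator<D>) creators.get(definition.getClass());
    if (creator == null) {
      throw new IllegalArgumentException("No creator for " + definition.getClass().getName());
    }
    return creator.create(definition, children);
  }
}
```

Keying on the definition class means the definitions need not share a common base class; any POJO type can serve as a key.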

Operator Definition

The operator definition is a POJO deserialized from JSON. There is no "operator definition" base class.

Operator Factory Registry

Operators are not created directly. Instead, each operator has an associated factory class (termed a "creator" in Drill). For example, the FilterRecordBatch operator has a corresponding FilterBatchCreator. Drill must find the correct factory for each operator, but all it starts with is the operator definition. It falls to the OperatorCreatorRegistry to map from the definition class to the factory class.

The registry is initialized using a class scan created elsewhere. The scan enumerates all classes visible to Drill (that is, all classes in all jars on the Drill class path). The registry looks for all classes that implement BatchCreator. To identify the associated definition, the registry uses Java reflection to find the type of the second parameter of the factory class's getBatch method. For example:

public class FilterBatchCreator implements BatchCreator<Filter> {
  public FilterRecordBatch getBatch(FragmentContext context, Filter config, List<RecordBatch> children) { ... }
}

The Filter config argument above gives the definition class type.
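The reflection step can be sketched as below. This is a hypothetical, simplified model, not Drill's OperatorCreatorRegistry: BatchCreator, RecordBatch, FragmentContext, Filter, and FilterRecordBatch here are minimal stand-ins, and the list of classes passed to build stands in for the class-path scan. One detail any such scan must handle is that, because BatchCreator is generic, the Java compiler also emits a synthetic bridge method getBatch(FragmentContext, Object, List) in FilterBatchCreator; the sketch skips bridge methods so that it reads the real parameter type, Filter.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal stand-ins for Drill's types (not the real API).
interface RecordBatch {}

class FragmentContext {}

interface BatchCreator<T> {
  RecordBatch getBatch(FragmentContext context, T config, List<RecordBatch> children);
}

// Example definition and creator, mirroring the FilterBatchCreator snippet.
class Filter {}

class FilterRecordBatch implements RecordBatch {}

class FilterBatchCreator implements BatchCreator<Filter> {
  @Override
  public FilterRecordBatch getBatch(FragmentContext context, Filter config, List<RecordBatch> children) {
    return new FilterRecordBatch();
  }
}

class CreatorRegistrySketch {
  // Builds a map from definition class to creator instance, given the
  // classes found by a (here simulated) class-path scan.
  static Map<Class<?>, BatchCreator<?>> build(List<Class<?>> scannedClasses) {
    Map<Class<?>, BatchCreator<?>> registry = new HashMap<>();
    for (Class<?> c : scannedClasses) {
      // Keep only concrete classes that implement BatchCreator.
      if (!BatchCreator.class.isAssignableFrom(c)
          || c.isInterface()
          || Modifier.isAbstract(c.getModifiers())) {
        continue;
      }
      for (Method m : c.getDeclaredMethods()) {
        // Skip other methods and the synthetic bridge getBatch(FragmentContext, Object, List).
        if (!m.getName().equals("getBatch") || m.getParameterCount() != 3 || m.isBridge()) {
          continue;
        }
        // The type of the second parameter identifies the definition class.
        Class<?> definitionClass = m.getParameterTypes()[1];
        try {
          registry.put(definitionClass, (BatchCreator<?>) c.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
          throw new IllegalStateException("Cannot instantiate " + c.getName(), e);
        }
      }
    }
    return registry;
  }
}
```

Without the isBridge() check, the sketch could register the creator under Object.class instead of (or in addition to) Filter.class, depending on method order.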

Operator Instance

Recall that a "record batch" acts both as the runtime instance of an operator and as the tuple set produced by that operator. As an operator, the record batch must be given the incoming (upstream, child) operator that produces its input. As a tuple set, the record batch requires a schema, which it determines when it receives the schema from the incoming operator.
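This dual role can be sketched with a hypothetical, simplified model (not Drill's RecordBatch API); Batch, Outcome, FixedSource, and PassThroughBatch are illustrative names only. The operator is handed its incoming operator when it is created, and its own schema stays unknown until the incoming operator reports one, which is also the point at which a real operator would generate its code.

```java
import java.util.List;

// Hypothetical, simplified model of a record batch (not Drill's actual API).
enum Outcome { NEW_SCHEMA, OK, NONE }

interface Batch {
  Outcome next();         // operator role: produce the next tuple set
  List<String> schema();  // tuple-set role: column names of the current tuple set
}

// Upstream stand-in: emits a fixed number of tuple sets with one fixed schema.
class FixedSource implements Batch {
  private final List<String> columns;
  private final int batchCount;
  private int emitted = 0;

  FixedSource(List<String> columns, int batchCount) {
    this.columns = columns;
    this.batchCount = batchCount;
  }

  @Override
  public Outcome next() {
    if (emitted >= batchCount) return Outcome.NONE;
    emitted++;
    // The first tuple set announces the schema; later ones reuse it.
    return emitted == 1 ? Outcome.NEW_SCHEMA : Outcome.OK;
  }

  @Override
  public List<String> schema() { return columns; }
}

// Downstream operator: given its incoming operator at creation, but it
// learns its own schema only from that incoming operator.
class PassThroughBatch implements Batch {
  private final Batch incoming;
  private List<String> schema;  // null until the incoming operator reports a schema
  private int schemaChanges = 0;

  PassThroughBatch(Batch incoming) { this.incoming = incoming; }

  @Override
  public Outcome next() {
    Outcome outcome = incoming.next();
    if (outcome == Outcome.NEW_SCHEMA) {
      schema = incoming.schema();  // derive this operator's schema from upstream
      schemaChanges++;             // a real operator would (re)generate code here
    }
    return outcome;
  }

  @Override
  public List<String> schema() {
    if (schema == null) throw new IllegalStateException("schema not yet known");
    return schema;
  }

  int schemaChanges() { return schemaChanges; }
}
```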

Generated Operator Implementation

The operator also needs an implementation of the actual operation. Recall that value vectors are strongly typed, leading to over 100 different classes. The operator uses code generation to implement the actual operation, which avoids the overhead of interpreting the operation. Code generation relies on a complex system (described elsewhere). The generated code implements an interface specific to the operator, and the operator delegates to the generated code for each tuple set.
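The delegation pattern can be sketched as follows. In Drill the implementation is generated at runtime; in this sketch, hand-written classes stand in for the generated ones, and RowFilter, GeneratedIntGreaterThan, GeneratedLongGreaterThan, and FilterOperator are hypothetical names. What it shows is the shape: once the schema (here, just the column type) is known, the operator selects a type-specialized implementation of an operator-specific interface, then delegates every tuple set to it.

```java
import java.util.ArrayList;
import java.util.List;

// Operator-specific interface implemented by the (here hand-written) "generated" code.
interface RowFilter {
  // Returns the indexes of the rows in this tuple set that satisfy the condition.
  List<Integer> filter(Object columnValues);
}

// Stand-in for code generated for an int column: loops over int[] directly,
// with no per-value type checks.
class GeneratedIntGreaterThan implements RowFilter {
  private final int threshold;
  GeneratedIntGreaterThan(int threshold) { this.threshold = threshold; }

  @Override
  public List<Integer> filter(Object columnValues) {
    int[] values = (int[]) columnValues;
    List<Integer> selected = new ArrayList<>();
    for (int i = 0; i < values.length; i++) {
      if (values[i] > threshold) selected.add(i);
    }
    return selected;
  }
}

// Stand-in for code generated for a long column.
class GeneratedLongGreaterThan implements RowFilter {
  private final long threshold;
  GeneratedLongGreaterThan(long threshold) { this.threshold = threshold; }

  @Override
  public List<Integer> filter(Object columnValues) {
    long[] values = (long[]) columnValues;
    List<Integer> selected = new ArrayList<>();
    for (int i = 0; i < values.length; i++) {
      if (values[i] > threshold) selected.add(i);
    }
    return selected;
  }
}

// The operator: picks an implementation when the schema is known, then delegates.
class FilterOperator {
  private final int threshold;
  private RowFilter impl;  // null until setup() sees a schema

  FilterOperator(int threshold) { this.threshold = threshold; }

  // Called on a new schema; this is where a real operator would generate code.
  void setup(Class<?> columnType) {
    if (columnType == int.class) {
      impl = new GeneratedIntGreaterThan(threshold);
    } else if (columnType == long.class) {
      impl = new GeneratedLongGreaterThan(threshold);
    } else {
      throw new IllegalArgumentException("Unsupported column type: " + columnType);
    }
  }

  // Per tuple set, the operator simply delegates to the selected implementation.
  List<Integer> process(Object columnValues) {
    if (impl == null) throw new IllegalStateException("setup() has not been called");
    return impl.filter(columnValues);
  }
}
```

Because each implementation is specialized to one vector type, the per-row loop contains no type dispatch; the only dispatch is the single interface call per tuple set.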
