Skip to content

Runtime Operator Menagerie

Paul Rogers edited this page Sep 21, 2016 · 4 revisions

Runtime Operator Menagerie

Drill provides a large collection of runtime operators used to implement a query. Below is a list of (a subset of) the operators along with their major attributes.

Note: A big thanks to Aman Sinha: some of the material below is borrowed from one of his documents.

Most class names below are from the org.apache.drill.exec.physical.impl package. Operators derive from RecordBatch, often via an abstract base class.

HashToRandomExchange

Gets an input row, computes a hash value on the distribution key, determines the destination receiver based on the hash value, and sends the row in a batch operation. The join key or aggregation group-by keys are examples of distribution keys. The destination receiver is a minor fragment on a destination node.

HashToMergeExchange

Similar to the HashToRandomExchange operator, except that each destination receiver merges incoming streams of sorted data received from a sender.

UnionExchange

A serialization operator in which each sender sends to a single (common) destination. The receiver “unions” the input streams from various senders.

SingleMergeExchange

A distribution operator in which each sender sends a sorted stream of data to a single receiver. The receiver performs a Merge operation to merge all of the incoming streams. This operator is useful when performing an ORDER BY operation that requires a final global ordering.

BroadcastExchange

A distribution operation in which each sender sends its input data to all N receivers via a broadcast.

UnorderedMuxExchange

Multiplexes the data from all minor fragments on a node so the data can be sent out on a single channel to a destination receiver. A sender node only needs to maintain buffers for each receiving node instead of each receiving minor fragment on every node.

Sort (non-spillable)

Reads the entire data set from its input batch (upstream operator), buffers the data in memory, applies the desired sort, and returns the resulting data (divided into batches) to its downstream operator.

Properties:

  • Runtime class: SortBatch
  • Memory use: Unbounded, depends on the size of the upstream result set.
  • Schema change: not supported. A schema change from the upstream batch causes a runtime exception.
  • Usage: Never actually used; this is a historical artifact.

Sort (Spillable)

Sorts data, spilling to disk as needed. Reads the entire data set from its input batch (upstream operator), buffers the data in memory or on disk, applies the desired sort, and returns the resulting data (divided into batches) to its downstream operator.

Properties:

  • Runtime class: ExternalSortBatch
  • Spill location: Default: /tmp

Configuration Properties:

  • drill.exec.sort.external.spill.fs - The file system to use for spilling. Local file system by default.
  • drill.exec.sort.external.spill.directories - List of directories to use for spilling. [/tmp] by default.
  • drill.exec.sort.external.spill.group.size
  • drill.exec.sort.external.spill.threshold

References:

Clone this wiki locally