
Drill Executor Memory Management

Paul Rogers edited this page Aug 19, 2016 · 7 revisions

Drill uses two kinds of memory: the usual Java heap memory for Java-created objects, and an elaborate memory system built on direct memory for execution-time buffers. This article focuses on the second.

Memory Management Goals

Memory management is easier to understand if we start with the goals:

  • Use direct ("unsafe") memory allocated by the OS
  • Use buffers compatible with Netty, the network library Drill uses
  • Build value vectors from the Netty-compatible buffers
  • Support value vector operations that implement relational operators: select, project, broadcast, merge, etc.
  • Minimize data copies by transferring ownership of memory chunks

The Drill memory allocator starts with Netty and layers on the additional functionality needed to make Netty buffers work as Drill value vectors.

Netty and ByteBuf

Drill is a distributed system and thus makes heavy use of network transport. Tuple sets are often transferred from one Drillbit to another. Drill's designers sought to use the same data format for both internal value vectors and network transport. Netty handles network transport using an asynchronous pipeline model that performs I/O using byte buffers. Netty is built around Java NIO. NIO defines the ByteBuffer class, which serves the dual purpose of holding a buffer of bytes for network I/O and providing a rich set of operations to access the data encoded in those bytes.

Netty provides a similar class called ByteBuf. A key difference between ByteBuffer and ByteBuf is that the Netty version is reference counted, while the Java version relies on the usual Java garbage collection.

The life-cycle of a buffer depends on the "direction" of the buffer in Netty: input or output.

Netty creates input buffers to hold data read from the wire:

  • The application provides a ByteBufAllocator implementation (or uses the Netty default) to allocate buffers.
  • Netty's channel class calls ByteBufAllocator.ioBuffer() (via RecvByteBufAllocator.Handle) to allocate an input buffer.
  • Netty places received bytes into the buffer.
  • Netty places the buffer into the input pipeline.
  • Netty calls the channelRead() method on the application's ChannelInboundHandlerAdapter handler subclass.
  • The application processes the data.
  • channelRead() (or elsewhere in the application) calls ByteBuf.release().
  • The ByteBuf implementation releases memory if the reference count drops to zero.

The application creates output buffers and hands them to Netty, which writes the data and discards the buffer:

  • Application creates the buffer.
  • Application calls ChannelHandlerContext.write() to add the buffer to the Netty pipeline.
  • Netty's channel class implementation writes the buffer to the output channel and calls ByteBuf.release().
  • The ByteBuf implementation releases memory if the reference count drops to zero.
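Both lifecycles end the same way: whoever holds the last reference calls release(), and the memory is freed only when the count reaches zero. That contract can be modeled with a small self-contained sketch (the RefCountedBuffer class below is hypothetical, not Netty's or Drill's code):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Netty's reference-counted ByteBuf:
// retain() increments the count, release() decrements it, and the
// backing memory is freed only when the count reaches zero.
class RefCountedBuffer {
    private final AtomicInteger refCnt = new AtomicInteger(1);
    private boolean freed = false;

    RefCountedBuffer retain() {
        refCnt.incrementAndGet();
        return this;
    }

    boolean release() {
        if (refCnt.decrementAndGet() == 0) {
            freed = true;          // real code would free the direct memory here
            return true;
        }
        return false;
    }

    boolean isFreed() { return freed; }

    public static void main(String[] args) {
        RefCountedBuffer buf = new RefCountedBuffer(); // refCnt == 1, as in Netty
        buf.retain();                                  // a second owner; refCnt == 2
        System.out.println(buf.release());             // false: one owner remains
        System.out.println(buf.release());             // true: count hit zero
        System.out.println(buf.isFreed());             // true: memory was freed
    }
}
```

Netty's real ByteBuf layers retain(int)/release(int) variants and leak detection on top of this basic counting.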

The details of memory allocation and release are abstracted into the implementation of the ByteBuf quasi-interface.

Drill's use of Netty's Memory Manager

For Drill to manage memory, it must present the memory to Netty as a ByteBuf. ByteBuf is a completely abstract class (which apparently could have been an interface), which allows Drill to provide its own implementation.

Unfortunately, the structure of ByteBuf makes Drill's implementation quite complex. ByteBuf performs multiple tasks:

  • Acts as an interface to a buffer of bytes.
  • Implements reference-counted memory management.
  • Provides a large number of accessors for the buffer of bytes.
  • Provides methods to read from, and write to, Netty channels.

The result is that the implementation of these various operations becomes rather complex.

Direct Memory and Java "Unsafe"

The Java heap is not well suited to handling the large number of large buffers required by a tool like Drill. Directly using system memory ("direct memory") is a better answer. Java (grudgingly) provides a mechanism called "unsafe", originally for use only by the implementation of the Java standard library, but since informally documented elsewhere.

Unsafe provides direct allocation and freeing of memory blocks, similar to the C malloc function. The use of unsafe introduces all the system instability opportunities that the JVM tries so hard to hide. For example, programs must write only to valid memory locations, must explicitly manage allocating and freeing memory, and must be very careful not to write to a memory block after it is freed. Using unsafe incorrectly will crash the JVM, not with a Java stack trace, but with a native core dump.

Java has no direct way to read and write pointer locations, however. Instead, unsafe provides (instance) methods to read and write bytes at a location given an address (expressed as a long):

public byte getByte(long addr);
public void putByte(long addr, byte value);

So, all direct memory accesses must ultimately resolve down to the above two calls. (Unsafe also provides a complete set of get/put methods for all scalar types.)
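A short demonstration of these calls (this is ordinary application code, not Drill's; the Unsafe singleton must be obtained reflectively because its constructor is private):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Demonstrates raw direct-memory access through sun.misc.Unsafe.
// getByte/putByte are instance methods on the Unsafe singleton.
public class UnsafeDemo {
    public static void main(String[] args) throws Exception {
        // The standard reflective backdoor to the Unsafe singleton.
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        long addr = unsafe.allocateMemory(16);   // like C malloc(16)
        unsafe.putByte(addr, (byte) 42);         // write one byte at the address
        byte b = unsafe.getByte(addr);           // read it back
        unsafe.freeMemory(addr);                 // like C free(); addr is now invalid

        System.out.println(b);
    }
}
```

Getting any of this wrong (a bad address, a use-after-free) crashes the JVM rather than throwing an exception, which is exactly the risk described above.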

DrillBuf

Combining the above two concepts (implement a Netty ByteBuf backed by a Java unsafe buffer) gives us the DrillBuf class and its backing UnsafeDirectLittleEndian buffer management class, described below.

DrillBuf implements the (very wide) ByteBuf "interface", including:

  • Reference counting and associated memory management machinery
  • Storage provided by UnsafeDirectLittleEndian
  • Checked access to the bytes in the buffer (to prevent invalid memory accesses)
  • Buffer-level operations (see below)
  • Implementations of all of ByteBuf's access methods

DrillBuf might have been simple if it could implement only the methods that Drill needs and those that make sense for a direct-memory backing buffer. Unfortunately, ByteBuf also provides a large number of "secondary" methods, some of which (such as converting the data to an array of bytes) are not a good fit for the Drill storage model. These mismatches further complicate the DrillBuf code.

UnsafeDirectLittleEndian

Although the ByteBuf class defines a facade for a block of memory, Drill adds an additional layer of abstraction. The DrillBuf class provides reference counting and access methods, but actual storage is encapsulated in the UnsafeDirectLittleEndian class (often abbreviated "udle").

A udle describes a block of memory and so needs two basic parameters: a buffer address and a length. The "little endian" in the name refers to the Intel byte order. Because this byte order is baked into Drill, Drill can run only on little-endian processors (which, fortunately, include all popular processors).

While the udle is conceptually simple, implementation is complex for a number of reasons:

  • A udle is, itself, derived from ByteBuf and thus carries the full ByteBuf functionality.
  • The udle provides the complete set of get/put methods for all scalar data types.
  • Udles can either directly represent a block of memory, or can represent a "slice" of another udle that acts as the backing store. The slice acts as a "view" into the underlying buffer.
  • Reference counting to account for the slices.

In summary, DrillBuf is a reference-counted implementation of ByteBuf backed by an UnsafeDirectLittleEndian, which is itself a reference-counted implementation of ByteBuf that can, in turn, be backed by yet another UnsafeDirectLittleEndian.
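The slice-as-view idea is easiest to see with plain java.nio, whose ByteBuffer.slice() likewise creates a view over the parent's storage (NIO buffers are garbage collected rather than reference counted, so this shows only the sharing, not the accounting):

```java
import java.nio.ByteBuffer;

// Illustrates a "slice" as a view onto shared backing storage, the same
// idea a sliced udle uses (minus the reference counting).
public class SliceDemo {
    public static void main(String[] args) {
        ByteBuffer parent = ByteBuffer.allocateDirect(16);
        parent.position(4).limit(8);
        ByteBuffer slice = parent.slice();    // 4-byte view starting at offset 4

        slice.put(0, (byte) 99);              // write through the slice...
        System.out.println(parent.get(4));    // ...and the parent sees the write
        System.out.println(slice.capacity()); // the slice is only 4 bytes long
    }
}
```

Drill's udles add what NIO lacks: a reference count shared between the parent and its slices, so the backing memory is freed only when every view has been released.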

Drill Memory Allocator, Part 1

The Drill memory allocator is defined by the BufferAllocator interface as a mechanism to allocate DrillBuf instances (which inherit from ByteBuf, making them usable by Netty).

The Drill memory allocator is actually a hierarchy of allocators, as explained below. For now, we will ignore that aspect of the allocator.

Netty Configuration

The BasicServer class in Drill configures Netty, including setting Drill's own memory allocator:

public BasicServer(final RpcConfig rpcMapping, ByteBufAllocator alloc, EventLoopGroup eventLoopGroup) {
  ...
  b = new ServerBootstrap()
      ...
      .childOption(ChannelOption.ALLOCATOR, alloc)

Where the allocator is obtained from BufferAllocator.getAsByteBufAllocator() as an instance of DrillByteBufAllocator.

This step ensures that Netty will use Drill's own memory allocator when Netty creates buffers for reading.

Drill Memory Allocator, Part 2

With the basic memory allocation plumbing under our belts, we can now dive into additional details.

Memory allocation requests in Drill look much like malloc or Java heap allocations: many requests of varying sizes, up to ~64K each. System memory allocations, however, are best done in large chunks, in the range of megabytes. Further, once an application buffer is freed, its memory cannot go back to the system; it must go into a Drill-managed pool for reuse.

Therefore, Drill uses a two-tier memory allocator which we can call the wholesale/retail model. Drill requests memory from the system in "wholesale" chunks of 8 MB. The allocator distributes memory to the application in "retail" quantities of any (typically small) size. Further, the memory allocator maintains a free memory pool to recycle and reuse freed blocks.

The Drill runtime is an engine to run query fragments, each of which (conceptually) owns its own memory allocator. The implementation of this idea results in a tree of memory allocators. Each Drillbit has a single RootAllocator that manages the entire collection of direct memory. The root allocator then defers to a set of ChildAllocators, each associated with a fragment executor ("minor fragment"). Fragment-level allocators further delegate to operator-level ChildAllocators.

In theory, this provides a detailed memory management scheme. In practice, the scheme is used only to account for memory use, but not for budgeting memory across fragments.

Memory Accounting

BaseAllocator extends Accountant, which tracks the amount of memory reserved and allocated at each allocator level. The Accountant does not track actual memory chunks, only aggregate byte counts. The accountant can request more memory from its parent when needed, and releases that memory back to the parent when a buffer is released. Each Accountant tracks an allocation limit, the locally held memory, and the peak allocation.
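A hypothetical sketch of this accounting (the class below mirrors only the aggregate-bytes idea, not Drill's actual Accountant API):

```java
// Hypothetical sketch of hierarchical byte accounting: each level tracks
// only aggregate byte counts, requests headroom from its parent, and
// enforces a local limit while recording peak usage.
public class Accountant {
    private final Accountant parent;
    private final long limit;
    private long allocated = 0;
    private long peak = 0;

    Accountant(Accountant parent, long limit) {
        this.parent = parent;
        this.limit = limit;
    }

    boolean reserve(long bytes) {
        if (allocated + bytes > limit) return false;               // over local limit
        if (parent != null && !parent.reserve(bytes)) return false; // ask up the tree
        allocated += bytes;
        peak = Math.max(peak, allocated);
        return true;
    }

    void release(long bytes) {
        allocated -= bytes;
        if (parent != null) parent.release(bytes);                 // return to parent
    }

    public static void main(String[] args) {
        Accountant root = new Accountant(null, 1000);
        Accountant fragment = new Accountant(root, 600);
        System.out.println(fragment.reserve(500)); // true: within both limits
        System.out.println(fragment.reserve(200)); // false: would exceed 600
        fragment.release(500);
        System.out.println(root.allocated);        // 0: released up the tree
    }
}
```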

An allocator can make a Reservation: a contract to supply a given number of bytes at a later time. Later, the holder of the reservation can cash in the reservation to receive a DrillBuf of the size of the reservation.
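A hypothetical sketch of this contract (names and behavior are illustrative; Drill's actual reservation API differs):

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a reservation: bytes are set aside now and the
// actual buffer is handed over ("cashed in") later, at most once.
public class Reservation {
    private final int size;
    private boolean used = false;

    Reservation(int size) {
        this.size = size;       // real code would debit an accountant here
    }

    ByteBuffer cashIn() {
        if (used) throw new IllegalStateException("reservation already used");
        used = true;
        return ByteBuffer.allocateDirect(size); // real code returns a DrillBuf
    }

    public static void main(String[] args) {
        Reservation r = new Reservation(4096);  // contract for 4096 bytes
        ByteBuffer buf = r.cashIn();            // later: receive the buffer
        System.out.println(buf.capacity());
    }
}
```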

Memory Splits and Shares

Drill strives to eliminate memory copies. Instead of a copy, the destination gets a reference to the actual data. For example, a value vector may start as a single allocation, but may be sliced by various operators with parts of the vector shared by multiple users.

The AllocationManager tracks allocations against a single block of memory represented by a UnsafeDirectLittleEndian. (Whether this block represents a value vector or an 8 MB wholesale allocation is not obvious from the code.)

Vector Operations

References
