James Turton edited this page Sep 2, 2022 · 19 revisions

Drill 2.0 Proposal

This page documents a proposal for Drill 2.0. At the time of writing, we are gearing up to release Drill 1.20.0, which means that we have released 19 versions of Drill since it was deemed stable enough to warrant a 1.0 label. Since the second phase of this project's life began, a number of breaking changes have been discussed, and this page records those that could be included in a Drill 2.0.

Please feel free to add your ideas, knowing that the dev team will ultimately select a subset of them as the basis for a 2.0 release. Changes recorded here need not necessarily break compatibility for users: anything that is a significant change from how Drill 1.x works is welcome.

APIs and connectors

  • Replace the public API based on Netty with a simpler row-based one. (One of my earliest projects was to create "Jig", a row-based API for Drill. The vector stuff actually grew out of that.) The Netty API is a nightmare to use from any language other than Java. This forces people to use the REST interface instead, which neither scales nor handles sessions.

Config system

  • Add a shared component that applies configuration priorities (... session opt > storage/format config opt > system opt ...) and make all plugins use this component for reading options.
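A minimal sketch of such a shared component follows, assuming each configuration layer can be exposed as a plain key/value map and that the layers are passed in from highest to lowest priority (session, then storage/format config, then system). The class `LayeredOptionResolver` and its `resolve` method are hypothetical names used for illustration, not an existing Drill API.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical shared option resolver: looks an option up in a list of
 * layers ordered from highest to lowest priority and returns the first hit.
 */
final class LayeredOptionResolver {
  private final List<Map<String, String>> layers;

  /** @param layers option layers, highest priority first (e.g. session, plugin config, system). */
  LayeredOptionResolver(List<Map<String, String>> layers) {
    this.layers = List.copyOf(layers);
  }

  /** Returns the value from the highest-priority layer that defines the key, if any. */
  Optional<String> resolve(String key) {
    for (Map<String, String> layer : layers) {
      String value = layer.get(key);
      if (value != null) {
        return Optional.of(value); // first (highest-priority) layer wins
      }
    }
    return Optional.empty(); // not set anywhere
  }
}
```

Under this design a plugin would build one resolver from the session, storage/format config and system layers and call `resolve` instead of consulting each source itself, so the priority order lives in exactly one place.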

Cluster management and RPC

  • Replace the home-grown RPC with gRPC or something more modern and less complex.

Query planner

  • Rebase on current Calcite and review our customisations.

Project structure, packaging and distribution

  • Split Drill's monorepo into multiple parts. The current repo would be the core while the contrib stuff could move to its own repo(s) under the Drill project.
  • Ensure we have a good way to build and install plugins separately from the Drill code (early work has been done, and some Jira tickets explain how it's done in other tools).
  • Explain how to create a plugin in a user's own repo, built against Drill.

Additional context for the three items above can be found in the conversation in #2359.

  • Split Drill installation packages into "core" and "extra".
  • Install plugins and UDFs from an online marketplace, a la the Eclipse marketplace.

Data types

  • Fix the cursed TIMESTAMP type (DRILL-8101).
  • Continue work on completing the UNION type, keeping it experimental for now.

SQL functions

  • Unify nearestdate and date_trunc
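Both functions round a timestamp down to the start of an enclosing interval, so one possible unification is a single routine taking a unit plus an optional multiple: a multiple of 1 behaves like a plain date_trunc, while a larger multiple gives bucketed, nearestdate-like results (for example, 15-minute buckets). The sketch below assumes that design; the class `TimestampTruncation` and its `truncate` method are hypothetical, and it only covers seconds, minutes and hours (calendar units such as month or quarter would need separate handling).

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical unified truncation covering both date_trunc and bucketed use cases. */
final class TimestampTruncation {
  private static final Set<ChronoUnit> SUPPORTED =
      EnumSet.of(ChronoUnit.SECONDS, ChronoUnit.MINUTES, ChronoUnit.HOURS);

  private TimestampTruncation() {}

  /**
   * Rounds ts down to the start of its (multiple x unit) bucket.
   * Buckets are counted from midnight of the same day.
   */
  static LocalDateTime truncate(LocalDateTime ts, ChronoUnit unit, int multiple) {
    if (!SUPPORTED.contains(unit)) {
      throw new IllegalArgumentException("unsupported unit: " + unit);
    }
    if (multiple < 1) {
      throw new IllegalArgumentException("multiple must be >= 1");
    }
    LocalDateTime base = ts.truncatedTo(unit); // plain date_trunc-style step
    if (multiple == 1) {
      return base;
    }
    // Bucketed step: floor the offset from midnight to a multiple of the bucket size.
    LocalDateTime midnight = ts.truncatedTo(ChronoUnit.DAYS);
    long bucketNanos = unit.getDuration().multipliedBy(multiple).toNanos();
    long offsetNanos = Duration.between(midnight, base).toNanos();
    return midnight.plusNanos(offsetNanos - offsetNanos % bucketNanos);
  }
}
```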

Storage and format plugins, reader framework

  • Add support for different Drill users to use their own credentials in plugins.

  • Decide what to do about schema. The metastore has been added, which is a good step, but how should it be used? When can we stop trying to work miracles in code to resolve ambiguous situations and simply suggest that the user provide a schema?

  • Upgrade all pre-EVF and EVF v1 plugins to EVF v2.

  • Scrap the columns[] array wherever it occurs (only TextReader?) in favour of distinct, numbered fields column1, column2, column3, ...

  • Make the available options more consistent across plugins:

    • Allow column name and type information only in a provided schema, not in the format config.
    • Standard pushdown enable/disable switches for storage plugins.
  • Use a leading underscore for all implicit fields. Some plugins and connectors already do this, but others don't, notably the file plugin in core Drill. This can lead to strange results if a file has a column called file.

  • In INFORMATION_SCHEMA, replace the hard-coded "DRILL" catalog with storage plugin names?

  • Add options that let users control how Drill treats invalid data, e.g. invalid_data_handling_mode = abort, skip, null, ...

  • Almost every data source has some internal properties. These currently live in HashMap props for Phoenix, HashMap configProps for Hive, a Map properties field for Iceberg, Properties kafkaConsumerProps for Kafka, etc. We could unify them in StoragePluginConfig, using the common type Properties (which has a richer interface than HashMap and is designed for this kind of data) under a common field name. This would be a breaking change for all storage plugin configs.

  • Rip out the old patches, old versions of readers and writers and other such cruft.
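To make the columns[] proposal above concrete, here is a sketch of how a headerless text row could be exposed as distinct, numbered fields instead of a single array. It assumes 1-based numbering, as the column1, column2, column3 naming in the proposal suggests; the class `NumberedColumns` is hypothetical and not part of Drill's text reader.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: names headerless fields column1, column2, ... instead of columns[]. */
final class NumberedColumns {
  private NumberedColumns() {}

  static Map<String, String> toNamedFields(String[] fields) {
    Map<String, String> row = new LinkedHashMap<>(); // preserves field order
    for (int i = 0; i < fields.length; i++) {
      row.put("column" + (i + 1), fields[i]); // 1-based names
    }
    return row;
  }
}
```

With this shape, a query would refer to `column2` where it currently refers to `columns[1]`.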
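The proposed invalid_data_handling_mode option could be applied at the point where a reader converts a raw value. The sketch below assumes three modes matching the abort, skip and null values listed above; `InvalidDataHandlingMode`, `SkipRowException` and `ValueConverter` are hypothetical names, not existing Drill classes.

```java
import java.util.function.Function;

/** Hypothetical values for the proposed invalid_data_handling_mode option. */
enum InvalidDataHandlingMode { ABORT, SKIP, NULL }

/** Hypothetical signal telling the reader to drop the current row. */
final class SkipRowException extends RuntimeException {
  SkipRowException(String message) {
    super(message);
  }
}

final class ValueConverter {
  private ValueConverter() {}

  /**
   * Converts one raw value and applies the configured policy if conversion fails:
   * ABORT rethrows as IllegalArgumentException, SKIP throws SkipRowException,
   * NULL returns null in place of the bad value.
   */
  static <T> T convert(String raw, Function<String, T> parser, InvalidDataHandlingMode mode) {
    try {
      return parser.apply(raw);
    } catch (RuntimeException e) {
      switch (mode) {
        case SKIP:
          throw new SkipRowException("skipping row with invalid value: " + raw);
        case NULL:
          return null;
        case ABORT:
        default:
          throw new IllegalArgumentException("invalid value: " + raw, e);
      }
    }
  }
}
```

A row loop would catch `SkipRowException` and move on to the next record, so the policy is decided once per value while the skip itself happens at row level.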
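For the properties unification above, a shared base class could hold every plugin's source-specific settings in one java.util.Properties field. This is a sketch under that assumption: `BaseStoragePluginConfig`, its `properties` field and its helper methods are hypothetical and do not reflect the current StoragePluginConfig. It shows two parts of the richer Properties interface: lookups with a default value, and a built-in defaults chain.

```java
import java.util.Properties;

/** Hypothetical common base: one Properties field, same name, for every storage plugin. */
abstract class BaseStoragePluginConfig {
  private final Properties properties = new Properties();

  protected BaseStoragePluginConfig(Properties source) {
    if (source != null) {
      // Defensive copy (including the source's own defaults) so later edits don't leak in.
      for (String key : source.stringPropertyNames()) {
        properties.setProperty(key, source.getProperty(key));
      }
    }
  }

  /** Looks up a source-specific property, falling back to the given default. */
  String property(String key, String defaultValue) {
    return properties.getProperty(key, defaultValue);
  }

  /** Layers this config's properties over plugin-wide defaults using Properties' defaults chain. */
  Properties effectiveProperties(Properties pluginDefaults) {
    Properties effective = new Properties(pluginDefaults);
    effective.putAll(properties);
    return effective;
  }
}
```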

Vector layer

  • Replace ValueVectors
  • Do something about ObjectHolder.
  • Employ the new Java SIMD intrinsics?

Web UI

  • Replace it with something modern.

General

  • Remove deprecated code and config options.
  • Update and deploy the MapR test framework so it can test Drill 2.
  • Unbundle MapR code?
  • Upgrade JUnit to v5