From 81592dba2b03a0b8c920cd67de36067cf46f185c Mon Sep 17 00:00:00 2001
From: comphead
Date: Mon, 18 Sep 2023 15:54:03 -0700
Subject: [PATCH 1/4] Minor: add more example for Create Table doc

---
 docs/source/user-guide/cli.md | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/docs/source/user-guide/cli.md b/docs/source/user-guide/cli.md
index e3a8cd74c33b..3d834f63331e 100644
--- a/docs/source/user-guide/cli.md
+++ b/docs/source/user-guide/cli.md
@@ -23,6 +23,11 @@ The DataFusion CLI is a command-line interactive SQL utility for executing
 queries against any supported data files.
 It is a convenient way to try DataFusion's SQL support with your own data.
 
+## Selecting files directly
+
+Files can be queried directly by enclosing the file or
+directory name in single `'` quotes as shown in the example.
+
 ## Example
 
 Create a CSV file to query.
@@ -131,17 +136,14 @@ OPTIONS:
 -V, --version Print version information
 ```
 
-## Selecting files directly
-
-Files can be queried directly by enclosing the file or
-directory name in single `'` quotes as shown in the example.
+## Creating external tables
 
 It is also possible to create a table backed by files by explicitly
-via `CREATE EXTERNAL TABLE` as shown below.
+via `CREATE EXTERNAL TABLE` as shown below. Filemask wildcards supported
 
 ## Registering Parquet Data Sources
 
-Parquet data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement. It is not necessary to provide schema information for Parquet files.
+Parquet data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement. The schema information will be derived automatically.
 
 ```sql
 CREATE EXTERNAL TABLE taxi
@@ -149,6 +151,18 @@ STORED AS PARQUET
 LOCATION '/mnt/nyctaxi/tripdata.parquet';
 ```
 
+```sql
+CREATE EXTERNAL TABLE taxi
+STORED AS PARQUET
+LOCATION '/mnt/nyctaxi/';
+```
+
+```sql
+CREATE EXTERNAL TABLE taxi
+STORED AS PARQUET
+LOCATION '/mnt/nyctaxi/*.parquet';
+```
+
 ## Registering CSV Data Sources
 
 CSV data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement.

From 15d0e44c6235f62e8ab411828bf17d07f7be4d72 Mon Sep 17 00:00:00 2001
From: comphead
Date: Mon, 18 Sep 2023 16:16:46 -0700
Subject: [PATCH 2/4] More desc

---
 docs/source/user-guide/cli.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/source/user-guide/cli.md b/docs/source/user-guide/cli.md
index 3d834f63331e..f6a14ddfc3e6 100644
--- a/docs/source/user-guide/cli.md
+++ b/docs/source/user-guide/cli.md
@@ -145,18 +145,21 @@ via `CREATE EXTERNAL TABLE` as shown below. Filemask wildcards supported
 
 Parquet data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement. The schema information will be derived automatically.
 
+Register a single file parquet datasource
 ```sql
 CREATE EXTERNAL TABLE taxi
 STORED AS PARQUET
 LOCATION '/mnt/nyctaxi/tripdata.parquet';
 ```
 
+Register a single folder parquet datasource. All files inside must be valid parquet files!
 ```sql
 CREATE EXTERNAL TABLE taxi
 STORED AS PARQUET
 LOCATION '/mnt/nyctaxi/';
 ```
 
+Register a single folder parquet datasource by specifying a wildcard for files to read
 ```sql
 CREATE EXTERNAL TABLE taxi
 STORED AS PARQUET

From 4369005ec310801699beb7b872d8cbe6d8a603bb Mon Sep 17 00:00:00 2001
From: comphead
Date: Mon, 18 Sep 2023 16:41:30 -0700
Subject: [PATCH 3/4] fmt

---
 docs/source/user-guide/cli.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/source/user-guide/cli.md b/docs/source/user-guide/cli.md
index f6a14ddfc3e6..7f32741968a5 100644
--- a/docs/source/user-guide/cli.md
+++ b/docs/source/user-guide/cli.md
@@ -146,6 +146,7 @@ via `CREATE EXTERNAL TABLE` as shown below. Filemask wildcards supported
 Parquet data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement. The schema information will be derived automatically.
 
 Register a single file parquet datasource
+
 ```sql
 CREATE EXTERNAL TABLE taxi
 STORED AS PARQUET
@@ -153,6 +154,7 @@ LOCATION '/mnt/nyctaxi/tripdata.parquet';
 ```
 
 Register a single folder parquet datasource. All files inside must be valid parquet files!
+
 ```sql
 CREATE EXTERNAL TABLE taxi
 STORED AS PARQUET
@@ -160,6 +162,7 @@ LOCATION '/mnt/nyctaxi/';
 ```
 
 Register a single folder parquet datasource by specifying a wildcard for files to read
+
 ```sql
 CREATE EXTERNAL TABLE taxi
 STORED AS PARQUET

From 20d3298e1d61ed7853295098fe9bdc95e05fff30 Mon Sep 17 00:00:00 2001
From: comphead
Date: Mon, 18 Sep 2023 17:41:43 -0700
Subject: [PATCH 4/4] reorg

---
 docs/source/user-guide/cli.md | 96 +++++++++++++++++------------------
 1 file changed, 48 insertions(+), 48 deletions(-)

diff --git a/docs/source/user-guide/cli.md b/docs/source/user-guide/cli.md
index 7f32741968a5..e1f332baf38b 100644
--- a/docs/source/user-guide/cli.md
+++ b/docs/source/user-guide/cli.md
@@ -23,54 +23,6 @@ The DataFusion CLI is a command-line interactive SQL utility for executing
 queries against any supported data files.
 It is a convenient way to try DataFusion's SQL support with your own data.
 
-## Selecting files directly
-
-Files can be queried directly by enclosing the file or
-directory name in single `'` quotes as shown in the example.
-
-## Example
-
-Create a CSV file to query.
-
-```shell
-$ echo "a,b" > data.csv
-$ echo "1,2" >> data.csv
-```
-
-Query that single file (the CLI also supports parquet, compressed csv, avro, json and more)
-
-```shell
-$ datafusion-cli
-DataFusion CLI v17.0.0
-❯ select * from 'data.csv';
-+---+---+
-| a | b |
-+---+---+
-| 1 | 2 |
-+---+---+
-1 row in set. Query took 0.007 seconds.
-```
-
-You can also query directories of files with compatible schemas:
-
-```shell
-$ ls data_dir/
-data.csv data2.csv
-```
-
-```shell
-$ datafusion-cli
-DataFusion CLI v16.0.0
-❯ select * from 'data_dir';
-+---+---+
-| a | b |
-+---+---+
-| 3 | 4 |
-| 1 | 2 |
-+---+---+
-2 rows in set. Query took 0.007 seconds.
-```
-
 ## Installation
 
 ### Install and run using Cargo
@@ -136,6 +88,54 @@ OPTIONS:
 -V, --version Print version information
 ```
 
+## Querying data from the files directly
+
+Files can be queried directly by enclosing the file or
+directory name in single `'` quotes as shown in the example.
+
+## Example
+
+Create a CSV file to query.
+
+```shell
+$ echo "a,b" > data.csv
+$ echo "1,2" >> data.csv
+```
+
+Query that single file (the CLI also supports parquet, compressed csv, avro, json and more)
+
+```shell
+$ datafusion-cli
+DataFusion CLI v17.0.0
+❯ select * from 'data.csv';
++---+---+
+| a | b |
++---+---+
+| 1 | 2 |
++---+---+
+1 row in set. Query took 0.007 seconds.
+```
+
+You can also query directories of files with compatible schemas:
+
+```shell
+$ ls data_dir/
+data.csv data2.csv
+```
+
+```shell
+$ datafusion-cli
+DataFusion CLI v16.0.0
+❯ select * from 'data_dir';
++---+---+
+| a | b |
++---+---+
+| 3 | 4 |
+| 1 | 2 |
++---+---+
+2 rows in set. Query took 0.007 seconds.
+```
+
 ## Creating external tables
 
 It is also possible to create a table backed by files by explicitly