From 2a8088333f6a451651940b841e2451e1850d3d07 Mon Sep 17 00:00:00 2001
From: Devin D'Angelo
Date: Sun, 10 Sep 2023 14:43:32 -0400
Subject: [PATCH] prettier

---
 docs/source/user-guide/sql/dml.md | 1 -
 docs/source/user-guide/sql/write_options.md | 33 ++++++++++-----------
 2 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/docs/source/user-guide/sql/dml.md b/docs/source/user-guide/sql/dml.md
index c7d62f83dbae..54907534b0d4 100644
--- a/docs/source/user-guide/sql/dml.md
+++ b/docs/source/user-guide/sql/dml.md
@@ -28,7 +28,6 @@ Copies the contents of a table or query to file(s). Supported file formats are
 `parquet`, `csv`, and `json` and can be inferred based on filename if writing to
 a single file.

-
 COPY { table_name | query } TO 'file_name' [ ( option [, ... ] ) ]
 
diff --git a/docs/source/user-guide/sql/write_options.md b/docs/source/user-guide/sql/write_options.md
index 31a8daac1895..67a0679b7fd9 100644
--- a/docs/source/user-guide/sql/write_options.md
+++ b/docs/source/user-guide/sql/write_options.md
@@ -19,15 +19,15 @@

 # Write Options

-DataFusion supports customizing how data is written out to disk as a result of a ```COPY``` or ```INSERT INTO``` query. There are a few special options, file format (e.g. CSV or parquet) specific options, and parquet column specific options. Options can also in some cases be specified in multiple ways with a set order of precedence.
+DataFusion supports customizing how data is written out to disk as a result of a `COPY` or `INSERT INTO` query. There are a few special options, file format (e.g. CSV or parquet) specific options, and parquet column specific options. Options can also in some cases be specified in multiple ways with a set order of precedence.

 ## Specifying Options and Order of Precedence

 Write related options can be specified in the following ways:

-* Session level config defaults
-* ```CREATE EXTERNAL TABLE``` options
-* ```COPY``` option tuples
+- Session level config defaults
+- `CREATE EXTERNAL TABLE` options
+- `COPY` option tuples

 For a list of supported session level config defaults see [Configuration Settings](https://arrow.apache.org/datafusion/user-guide/configs.html). These defaults apply to all write operations but have the lowest level of precedence.

@@ -47,13 +47,13 @@ NULL_VALUE 'NAN'
 );
 ```

-When running ```INSERT INTO my_table ...```, the above specified options will be respected (gzip compression, special delimiter, and header row included). Note that compression, header, and delimeter settings can also be specified within the ```OPTIONS``` tuple list. Dedicated syntax within the SQL statement always takes precedence over arbitrary option tuples, so if both are specified the ```OPTIONS``` setting will be ignored. CREATE_LOCAL_PATH is a special option that indicates if DataFusion should create local file paths when writing new files if they do not already exist. This option is useful if you wish to create an external table from scratch, using only DataFusion SQL statements. Finally, NULL_VALUE is a CSV format specific option that determines how null values should be encoded within the CSV file.
+When running `INSERT INTO my_table ...`, the above specified options will be respected (gzip compression, special delimiter, and header row included). Note that compression, header, and delimeter settings can also be specified within the `OPTIONS` tuple list. Dedicated syntax within the SQL statement always takes precedence over arbitrary option tuples, so if both are specified the `OPTIONS` setting will be ignored. CREATE_LOCAL_PATH is a special option that indicates if DataFusion should create local file paths when writing new files if they do not already exist. This option is useful if you wish to create an external table from scratch, using only DataFusion SQL statements. Finally, NULL_VALUE is a CSV format specific option that determines how null values should be encoded within the CSV file.

-Finally, options can be passed when running a ```COPY``` command.
+Finally, options can be passed when running a `COPY` command.

 ```sql
-COPY source_table
-TO 'test/table_with_options'
+COPY source_table
+TO 'test/table_with_options'
 (format parquet,
 single_file_output false,
 compression snappy,
@@ -61,18 +61,17 @@ compression snappy,
 )
 ```

-In this example, we write the entirety of ```source_table``` out to a folder of parquet files. The option ```single_file_output``` set to false, indicates that the destination path should be interpreted as a folder to which the query will output multiple files. One parquet file will be written in parallel to the folder for each partition in the query. The next option ```compression``` set to ```snappy``` indicates that unless otherwise specified all columns should use the snappy compression codec. The option ```compression::col1``` sets an override, so that the column ```col1``` in the parquet file will use ```ZSTD``` compression codec with compression level ```5```. In general, parquet option which support column specific settings can be specified with the syntax ```OPTION::COLUMN.NESTED.PATH```.
+In this example, we write the entirety of `source_table` out to a folder of parquet files. The option `single_file_output` set to false, indicates that the destination path should be interpreted as a folder to which the query will output multiple files. One parquet file will be written in parallel to the folder for each partition in the query. The next option `compression` set to `snappy` indicates that unless otherwise specified all columns should use the snappy compression codec. The option `compression::col1` sets an override, so that the column `col1` in the parquet file will use `ZSTD` compression codec with compression level `5`. In general, parquet option which support column specific settings can be specified with the syntax `OPTION::COLUMN.NESTED.PATH`.

 ## Available Options

-
 ### COPY Specific Options

-The following special options are specific to the ```COPY``` query.
+The following special options are specific to the `COPY` query.

 | Option | Description | Default Value |
-|--------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|
-| SINGLE_FILE_OUTPUT | If true, COPY query will write output to a single file. | true |
+| ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------- |
+| SINGLE_FILE_OUTPUT | If true, COPY query will write output to a single file. | true |
 | FORMAT | Specifies the file format COPY query will write out. If single_file_output is false or format cannot be inferred from file extension, then FORMAT must be specified. | N/A |

 ### CREATE EXTERNAL TABLE Specific Options
@@ -80,7 +79,7 @@ The following special options are specific to the ```COPY``` query.
 The following special options are specific to creating an external table.

 | Option | Description | Default Value |
-|-------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------|
+| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------- |
 | SINGLE_FILE | If true, indicates that this external table is backed by a single file. INSERT INTO queries will append to this file. | false |
 | CREATE_LOCAL_PATH | If true, the folder or file backing this table will be created on the local file system if it does not already exist when running INSERT INTO queries. | false |
 | INSERT_MODE | Determines if INSERT INTO queries should append to existing files or append new files to an existing directory. Valid values are append_to_file, append_new_files, and error. Note that "error" will block inserting data into this table. | CSV and JSON default to append_to_file. Parquet defaults to append_new_files |
@@ -90,7 +89,7 @@ The following special options are specific to creating an external table.
 The following options are available when writting JSON files. Note: if any unsupported option is specified, an error will be raised and the query will fail.

 | Option | Description | Default Value |
-|-------------|------------------------------------------------------------------------------------------------------------------------------------|---------------|
+| ----------- | ---------------------------------------------------------------------------------------------------------------------------------- | ------------- |
 | COMPRESSION | Sets the compression that should be applied to the entire JSON file. Supported values are GZIP, BZIP2, XZ, ZSTD, and UNCOMPRESSED. | UNCOMPRESSED |

 ### CSV Format Sepcific Options
@@ -98,7 +97,7 @@ The following options are available when writting JSON files. Note: if any unsup
 The following options are available when writing CSV files. Note: if any unsupported options is specified an error will be raised and the query will fail.

 | Option | Description | Default Value |
-|-----------------|-----------------------------------------------------------------------------------------------------------------------------------|------------------|
+| --------------- | --------------------------------------------------------------------------------------------------------------------------------- | ---------------- |
 | COMPRESSION | Sets the compression that should be applied to the entire CSV file. Supported values are GZIP, BZIP2, XZ, ZSTD, and UNCOMPRESSED. | UNCOMPRESSED |
 | HEADER | Sets if the CSV file should include column headers | false |
 | DATE_FORMAT | Sets the format that dates should be encoded in within the CSV file | arrow-rs default |
@@ -113,7 +112,7 @@ The following options are available when writing CSV files. Note: if any unsuppo
 The following options are available when writing parquet files. If any unsupported option is specified an error will be raised and the query will fail. If a column specific option is specified for a column which does not exist, the option will be ignored without error. For default values, see: [Configuration Settings](https://arrow.apache.org/datafusion/user-guide/configs.html).

 | Option | Can be Column Specific? | Description |
-|------------------------------|-------------------------|---------------------------------------------------------------------------------------------------------------|
+| ---------------------------- | ----------------------- | ------------------------------------------------------------------------------------------------------------- |
 | COMPRESSION | Yes | Sets the compression codec and if applicable compression level to use |
 | MAX_ROW_GROUP_SIZE | No | Sets the maximum number of rows that can be encoded in a single row group |
 | DATA_PAGESIZE_LIMIT | No | Sets the best effort maximum page size in bytes |