Documentation Updates for New Write Related Features #7520

Merged: 11 commits merged into apache:main on Sep 12, 2023

devinjdangelo (Contributor)

Which issue does this PR close?

Closes #7499

Rationale for this change

We have added new options for writing files and changed some names around. We should update the documentation so the current state is clear.

What changes are included in this PR?

New documentation for write related options.

Are these changes tested?

Yes, by existing tests.

Are there any user-facing changes?

New docs

@Weijun-H (Member) left a comment

Hello @devinjdangelo, I noticed some typos in this PR.

docs/source/user-guide/sql/write_options.md (outdated, resolved)

| Option | Description | Default Value |
| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------- |
| SINGLE_FILE | If true, indicates that this external table is backed by a single file. INSERT INTO queries will append to this file. | false |
Member

Suggested change
| SINGLE_FILE | If true, indicates that this external table is backed by a single file. INSERT INTO queries will append to this file. | false |
| SINGLE_FILE | If true, indicates that this external table is backed by a single file. INSERT INTO queries will be appended to this file. | false |

docs/source/user-guide/sql/write_options.md (outdated, resolved)
```sql
)
```

In this example, we write the entirety of `source_table` out to a folder of parquet files. The option `single_file_output` set to false, indicates that the destination path should be interpreted as a folder to which the query will output multiple files. One parquet file will be written in parallel to the folder for each partition in the query. The next option `compression` set to `snappy` indicates that unless otherwise specified all columns should use the snappy compression codec. The option `compression::col1` sets an override, so that the column `col1` in the parquet file will use `ZSTD` compression codec with compression level `5`. In general, parquet option which support column specific settings can be specified with the syntax `OPTION::COLUMN.NESTED.PATH`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In this example, we write the entirety of `source_table` out to a folder of parquet files. The option `single_file_output` set to false, indicates that the destination path should be interpreted as a folder to which the query will output multiple files. One parquet file will be written in parallel to the folder for each partition in the query. The next option `compression` set to `snappy` indicates that unless otherwise specified all columns should use the snappy compression codec. The option `compression::col1` sets an override, so that the column `col1` in the parquet file will use `ZSTD` compression codec with compression level `5`. In general, parquet option which support column specific settings can be specified with the syntax `OPTION::COLUMN.NESTED.PATH`.
In this example, we write the entirety of `source_table` out to a folder of parquet files. The option `single_file_output` set to false, indicates that the destination path should be interpreted as a folder to which the query will output multiple files. One parquet file will be written in parallel to the folder for each partition in the query. The next option `compression` set to `snappy` indicates that unless otherwise specified all columns should use the snappy compression codec. The option `compression::col1` sets an override, so that the column `col1` in the parquet file will use `ZSTD` compression codec with compression level `5`. In general, the parquet option which supports column-specific settings can be specified with the syntax `OPTION::COLUMN.NESTED.PATH`.

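The paragraph above describes a table-wide `compression` setting that is overridden for one column through the `OPTION::COLUMN.NESTED.PATH` key syntax (here `compression::col1`). The minimal Python sketch below illustrates that lookup-with-fallback behavior. The helper `resolve_column_option` and the `zstd(5)` value string are hypothetical and assumed only for illustration. This is not DataFusion's implementation or its option parser.

```python
from typing import Dict, Optional


def resolve_column_option(
    options: Dict[str, str], option: str, column_path: str
) -> Optional[str]:
    """Hypothetical helper (not DataFusion's API): look up `option` for a
    column, preferring a column-specific `OPTION::COLUMN.NESTED.PATH` key and
    falling back to the table-wide `OPTION` key."""
    column_key = f"{option}::{column_path}"
    if column_key in options:
        return options[column_key]
    # No column-specific override: use the table-wide value, if any.
    return options.get(option)


# Options mirroring the example discussed above; the "zstd(5)" value
# format is an assumption made only for this sketch.
opts = {
    "single_file_output": "false",
    "compression": "snappy",
    "compression::col1": "zstd(5)",
}

print(resolve_column_option(opts, "compression", "col1"))      # zstd(5)
print(resolve_column_option(opts, "compression", "col2"))      # snappy
print(resolve_column_option(opts, "compression", "col3.a.b"))  # snappy
```

In this sketch, a nested column such as `col3.a.b` with no override of its own falls back to the table-wide value. Whether DataFusion also consults parent paths (for example `col3`) is not covered here.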
docs/source/user-guide/sql/write_options.md (3 outdated, resolved threads)
@alamb (Contributor) left a comment

Thank you @devinjdangelo -- this looks really great ❤️

Thank you @Weijun-H for the additional review

docs/source/user-guide/sql/ddl.md (outdated, resolved)
docs/source/user-guide/sql/dml.md (outdated, resolved)
@@ -55,7 +49,7 @@ Copy the contents of `source_table` to one or more Parquet formatted
files in the `dir_name` directory:

```sql
> COPY source_table TO 'dir_name' (FORMAT parquet, PER_THREAD_OUTPUT true);
```
Contributor

👍

docs/source/user-guide/sql/write_options.md (outdated, resolved)
under the License.
-->

# Write Options
Contributor

I think it might be a good idea to add a link to this page in the index https://github.com/apache/arrow-datafusion/blob/main/docs/source/user-guide/sql/index.rst so it shows up in the left-hand nav bar

docs/source/user-guide/sql/write_options.md (3 outdated, resolved threads)
@alamb added the documentation label (Improvements or additions to documentation) on Sep 12, 2023
@alamb (Contributor) commented on Sep 12, 2023

Thanks again @devinjdangelo and @Weijun-H -- I'll merge this and we can continue iterating on the docs in follow-on PRs

@alamb merged commit 561e0d7 into apache:main on Sep 12, 2023
4 checks passed
Labels: documentation (Improvements or additions to documentation)
Linked issue that merging this PR may close: Document COPY parquet specific options
3 participants