
Exporting data to Google Cloud Storage in Parquet format available but undocumented #614

Open
pegoenrico opened this issue Jun 11, 2024 · 3 comments


@pegoenrico

pegoenrico commented Jun 11, 2024

Hello all.
I'm trying to export query results from a BigQuery table. Since the result can be large (2.5 GB or more), I followed the "Larger datasets" advice in the bq_table_download() help page and used bq_table_save() to save the data as multiple files in Google Cloud Storage.

While using bq_table_save(), I discovered an undocumented option: destination_format = "PARQUET" instead of "NEWLINE_DELIMITED_JSON" or "CSV". With this value, bq_table_save() correctly saves the data as multiple Parquet files.

Is it safe to rely on this option? It seems to work very well: it is fast, and because Parquet files carry their own column types, they save me a lot of work checking data types.

The following code is a minimal version of what I used to export the data successfully to a Google Cloud Storage bucket:

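library(bigrquery)
project_id <- "<project identifier>"
sql_dwn <- "SELECT * FROM <table from which to extract data>"
tb <- bq_project_query(project_id, sql_dwn)  # query result lands in a temporary table
# Export URIs need the gs:// prefix; "*" lets BigQuery split the output into several files
bq_table_save(tb, destination_uris = "gs://destination_bucket/folder/filename_*.parquet", destination_format = "PARQUET")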
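On the data-type point above: Parquet stores the schema alongside the data, so column types survive the export and come back intact when the files are read. A minimal local sketch, assuming the arrow package is installed and the shards have already been copied from the bucket to a local folder; the folder, file contents and values below are made up for illustration.

```r
library(arrow)

# Stand-in for a local folder holding shards copied down from
# gs://destination_bucket/folder/ (hypothetical path; the data is made up).
export_dir <- file.path(tempdir(), "bq_export")
dir.create(export_dir, showWarnings = FALSE)

# BigQuery replaces "*" with a zero-padded 12-digit counter, so a two-shard
# export is named filename_000000000000.parquet and filename_000000000001.parquet.
write_parquet(
  data.frame(id = 1:3, amount = c(10.5, 20.25, 30),
             day = as.Date(c("2024-06-01", "2024-06-02", "2024-06-03"))),
  file.path(export_dir, "filename_000000000000.parquet")
)
write_parquet(
  data.frame(id = 4:5, amount = c(40.75, 50),
             day = as.Date(c("2024-06-04", "2024-06-05"))),
  file.path(export_dir, "filename_000000000001.parquet")
)

# Read every shard and stack them. Column types come from the Parquet
# schema, so no col_types specification or manual type checks are needed.
shards <- list.files(export_dir, pattern = "\\.parquet$", full.names = TRUE)
result <- do.call(rbind, lapply(shards, function(f) as.data.frame(read_parquet(f))))

str(result)
```

The integer, double and Date columns come back with their original types, which is the practical difference from re-parsing CSV or JSON exports.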

Thank you in advance for your help.

@pegoenrico pegoenrico changed the title Exporting data to Google Cloud Storage in Parquet format Exporting data to Google Cloud Storage in Parquet format available but undocumented Jun 12, 2024
@pegoenrico
Author

pegoenrico commented Jun 13, 2024

Can anyone help me, please?

@apalacio9502
Contributor

Hi @pegoenrico,

According to the BigQuery documentation (https://cloud.google.com/bigquery/docs/exporting-data), the Parquet format is supported for exports, so in this case it is the library documentation that needs to be updated.

I expect the documentation for the development version to be updated in a few days (#618).

If you use destination_format = "PARQUET", note that the supported values for the compression argument are "SNAPPY", "GZIP", "ZSTD", and "NONE".
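The compression note can be folded into the export call. The sketch below is a hypothetical wrapper, save_table_parquet(), which is not part of bigrquery: it restricts compression to the four values listed above and checks for the gs:// prefix before handing off to bigrquery::bq_table_save(). Defaulting to "SNAPPY" is this sketch's own choice, not a documented BigQuery or bigrquery default.

```r
# Hypothetical helper (not part of bigrquery): export a bq_table to Parquet
# files in Cloud Storage, rejecting codecs that Parquet export does not accept.
save_table_parquet <- function(tb, destination_uris,
                               compression = c("SNAPPY", "GZIP", "ZSTD", "NONE")) {
  # match.arg() errors on anything outside the four listed codecs and
  # falls back to the first entry ("SNAPPY") when compression is not given.
  compression <- match.arg(compression)

  # BigQuery exports only to Cloud Storage URIs of the form gs://bucket/...
  if (!all(startsWith(destination_uris, "gs://"))) {
    stop("destination_uris must start with 'gs://'")
  }

  # Only reached after validation, so bigrquery is needed only for real exports.
  bigrquery::bq_table_save(
    tb,
    destination_uris = destination_uris,
    destination_format = "PARQUET",
    compression = compression
  )
}
```

With tb returned by bq_project_query(), a call would look like save_table_parquet(tb, "gs://destination_bucket/folder/filename_*.parquet", compression = "ZSTD"), where the bucket path is a placeholder.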

Regards,

@pegoenrico
Author

Hi @apalacio9502,
thank you very much for the update.
I'll go ahead and use the PARQUET format to export data from BigQuery tables.
Best regards!
Enrico

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants