This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Specify compression per column instead of globally #1594

Open
ozgrakkurt opened this issue Nov 23, 2023 · 0 comments

Comments

ozgrakkurt (Contributor) commented Nov 23, 2023

Maybe with an API similar to how we pass encodings into `RowGroupIterator`.
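For reference, arrow2 takes `encodings` as a `Vec<Vec<Encoding>>` (one inner vector per field, one entry per leaf parquet column), while compression comes from a single `WriteOptions` value. A per-column compression argument could mirror that shape. The sketch below is hypothetical: `Encoding`, `Compression` and `ColumnWriteOptions` are stand-ins defined here, not arrow2's real types, and it only shows the shape check such an API would need.

```rust
// Hypothetical stand-ins for the parquet encoding/compression types;
// these are NOT arrow2's real definitions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Encoding {
    Plain,
    RleDictionary,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Compression {
    Uncompressed,
    Zstd,
}

/// Hypothetical per-column settings, shaped like the `encodings` argument:
/// outer Vec = one entry per field, inner Vec = one entry per leaf column.
struct ColumnWriteOptions {
    encodings: Vec<Vec<Encoding>>,
    compressions: Vec<Vec<Compression>>,
}

impl ColumnWriteOptions {
    /// Fails if `compressions` does not have exactly the same shape as `encodings`.
    fn try_new(
        encodings: Vec<Vec<Encoding>>,
        compressions: Vec<Vec<Compression>>,
    ) -> Result<Self, String> {
        if encodings.len() != compressions.len() {
            return Err(format!(
                "expected {} fields, got {} compression entries",
                encodings.len(),
                compressions.len()
            ));
        }
        for (i, (e, c)) in encodings.iter().zip(&compressions).enumerate() {
            if e.len() != c.len() {
                return Err(format!(
                    "field {}: {} leaf columns but {} compressions",
                    i,
                    e.len(),
                    c.len()
                ));
            }
        }
        Ok(Self { encodings, compressions })
    }
}

fn main() {
    // Field 0: random-looking bytes (e.g. a hash), left uncompressed.
    // Field 1: dictionary-encoded, left uncompressed.
    // Field 2: plain-encoded, compressed with zstd.
    let opts = ColumnWriteOptions::try_new(
        vec![vec![Encoding::Plain], vec![Encoding::RleDictionary], vec![Encoding::Plain]],
        vec![
            vec![Compression::Uncompressed],
            vec![Compression::Uncompressed],
            vec![Compression::Zstd],
        ],
    )
    .expect("shapes match");

    // Walk each leaf column and print the (encoding, compression) pair it would get.
    for (field, (encs, comps)) in opts.encodings.iter().zip(&opts.compressions).enumerate() {
        for (enc, comp) in encs.iter().zip(comps) {
            println!("field {}: {:?} + {:?}", field, enc, comp);
        }
    }
}
```

Keeping the same nesting as `encodings` means nested types (structs, lists) get one compression setting per leaf column, just as they already get one encoding per leaf column.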

This would allow different compression configs for different columns. It would be very useful when we have a sizeable column of random binary data, such as hashes, which barely compresses anyway. Likewise, if we are using RLE/dictionary encoding, there might not be much point in compressing and decompressing.
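With per-column settings, a caller could apply a simple policy like the one described above. The sketch below is a hypothetical illustration, not an arrow2 API: `ColumnHint`, `choose_compression` and the column names are made up, and whether skipping compression actually pays off depends on the data and the storage layer.

```rust
// Hypothetical types and policy for illustration only; not part of arrow2.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Encoding {
    Plain,
    RleDictionary,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Compression {
    Uncompressed,
    Zstd,
}

/// What the caller knows about a leaf column (hypothetical).
struct ColumnHint {
    encoding: Encoding,
    /// True for near-random data such as hashes, which compresses poorly.
    high_entropy: bool,
}

/// Skip compression for high-entropy or dictionary-encoded columns,
/// otherwise fall back to the global default.
fn choose_compression(hint: &ColumnHint, default: Compression) -> Compression {
    if hint.high_entropy || hint.encoding == Encoding::RleDictionary {
        Compression::Uncompressed
    } else {
        default
    }
}

fn main() {
    let columns = [
        ("block_hash", ColumnHint { encoding: Encoding::Plain, high_entropy: true }),
        ("status", ColumnHint { encoding: Encoding::RleDictionary, high_entropy: false }),
        ("payload", ColumnHint { encoding: Encoding::Plain, high_entropy: false }),
    ];
    for (name, hint) in &columns {
        println!("{}: {:?}", name, choose_compression(hint, Compression::Zstd));
    }
}
```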

This would give a significant performance boost for my use case: when I look at timings for querying parquet, between 1/4 and 1/2 of the time is spent decompressing.
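As a rough upper bound (Amdahl's law), if a fraction $f$ of query time goes to decompression and all of it were eliminated, the best possible speedup is:

```math
S_{\max} = \frac{1}{1 - f}, \qquad f = \tfrac{1}{4} \Rightarrow S_{\max} = \tfrac{4}{3} \approx 1.33, \qquad f = \tfrac{1}{2} \Rightarrow S_{\max} = 2
```

In practice the gain would be smaller, since only the columns switched to uncompressed stop paying decompression cost, and uncompressed pages mean more bytes to read.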

I would like to work on this if I can get some guidance on how the public API should be modified for it.
