qsv 0.131.0 #180595

Merged
merged 2 commits into from
Aug 9, 2024

Conversation

BrewTestBot
Member


Created with brew bump-formula-pr.

release notes
### Highlights
* __Refactored `frequency` to make it smarter and faster.__   
`frequency`'s core algorithm builds an in-memory hashmap to count the occurrences of each unique value in each column. It does this using multi-threaded, multi-I/O techniques to make it blazing fast.  
However, for columns with ALL unique values (e.g. ID columns), this takes a comparatively long time and consumes a lot of memory, as the hashmap ends up holding the entire column.  
Now, with the new `--stats-mode` option (enabled by default), `frequency` can compile frequencies more intelligently by looking up each column's cardinality in the stats cache.  
If a column's cardinality equals the CSV's row count (indicating a column with ALL unique values), `frequency` short-circuits the calculation for that column. This dramatically reduces the time and memory requirements for such columns, as no hashmap needs to be maintained for them.  
Practically speaking, this makes `frequency` able to handle "real-world" datasets of any size.  
To ensure `frequency` is as fast as possible, be sure to `index` your datasets and compute `stats` for them beforehand.
* __Setting the stage for Datapusher+ v1 and...__  
The "[itches we've been scratching](https://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar#Lessons_for_creating_good_open_source_software)" over the past few months have been informed by our work with several clients towards the release of Datapusher+ 1.0 and qsv pro 1.0 (more info below) - both targeted for release this month.  
[DP+](https://github.com/dathere/datapusher-plus) is our third-gen, high-speed data ingestion/registration tool for CKAN that uses qsv as its data wrangling/analysis engine. It will enable us to reinvent the way data is ingested into CKAN - with exponentially faster data ingestion, metadata inferencing, data validation, computed metadata fields, and more!  
We're particularly excited about how qsv will allow us to compute and infer high-quality metadata for datasets (with a focus on inferring optional recommended [DCAT-US v3](https://doi-do.github.io/dcat-us/) metadata fields) in "near real-time", while dataset publishers are still entering metadata. This will be a game-changer for CKAN administrators and data publishers!
* __...qsv pro 1.0__  
[qsv pro](https://qsvpro.dathere.com) is [datHere](https://dathere.com)'s enterprise-grade data wrangling/curation workbench, planned for a v1.0 release this month.  
Building the core functionality of qsv pro's Workflow feature is one of the primary reasons for a v1.0 release.  
We feel qsv pro may be a game-changer for data wranglers and data curators who work with spreadsheets and large datasets: it lets them view statistical data and metadata and perform complex data wrangling operations in a user-friendly way, without having to write code.
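
The `frequency` short-circuit described in the first highlight can be illustrated with a minimal Python sketch. This is not qsv's actual (Rust) implementation: the function name `column_frequencies`, the `cardinalities` argument (a stand-in for the stats cache), and the use of `None` to mark a skipped column are all assumptions made for illustration.

```python
from collections import Counter


def column_frequencies(rows, cardinalities=None):
    """Count value frequencies per column, skipping all-unique columns.

    rows          -- list of equal-length lists of strings (parsed CSV rows)
    cardinalities -- optional per-column unique-value counts; a hypothetical
                     stand-in for what a stats cache would provide
    Returns one Counter per column, or None for a column that was skipped.
    """
    row_count = len(rows)
    if row_count == 0:
        return []
    n_cols = len(rows[0])

    counters = []
    for i in range(n_cols):
        # Cardinality equal to the row count means every value is unique,
        # so no hashmap is allocated for this column.
        all_unique = cardinalities is not None and cardinalities[i] == row_count
        counters.append(None if all_unique else Counter())

    for row in rows:
        for i, value in enumerate(row):
            if counters[i] is not None:
                counters[i][value] += 1
    return counters


rows = [["1", "a"], ["2", "a"], ["3", "b"]]
with_cache = column_frequencies(rows, cardinalities=[3, 2])
without_cache = column_frequencies(rows)
```

With the cardinalities supplied, the ID-like first column never gets a hashmap, while the second column is counted as usual. Without them, every column is counted in full, which is the memory-heavy path the release notes describe.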

Added

Changed

Fixed

Removed

New Contributors

Full Changelog: jqnatividad/qsv@0.130.0...0.131.0

@github-actions github-actions bot added the rust (Rust use is a significant feature of the PR or issue) and bump-formula-pr (PR was created using `brew bump-formula-pr`) labels Aug 9, 2024
@p-linnane p-linnane added the pending-rust-update Blocked by `rust` upgrade PR label Aug 9, 2024
@chenrui333 chenrui333 removed the pending-rust-update Blocked by `rust` upgrade PR label Aug 9, 2024
Contributor

github-actions bot commented Aug 9, 2024

🤖 An automated task has requested bottles to be published to this PR.

@github-actions github-actions bot added the CI-published-bottle-commits The commits for the built bottles have been pushed to the PR branch. label Aug 9, 2024
@BrewTestBot BrewTestBot added this pull request to the merge queue Aug 9, 2024
Merged via the queue into master with commit a08bf48 Aug 9, 2024
15 checks passed
@BrewTestBot BrewTestBot deleted the bump-qsv-0.131.0 branch August 9, 2024 13:50
Labels
bump-formula-pr PR was created using `brew bump-formula-pr` CI-published-bottle-commits The commits for the built bottles have been pushed to the PR branch. rust Rust use is a significant feature of the PR or issue
4 participants