docs: changelog with DRAFT highlights before release
[skip ci]
jqnatividad committed Sep 10, 2024
1 parent a9cbf9a commit 3170c5f
Showing 1 changed file with 26 additions and 1 deletion.
27 changes: 26 additions & 1 deletion CHANGELOG.md
Expand Up @@ -6,9 +6,34 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [0.134.0] - 2024-09-09
## [0.134.0] - 2024-09-10

## qsv pro v1 is here! 🎉
If you've been using qsv for a while, even if you're a command-line ninja, you'll find a lot of new capabilities in qsv pro that can make your data wrangling experience even better!

Apart from making qsv easier to use, qsv pro has a multitude of features including: view interactive data tables; browse stats/frequency/metadata; run recipes and tools (scripts); run Polars SQL queries; an interface using Retrieval Augmented Generation (RAG) techniques to convert Natural Language queries to Polars SQL; regular expression search; export to multiple file formats; download/upload from/to compatible CKAN instances; design custom node-based flows and data pipelines; interact with a local API from external programs including the qsv pro command, and run various qsv commands in a graphical user interface; and the list goes on!

That's just the beginning, and there's more to come! You just have to try it!

Download qsv pro v1 now at [qsvpro.dathere.com](https://qsvpro.dathere.com/).

Other highlights include:
- `pro`: new command to allow qsv to interact with the qsv pro API to tap qsv pro exclusive features
- `lens`: new command to interactively view CSVs using the [csvlens](https://github.com/YS-L/csvlens) crate.
- The ludicrously fast `diff` command is now easier to use with its `--drop-equal-fields` option. @janriemer continues to work on his `csv-diff` crate, and there are more `diff` UX improvements coming soon!
- `stats` adds `sum_length` and `avg_length` "streaming" statistics in addition to the existing `min_length` and `max_length` metrics. These are especially useful for datasets with a lot of "free text" columns.
- `stats` also got "smarter" and faster by [dog-fooding](https://en.wikipedia.org/wiki/Eating_your_own_dog_food) its own statistics!
 
It's a little complicated, but the way `stats` works is that it compiles the "streaming" statistics on the fly first, and the more expensive advanced statistics are "lazily" computed at the end.
Since we now compile "sort order" in a streaming manner, we use this info when deriving cardinality at the end to see if we can skip sorting - an otherwise necessary step, because cardinality is computed by "scanning" all the sorted values of a column. Every time two neighboring values differ in a sorted column, the cardinality count is incremented.
Apart from this "sort order" optimization, we also improved the "cardinality scan" algorithm - halving its memory footprint and making it faster still for larger datasets by parallelizing the computation!
This, in turn, makes the `frequency` command faster and more memory efficient!
- we now also use our own fork of the `csv` crate, featuring SIMD-accelerated UTF-8 validation and other minor perf tweaks, making the *entire qsv suite* faster still!
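
The "sort order" shortcut and the "cardinality scan" described for `stats` can be illustrated with a minimal sketch. This is not qsv's actual Rust implementation: the function name `column_cardinality` is hypothetical, the column is assumed to fit in memory as a list of strings, only ascending order is detected, and the parallelized scan is left out.

```python
from typing import Iterable, List


def column_cardinality(values: Iterable[str]) -> int:
    """Count the distinct values in a column, sorting only when needed."""
    column: List[str] = []
    ascending = True  # "sort order" tracked while streaming through the column

    for value in values:
        # A single out-of-order neighbor means the column is not sorted.
        if column and value < column[-1]:
            ascending = False
        column.append(value)

    # Skip the sort when the streaming pass already saw ascending order.
    ordered = column if ascending else sorted(column)

    # Cardinality scan: count each position where a value differs
    # from its left neighbor in the sorted column.
    cardinality = 0
    previous = None
    for index, value in enumerate(ordered):
        if index == 0 or value != previous:
            cardinality += 1
        previous = value
    return cardinality
```

For an already sorted column such as `["a", "a", "b"]`, the sketch goes straight to the scan; for an unsorted one it falls back to sorting first.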

---

### Added
* `pro`: add `qsv pro` command to interact with qsv pro API by @rzmk in https://github.com/jqnatividad/qsv/pull/2039
* `lens`: new command to interactively view CSVs using the [csvlens](https://github.com/YS-L/csvlens) crate https://github.com/jqnatividad/qsv/pull/2117
* `apply`: add crc32 operation https://github.com/jqnatividad/qsv/pull/2121
* `count`: add --delimiter option https://github.com/jqnatividad/qsv/pull/2120
