From 3170c5f59009006345bee5245f75b43a4e97c2be Mon Sep 17 00:00:00 2001
From: Joel Natividad <1980690+jqnatividad@users.noreply.github.com>
Date: Tue, 10 Sep 2024 07:47:36 -0400
Subject: [PATCH] `docs`: changelog with DRAFT highlights before release [skip ci]

---
 CHANGELOG.md | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 33217147b..2a3067423 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,9 +6,34 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
-## [0.134.0] - 2024-09-09
+## [0.134.0] - 2024-09-10
+
+## qsv pro v1 is here! 🎉
+If you've been using qsv for a while, even if you're a command-line ninja, you'll find a lot of new capabilities in qsv pro that can make your data wrangling experience even better!
+
+Apart from making qsv easier to use, qsv pro has a multitude of features including: view interactive data tables; browse stats/frequency/metadata; run recipes and tools (scripts); run Polars SQL queries; an interface using Retrieval Augmented Generation (RAG) techniques to convert Natural Language queries to Polars SQL; regular expression search; export to multiple file formats; download/upload from/to compatible CKAN instances; design custom node-based flows and data pipelines; interact with a local API from external programs including the qsv pro command, and run various qsv commands in a graphical user interface; and the list goes on!
+
+That's just the beginning, and there's more to come! You just have to try it!
+
+Download qsv pro v1 now at [qsvpro.dathere.com](https://qsvpro.dathere.com/).
+
+Other highlights include:
+- `pro`: new command to allow qsv to interact with the qsv pro API to tap qsv pro exclusive features
+- `lens`: new command to interactively view CSVs using the [csvlens](https://github.com/YS-L/csvlens) crate.
+- The ludicrously fast `diff` command is now easier to use with its `--drop-equal-fields` option. @janriemer continues to work on his `csv-diff` crate, and there are more `diff` UX improvements coming soon!
+- `stats` adds `sum_length` and `avg_length` "streaming" statistics in addition to the existing `min_length` and `max_length` metrics. These are especially useful for datasets with a lot of "free text" columns.
+- `stats` also got "smarter" and "faster" by [dog-fooding](https://en.wikipedia.org/wiki/Eating_your_own_dog_food) its own statistics!
+  
+It's a little complicated, but the way `stats` works is that it compiles the "streaming" statistics on the fly first, and the more expensive advanced statistics are "lazily" computed at the end.
+Since we now compile "sort order" in a streaming manner, we use this info when deriving cardinality at the end to see if we can skip sorting - an otherwise necessary step to get cardinality, which is done by "scanning" all the sorted values of a column. Every time two neighboring values differ in a sorted column, the cardinality count is incremented.
+Apart from this "sort order" optimization, we also improved the "cardinality scan" algorithm - halving its memory footprint and making it faster still for larger datasets by parallelizing the computation!
+This, in turn, makes the `frequency` command faster and more memory efficient!
+- we now also use our own fork of the `csv` crate, featuring SIMD-accelerated UTF-8 validation and other minor perf tweaks, making the *entire qsv suite* faster still!
+
+---
 
 ### Added
+* `pro`: add `qsv pro` command to interact with qsv pro API by @rzmk in https://github.com/jqnatividad/qsv/pull/2039
 * `lens`: new command to interactively view CSVs using the [csvlens](https://github.com/YS-L/csvlens) crate https://github.com/jqnatividad/qsv/pull/2117
 * `apply`: add crc32 operation https://github.com/jqnatividad/qsv/pull/2121
 * `count`: add --delimiter option https://github.com/jqnatividad/qsv/pull/2120
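
The `stats` highlight in this changelog describes tracking sort order while streaming and then deriving cardinality by scanning sorted values for neighboring differences. A minimal Python sketch of that idea follows. It is not qsv's Rust implementation: the names `SortOrderTracker`, `cardinality_scan`, and `column_cardinality` are hypothetical, treating a descending column as also safe to scan without sorting is an assumption, and the memory-halving and parallelization mentioned in the changelog are not modeled.

```python
from typing import Iterable, List, Optional


class SortOrderTracker:
    """Hypothetical helper: records, in one streaming pass, whether the
    values seen so far are in ascending and/or descending order."""

    def __init__(self) -> None:
        self.prev: Optional[str] = None
        self.ascending = True
        self.descending = True

    def add(self, value: str) -> None:
        if self.prev is not None:
            if value < self.prev:
                self.ascending = False
            if value > self.prev:
                self.descending = False
        self.prev = value

    @property
    def is_sorted(self) -> bool:
        # Either monotonic order keeps equal values next to each other.
        return self.ascending or self.descending


def cardinality_scan(ordered: List[str]) -> int:
    """Count distinct values by comparing each value with its neighbor.
    Only correct when equal values are adjacent (i.e. the list is sorted)."""
    if not ordered:
        return 0
    count = 1
    for prev, cur in zip(ordered, ordered[1:]):
        if prev != cur:
            count += 1
    return count


def column_cardinality(values: Iterable[str]) -> int:
    """Collect a column while tracking its sort order, then sort only if
    the streaming pass showed the column was not already ordered."""
    tracker = SortOrderTracker()
    collected: List[str] = []
    for value in values:
        tracker.add(value)
        collected.append(value)
    if not tracker.is_sorted:
        collected.sort()  # the step the streaming check may let us skip
    return cardinality_scan(collected)
```

For example, `column_cardinality(["a", "a", "b", "c"])` skips the sort because the column is already ascending, while `column_cardinality(["c", "a", "b", "a"])` has to sort first; both count three distinct values.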