
Releases: jqnatividad/qsv

0.117.0

15 Oct 13:45
1901fe3

Highlights:

  • geocode: added Federal Information Processing Standards (FIPS) codes to results for US places, so we can derive GEOIDs. This paves the way to doing data enrichment lookups (starting with the US Census) in an upcoming release. 🦄
  • Added Goals/Non-goals, explicitly codifying what qsv is and isn't, and what we're trying to achieve with the toolkit.
  • excel: CSV output processing is now multi-threaded, making it a bit faster. The bottleneck is still the Excel/ODS library we're using (calamine), which is single-threaded. But there are active discussions underway to make it much faster in the future. 🏇
  • Upgrading the MSRV to 1.73.0 has allowed us to use LLVM 17, which has resulted in a small performance boost. 🏇
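The notes don't say how GEOIDs will be derived from the new FIPS codes. The following is a minimal sketch that assumes the standard US Census convention for places: a 2-digit state FIPS code followed by a 5-digit place FIPS code. `place_geoid` is a hypothetical helper, not part of qsv. It zero-pads both parts because FIPS codes stored as numbers lose their leading zeros.

```rust
/// Hypothetical helper (not part of qsv): builds a US Census place GEOID
/// from numeric FIPS codes. A place GEOID is the 2-digit state code followed
/// by the 5-digit place code, so both parts are zero-padded to full width.
fn place_geoid(state_fips: u32, place_fips: u32) -> Option<String> {
    // Reject codes that cannot fit their fixed-width fields.
    if state_fips > 99 || place_fips > 99_999 {
        return None;
    }
    Some(format!("{:02}{:05}", state_fips, place_fips))
}
```

For example, `place_geoid(1, 7000)` yields `"0107000"`. Simple string concatenation would produce the wrong value here, because the leading zero of the state code would be lost.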

Added:

  • geocode: added Federal Information Processing Standards (FIPS) codes to results for US places.
  • Added Goals/Non-goals to README.md

Changed

  • cat : minor optimization 343bb66
  • excel: CSV output processing is now multi-threaded #1360
  • geocode: more efficient dynfmt processing #1367
  • frequency: optimize allocations before hot loop 655bebc
  • luau: upgraded embedded Luau from 0.596 to 0.599
  • deps: bump calamine from 0.22.0 to 0.22.1 4c4ed7e
  • docs: reorganized README, moving FEATURES and INTERPRETERS to their own markdown files.
  • build(deps): bump byteorder from 1.4.3 to 1.5.0 by @dependabot in #1347
  • build(deps): bump tokio from 1.32.0 to 1.33.0 by @dependabot in #1354
  • build(deps): bump regex from 1.9.6 to 1.10.0 by @dependabot in #1356
  • build(deps): bump semver from 1.0.19 to 1.0.20 by @dependabot in #1358
  • build(deps): bump pyo3 from 0.19.2 to 0.20.0 by @dependabot in #1359
  • build(deps): bump serde from 1.0.188 to 1.0.189 by @dependabot in #1361
  • build(deps): bump flate2 from 1.0.27 to 1.0.28 by @dependabot in #1363
  • build(deps): bump regex from 1.10.0 to 1.10.1 by @dependabot in #1366
  • deps: update several indirect dependencies
  • pin Rust nightly to 2023-10-14
  • bump MSRV to 1.73.0
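The multi-threaded excel output change can be illustrated with a general pattern. This is a minimal sketch, not qsv's implementation. It assumes the rows have already been read into memory, since reading through calamine remains single-threaded per these notes. The rows are split into chunks, each chunk is formatted as CSV text on its own thread with `std::thread::scope`, and the pieces are concatenated in their original order. `rows_to_csv_parallel` and `csv_field` are hypothetical names.

```rust
use std::thread;

/// Quote a field per RFC 4180 only when it contains a comma, quote, or line break.
fn csv_field(s: &str) -> String {
    if s.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s.to_string()
    }
}

/// Format rows as CSV text using up to `n_threads` threads, preserving row order.
fn rows_to_csv_parallel(rows: &[Vec<String>], n_threads: usize) -> String {
    let n_threads = n_threads.max(1);
    // Ceiling division so every row lands in exactly one chunk.
    let chunk_size = ((rows.len() + n_threads - 1) / n_threads).max(1);
    let parts: Vec<String> = thread::scope(|s| {
        let handles: Vec<_> = rows
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    let mut out = String::new();
                    for row in chunk {
                        let fields: Vec<String> = row.iter().map(|f| csv_field(f)).collect();
                        out.push_str(&fields.join(","));
                        out.push('\n');
                    }
                    out
                })
            })
            .collect();
        // Join in spawn order so the output keeps the input order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    parts.concat()
}
```

Joining the handles in spawn order keeps the output deterministic, whatever order the threads finish in.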

Removed

  • excel: removed --progressbar option as Excel/ODS maximum sheet size is just too small (1,048,576 rows) to make it useful.

Fixed

  • Fixed Jupyter Notebook Viewer Link by @a5dur in #1349

Full Changelog: 0.116.0...0.117.0

0.116.0

05 Oct 20:14
edf73a3

Highlights: 🎉 🚀

  • Benchmarks refinements galore with more benchmarks and more comprehensive benchmarking instructions. 🎠
  • geocode: The Geonames index's configuration metadata is now available with the geocode index-check subcommand. No need to maintain a separate metadata JSON file. This should make it even easier to maintain multiple Geonames index files with different configurations without having to worry if you're looking at the right metadata JSON file. 🎠
  • cat: rowskey subcommand is now 27% faster 🏇🏽
  • tojsonl: parallelized with rayon, making it 33% faster! 🏇🏽
  • smaller qsv binary size and faster compile times if the to_parquet feature is disabled. If sqlp's ability to create a parquet file from a SQL query is good enough for you, disabling to_parquet makes qsv's binary markedly smaller and its compile time markedly faster. 🏇🏽
  • minor perf tweaks & optimizations - count and luau commands 🏇🏽

Added

  • geocode: added Geonames index file metadata to index-check subcommand
  • tojsonl: parallelized with rayon #1338
  • to: added to_parquet feature. #1341
  • benchmarks: upgraded from 3.0.0 to 3.3.1
    • you can now specify a separate benchmarking binary as we dogfood qsv for the benchmarks and some features are required that may not be in the qsv binary variant being benchmarked
    • added additional count benchmarks with --width option
    • added additional luau benchmarks with single/multi filter options
    • added additional search benchmark with --unicode option
    • show absolute path of qsv binaries used (both the one we're dogfooding and the one being benchmarked) and their version info before running the benchmarks proper
    • ensured schema benchmark was not using the stats cache with the --force option

Changed

  • cat: use an empty byte_record var instead of repeatedly allocating a new one in a hot loop eddafd1
  • count: minor optimization bb113c0
  • luau: minor perf tweaks c71cd16 and f9c1e3c
  • (deps): bump Geosuggest from 0.4.5 to 5.1 #1333
  • (deps): use patched version of calamine which has unreleased fixes since 0.22.0
  • build(deps): bump flexi_logger from 0.27.0 to 0.27.2 by @dependabot in #1328
  • build(deps): bump indexmap from 2.0.0 to 2.0.1 by @dependabot in #1329
  • build(deps): bump hashbrown from 0.14.0 to 0.14.1 by @dependabot in #1334
  • build(deps): bump file-format from 0.20.0 to 0.21.0 by @dependabot in #1335
  • build(deps): bump indexmap from 2.0.1 to 2.0.2 by @dependabot in #1336
  • build(deps): bump regex from 1.9.5 to 1.9.6 by @dependabot in #1337
  • build(deps): bump jql-runner from 7.0.3 to 7.0.4 by @dependabot in #1340
  • build(deps): bump csvs_convert from 0.8.7 to 0.8.8 by @dependabot in #1339
  • build(deps): bump actions/setup-python from 4.7.0 to 4.7.1 by @dependabot in #1342
  • build(deps): bump reqwest from 0.11.21 to 0.11.22 by @dependabot in #1343
  • build(deps): bump csv from 1.2.2 to 1.3.0 by @dependabot in #1344
  • build(deps): bump actix-governor from 0.4.1 to 0.5.0 by @dependabot in #1346
  • applied select clippy suggestions
  • update several indirect dependencies
  • pin Rust nightly to 2023-10-04
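Several of these optimizations (cat, count, and frequency in 0.117.0) follow the same idea: allocate a buffer once before the hot loop and reuse it, instead of allocating a new one on every iteration. qsv works with csv-crate records, which are not shown here. This std-only sketch shows the same pattern with a `String` and `BufRead::read_line`. `count_nonblank_lines` is a hypothetical example, not qsv code.

```rust
use std::io::{self, BufRead};

/// Counts non-blank lines, reusing one String buffer for every line
/// instead of allocating a fresh String inside the hot loop.
fn count_nonblank_lines<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut line = String::new(); // allocated once, outside the loop
    let mut count = 0;
    loop {
        line.clear(); // empties the buffer but keeps its capacity
        if reader.read_line(&mut line)? == 0 {
            break; // EOF
        }
        if !line.trim_end_matches(|c: char| c == '\r' || c == '\n').is_empty() {
            count += 1;
        }
    }
    Ok(count)
}
```

Because `clear()` keeps the buffer's capacity, the loop only allocates again when a line is longer than any line seen so far.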

Removed

  • geocode: removed separate metadata JSON file for Geonames index files. The metadata is now embedded in the index file itself and can be viewed with the index-check command.
  • removed redundant setting from profile.release-samply in Cargo.toml 2a35be5

Fixed

  • geocode: the "now" subcommands (suggestnow, reversenow, countryinfonow) now produce valid JSON. Previously, they emitted JSON with escaped/extra quotes because it was formatted for inclusion in CSV files. That escaping is required for the suggest, reverse and countryinfo subcommands, which process CSVs with multiple rows and store the JSON in a CSV column. The "now" subcommands return only one result, so there's no need to escape the quotes in the JSON. #1345
  • schema: fixed --force flag not being honored
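The geocode fix comes down to the difference between plain JSON and JSON stored inside a CSV field. Per RFC 4180, a CSV field is wrapped in double quotes and any inner double quotes are doubled. This is a minimal sketch; `json_as_csv_field` is a hypothetical name, and the notes don't show how qsv performs this step.

```rust
/// Embeds a JSON document in a single CSV field per RFC 4180:
/// wrap it in double quotes and double every inner double quote.
fn json_as_csv_field(json: &str) -> String {
    format!("\"{}\"", json.replace('"', "\"\""))
}
```

`{"name":"Paris"}` becomes `"{""name"":""Paris""}"`. That form is correct inside a CSV column. It is not valid JSON when printed on its own, which is why the "now" subcommands should emit the unescaped form.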

Full Changelog: 0.115.0...0.116.0

0.115.0

26 Sep 13:19
1c47a87

We continue to refine the benchmark suite, and have added a new setup argument to set up and install the required tools for the benchmark suite. We've also added more comprehensive checks to ensure that the required tools are installed before running the benchmarks. 🎠

For geocode, we've added a JSON file describing the Geonames index file configuration. This should help users maintain several Geonames index files with different configurations. 🎠

geocode should also be a tad faster now, thanks to the cached crate making ahash its default hashing algorithm and to the hashbrown upgrade - microbenchmarks show a 33% performance improvement. 🏇🏽

We also added a release-samply profile to make it easier to squeeze more performance out of the toolkit with samply. 🏇🏽


Added

  • geocode: added a JSON file describing the Geonames index file configuration in #1324
  • benchmarks: v3.0.0 release
    • added setup argument to set up and install required tools for the benchmark suite
    • added more comprehensive required tools check
    • added more realistic luau benchmarks, using helper luau scripts
      (dt_format.luau and turnaround_time.luau)
    • added stats with_cache and create_cache benchmarks
    • added benchmark_aggregations.luau script for benchmark analysis
    • added binary, total_mean and qsv_env columns to benchmark results
      • binary is the qsv binary variant used
      • total_mean is the sum of the mean run times of all the benchmarks
      • qsv_env lists the qsv-relevant environment variables active while running the benchmarks
    • expanded README.md and benchmark suite usage instructions
  • added release-samply profile to Cargo.toml to facilitate continued performance optimization with samply
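The total_mean column is simply the sum of the per-benchmark mean run times. The suite's own implementation isn't shown in these notes, so this Rust sketch only pins down the arithmetic. `BenchResult` is a hypothetical row type holding a benchmark name and its mean run time in seconds.

```rust
/// Hypothetical benchmark result row: a benchmark's name and its mean
/// run time in seconds (e.g. as reported by hyperfine).
struct BenchResult {
    name: String,
    mean_secs: f64,
}

/// total_mean: the sum of every benchmark's mean run time.
fn total_mean(results: &[BenchResult]) -> f64 {
    results.iter().map(|r| r.mean_secs).sum()
}
```

A single total makes it easy to compare whole-suite runs across qsv binary variants, which the binary column identifies.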

Changed

  • readme: move tab completion instructions/script to scripts/misc
  • geocode: updated bundled Geonames index to 2023-09-25
  • bump embedded luau from 0.594 to 0.596
  • build(deps): bump flexi_logger from 0.26.1 to 0.27.0 by @dependabot in #1317
  • build(deps): bump indicatif from 0.17.6 to 0.17.7 by @dependabot in #1318
  • build(deps): bump semver from 1.0.18 to 1.0.19 by @dependabot in #1320
  • build(deps): bump cached from 0.45.1 to 0.46.0 by @dependabot in #1322
  • build(deps): bump geosuggest-core from 0.4.3 to 0.4.5 by @dependabot in #1323
  • build(deps): bump geosuggest-utils from 0.4.3 to 0.4.5 by @dependabot in #1321
  • build(deps): bump fastrand from 2.0.0 to 2.0.1 by @dependabot in #1325
  • bump MSRV from Rust 1.72.0 to 1.72.1
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-09-25

Fixed

  • benchmarks: fixed invalid luau benchmark that had invalid luau command

Full Changelog: 0.114.0...0.115.0

0.114.0

21 Sep 11:03
c471321

The long-overdue Benchmarks revamp is finally here! 🎉- https://qsv.dathere.com/benchmarks

The benchmarks have been completely rewritten to be more reproducible, and now use hyperfine instead of time. The new benchmarks are now run as part of the release process, and the results are compiled into a single page that is published on the new Quicksilver website.

The new benchmarks are also more comprehensive, and designed to be run on a variety of hardware and operating systems. This allows users to adapt the benchmarks to their own workloads and environments.

Other release highlights include:

  • geocode is now fully-featured and ready for production use! 🎉 Though it only currently features Geonames city-level lookup support, it provides a solid foundation on top of which we'll add more geocoding providers in the future (next up - OpenCage support with street-level geocoding).
  • Polars has been bumped from 0.32.1 to 0.33.2, which includes a number of performance improvements for the sqlp and joinp commands.
  • major performance increase on several regex/aho-corasick powered commands on Apple Silicon thanks to various under-the-hood improvements in the aho-corasick crate.

Big thanks to @rzmk, @a5dur, @minhajuddin2510 and @samibaig for helping me finally push out the revamped Benchmarks!


Added

  • Added autoindex size threshold, replacing QSV_AUTOINDEX env var with QSV_AUTOINDEX_SIZE. Resolves #1300. in #1301 69e25ac
  • diff: Added test for different delimiters by @janriemer in #1297
  • benchmarks: Added qsv benchmark notebook. by @a5dur in #1309
  • geocode: Added countryinfo/now subcommand made available in geosuggest 0.4.3 #1311
  • geocode: Added --language option so users can specify the language of the geocoding results. This requires running the index-update subcommand with the --languages option to rebuild the index with the desired languages.
  • sqlp: add example of using columns with embedded spaces in SQL queries f7bf4f6
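QSV_AUTOINDEX_SIZE replaces an on/off switch with a size threshold. This is a minimal sketch, assuming the variable holds a minimum file size in bytes and that an index is created when the CSV is at or above that size. The exact comparison qsv uses is an assumption here. `should_autoindex` and `autoindex_from_env` are hypothetical helpers.

```rust
use std::env;

/// Hypothetical helper: decide whether to auto-create an index for a CSV,
/// assuming the threshold is a minimum file size in bytes.
/// An unset or non-numeric threshold disables autoindexing.
fn should_autoindex(file_size: u64, threshold_var: Option<&str>) -> bool {
    match threshold_var.and_then(|v| v.trim().parse::<u64>().ok()) {
        Some(threshold) => file_size >= threshold,
        None => false,
    }
}

/// Reads the threshold from the QSV_AUTOINDEX_SIZE environment variable.
fn autoindex_from_env(file_size: u64) -> bool {
    let var = env::var("QSV_AUTOINDEX_SIZE").ok();
    should_autoindex(file_size, var.as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the decision easy to test without mutating process-wide state.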

Changed

  • benchmarks: Benchmarks revamped #1298, #1310 d8eeb94
  • build(deps): bump serde_json from 1.0.106 to 1.0.107 by @dependabot in #1302
  • build(deps): bump mimalloc from 0.1.38 to 0.1.39 by @dependabot in #1303
  • build(deps): bump simple-home-dir from 0.1.4 to 0.2.0 by @dependabot in #1304
  • build(deps): bump chrono from 0.4.30 to 0.4.31 by @dependabot in #1305
  • (deps): bump Polars from 0.32.1 to Polars 0.33.2 #1308
  • build(deps): bump cpc from 1.9.2 to 1.9.3 by @dependabot in #1313
  • build(deps): bump rayon from 1.7.0 to 1.8.0 by @dependabot in #1315
  • (deps): update several indirect dependencies
  • pin Rust nightly to 2023-09-21

Full Changelog: 0.113.0...0.114.0

0.113.0

08 Sep 17:30
447bb43

This is the first "Unicorn" 🦄 release, adding MAJOR new features to the toolkit!

  • geocode: adds high-speed, cache-backed, multi-threaded geocoding using a local, updateable copy of the GeoNames database. This is a major improvement over the previous geocode subcommand in the apply command thanks to the wonderful geosuggest crate.
  • guaranteed non-UTF8 input detection with the validate and input commands. Quicksilver REQUIRES UTF-8 encoded input. You can now use these commands to ensure you have valid UTF-8 input before using the rest of the toolkit.
  • New/expanded whirlwind tour & quick-start notebooks by @a5dur and @rzmk 🎠
  • Various performance improvements all-around: 🏇🏽
    • overall increase of ~5% now that mimalloc, the default allocator for qsv, is built without unnecessarily enabling secure mode.
    • flatten command is now ~10% faster
    • faster regex performance thanks to various under-the-hood improvements in the regex crate
    • and the benchmark scripts have been updated by @minhajuddin2510 to use hyperfine instead of time, and to use the same input file for all benchmarks to make them more reproducible. In upcoming releases, we'll start compiling the benchmark results into a single page as part of the release process, so we can track our progress over time.
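Checking that input is valid UTF-8 before processing it can be sketched with the standard library alone. This is a minimal sketch, not qsv's implementation. qsv's input command uses SIMD-accelerated validation, while `std::str::from_utf8` is the portable scalar validator, and this version assumes the whole input fits in memory. `check_utf8` is a hypothetical name. It reports the byte offset of the first invalid sequence, which helps locate the problem.

```rust
/// Returns Ok(()) if `bytes` is valid UTF-8; otherwise returns the byte
/// offset at which the first invalid sequence starts.
fn check_utf8(bytes: &[u8]) -> Result<(), usize> {
    std::str::from_utf8(bytes)
        .map(|_| ())
        .map_err(|e| e.valid_up_to())
}
```

For example, Latin-1 encoded text such as `caf\xe9` fails at offset 3, where the lone `0xE9` byte begins.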

and last but not least - Quicksilver now has a website! - https://qsv.dathere.com/ 🦄 🎉 🚀

And it's not just a static site with a few links - it's a full-blown web app that lets you try out qsv commands in your browser! It's not just a demo site either - you can use it as a configurator, save your commands to a gist and share them with others!

It's the first Beta release of the Quicksilver website, so there's still a lot of work to do, but we're excited to share it with you and get your feedback!

We have more exciting features planned for Quicksilver and the website, but we require your help to make it happen! For qsv, use GitHub issues. For the website, use the feedback form. And if you want to help out, please check out the contributing guide.

Big thanks to @rzmk for all the work on the website! To @a5dur for all the QA work on this release! And to @minhajuddin2510 for revamping the benchmark script!


Added

  • geocode: new high-speed geocoding command #1231
    • major improvements using geosuggest upstream #1269
    • add suggest --country filter #1275
    • add --admin1 filter #1276
    • automatic --country inferencing from --admin1 code #1277
    • add --suggestnow and --reversenow subcommands #1280
    • add "%dyncols:" special formatter to dynamically add geocoded columns to the output CSV #1286
  • excel: add SheetType (Worksheet, DialogSheet, MacroSheet, ChartSheet, VBA) in metadata mode; log.info! headers; wordsmith comments #1225
  • excel: moar metadata! moar examples! #1271
  • add support for ALL_PROXY env var #1233
  • input: add --encoding-errors handling option #1235
  • fixlengths: add --insert option #1247
  • joinp: add --sql-filter option #1287
  • luau: upgraded embedded Luau from 0.592 to 0.594
  • notebooks: add qsv-colab-quickstart by @rzmk in #1253
  • notebooks: Added Whirlwindtour.ipynb by @a5dur in #1223

Changed

Removed

  • apply: remove geocode subcmd now that we have a dedicated geocode command https://github.co...

0.112.0

15 Aug 16:37
b1dab63

This is the second in a series of "Giddy-up" 🏇🏽 releases, improving the performance of the following commands:

  • stats: by refactoring the code to detect empty cells more efficiently, and by removing unnecessary bounds checks in the main compute loop. (~10% performance improvement)
  • sample: by refactoring the code to use an index more effectively when available - not only making it faster, but also eliminating the need to load the entire dataset into memory. Also added a --faster option to use a faster random number generator. (~15% performance improvement)
  • frequency, schema, search & validate by amortizing/reducing allocations in hot loops
  • excel: by refactoring the main hot loop to convert Excel cells more efficiently
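The notes don't spell out which sampling algorithm sample uses when an index is available. One common approach, shown here as an assumption rather than as qsv's code, is Floyd's algorithm. With the row count known from the index, it picks k distinct row numbers up front and sorts them, so a reader can seek to each sampled row in file order. Memory then scales with the sample size, not with the dataset. `XorShift64` is a toy PRNG used only as a stand-in for a real RNG, and it ignores modulo bias for brevity.

```rust
use std::collections::BTreeSet;

/// Minimal xorshift64 PRNG, for illustration only (not qsv's RNG).
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in 0..=max (modulo bias ignored for brevity).
    fn below_or_eq(&mut self, max: u64) -> u64 {
        self.next_u64() % (max + 1)
    }
}

/// Floyd's algorithm: picks min(k, n) distinct row numbers from 0..n,
/// returned in ascending order so an indexed reader can seek in file order.
fn sample_row_indices(n: u64, k: u64, seed: u64) -> Vec<u64> {
    let k = k.min(n);
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    let mut chosen = BTreeSet::new();
    for j in (n - k)..n {
        let t = rng.below_or_eq(j);
        // If t was already chosen, take j instead (j cannot be in the set yet).
        if !chosen.insert(t) {
            chosen.insert(j);
        }
    }
    chosen.into_iter().collect()
}
```

Floyd's algorithm needs exactly k random draws and no retries, which makes it a good fit when n is large and k is small.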

The prebuilt binaries are also built with CPU optimizations enabled for x86_64 and Apple Silicon (arm64) architectures.

0.112.0 is also a "Carousel" (i.e. increased usability) 🎠 release featuring new Jupyter notebooks in the contrib/notebooks directory to help users get started with qsv.


Added

  • sqlp: added CASE expression support with Polars 0.32 9d508e6
  • sample: added --faster option to use a faster random number generator #1210
  • jsonl: added --delimiter option #1205
  • excel: added --delimiter option ab73067
  • notebook/describegpt: added describegpt QA Jupyter notebook by @a5dur in #1215
  • notebook/count: added intro-to-count.ipynb by @rzmk in #1207

Changed

  • stats: refactor hot compute function - 35999c5
  • stats: faster detection of empty samples b054815 and a7f0836
  • sample: major refactor making it faster, but also eliminating need to load the entire dataset into memory when an index is available. #1210
  • frequency: refactor primary ftables function 57d660d
  • excel: refactor main loop for more performance - 61f227b
  • rustfmt: match_block_trailing_comma #1206
  • bump MSRV to 1.71.1 1c99364
  • apply clippy suggestions #1209
  • build(deps): bump tokio from 1.29.1 to 1.30.0 by @dependabot in #1204
  • build(deps): bump log from 0.4.19 to 0.4.20 by @dependabot in #1211
  • build(deps): bump redis from 0.23.1 to 0.23.2 by @dependabot in #1213
  • build(deps): bump tokio from 1.30.0 to 1.31.0 by @dependabot in #1212
  • build(deps): bump sysinfo from 0.29.7 to 0.29.8 by @dependabot in #1214
  • upgrade to Polars 0.32.0 #1217
  • build(deps): bump flate2 from 1.0.26 to 1.0.27 by @dependabot in #1218
  • build(deps): bump polars from 0.32.0 to 0.32.1 by @dependabot in #1219
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-08-13

Removed

  • stats: removed Debug derives from structs - 2def136

Fixed

  • notebook/count: fix Google Colab link by @rzmk in #1208

Full Changelog: 0.111.0...0.112.0

0.111.0

07 Aug 15:47
cf7801d

This is the first in a series of "Giddy-up" 🏇🏽 releases.

As Quicksilver matures, we will continue to tweak it in our goal to be the 🚀 fastest general purpose CSV data-wrangling CLI toolkit available.

"Giddy-up" 🏇🏽 releases increase performance by:

  • taking advantage of new Rust features as they become available
  • using new libraries that are faster than the ones we currently use
  • optimizing our code to take advantage of new features in the libraries we use
  • using new algorithms that are faster than the ones we currently use
  • taking advantage of more hardware features (SIMD, multi-core, etc.)
  • adding reproducible benchmarks that are automatically updated on release to track our progress

As it is, Quicksilver has an aggressive release tempo - with more than 160 releases since its initial release in December 2020. This was made possible by the solid foundation of Rust and the xsv project from which qsv was forked. We will continue to build on this foundation by adding more CI tests and starting to track code coverage so we can continue to iterate aggressively with confidence.

Apart from "giddy-up" releases, Quicksilver will also have "carousel" 🎠 releases that will focus on making the toolkit more accessible to non-technical users.

"Carousel" 🎠 releases will include:

  • more documentation
  • more examples
  • more tutorials
  • more recipes in the Cookbook
  • multiple GUI wrappers around the CLI
  • integrations with common desktop tools like Excel, Google Sheets, Open Office, etc.
  • tighter integration with the CKAN ecosystem, with a focus on helping data publishers & data coordinators maintain a high quality data/metadata catalog

Hopefully, this will make qsv more accessible to non-technical users, and help them get more value out of their data. Special attention will be given to "open data" use cases - enabling non-profits, governments and regular citizens to tap raw open data and convert it to actionable insight - making open data useful, usable and used.

Every now and then, we'll also have "Unicorn" 🦄 releases that will add MAJOR new features to the toolkit (e.g. 10x-type features like the integration of Pola.rs into qsv).

We will also add a new Technical Documentation section to the wiki to document qsv's architecture and how each command works. The hope is doing so will lower the barrier to contributions and help us grow the community of qsv contributors.

Added

  • sort: add --faster option #1190
  • describegpt: add -Q, --quiet option by @rzmk in #1179

Changed

  • stats: refactor init_date_inference #1187
  • join: cache has_headers result in hot loop e53edaf
  • search & searchset: amortize allocs #1188
  • stats: use fast-float to convert string to float #1191
  • sqlp: more examples, apply clippy::needless_borrow lint ff37a04 and b8e1f77
  • use fast-float project-wide (apply, applydp, schema, sort, validate) #1192
  • fine tune publishing workflows to enable universally available CPU features a1dccc7
  • build(deps): bump serde from 1.0.179 to 1.0.180 by @dependabot in #1176
  • build(deps): bump pyo3 from 0.19.1 to 0.19.2 by @dependabot in #1177
  • build(deps): bump qsv-dateparser from 0.9.0 to 0.10.0 by @dependabot in #1178
  • build(deps): bump qsv-sniffer from 0.9.4 to 0.10.0 by @dependabot in #1180
  • build(deps): bump indicatif from 0.17.5 to 0.17.6 by @dependabot in #1182
  • Bump to qsv stats 0.11 #1184
  • build(deps): bump serde from 1.0.180 to 1.0.181 by @dependabot in #1185
  • build(deps): bump qsv_docopt from 1.3.0 to 1.4.0 by @dependabot in #1186
  • build(deps): bump filetime from 0.2.21 to 0.2.22 by @dependabot in #1193
  • build(deps): bump regex from 1.9.1 to 1.9.2 by @dependabot in #1194
  • build(deps): bump regex from 1.9.2 to 1.9.3 by @dependabot in #1195
  • build(deps): bump serde from 1.0.181 to 1.0.182 by @dependabot in #1196
  • build(deps): bump tempfile from 3.7.0 to 3.7.1 by @dependabot in #1199
  • build(deps): bump strum_macros from 0.25.1 to 0.25.2 by @dependabot in #1200
  • build(deps): bump serde from 1.0.182 to 1.0.183 by @dependabot in #1201
  • cargo update bump several indirect dependencies
  • apply select clippy lint suggestions
  • pin Rust nightly to 2023-08-07

Removed

  • temporarily remove rand/simd_support feature when building nightly as it's causing the nightly build to fail 0a66fdb

Fixed

New Contributors

Full Changelog: 0.110.0...0.111.0

0.110.0

31 Jul 03:51

Added

  • describegpt: Add jsonl to prompt file doc section & more clarification by @rzmk in #1149
  • luau: add --no-jit option #1170
  • sqlp: add CTE examples 33f0218

Changed

  • frequency: minor optimizations ecac0be
  • join: performance optimizations 4cb5937 and 788360a
  • sqlp: reduce allocs in loop ae164b5
  • Apple Silicon build now uses mimalloc allocator by default bfab24a
  • build(deps): bump jql-runner from 7.0.1 to 7.0.2 by @dependabot in #1151
  • build(deps): bump serde from 1.0.171 to 1.0.173 by @dependabot in #1154
  • build(deps): bump tempfile from 3.6.0 to 3.7.0 by @dependabot in #1155
  • build(deps): bump serde from 1.0.174 to 1.0.175 by @dependabot in #1157
  • build(deps): bump redis from 0.23.0 to 0.23.1 by @dependabot in #1164
  • build(deps): bump serde from 1.0.175 to 1.0.177 by @dependabot in #1163
  • build(deps): bump serde_json from 1.0.103 to 1.0.104 by @dependabot in #1160
  • build(deps): bump grex from 1.4.1 to 1.4.2 by @dependabot in #1159
  • build(deps): bump sysinfo from 0.29.6 to 0.29.7 by @dependabot in #1158
  • build(deps): bump mlua from 0.9.0-rc.1 to 0.9.0-rc.3 by @dependabot in #1169
  • build(deps): bump flexi_logger from 0.25.5 to 0.25.6 by @dependabot in #1168
  • build(deps): bump jemallocator from 0.5.0 to 0.5.4 by @dependabot in #1167
  • build(deps): bump serde from 1.0.177 to 1.0.178 by @dependabot in #1166
  • build(deps): bump rust_decimal from 1.30.0 to 1.31.0 by @dependabot in #1172
  • build(deps): bump csvs_convert from 0.8.6 to 0.8.7 by @dependabot in #1174
  • apply clippy:needless_pass_by_ref_mut lint in select and frequency ba6566e and 83add7b
  • cargo update bump indirect dependencies
  • pin Rust nightly to 2023-07-29

Removed

  • excel: remove defunct dates-whitelist comments 2a24d2d

Fixed

  • join: fix left-semi join. Fixes #1150. #1153
  • foreach: fix command argument token splitter pattern. Fixes #1171 #1173
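For reference, a left-semi join keeps each left row whose key has at least one match on the right. It outputs only the left row's columns, and at most once, however many right rows share the key. The notes don't describe the original bug, so this hash-set sketch only illustrates the expected semantics. `left_semi_join` is a hypothetical function, not qsv's join code.

```rust
use std::collections::HashSet;

/// Left-semi join: returns the left rows whose key column (`lk`) matches
/// some right row's key column (`rk`). Only left columns are emitted, and
/// each left row appears at most once, even if several right rows match.
fn left_semi_join(
    left: &[Vec<String>],
    lk: usize,
    right: &[Vec<String>],
    rk: usize,
) -> Vec<Vec<String>> {
    // Collect the distinct right-side keys once.
    let right_keys: HashSet<&str> = right
        .iter()
        .filter_map(|r| r.get(rk))
        .map(String::as_str)
        .collect();
    left.iter()
        .filter(|row| row.get(lk).map_or(false, |k| right_keys.contains(k.as_str())))
        .cloned()
        .collect()
}
```

Duplicated right-side keys therefore never multiply left rows. This is the main difference from an inner join.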

Full Changelog: 0.109.0...0.110.0

0.109.0

17 Jul 06:18

This is a monstrous 👹 release with lots of new features and improvements!

The biggest new feature is the describegpt command, which allows you to use OpenAI's Large Language Models to generate extended metadata from a CSV. We created this command primarily for CKAN and Datapusher+ so we can infer descriptions and tags, and automatically create annotated data dictionaries using the CSV's summary statistics and frequency tables. Because it works from these summaries rather than the raw data, it works even for very large CSV files without consuming too many OpenAI tokens. This is a very powerful feature and we are looking forward to seeing what people do with it. Thanks @rzmk for all the work on this!

This release also features major improvements in the sqlp and joinp commands thanks to all the new capabilities of Polars 0.31.1.

Polars SQL's capabilities have been vastly improved in 0.31.1 with numerous new SQL functions and operators, and they're all available with the sqlp command.

The joinp command has several new options for CSV parsing, for pre-join filtering (--filter-left and --filter-right), and pre-join validation with the --validate option. Two new asof join variants (--left_by and --right_by) were also added.
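As background for the asof joins mentioned above: instead of requiring equal keys, an asof join matches each left row to the nearest right row by key. Under the common "backward" strategy, this is the last right row whose key is less than or equal to the left key. This is a minimal sketch on sorted integer keys, not Polars' implementation. The asof_by variant would additionally restrict matches to rows sharing the --left_by/--right_by group values, which is not shown. `asof_backward` is a hypothetical name.

```rust
/// Backward as-of match: for each left key, the index of the last right row
/// whose key is <= it, or None if every right key is larger.
/// `right_keys` must be sorted in ascending order.
fn asof_backward(left_keys: &[i64], right_keys: &[i64]) -> Vec<Option<usize>> {
    left_keys
        .iter()
        .map(|&t| {
            // Number of right keys <= t; the match is the last of them.
            let n = right_keys.partition_point(|&r| r <= t);
            n.checked_sub(1)
        })
        .collect()
}
```

With right keys `[10, 20, 30]`, a left key of 25 matches index 1 (key 20), while a left key of 5 has no match.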

Added

  • describegpt command by @rzmk in #1036
  • describegpt: minor refactoring in #1104
  • describegpt: --key & QSV_OPENAI_API_KEY by @rzmk in #1105
  • describegpt: add --user-agent in help message by @rzmk in #1095
  • describegpt: json output format for redirection by @rzmk in #1107
  • describegpt: add testing (resolves #1114) by @rzmk in #1115
  • describegpt: add --model option (resolves #1101) by @rzmk in #1117
  • describegpt: polishing #1122
  • describegpt: add --jsonl option (resolves #1086) by @rzmk in #1127
  • describegpt: add --prompt-file option (resolves #1085) by @rzmk in #1120
  • joinp: added asof_by join variant; added CSV formatting options consistent with sqlp CSV format options #1090
  • joinp: add --filter-left and --filter-right options #1146
  • joinp: add --validate option #1147
  • fetch & fetchpost: add --no-cache option #1112
  • sniff: detect file kind along with mime type #1137
  • user-agent metadata now contains the current command's name #1093

Changed

Fixed

  • fmt: Quote ASCII format differently by @LemmingAvalanche in #1075
  • apply: make dynfmt subcommand case sensitive. Fixes #1126 #1130
  • applydp: make dynfmt case-sensitive #1131
  • describegpt: docs/Describegpt.md: typo 'a' --> 'an' by @rzmk in #1135
  • tojsonl: support snappy-compressed input. Fixes #1133 #1145
  • security.md: fix mailto text by @rzmk in #1079

New Contributors

Full Changelog: 0.108.0...0.109.0

0.108.0

25 Jun 17:08
ac27d40

Another big Quicksilver release with lots of new features and improvements!

The two Polars-powered commands - joinp and sqlp - have received significant attention. joinp now supports asof joins and the --try-parsedates option. sqlp now has several Parquet format options, along with a --low-memory option.

Other new features include:

  • A new cat rowskey --group option that emulates csvkit's csvstack command.
  • SIMD-accelerated UTF-8 validation for the input command.
  • A --field-separator option for the flatten command.
  • The sniff command now uses the excellent file-format crate for mime-type detection on ALL platforms, not just Linux, as was the case when we were using the libmagic library.
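The csvstack-style grouping stacks several inputs row by row and prepends a column recording which input each row came from, much like csvkit's `csvstack --filenames`. This minimal sketch works on already-parsed rows and assumes all inputs share the same header. The exact --group values qsv supports aren't listed here, so `stack_with_group` and its `group_col` parameter are hypothetical.

```rust
/// Stacks parsed CSV inputs row-wise, prepending a grouping column whose
/// value names the input each row came from. Each input's first row is its
/// header; only the first header is kept, and all inputs are assumed to
/// share the same columns.
fn stack_with_group(inputs: &[(&str, Vec<Vec<String>>)], group_col: &str) -> Vec<Vec<String>> {
    let mut out: Vec<Vec<String>> = Vec::new();
    for (name, rows) in inputs {
        let mut rows = rows.iter();
        let Some(header) = rows.next() else {
            continue; // skip empty inputs
        };
        if out.is_empty() {
            // Emit the combined header once, with the group column first.
            let mut h = vec![group_col.to_string()];
            h.extend(header.iter().cloned());
            out.push(h);
        }
        for row in rows {
            let mut r = vec![name.to_string()];
            r.extend(row.iter().cloned());
            out.push(r);
        }
    }
    out
}
```

Keeping only the first header avoids repeated header rows in the middle of the stacked output.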

Also, Quicksilver now has optimized builds for Apple Silicon. These builds are created using native Apple Silicon self-hosted Action Runners, which means we can enable all qsv features without being constrained by cross-compilation limitations and the disk/memory constraints of GitHub's Action Runners. Additionally, we compile Apple Silicon builds with M1/M2 chip optimizations enabled to maximize performance.

Finally, qsv startup should be noticeably faster, thanks to @vi’s PR to avoid sysinfo::System::new_all.

Added

  • joinp: added asof join & --try-parsedates option #1059
  • cat: emulate csvkit's csvstack #1067
  • input: SIMD-accelerated utf8 validation 88e1df2
  • sniff: replace magic with file-format crate, enabling mime-type detection on all platforms #1069
  • sqlp: add --low-memory option d95048e
  • sqlp: added parquet format options c179cf4 a861ebf
  • flatten: add --field-separator option #1068
  • Apple Silicon binaries built on native Apple Silicon self-hosted Action Runners, enabling all features and optimized for M1/M2 chips

Changed

  • input: minor improvements 62cff74
  • joinp: align option names with join command #1058
  • sqlp: minor improvements
  • changed all GitHub action workflows to account for the new Apple Silicon builds
  • Bump rust_decimal from 1.29.1 to 1.30.0 by @dependabot in #1049
  • Bump serde_json from 1.0.96 to 1.0.97 by @dependabot in #1051
  • Bump calamine from 0.21.0 to 0.21.1 by @dependabot in #1052
  • Bump strum from 0.24.1 to 0.25.0 by @dependabot in #1055
  • Bump actix-governor from 0.4.0 to 0.4.1 by @dependabot in #1060
  • Bump csvs_convert from 0.8.5 to 0.8.6 by @dependabot in #1061
  • Bump itertools from 0.10.5 to 0.11.0 by @dependabot in #1062
  • Bump serde_json from 1.0.97 to 1.0.99 by @dependabot in #1065
  • Bump indexmap from 1.9.3 to 2.0.0 by @dependabot in #1066
  • Bump calamine from 0.21.1 to 0.21.2 by @dependabot in #1071
  • cargo update bump various indirect dependencies
  • pin Rust nightly to 2023-06-23

Fixed

  • Avoid sysinfo::System::new_all by @vi in #1064
  • correct typos project-wide #1072

Removed

  • removed libmagic dependency from all GitHub action workflows

New Contributors

  • @vi made their first contribution in #1064

Full Changelog: 0.107.0...0.108.0