create suite-wide metadata cache, caching dataset-level statistics/metadata #2097

jqnatividad · 2024-08-30T22:47:51Z

Currently, stats compiles field/column statistics and persists them to a cache file.

This cache file is used by the stats command to return stats instantaneously if the CSV has not changed.
Other "smart" commands also use the stats cache to work faster & smarter.
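
To make the "return cached results if the CSV has not changed" behavior concrete, here is a minimal Python sketch. It is not qsv's implementation: the fingerprint (file size plus modification time), the JSON cache layout, and the names `fingerprint`, `cached_or_compute`, and `count_rows` are all assumptions made for illustration.

```python
import csv
import json
import os


def fingerprint(csv_path):
    """Cheap change detector: file size and modification time (an assumption)."""
    st = os.stat(csv_path)
    return {"size": st.st_size, "mtime_ns": st.st_mtime_ns}


def cached_or_compute(csv_path, cache_path, compute):
    """Return (result, from_cache), recomputing only when the CSV changed."""
    fp = fingerprint(csv_path)
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            entry = json.load(f)
        if entry.get("fingerprint") == fp:
            return entry["result"], True  # CSV unchanged: reuse cached value
    result = compute(csv_path)
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump({"fingerprint": fp, "result": result}, f)
    return result, False


def count_rows(csv_path):
    """Hypothetical 'expensive' computation: number of data rows."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return sum(1 for _ in csv.reader(f)) - 1
```

Calling `cached_or_compute("data.csv", "data.cache.json", count_rows)` twice in a row would compute on the first call and read from the cache on the second, as long as the file is untouched in between.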

qsv should have a suite-wide metadata cache that compiles dataset-level statistics and metadata like:

record-level stats/metadata
- record width (max, min, avg, median, variance, stddev, mad). With these in the cache, the count --width option can be removed.
package-level stats/metadata
- number of duplicate records, which the existing sortcheck command already compiles. The count would be added to a CSV's stats cache when sortcheck is executed. If the CSV has not changed and sortcheck is executed again, it would fetch the existing duplicate record count from the cache.
- data dictionary as initially inferred by describegpt. A flag would indicate whether the data dictionary has been manually curated, to prevent auto-updates by future runs of describegpt. If the dataset changes, this flag is reset.
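
The items above can be sketched as one cache entry per dataset. This Python sketch is only an illustration under stated assumptions, not qsv code: record width is taken as the UTF-8 byte length of a record's fields, variance and stddev are the population versions, "mad" is read as median absolute deviation, and `DatasetMetadata` with its field and method names is hypothetical.

```python
import csv
import io
import statistics
from collections import Counter
from dataclasses import dataclass
from typing import Optional


def read_records(csv_text):
    """Parse CSV text and return the data records (header row dropped)."""
    return list(csv.reader(io.StringIO(csv_text)))[1:]


def record_width_stats(records):
    """Summarize record widths; width = UTF-8 bytes of all fields (assumed)."""
    widths = [sum(len(f.encode("utf-8")) for f in rec) for rec in records]
    med = statistics.median(widths)
    return {
        "max": max(widths),
        "min": min(widths),
        "avg": statistics.fmean(widths),
        "median": med,
        "variance": statistics.pvariance(widths),  # population variance (assumed)
        "stddev": statistics.pstdev(widths),
        "mad": statistics.median(abs(w - med) for w in widths),
    }


def duplicate_count(records):
    """Count records that repeat an earlier, identical record."""
    counts = Counter(tuple(rec) for rec in records)
    return sum(n - 1 for n in counts.values())


@dataclass
class DatasetMetadata:
    """Hypothetical cache entry holding dataset-level metadata."""
    fingerprint: str
    duplicate_count: Optional[int] = None
    data_dictionary: Optional[dict] = None
    curated: bool = False

    def set_inferred_dictionary(self, inferred):
        """Store an auto-inferred dictionary unless a curated one exists."""
        if self.curated:
            return False
        self.data_dictionary = inferred
        return True

    def mark_curated(self, dictionary):
        """Record a manually curated dictionary and protect it from updates."""
        self.data_dictionary = dictionary
        self.curated = True

    def on_dataset_changed(self, new_fingerprint):
        """Dataset changed: drop the stale count and reset the curated flag."""
        self.fingerprint = new_fingerprint
        self.duplicate_count = None
        self.curated = False
```

In this sketch the curated flag blocks `set_inferred_dictionary` until `on_dataset_changed` resets it, which mirrors the reset rule described in the last bullet.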


jqnatividad mentioned this issue Aug 30, 2024

Create a .qsv file format that is an implementation of W3C's CSV on the Web #1982

Open

jqnatividad added the DCAT3 label Aug 30, 2024

jqnatividad mentioned this issue Aug 30, 2024

count: add additional --width metrics #2093

Closed

jqnatividad added the performance label Sep 19, 2024
