Commit

frequency: set default --all-unique-text to <ALL_UNIQUE>
Add angle brackets to better differentiate it as a special value, distinct from the values in the CSV.
jqnatividad committed Aug 6, 2024
1 parent 80ddd7b commit feafe27
Showing 3 changed files with 8 additions and 8 deletions.
8 changes: 4 additions & 4 deletions src/cmd/frequency.rs
@@ -19,7 +19,7 @@ to get column cardinalities. This short-circuits frequency compilation for colum
all unique values (i.e. where rowcount == cardinality), enabling it to compute frequencies for
larger-than-memory datasets as it doesn't need to load all the column's unique values into memory.
- Instead, it will use the "ALL_UNIQUE" value for columns with all unique values.
+ Instead, it will use the "<ALL_UNIQUE>" value for columns with all unique values.
This behavior can be adjusted with the --stats-mode option.
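The short-circuit described in the help text above can be sketched in Rust. This is a minimal sketch, not qsv's actual code: `ColumnStats`, `FreqRow`, and `column_frequencies` are hypothetical names, and the row count and cardinality are assumed to have already been read from a stats cache.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the per-column numbers read from a stats cache.
pub struct ColumnStats {
    pub rowcount: u64,
    pub cardinality: u64,
}

/// One frequency-table row: value, count, and percentage of all rows.
#[derive(Debug, PartialEq)]
pub struct FreqRow {
    pub value: String,
    pub count: u64,
    pub percentage: f64,
}

/// Compute frequencies for one column. If the stats show that every value is
/// unique (rowcount == cardinality), return a single placeholder row without
/// reading `values` at all, so no per-value counts are held in memory.
pub fn column_frequencies<I>(
    stats: Option<&ColumnStats>,
    values: I,
    all_unique_text: &str,
) -> Vec<FreqRow>
where
    I: IntoIterator<Item = String>,
{
    if let Some(s) = stats {
        if s.rowcount > 0 && s.rowcount == s.cardinality {
            // Short-circuit: the iterator is never consumed.
            return vec![FreqRow {
                value: all_unique_text.to_string(),
                count: s.rowcount,
                percentage: 100.0,
            }];
        }
    }
    // Fallback: count every value (memory grows with the column's cardinality).
    let mut counts: HashMap<String, u64> = HashMap::new();
    let mut total = 0u64;
    for v in values {
        *counts.entry(v).or_insert(0) += 1;
        total += 1;
    }
    let mut rows: Vec<FreqRow> = counts
        .into_iter()
        .map(|(value, count)| FreqRow {
            value,
            count,
            percentage: 100.0 * count as f64 / total as f64,
        })
        .collect();
    // Most frequent first; ties broken alphabetically for stable output.
    rows.sort_by(|a, b| b.count.cmp(&a.count).then_with(|| a.value.cmp(&b.value)));
    rows
}
```

For a cache entry with `rowcount: 100` and `cardinality: 100`, the sketch returns one row holding the placeholder text, a count of 100, and a percentage of 100, the same shape as the expected rows in `tests/test_frequency.rs`.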
@@ -93,14 +93,14 @@ frequency options:
eliminates memory usage for columns with all unique values.
There are three modes:
auto: use stats cache if it already exists to get column cardinalities.
- For columns with all unique values, "ALL_UNIQUE" will be used.
+ For columns with all unique values, "<ALL_UNIQUE>" will be used.
force: force stats calculation to get cardinalities.
none: don't use cardinality information.
For columns with all unique values, the first N sorted unique
values (based on the --limit and --unq-limit options) will be used.
[default: auto]
- --all-unique-text <arg> The text to use for the "ALL_UNIQUE" category.
- [default: ALL_UNIQUE]
+ --all-unique-text <arg> The text to use for the "<ALL_UNIQUE>" category.
+ [default: <ALL_UNIQUE>]
-j, --jobs <arg> The number of jobs to run in parallel.
This works much faster when the given CSV data has
an index already created. Note that a file handle
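The three `--stats-mode` values can be modeled as a small enum plus a decision function. This is an illustrative sketch only: `StatsMode`, `CardinalitySource`, and `cardinality_source` are hypothetical names, and treating `auto` with no existing cache as "skip cardinalities" is an assumption inferred from the help text, not confirmed qsv behavior. The internal `_schema` mode that `schema.rs` passes is not modeled.

```rust
use std::str::FromStr;

/// The three user-facing `--stats-mode` values (variant names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StatsMode {
    Auto,
    Force,
    Disabled, // the `none` mode
}

impl FromStr for StatsMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "auto" => Ok(StatsMode::Auto),
            "force" => Ok(StatsMode::Force),
            "none" => Ok(StatsMode::Disabled),
            other => Err(format!("invalid --stats-mode value: {other}")),
        }
    }
}

/// Where column cardinalities come from, if anywhere.
#[derive(Debug, PartialEq, Eq)]
pub enum CardinalitySource {
    ExistingCache, // reuse a stats cache that is already on disk
    ComputeStats,  // run a stats pass first to get cardinalities
    Skip,          // no cardinalities: count values, no all-unique placeholder
}

pub fn cardinality_source(mode: StatsMode, cache_exists: bool) -> CardinalitySource {
    match mode {
        StatsMode::Auto if cache_exists => CardinalitySource::ExistingCache,
        // Assumption: `auto` without a cache falls back to plain counting.
        StatsMode::Auto => CardinalitySource::Skip,
        StatsMode::Force => CardinalitySource::ComputeStats,
        StatsMode::Disabled => CardinalitySource::Skip,
    }
}
```

Keeping the parse step separate from the decision makes the `auto` fallback an explicit, testable branch rather than an implicit side effect of whether a cache file happens to exist.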
2 changes: 1 addition & 1 deletion src/cmd/schema.rs
@@ -461,7 +461,7 @@ fn get_unique_values(
flag_ignore_case: args.flag_ignore_case,
// internal mode for getting frequency tables
flag_stats_mode: "_schema".to_string(),
- flag_all_unique_text: "ALL UNIQUE".to_string(),
+ flag_all_unique_text: "<ALL UNIQUE>".to_string(),
flag_jobs: Some(util::njobs(args.flag_jobs)),
flag_output: None,
flag_no_headers: args.flag_no_headers,
6 changes: 3 additions & 3 deletions tests/test_frequency.rs
@@ -492,7 +492,7 @@ fn frequency_all_unique_with_stats_cache() {
let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
let expected = vec![
svec!["field", "value", "count", "percentage"],
- svec!["case_enquiry_id", "ALL_UNIQUE", "100", "100"],
+ svec!["case_enquiry_id", "<ALL_UNIQUE>", "100", "100"],
];
assert_eq!(got, expected);
}
@@ -512,7 +512,7 @@ fn frequency_all_unique_with_stats_cache_alt_all_unique_text() {

let mut cmd = wrk.command("frequency");
cmd.args(["--select", "1"])
- // "ALL_UNIQUE" in German
+ // "<ALL_UNIQUE>" in German
.args(["--all-unique-text", "<ALLE EINZIGARTIG>"])
.arg(testdata);

@@ -537,7 +537,7 @@ fn frequency_all_unique_force_stats_cache() {
let got: Vec<Vec<String>> = wrk.read_stdout(&mut cmd);
let expected = vec![
svec!["field", "value", "count", "percentage"],
- svec!["case_enquiry_id", "ALL_UNIQUE", "100", "100"],
+ svec!["case_enquiry_id", "<ALL_UNIQUE>", "100", "100"],
];
assert_eq!(got, expected);
}
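Because the placeholder is written to the same `value` column as real data, a downstream consumer has to tell the two apart. The sketch below shows one way to do that. `DEFAULT_ALL_UNIQUE_TEXT`, `resolve_all_unique_text`, and `is_all_unique_row` are hypothetical helpers, and treating a row as the placeholder only when it also accounts for every row is an assumed heuristic, not documented qsv behavior.

```rust
/// Placeholder matching the new `--all-unique-text` default.
pub const DEFAULT_ALL_UNIQUE_TEXT: &str = "<ALL_UNIQUE>";

/// Use the user's `--all-unique-text` value if one was given, else the default.
pub fn resolve_all_unique_text(arg: Option<&str>) -> &str {
    arg.unwrap_or(DEFAULT_ALL_UNIQUE_TEXT)
}

/// Heuristic: a frequency row is the all-unique placeholder if its value equals
/// the placeholder text AND its count covers every row in the column.
pub fn is_all_unique_row(value: &str, count: u64, rowcount: u64, all_unique_text: &str) -> bool {
    value == all_unique_text && count == rowcount
}
```

With the old unbracketed default, a column that literally contained the string `ALL_UNIQUE` could be mistaken for the placeholder. The brackets make such a collision less likely, though not impossible, which is why the check above also compares the count. A custom value such as `<ALLE EINZIGARTIG>` from the German test passes through `resolve_all_unique_text` unchanged.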
