sortcheck

Check if a CSV is sorted. With the --json option, also retrieve record count, sort breaks & duplicate count.

Table of Contents | Source: src/cmd/sortcheck.rs | 👆

Description | Examples | Usage | Sort Options | Common Options

Description ↩

Check if a CSV is sorted. The check is done on a streaming basis (i.e. constant memory). With the --json option, also retrieve record count, sort breaks & duplicate count.
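The streaming check can be pictured as a single pass that remembers only the previous record. The following is a minimal Python sketch of that idea, not qsv's actual implementation: the function name `is_sorted_stream` and its `has_headers` parameter are hypothetical, and rows are compared as plain lists of strings (roughly the default lexicographic case).

```python
import csv
import io


def is_sorted_stream(lines, has_headers=True):
    """Return True if the CSV records are in non-decreasing order.

    Only the previous record is kept, so memory use does not grow
    with the size of the input.
    """
    reader = csv.reader(lines)
    if has_headers:
        next(reader, None)  # skip the header row; it is not a record
    prev = None
    for row in reader:
        if prev is not None and row < prev:
            return False  # stop at the first out-of-order record
        prev = row
    return True


# Illustrative data only
sample = io.StringIO("name\nalice\nbob\ncarol\n")
print(is_sorted_stream(sample))
```

Returning at the first out-of-order record mirrors the short-circuit behavior described for the default (non --all) mode.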

This command can be used in tandem with other qsv commands that sort or require sorted data to ensure that they also work on a stream of data - i.e. without loading an entire CSV into memory.

For instance, a naive dedup requires loading the entire CSV into memory to sort it first before deduping. However, if you know a CSV is sorted beforehand, you can invoke dedup with the --sorted option, and it will skip loading the entire CSV into memory to sort it first. It will just immediately dedupe on a streaming basis.
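To see why sorted input makes streaming deduplication possible: in sorted data, identical records are always adjacent, so each record only needs to be compared with the one before it. A minimal Python sketch of that idea follows, assuming rows arrive as lists of strings; the generator name `dedup_sorted` is hypothetical and this is not qsv's dedup code.

```python
def dedup_sorted(rows):
    """Yield each distinct record from an already-sorted stream.

    Because the input is sorted, duplicates are adjacent, so only
    the previous record has to be remembered.
    """
    sentinel = object()  # marks "no previous record yet"
    prev = sentinel
    for row in rows:
        if prev is sentinel or row != prev:
            yield row
        prev = row


# Illustrative data only
rows = [["alice"], ["alice"], ["bob"], ["carol"], ["carol"]]
for row in dedup_sorted(rows):
    print(row)
```

On unsorted input the same loop would miss duplicates that are not adjacent, which is why confirming the sort order first matters.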

sort also requires loading the entire CSV into memory. For very large CSV files that will not fit in memory, extsort - a multi-threaded streaming sort that can work with arbitrarily large files - can be used instead.

Use --numeric or --natural to verify the file matches the order produced by sort --numeric or sort --natural before piping into a downstream command (e.g. dedup --numeric --sorted). When multiple comparison flags are set, --natural takes precedence over --numeric, which takes precedence over --ignore-case (matching sort and dedup semantics).
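The precedence rule above can be sketched as a function that picks one comparison key from the three flags. This is an illustrative Python sketch only: `natural_key`, `make_key` and `is_sorted` are hypothetical names, parsing numbers with `float` is an assumption about what "numeric" means, and qsv's real comparison logic may differ in detail.

```python
import re


def natural_key(text):
    # Split into alternating non-digit and digit runs; digit runs
    # (odd positions) are compared as integers, so "item2" < "item10".
    parts = re.split(r"(\d+)", text)
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))


def make_key(natural=False, numeric=False, ignore_case=False):
    """Pick a sort key: natural wins over numeric, numeric over ignore_case."""
    if natural:
        if ignore_case:  # natural order composes with case folding
            return lambda s: natural_key(s.lower())
        return natural_key
    if numeric:
        return float  # assumption: values parse as numbers
    if ignore_case:
        return str.lower
    return lambda s: s  # default: lexicographic


def is_sorted(values, key):
    keys = [key(v) for v in values]
    return all(a <= b for a, b in zip(keys, keys[1:]))


items = ["item1", "item2", "item10"]
print(is_sorted(items, make_key()))              # lexicographic
print(is_sorted(items, make_key(natural=True)))  # natural order
```

The practical point is that the same file can pass under one comparison mode and fail under another, so the check should use the same flags as the downstream command.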

Simply put, sortcheck allows you to make informed choices on how to compose pipelines that require sorted data.

STATS-CACHE AWARE: when checking a single column with the default lexicographic or --numeric comparison and a valid stats cache exists (see qsv stats --stats-jsonl), sortcheck answers "is it sorted?" instantly from the cached sort order instead of scanning the file. This applies only to the exit-code path; --json/--pretty-json always do a full scan for exact counts. Disable with QSV_STATSCACHE_MODE=none.

Returns exit code 0 if a CSV is sorted, and exit code 1 otherwise.

Examples ↩

Check if file.csv is lexicographically sorted on all columns:

qsv sortcheck file.csv

Check column "name" only, ignoring case:

qsv sortcheck --select name --ignore-case file.csv

Verify file.csv is sorted numerically before piping into dedup --numeric --sorted:

qsv sortcheck --numeric file.csv && qsv dedup --numeric --sorted file.csv

Check natural order (e.g. item1, item2, item10) and emit JSON stats:

qsv sortcheck --natural --json file.csv

For more examples, see tests.

Usage ↩

qsv sortcheck [options] [<input>]
qsv sortcheck --help

Sort Options ↩

Option	Type	Description
`‑s,` `‑‑select`	string	Select a subset of columns to check for sort. See 'qsv select --help' for the format details.
`‑N,` `‑‑numeric`	flag	Compare values according to their numerical value rather than as strings.
`‑‑natural`	flag	Compare using natural sort order (e.g. item1 < item2 < item10). Takes precedence over --numeric. Composes with --ignore-case.
`‑i,` `‑‑ignore‑case`	flag	Compare strings disregarding case. Ignored under pure numeric comparison (i.e. --numeric without --natural), since numeric comparison is case-insensitive by definition.
`‑‑all`	flag	Check all records. Do not stop/short-circuit the check on the first unsorted record.
`‑‑json`	flag	Return results in JSON format, scanning all records (as with --all). The JSON result has the following properties: sorted (boolean), record_count (number), unsorted_breaks (number) & dupe_count (number). unsorted_breaks counts the number of times two consecutive rows are out of order (i.e. row n > row n+1). dupe_count is the number of times two consecutive rows are equal. Note that dupe_count does not apply if the file is not sorted, in which case it is set to -1.
`‑‑pretty‑json`	flag	Same as --json but in pretty JSON format.
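The four JSON properties described for --json can be computed in one pass over the records. The sketch below is a hedged Python illustration of those definitions, not the actual qsv implementation: the function name `sort_stats` is hypothetical, rows are compared lexicographically as lists of strings, and the exact output formatting of qsv may differ.

```python
import json


def sort_stats(rows):
    """Scan all records and count sort breaks and adjacent duplicates."""
    record_count = 0
    unsorted_breaks = 0
    dupe_count = 0
    prev = None
    for row in rows:
        record_count += 1
        if prev is not None:
            if prev > row:
                unsorted_breaks += 1  # row n > row n+1
            elif prev == row:
                dupe_count += 1       # two consecutive equal rows
        prev = row
    is_sorted = unsorted_breaks == 0
    return {
        "sorted": is_sorted,
        "record_count": record_count,
        "unsorted_breaks": unsorted_breaks,
        # the dupe count is only meaningful for sorted input
        "dupe_count": dupe_count if is_sorted else -1,
    }


# Illustrative data only
rows = [["alice"], ["alice"], ["bob"], ["carol"]]
print(json.dumps(sort_stats(rows), indent=2))
```

Unlike the exit-code check, this version never stops early, since exact counts require reading every record.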

Common Options ↩

Option	Type	Description
`‑h,` `‑‑help`	flag	Display this message
`‑n,` `‑‑no‑headers`	flag	When set, the first row will not be interpreted as headers. That is, it will be checked along with the rest of the rows. Otherwise, the first row is treated as the header row and is not part of the sort check.
`‑d,` `‑‑delimiter`	string	The field delimiter for reading CSV data. Must be a single character. (default: ,)
`‑p,` `‑‑progressbar`	flag	Show progress bars. Not valid for stdin.

Source: src/cmd/sortcheck.rs | Table of Contents | README
