tsv-select --exclude (#267)
* Additional tsv-select unit tests, especially empty fields.

* tsv-select performance enhancement: bulk append rest fields.

* [WIP] tsv-select --exclude initial version.

* tsv-select --exclude: Help update; Rework --rest; Drop --rest none from docs.

* tsv-select --exclude: Update PGO profiling.

* tsv-select --exclude: Update bash completion.

* Fix markdown rendering bug.

* tsv-select documentation updates.

* Minor documentation updates.
jondegenhardt committed Mar 2, 2020
1 parent 8bfce34 commit 7d5f7da
Showing 10 changed files with 865 additions and 74 deletions.
README.md: 24 changes (22 additions, 2 deletions)
@@ -40,7 +40,7 @@ The tools work like traditional Unix command line utilities such as `cut`, `sort

The rest of this section contains descriptions of each tool. Click on the links below to jump directly to one of the tools. Full documentation is available in the [tool reference](docs/ToolReference.md).

* [tsv-filter](#tsv-filter) - Filter lines using numeric, string and regular expression comparisons against individual fields. (This description also provides an introduction to features found throughout the toolkit.)
* [tsv-filter](#tsv-filter) - Filter lines using numeric, string and regular expression comparisons against individual fields. This description also provides an introduction to features found throughout the toolkit.
* [tsv-select](#tsv-select) - Keep a subset of columns (fields). Like `cut`, but with field reordering.
* [tsv-uniq](#tsv-uniq) - Filter out duplicate lines using either the full line or individual fields as a key.
* [tsv-summarize](#tsv-summarize) - Summary statistics on selected fields, against the full data set or grouped by key.
@@ -167,11 +167,31 @@ See the [tsv-filter reference](docs/ToolReference.md#tsv-filter-reference) for m

### tsv-select

A version of the Unix `cut` utility with the additional ability to re-order the fields. It also helps with header lines by keeping only the header from the first file (`--header` option). The following command writes fields [4, 2, 9, 10, 11] from a pair of files to stdout:
A version of the Unix `cut` utility with the additional ability to re-order the fields. The following command writes fields [4, 2, 9, 10, 11] from a pair of files to stdout:
```
$ tsv-select -f 4,2,9-11 file1.tsv file2.tsv
```

Fields can be listed more than once, and fields not listed can be output using the `--rest` option. When working with multiple files, the `--header` option can be used to retain only the header from the first file.

Examples:
```
$ # Output fields 2 and 1, in that order
$ tsv-select -f 2,1 data.tsv
$ # Move field 7 to the start of the line
$ tsv-select -f 7 --rest last data.tsv
$ # Move field 1 to the end of the line
$ tsv-select -f 1 --rest first data.tsv
$ # Output a range of fields in reverse order
$ tsv-select -f 30-3 data.tsv
$ # Multiple files with header lines. Keep only one header.
$ tsv-select data*.tsv -H --fields 1,2,4-7,14
```
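
The `--rest` behavior in the examples above can be approximated with a short awk program, which may help when reasoning about output order. This is a minimal sketch, not how `tsv-select` (written in D) implements it: the function name `select_with_rest` is hypothetical, and it assumes TAB-delimited input, a field list made of single field numbers (no ranges), and no header handling.

```bash
#!/usr/bin/env bash
# Hypothetical helper approximating `tsv-select -f LIST --rest first|last`.
# Assumes: TAB-delimited input, LIST is a non-empty comma-separated list of
# single field numbers (no ranges), and no header handling.
# REST is "first", "last", or empty (unlisted fields are dropped).
select_with_rest() {
    local fields="$1" rest="${2:-}"
    awk -v FS='\t' -v OFS='\t' -v list="$fields" -v rest="$rest" '
        BEGIN {
            n = split(list, sel, ",")
            for (i = 1; i <= n; i++) listed[sel[i] + 0] = 1
        }
        {
            # Listed fields, in the order given (repeats allowed).
            picked = $(sel[1])
            for (i = 2; i <= n; i++) picked = picked OFS $(sel[i])

            # Unlisted fields, in their original input order.
            others = ""; k = 0
            for (i = 1; i <= NF; i++) {
                if (i in listed) continue
                others = (k++ ? others OFS : "") $i
            }

            if (k && rest == "first")     print others, picked
            else if (k && rest == "last") print picked, others
            else                          print picked
        }'
}
```

For example, `printf 'a\tb\tc\td\n' | select_with_rest 1 first` prints `b`, `c`, `d`, `a` (TAB-separated), the same reordering as the "move field 1 to the end of the line" example.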

See the [tsv-select reference](docs/ToolReference.md#tsv-select-reference) for details.

### tsv-uniq
bash_completion/tsv-utils: 6 changes (3 additions, 3 deletions)
@@ -206,16 +206,16 @@ _tsv_select()
COMPREPLY=()
cur="${COMP_WORDS[COMP_CWORD]}"
prev="${COMP_WORDS[COMP_CWORD-1]}"
opts="--help --version --header --fields --rest --delimiter"
opts="--help --version --header --fields --exclude --rest --delimiter"

# Options requiring an argument or precluding other options
# Options with a restricted set of arguments (ie. -r|--rest) have their own case clause.
case $prev in
-h|--help|-V|--version|-f|--fields|-d|--delimiter)
-h|--help|-V|--version|-f|--fields|-e|--exclude|-d|--delimiter)
return
;;
-r|--rest)
COMPREPLY=( $(compgen -W "none first last" -- ${cur}) )
COMPREPLY=( $(compgen -W "first last" -- ${cur}) )
return
;;
esac
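
The completion change registers `-e|--exclude` as an option that takes a field list; the profiling script later in this commit uses it as `--exclude 1` and `--exclude 2-4`. As a rough illustration of what excluding fields means, the sketch below drops the listed fields and, as assumed here, keeps the remaining fields in their original input order. It is not the `tsv-select` implementation: the name `exclude_fields` is hypothetical, and it assumes TAB-delimited input, single field numbers (no ranges), and no header handling.

```bash
#!/usr/bin/env bash
# Hypothetical helper approximating `tsv-select --exclude LIST`.
# Assumes: TAB-delimited input, LIST is a comma-separated list of single
# field numbers (no ranges), and no header handling.
exclude_fields() {
    awk -v FS='\t' -v OFS='\t' -v list="$1" '
        BEGIN {
            n = split(list, ex, ",")
            for (i = 1; i <= n; i++) drop[ex[i] + 0] = 1
        }
        {
            # Keep every field not named in LIST, in original input order.
            line = ""; k = 0
            for (i = 1; i <= NF; i++) {
                if (i in drop) continue
                line = (k++ ? line OFS : "") $i
            }
            print line
        }'
}
```

With this sketch, `printf 'a\tb\tc\td\n' | exclude_fields 2,3` prints `a` and `d` separated by a TAB.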
docs/ToolReference.md: 6 changes (3 additions, 3 deletions)
@@ -252,7 +252,7 @@ _**Tip:**_ Bash completion is very helpful when using commands like `tsv-filter`

## tsv-select reference

**Synopsis:** tsv-select -f <field-list> [options] [file...]
**Synopsis:** tsv-select [options] [file...]

tsv-select reads files or standard input and writes specified fields to standard output in the order listed. Similar to `cut` with the ability to reorder fields.

@@ -263,13 +263,13 @@ Field numbers start with one. They are comma separated, and ranges can be used.
* `--V|version` - Print version information and exit.
* `--H|header` - Treat the first line of each file as a header.
* `--f|fields <field-list>` - (Required) Fields to extract. Fields are output in the order listed.
* `--r|rest none|first|last` - Location for remaining fields. Default: none
* `--r|rest first|last` - Location for remaining fields. Default: none
* `--d|delimiter CHR` - Character to use as field delimiter. Default: TAB. (Single byte UTF-8 characters only.)
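
The field-list syntax (comma-separated field numbers starting at one, with ranges, including descending ranges such as `30-3` from the README examples) can be illustrated by expanding a list into individual field numbers. This is a minimal bash sketch under that reading of the syntax; `expand_field_list` is a hypothetical helper and performs no validation of zero, negative, or malformed entries.

```bash
#!/usr/bin/env bash
# Hypothetical helper: expand a tsv-select style field list such as
# "4,2,9-11" or "30-3" into one field number per line, in output order.
# Sketch only: no validation of zero, negative, or malformed entries.
expand_field_list() {
    local list="$1" item lo hi i
    local IFS=','   # split the list on commas only
    for item in $list; do
        if [[ $item == *-* ]]; then
            lo=${item%-*}   # text before the dash
            hi=${item#*-}   # text after the dash
            if (( lo <= hi )); then
                for (( i = lo; i <= hi; i++ )); do echo "$i"; done
            else
                # Descending range, e.g. 30-3 lists fields in reverse.
                for (( i = lo; i >= hi; i-- )); do echo "$i"; done
            fi
        else
            echo "$item"
        fi
    done
}
```

Under these assumptions, `expand_field_list 4,2,9-11` prints 4, 2, 9, 10 and 11 on separate lines, the same field order as the `tsv-select -f 4,2,9-11` example.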

**Examples:**
```
$ # Output fields 2 and 1, in that order
$ tsv-select -f 2,1 --rest first data.tsv
$ tsv-select -f 2,1 data.tsv
$ # Move field 1 to the end of the line
$ tsv-select -f 1 --rest first data.tsv
tsv-select/README.md: 24 changes (22 additions, 2 deletions)
@@ -2,11 +2,31 @@ _Visit the eBay TSV utilities [main page](../README.md)_

# tsv-select

A version of the Unix `cut` utility with the additional ability to re-order the fields. It also helps with header lines by keeping only the header from the first file (`--header` option). The following command writes fields [4, 2, 9, 10, 11] from a pair of files to stdout:
A version of the Unix `cut` utility with the additional ability to re-order the fields. The following command writes fields [4, 2, 9, 10, 11] from a pair of files to stdout:
```
$ tsv-select -f 4,2,9-11 file1.tsv file2.tsv
```

Reordering fields and managing headers are useful enhancements over `cut`. However, much of the motivation for writing it was to explore the D programming language and provide a comparison point against other common approaches to this task. Code for `tsv-select` is a bit more liberal with comments pointing out D programming constructs than code for the other tools. As an unexpected benefit, `tsv-select` is faster than other implementations of `cut` that are available.
Fields can be listed more than once, and fields not listed can be output using the `--rest` option. When working with multiple files, the `--header` option can be used to retain only the header from the first file.

Examples:
```
$ # Output fields 2 and 1, in that order
$ tsv-select -f 2,1 data.tsv
$ # Move field 7 to the start of the line
$ tsv-select -f 7 --rest last data.tsv
$ # Move field 1 to the end of the line
$ tsv-select -f 1 --rest first data.tsv
$ # Output a range of fields in reverse order
$ tsv-select -f 30-3 data.tsv
$ # Multiple files with header lines. Keep only one header.
$ tsv-select data*.tsv -H --fields 1,2,4-7,14
```

Reordering fields and managing headers are useful enhancements over `cut`. However, much of the motivation for writing `tsv-select` was to explore the D programming language and provide a comparison point against other common approaches to this task. Code for `tsv-select` is a bit more liberal with comments pointing out D programming constructs than code for the other tools. As an unexpected benefit, `tsv-select` is faster than other implementations of `cut` that are available.

See the [tsv-select reference](../docs/ToolReference.md#tsv-select-reference) for details.
tsv-select/profile_data/collect_profile_data.sh: 4 changes (4 additions, 0 deletions)
@@ -37,5 +37,9 @@ $prog profile_data_3.tsv -H -f 5,3,1 > /dev/null
$prog profile_data_3.tsv -H -f 1-3 > /dev/null
$prog profile_data_3.tsv -H -f 7 > /dev/null
$prog profile_data_3.tsv -H -f 3-6 > /dev/null
$prog profile_data_1.tsv -H -f 5 --rest last > /dev/null
$prog profile_data_1.tsv -H -f 1 --rest first > /dev/null
$prog -H --exclude 1 profile_data_1.tsv profile_data_2.tsv profile_data_3.tsv -H > /dev/null
$prog profile_data_3.tsv --exclude 2-4 > /dev/null

${ldc_profdata_tool} merge -o app.profdata profile.*.raw