Starters · luisbelloch's blog

Starter Recipes #

Basics #

Select first N or last M lines #

head -n 10 data/sales.csv
tail -n 10 data/sales.csv

Select columns 1 and 7 from a CSV #

cut -d"|" -f 1,7 data/sales.csv

Remove first line #

tail -n +2 data/sales.csv

Sort CSV by 3rd field #

sort -t, -k 3
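A quick check with made-up rows (id, name, amount), sorting on the third comma-separated field:

```shell
# Hypothetical sample data; sort lexicographically on the 3rd field onward
printf 'a,x,30\nb,y,10\nc,z,20\n' | sort -t, -k3
# b,y,10
# c,z,20
# a,x,30
```

For numeric order, add `-n`: `sort -t, -k3 -n`.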

Sort by the second column, numerically, in descending order #

sort -k2 -n -r

Join two files by column 1 of the first and column 3 of the second #

Assuming both files are sorted by those columns.

join -t, -1 1 -2 3 a.csv b.csv
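A self-contained sketch with made-up lookup files. Note the `-t,` flag: without it, join splits fields on whitespace rather than commas.

```shell
# Hypothetical files written to /tmp for the demo;
# each is already sorted on its join field
printf '1,ES\n2,FR\n' > /tmp/a.csv                 # key in column 1
printf 'Spain,EUR,1\nFrance,EUR,2\n' > /tmp/b.csv  # key in column 3
join -t, -1 1 -2 3 /tmp/a.csv /tmp/b.csv
# 1,ES,Spain,EUR
# 2,FR,France,EUR
```

The output is the join key first, then the remaining fields of the first file, then the remaining fields of the second.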

Remove specific characters #

There are many ways to do it; the easiest is tr.

> echo hello | tr -d 'l'
heo

AWK #

From the manual: Unix tools re-implemented in simple AWK scripts

Scripts can be run inline, or from a separate file: awk -f print_fields.awk file.txt

{print $1, $3}

For comma-separated input, set the field separator in a BEGIN block (e.g. as the contents of print_fields.awk):

BEGIN {
  FS = ","
}
{
  print $1, $3
}

Sum of a column values #

{sum += $2} END {print sum}

Add new field at the end #

{print $0, $1+$2}
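For example, appending the sum of the first two fields:

```shell
# The new last column holds $1+$2
echo '3 4' | awk '{print $0, $1+$2}'
# 3 4 7
```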

Count line occurrences #

{
  count[$0]++
}

END {
  for (line in count) {
    print line, count[line]
  }
}

Count field occurrences #

{count[$2]++} END {for (val in count) print val, count[val]}
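With some made-up input, piping through sort since the order of for-in iteration in AWK is unspecified:

```shell
# "x" appears twice in the 2nd field, "y" once
printf 'a x\nb y\nc x\n' | awk '{count[$2]++} END {for (v in count) print v, count[v]}' | sort
# x 2
# y 1
```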

Associative Arrays: Sum Values for Key #

{
  arr[$1] += $2
}

END {
  for (key in arr) {
    print key, arr[key]
  }
}

XSV #

github.com/BurntSushi/xsv

Select Columns #

Remove all columns except the first and the seventh.

xsv select 1,7 filename.csv

Display as table #

xsv select 1,7 filename.csv | xsv table

Change delimiters #

Assuming the input file is delimited with a semicolon (;), this command will output regular CSV:

xsv fmt --delimiter ";" filename.csv

This is the equivalent in AWK:

awk 'BEGIN {FS=";"; OFS=","} {$1=$1; print}'
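A quick check of the trick (the $1=$1 assignment forces awk to rebuild the record with the new OFS; swap FS to match your input):

```shell
echo 'a;b;c' | awk 'BEGIN {FS=";"; OFS=","} {$1=$1; print}'
# a,b,c
```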

Pandas #

Setting up a minimal environment #

Create a virtual environment, activate it and install Pandas. Using python 3.10, this takes less than 30 seconds:

python -m venv .wadus
source .wadus/bin/activate
pip install pandas

Where do I get help? #

The pandas User Guide should cover 90% of the cases.

JQ #

Key-value access #

Given this file:

{
  "base": "USD",
  "date": "2016-02-05",
  "rates": {
    "AUD": 1.3911,
    "BGN": 1.7459
  }
}

We would like to produce the following jsonl file:

{"base":"usd","quote":"AUD","mid":1.3911}
{"base":"usd","quote":"BGN","mid":1.7459}

And the script would be:

jq -c '.rates | to_entries[] | {base:"usd", quote: .key, mid: .value}' exchange_rates_usd.json

Convert a JSON file to CSV #

jq -rcs '.[] | [.code,.name] | @csv' countries.jsonl

Building objects #

You can build objects explicitly, like {some: .code}, or use the shorthand {code}:

jq -rcs '.[] | {code}' countries.jsonl

{"code":"YE"}
{"code":"ZM"}

Arrays are built using [] instead of {}, as expected.

Full-fledged programs in external files #

Normally you would run it with the -f option.

For instance, the exchange_rates.json example can be also produced like this:

{"provider":"xe","base":"USD","quote":"AUD","mid":1.3911}
{"provider":"xe","base":"USD","quote":"BGN","mid":1.7459}

by using:

jq -c --arg provider 'xe' -f exchange_rates.jq exchange_rates_usd.json

where exchange_rates.jq script is:

.base as $base
| .rates
| to_entries[]
| {
    provider: $provider,
    base: $base,
    quote: .key,
    mid: .value
  }

Notice the base is fetched from the top level and reused afterwards. The provider field is set from the outside with --arg.

Where do I get help? #

Reading the manual is a good starting point. It's quite long, but don't panic: read it before googling.

For some inspiration, you may check a bigger example.