# Starter Recipes

## Basics

### Select first N or last M lines

```shell
head -n 10 data/sales.csv
tail -n 10 data/sales.csv
```
### Select columns 1 and 7 from a CSV

```shell
cut -d, -f 1,7 data/sales.csv
```
### Remove first line

```shell
tail -n +2 data/sales.csv
```
### Sort CSV by 3rd field

```shell
sort -t, -k 3
```

### Sort second column numerically

```shell
sort -k2 -n -r
```
- `-n` tells sort to compare numerically
- `-h` compares numerically assuming SI suffixes, like 4K, 2M etc.
- `-r` produces a descending order
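A quick way to see the flags in action, on some made-up two-column input:

```shell
# sort the second column numerically, descending
printf 'apple 4\nbanana 12\ncherry 7\n' | sort -k2 -n -r
# banana 12
# cherry 7
# apple 4
```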
### Join two files by columns 1 and 3

Assuming both files are sorted by those columns.

```shell
join -1 1 -2 3 a.csv b.csv
```
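A self-contained sketch with made-up files; note that for comma-delimited files you also need `-t,`, since join splits on whitespace by default:

```shell
# left.csv is sorted by column 1, right.csv by column 3 (sample data is made up)
printf 'a,1\nb,2\n' > /tmp/left.csv
printf 'x,y,a\nz,w,b\n' > /tmp/right.csv
join -t, -1 1 -2 3 /tmp/left.csv /tmp/right.csv
# a,1,x,y
# b,2,z,w
```

The join key is printed first, followed by the remaining fields of the first file, then those of the second.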
### Remove specific characters

There are many ways to do it; the easiest is to use tr.

```shell
> echo hello | tr -d 'l'
heo
```
## AWK

From the manual: Unix tools re-implemented in simple AWK scripts

### Print Specific Fields

The script can be run inline, or from a separate file: `awk -f print_fields.awk file.txt`

```awk
{print $1, $3}
```
### Sum of a column's values

```awk
{sum += $2} END {print sum}
```
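For example, run inline against some made-up input to sum the second column:

```shell
printf 'a 1\nb 2\nc 3\n' | awk '{sum += $2} END {print sum}'
# 6
```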
### Print Fields from CSV

```awk
BEGIN {
    FS = ","
}
{
    print $1, $3
}
```
### Add new field at the end

```awk
{print $0, $1+$2}
```
### Count line occurrences

```awk
{
    count[$0]++
}

END {
    for (line in count) {
        print line, count[line]
    }
}
```
### Count field occurrences

```awk
{count[$2]++} END {for (val in count) print val, count[val]}
```
### Associative Arrays: Sum Values for Key

```awk
{
    arr[$1] += $2
}

END {
    for (key in arr) {
        print key, arr[key]
    }
}
```
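Run inline against some made-up key-value input (piping through sort, since awk's `for (key in arr)` iteration order is unspecified):

```shell
printf 'fruit 2\nveg 5\nfruit 3\n' \
  | awk '{arr[$1] += $2} END {for (key in arr) print key, arr[key]}' \
  | sort
# fruit 5
# veg 5
```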
## XSV

### Select Columns

Keep only the 1st and 7th columns:

```shell
xsv select 1,7 filename.csv
```
### Display as table

```shell
xsv select 1,7 filename.csv | xsv table
```
### Change delimiters

Assuming the input file is delimited with a semicolon `;`, this command will output regular CSV:

```shell
xsv fmt --delimiter ";" filename.csv
```

This is the equivalent in AWK:

```shell
awk 'BEGIN {FS=";"; OFS=","} {$1=$1; print}'
```
## Pandas

### Setting up a minimal environment

Create a virtual environment, activate it, and install Pandas. Using Python 3.10, this takes less than 30 seconds:

```shell
python -m venv .wadus
source .wadus/bin/activate
pip install pandas
```
### Where do I get help?

The User Guide should cover 90% of the cases.
## JQ

### Key-value access

Given this file:

```json
{
  "base": "USD",
  "date": "2016-02-05",
  "rates": {
    "AUD": 1.3911,
    "BGN": 1.7459
  }
}
```
We would like to produce the following jsonl file:

```json
{"base":"usd","quote":"AUD","mid":1.3911}
{"base":"usd","quote":"BGN","mid":1.7459}
```

And the script would be:

```shell
jq -c '.rates | to_entries[] | {base:"usd", quote: .key, mid: .value}' exchange_rates_usd.json
```
### Convert json file to csv

```shell
jq -rcs '.[] | [.code,.name] | @csv' countries.jsonl
```
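A self-contained sketch of the same pipeline, with a couple of made-up jsonl records in place of countries.jsonl:

```shell
# -s slurps the jsonl stream into an array, .[] iterates it,
# and @csv quotes each string field
printf '{"code":"YE","name":"Yemen"}\n{"code":"ZM","name":"Zambia"}\n' \
  | jq -rcs '.[] | [.code,.name] | @csv'
# "YE","Yemen"
# "ZM","Zambia"
```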
### Building objects

You can build an object like `{ some: .code }`, or use the shortcut `{code}` directly:

```shell
jq -rcs '.[] | {code}' countries.jsonl
```

```json
{"code":"YE"}
{"code":"ZM"}
```

Arrays are built using `[]` instead of `{}`, as expected.
### Full-fledged programs in external files

Normally you would run them using the `-f` option. For instance, the exchange_rates_usd.json example can also produce this output:

```json
{"provider":"xe","base":"USD","quote":"AUD","mid":1.3911}
{"provider":"xe","base":"USD","quote":"BGN","mid":1.7459}
```

by using:

```shell
jq -c --arg provider 'xe' -f exchange_rates.jq exchange_rates_usd.json
```

where the exchange_rates.jq script is:

```jq
.base as $base
| .rates
| to_entries[]
| {
    provider: $provider,
    base: $base,
    quote: .key,
    mid: .value
  }
```

Notice the `base` is fetched from the top level and reused afterwards. The `provider` field is set from the outside with `--arg`.
### Where do I get help?

Reading the manual is a good starting point. It's quite long; don't get anxious, and RTFM before googling.

For some inspiration, you may check a bigger example.