DuckDB #
Installation #
It comes as a single binary, we just have to download and unpack:
1wget https://github.com/duckdb/duckdb/releases/download/v0.9.2/duckdb_cli-linux-amd64.zip
2unzip duckdb_cli-linux-amd64.zip
Running Queries #
Open Shell #
1./duckdb
2D select tx_id, item_price from read_csv('./data/sales.csv', AUTO_DETECT=TRUE) where tx_id = 'YMEVOKU194';
3┌────────────┬────────────┐
4│ tx_id │ item_price │
5│ varchar │ double │
6├────────────┼────────────┤
7│ YMEVOKU194 │ 28.85 │
8│ YMEVOKU194 │ 4.12 │
9└────────────┴────────────┘
Run Query and Exit #
1./duckdb -c "select * from read_csv('data/sales.csv', AUTO_DETECT=TRUE)"
Creating a database #
Just append the filename when running duckdb
. For instance, this will create a table named purchases
in a file at data/sales.duckdb
:
1duckdb data/sales.duckdb -c "CREATE TABLE purchases AS SELECT * FROM read_csv('data/sales.csv', header=true, auto_detect=true)"
After that, you can open the file and query the purchases
table:
1./duckdb data/sales.duckdb
2v0.9.2 3c695d7ba9
3Enter ".help" for usage hints.
4D describe purchases;
5┌────────────────────┬─────────────┬─────────┬─────────┬─────────┬───────┐
6│ column_name │ column_type │ null │ key │ default │ extra │
7│ varchar │ varchar │ varchar │ varchar │ varchar │ int32 │
8├────────────────────┼─────────────┼─────────┼─────────┼─────────┼───────┤
9│ tx_id │ VARCHAR │ YES │ │ │ │
10│ tx_time │ TIMESTAMP │ YES │ │ │ │
11│ buyer │ VARCHAR │ YES │ │ │ │
12│ currency_code │ VARCHAR │ YES │ │ │ │
13│ payment_type │ VARCHAR │ YES │ │ │ │
14│ credit_card_number │ VARCHAR │ YES │ │ │ │
15│ country │ VARCHAR │ YES │ │ │ │
16│ department │ VARCHAR │ YES │ │ │ │
17│ product │ VARCHAR │ YES │ │ │ │
18│ item_price │ DOUBLE │ YES │ │ │ │
19│ coupon_code │ VARCHAR │ YES │ │ │ │
20│ was_returned │ VARCHAR │ YES │ │ │ │
21├────────────────────┴─────────────┴─────────┴─────────┴─────────┴───────┤
22│ 12 rows 6 columns │
23└────────────────────────────────────────────────────────────────────────┘
24D .mode line
25D select count(*) as con_descuento from purchases where coupon_code is not null;
26con_descuento = 414
Change output format #
Use the .mode csv
, also -csv
or -json
command line options. You can check the available output formats in the documentation.
1D .mode csv
2D select tx_id, item_price from read_csv('./data/sales.csv', AUTO_DETECT=TRUE) where tx_id = 'YMEVOKU194';
3tx_id,item_price
4YMEVOKU194,28.85
5YMEVOKU194,4.12
Remember you can always save the data to a different file:
1./duckdb -jsonlines -c "select tx_id, item_price from read_csv('./data/sales.csv', AUTO_DETECT=TRUE) where tx_id = 'YMEVOKU194';" > ./data/sales.jsonl
Use >>
instead of >
to append to the file.
Converting between CSV and Parquet #
For instance, using sales.csv
:
1./duckdb -c "copy (select * from read_csv('data/sales.csv', AUTO_DETECT=TRUE)) to 'data/sales.parquet' (format 'parquet')"