DuckDB

· luisbelloch's blog


DuckDB #

Official Guides

Installation #

It comes as a single binary, we just have to download and unpack:

1wget https://github.com/duckdb/duckdb/releases/download/v0.9.2/duckdb_cli-linux-amd64.zip
2unzip duckdb_cli-linux-amd64.zip

Running Queries #

Open Shell #

1./duckdb
2D select tx_id, item_price from read_csv('./data/sales.csv', AUTO_DETECT=TRUE) where tx_id = 'YMEVOKU194';
3┌────────────┬────────────┐
4   tx_id     item_price 
5  varchar      double   
6├────────────┼────────────┤
7 YMEVOKU194       28.85 
8 YMEVOKU194        4.12 
9└────────────┴────────────┘

Run Query and Exit #

1./duckdb -c "select * from read_csv('data/sales.csv', AUTO_DETECT=TRUE)"

Creating a database #

Just append the filename when running duckdb. For instance, this will create a table named purchases in a file at data/sales.duckdb:

1duckdb data/sales.duckdb -c "CREATE TABLE purchases AS SELECT * FROM read_csv('data/sales.csv', header=true, auto_detect=true)"

After that, you can open the file and query the purchases table:

 1./duckdb data/sales.duckdb
 2v0.9.2 3c695d7ba9
 3Enter ".help" for usage hints.
 4D describe purchases;
 5┌────────────────────┬─────────────┬─────────┬─────────┬─────────┬───────┐
 6│    column_name     │ column_type │  null   │   key   │ default │ extra │
 7│      varchar       │   varchar   │ varchar │ varchar │ varchar │ int32 │
 8├────────────────────┼─────────────┼─────────┼─────────┼─────────┼───────┤
 9│ tx_id              │ VARCHAR     │ YES     │         │         │       │
10│ tx_time            │ TIMESTAMP   │ YES     │         │         │       │
11│ buyer              │ VARCHAR     │ YES     │         │         │       │
12│ currency_code      │ VARCHAR     │ YES     │         │         │       │
13│ payment_type       │ VARCHAR     │ YES     │         │         │       │
14│ credit_card_number │ VARCHAR     │ YES     │         │         │       │
15│ country            │ VARCHAR     │ YES     │         │         │       │
16│ department         │ VARCHAR     │ YES     │         │         │       │
17│ product            │ VARCHAR     │ YES     │         │         │       │
18│ item_price         │ DOUBLE      │ YES     │         │         │       │
19│ coupon_code        │ VARCHAR     │ YES     │         │         │       │
20│ was_returned       │ VARCHAR     │ YES     │         │         │       │
21├────────────────────┴─────────────┴─────────┴─────────┴─────────┴───────┤
2212 rows                                                      6 columns │
23└────────────────────────────────────────────────────────────────────────┘
24D .mode line
25D select count(*) as con_descuento from purchases where coupon_code is not null;
26con_descuento = 414

Change output format #

Use the .mode csv, also -csv or -json command line options. You can check the available output formats in the documentation.

1D .mode csv
2D select tx_id, item_price from read_csv('./data/sales.csv', AUTO_DETECT=TRUE) where tx_id = 'YMEVOKU194';
3tx_id,item_price
4YMEVOKU194,28.85
5YMEVOKU194,4.12

Remember you can always save the data to a different file:

1./duckdb -jsonlines -c "select tx_id, item_price from read_csv('./data/sales.csv', AUTO_DETECT=TRUE) where tx_id = 'YMEVOKU194';" > ./data/sales.jsonl

Use >> instead of > to append to the file.

Converting between CSV and Parquet #

For instance, using sales.csv:

1./duckdb -c "copy (select * from read_csv('data/sales.csv', AUTO_DETECT=TRUE)) to 'data/sales.parquet' (format 'parquet')"