ClickHouse

· luisbelloch's blog


ClickHouse #

Official Documentation

Installation #

1curl https://clickhouse.com/ | sh

Running Queries #

Open Local Shell #

 1./clickhouse local
 2ClickHouse local version 24.1.1.1017 (official build).
 3
 4c02g90ftml85.local :) select tx_id, item_price from file('./data/sales.csv') where tx_id = 'YMEVOKU194' settings format_csv_delimiter='|';
 5
 6SELECT
 7    tx_id,
 8    item_price
 9FROM file('./data/sales.csv')
10WHERE tx_id = 'YMEVOKU194'
11SETTINGS format_csv_delimiter = '|'
12
13Query id: 214b4d6f-bfde-4fff-98da-d9d9ace6d9f6
14
15┌─tx_id──────┬─item_price─┐
16 YMEVOKU194       28.85 
17 YMEVOKU194        4.12 
18└────────────┴────────────┘
19
202 rows in set. Elapsed: 0.032 sec.

Run Query and Exit #

1./clickhouse local -q "SELECT * FROM file('data/sales.csv') settings format_csv_delimiter='|'"

Start server and connect to it #

1./clickhouse server

And in another window connect to it

1./clickhouse connect

Reading and writing in different formats #

Read CSV and output in Parquet:

1./clickhouse local -q "SELECT * FROM file('data/sales.csv', CSVWithNames) INTO OUTFILE 'data/sales.parquet' FORMAT Parquet" \
2  --format_csv_delimiter='|' \
3  --output_format_parquet_compression_method=gzip

And then read that Parquet file and output json lines:

1./clickhouse local -q "select * from file('data/sales.parquet') format JSONEachRow"

That would print a json object for each row in the table.

Import data into a table #

Assuming you have a server running.

When ClickHouse server is running, data can only be read from user_files folder. First step would be to copy csv files to it:

1cp ./data/sales.csv ./user_files/

Create an script named import_data.sql that will create the table and import data to it:

 1DROP TABLE IF EXISTS compras;
 2
 3CREATE TABLE IF NOT EXISTS compras (
 4  tx_id FixedString(10) NOT NULL,
 5  tx_time DateTime NOT NULL,
 6  buyer String NOT NULL,
 7  currency_code FixedString(3) NOT NULL,
 8  payment_type String NOT NULL,
 9  credit_card_number String NOT NULL,
10  country FixedString(2) NOT NULL,
11  department String NOT NULL,
12  product String NOT NULL,
13  item_price Float64 NOT NULL,
14  coupon_code FixedString(4) NULL,
15  was_returned String NULL
16)
17ENGINE = MergeTree
18ORDER BY tx_time;
19
20SET format_csv_delimiter = '|';
21SET date_time_input_format = 'best_effort';
22
23INSERT INTO compras SELECT * FROM file('sales.csv', CSVWithNames);

And then you can run the file to do the import:

1./clickhouse client --queries-file import_data.sql

Afterwards, the data can be queried from compras table:

1./clickhouse client -q "select currency_code, sum(item_price) from compras group by currency_code"

Additional information #