ClickHouse #
Installation #
1curl https://clickhouse.com/ | sh
Running Queries #
Open Local Shell #
1./clickhouse local
2ClickHouse local version 24.1.1.1017 (official build).
3
4c02g90ftml85.local :) select tx_id, item_price from file('./data/sales.csv') where tx_id = 'YMEVOKU194' settings format_csv_delimiter='|';
5
6SELECT
7 tx_id,
8 item_price
9FROM file('./data/sales.csv')
10WHERE tx_id = 'YMEVOKU194'
11SETTINGS format_csv_delimiter = '|'
12
13Query id: 214b4d6f-bfde-4fff-98da-d9d9ace6d9f6
14
15┌─tx_id──────┬─item_price─┐
16│ YMEVOKU194 │ 28.85 │
17│ YMEVOKU194 │ 4.12 │
18└────────────┴────────────┘
19
202 rows in set. Elapsed: 0.032 sec.
Run Query and Exit #
1./clickhouse local -q "SELECT * FROM file('data/sales.csv') settings format_csv_delimiter='|'"
Start server and connect to it #
1./clickhouse server
And in another window connect to it
1./clickhouse connect
Reading and writing in different formats #
Read CSV and output in Parquet:
1./clickhouse local -q "SELECT * FROM file('data/sales.csv', CSVWithNames) INTO OUTFILE 'data/sales.parquet' FORMAT Parquet" \
2 --format_csv_delimiter='|' \
3 --output_format_parquet_compression_method=gzip
And then read that Parquet file and output json lines:
1./clickhouse local -q "select * from file('data/sales.parquet') format JSONEachRow"
That would print a json object for each row in the table.
Import data into a table #
Assuming you have a server running.
When ClickHouse server is running, data can only be read from user_files
folder. First step would be to copy csv files to it:
1cp ./data/sales.csv ./user_files/
Create an script named import_data.sql
that will create the table and import data to it:
1DROP TABLE IF EXISTS compras;
2
3CREATE TABLE IF NOT EXISTS compras (
4 tx_id FixedString(10) NOT NULL,
5 tx_time DateTime NOT NULL,
6 buyer String NOT NULL,
7 currency_code FixedString(3) NOT NULL,
8 payment_type String NOT NULL,
9 credit_card_number String NOT NULL,
10 country FixedString(2) NOT NULL,
11 department String NOT NULL,
12 product String NOT NULL,
13 item_price Float64 NOT NULL,
14 coupon_code FixedString(4) NULL,
15 was_returned String NULL
16)
17ENGINE = MergeTree
18ORDER BY tx_time;
19
20SET format_csv_delimiter = '|';
21SET date_time_input_format = 'best_effort';
22
23INSERT INTO compras SELECT * FROM file('sales.csv', CSVWithNames);
And then you can run the file to do the import:
1./clickhouse client --queries-file import_data.sql
Afterwards, the data can be queried from compras
table:
1./clickhouse client -q "select currency_code, sum(item_price) from compras group by currency_code"