Query WarehousePG data from ClickHouse, and write data back to WarehousePG, without ETL pipelines or data duplication. ClickHouse connects to WarehousePG using the PostgreSQL table engine, which maps a WarehousePG table to a ClickHouse table definition and handles reads and writes over the standard Postgres wire protocol.
Note
The PostgreSQL engine connects only to the WarehousePG coordinator. It doesn't engage WarehousePG's massively parallel processing (MPP) architecture, so queries run through the coordinator as single-node Postgres queries.
Configuring WarehousePG to accept connections from ClickHouse
On the coordinator host, confirm
$COORDINATOR_DATA_DIRECTORY/postgresql.confhaslisten_addressesset to'*'or your network range. If not, update it:listen_addresses = '*'
Add an entry to
pg_hba.confthat allows connections from the ClickHouse host. Replace<clickhouse-host-network>with the network range of your ClickHouse host:echo "host <database> <user> <clickhouse-host-network>/24 md5" >> $COORDINATOR_DATA_DIRECTORY/pg_hba.conf
Reload the coordinator configuration:
gpstop -uFrom the ClickHouse host, verify the WarehousePG coordinator is reachable:
psql -h <coordinator-ip> -p <port> -d <database> -U <user>
Creating the linked table in ClickHouse
Map a WarehousePG table to a ClickHouse table definition using the PostgreSQL table engine. Once created, queries against the ClickHouse table go directly to WarehousePG in real time, with no data copied or cached locally.
In WarehousePG, create the table to expose to ClickHouse. For example:
CREATE TABLE analytics_data ( id integer primary key, event_type varchar(50), event_time timestamp, value numeric(12, 4) ); INSERT INTO analytics_data (id, event_type, event_time, value) VALUES (1, 'page_view', '2025-01-01 10:00:00', 1.0), (2, 'click', '2025-01-01 10:01:00', 2.5);
From your ClickHouse server, open a
clickhouse-clientsession:clickhouse-client
Create a database and table in ClickHouse that maps to the WarehousePG table using
ENGINE = PostgreSQL:CREATE DATABASE whpg_connect; CREATE TABLE whpg_connect.analytics_data ( id UInt64, event_type String, event_time DateTime, value Float64 ) ENGINE = PostgreSQL( '<coordinator-ip>:<port>', -- host:port of the WarehousePG coordinator '<database>', -- WarehousePG database name 'analytics_data', -- table name in WarehousePG '<user>', -- WarehousePG user '<password>' -- password (stored encrypted in ClickHouse) );
The
ENGINE = PostgreSQLsyntax takes five positional parameters:host:port, database, table, user, and password. An optional sixth parameter specifies the schema name (defaults topublic). If WarehousePG is configured withtrustauthentication, pass an empty string''for the password.Verify the table is correctly linked by inspecting its definition:
DESCRIBE TABLE whpg_connect.analytics_data;
Or view the full
CREATE TABLEstatement:SHOW CREATE TABLE whpg_connect.analytics_data;
The password is shown as
[HIDDEN]in the output.
Reading WarehousePG data from ClickHouse
Query the ClickHouse table. ClickHouse connects to WarehousePG, executes a COPY ... TO STDOUT statement, and converts the row-oriented stream into columnar format as it arrives:
SELECT * FROM whpg_connect.analytics_data;
┌─id─┬─event_type─┬─────────event_time─┬─value─┐ │ 1 │ page_view │ 2025-01-01 10:00:00│ 1 │ │ 2 │ click │ 2025-01-01 10:01:00│ 2.5 │ └────┴────────────┴────────────────────┴───────┘
Data inserted into WarehousePG after the ClickHouse table is created is immediately visible on the next SELECT. ClickHouse doesn't cache WarehousePG data locally when using the PostgreSQL engine.
Writing data to WarehousePG from ClickHouse
Use INSERT on the ClickHouse table to write data to WarehousePG. ClickHouse translates the insert into a COPY ... FROM STDIN statement on the WarehousePG side:
-- Run in ClickHouse INSERT INTO whpg_connect.analytics_data (id, event_type, event_time, value) VALUES (3, 'purchase', '2025-01-01 10:05:00', 99.99);
Confirm the row is visible in WarehousePG:
-- Run in WarehousePG SELECT * FROM analytics_data;
Understanding the query flow
Use EXPLAIN in ClickHouse to see how data moves between the systems:
EXPLAIN SELECT * FROM whpg_connect.analytics_data;
┌─explain────────────────────────────────────────────────────────────────── ────────────────────┐ │ Expression ((Project names + (Projection + Change column names to column identifiers))) │ │ ReadFromPostgreSQL │ └────────────────────────────────────────────────────────────────────────────────────────────────┘
The ReadFromPostgreSQL step establishes a TCP connection to the coordinator, sends the SQL query, and converts the incoming row-oriented stream into ClickHouse's columnar format. By the time data reaches the Expression step, it's already in columnar form.