Query Delta Lake tables

This guide explains how to query existing Delta Lake tables from a Hybrid Manager (HM) Lakehouse cluster.

You can connect Lakehouse clusters to Delta Lake data stored in object storage and run SQL queries using PGAA.

For background on Delta Lake integration, see Delta Lake in Hybrid Manager.


Goals

After completing this guide, you will:

✅ Be able to define PGFS storage locations pointing to Delta Lake data
✅ Be able to create PGAA reader tables that expose Delta Lake data to Postgres SQL
✅ Unlock SQL-based analytics and BI access to existing Delta Lake tables
✅ Enable Postgres tools and users to query Delta data without data movement or ETL


When to use this

You should follow this guide when:

  • You already have Delta Lake tables in an object storage location
  • You want to query this data using Lakehouse clusters in Hybrid Manager
  • You want to connect BI tools or Postgres clients to your Delta Lake data
  • You are testing performance or validating Lakehouse integration with Delta

Prerequisites

  • An active Lakehouse cluster provisioned via Hybrid Manager
  • Delta Lake tables in an S3-compatible object storage location
  • A _delta_log directory and Parquet data files present in the Delta Lake table path (see the layout sketch after this list)
  • Credentials to access object storage (if needed)
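
For reference, a Delta Lake table root typically looks like the following; the commit file and Parquet file names here are illustrative:

path/to/delta_table_root/
  _delta_log/
    00000000000000000000.json
    00000000000000000001.json
  part-00000-<uuid>.snappy.parquet
  part-00001-<uuid>.snappy.parquet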

Steps

1. Define a PGFS storage location

Connect to your Lakehouse cluster and define a PGFS storage location.

Public bucket example:

SELECT pgfs.create_storage_location(
    name => 'my_public_delta_lake_store',
    url => 's3://my-public-delta-data/',
    -- Skip request signing for anonymous access to a public bucket
    options => '{"aws_skip_signature": "true"}',
    credentials => '{}'
);

Private bucket example:

SELECT pgfs.create_storage_location(
    name => 'my_private_delta_lake_assets',
    url => 's3://my-private-delta-data/',
    options => '{}',
    -- Replace the placeholders with the access key pair for your bucket
    credentials => '{"access_key_id": "...", "secret_access_key": "..."}'
);
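
Assuming your PGFS version provides the pgfs.list_storage_locations() function (check your PGFS documentation), you can confirm that the storage locations were registered:

SELECT * FROM pgfs.list_storage_locations();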

2. Enable the PGAA extension

CREATE EXTENSION IF NOT EXISTS pgaa CASCADE;
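
CASCADE also installs any extensions that PGAA depends on. To verify the installation, query the standard Postgres extension catalog:

SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('pgaa', 'pgfs');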

3. Create a PGAA external table for Delta Lake

This exposes your Delta Lake table to Postgres SQL:

-- Empty column list: the schema is inferred from the Delta transaction log
CREATE TABLE public.sales_delta_table ()
USING PGAA
WITH (
    pgaa.storage_location = 'my_private_delta_lake_assets',
    -- Path to the table root (the directory containing _delta_log),
    -- relative to the storage location URL
    pgaa.path = 'path/to/delta_table_root/',
    pgaa.format = 'delta'
);
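
Because the column list is empty, PGAA infers the schema from the Delta transaction log. You can inspect the inferred columns with the standard information_schema views:

SELECT column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name = 'sales_delta_table'
ORDER BY ordinal_position;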

4. Query Delta Lake data

Example: get record count.

SELECT COUNT(*) FROM public.sales_delta_table;

Example: filter and project columns.

SELECT order_id, customer_name, sale_amount
FROM public.sales_delta_table
WHERE sale_date >= '2023-01-01'
  AND product_category = 'Electronics'
LIMIT 100;
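
Aggregations also run directly against the Delta data. A sketch, using the illustrative columns from the example above:

SELECT product_category,
       COUNT(*) AS order_count,
       SUM(sale_amount) AS total_sales
FROM public.sales_delta_table
GROUP BY product_category
ORDER BY total_sales DESC;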

Notes

  • The Delta Lake table schema is inferred from the _delta_log transaction log
  • The Lakehouse cluster uses a vectorized query engine for fast analytical queries (see the EXPLAIN sketch after this list)
  • No catalog integration is required; Delta Lake tables are self-describing on the filesystem
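
To check how a query is planned, you can use standard EXPLAIN; the exact plan nodes shown for PGAA tables depend on your PGAA version:

EXPLAIN SELECT COUNT(*) FROM public.sales_delta_table;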

Next steps

Now that you have created a Delta Lake external table and run queries against it, you can point BI tools and other Postgres clients at the cluster.

For an architecture overview, see Analytics in Hybrid Manager.

