Query Delta Lake tables
This guide explains how to query existing Delta Lake tables from a Hybrid Manager (HM) Lakehouse cluster.
You can connect Lakehouse clusters to Delta Lake data stored in object storage and run SQL queries using PGAA.
For background on Delta Lake integration, see Delta Lake in Hybrid Manager.
Goals
After completing this guide, you will:
✅ Be able to define PGFS storage locations pointing to Delta Lake data
✅ Be able to create PGAA reader tables that expose Delta Lake data to Postgres SQL
✅ Unlock SQL-based analytics and BI access to existing Delta Lake tables
✅ Enable Postgres tools and users to query Delta Lake data without moving it or running ETL
When to use this
You should follow this guide when:
- You already have Delta Lake tables in an object storage location
- You want to query this data using Lakehouse clusters in Hybrid Manager
- You want to connect BI tools or Postgres clients to your Delta Lake data
- You are testing performance or validating Lakehouse integration with Delta Lake
Prerequisites
- An active Lakehouse cluster provisioned via Hybrid Manager
- Delta Lake tables in an S3-compatible object storage location
- The _delta_log directory and Parquet data files must be present in the Delta Lake table path (see the illustrative layout below)
- Credentials to access object storage (if needed)
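For reference, a Delta Lake table root typically contains the transaction log alongside Parquet data files. The bucket, directory, and file names below are illustrative:

```
s3://my-private-delta-data/path/to/delta_table_root/
├── _delta_log/
│   ├── 00000000000000000000.json
│   └── 00000000000000000001.json
├── part-00000-<uuid>.snappy.parquet
└── part-00001-<uuid>.snappy.parquet
```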
Steps
1. Define a PGFS storage location
Connect to your Lakehouse cluster and define a PGFS storage location.
Public bucket example:
```sql
SELECT pgfs.create_storage_location(
    name => 'my_public_delta_lake_store',
    url => 's3://my-public-delta-data/',
    options => '{"aws_skip_signature": "true"}',
    credentials => '{}'
);
```
Private bucket example:
```sql
SELECT pgfs.create_storage_location(
    name => 'my_private_delta_lake_assets',
    url => 's3://my-private-delta-data/',
    options => '{}',
    credentials => '{"access_key_id": "...", "secret_access_key": "..."}'
);
```
2. Enable the PGAA extension

Enable PGAA in your database; CASCADE also installs any extensions it depends on:

```sql
CREATE EXTENSION IF NOT EXISTS pgaa CASCADE;
```
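To confirm that the extensions are installed, you can query the standard Postgres catalog; this is plain Postgres, not PGAA-specific:

```sql
-- Verify that the pgaa and pgfs extensions are installed
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('pgaa', 'pgfs');
```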
3. Create a PGAA external table for Delta Lake
This exposes your Delta Lake table to Postgres SQL:
```sql
-- Empty column list: the schema is inferred from the _delta_log
CREATE TABLE public.sales_delta_table ()
USING PGAA
WITH (
    pgaa.storage_location = 'my_private_delta_lake_assets',
    pgaa.path = 'path/to/delta_table_root/',  -- relative to the storage location URL
    pgaa.format = 'delta'
);
```
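Because the column definitions are inferred from the _delta_log (see Notes below), you can inspect the resulting schema through the standard information_schema views:

```sql
-- Show the columns PGAA inferred from the Delta transaction log
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name = 'sales_delta_table'
ORDER BY ordinal_position;
```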
4. Query Delta Lake data
Example: get the record count.

```sql
SELECT COUNT(*) FROM public.sales_delta_table;
```
Example: filter and project columns.
```sql
SELECT order_id, customer_name, sale_amount
FROM public.sales_delta_table
WHERE sale_date >= '2023-01-01'
  AND product_category = 'Electronics'
LIMIT 100;
```
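Example: aggregate across the table. The columns below are the same illustrative names used above; substitute the columns of your own Delta table:

```sql
-- Total and count of sales per category (illustrative column names)
SELECT product_category,
       SUM(sale_amount) AS total_sales,
       COUNT(*) AS order_count
FROM public.sales_delta_table
GROUP BY product_category
ORDER BY total_sales DESC;
```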
Notes
- The Delta Lake schema is inferred automatically from the _delta_log transaction log
- The Lakehouse cluster uses a vectorized query engine for fast analytical queries (see the EXPLAIN sketch after this list)
- No catalog integration is required; Delta Lake tables are discovered directly from the filesystem layout
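To see how a query against the external table will be executed, standard EXPLAIN works as usual; the exact plan nodes shown depend on your PGAA version:

```sql
-- Inspect the plan for a scan over the Delta table
-- (plan node names vary by PGAA version)
EXPLAIN
SELECT COUNT(*) FROM public.sales_delta_table;
```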
Next steps
Now that you have created a Delta Lake external table and run queries:
- Configure PGFS for Delta Lake to add more storage locations as needed
- Tune Delta Lake query performance to optimize large queries
- Integrate Lakehouse queries with BI tools to give business intelligence users access to Delta data
For an architecture overview, see Analytics in Hybrid Manager.