Query Delta Lake tables

This guide explains how to query existing Delta Lake tables from a Hybrid Manager (HM) Lakehouse cluster.

You can connect Lakehouse clusters to Delta Lake data stored in object storage and run SQL queries using PGAA.

For background on Delta Lake integration, see Delta Lake in Hybrid Manager.


Goals

After completing this guide, you will:

✅ Be able to define PGFS storage locations pointing to Delta Lake data
✅ Be able to create PGAA reader tables that expose Delta Lake data to Postgres SQL
✅ Unlock SQL-based analytics and BI access to existing Delta Lake tables
✅ Enable Postgres tools and users to query Delta data without data movement or ETL


When to use this

You should follow this guide when:

  • You already have Delta Lake tables in an object storage location
  • You want to query this data using Lakehouse clusters in Hybrid Manager
  • You want to connect BI tools or Postgres clients to your Delta Lake data
  • You are testing performance or validating Lakehouse integration with Delta

Prerequisites

  • An active Lakehouse cluster provisioned via Hybrid Manager
  • Delta Lake tables in an S3-compatible object storage location
  • A _delta_log directory and Parquet data files present in the Delta Lake table path (see the layout sketch after this list)
  • Credentials to access object storage (if needed)
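
For reference, a Delta Lake table root typically looks like the following; the commit file and Parquet file names here are illustrative:

path/to/delta_table_root/
  _delta_log/
    00000000000000000000.json
    00000000000000000001.json
  part-00000-<uuid>.snappy.parquet
  part-00001-<uuid>.snappy.parquet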

Steps

1. Define a PGFS storage location

Connect to your Lakehouse cluster and define a PGFS storage location.

Public bucket example:

SELECT pgfs.create_storage_location(
    name => 'my_public_delta_lake_store',
    url => 's3://my-public-delta-data/',
    -- Skip request signing for anonymous access to a public bucket
    options => '{"aws_skip_signature": "true"}',
    credentials => '{}'
);

Private bucket example:

SELECT pgfs.create_storage_location(
    name => 'my_private_delta_lake_assets',
    url => 's3://my-private-delta-data/',
    options => '{}',
    -- Replace the placeholders with the access key pair for your bucket
    credentials => '{"access_key_id": "...", "secret_access_key": "..."}'
);
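
Assuming your PGFS version provides the pgfs.list_storage_locations() function (check your PGFS documentation), you can confirm that the storage locations were registered:

SELECT * FROM pgfs.list_storage_locations();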

2. Enable the PGAA extension

CREATE EXTENSION IF NOT EXISTS pgaa CASCADE;
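
CASCADE also installs any extensions that PGAA depends on. To verify the installation, query the standard Postgres extension catalog:

SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('pgaa', 'pgfs');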

3. Create a PGAA external table for Delta Lake

This exposes your Delta Lake table to Postgres SQL:

-- Empty column list: the schema is inferred from the Delta transaction log
CREATE TABLE public.sales_delta_table ()
USING PGAA
WITH (
    pgaa.storage_location = 'my_private_delta_lake_assets',
    -- Path to the table root (the directory containing _delta_log),
    -- relative to the storage location URL
    pgaa.path = 'path/to/delta_table_root/',
    pgaa.format = 'delta'
);
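
Because the column list is empty, PGAA infers the schema from the Delta transaction log. You can inspect the inferred columns with the standard information_schema views:

SELECT column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name = 'sales_delta_table'
ORDER BY ordinal_position;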

4. Query Delta Lake data

Example: get record count.

SELECT COUNT(*) FROM public.sales_delta_table;

Example: filter and project columns.

SELECT order_id, customer_name, sale_amount
FROM public.sales_delta_table
WHERE sale_date >= '2023-01-01'
  AND product_category = 'Electronics'
LIMIT 100;
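
Aggregations also run directly against the Delta data. A sketch, using the illustrative columns from the example above:

SELECT product_category,
       COUNT(*) AS order_count,
       SUM(sale_amount) AS total_sales
FROM public.sales_delta_table
GROUP BY product_category
ORDER BY total_sales DESC;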

Notes

  • The Delta Lake table schema is inferred from the _delta_log transaction log
  • The Lakehouse cluster uses a vectorized query engine for fast analytical queries (see the EXPLAIN sketch after this list)
  • No catalog integration is required; Delta Lake tables are self-describing on the filesystem
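
To check how a query is planned, you can use standard EXPLAIN; the exact plan nodes shown for PGAA tables depend on your PGAA version:

EXPLAIN SELECT COUNT(*) FROM public.sales_delta_table;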

Next steps

Now that you have created a Delta Lake external table and run queries against it, you can point BI tools and other Postgres clients at the cluster.

For an architecture overview, see Analytics in Hybrid Manager.

