Delta Lake in Hybrid Manager v1.3
Delta Lake is an open-source table format that brings ACID transactions and reliability to data lakes.
Hybrid Manager (HM) integrates Delta Lake capabilities into EDB Postgres deployments, enabling Lakehouse clusters to query Delta Lake tables stored in object storage.
For a general overview of Delta Lake, see Understanding Delta Lake with EDB Solutions.
Why use Delta Lake with Hybrid Manager
- Query existing Delta Lakes: Run fast Postgres SQL over data lakes already built on the Delta Lake format.
- Simplify analytics pipelines: Avoid unnecessary ETL by querying Delta Lake tables in place from Postgres.
- Broader ecosystem integration: Connect HM-managed Postgres to Delta Lake data produced by Spark, Trino, Flink, and other tools.
- Cost-effective lakehouse architecture: Store large datasets in object storage and query via Lakehouse clusters.
Key terms and architecture overview
For definitions of analytics terms used in Hybrid Manager—such as PGFS, PGAA, Lakehouse, and Analytics Offload—see Analytics concepts (hub).
When should I use Delta Lake in Hybrid Manager?
Use Delta Lake with Hybrid Manager when you want to:
- Query existing data lakes built on Delta Lake format, without ETL or data duplication.
- Integrate Postgres and data lake ecosystems—query Delta Lake tables from Postgres SQL clients.
- Enable unified analytics across operational Postgres data and data lake data.
- Support BI tools and ad-hoc queries on Delta Lake content using familiar Postgres tools.
- Leverage Lakehouse Clusters for scalable, fast SQL on large Delta datasets.
Important: PGAA currently supports read-only queries on Delta Lake tables. Writing or updating Delta tables via PGAA is not supported.
Key capabilities of Delta Lake in Hybrid Manager
Querying existing Delta Lake tables
What: Run SQL queries on Delta Lake tables stored in object storage.
Why: Enable BI tools and Postgres users to query existing Delta Lake data without duplicating or moving it.
How: Define PGFS storage locations and PGAA external tables in Lakehouse clusters.
Where: S3-compatible object storage holding Delta Lake tables (a `_delta_log` directory plus Parquet data files).
How-To: Read Iceberg/Delta with or without a catalog
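The pattern can be sketched in SQL roughly as follows. This is illustrative only: the storage location name, table names, and paths are invented for the example, and the exact PGAA option keys may differ; consult the PGFS and PGAA references for authoritative syntax.

```sql
-- Illustrative sketch; names and option keys are assumptions.
-- Assumes a PGFS storage location 'sales_lake' has already been
-- registered for the bucket holding the Delta tables.

-- Define a read-only PGAA table over a Delta Lake path.
CREATE TABLE sales_deltalake ()
USING PGAA
WITH (
    pgaa.format = 'delta',            -- read the Delta Lake format
    pgaa.storage_location = 'sales_lake',
    pgaa.path = 'sales'               -- table path under the storage location
);

-- Query it like any other Postgres table.
SELECT count(*) FROM sales_deltalake;
```

Because PGAA reads the Delta transaction log directly, no data is copied: the query scans the Parquet files referenced by the table's current snapshot.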
Simplifying Postgres + Delta Lake integration
What: Connect HM Lakehouse clusters to Delta Lake tables created by Spark or other tools.
Why: Build unified reporting and analytics across your operational and data lake systems.
How: Create PGFS storage locations and PGAA reader tables pointing to Delta Lake paths.
Where: Shared object storage locations used by Delta Lake pipelines.
How-To: Configure PGFS
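A PGFS storage location pointing at a shared bucket might look like the following sketch. The bucket name is hypothetical and the `options`/`credentials` argument shapes are assumptions; see the Configure PGFS how-to for the exact call signature and supported keys.

```sql
-- Illustrative sketch; argument names are assumptions.
-- Register the object store prefix that your Delta Lake
-- pipelines (Spark, Trino, etc.) write to.
SELECT pgfs.create_storage_location(
    'shared_lake',                        -- name referenced by PGAA tables
    's3://example-warehouse/delta',       -- hypothetical bucket/prefix
    credentials => '{"access_key_id": "<key>", "secret_access_key": "<secret>"}'
);
```

Once the location exists, any number of PGAA reader tables can reference it by name, so credentials are managed in one place rather than per table.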
Supporting unified SQL-based analytics
What: Enable Postgres SQL queries over both operational data and Delta Lake data.
Why: Empower application developers, data scientists, and BI users to query data lake content without complex tooling.
How: Use PGAA reader tables in Lakehouse clusters; optionally join with Postgres data.
Where: Delta Lake tables in object storage + Postgres tables in Lakehouse cluster or via FDW/dblink.
How-To: Read Iceberg/Delta with or without a catalog
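Joining lake data with operational data is then ordinary SQL. A minimal sketch, assuming a local `customers` table and a PGAA reader table `sales_deltalake` (both names hypothetical):

```sql
-- Illustrative join; table and column names are assumptions.
SELECT c.region,
       sum(s.amount) AS total_sales
FROM customers c                                   -- operational Postgres table
JOIN sales_deltalake s ON s.customer_id = c.id     -- PGAA table over Delta Lake
GROUP BY c.region
ORDER BY total_sales DESC;
```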
Getting started with Delta Lake in Hybrid Manager
To begin using Delta Lake with Hybrid Manager:
- Provision a Lakehouse cluster.
- Configure PGFS pointing to your Delta Lake object storage.
- Enable the `pgaa` extension on the Lakehouse cluster.
- Create PGAA reader tables for Delta Lake paths.
- Query Delta Lake tables using standard Postgres clients.
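The steps above can be sketched as a single SQL sequence. All names are hypothetical and the option keys are assumptions; the how-to guides linked above give the exact syntax for your HM version.

```sql
-- Illustrative end-to-end sequence; names are assumptions.
-- 1. Enable the extension on the Lakehouse cluster.
CREATE EXTENSION IF NOT EXISTS pgaa;

-- 2. Point PGFS at the Delta Lake object storage.
SELECT pgfs.create_storage_location('my_lake', 's3://my-bucket/delta');

-- 3. Create a reader table for a Delta Lake path.
CREATE TABLE events_delta ()
USING PGAA
WITH (pgaa.format = 'delta',
      pgaa.storage_location = 'my_lake',
      pgaa.path = 'events');

-- 4. Query from any standard Postgres client.
SELECT * FROM events_delta LIMIT 10;
```

Remember that PGAA access to Delta tables is read-only: writes must continue to go through the engine that owns the table, such as Spark.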