Analytics Concepts in Hybrid Manager v1.3
Hybrid Manager enables you to build modern data lakehouse and analytics solutions for EDB Postgres.
Hub quick links: Analytics Hub — Concepts — How-Tos
This page introduces key concepts as they are applied in Hybrid Manager — with links to detailed explanations in the Analytics Hub for background.
Architecture overview
Hybrid Manager integrates the following layers:
- Lakehouse Clusters for scalable analytical compute
- Object Storage for cost-efficient analytical data storage
- PGD Clusters for transactional workloads and automated Tiered Tables offload
- Catalog Services (HM-managed or external) for metadata and interoperability
- PGAA and PGFS to bridge Postgres with the data lake
This architecture enables both Postgres-first and multi-engine data lakehouse designs.
For a general introduction, see What is a Postgres Lakehouse.
Lakehouse Clusters in Hybrid Manager
Lakehouse Clusters provide scalable analytical compute in Hybrid Manager:
- Provisioned and managed through HM
- Equipped with PGAA extensions for vectorized query execution
- Designed to query data in object storage (Iceberg, Delta Lake)
- Interoperable with external tools (Spark, Trino)
Lakehouse Clusters are central to analytics in Hybrid Manager.
Learn more: EDB Postgres Lakehouse: An Overview
Apache Iceberg in Hybrid Manager
Hybrid Manager integrates Iceberg as the primary table format for offloaded and analytical data:
- PGD clusters offload data to Iceberg format for Tiered Tables
- Lakehouse Clusters query Iceberg tables efficiently
- Catalog services (Lakekeeper or external) manage Iceberg table metadata
Iceberg provides an open and reliable foundation for lakehouse architectures in HM.
Conceptual overview: Apache Iceberg with EDB Solutions
Delta Lake in Hybrid Manager
Delta Lake provides additional flexibility:
- Lakehouse Clusters can query existing Delta Lake tables in object storage
- Common in environments with Spark/Databricks pipelines
- Delta Lake support in HM is currently read-only
Hybrid Manager enables Postgres-based analytics on Delta Lake data without ETL.
Conceptual overview: Understanding Delta Lake with EDB Solutions
Tiered Tables in Hybrid Manager
Tiered Tables provide automated lifecycle management of large time-series or historical datasets:
- PGD clusters use BDR AutoPartition to manage partitions
- Older partitions are offloaded automatically to object storage in Iceberg format
- Lakehouse Clusters and PGD parent tables can query both hot and cold data
Tiered Tables are a key optimization pattern for combining transactional and analytical workloads in Hybrid Manager.
Conceptual overview: Understanding Tiered Tables with EDB Postgres
PGAA and PGFS in Hybrid Manager
PGAA (Analytics Accelerator)
PGAA extensions provide:
- Vectorized query execution
- Integration with object storage via PGFS
- Support for querying Iceberg and Delta Lake tables
- Catalog integration
PGAA powers both Lakehouse Clusters and PGD offload pipelines.
PGFS (Postgres File System)
PGFS defines object storage locations:
- Used by PGAA to read/write data in object storage
- Supports S3-compatible storage, GCS, and others
- Configured in both PGD clusters and Lakehouse Clusters
PGFS is a simple but critical building block for data lake integration.
How these concepts fit together in Hybrid Manager
Layer | Role |
---|---|
PGD Clusters | Transactional layer, source for Tiered Tables |
Lakehouse Clusters | Analytical compute, queries across Iceberg/Delta and Postgres |
Object Storage | Cost-efficient analytical data storage |
Catalog Services | Centralized metadata for Iceberg tables |
PGAA + PGFS | Bridge between Postgres and object storage |