Analytics Concepts in Hybrid Manager v1.3.5

Hybrid Manager (HM) enables you to build modern data lakehouse and analytics solutions for EDB Postgres.

This page introduces key analytics concepts as they apply in Hybrid Manager, with links to detailed background explanations in the Analytics Hub.

Architecture overview

Hybrid Manager integrates the following layers:

Lakehouse Clusters for scalable analytical compute
Object Storage for cost-efficient analytical data storage
PGD Clusters for transactional workloads and automated Tiered Tables offload
Catalog Services (HM-managed or external) for metadata and interoperability
PGAA and PGFS to bridge Postgres with the data lake

This architecture enables both Postgres-first and multi-engine data lakehouse designs.

Lakehouse Clusters in Hybrid Manager

Lakehouse Clusters provide scalable analytical compute in Hybrid Manager:

Provisioned and managed through HM
Equipped with PGAA extensions for vectorized query execution
Designed to query data in object storage (Iceberg, Delta Lake)
Interoperable with external tools (Spark, Trino)

Lakehouse Clusters are central to analytics in Hybrid Manager.

Apache Iceberg® in Hybrid Manager

Hybrid Manager integrates Iceberg as the primary table format for offloaded and analytical data:

PGD clusters offload data to Iceberg format for Tiered Tables
Lakehouse Clusters query Iceberg tables efficiently
Catalog services (Lakekeeper or external) manage Iceberg table metadata

Iceberg provides an open and reliable foundation for lakehouse architectures in HM.

Delta Lake in Hybrid Manager

Delta Lake provides additional flexibility:

Lakehouse Clusters can query existing Delta Lake tables in object storage
Delta Lake tables are common in environments with Spark or Databricks pipelines
Delta Lake support in HM is currently read-only

Hybrid Manager enables Postgres-based analytics on Delta Lake data without ETL.

Tiered Tables in Hybrid Manager

Tiered Tables provide automated lifecycle management of large time-series or historical datasets:

PGD clusters use BDR AutoPartition to manage partitions
Older partitions are offloaded automatically to object storage in Iceberg format
Queries run on Lakehouse Clusters or against PGD parent tables can access both hot and cold data

Tiered Tables are a key optimization pattern for combining transactional and analytical workloads in Hybrid Manager.
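
The hot/cold split described above can be pictured with a small Python model. This is only an illustrative sketch, not how Hybrid Manager or BDR AutoPartition is implemented: the Partition class, the split_hot_cold function, the partition names, and the 60-day threshold are all hypothetical and chosen for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Partition:
    """Hypothetical stand-in for one time-range partition of a table."""
    name: str
    upper_bound: date  # exclusive upper bound of the partition's time range


def split_hot_cold(partitions, today, offload_after):
    """Return (hot, cold) partition lists.

    A partition counts as cold once its entire time range is older than
    the cutoff (today minus offload_after). In this model, cold partitions
    are the ones that would live in object storage.
    """
    cutoff = today - offload_after
    hot = [p for p in partitions if p.upper_bound > cutoff]
    cold = [p for p in partitions if p.upper_bound <= cutoff]
    return hot, cold


# Hypothetical monthly partitions of an "events" table.
partitions = [
    Partition("events_2024_11", date(2024, 12, 1)),
    Partition("events_2024_12", date(2025, 1, 1)),
    Partition("events_2025_01", date(2025, 2, 1)),
    Partition("events_2025_02", date(2025, 3, 1)),
]

hot, cold = split_hot_cold(
    partitions,
    today=date(2025, 2, 15),
    offload_after=timedelta(days=60),  # assumed retention threshold
)

print("hot:", [p.name for p in hot])
print("cold:", [p.name for p in cold])
```

In this model, a query against the parent table would read from both lists, which mirrors the idea that hot and cold data stay reachable through a single table.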

PGAA and PGFS in Hybrid Manager

PGAA (Analytics Accelerator)

PGAA extensions provide:

Vectorized query execution
Integration with object storage via PGFS
Support for querying Iceberg and Delta Lake tables
Catalog integration

PGAA powers both Lakehouse Clusters and PGD offload pipelines.

PGFS (Postgres File System)

PGFS defines object storage locations:

Used by PGAA to read/write data in object storage
Supports S3-compatible storage, GCS, and others
Configured in both PGD clusters and Lakehouse Clusters

PGFS is a simple but critical building block for data lake integration.
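
To make the idea of a storage location concrete, the sketch below models one as a named pointer to a bucket and prefix. It is a conceptual illustration only and does not use or reproduce the PGFS API: the StorageLocation class, the parse_location function, and the example name and URL are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class StorageLocation:
    """Hypothetical model of a named object storage location."""
    name: str    # label used to refer to the location
    scheme: str  # storage type taken from the URL, for example "s3" or "gs"
    bucket: str  # bucket or container name
    prefix: str  # path inside the bucket, without a leading slash


def parse_location(name, url):
    """Build a StorageLocation from a URL such as s3://bucket/path."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not a valid object storage URL: {url!r}")
    return StorageLocation(
        name=name,
        scheme=parts.scheme,
        bucket=parts.netloc,
        prefix=parts.path.lstrip("/"),
    )


# Example name and URL are made up for illustration.
loc = parse_location("sales_lake", "s3://example-bucket/warehouse/sales")
print(loc)
```

The point of the model is that a location is defined once and then referenced by name, which is why the same definition can be configured on both PGD clusters and Lakehouse Clusters.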

How these concepts fit together in Hybrid Manager

Layer	Role
PGD Clusters	Transactional layer, source for Tiered Tables
Lakehouse Clusters	Analytical compute, queries across Iceberg/Delta and Postgres
Object Storage	Cost-efficient analytical data storage
Catalog Services	Centralized metadata for Iceberg tables
PGAA + PGFS	Bridge between Postgres and object storage
