How to Integrate with Third-Party Iceberg Catalogs v1.3

Overview

This guide provides instructions for configuring PostgreSQL Advanced Analytics (PGAA) to work with third-party Apache Iceberg catalogs, enabling seamless data lake integration for analytical workloads. PGAA supports reading from and writing to Iceberg tables through external catalog services, allowing you to leverage existing data lake infrastructure alongside your PostgreSQL deployments.

Context and Goals

What This Guide Covers

This documentation demonstrates how to:

  • Configure PGAA to connect with third-party Iceberg catalogs
  • Enable bidirectional data flow between PostgreSQL and Iceberg tables
  • Set up catalog integration for PGD (Postgres Distributed) offloads
  • Ensure compatibility with external analytical tools

Supported Catalog Types

PGAA currently supports integration with:

  • Snowflake Open Data Catalog (Apache Polaris): Verified compatibility with manual OAuth token management
  • AWS S3 Tables: Native integration using table buckets

Prerequisites

Before beginning this integration, ensure you have:

Required Components

  • Active PGAA installation with appropriate licensing
  • PostgreSQL cluster with BDR (Bi-Directional Replication) configured
  • Administrative access to PGAA catalog management functions

For Snowflake Polaris Integration

  • Snowflake trial or production account
  • AWS S3 bucket configured in your organization's AWS account
  • IT approval for Snowflake to access your S3 bucket
  • OAuth client credentials from Snowflake

For AWS S3 Tables Integration

  • AWS account with S3 Tables enabled
  • Appropriate IAM permissions for S3 Tables management
  • S3 bucket configured as a table bucket

Practical Applications

This integration enables several key use cases:

  1. Analytical Offloading: Move historical data from operational PostgreSQL databases to cost-effective object storage
  2. Data Lake Integration: Query Iceberg tables directly from PostgreSQL without data movement
  3. Hybrid Analytics: Combine real-time PostgreSQL data with historical Iceberg data in unified queries (see the sketch after this list)
  4. Cross-Platform Analytics: Share data between PostgreSQL and other analytical engines through a common Iceberg catalog
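
For example, a hybrid query can combine a live PostgreSQL table with an Iceberg table exposed through a configured catalog in a single statement. The sketch below is illustrative only: the catalog name snowflake_polaris matches the configuration shown later in this guide, while the analytics schema and the orders/orders_archive tables are hypothetical.

-- Combine current operational rows with archived rows stored in Iceberg
SELECT order_id, order_date, amount
FROM orders                                       -- local PostgreSQL table (hypothetical)
UNION ALL
SELECT order_id, order_date, amount
FROM snowflake_polaris.analytics.orders_archive   -- Iceberg table via the catalog (hypothetical)
WHERE order_date < DATE '2024-01-01';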

Configuration Instructions

Setting Up Snowflake Polaris Integration

Step 1: Obtain OAuth Token

Due to current PGAA limitations, the OAuth client credentials flow must be performed manually outside of PGAA, and the resulting tokens have a 1-hour lifetime. Use the following Python script to retrieve your token:

from pyiceberg.catalog.rest import RestCatalog

# Replace with your Snowflake OAuth client credentials
CLIENT_ID = "your_client_id"
CLIENT_SECRET = "your_client_secret"
CREDENTIAL = f"{CLIENT_ID}:{CLIENT_SECRET}"

# Replace with your Snowflake Polaris endpoint
URI = "https://your-account.snowflakecomputing.com/polaris/api/catalog"

# Connecting performs the OAuth client credentials exchange and stores the
# resulting access token in the catalog properties
catalog = RestCatalog(
    name="PGAA Test",
    warehouse="pgaa_test",
    uri=URI,
    credential=CREDENTIAL,
    scope="PRINCIPAL_ROLE:read_write_all"
)

# This will print your temporary (1-hour) token
print(catalog.properties["token"])

Important: Save the token output - you'll need it for the next step.

Step 2: Configure PGAA Catalog

Execute the following SQL commands to add and configure the Snowflake Polaris catalog in PGAA:

-- Remove existing catalog if present
SELECT bdr.replicate_ddl_command($$
  SELECT pgaa.delete_catalog('snowflake_polaris');
$$);

-- Add new catalog configuration
SELECT bdr.replicate_ddl_command($$
  SELECT pgaa.add_catalog(
    'snowflake_polaris',
    'iceberg-rest',
    '{"url": "https://your-account.snowflakecomputing.com/polaris/api/catalog",
      "token": "YOUR_OAUTH_TOKEN_HERE",
      "warehouse": "pgaa_test"}'
  );
$$);

-- Import catalog metadata
SELECT bdr.replicate_ddl_command($$
  SELECT pgaa.import_catalog('snowflake_polaris');
$$);

Step 3: Configure Write Permissions

Enable write operations to the catalog by configuring your BDR node group:

SELECT bdr.alter_node_group_option(
  (SELECT node_group_name FROM bdr.node_group
   WHERE node_group_name != 'world' LIMIT 1),
  'analytics_write_catalog',
  'snowflake_polaris'
);
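
As a quick sanity check, you can confirm which node group the option was applied to; the query below only reuses the bdr.node_group catalog already referenced in the command above.

-- List node groups other than 'world' to confirm the target group
SELECT node_group_name FROM bdr.node_group WHERE node_group_name != 'world';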

Setting Up AWS S3 Tables Integration

For S3 Tables integration, follow the AWS S3 Tables Getting Started Guide to create your table bucket, then use iceberg-s3tables as the catalog type when configuring PGAA:

SELECT bdr.replicate_ddl_command($$
  SELECT pgaa.add_catalog(
    's3_tables_catalog',
    'iceberg-s3tables',
    '{"bucket": "your-table-bucket",
      "region": "us-east-1",
      "access_key": "YOUR_ACCESS_KEY",
      "secret_key": "YOUR_SECRET_KEY"}'
  );
$$);
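
As with the Polaris catalog, you will typically import the catalog metadata after adding it so that existing Iceberg tables become visible. The sketch below simply reuses pgaa.import_catalog from Step 2 with the catalog name defined above.

SELECT bdr.replicate_ddl_command($$
  SELECT pgaa.import_catalog('s3_tables_catalog');
$$);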

Verification and Testing

After configuration, verify your integration by testing these operations:

Reading from Catalog

-- List available tables
SELECT * FROM pgaa.catalog_tables('your_catalog_name');

-- Query an Iceberg table
SELECT * FROM your_catalog_name.schema_name.table_name LIMIT 10;

Writing to Catalog

-- Create a new Iceberg table
CREATE TABLE your_catalog_name.schema_name.new_table AS
SELECT * FROM your_postgresql_table;

PGD Offloads

Verify that PGD can successfully offload data to the configured catalog according to your retention policies.
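
One simple check is to compare row counts between an offloaded table and its counterpart in the catalog. The names below are purely illustrative; substitute the catalog, schema, and tables covered by your retention policies.

-- Rows still held in the operational PostgreSQL table (hypothetical name)
SELECT count(*) AS local_rows FROM orders;

-- Rows visible in the offloaded Iceberg table through the catalog (hypothetical names)
SELECT count(*) AS offloaded_rows FROM snowflake_polaris.analytics.orders;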

External Tool Compatibility

Test that third-party analytical tools can read both tables and views created through PGAA.

Known Limitations

  • Token Management: Snowflake Polaris tokens expire after 1 hour and must be manually refreshed
  • Naming Restrictions: Catalogs with spaces in their names may cause query errors
  • OAuth Flow: The OAuth client credentials flow must be performed manually outside of PGAA

Troubleshooting

Common Issues

Token Expiration Error

  • Symptom: Queries fail with authentication errors after approximately 1 hour
  • Solution: Regenerate the OAuth token and update the catalog configuration
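
One way to update the configuration is to re-register the catalog with a freshly generated token. The sketch below reuses the pgaa.delete_catalog and pgaa.add_catalog calls from Step 2; rerun the Python script from Step 1 first to obtain the new token.

-- Re-register the catalog with the regenerated token
SELECT bdr.replicate_ddl_command($$
  SELECT pgaa.delete_catalog('snowflake_polaris');
$$);

SELECT bdr.replicate_ddl_command($$
  SELECT pgaa.add_catalog(
    'snowflake_polaris',
    'iceberg-rest',
    '{"url": "https://your-account.snowflakecomputing.com/polaris/api/catalog",
      "token": "YOUR_NEW_OAUTH_TOKEN",
      "warehouse": "pgaa_test"}'
  );
$$);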

Catalog Import Failures

  • Symptom: pgaa.import_catalog() returns errors
  • Solution: Verify network connectivity and ensure the catalog URL is accessible from all PostgreSQL nodes

Query Performance Issues

  • Symptom: Slow query performance when accessing Iceberg tables
  • Solution: Ensure proper partition pruning and consider creating materialized views for frequently accessed data
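
If a hot subset of the Iceberg data is queried repeatedly, a local materialized view can avoid re-reading object storage on every query. This is a minimal sketch in standard PostgreSQL; the catalog, schema, table, and column names are hypothetical.

-- Cache a frequently accessed slice of an Iceberg table locally
CREATE MATERIALIZED VIEW recent_orders AS
SELECT order_id, order_date, amount
FROM snowflake_polaris.analytics.orders
WHERE order_date >= DATE '2024-01-01';

-- Refresh periodically to pick up newly offloaded data
REFRESH MATERIALIZED VIEW recent_orders;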

Next Steps

After successfully configuring catalog integration:

  1. Establish token refresh procedures for production environments
  2. Configure monitoring for catalog connectivity
  3. Implement data lifecycle policies for automated offloading
  4. Optimize query patterns for mixed PostgreSQL-Iceberg workloads