How To Get Set Up for Lakehouse analytics v1.3

Getting set up for Lakehouse analytics

This document explains the prerequisites and setup steps needed before running read-only analytics on Delta Lake and Iceberg tables.

It covers the required image versions, environment variables, CA certificates, catalog configuration, and how to identify your object storage bucket.


Goals

  • Verify required operand and extension versions
  • Establish environment variables and credentials
  • Retrieve appliance CA certificates
  • Identify the appliance-provisioned object storage bucket

Requirements

You need an operand image with:

  • PGD 6.0 Expanded snapshot version
  • PGAA 1.2+

Check your extension version:

SELECT name, installed_version, default_version
FROM pg_available_extensions
WHERE name = 'pgaa';

The default_version must be 1.2+.
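
Illustrative output (values will vary; a default_version of 1.2 or later satisfies the requirement):

 name | installed_version | default_version
------+-------------------+-----------------
 pgaa | 1.2               | 1.2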

Note: If your environment requires snapshot repositories or pre-built snapshot operand images, consult your internal release notes and deployment runbooks.


Assumptions and variables

This guide assumes a PGD6 cluster has already been provisioned in Hybrid Manager. Provisioning is out of scope.

You should use a -pgdx image version with EPAS or PGE. Vanilla PostgreSQL is not supported.
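
To confirm the server flavor, a quick check (a sketch assuming psql connection parameters such as PGHOST and PGUSER are already set in your environment); the output should typically name EPAS or PGE rather than plain PostgreSQL:

psql -c "SELECT version();"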

We will define the following variables for later use:

  • PROJECT_ID: Hybrid Manager Project ID. Example: prj_RAi8WE2xivmeK4qm
  • APPL_URL: Hybrid Manager base URL. Example: https://portal.eks-04250637-main.edbaiplatform.enterprisedb.network
  • API_PROV: Hybrid Manager token with “editor” project permissions. Example: baak_abcdef…
  • API_WRITE: Hybrid Manager token with “write_catalog” project permissions. Example: baak_ghijkl…
  • PARAMS_URI: Managed catalog API URL. Example: https://portal.eks-04250637-main.../catalog/
  • PARAMS_WAREHOUSE: Managed catalog warehouse name (PyIceberg). Example: prj_RAi8WE2xivmeK4qm-catalog
  • PARAMS_WAREHOUSE_ID: Managed catalog warehouse UUID (PGAA). Example: cd8c29f6-20fa-11f0-97cd-cfc1662c48a8
  • PARAMS_CONFIG: Managed catalog config URL (PySpark). Example: https://portal.eks.../config?warehouse=prj_RAi8WE2xivmeK4qm-catalog
  • BUCKET_URL: URL of the bucket used by Hybrid Manager for offloads. Example: s3://eks-04220339-main-edb-postgres
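
A minimal shell sketch for exporting these variables (all values are placeholders; substitute the real ones from your project):

export PROJECT_ID="<project-id>"                    # e.g. prj_RAi8WE2xivmeK4qm
export APPL_URL="<hybrid-manager-base-url>"
export API_PROV="<token-with-editor-permissions>"
export API_WRITE="<token-with-write_catalog-permissions>"
export PARAMS_URI="<managed-catalog-api-url>"
export PARAMS_WAREHOUSE="<warehouse-name>"
export PARAMS_WAREHOUSE_ID="<warehouse-uuid>"
export PARAMS_CONFIG="<catalog-config-url>"
export BUCKET_URL="<bucket-url>"                    # e.g. s3://eks-04220339-main-edb-postgres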

Retrieve the appliance CA certificate

This certificate is required for the sample data loading script.

# Using kube_context
kubectl --context "$kube_context" get secret -n cert-manager global-ca-secret -o yaml | yq '.data."ca.crt"' | base64 -d > ca.crt

# On RHOS, or when not using kube_context
kubectl get secret -n cert-manager global-ca-secret -o yaml | yq '.data."ca.crt"' | base64 -d > ca.crt

# On RHOS on GCP, where cert-manager may not exist
kubectl get secret -n edbpgai-bootstrap beaconator-ca-bundle -o json | jq -r '.data."public.crt"' | base64 --decode > ca.crt

Explanation: The CA cert is needed to establish trust when connecting to the Hybrid Manager appliance APIs and services.
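
Optionally, confirm the extracted file is a valid certificate (assumes openssl is available on your workstation):

openssl x509 -in ca.crt -noout -subject -issuer -enddate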


Create a Lakekeeper catalog

Use your API_PROV token:

  1. In the Hybrid Manager UI, go to Catalogs → Add Project Catalog.
  2. Copy the JSON with the catalog settings, for example:
{
  "uri": "https://portal-uat-rke.rke.hcp.enterprisedb.network/api/iceberg/prj_efpRstCQll6m2lGm",
  "warehouse": "prj_efpRstCQll6m2lGm-catalog-iXFkMR",
  "warehouseId": "d1d1d4ce-850a-11f0-88fb-af8509011340",
  "config": "https://portal-uat-rke.rke.hcp.enterprisedb.network/api/iceberg/prj_efpRstCQll6m2lGm/v1/config?warehouse=prj_efpRstCQll6m2lGm-catalog-iXFkMR"
}

Explanation: These parameters are used later when linking PGAA to managed catalogs or external tools like PyIceberg and PySpark.

On RHOS, use the internal Kubernetes service URL instead of the public URL for in-cluster SQL statements like pgaa.add_catalog.
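
As a quick sanity check on the copied values, a hedged sketch (it assumes the catalog API accepts the token as a bearer header, and reuses the ca.crt retrieved earlier; adjust to your appliance's auth scheme):

curl --cacert ca.crt \
  -H "Authorization: Bearer $API_PROV" \
  "$PARAMS_CONFIG"

If the endpoint is reachable and the token is accepted, the response is the catalog's JSON configuration.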


Identify your bucket URL

The appliance provisions its own object storage bucket. Construct the URL based on your cloud service provider (CSP):

  • AWS: Use the eks-XXXX part of the portal URL and append -main-edb-postgres. Example:
    https://portal.eks-03242048-v11x.edbaiplatform.enterprisedb.network
    → s3://eks-03242048-main-edb-postgres
  • GCP: Use the gke-XXXX part of the portal URL and append -main-edb-object-storage. Example:
    https://portal.gke-05151425-v12x.edbaiplatform2.enterprisedb.network
    → gs://gke-05151425-main-edb-object-storage
  • RHOS: Inspect a Barman backup or the edb-object-storage secret:
    kubectl get secret edb-object-storage -o yaml | yq '.data.bucket_name' | base64 -d

Explanation: You will need the BUCKET_URL value to reference storage in PGAA commands and sample data scripts.
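
If you hold credentials for the appliance bucket (an assumption; unlike the demo bucket below, it is not public), a quick listing confirms the constructed URL:

aws s3 ls "$BUCKET_URL" | head       # AWS
gsutil ls "$BUCKET_URL" | head       # GCP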


Sanity checks

Check public bucket access (optional)

aws s3 ls s3://beacon-analytics-demo-data-us-east-1-dev/iceberg-example/default.db --recursive --no-sign-request

You should see files like:

2025-03-18 08:44:38        902 iceberg-example/default.db/iceberg_table/data/00000-0-...
2025-03-18 08:44:38       4340 iceberg-example/default.db/iceberg_table/metadata/4ef9...
2025-03-18 08:44:38       1823 iceberg-example/default.db/iceberg_table/metadata/snap...

Explanation: This confirms that the sample Iceberg dataset exists and is accessible for testing.


Next steps

With the environment prepared, continue to the follow-on how-tos for catalogs, readers, and offloads.

Takeaways

  • A prepared environment accelerates the subsequent how-tos (catalogs, readers, offloads)
  • CA bundles and bucket URLs are foundational for data access and tooling