How To Get Set Up for Lakehouse analytics v1.3

Getting set up for Lakehouse analytics

This document explains the prerequisites and setup steps needed before running read-only analytics on Delta Lake and Iceberg tables.

It covers the required image versions, environment variables, CA certificates, catalog configuration, and how to identify your object storage bucket.


Goals

  • Verify required operand and extension versions
  • Establish environment variables and credentials
  • Retrieve appliance CA certificates
  • Identify the appliance-provisioned object storage bucket

Requirements

You need an operand image with:

  • PGD 6.0 Expanded snapshot version
  • PGAA 1.2+

Check your extension version:

SELECT name, installed_version, default_version
FROM pg_available_extensions
WHERE name = 'pgaa';

The default_version must be 1.2+.
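
Illustrative output (values will vary; a default_version of 1.2 or later satisfies the requirement):

 name | installed_version | default_version
------+-------------------+-----------------
 pgaa | 1.2               | 1.2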

Note: If your environment requires snapshot repositories or pre-built snapshot operand images, consult your internal release notes and deployment runbooks.


Assumptions and variables

This guide assumes a PGD6 cluster has already been provisioned in Hybrid Manager. Provisioning is out of scope.

You should use a -pgdx image version with EPAS or PGE. Vanilla PostgreSQL is not supported.
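
To confirm the server flavor, a quick check (a sketch assuming psql connection parameters such as PGHOST and PGUSER are already set in your environment); the output should typically name EPAS or PGE rather than plain PostgreSQL:

psql -c "SELECT version();"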

We will define the following variables for later use:

  • PROJECT_ID: Hybrid Manager Project ID. Example: prj_RAi8WE2xivmeK4qm
  • APPL_URL: Hybrid Manager base URL. Example: https://portal.eks-04250637-main.edbaiplatform.enterprisedb.network
  • API_PROV: Hybrid Manager token with “editor” project permissions. Example: baak_abcdef…
  • API_WRITE: Hybrid Manager token with “write_catalog” project permissions. Example: baak_ghijkl…
  • PARAMS_URI: Managed catalog API URL. Example: https://portal.eks-04250637-main.../catalog/
  • PARAMS_WAREHOUSE: Managed catalog warehouse name (PyIceberg). Example: prj_RAi8WE2xivmeK4qm-catalog
  • PARAMS_WAREHOUSE_ID: Managed catalog warehouse UUID (PGAA). Example: cd8c29f6-20fa-11f0-97cd-cfc1662c48a8
  • PARAMS_CONFIG: Managed catalog config URL (PySpark). Example: https://portal.eks.../config?warehouse=prj_RAi8WE2xivmeK4qm-catalog
  • BUCKET_URL: URL of the bucket used by Hybrid Manager for offloads. Example: s3://eks-04220339-main-edb-postgres
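
A minimal shell sketch for exporting these variables (all values are placeholders; substitute the real ones from your project):

export PROJECT_ID="<project-id>"                    # e.g. prj_RAi8WE2xivmeK4qm
export APPL_URL="<hybrid-manager-base-url>"
export API_PROV="<token-with-editor-permissions>"
export API_WRITE="<token-with-write_catalog-permissions>"
export PARAMS_URI="<managed-catalog-api-url>"
export PARAMS_WAREHOUSE="<warehouse-name>"
export PARAMS_WAREHOUSE_ID="<warehouse-uuid>"
export PARAMS_CONFIG="<catalog-config-url>"
export BUCKET_URL="<bucket-url>"                    # e.g. s3://eks-04220339-main-edb-postgres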

Retrieve the appliance CA certificate

This certificate is required for the sample data loading script.

# Using kube_context
kubectl --context "$kube_context" get secret -n cert-manager global-ca-secret -o yaml | yq '.data."ca.crt"' | base64 -d > ca.crt

# On RHOS, or when not using kube_context
kubectl get secret -n cert-manager global-ca-secret -o yaml | yq '.data."ca.crt"' | base64 -d > ca.crt

# On RHOS on GCP, where cert-manager may not exist
kubectl get secret -n edbpgai-bootstrap beaconator-ca-bundle -o json | jq -r '.data."public.crt"' | base64 --decode > ca.crt

Explanation: The CA cert is needed to establish trust when connecting to the Hybrid Manager appliance APIs and services.
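
Optionally, confirm the extracted file is a valid certificate (assumes openssl is available on your workstation):

openssl x509 -in ca.crt -noout -subject -issuer -enddate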


Create a Lakekeeper catalog

Use your API_PROV token:

  1. In the Hybrid Manager UI, go to Catalogs → Add Project Catalog.
  2. Copy the JSON with the catalog settings, for example:
{
  "uri": "https://portal-uat-rke.rke.hcp.enterprisedb.network/api/iceberg/prj_efpRstCQll6m2lGm",
  "warehouse": "prj_efpRstCQll6m2lGm-catalog-iXFkMR",
  "warehouseId": "d1d1d4ce-850a-11f0-88fb-af8509011340",
  "config": "https://portal-uat-rke.rke.hcp.enterprisedb.network/api/iceberg/prj_efpRstCQll6m2lGm/v1/config?warehouse=prj_efpRstCQll6m2lGm-catalog-iXFkMR"
}

Explanation: These parameters are used later when linking PGAA to managed catalogs or external tools like PyIceberg and PySpark.

On RHOS, use the internal Kubernetes service URL instead of the public URL for in-cluster SQL statements like pgaa.add_catalog.
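
As a quick sanity check on the copied values, a hedged sketch (it assumes the catalog API accepts the token as a bearer header, and reuses the ca.crt retrieved earlier; adjust to your appliance's auth scheme):

curl --cacert ca.crt \
  -H "Authorization: Bearer $API_PROV" \
  "$PARAMS_CONFIG"

If the endpoint is reachable and the token is accepted, the response is the catalog's JSON configuration.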


Identify your bucket URL

The appliance provisions its own object storage bucket. Construct the URL based on your cloud service provider (CSP):

  • AWS: Use the eks-XXXX part of the portal URL and append -main-edb-postgres. Example:
    https://portal.eks-03242048-v11x.edbaiplatform.enterprisedb.network
    → s3://eks-03242048-main-edb-postgres
  • GCP: Use the gke-XXXX part of the portal URL and append -main-edb-object-storage. Example:
    https://portal.gke-05151425-v12x.edbaiplatform2.enterprisedb.network
    → gs://gke-05151425-main-edb-object-storage
  • RHOS: Inspect a Barman backup or the edb-object-storage secret:
    kubectl get secret edb-object-storage -o yaml | yq '.data.bucket_name' | base64 -d

Explanation: You will need the BUCKET_URL value to reference storage in PGAA commands and sample data scripts.
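
If you hold credentials for the appliance bucket (an assumption; unlike the demo bucket below, it is not public), a quick listing confirms the constructed URL:

aws s3 ls "$BUCKET_URL" | head       # AWS
gsutil ls "$BUCKET_URL" | head       # GCP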


Sanity checks

Check public bucket access (optional)

aws s3 ls s3://beacon-analytics-demo-data-us-east-1-dev/iceberg-example/default.db --recursive --no-sign-request

You should see files like:

2025-03-18 08:44:38        902 iceberg-example/default.db/iceberg_table/data/00000-0-...
2025-03-18 08:44:38       4340 iceberg-example/default.db/iceberg_table/metadata/4ef9...
2025-03-18 08:44:38       1823 iceberg-example/default.db/iceberg_table/metadata/snap...

Explanation: This confirms that the sample Iceberg dataset exists and is accessible for testing.


Next steps

With the environment prepared, continue to the follow-on how-tos for catalogs, readers, and offloads.

Takeaways

  • A prepared environment accelerates the subsequent how-tos (catalogs, readers, offloads)
  • CA bundles and bucket URLs are foundational for data access and tooling