How to get set up for Lakehouse analytics v1.3
Getting set up for Lakehouse analytics
This document explains the prerequisites and setup steps needed before running read-only analytics on Delta Lake and Iceberg tables.
It covers the required image versions, environment variables, CA certificates, catalog configuration, and how to identify your object storage bucket.
Goals
- Verify required operand and extension versions
- Establish environment variables and credentials
- Retrieve appliance CA certificates
- Identify the appliance-provisioned object storage bucket
Requirements
You need an operand image with:
- PGD 6.0 Expanded snapshot version
- PGAA 1.2+
Check your extension version:

```sql
SELECT name, installed_version, default_version
FROM pg_available_extensions
WHERE name = 'pgaa';
```

The `default_version` must be 1.2+.
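If you want to script this check, a minimal shell sketch (assuming `psql` connectivity; `$PGDSN` is a hypothetical connection string for your cluster):

```shell
# Fail if the pgaa default_version is below 1.2
psql "$PGDSN" -At -c \
  "SELECT default_version FROM pg_available_extensions WHERE name = 'pgaa';" |
  awk -F. '{ exit !($1 > 1 || ($1 == 1 && $2 >= 2)) }' &&
  echo "pgaa version OK" || echo "pgaa must be 1.2+"
```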
Note: If your environment requires snapshot repositories or pre-built snapshot operand images, consult your internal release notes and deployment runbooks.
Assumptions and variables
This guide assumes a PGD6 cluster has already been provisioned in Hybrid Manager. Provisioning is out of scope.
You should use a `-pgdx` image version with EPAS or PGE. Vanilla PostgreSQL is not supported.
We will define the following variables for later use:
| Variable | Meaning | Example |
|---|---|---|
| `PROJECT_ID` | Hybrid Manager Project ID | `prj_RAi8WE2xivmeK4qm` |
| `APPL_URL` | Hybrid Manager base URL | `https://portal.eks-04250637-main.edbaiplatform.enterprisedb.network` |
| `API_PROV` | Hybrid Manager token with "editor" project permissions | `baak_abcdef…` |
| `API_WRITE` | Hybrid Manager token with "write_catalog" project permissions | `baak_ghijkl…` |
| `PARAMS_URI` | Managed catalog API URL | `https://portal.eks-04250637-main.../catalog/` |
| `PARAMS_WAREHOUSE` | Managed catalog warehouse name (PyIceberg) | `prj_RAi8WE2xivmeK4qm-catalog` |
| `PARAMS_WAREHOUSE_ID` | Managed catalog warehouse UUID (PGAA) | `cd8c29f6-20fa-11f0-97cd-cfc1662c48a8` |
| `PARAMS_CONFIG` | Managed catalog config URL (PySpark) | `https://portal.eks.../config?warehouse=prj_RAi8WE2xivmeK4qm-catalog` |
| `BUCKET_URL` | URL of the bucket used by Hybrid Manager for offloads | `s3://eks-04220339-main-edb-postgres` |
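To keep these values at hand, you can export them as shell variables before running the later steps; the values below are illustrative placeholders, not real credentials:

```shell
# Placeholders only -- substitute the values for your own project and appliance
export PROJECT_ID="prj_RAi8WE2xivmeK4qm"
export APPL_URL="https://portal.eks-04250637-main.edbaiplatform.enterprisedb.network"
export API_PROV="baak_..."            # token with "editor" project permissions
export API_WRITE="baak_..."           # token with "write_catalog" project permissions
export PARAMS_URI="..."               # managed catalog API URL from the UI
export PARAMS_WAREHOUSE="${PROJECT_ID}-catalog"
export PARAMS_WAREHOUSE_ID="..."      # managed catalog warehouse UUID
export PARAMS_CONFIG="..."            # managed catalog config URL from the UI
export BUCKET_URL="s3://eks-04220339-main-edb-postgres"
```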
Retrieve the appliance CA certificate
This certificate is required for the sample data loading script.
```shell
# Using kube_context
kubectl --context "$kube_context" get secret -n cert-manager global-ca-secret -o yaml | yq '.data."ca.crt"' | base64 -d > ca.crt

# On RHOS or avoiding kube_context
kubectl get secret -n cert-manager global-ca-secret -o yaml | yq '.data."ca.crt"' | base64 -d > ca.crt

# On RHOS GCP where cert-manager may not exist
kubectl get secret -n edbpgai-bootstrap beaconator-ca-bundle -o json | jq -r '.data."public.crt"' | base64 --decode > ca.crt
```
Explanation: The CA certificate is needed to establish trust when connecting to the Hybrid Manager appliance APIs and services.
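Before relying on the bundle, you can optionally inspect and test it; a quick sketch using standard `openssl` and `curl` options:

```shell
# Show issuer and validity window of the retrieved CA certificate
openssl x509 -in ca.crt -noout -subject -issuer -dates

# Fetch headers from the appliance, trusting only the retrieved bundle
curl --cacert ca.crt -I "$APPL_URL"
```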
Create a Lakekeeper catalog
Use your `API_PROV` token:
- In the Hybrid Manager UI, go to Catalogs → Add Project Catalog.
- Copy the JSON with the catalog settings, for example:
{ "uri": "https://portal-uat-rke.rke.hcp.enterprisedb.network/api/iceberg/prj_efpRstCQll6m2lGm", "warehouse": "prj_efpRstCQll6m2lGm-catalog-iXFkMR", "warehouseId": "d1d1d4ce-850a-11f0-88fb-af8509011340", "config": "https://portal-uat-rke.rke.hcp.enterprisedb.network/api/iceberg/prj_efpRstCQll6m2lGm/v1/config?warehouse=prj_efpRstCQll6m2lGm-catalog-iXFkMR" }
Explanation: These parameters are used later when linking PGAA to the managed catalog and when configuring external tools such as PyIceberg and PySpark.
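If you save that JSON to a file (say, a hypothetical `catalog.json`), one way to populate the variables from it is with `jq`:

```shell
# Extract the catalog parameters copied from the UI into shell variables
export PARAMS_URI=$(jq -r '.uri' catalog.json)
export PARAMS_WAREHOUSE=$(jq -r '.warehouse' catalog.json)
export PARAMS_WAREHOUSE_ID=$(jq -r '.warehouseId' catalog.json)
export PARAMS_CONFIG=$(jq -r '.config' catalog.json)
```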
On RHOS, use the internal Kubernetes service URL instead of the public URL for in-cluster SQL statements such as `pgaa.add_catalog`.
Identify your bucket URL
The appliance provisions its own object storage bucket. Construct the URL based on your CSP:
- AWS: Take the `eks-XXXX` identifier from the portal URL (dropping any version suffix such as `-v11x`) and append `-main-edb-postgres`. Example:

  `https://portal.eks-03242048-v11x.edbaiplatform.enterprisedb.network` → `s3://eks-03242048-main-edb-postgres`

- GCP: Take the `gke-XXXX` identifier from the portal URL and append `-main-edb-object-storage`. Example:

  `https://portal.gke-05151425-v12x.edbaiplatform2.enterprisedb.network` → `gs://gke-05151425-main-edb-object-storage`

- RHOS: Inspect a Barman backup or the `edb-object-storage` secret:

  ```shell
  kubectl get secret edb-object-storage -o yaml | yq '.data.bucket_name' | base64 -d
  ```
Explanation: You will need the `BUCKET_URL` value to reference storage in PGAA commands and sample data scripts.
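For AWS and GCP, the derivation above can be scripted; a sketch that assumes portal hostnames follow the patterns shown in the examples:

```shell
# Derive BUCKET_URL from APPL_URL for AWS (eks-...) and GCP (gke-...) appliances.
# Assumes a hostname of the form portal.<id>[-suffix].<domain>, as in the examples above.
host=$(echo "$APPL_URL" | sed -E 's#https://portal\.([^.]+)\..*#\1#')  # e.g. eks-03242048-v11x
id=$(echo "$host" | sed -E 's#^((eks|gke)-[0-9]+).*#\1#')              # e.g. eks-03242048
case "$id" in
  eks-*) export BUCKET_URL="s3://${id}-main-edb-postgres" ;;
  gke-*) export BUCKET_URL="gs://${id}-main-edb-object-storage" ;;
esac
echo "$BUCKET_URL"
```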
Sanity checks
Check public bucket access (optional)
```shell
aws s3 ls s3://beacon-analytics-demo-data-us-east-1-dev/iceberg-example/default.db --recursive --no-sign-request
```
You should see files like:
```text
2025-03-18 08:44:38        902 iceberg-example/default.db/iceberg_table/data/00000-0-...
2025-03-18 08:44:38       4340 iceberg-example/default.db/iceberg_table/metadata/4ef9...
2025-03-18 08:44:38       1823 iceberg-example/default.db/iceberg_table/metadata/snap...
```
Explanation: This confirms that the sample Iceberg dataset exists and is accessible for testing.
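To confirm that the CA bundle and token work together against the managed catalog, you can also probe the catalog config endpoint; a sketch assuming bearer-token authentication (your deployment's auth scheme may differ):

```shell
# Expect an HTTP 200 and a JSON body describing the warehouse configuration.
# Uses ca.crt retrieved earlier and the API_PROV / PARAMS_CONFIG variables.
curl --cacert ca.crt \
     -H "Authorization: Bearer $API_PROV" \
     "$PARAMS_CONFIG"
```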
Next steps
- Continue to Read with or without a catalog.
- Or, try Read/write without a catalog.
Takeaways
- A prepared environment accelerates the subsequent how-tos (catalogs, readers, offloads)
- CA bundles and bucket URLs are foundational for data access and tooling