Setting up PGAA v6.3.1
Tiered storage in PGD is powered by the Postgres Analytics Accelerator (PGAA) extension. PGAA provides the vectorized query engine and cloud-native storage handlers that allow PGD to offload data partitions to object storage and query them in place.
Before you can configure tiered tables, replicate to analytics, or offload data, you must install PGAA on every node in your cluster and define the object storage destination. Configuring the storage target at the node-group level ensures that every node writes to and reads from the same location.
Prerequisites
- Cluster: PGD version 6.1 or later with PGAA and PGFS extensions installed.
- Storage locations: Local, S3, GCP, or Azure storage using Iceberg or Delta Lake formats.
- Catalog: If using an external catalog service, only Iceberg REST catalogs are supported.
- Permissions: The database user must have `CREATE`, `ALTER`, and `EXECUTE` privileges for the PGD and PGAA functions.
Note
The credentials associated with a storage location or a catalog service must have both read and write permissions for the destination bucket. You can verify permissions by running the following functions, which return NULL if successful.
- For a storage location:

  ```sql
  SELECT pgaa.test_storage_location('my_storage_location', true);
  ```

- For a catalog service:

  ```sql
  SELECT pgaa.test_catalog('my_iceberg_catalog', test_writes := FALSE);
  ```
Downloading and installing the PGAA extension
Install the PGAA extension on every node in your PGD cluster.
Download the package from the EDB repository:
On Debian or Ubuntu:

```shell
export EDB_SUBSCRIPTION_TOKEN=<your-token>
export EDB_SUBSCRIPTION_PLAN=<your-subscription-plan>
curl -1sSLf "https://downloads.enterprisedb.com/$EDB_SUBSCRIPTION_TOKEN/$EDB_SUBSCRIPTION_PLAN/setup.deb.sh" | sudo -E bash
sudo apt-get install -y <postgresql-distribution>-pgaa
```
On RHEL and other RPM-based systems:

```shell
export EDB_SUBSCRIPTION_TOKEN=<your-token>
export EDB_SUBSCRIPTION_PLAN=<your-subscription-plan>
curl -1sSLf "https://downloads.enterprisedb.com/$EDB_SUBSCRIPTION_TOKEN/$EDB_SUBSCRIPTION_PLAN/setup.rpm.sh" | sudo -E bash
sudo dnf install -y <postgresql-distribution>-pgaa
```
Where:
- `$EDB_SUBSCRIPTION_TOKEN` is the token you received when you registered for the EDB subscription.
- `$EDB_SUBSCRIPTION_PLAN` is either `standard` (for PGE or Community Postgres) or `enterprise` (for EPAS).
- `<postgresql-distribution>` specifies your Postgres distribution (`edb-postgresextended`, `edb-as`, or `postgres`), and version.
Update your `postgresql.conf` file. Adding `pgaa` to `shared_preload_libraries` allows Postgres to manage the life cycle of the Seafowl query engine as a background worker.

```ini
shared_preload_libraries = 'pgaa'
pgaa.autostart_seafowl = on
```
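As a sketch, you can append both settings from the shell. The data-directory path below is an assumption for illustration; point `PGDATA` at your real data directory, and note that re-running this appends duplicate lines, so in practice check the file first.

```shell
# Append the PGAA settings to postgresql.conf.
# /tmp/pgaa-demo is a stand-in path; use your actual data directory.
PGCONF="${PGDATA:-/tmp/pgaa-demo}/postgresql.conf"
mkdir -p "$(dirname "$PGCONF")"
cat >> "$PGCONF" <<'EOF'
shared_preload_libraries = 'pgaa'
pgaa.autostart_seafowl = on
EOF
grep -c 'pgaa' "$PGCONF"   # expect 2 on a fresh file
```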
Restart your Postgres service to load the new library.
Pointing to object storage
Define the destination for the cold data. Because you are working within a distributed PGD cluster, you must wrap these configuration calls in bdr.replicate_ddl_command to ensure the settings are propagated to all nodes in the group.
Tip
To view your node groups, use `SELECT node_group_id, node_group_name FROM bdr.node_group;`. The group name is what you pass to `bdr.alter_node_group_option`.
Create a storage location using the PGFS extension and set it as the analytics storage for your node group.
```sql
SELECT bdr.replicate_ddl_command($$
  SELECT pgfs.create_storage_location(
    'my-storage-location',
    'protocol://my-bucket-name/path/',
    '{"region": "region-name"}',
    '{"access_key_id": "...", "secret_access_key": "..."}'
  )
$$);
```
Clear `analytics_write_catalog`. Its value must be `NULL` before an `analytics_storage_location` can be assigned:

```sql
SELECT bdr.alter_node_group_option('my-pgd-group', 'analytics_write_catalog', NULL);
```
Set the analytics storage location for your node group:
```sql
SELECT bdr.alter_node_group_option('my-pgd-group', 'analytics_storage_location', 'my-storage-location');
```
Configure a connection to an external catalog:
```sql
SELECT bdr.replicate_ddl_command($$
  SELECT pgaa.add_catalog(
    'my-catalog-name',
    'iceberg-rest',
    '{
      "url": "https://my-catalog-rest-endpoint.com",
      "token": "MY_AUTH_TOKEN"
    }'
  )
$$);
```
Clear `analytics_storage_location`. Its value must be `NULL` before an `analytics_write_catalog` can be assigned:

```sql
SELECT bdr.alter_node_group_option('my-pgd-group', 'analytics_storage_location', NULL);
```
Set the catalog as the default analytics write location:
```sql
SELECT bdr.replicate_ddl_command($$
  SELECT bdr.alter_node_group_option('my-pgd-group', 'analytics_write_catalog', 'my-catalog-name')
$$);
```
Next steps
Once you have defined the storage target for your PGD cluster, determine how your tables interact with that storage based on your requirements for performance and capacity:
Implement tiered tables: Establish a zero-touch data lifecycle by automatically transitioning partitions from local heap storage to your analytics storage target based on an age threshold.
Replicate to analytics: Maintain a local transactional heap table while simultaneously synchronizing a columnar copy to your storage target for heavy analytical processing.
Offload to analytics: Reclaim heap disk space from existing tables that are being replicated to analytics.