Postgres Analytics Accelerator (PGAA) v1.6
Postgres Analytics Accelerator (PGAA) is a high-performance extension that enables Postgres to query large-scale data stored in open table and file formats such as Delta Lake, Apache Iceberg, and Parquet. By offloading heavy analytical queries to a vectorized execution engine, PGAA bridges the gap between operational databases and data lakes.
Get started
Compatibility: Check supported PostgreSQL versions, operating systems, and other requirements.
Architecture: Understand the core architecture and how the vectorized engine works.
Core concepts: Understand the fundamental principles of vectorized execution, data lake integration, and DirectScan.
Quickstart guide: Install PGAA, create a storage location, and read a table from our sample benchmark datasets.
Using PGAA
Installation: Step-by-step instructions for installing the extension and enabling the Seafowl background worker.
Configure storage locations: How to securely connect PGAA to AWS S3, GCS, and Azure Blob Storage.
Read from object storage: Connect directly to S3, GCS, or Azure Blob Storage to query Parquet, Delta, or Iceberg files via a PGFS storage location.
Read using Iceberg catalogs: Integrate with external Iceberg REST catalogs to manage table metadata.
Write to object storage using CTAS: Use CREATE TABLE AS SELECT (CTAS) to export Postgres data into optimized lakehouse formats in your object store.
Performance & optimization
Accelerate with Spark: Offload massive datasets and complex distributed joins to a remote Spark cluster via Spark Connect.
Monitor and maintain your analytical tables: Audit storage utilization, monitor table health, and perform table maintenance tasks for PGAA-managed tables.
Optimize query performance: Maximize query speeds by managing DirectScan execution, configuring compute pushdowns, and troubleshooting path fallbacks.
Reference
Configuration parameters: The behavior of the PGAA extension is governed by Grand Unified Configuration (GUC) variables. These parameters allow you to switch executors, enable performance optimizations, and manage security credentials.
Functions: PGAA introduces a suite of SQL functions for administrative tasks, such as mapping new tables, monitoring storage health, and launching maintenance background jobs.
Table options: When mapping or creating analytical tables, specific options allow you to define how data is read from or written to your object store.
Data types: PGAA maps native Postgres data types to optimized columnar formats in the data lake.
Datasets: Access pre-configured schemas and data loading instructions for analytical datasets to baseline your performance.