AI Factory Pipelines v1.3.5

The February 2025 Innovation Release of EDB Postgres AI is available. For more information, see the release notes.

AI Factory Pipelines is a core capability of EDB Postgres® AI (EDB PG AI), enabling intelligent, automated AI data workflows directly in your Postgres clusters — including Retrieval-Augmented Generation (RAG), Knowledge Bases, and vector-powered applications.

Pipelines abstracts away complexity in preparing, managing, and serving AI data for Postgres — integrating with Vector Engine and enabling Sovereign AI patterns where your data stays inside your database or trusted object storage.

Why Pipelines?

Modern AI workloads — especially RAG and Gen AI — require:

Clean, preprocessed data
Consistent embeddings
Efficient vector indexes
Real-time updates when source data changes

Manually managing these steps is complex and error-prone. Pipelines automates this lifecycle:

Prepares and cleans data
Runs embedding generation
Maintains vector indexes
Supports auto-processing and Knowledge Bases

Pipelines helps reduce hallucinations caused by stale embeddings and makes it easier to build reliable Gen AI applications.

What You Can Do

With Pipelines, you can:

Automate vector embedding pipelines for text and image data
Maintain Knowledge Bases with real-time or batch updates
Perform semantic and similarity search using Vector Engine
Integrate with Gen AI Builder and other AI Factory components
Build fully auditable, controlled Sovereign AI systems on your data

How It Works

Preparers

Preparers define how source data is cleaned, chunked, and prepared for embedding. You can create preparers for:

Postgres tables (aidb.create_table_preparer())
Object storage (S3-compatible) (aidb.create_volume_preparer())

Once a preparer is created, you can run:

Bulk processing (aidb.bulk_data_preparation())
Auto-processing (aidb.set_auto_preparer()) to keep embeddings up to date automatically.

See Preparer Concepts to learn more.
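
To make the clean-and-chunk step concrete, here is a minimal Python sketch of whitespace cleanup followed by fixed-size chunking with overlap. It is an illustration only, not the aidb implementation: `clean_text` and `chunk_text` are hypothetical helper names, and sizes are counted in characters purely to keep the example simple.

```python
def clean_text(text: str) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(text.split())


def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of up to chunk_size characters.

    Each chunk starts (chunk_size - overlap) characters after the previous
    one, so neighbouring chunks share `overlap` characters of context.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Hypothetical input: messy whitespace is cleaned before chunking.
raw = "Postgres   stores\nyour data.  Pipelines prepares it."
chunks = chunk_text(clean_text(raw), chunk_size=20, overlap=5)
```

With an overlap of 5, the last five characters of each chunk are repeated at the start of the next, which is how overlap preserves context across a chunk boundary.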

Knowledge Bases

Knowledge Bases provide a simple abstraction for RAG:

Define a Knowledge Base (aidb.create_table_knowledge_base(), aidb.create_volume_knowledge_base())
Automatically manage embeddings and vector indexes
Use aidb.retrieve_key() and aidb.retrieve_text() for semantic search

Knowledge Bases support:

Text and image data
Multi-modal models (e.g., CLIP)
Integration with Gen AI Builder’s retrieval flows

See Knowledge Base Concepts.
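
Conceptually, semantic search ranks stored embeddings by how close they are to the query embedding and returns the best matches. The Python sketch below shows that idea with cosine similarity over a tiny in-memory index. It is an assumption-laden illustration, not how aidb or Vector Engine is implemented: the vectors are made up, and `cosine_similarity` and `retrieve_top_k` are hypothetical names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, index, top_k):
    """Return the top_k (key, similarity) pairs, most similar first."""
    scored = [(key, cosine_similarity(query_vec, vec)) for key, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 2-dimensional "embeddings"; real models produce hundreds of dimensions.
index = {
    "doc-1": [1.0, 0.0],
    "doc-2": [0.0, 1.0],
    "doc-3": [1.0, 1.0],
}
hits = retrieve_top_k([1.0, 0.1], index, top_k=2)
```

A production system would use a vector index rather than scoring every row, but the ranking idea is the same.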

Auto-Processing

Pipelines supports flexible auto-processing modes:

Sync or async
Table or volume sources
Configurable batch sizes and triggers

See Auto-Processing for details.
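
One way to picture what auto-processing saves you from doing by hand is the loop below: detect rows whose content changed since they were last embedded, then re-embed only those rows in batches. This Python sketch is a conceptual model under stated assumptions, not a description of the aidb mechanism. The content-hash bookkeeping, the `embed_batch` callback, and all function names are hypothetical.

```python
import hashlib
from typing import Callable


def content_hash(text: str) -> str:
    """Stable fingerprint of a row's text, used to detect changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_stale_keys(source: dict[str, str], embedded: dict[str, str]) -> list[str]:
    """Keys that are new, or whose text changed since the last embedding run."""
    return [k for k, text in source.items() if embedded.get(k) != content_hash(text)]


def refresh(
    source: dict[str, str],
    embedded: dict[str, str],
    embed_batch: Callable[[list[str]], None],
    batch_size: int,
) -> int:
    """Re-embed stale rows in batches; return the number of batches sent.

    Rows deleted from `source` are not handled in this sketch.
    """
    stale = find_stale_keys(source, embedded)
    batches = 0
    for i in range(0, len(stale), batch_size):
        batch = stale[i:i + batch_size]
        embed_batch([source[k] for k in batch])
        for k in batch:
            embedded[k] = content_hash(source[k])  # record what was embedded
        batches += 1
    return batches


source = {"a": "first row", "b": "second row", "c": "third row"}
embedded = {"a": content_hash("first row")}  # only "a" is up to date
sent = refresh(source, embedded, embed_batch=lambda texts: None, batch_size=1)
```

A smaller batch size means more calls to the embedding model, and a larger one means fewer calls with more data in flight per call. That is the trade-off behind the configurable batch sizes listed above.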

Who Should Use Pipelines?

Pipelines is ideal for:

Gen AI Builders creating enterprise Knowledge Bases
Data Engineers building semantic search pipelines
AI teams needing automated RAG pipelines
Architects implementing Sovereign AI in regulated environments

When to Use Pipelines

Use Pipelines when you need to:

Keep embeddings current with changing data
Reduce RAG errors caused by stale or incomplete embeddings
Power multi-modal Gen AI search across text and images
Maintain full auditability and governance of embedding processes

Configuration values explained

Chunk size and overlap: Smaller chunks keep each retrieved passage focused, which improves grounding; overlap helps preserve context across chunk boundaries. Tune both based on document structure.
Embedding model selection: Use the same model for ingestion and for queries. Choose a model suited to your domain and language.
Auto-processing cadence: Align the cadence with how often your data changes and with your SLA. Consider sync vs. async modes.
Batch sizes and parallelism: Increase them for throughput, but monitor storage, network, and memory use.
Similarity threshold and top-K: Higher thresholds reduce noise but can exclude relevant results; tune top-K to balance recall against latency.
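
The interaction between a similarity threshold and top-K can be sketched in a few lines of Python. The example assumes scores where higher means more similar (some systems report distances instead, where lower is closer). The `select_hits` helper and the sample scores are hypothetical, and whether a given retrieval function exposes a threshold parameter is not something this page states.

```python
def select_hits(scored, threshold, top_k):
    """Keep results at or above threshold, then return the best top_k.

    scored: list of (key, similarity) pairs, higher similarity is closer.
    """
    kept = [(key, score) for key, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Hypothetical similarity scores for four candidate chunks.
scored = [("a", 0.91), ("b", 0.78), ("c", 0.55), ("d", 0.30)]

loose = select_hits(scored, threshold=0.5, top_k=3)   # keeps a, b, c
strict = select_hits(scored, threshold=0.8, top_k=3)  # keeps only a
```

Raising the threshold from 0.5 to 0.8 removes the weaker matches regardless of top-K, while top-K only caps how many of the surviving results are returned.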

Learn More

AI Factory Pipelines helps you build trusted, scalable Sovereign AI solutions on top of your Postgres data, keeping AI pipelines clean, current, and production-ready.

Get started

Getting started

How to get started with AI Factory Pipelines.

Installing

How to install (or upgrade) AI Factory Pipelines.

Storage and access

PGFS

How to work with the Postgres File System (PGFS) in Pipelines.

Volumes

AIDB volumes for accessing PGFS storage locations.

Reference

Reference documentation for AI Factory Pipelines.

Release notes

Release notes for EDB Postgres AI - AI Accelerator
