AI Pipelines (Innovation Release)

AI Pipelines is a core capability of EDB Postgres AI. It enables you to build intelligent, automated AI data workflows within your Postgres clusters, including:

  • Data preparation
  • Retrieval-Augmented Generation (RAG)
  • Knowledge bases
  • Vector-powered applications

Pipelines allows you to develop vector-powered applications where your data remains securely within your database or trusted object storage, enabling sovereign AI patterns.

Why Pipelines?

Building modern GenAI and RAG applications typically means hand-written scripts to prepare data, generate embeddings, and keep them in sync with changing sources, a process that is tedious and error-prone. Pipelines eliminates this complexity by automating the entire lifecycle of AI-ready data:

  • Data cleansing: Automatically prepares and cleans data for AI workloads.

  • Embedding management: Generates and updates embeddings consistently.

  • Vector indexing: Maintains efficient vector indexes for fast retrieval.

  • Real-time updates: Automatically reprocesses data when source information changes, eliminating errors from stale embeddings.
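The last point, reprocessing only what changed, is the key to avoiding stale embeddings. As a minimal, illustrative sketch (not the Pipelines implementation), the idea can be expressed as content-hash comparison: re-embed a row only when its fingerprint no longer matches the one stored at the last embedding run. The function names here are hypothetical.

```python
import hashlib

def fingerprint(text: str) -> str:
    """Content hash used to detect whether a source row has changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def rows_needing_reembedding(sources: dict, stored_hashes: dict) -> list:
    """Return ids of rows whose content is new or changed since the
    last embedding run, so only those embeddings are regenerated."""
    stale = []
    for row_id, text in sources.items():
        if stored_hashes.get(row_id) != fingerprint(text):
            stale.append(row_id)
    return sorted(stale)

# Row 2's text changed since its hash was stored, so only row 2 is stale.
sources = {1: "postgres is great", 2: "vectors enable search"}
stored = {1: fingerprint("postgres is great"), 2: fingerprint("old text")}
print(rows_needing_reembedding(sources, stored))  # → [2]
```

An automated pipeline runs this check on a schedule or on change events, so embeddings never silently drift out of sync with the source data.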

Key capabilities

With Pipelines, you can transition from manual scripts to an automated, auditable system:

  • Automated workflows: Process text and image data for vector embeddings at scale.

  • Knowledge bases: Build and maintain up-to-date knowledge bases with real-time or batch updates.

  • Semantic search: Perform semantic and similarity search using Vector Engine.

  • Sovereign AI: Build fully auditable, controlled AI systems on your data without exposing it to external services.

Use cases & audience

  • GenAI builders: Creating reliable, enterprise-grade knowledge bases and vector-powered applications.

  • Data engineers: Building automated pipelines that keep data clean and up to date for AI workloads.

  • AI teams: Automating RAG workflows to ensure accurate, real-time information retrieval for GenAI applications.

  • Architects: Implementing Sovereign AI in highly regulated or secure environments.

Optimization & configuration

To get the best performance from Pipelines, consider these variables:

  • Chunking strategy: Balance chunk size and overlap to preserve context without losing granularity.

  • Model selection: Ensure you use the same embedding model for both data ingestion and user queries.

  • Processing cadence: Tune batch sizes and parallelism to match your SLA and system memory constraints.

  • Search tuning: Adjust similarity thresholds and top-K results to balance accuracy with latency.

Overview

EDB Postgres AI Pipelines automates the transformation of raw data into vector-searchable insights, orchestrating data ingestion, processing, and embedding generation for seamless AI integration.

Data processing

Configure data processing steps in EDB Postgres AI Pipelines to transform raw text and images into structured, vector-searchable formats using built-in AI functions for parsing, chunking, and embedding.

Orchestration

Orchestration in EDB Postgres AI Pipelines manages the execution of AI workflows, providing auto-processing, background workers, and observability to ensure efficient and reliable data transformation from raw inputs to vector-searchable insights.
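As a conceptual sketch of batch orchestration (not the Pipelines background worker itself), the pattern is: drain a queue of pending documents in fixed-size batches, which bounds memory per pass and lets throughput be tuned via batch size. All names here are illustrative.

```python
from collections import deque

def run_batches(pending: deque, batch_size: int, process) -> int:
    """Drain the queue in fixed-size batches, calling `process` on each
    batch, the way a background worker picks up auto-processing work.
    Returns the number of batches run."""
    batches = 0
    while pending:
        batch = [pending.popleft() for _ in range(min(batch_size, len(pending)))]
        process(batch)
        batches += 1
    return batches

done = []
queue = deque(range(10))
n = run_batches(queue, batch_size=4, process=done.extend)
print(n, done)  # → 3 batches: 4 + 4 + 2 items
```

In a real deployment the queue is durable (a table of pending work), `process` embeds and indexes the batch, and observability comes from recording per-batch status.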

Reference

Reference documentation for the EDB AI Pipelines API: types, views, core functions, and configuration helpers.

Examples

Explore practical examples of EDB Postgres AI Pipelines, demonstrating how to create multi-step workflows with intermediate storage for processing raw data into vector-searchable insights.
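The shape of such a multi-step workflow can be sketched in a few lines of plain Python, with dicts standing in for the intermediate storage (staging tables) between stages. This is a toy illustration of the parse → chunk → embed flow, not the Pipelines API; the stage functions and the `embed` stub are hypothetical.

```python
def parse(raw: str) -> str:
    """Stage 1: normalize raw input."""
    return raw.strip().lower()

def chunk(text: str, size: int = 16) -> list:
    """Stage 2: split parsed text into fixed-size chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(chunk_text: str) -> list:
    """Stage 3: toy stand-in for a real embedding model."""
    return [float(len(chunk_text))]

# Intermediate storage between stages (staging tables, conceptually).
store = {"parsed": {}, "chunks": {}, "embeddings": {}}

def run(doc_id: str, raw: str) -> None:
    """Run a document through all stages, persisting each result so a
    later stage can be rerun without repeating the earlier ones."""
    store["parsed"][doc_id] = parse(raw)
    store["chunks"][doc_id] = chunk(store["parsed"][doc_id])
    store["embeddings"][doc_id] = [embed(c) for c in store["chunks"][doc_id]]

run("d1", "  Hello World  ")
print(store["parsed"]["d1"])  # → hello world
```

Keeping each stage's output addressable is what makes the workflow auditable: any stage can be inspected or rerun in isolation.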