AI Pipelines (Innovation Release)

AI Pipelines is a core capability of EDB Postgres AI. It enables you to build intelligent, automated AI data workflows within your Postgres clusters, including:

  • Data preparation
  • Retrieval-Augmented Generation (RAG)
  • Knowledge bases
  • Vector-powered applications

Pipelines allows you to develop vector-powered applications where your data remains securely within your database or trusted object storage, enabling sovereign AI patterns.

Why Pipelines?

Building modern GenAI and RAG applications typically means hand-written scripts to prepare data, generate embeddings, and keep them in sync with changing sources, a process that is tedious and error-prone. Pipelines eliminates this complexity by automating the entire lifecycle of AI-ready data:

  • Data cleansing: Automatically prepares and cleans data for AI workloads.

  • Embedding management: Generates and updates embeddings consistently.

  • Vector indexing: Maintains efficient vector indexes for fast retrieval.

  • Real-time updates: Automatically reprocesses data when source information changes, eliminating errors from stale embeddings.
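The last point, reprocessing only what changed, is the key to avoiding stale embeddings. As a minimal, illustrative sketch (not the Pipelines implementation), the idea can be expressed as content-hash comparison: re-embed a row only when its fingerprint no longer matches the one stored at the last embedding run. The function names here are hypothetical.

```python
import hashlib

def fingerprint(text: str) -> str:
    """Content hash used to detect whether a source row has changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def rows_needing_reembedding(sources: dict, stored_hashes: dict) -> list:
    """Return ids of rows whose content is new or changed since the
    last embedding run, so only those embeddings are regenerated."""
    stale = []
    for row_id, text in sources.items():
        if stored_hashes.get(row_id) != fingerprint(text):
            stale.append(row_id)
    return sorted(stale)

# Row 2's text changed since its hash was stored, so only row 2 is stale.
sources = {1: "postgres is great", 2: "vectors enable search"}
stored = {1: fingerprint("postgres is great"), 2: fingerprint("old text")}
print(rows_needing_reembedding(sources, stored))  # → [2]
```

An automated pipeline runs this check on a schedule or on change events, so embeddings never silently drift out of sync with the source data.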

Key capabilities

With Pipelines, you can transition from manual scripts to an automated, auditable system:

  • Automated workflows: Process text and image data for vector embeddings at scale.

  • Knowledge bases: Build and maintain up-to-date knowledge bases with real-time or batch updates.

  • Semantic search: Perform semantic and similarity search using Vector Engine.

  • Sovereign AI: Build fully auditable, controlled AI systems on your data without exposing it to external services.

Use cases & audience

  • GenAI builders: Creating reliable, enterprise-grade knowledge bases and vector-powered applications.

  • Data engineers: Building automated pipelines that keep data clean and up to date for AI workloads.

  • AI teams: Automating RAG workflows to ensure accurate, real-time information retrieval for GenAI applications.

  • Architects: Implementing Sovereign AI in highly regulated or secure environments.

Optimization & configuration

To get the best performance from Pipelines, consider these variables:

  • Chunking strategy: Balance chunk size and overlap to preserve context without losing granularity.

  • Model selection: Ensure you use the same embedding model for both data ingestion and user queries.

  • Processing cadence: Tune batch sizes and parallelism to match your SLA and system memory constraints.

  • Search tuning: Adjust similarity thresholds and top-K results to balance accuracy with latency.

Overview

EDB Postgres AI Pipelines automates the transformation of raw data into vector-searchable insights, orchestrating data ingestion, processing, and embedding generation for seamless AI integration.

Data processing

Configure data processing steps in EDB Postgres AI Pipelines to transform raw text and images into structured, vector-searchable formats using built-in AI functions for parsing, chunking, and embedding.

Orchestration

Orchestration in EDB Postgres AI Pipelines manages the execution of AI workflows, providing auto-processing, background workers, and observability to ensure efficient and reliable data transformation from raw inputs to vector-searchable insights.
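As a conceptual sketch of batch orchestration (not the Pipelines background worker itself), the pattern is: drain a queue of pending documents in fixed-size batches, which bounds memory per pass and lets throughput be tuned via batch size. All names here are illustrative.

```python
from collections import deque

def run_batches(pending: deque, batch_size: int, process) -> int:
    """Drain the queue in fixed-size batches, calling `process` on each
    batch, the way a background worker picks up auto-processing work.
    Returns the number of batches run."""
    batches = 0
    while pending:
        batch = [pending.popleft() for _ in range(min(batch_size, len(pending)))]
        process(batch)
        batches += 1
    return batches

done = []
queue = deque(range(10))
n = run_batches(queue, batch_size=4, process=done.extend)
print(n, done)  # → 3 batches: 4 + 4 + 2 items
```

In a real deployment the queue is durable (a table of pending work), `process` embeds and indexes the batch, and observability comes from recording per-batch status.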

Reference

Reference documentation for the EDB AI Pipelines API: types, views, core functions, and configuration helpers.

Examples

Explore practical examples of EDB Postgres AI Pipelines, demonstrating how to create multi-step workflows with intermediate storage for processing raw data into vector-searchable insights.
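The shape of such a multi-step workflow can be sketched in a few lines of plain Python, with dicts standing in for the intermediate storage (staging tables) between stages. This is a toy illustration of the parse → chunk → embed flow, not the Pipelines API; the stage functions and the `embed` stub are hypothetical.

```python
def parse(raw: str) -> str:
    """Stage 1: normalize raw input."""
    return raw.strip().lower()

def chunk(text: str, size: int = 16) -> list:
    """Stage 2: split parsed text into fixed-size chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(chunk_text: str) -> list:
    """Stage 3: toy stand-in for a real embedding model."""
    return [float(len(chunk_text))]

# Intermediate storage between stages (staging tables, conceptually).
store = {"parsed": {}, "chunks": {}, "embeddings": {}}

def run(doc_id: str, raw: str) -> None:
    """Run a document through all stages, persisting each result so a
    later stage can be rerun without repeating the earlier ones."""
    store["parsed"][doc_id] = parse(raw)
    store["chunks"][doc_id] = chunk(store["parsed"][doc_id])
    store["embeddings"][doc_id] = [embed(c) for c in store["chunks"][doc_id]]

run("d1", "  Hello World  ")
print(store["parsed"]["d1"])  # → hello world
```

Keeping each stage's output addressable is what makes the workflow auditable: any stage can be inspected or rerun in isolation.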