Pipelines overview
Pipelines is an EDB Postgres AI component that automates the transformation of raw data into vector-searchable insights. It integrates pgvector search capabilities with built-in data preparation tools, keeping knowledge bases synchronized with source data without manual intervention.
Integrated data lifecycle
Pipelines automates the journey from raw data to searchable insights. By managing data ingestion, embedding generation, and indexing, it ensures your vector stores stay synchronized with your source data.
- Flexible model integration: Choose from supported local models, such as HuggingFace, or connect to OpenAI-compatible external APIs.
- Preprocessing automation: Built-in data preparation steps handle data cleaning and transformation directly within AI pipelines.
- Eliminate stale data: Auto-processing keeps embeddings current, significantly reducing the hallucinations often caused by outdated vector data.
- Unified semantic search: The intelligent knowledge base feature allows you to query text and images using a single function call, regardless of whether the data lives in Postgres tables or S3-compatible object storage.
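To make the "single function call" idea concrete, a knowledge base query might look like the sketch below. The function name `aidb.retrieve_text`, the knowledge base name, and the argument order are illustrative assumptions, not the documented API; check the Pipelines reference for the exact signature.

```sql
-- Hypothetical sketch: function name and signature are assumptions.
-- Returns the top-k most semantically similar entries for the query text,
-- whether the underlying data lives in a Postgres table or object storage.
SELECT *
FROM aidb.retrieve_text(
    'product_docs_kb',                        -- knowledge base name (assumed)
    'how do I rotate TLS certificates?',      -- natural-language query
    5                                         -- top-k results to return
);
```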
Pipeline architecture
A pipeline is configured using the aidb.create_pipeline() function and consists of three main aspects:
- Data source — pipelines read from Postgres tables or from external cloud storage (S3, GCS, Azure) via Postgres File System (PGFS).
- Processing steps — up to 10 sequential operations transform your data. Steps include chunking, parsing, OCR, summarization, and embedding into a knowledge base. Each step is configured using a dedicated helper function.
- Orchestration — auto-processing modes (Live, Background, or Disabled) keep your knowledge base in sync with source data automatically, without manual intervention.
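The three aspects come together in a single `aidb.create_pipeline()` call. The function name is the documented entry point, but the parameter names and the step specification format below are illustrative assumptions, not the exact signature; consult the pipeline steps reference for the real helper functions.

```sql
-- Sketch only: parameter names and step format are assumed for illustration.
SELECT aidb.create_pipeline(
    name            => 'docs_pipeline',
    -- Data source: a Postgres table, or a PGFS volume for S3/GCS/Azure data.
    source          => 'public.documents',
    -- Processing steps: up to 10 sequential operations, here chunking
    -- followed by embedding into a knowledge base.
    steps           => '[{"op": "chunk"},
                         {"op": "embed", "target": "docs_kb"}]',
    -- Orchestration: Live, Background, or Disabled auto-processing.
    auto_processing => 'Live'
);
```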
For details on each aspect, see:
- Data processing configuration — how to configure table and volume data sources.
- Pipeline steps — available step operations and their helper functions.
- Orchestration — auto-processing modes, background workers, and monitoring.
Supported models
The AIDB extension supports a range of open encoder LLMs from HuggingFace (running locally on your Postgres node) and OpenAI encoders (via cloud API). You can view all available options in the aidb.model_providers table.
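Since aidb.model_providers is a regular table, you can inspect the available providers with a plain query (column names will vary by AIDB version):

```sql
-- List the model providers registered with the AIDB extension.
SELECT * FROM aidb.model_providers;
```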