# Create a pipeline
A pipeline defines how data moves from a source, through one or more transformation steps, to an AI-ready destination. Creating a pipeline is the first step in making your data searchable — without one, there is no process to generate embeddings or keep your knowledge base in sync with your source data.
Use aidb.create_pipeline() to define a pipeline. Its parameters fall into four groups:
| Parameter group | Parameters | More information |
|---|---|---|
| Source | source, source_key_column, source_data_column | This page |
| Steps | step_1 … step_10, step_N_options | Pipeline steps |
| Orchestration | auto_processing, background_sync_interval, batch_size | Orchestration |
| Destination | destination, destination_key_column, destination_data_column | Set automatically by the KnowledgeBase step, or specify explicitly |
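As a sketch of how these groups combine in a single call, the example below sets source, step, and orchestration parameters together. The orchestration values shown (the `'Background'` mode, the interval string, and the batch size) are illustrative assumptions; see the Orchestration page for the supported modes and value formats:

```sql
SELECT aidb.create_pipeline(
    name => 'my_synced_pipeline',
    source => 'my_table',
    source_key_column => 'id',
    source_data_column => 'content',
    step_1 => 'KnowledgeBase',
    step_1_options => aidb.knowledge_base_config('my_model', 'Text'),
    -- Orchestration parameters (values below are assumptions, not defaults):
    auto_processing => 'Background',
    background_sync_interval => '10 minutes',
    batch_size => 100
);
```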
## Table sources
To read from a Postgres table, set the source parameter to the table name, source_key_column to the unique key column, and source_data_column to the column containing the data to process:
```sql
SELECT aidb.create_pipeline(
    name => 'my_pipeline',
    source => 'my_table',
    source_key_column => 'id',
    source_data_column => 'content',
    step_1 => 'KnowledgeBase',
    step_1_options => aidb.knowledge_base_config('my_model', 'Text')
);
```
## Volume sources (PGFS)
To process files stored in external cloud storage (S3, GCS, or Azure), pipelines use Postgres File System (PGFS). PGFS mounts external object storage as a volume that the pipeline can scan for files.
### Step 1: Create a storage location
Define the external storage connection using `pgfs.create_storage_location()`:

```sql
SELECT pgfs.create_storage_location(
    name => 'my_s3_location',
    uri => 's3://my-bucket/my-folder',
    options => '{"region": "us-east-1"}'
);
```
### Step 2: Create a volume
Create an AIDB volume that references the storage location. The pipeline scans this volume for new or changed files:

```sql
SELECT aidb.create_volume(
    name => 'my_volume',
    storage_location => 'my_s3_location'
);
```
### Step 3: Reference the volume as the pipeline source
Set the source parameter to the volume name:
```sql
SELECT aidb.create_pipeline(
    name => 'my_pdf_pipeline',
    source => 'my_volume',
    step_1 => 'ParsePdf',
    step_2 => 'KnowledgeBase',
    step_2_options => aidb.knowledge_base_config('my_model', 'Text')
);
```
For the full PGFS reference, see the PGFS documentation.