# Create a pipeline
A pipeline defines how data moves from a source, through one or more transformation steps, to an AI-ready destination. Creating a pipeline is the first step in making your data searchable — without one, there is no process to generate embeddings or keep your knowledge base in sync with your source data.
Use aidb.create_pipeline() to define a pipeline. Its parameters fall into four groups:
| Parameter group | Parameters | More information |
|---|---|---|
| Source | source, source_key_column, source_data_column | This page |
| Steps | step_1 … step_10, step_N_options | Pipeline steps |
| Orchestration | auto_processing, background_sync_interval, batch_size | Orchestration |
| Destination | destination, destination_key_column, destination_data_column | Set automatically by the KnowledgeBase step, or specify explicitly |
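As a sketch of how these groups combine in a single call, the example below sets source, step, and orchestration parameters together. The orchestration values shown (the `'Background'` mode, the interval string, and the batch size) are illustrative assumptions; see the Orchestration page for the supported modes and value formats:

```sql
SELECT aidb.create_pipeline(
    name => 'my_synced_pipeline',
    source => 'my_table',
    source_key_column => 'id',
    source_data_column => 'content',
    step_1 => 'KnowledgeBase',
    step_1_options => aidb.knowledge_base_config('my_model', 'Text'),
    -- Orchestration parameters (values below are assumptions, not defaults):
    auto_processing => 'Background',
    background_sync_interval => '10 minutes',
    batch_size => 100
);
```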
## Table sources
To read from a Postgres table, set the source parameter to the table name, source_key_column to the unique key column, and source_data_column to the column containing the data to process:
```sql
SELECT aidb.create_pipeline(
    name => 'my_pipeline',
    source => 'my_table',
    source_key_column => 'id',
    source_data_column => 'content',
    step_1 => 'KnowledgeBase',
    step_1_options => aidb.knowledge_base_config('my_model', 'Text')
);
```
## Volume sources (PGFS)
To process files stored in external cloud storage (S3, GCS, or Azure), pipelines use Postgres File System (PGFS). PGFS mounts external object storage as a volume that the pipeline can scan for files.
### Step 1: Create a storage location
Define the external storage connection using `pgfs.create_storage_location()`:

```sql
SELECT pgfs.create_storage_location(
    name => 'my_s3_location',
    uri => 's3://my-bucket/my-folder',
    options => '{"region": "us-east-1"}'
);
```
### Step 2: Create a volume
Create an AIDB volume that references the storage location. The pipeline scans this volume for new or changed files:

```sql
SELECT aidb.create_volume(
    name => 'my_volume',
    storage_location => 'my_s3_location'
);
```
### Step 3: Reference the volume as the pipeline source
Set the source parameter to the volume name:
```sql
SELECT aidb.create_pipeline(
    name => 'my_pdf_pipeline',
    source => 'my_volume',
    step_1 => 'ParsePdf',
    step_2 => 'KnowledgeBase',
    step_2_options => aidb.knowledge_base_config('my_model', 'Text')
);
```
For the full PGFS reference, see the PGFS documentation.