Auto-processing Innovation Release

Auto-processing is a set of capabilities designed to keep source data (in tables or volumes) and AI pipeline outputs (like embeddings in a knowledge base) in sync automatically.

Key features

  • Full sync: Automatically handles inserts, updates, and deletes so that knowledge bases remain accurate without manual intervention.

  • Change detection: Processes only new or modified records, avoiding unnecessary reprocessing of unchanged data.

  • Quick turnaround: Results become available immediately after a batch finishes processing. There is no need for a full source scan before starting.

  • Batch processing: Groups records into batches that are processed concurrently, which suits GPU-based inference tasks well.
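
To make the change-detection and batching ideas above concrete, here is a minimal Python sketch. It is not how aidb implements these features. The content-hash approach, the function names (fingerprint, detect_changes, batches), and the sample data are all assumptions chosen for illustration.

```python
import hashlib
from typing import Dict, Iterator, List, Tuple


def fingerprint(text: str) -> str:
    """Return a stable content hash used to detect modified records."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_changes(
    source: Dict[int, str], processed: Dict[int, str]
) -> Tuple[List[int], List[int]]:
    """Compare source rows with the fingerprints of already processed rows.

    Returns (ids to process, ids whose outputs should be deleted).
    """
    # New rows have no stored fingerprint; modified rows have a different one.
    to_process = sorted(
        key for key, text in source.items()
        if processed.get(key) != fingerprint(text)
    )
    # Rows that were processed earlier but no longer exist in the source.
    to_delete = sorted(key for key in processed if key not in source)
    return to_process, to_delete


def batches(ids: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive groups of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Hypothetical sample data: row 2 was edited, rows 4 and 5 are new, row 3 was deleted.
source = {1: "alpha", 2: "beta (edited)", 4: "delta", 5: "epsilon"}
processed = {
    1: fingerprint("alpha"),
    2: fingerprint("beta"),
    3: fingerprint("gamma"),
}

to_process, to_delete = detect_changes(source, processed)
print(to_process)                     # [2, 4, 5]
print(to_delete)                      # [3]
print(list(batches(to_process, 2)))   # [[2, 4], [5]]
```

Row 1 is unchanged, so it is skipped entirely. Only the remaining rows are grouped into batches, and each batch's results could be published as soon as that batch completes.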

Auto-processing modes

There are three main modes for managing how data moves through the pipeline:

| Mode | Description | Pros | Cons |
|------|-------------|------|------|
| Live | Uses Postgres triggers to process changes immediately, within the same transaction that modifies the data. | Transactional guarantee, zero lag. | Can block or delay the original data modification transaction. |
| Background | Uses a Postgres background worker to process data asynchronously at a configurable interval (background_sync_interval). | Does not block user transactions; well suited to very large data sets. | Results are delayed by up to the sync interval. |
| Disabled | No automatic sync. Data must be processed manually using aidb.run_pipeline(). | Full control over when resources are used. | High manual overhead; table sources require full reprocessing. |
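
The trade-offs between the three modes can be illustrated with a toy in-memory model. This is only a sketch: the ToyPipeline class, its methods, and the uppercase "processing" step are hypothetical stand-ins and do not reflect aidb internals, Postgres triggers, or the background worker.

```python
from typing import Dict, List


class ToyPipeline:
    """Toy model of the Live, Background, and Disabled modes (illustrative only)."""

    def __init__(self, mode: str) -> None:
        self.mode = mode
        self.source: Dict[int, str] = {}   # stands in for the source table
        self.output: Dict[int, str] = {}   # stands in for the pipeline output
        self.pending: List[int] = []       # changes awaiting the next background run

    def _process(self, key: int) -> None:
        # Stand-in for the real work, such as computing an embedding.
        self.output[key] = self.source[key].upper()

    def write(self, key: int, text: str) -> None:
        """Insert or update a source row."""
        self.source[key] = text
        if self.mode == "Live":
            # Processed as part of the write itself: no lag, but the write waits.
            self._process(key)
        elif self.mode == "Background":
            # The write returns immediately; the change is picked up later.
            self.pending.append(key)
        # Disabled: nothing happens until run() is called.

    def tick(self) -> None:
        """Simulate one wake-up of the background worker."""
        if self.mode == "Background":
            for key in self.pending:
                self._process(key)
            self.pending.clear()

    def run(self) -> None:
        """Simulate a manual run that reprocesses every source row."""
        for key in self.source:
            self._process(key)


live = ToyPipeline("Live")
live.write(1, "hello")
print(live.output)        # {1: 'HELLO'}

background = ToyPipeline("Background")
background.write(1, "hello")
print(background.output)  # {}
background.tick()
print(background.output)  # {1: 'HELLO'}

disabled = ToyPipeline("Disabled")
disabled.write(1, "hello")
print(disabled.output)    # {}
disabled.run()
print(disabled.output)    # {1: 'HELLO'}
```

In this model the Live write does the processing itself, the Background output lags until the next tick, and the Disabled output stays empty until someone triggers a full run.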

Configuration

The auto-processing mode is set via the auto_processing parameter when creating or updating a pipeline.

Setting the mode at pipeline creation:

SELECT aidb.create_pipeline(
    name                     => 'my_pipeline',
    source                   => 'my_source_table',
    ...
    auto_processing          => 'Background',
    background_sync_interval => '5 minutes'
);

Changing the mode for an existing pipeline:

SELECT aidb.update_pipeline(
    'my_pipeline',
    auto_processing          => 'Live'
);

For more details on these functions and their full parameter lists, see the SQL reference.

| Parameter | Type | Description |
|-----------|------|-------------|
| auto_processing | aidb.PipelineAutoProcessingMode | Auto-processing mode: Live, Background, or Disabled. |
| background_sync_interval | interval | Interval between background executions. Must be between 1 second and 2 days. Only applies when auto_processing is Background. |
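
The constraints in the parameter table can be checked on the client side before a pipeline is created or updated. The sketch below is a hypothetical helper, not part of aidb. It assumes that the 1 second and 2 day bounds are inclusive and that omitting the interval in Background mode falls back to a server-side default; neither detail is confirmed here.

```python
from datetime import timedelta
from typing import Optional

MODES = {"Live", "Background", "Disabled"}
MIN_INTERVAL = timedelta(seconds=1)   # assumed inclusive lower bound
MAX_INTERVAL = timedelta(days=2)      # assumed inclusive upper bound


def check_settings(mode: str, interval: Optional[timedelta] = None) -> str:
    """Hypothetical pre-flight check for auto_processing settings."""
    if mode not in MODES:
        raise ValueError(f"unknown auto_processing mode: {mode!r}")
    if mode != "Background":
        # The interval only applies in Background mode.
        return "interval ignored" if interval is not None else "ok"
    if interval is not None and not (MIN_INTERVAL <= interval <= MAX_INTERVAL):
        raise ValueError(
            "background_sync_interval must be between 1 second and 2 days"
        )
    # Assumption: a missing interval falls back to a server-side default.
    return "ok"


print(check_settings("Background", timedelta(minutes=5)))  # ok
print(check_settings("Live", timedelta(minutes=5)))        # interval ignored
```

Passing timedelta(days=3) in Background mode raises ValueError, which mirrors the documented upper bound.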