Pipeline Designer concepts (Innovation Release)

Pipeline Designer is a visual interface for AI Database (AIDB) pipelines. It presents and constrains AIDB functionality through a guided wizard. The concepts below cover what the designer enforces and why. For the underlying pipeline, knowledge base, and model concepts, see the AIDB documentation.

Pipeline structure

A pipeline is a sequence of one to ten processing steps that transform data from a source table into a destination. You configure the source, steps, and destination through the pipeline creation wizard.

Pipeline names must be lowercase, start with a letter, and contain only letters, digits, and underscores. The maximum length is 46 characters: Postgres limits identifiers to 63 bytes, and AIDB reserves up to 17 of those for internal suffixes. The pipeline creation wizard validates these constraints when you enter the pipeline name.
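The naming rules can be expressed as a small check. This is a minimal Python sketch, assuming the rules are exactly those listed above (ASCII lowercase letters, digits, and underscores, at most 46 characters). The function `is_valid_pipeline_name` and its constants are hypothetical and aren't part of AIDB or Pipeline Designer.

```python
import re

# 63-byte Postgres identifier limit minus the 17 characters AIDB reserves for suffixes.
MAX_PIPELINE_NAME_LENGTH = 63 - 17  # = 46

# A lowercase letter first, then any mix of lowercase letters, digits, or underscores.
_PIPELINE_NAME_RE = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_pipeline_name(name: str) -> bool:
    """Hypothetical check mirroring the documented naming rules."""
    return (
        len(name) <= MAX_PIPELINE_NAME_LENGTH
        and _PIPELINE_NAME_RE.fullmatch(name) is not None
    )


if __name__ == "__main__":
    for candidate in ["docs_pipeline_v2", "Docs", "2docs", "my-pipeline", "a" * 47]:
        print(candidate, is_valid_pipeline_name(candidate))
```

Using `fullmatch` rather than `match` ensures a name like `docs-v2` is rejected instead of matching only its `docs` prefix.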

Step types

Pipeline Designer exposes each AIDB pipeline step, along with AIDB knowledge base functionality, as a step type. For detailed descriptions of what each operation does, its parameters, and its input/output types, see the AIDB pipeline steps documentation.

| Step type | Input | Output | Use case |
|---|---|---|---|
| ParseHTML | Text (HTML) | Text | Strip HTML markup and extract clean text |
| ParsePDF | Bytes (PDF) | Text | Extract text content from PDF files |
| PdfToImage | Bytes (PDF) | Bytes (image) | Convert PDF pages to images, typically before OCR |
| PerformOCR | Bytes (image) | Text | Extract text from scanned images or image-based PDFs |
| ChunkText | Text | Text | Split long text into smaller segments |
| SummarizeText | Text | Text | Condense text using a completion model |
| KnowledgeBase | Text | Vector | Generate embeddings and store in a vector knowledge base |

SQL enum casing

When calling AIDB SQL functions such as aidb.create_pipeline(), use the step type spelling from the AIDB documentation, not the display names shown in the Pipeline Designer UI. Some step types differ in casing between the two (for example, ParsePDF in the UI versus ParsePdf in SQL).

Multi-step pipelines

Pipelines can chain up to ten steps. In the pipeline canvas, steps appear as cards arranged left to right between the Table Source and Destination Table endpoints, with a + control between each pair of cards for inserting a step at that position.

Ordering constraints enforced by the designer:

  • Steps that consume raw document data (ParsePDF, PdfToImage) must be placed first. PerformOCR also accepts binary input but can appear later in the pipeline (for example, after PdfToImage).
  • The KnowledgeBase step, if present, must be placed last.
  • Other steps (ChunkText, SummarizeText, ParseHTML) have no fixed position and can appear in any order, provided the step types are compatible.
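The ordering rules above can be sketched as a validator. This Python sketch encodes only the documented positional rules plus the ten-step limit; the names `ordering_errors`, `FIRST_ONLY`, and `LAST_ONLY` are hypothetical. It deliberately ignores type compatibility, which the designer checks separately.

```python
from typing import List

# Documented positional rules; every other step type is unconstrained.
FIRST_ONLY = {"ParsePDF", "PdfToImage"}  # steps that consume raw document data
LAST_ONLY = {"KnowledgeBase"}
MAX_STEPS = 10


def ordering_errors(steps: List[str]) -> List[str]:
    """Return human-readable ordering violations (empty list if none)."""
    errors = []
    if not 1 <= len(steps) <= MAX_STEPS:
        errors.append(f"expected 1 to {MAX_STEPS} steps, got {len(steps)}")
    last_index = len(steps) - 1
    for i, step in enumerate(steps):
        # Positions are reported 1-based, as a user would count cards on the canvas.
        if step in FIRST_ONLY and i != 0:
            errors.append(f"{step} must be the first step (found at position {i + 1})")
        if step in LAST_ONLY and i != last_index:
            errors.append(f"{step} must be the last step (found at position {i + 1})")
    return errors


if __name__ == "__main__":
    print(ordering_errors(["ParsePDF", "ChunkText", "KnowledgeBase"]))
    print(ordering_errors(["ChunkText", "ParsePDF"]))
```

Returning a list of messages rather than raising on the first problem lets a UI report every violation at once.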

Type compatibility:

Each step produces output of a specific type (text or bytes), and the next step must accept that type as input. The designer validates this automatically and prevents you from deploying incompatible step sequences. For example, you can't place a ChunkText step (which expects text input) immediately after a PdfToImage step (which produces bytes output) without an intermediate PerformOCR step to convert the bytes to text.
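Type compatibility reduces to comparing each step's output type with the next step's input type. This Python sketch uses the step type table from this page, collapsed to the coarse text/bytes categories described above (PDF bytes and image bytes both count as "bytes", HTML counts as "text"), with "vector" for the KnowledgeBase output. The designer may distinguish types more finely; `STEP_IO` and `first_type_mismatch` are hypothetical names.

```python
from typing import List, Optional, Tuple

# (input type, output type) per step, coarsened to the categories the text describes.
STEP_IO = {
    "ParseHTML": ("text", "text"),
    "ParsePDF": ("bytes", "text"),
    "PdfToImage": ("bytes", "bytes"),
    "PerformOCR": ("bytes", "text"),
    "ChunkText": ("text", "text"),
    "SummarizeText": ("text", "text"),
    "KnowledgeBase": ("text", "vector"),
}


def first_type_mismatch(steps: List[str]) -> Optional[Tuple[str, str, str, str]]:
    """Return (producer, consumer, produced, expected) for the first bad pair, or None."""
    # Walk adjacent pairs: each step's output feeds the next step's input.
    for producer, consumer in zip(steps, steps[1:]):
        produced = STEP_IO[producer][1]
        expected = STEP_IO[consumer][0]
        if produced != expected:
            return producer, consumer, produced, expected
    return None


if __name__ == "__main__":
    print(first_type_mismatch(["PdfToImage", "ChunkText"]))
    print(first_type_mismatch(["PdfToImage", "PerformOCR", "ChunkText", "KnowledgeBase"]))
```

The first call reports the PdfToImage to ChunkText mismatch from the example above; inserting PerformOCR between them resolves it.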

Common multi-step patterns:

| Pattern | Steps | Use case |
|---|---|---|
| PDF to knowledge base | ParsePDF, ChunkText, KnowledgeBase | Index PDF documents for semantic search |
| Image OCR to knowledge base | PerformOCR, ChunkText, KnowledgeBase | Extract and index text from scanned documents |
| HTML processing | ParseHTML, ChunkText, SummarizeText | Clean, chunk, and summarize web content |
| Full document pipeline | ParsePDF, ChunkText, SummarizeText, KnowledgeBase | Parse, chunk, summarize, and index documents |

Pipelines are limited to 10 steps, and the step structure can't be changed after creation. See Limitations for details.

Processing modes

During pipeline creation, select one of three AIDB auto-processing modes. You can change the mode after creation without recreating the pipeline.

| Mode | How it works | When to use |
|---|---|---|
| On Demand (default) | Processes rows only when triggered manually via SQL | Testing, ad-hoc runs, or pipelines where you control exactly when processing occurs |
| Live | Processes rows immediately as they are inserted or updated, via database triggers | Low-latency use cases where results must reflect writes with minimal delay |
| Background | Processes rows asynchronously in batches on a configurable interval | Most production workloads; handles large backlogs and continuous ingestion without blocking writes |

In Background mode, the designer exposes two additional controls: batch size (default: 100) and sync interval (default: 30 seconds, ranging from 1 second to 2 days). For a full explanation of how each mode works, see the AIDB auto-processing documentation.
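The documented Background-mode defaults and ranges can be captured in a small validation helper. This is a minimal Python sketch; `check_background_settings` is a hypothetical name, and requiring a batch size of at least 1 is an assumption, since only the default of 100 is documented. Two days is 172,800 seconds, so the sync interval spans 1 to 172,800 seconds.

```python
from datetime import timedelta

DEFAULT_BATCH_SIZE = 100
DEFAULT_SYNC_INTERVAL = timedelta(seconds=30)
MIN_SYNC_INTERVAL = timedelta(seconds=1)
MAX_SYNC_INTERVAL = timedelta(days=2)


def check_background_settings(
    batch_size: int = DEFAULT_BATCH_SIZE,
    sync_interval: timedelta = DEFAULT_SYNC_INTERVAL,
) -> None:
    """Raise ValueError if the settings fall outside the documented ranges."""
    # Assumption: batch size must be a positive integer (no bounds are documented).
    if batch_size < 1:
        raise ValueError(f"batch size must be at least 1, got {batch_size}")
    # Documented range for the sync interval: 1 second to 2 days, inclusive.
    if not MIN_SYNC_INTERVAL <= sync_interval <= MAX_SYNC_INTERVAL:
        raise ValueError(
            f"sync interval must be between {MIN_SYNC_INTERVAL} and "
            f"{MAX_SYNC_INTERVAL}, got {sync_interval}"
        )


if __name__ == "__main__":
    check_background_settings()  # the defaults are valid
    check_background_settings(batch_size=500, sync_interval=timedelta(minutes=5))
```

Representing the interval as a `timedelta` avoids unit mix-ups between seconds, minutes, and days when comparing against the documented bounds.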

Data flow and step results

When a pipeline executes, data flows through AIDB's internal envelope from one step to the next. For details on how AIDB handles data lineage, chunking identifiers, and step-to-step data transfer, see the AIDB pipelines documentation.

In the pipeline creation wizard, each intermediate step has a Save Step Results toggle (enabled by default). When enabled, the step's output is persisted to a destination table, letting you inspect intermediate results at each stage of the pipeline and not just the final output. The toggle is hidden on the last step, whose output is always written to the pipeline's destination. See Creating pipelines for where this toggle appears in the wizard.
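The Save Step Results rule can be modeled as a function that decides which steps write their output to a table. This Python sketch is hypothetical (`persisted_outputs` isn't a designer or AIDB API). It keys toggles by step position rather than step name, since a pipeline can contain the same step type more than once. Missing toggles default to enabled, and the last step always persists.

```python
from typing import Dict, List


def persisted_outputs(steps: List[str], save_step_results: Dict[int, bool]) -> List[int]:
    """Return the 0-based positions of steps whose output is written to a table.

    save_step_results maps a step's position to its toggle. Missing entries
    default to True, matching the wizard's default. Any entry for the last
    step is ignored because its output always goes to the pipeline's destination.
    """
    last_index = len(steps) - 1
    return [
        i
        for i in range(len(steps))
        if i == last_index or save_step_results.get(i, True)
    ]


if __name__ == "__main__":
    steps = ["ParsePDF", "ChunkText", "KnowledgeBase"]
    print(persisted_outputs(steps, {}))
    print(persisted_outputs(steps, {0: False}))
```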