Preparers - Concepts
Preparers are used to perform common preprocessing steps on source data from either a table or volume source. The processed data is stored in a destination table and can be used by other preparers or by retrievers for embedding generation.
Concepts
Data preparation operation
The data preparation operation is how the preparer transforms the source data. The supported operations are encoded as variants of the aidb.DataPreparationOperation enum.
Note
Each operation has its own set of parameters that are used to customize the operation. Learn more in Primitives.
Data sources
A data source is the input data for the data preparation operation. The aidb extension supports two types of data sources for preparers:
- Table: a column in a table in the Postgres database.
- Volume: a PGFS "volume," which is a wrapper for accessing an S3 object store or local file system.
Execution
Primitive functions help with testing operations and their configurations on individual inputs with minimal setup. This is useful for quick experimentation before scaling up with a preparer for bulk data preparation.
Bulk data preparation performs a preparer's associated operation for all of the preparer's source data.
Note
Bulk data preparation does not delete existing destination data unless it conflicts with newly generated data. To avoid such conflicts, configure a separate destination table for each preparer.
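The behavior described in the note above can be pictured with a small Python model. This is only an illustrative sketch, not aidb code: the `bulk_prepare` function, the dictionaries standing in for the source and destination tables, and the use of `str.upper` as the operation are all hypothetical. It assumes that "conflict" means a destination row with the same key as a newly generated row.

```python
def bulk_prepare(source, destination, operation):
    """Toy model of bulk data preparation (hypothetical, not the aidb API).

    Applies `operation` to every source row and writes the result to
    `destination` under the same key. Rows with a conflicting key are
    overwritten; all other existing destination rows are left alone.
    """
    for key, value in source.items():
        destination[key] = operation(value)
    return destination


# Source rows keyed by primary key.
source = {1: "hello world", 2: "postgres"}

# The destination already holds a row that conflicts with key 2 and an
# unrelated row (key 99), for example one written by a different preparer.
destination = {2: "stale row", 99: "row written by another preparer"}

bulk_prepare(source, destination, str.upper)

for key in sorted(destination):
    print(key, destination[key])
```

In this model, the row with key 2 is replaced and the row with key 99 survives untouched, which is why sharing one destination table between preparers can leave a mix of unrelated rows behind.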
Consistency with source data
To ensure correct and consistent data, the prepared destination data must be in sync with the source data. In the case of the table data source, you can enable preparer auto processing to inform the preparer pipeline about changes to the source data.
Note
If the source table already contains data when the preparer is created, you must run bulk data preparation manually once to process the existing rows.
Preparer auto processing
Enabling preparer auto processing creates triggers in Postgres that keep the prepared destination data up to date when source data is added, updated, or removed.
Note
Preparer auto processing currently works only with preparers with a table data source, not a volume source.
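The following Python sketch models the idea behind auto processing. It is a conceptual illustration under stated assumptions, not how aidb implements it: `SourceTable`, `enable_auto_processing`, and the callback mechanism are hypothetical stand-ins for a Postgres table and its triggers, and `str.upper` stands in for a real data preparation operation.

```python
class SourceTable:
    """Toy stand-in for a source table that fires callbacks like row triggers."""

    def __init__(self):
        self.rows = {}
        self.listeners = []  # callbacks invoked as listener(event, key, value)

    def upsert(self, key, value):
        event = "update" if key in self.rows else "insert"
        self.rows[key] = value
        self._notify(event, key, value)

    def delete(self, key):
        if key in self.rows:
            del self.rows[key]
            self._notify("delete", key, None)

    def _notify(self, event, key, value):
        for listener in self.listeners:
            listener(event, key, value)


def enable_auto_processing(table, destination, operation):
    """Register a callback that mirrors every later change into `destination`."""

    def on_change(event, key, value):
        if event == "delete":
            destination.pop(key, None)  # a removed source row drops its prepared row
        else:
            destination[key] = operation(value)  # inserts and updates are re-prepared

    table.listeners.append(on_change)


table = SourceTable()
table.upsert(1, "already there")  # row exists before auto processing is enabled

destination = {}
enable_auto_processing(table, destination, str.upper)

table.upsert(2, "new row")
table.upsert(2, "changed row")
table.delete(2)
table.upsert(3, "another row")

print(destination)
```

In this model, only changes made after the callback is registered reach the destination, so the row with key 1 is never prepared. That mirrors the earlier note that a source table with existing data needs an initial manual bulk data preparation call.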