Gen AI Builder FAQ v1.3
This FAQ focuses on Gen AI Builder inside Hybrid Manager’s AI Factory. It covers common questions about models, knowledge bases, assistants, tools, SDK usage, security, performance, and operations.
Table of contents
- Which models/drivers are supported?
- When to use RAG vs. fine‑tuning?
- How do I create and update Knowledge Bases?
- What are Retrievers and how do I tune them?
- How do Assistants call Tools and Structures?
- Where does data live and how is access governed?
- How do I improve latency and control cost?
- How do I test, observe, and troubleshoot?
- Typical use cases and patterns
- Where to find SDK references and examples?
Which models/drivers are supported?
Gen AI Builder is driver‑based. You can use private endpoints or hosted APIs via Prompt Drivers, Embedding Drivers, and Vector Store Drivers.
- Prompt Drivers: OpenAI, Anthropic, Google, and more (see reference: reference/sdk/drivers/index.mdx and per‑provider pages)
- Embedding Drivers: OpenAI, Google, NVIDIA NIM, Hugging Face, etc. (reference/sdk/drivers/embedding-drivers.mdx)
- Vector Store Drivers: pgvector (Postgres), local, Pinecone, Qdrant, Redis, and others (reference/sdk/drivers/vector-store-drivers.mdx)
- NVIDIA NIM: deploy as private endpoints via Model Serving
See also: AI Factory Models and deploy NIM containers.
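The driver pattern above keeps application code independent of any one provider. A minimal stdlib-only sketch of the idea (the class and function names here are illustrative, not the SDK's actual API):

```python
from abc import ABC, abstractmethod


class PromptDriver(ABC):
    """Minimal driver interface; real drivers wrap OpenAI, Anthropic, NIM, etc."""

    @abstractmethod
    def run(self, prompt: str) -> str: ...


class EchoPromptDriver(PromptDriver):
    """Toy stand-in so the sketch runs without network access."""

    def run(self, prompt: str) -> str:
        return f"echo: {prompt}"


def answer(driver: PromptDriver, question: str) -> str:
    # Application code depends only on the interface, so providers
    # can be swapped by passing a different driver instance.
    return driver.run(question)


reply = answer(EchoPromptDriver(), "ping")  # "echo: ping"
```

Swapping providers then means constructing a different driver, not rewriting the calling code.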
When to use RAG vs fine‑tuning?
Prefer RAG for living knowledge and governed, citable answers; use fine‑tuning for tone, format, or narrow tasks. Many production assistants combine both: retrieve context from Knowledge Bases and guide responses with Rulesets. For modular pipelines, explore RAG Engines.
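Combining retrieval with rules usually amounts to prompt assembly: retrieved chunks plus ruleset text are folded into the final prompt. A minimal sketch (the function and its layout are illustrative assumptions, not how the product formats prompts internally):

```python
def build_prompt(question: str, context_chunks: list[str], rules: list[str]) -> str:
    """Assemble ruleset text and retrieved context into one grounded prompt."""
    rules_block = "\n".join(f"- {r}" for r in rules)
    context_block = "\n\n".join(context_chunks)
    return (
        f"Follow these rules:\n{rules_block}\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}"
    )


p = build_prompt(
    "What is the refund window?",
    ["Refunds are accepted within 30 days."],
    ["Answer only from the provided context."],
)
```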
How do I create and update Knowledge Bases?
- Create: Create a Knowledge Base and configure Data Sources (Confluence, Google Drive, S3, Web Page, Data Lake, Custom)
- Update: Manage a Knowledge Base; re‑sync when source Libraries change
- Storage: embeddings live in Postgres (pgvector) or a configured vector store
- Tuning: choose chunking and metadata that match your retrieval; validate with golden questions
SDK references: Data Loaders, Embedding Drivers, Vector Store Drivers.
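Chunking choices directly shape retrieval quality. A simple overlapping word-window chunker, sketched in plain Python (real Chunkers in the SDK are more sophisticated, e.g. token- and structure-aware; this assumes size > overlap):

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows of `size` words.

    Overlap keeps sentences that straddle a boundary retrievable
    from either neighboring chunk.
    """
    words = text.split()
    step = size - overlap  # assumes size > overlap
    chunks = []
    for start in range(0, len(words), step):
        piece = words[start:start + size]
        if piece:
            chunks.append(" ".join(piece))
        if start + size >= len(words):
            break
    return chunks


# Six words, window of 4, overlap of 2 -> two overlapping chunks.
parts = chunk_words("a b c d e f", size=4, overlap=2)
# parts == ["a b c d", "c d e f"]
```

Validate whichever chunking you pick against a golden-question set before committing it to a large ingest.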
What are Retrievers and how do I tune them?
Retrievers control how assistants fetch context: target KBs, max tokens, filters, re‑ranking.
- Create: Create a Retriever
- Tune: similarity thresholds, top‑K, metadata filters, rerankers
- Advanced: modular RAG Engines with retrieval/rerank/response stages
SDK references: RAG Engines, Rerank Drivers.
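The knobs above (top‑K, similarity threshold) can be seen concretely in a toy in-memory retriever. This stdlib-only sketch uses cosine similarity over pre-computed vectors; it is an illustration of the mechanics, not the product's retrieval implementation:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, index, top_k=3, threshold=0.0):
    """index: list of (doc_id, vector). Returns up to top_k matches
    scoring at or above threshold, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index]
    scored = [s for s in scored if s[1] >= threshold]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]


index = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
hits = retrieve([1.0, 0.0], index, top_k=2, threshold=0.5)
# hits -> [("a", 1.0), ("c", ~0.707)]; "b" is filtered out by the threshold
```

Raising the threshold trades recall for precision; lowering top‑K reduces prompt size and latency.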
How do Assistants call Tools and Structures?
Assistants orchestrate retrieval + generation + actions. Use Tools for external systems, or promote Structures (pipelines/workflows/agents) as callable Tools.
- Build assistants: Create an Assistant with Rulesets, Retrievers, and Tools
- Structures: package logic as a ZIP and deploy Structures; expose as Tool or run standalone
- Memory and threads: see Threads and conversation memory drivers
SDK references: Structures, Tools, Assistant Drivers, Conversation Memory Drivers.
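Conceptually, a Tool is a named callable the assistant can dispatch to, and a promoted Structure is just such a callable. A toy dispatcher (class and method names are illustrative; in the real product the model chooses the tool, not the caller):

```python
from typing import Callable


class ToyAssistant:
    """Minimal sketch of tool registration and dispatch."""

    def __init__(self) -> None:
        self.tools: dict[str, Callable[[str], str]] = {}

    def register_tool(self, name: str, fn: Callable[[str], str]) -> None:
        self.tools[name] = fn

    def use(self, tool: str, arg: str) -> str:
        if tool not in self.tools:
            raise KeyError(f"tool not registered: {tool}")
        return self.tools[tool](arg)


# A "Structure" (here just a function) promoted to a callable Tool.
def ticket_lookup(ticket_id: str) -> str:
    return f"ticket {ticket_id}: open"


bot = ToyAssistant()
bot.register_tool("ticket_lookup", ticket_lookup)
result = bot.use("ticket_lookup", "42")  # "ticket 42: open"
```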
Where does data live and how is access governed?
- Documents: ingest via Data Sources; store in governed Data Lake or Postgres
- Embeddings/vectors: Postgres pgvector (recommended) or another configured vector store
- Governance: enforce permissions in source systems; restrict tool usage per project; audit retrieval and tool calls with threads/logs
See: Configure Data Lake, Vector Engine.
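"Restrict tool usage per project" typically means a deny-by-default allowlist checked before any tool call. A hypothetical sketch (project and tool names are invented for illustration):

```python
# Hypothetical per-project tool allowlist; names are illustrative only.
ALLOWED_TOOLS: dict[str, set[str]] = {
    "support": {"search_kb", "create_ticket"},
    "analytics": {"run_query"},
}


def authorize(project: str, tool: str) -> bool:
    """Deny by default: unknown projects and unlisted tools are rejected."""
    return tool in ALLOWED_TOOLS.get(project, set())
```

Pair a check like this with audit logging of every retrieval and tool call so thread logs show who invoked what.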
How do I improve latency and control cost?
- Retrieval: reduce top‑K, improve chunking/metadata, use re‑rank selectively
- Generation: choose the right model per route; stream responses; batch where safe
- Caching: memoize embeddings, hot retrievals, and tool outputs when possible
- Infra: colocate models and KBs; scale with Model Serving autoscaling
See SDK: Engines and Drivers. See Models: Model Serving.
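Memoizing embeddings is often the cheapest win, since identical inputs always produce identical vectors. A sketch using the standard library's `functools.lru_cache` (the `embed` function here is a hypothetical stand-in for a real embedding API call):

```python
from functools import lru_cache

CALLS = {"n": 0}  # counts real (non-cached) embedding computations


@lru_cache(maxsize=1024)
def embed(text: str) -> tuple:
    """Stand-in for a remote embedding call; returns a hashable tuple."""
    CALLS["n"] += 1
    return tuple(ord(c) % 7 for c in text)


v1 = embed("what is pgvector?")
v2 = embed("what is pgvector?")  # served from cache; no second computation
```

The same memoization idea extends to hot retrieval results and idempotent tool outputs, with an appropriate eviction/TTL policy for data that can go stale.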
How do I test, observe, and troubleshoot?
- Testing: golden sets, conversation playbooks, SDK unit tests
- Observability: thread logs, assistant/structure run events; optionally export to your observability stack
- Debugging: verify retrieval set first; inspect ruleset changes; re‑run the same structure/assistant with stored inputs
Docs: Threads, SDK Structures/observability.
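A golden set can start as simply as (question, expected-substring) pairs scored against the assistant. A minimal harness sketch (the `evaluate` helper and the fake assistant are illustrative, not part of the SDK):

```python
from typing import Callable


def evaluate(assistant: Callable[[str], str], golden: list[tuple[str, str]]) -> float:
    """golden: (question, expected_substring) pairs; returns pass rate in [0, 1]."""
    passed = sum(
        1 for question, expected in golden
        if expected.lower() in assistant(question).lower()
    )
    return passed / len(golden)


# Fake assistant standing in for a deployed one.
def bot(q: str) -> str:
    return "Embeddings are stored in pgvector." if "embedding" in q.lower() else "I don't know."


rate = evaluate(bot, [
    ("Where do embeddings live?", "pgvector"),
    ("What color is the sky?", "blue"),
])
# rate == 0.5: the first golden question passes, the second fails
```

Run the set after every ruleset or chunking change; a dropped pass rate localizes regressions before users see them.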
Typical use cases and patterns
- Enterprise knowledge assistants: KB + Retriever + Assistant + Tooling (tickets/CRM)
- Customer support copilots: policy/FAQ KBs + routing + guardrails via Rulesets
- Workflow bots: Structures + Tools for approvals, data enrichment, and reporting
- RAG for analytics: pgvector + Pipelines + Assistant for guided exploration
Explore: Hybrid KB best practices, Quickstart UI.
Where to find SDK references and examples?
- SDK overview: Gen AI Builder SDK
- Data: Artifacts, Loaders, Chunkers
- Drivers: Prompt/Assistant, Embedding, Vector Stores
- Structures: Tasks, Pipelines, Workflows
- Tools: Overview
See also the product guides: assistants, knowledge bases, retrievers, rulesets, structures, tools, threads.