Gen AI Builder on Hybrid Manager v1.3

Overview

Gen AI Builder within Hybrid Manager enables organizations to create intelligent applications that combine Large Language Models with organizational data, all running within your controlled Kubernetes infrastructure. This platform provides visual and programmatic interfaces for building retrieval-augmented generation (RAG) systems, conversational assistants, and automated workflows without external dependencies. See the Gen AI hub.

The integration ensures complete data sovereignty—your knowledge bases, model interactions, and conversation histories remain within your infrastructure while leveraging state-of-the-art AI capabilities through locally deployed models.

Core Architecture

LLM Integration

Gen AI Builder connects to models deployed through Hybrid Manager's Model Serving infrastructure, providing seamless access to language models for text generation, embeddings, and reranking operations.

Model Access Patterns

  • Internal endpoints via cluster DNS for low-latency inference
  • External endpoints through secured gateways for application integration
  • OpenAI-compatible APIs enabling standard client libraries (see OpenAI API compatibility)
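
Because the serving layer speaks the OpenAI wire format, any standard HTTP client can reach it. A minimal sketch using only the Python standard library; the endpoint hostname is a hypothetical cluster DNS name, so substitute the service address and model name from your own deployment:

```python
import json
import urllib.request

# Assumed in-cluster endpoint; the hostname below is a hypothetical cluster
# DNS name, so substitute the service address of your deployed model.
ENDPOINT = "http://llm-service.models.svc.cluster.local/v1/chat/completions"

def build_request(prompt: str, model: str = "llama-3-8b-instruct") -> dict:
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str) -> str:
    """POST the payload to the endpoint and return the reply text."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the payload is the standard chat-completions format, the same client code works unchanged against any model exposed through an OpenAI-compatible endpoint.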

Supported Operations

  • Chat completions for conversational interactions
  • Embeddings for semantic search and knowledge base creation
  • Reranking for result optimization in retrieval workflows

Knowledge Base Infrastructure

Knowledge bases transform organizational data into searchable, AI-ready formats through embedding and indexing within PostgreSQL databases enhanced with the pgvector extension and the Vector Engine. See Knowledge Bases and Knowledge Base pipeline.

Data Processing Pipeline

  1. Ingestion: Connect to data sources including databases, documents, and APIs
  2. Chunking: Split content into semantically meaningful segments
  3. Embedding: Generate vector representations using deployed models
  4. Indexing: Store vectors in PostgreSQL with metadata for filtering
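
The four steps above can be sketched in Python. The chunker is a simple word-window splitter; the `embed()` callable, `kb_chunks` table, and connection object are hypothetical stand-ins for your deployed embedding model and PostgreSQL instance:

```python
import json

def chunk(text: str, max_words: int = 200, overlap: int = 20) -> list:
    """Step 2: split content into overlapping word-window segments."""
    words = text.split()
    step = max_words - overlap
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), step)]

# Steps 3-4, sketched. The table name, embed() callable, and connection object
# are hypothetical; embed() is assumed to return a pgvector-compatible literal
# such as "[0.1, 0.2, ...]".
INSERT_SQL = """
INSERT INTO kb_chunks (content, embedding, metadata)
VALUES (%s, %s::vector, %s::jsonb)
"""

def index_document(conn, embed, doc_text: str, metadata: dict) -> int:
    """Chunk a document, embed each chunk, and store it for retrieval."""
    chunks = chunk(doc_text)
    with conn.cursor() as cur:
        for c in chunks:
            cur.execute(INSERT_SQL, (c, embed(c), json.dumps(metadata)))
    conn.commit()
    return len(chunks)
```

The overlap between adjacent chunks preserves context that would otherwise be cut at segment boundaries; tune `max_words` and `overlap` to your content.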

Storage Architecture

  • Vector embeddings stored in PostgreSQL with pgvector extension
  • Document metadata maintained for filtering and attribution
  • Original content preserved for reference and regeneration
  • Incremental updates supported for dynamic knowledge bases

Building Intelligent Applications

Assistant Development

Assistants orchestrate the interaction between users, knowledge bases, and external systems through configurable workflows that combine retrieval, reasoning, and action. See Assistants.

Configuration Components

  • Directive: Define the assistant's purpose and behavior
  • Knowledge Integration: Connect relevant Knowledge Bases for context
  • Tool Binding: Enable external system interactions via Tools
  • Model Selection: Choose appropriate LLM for generation tasks

Example: Customer Service Assistant

assistant:
  name: customer-support
  directive: "Help customers with product inquiries and order status"
  knowledge_bases:            # knowledge bases searched for context
    - product-documentation
    - order-database
  tools:                      # external actions the assistant may invoke
    - order-tracking-api
    - ticket-creation
  model: llama-3-8b-instruct  # LLM deployed through Model Serving

Retrieval-Augmented Generation

RAG patterns ground LLM responses in organizational knowledge, reducing hallucination while maintaining conversational capabilities. See Retrievers.

Retrieval Flow

  1. User query triggers semantic search across knowledge bases
  2. Relevant documents retrieved based on vector similarity
  3. Context provided to LLM along with user query
  4. Response generated using retrieved information
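
Steps 2 and 3 of the flow above can be sketched as a pgvector similarity query plus a prompt-assembly helper; the table and column names are illustrative:

```python
# Step 2, sketched: nearest-neighbour search with pgvector's cosine-distance
# operator (table and column names are illustrative).
SEARCH_SQL = """
SELECT content FROM kb_chunks
ORDER BY embedding <=> %s::vector
LIMIT %s
"""

def build_rag_prompt(query: str, passages: list) -> str:
    """Step 3: pack retrieved passages and the user query into one prompt."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below, citing passage numbers.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

Numbering the passages lets the model cite its sources, which supports the attribution and auditability goals discussed elsewhere in this document.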

Optimization Strategies

  • Hybrid search combining vector and keyword matching
  • Metadata filtering for domain-specific retrieval
  • Reranking to improve result relevance (see Using NVIDIA NIM)
  • Context window management for large documents
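
Hybrid search and metadata filtering can be combined in a single PostgreSQL query. The sketch below (held as a Python string, with illustrative table, column, and weight values) blends pgvector cosine similarity with full-text rank:

```python
# Hybrid search sketch: blend pgvector cosine similarity with full-text rank,
# plus a metadata filter. Table, column, and weight values are illustrative.
HYBRID_SQL = """
SELECT content,
       1 - (embedding <=> %(qvec)s::vector) AS vec_score,
       ts_rank(to_tsvector('english', content),
               plainto_tsquery('english', %(qtext)s)) AS kw_score
FROM kb_chunks
WHERE metadata->>'department' = %(dept)s
ORDER BY 0.7 * (1 - (embedding <=> %(qvec)s::vector))
       + 0.3 * ts_rank(to_tsvector('english', content),
                       plainto_tsquery('english', %(qtext)s)) DESC
LIMIT 10
"""
```

Weighting the two scores (0.7/0.3 here) is a tuning decision; keyword rank helps with exact identifiers such as SKUs or error codes that pure vector similarity can miss.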

Tool Integration

Tools extend assistant capabilities by enabling interactions with external systems and APIs while maintaining security boundaries. See Tools.

Tool Types

  • REST APIs: Connect to external services with authentication
  • Database Queries: Execute SQL against operational databases
  • Internal Services: Call Kubernetes services within the cluster
  • Custom Functions: Deploy specialized logic as containerized services
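
A REST tool registration might look like the following; the field names, endpoint, and parameter schema are illustrative, not a fixed format:

```yaml
tool:
  name: order-tracking-api
  type: rest
  endpoint: https://orders.internal.example.com/v1/status
  auth:
    kind: bearer
    secret_ref: order-api-token   # Kubernetes secret holding the credential
  parameters:
    - name: order_id
      type: string
      required: true
```

Keeping credentials behind a `secret_ref` rather than inline in the tool definition is what allows the security controls below to apply uniformly.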

Security Controls

  • Credential management through Kubernetes secrets
  • Rate limiting to prevent abuse
  • Audit logging for compliance tracking
  • Scope restrictions per assistant configuration

Common Use Cases

Enterprise Knowledge Management

Organizations consolidate disparate information sources into unified, searchable knowledge bases accessible through natural language interfaces.

Implementation Pattern

  • Ingest documentation from multiple repositories (Confluence, SharePoint, Google Drive)
  • Create semantic indices for cross-source search
  • Deploy assistants for different departments with filtered access
  • Maintain audit trails for compliance requirements

Example: Technical Documentation Assistant

A software company indexes all technical documentation, API references, and support tickets. Engineers query the system using natural language, receiving contextual answers with source citations.

Customer Service Automation

Intelligent assistants handle customer inquiries by combining product knowledge with real-time system access.

Capabilities

  • Answer product questions using documentation knowledge base
  • Check order status through API integration
  • Create support tickets for complex issues
  • Escalate to human agents when necessary

Example: E-commerce Support Bot

An online retailer deploys an assistant that handles 70% of customer inquiries automatically, accessing order systems, product catalogs, and return policies through unified interfaces.

Compliance and Audit Support

Organizations leverage Gen AI to navigate complex regulatory requirements and maintain compliance documentation.

Features

  • Query regulatory knowledge bases for specific requirements
  • Generate compliance reports from operational data
  • Track decision rationale through conversation threads
  • Maintain immutable audit logs for review

Example: Financial Compliance Assistant

A bank creates an assistant that helps compliance officers interpret regulations, generate required reports, and document decision-making processes with full traceability.

Operational Considerations

Performance Optimization

System performance depends on multiple factors requiring careful tuning for production deployments.

Key Metrics

  • Retrieval Latency: Time to fetch relevant documents
  • Generation Latency: LLM inference time for responses
  • End-to-End Latency: Total time from query to response
  • Throughput: Concurrent requests handled

Optimization Techniques

  • Knowledge base partitioning for parallel search
  • Response caching for frequently asked questions
  • Model quantization for improved inference speed
  • Connection pooling for database access
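
Response caching for frequently asked questions can be as simple as a TTL-keyed dictionary in front of the generation step; a minimal sketch, with normalization and TTL choices that are illustrative:

```python
from __future__ import annotations

import time

class ResponseCache:
    """TTL cache for frequently asked questions (a minimal sketch)."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    def _key(self, query: str) -> str:
        # Light normalization so trivially different phrasings share an entry.
        return " ".join(query.lower().split())

    def get(self, query: str) -> str | None:
        """Return a cached response if present and not expired."""
        entry = self._store.get(self._key(query))
        if entry and time.monotonic() - entry[0] < self.ttl:
            return entry[1]
        return None

    def put(self, query: str, response: str) -> None:
        self._store[self._key(query)] = (time.monotonic(), response)
```

A short TTL keeps cached answers from drifting out of date when the underlying knowledge base is updated incrementally.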

Resource Management

Gen AI applications require careful resource allocation across compute, memory, and storage dimensions.

Resource Requirements

  • Knowledge Base Storage: PostgreSQL capacity for vectors and metadata
  • Model Serving: GPU resources for LLM inference
  • Application Pods: CPU and memory for orchestration logic
  • Network Bandwidth: Data transfer between components

Scaling Strategies

  • Horizontal scaling of retrieval services
  • GPU sharing for development workloads
  • Dedicated resources for production assistants
  • Auto-scaling based on request patterns
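
Auto-scaling of stateless retrieval services can use a standard Kubernetes HorizontalPodAutoscaler; the deployment name and thresholds below are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: retrieval-service        # illustrative deployment name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: retrieval-service
  minReplicas: 2                 # keep headroom for steady traffic
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

GPU-backed model serving typically scales separately from the retrieval tier, since its capacity is bounded by accelerator availability rather than CPU.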

Security Framework

Comprehensive security controls protect sensitive data throughout the application lifecycle.

Access Control

  • Role-based permissions for assistant access
  • Knowledge base filtering based on user context
  • Tool authorization per assistant configuration
  • API key management for external access

Data Protection

  • Encryption at rest for knowledge base content
  • TLS for all inter-service communication
  • Sanitization of sensitive information in logs
  • Secure credential storage in Kubernetes secrets
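
Credentials referenced by tools and assistants can live in standard Kubernetes secrets; a minimal example (name and key are illustrative):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: order-api-token          # referenced by tool configuration
type: Opaque
stringData:
  token: "<paste credential here>"
```

Storing the credential here, rather than in the assistant or tool definition, keeps it out of version control and lets etcd encryption at rest cover it.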

Monitoring and Observability

Conversation Tracking

Thread management provides comprehensive visibility into assistant interactions for debugging and compliance. See Threads.

Thread Components

  • User queries and assistant responses
  • Retrieved documents with relevance scores
  • Tool invocations with parameters and results
  • Token usage and latency metrics

Analysis Capabilities

  • Response quality evaluation
  • Knowledge gap identification
  • Performance bottleneck detection
  • User satisfaction tracking

System Monitoring

Operational metrics ensure reliable service delivery and capacity planning.

Monitoring Areas

  • Knowledge base query patterns and performance
  • Model endpoint utilization and latency
  • Tool API success rates and response times
  • Resource consumption trends

Alerting Scenarios

  • High latency exceeding SLA thresholds
  • Failed tool invocations affecting functionality
  • Resource exhaustion risks
  • Unusual usage patterns indicating issues

Implementation Guide

Getting Started

Organizations should begin with focused use cases that demonstrate value before expanding deployments.

Initial Setup

  1. Deploy embedding model for knowledge base creation
  2. Configure data sources for initial knowledge base
  3. Deploy LLM for generation tasks
  4. Create simple assistant with basic retrieval

Validation Steps

  • Test retrieval quality with sample queries
  • Verify response accuracy against source documents
  • Measure performance under expected load
  • Review security controls and access patterns
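
Retrieval quality can be spot-checked with a small harness that measures how often the expected source document appears in the results for a set of sample queries; a minimal sketch:

```python
def hit_rate(results: dict, expected: dict) -> float:
    """Fraction of sample queries whose expected source document appears in
    the retrieved list; a rough first-pass retrieval-quality metric."""
    if not expected:
        return 0.0
    hits = sum(1 for q, doc in expected.items() if doc in results.get(q, []))
    return hits / len(expected)
```

Running this against a fixed query set before and after knowledge base changes gives an early signal of retrieval regressions.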

Best Practices

Knowledge Base Design

  • Maintain clear document boundaries for attribution
  • Implement regular update cycles for fresh content
  • Use metadata effectively for filtering
  • Monitor retrieval quality metrics

Assistant Configuration

  • Start with restrictive directives and expand gradually
  • Test edge cases and failure scenarios
  • Implement fallback strategies for tool failures
  • Version control assistant configurations

Production Deployment

  • Establish baseline performance metrics
  • Implement gradual rollout strategies
  • Monitor user feedback and satisfaction
  • Maintain configuration backups

Integration with Hybrid Manager

Project Organization

Gen AI resources deploy within Hybrid Manager projects, inheriting namespace isolation and resource controls.

Project Resources

  • Knowledge bases as PostgreSQL databases
  • Assistants as Kubernetes deployments
  • Tools as service integrations
  • Threads as persisted conversation records

UI Workflows

Hybrid Manager provides visual interfaces for Gen AI development without requiring deep technical expertise.

Visual Capabilities

  • Drag-and-drop assistant configuration
  • Knowledge base creation wizards
  • Tool registration interfaces
  • Conversation testing environments

Programmatic Access

SDKs and APIs enable integration with existing applications and automation workflows.

Integration Options

  • Python SDK for application development
  • REST APIs for service integration
  • Kubernetes operators for GitOps workflows
  • CLI tools for automation scripts

Next Steps

Begin exploring Gen AI Builder capabilities:

  1. Review Concepts: Understand core concepts and architecture
  2. Create Knowledge Base: Follow the knowledge base guide
  3. Build Assistant: Use the assistant creation guide
  4. Deploy Application: Implement using the quickstart UI

For detailed documentation: