Gen AI Builder on Hybrid Manager v1.3
Overview
Gen AI Builder within Hybrid Manager enables organizations to create intelligent applications that combine Large Language Models with organizational data, all running within your controlled Kubernetes infrastructure. This platform provides visual and programmatic interfaces for building retrieval-augmented generation (RAG) systems, conversational assistants, and automated workflows without external dependencies. See the Gen AI hub.
The integration ensures complete data sovereignty—your knowledge bases, model interactions, and conversation histories remain within your infrastructure while leveraging state-of-the-art AI capabilities through locally deployed models.
Core Architecture
LLM Integration
Gen AI Builder connects to models deployed through Hybrid Manager's Model Serving infrastructure, providing seamless access to language models for text generation, embeddings, and reranking operations.
Model Access Patterns
- Internal endpoints via cluster DNS for low-latency inference
- External endpoints through secured gateways for application integration
- OpenAI-compatible APIs enabling standard client libraries (see OpenAI API compatibility)
Supported Operations
- Chat completions for conversational interactions
- Embeddings for semantic search and knowledge base creation
- Reranking for result optimization in retrieval workflows
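The sketch below shows how an application might exercise these access patterns and operations through the OpenAI-compatible endpoint using the standard `openai` Python client. The internal hostname, API key handling, and model names are placeholders for your deployment, not fixed values.

```python
# Minimal sketch: calling models served through Hybrid Manager's
# OpenAI-compatible endpoints. The base_url values, API key handling, and
# model names are placeholders -- substitute the values for your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://llama-3-8b-instruct.models.svc.cluster.local:8000/v1",  # hypothetical internal endpoint
    api_key="not-needed-in-cluster",  # or a real key when calling through an external gateway
)

# Chat completion for conversational interactions
chat = client.chat.completions.create(
    model="llama-3-8b-instruct",
    messages=[{"role": "user", "content": "Summarize our return policy."}],
)
print(chat.choices[0].message.content)

# Embeddings for semantic search and knowledge base creation
emb = client.embeddings.create(
    model="nv-embedqa-e5-v5",  # placeholder embedding model name
    input=["How do I reset my password?"],
)
print(len(emb.data[0].embedding))
```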
Knowledge Base Infrastructure
Knowledge bases transform organizational data into searchable, AI-ready formats through embedding and indexing within PostgreSQL databases enhanced with the pgvector extension (see Vector Engine concepts). See Knowledge Bases and the Knowledge Base pipeline.
Data Processing Pipeline
- Ingestion: Connect to data sources including databases, documents, and APIs
- Chunking: Split content into semantically meaningful segments
- Embedding: Generate vector representations using deployed models
- Indexing: Store vectors in PostgreSQL with metadata for filtering
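As a rough illustration of this pipeline, the sketch below chunks a document, generates embeddings through a deployed embedding model, and indexes the vectors in PostgreSQL with pgvector. The connection string, table layout, and model name are assumptions, not Gen AI Builder internals.

```python
# Illustrative pipeline sketch: chunk a document, embed each chunk with a
# deployed embedding model, and index the vectors in PostgreSQL with pgvector.
# Connection details, table layout, and model names are assumptions.
import psycopg2
from openai import OpenAI

embedder = OpenAI(base_url="http://embeddings.models.svc.cluster.local:8000/v1", api_key="unused")

def load_documents():
    # Placeholder ingestion step; replace with connectors to your data sources.
    yield ("example-doc", "Example document text covering the return policy ...")

def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    # Naive fixed-size chunking; production pipelines usually split on semantic boundaries.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

# Assumes: CREATE EXTENSION vector;
#          CREATE TABLE chunks (id serial, source text, content text, embedding vector(1024));
conn = psycopg2.connect("dbname=knowledge user=app host=kb-postgres")
with conn, conn.cursor() as cur:
    for doc_id, text in load_documents():
        for piece in chunk(text):
            vec = embedder.embeddings.create(model="nv-embedqa-e5-v5", input=[piece]).data[0].embedding
            cur.execute(
                "INSERT INTO chunks (source, content, embedding) VALUES (%s, %s, %s)",
                (doc_id, piece, "[" + ",".join(map(str, vec)) + "]"),  # pgvector accepts the '[...]' text form
            )
conn.close()
```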
Storage Architecture
- Vector embeddings stored in PostgreSQL with pgvector extension
- Document metadata maintained for filtering and attribution
- Original content preserved for reference and regeneration
- Incremental updates supported for dynamic knowledge bases
Building Intelligent Applications
Assistant Development
Assistants orchestrate the interaction between users, knowledge bases, and external systems through configurable workflows that combine retrieval, reasoning, and action. See Assistants.
Configuration Components
- Directive: Define the assistant's purpose and behavior
- Knowledge Integration: Connect relevant Knowledge Bases for context
- Tool Binding: Enable external system interactions via Tools
- Model Selection: Choose appropriate LLM for generation tasks
Example: Customer Service Assistant
```yaml
assistant:
  name: customer-support
  directive: "Help customers with product inquiries and order status"
  knowledge_bases:
    - product-documentation
    - order-database
  tools:
    - order-tracking-api
    - ticket-creation
  model: llama-3-8b-instruct
```
Retrieval-Augmented Generation
RAG patterns ground LLM responses in organizational knowledge, reducing hallucination while maintaining conversational capabilities. See Retrievers.
Retrieval Flow
- User query triggers semantic search across knowledge bases
- Relevant documents retrieved based on vector similarity
- Context provided to LLM along with user query
- Response generated using retrieved information
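A minimal sketch of this flow is shown below, reusing the endpoints and `chunks` table from the ingestion example; the names remain placeholders rather than the Gen AI Builder API itself.

```python
# Sketch of the retrieval flow: embed the query, run a similarity search in
# pgvector, and ground the LLM response in the retrieved context.
import psycopg2
from openai import OpenAI

embedder = OpenAI(base_url="http://embeddings.models.svc.cluster.local:8000/v1", api_key="unused")
llm = OpenAI(base_url="http://llama-3-8b-instruct.models.svc.cluster.local:8000/v1", api_key="unused")

def answer(question: str, k: int = 4) -> str:
    qvec = embedder.embeddings.create(model="nv-embedqa-e5-v5", input=[question]).data[0].embedding
    qvec_text = "[" + ",".join(str(x) for x in qvec) + "]"

    with psycopg2.connect("dbname=knowledge user=app host=kb-postgres") as conn, conn.cursor() as cur:
        # <=> is pgvector's cosine-distance operator; smaller means more similar.
        cur.execute(
            "SELECT content FROM chunks ORDER BY embedding <=> %s::vector LIMIT %s",
            (qvec_text, k),
        )
        context = "\n\n".join(row[0] for row in cur.fetchall())

    response = llm.chat.completions.create(
        model="llama-3-8b-instruct",
        messages=[
            {"role": "system", "content": "Answer using only the provided context. Cite sources when possible."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```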
Optimization Strategies
- Hybrid search combining vector and keyword matching
- Metadata filtering for domain-specific retrieval
- Reranking to improve result relevance (see Using NVIDIA NIM)
- Context window management for large documents
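One way to sketch hybrid search with metadata filtering is the SQL below, which assumes an additional `metadata` jsonb column on the chunks table; the weighting, column names, and filter key are illustrative.

```python
# Hedged sketch of hybrid retrieval: blend pgvector similarity with
# PostgreSQL full-text keyword matching and a metadata filter. Execute with
# psycopg2 named parameters; table and column names are illustrative.
HYBRID_QUERY = """
SELECT content,
       1 - (embedding <=> %(qvec)s::vector)             AS vector_score,
       ts_rank(to_tsvector('english', content),
               plainto_tsquery('english', %(qtext)s))   AS keyword_score
FROM chunks
WHERE metadata ->> 'department' = %(department)s        -- metadata filter
ORDER BY 0.7 * (1 - (embedding <=> %(qvec)s::vector))
       + 0.3 * ts_rank(to_tsvector('english', content),
                       plainto_tsquery('english', %(qtext)s)) DESC
LIMIT 10;
"""
# The top results can then be passed to a reranking model (for example, an
# NVIDIA NIM reranker) before building the final LLM prompt.
```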
Tool Integration
Tools extend assistant capabilities by enabling interactions with external systems and APIs while maintaining security boundaries. See Tools.
Tool Types
- REST APIs: Connect to external services with authentication
- Database Queries: Execute SQL against operational databases
- Internal Services: Call Kubernetes services within the cluster
- Custom Functions: Deploy specialized logic as containerized services
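As a hedged illustration, a REST tool might be wrapped like the function below, with credentials supplied through environment variables populated from a Kubernetes secret. The endpoint, parameters, and the way the assistant invokes the function are assumptions.

```python
# Illustrative tool wrapper for a REST API. The URL, path, and response shape
# are hypothetical; credentials come from environment variables that a
# Kubernetes secret would populate.
import os
import requests

ORDER_API_URL = os.environ.get("ORDER_API_URL", "https://orders.internal.example.com")
ORDER_API_TOKEN = os.environ["ORDER_API_TOKEN"]  # mounted from a Kubernetes secret

def order_status(order_id: str) -> dict:
    """Look up the status of an order; returns a JSON-serializable dict for the assistant."""
    resp = requests.get(
        f"{ORDER_API_URL}/v1/orders/{order_id}",
        headers={"Authorization": f"Bearer {ORDER_API_TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```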
Security Controls
- Credential management through Kubernetes secrets
- Rate limiting to prevent abuse
- Audit logging for compliance tracking
- Scope restrictions per assistant configuration
Common Use Cases
Enterprise Knowledge Management
Organizations consolidate disparate information sources into unified, searchable knowledge bases accessible through natural language interfaces.
Implementation Pattern
- Ingest documentation from multiple repositories (Confluence, SharePoint, Google Drive)
- Create semantic indices for cross-source search
- Deploy assistants for different departments with filtered access
- Maintain audit trails for compliance requirements
Example: Technical Documentation Assistant
A software company indexes all technical documentation, API references, and support tickets. Engineers query the system using natural language, receiving contextual answers with source citations.
Customer Service Automation
Intelligent assistants handle customer inquiries by combining product knowledge with real-time system access.
Capabilities
- Answer product questions using documentation knowledge base
- Check order status through API integration
- Create support tickets for complex issues
- Escalate to human agents when necessary
Example: E-commerce Support Bot
An online retailer deploys an assistant that handles 70% of customer inquiries automatically, accessing order systems, product catalogs, and return policies through unified interfaces.
Compliance and Audit Support
Organizations leverage Gen AI to navigate complex regulatory requirements and maintain compliance documentation.
Features
- Query regulatory knowledge bases for specific requirements
- Generate compliance reports from operational data
- Track decision rationale through conversation threads
- Maintain immutable audit logs for review
Example: Financial Compliance Assistant
A bank creates an assistant that helps compliance officers interpret regulations, generate required reports, and document decision-making processes with full traceability.
Operational Considerations
Performance Optimization
System performance depends on multiple factors requiring careful tuning for production deployments.
Key Metrics
- Retrieval Latency: Time to fetch relevant documents
- Generation Latency: LLM inference time for responses
- End-to-End Latency: Total time from query to response
- Throughput: Concurrent requests handled
Optimization Techniques
- Knowledge base partitioning for parallel search
- Response caching for frequently asked questions
- Model quantization for improved inference speed
- Connection pooling for database access
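For example, response caching for frequently asked questions can be sketched as below. A production deployment would more likely use a shared store such as Redis with expiry; `answer()` refers to the retrieval sketch shown earlier.

```python
# Simple in-process sketch of response caching keyed on the normalized query.
import hashlib
import time

_CACHE: dict[str, tuple[float, str]] = {}
TTL_SECONDS = 3600

def cached_answer(question: str) -> str:
    key = hashlib.sha256(question.strip().lower().encode()).hexdigest()
    hit = _CACHE.get(key)
    if hit and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]                      # serve the cached response
    result = answer(question)              # answer() from the retrieval sketch above
    _CACHE[key] = (time.time(), result)
    return result
```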
Resource Management
Gen AI applications require careful resource allocation across compute, memory, and storage dimensions.
Resource Requirements
- Knowledge Base Storage: PostgreSQL capacity for vectors and metadata
- Model Serving: GPU resources for LLM inference
- Application Pods: CPU and memory for orchestration logic
- Network Bandwidth: Data transfer between components
Scaling Strategies
- Horizontal scaling of retrieval services
- GPU sharing for development workloads
- Dedicated resources for production assistants
- Auto-scaling based on request patterns
Security Framework
Comprehensive security controls protect sensitive data throughout the application lifecycle.
Access Control
- Role-based permissions for assistant access
- Knowledge base filtering based on user context
- Tool authorization per assistant configuration
- API key management for external access
Data Protection
- Encryption at rest for knowledge base content
- TLS for all inter-service communication
- Sanitization of sensitive information in logs
- Secure credential storage in Kubernetes secrets
Monitoring and Observability
Conversation Tracking
Thread management provides comprehensive visibility into assistant interactions for debugging and compliance. See Threads.
Thread Components
- User queries and assistant responses
- Retrieved documents with relevance scores
- Tool invocations with parameters and results
- Token usage and latency metrics
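The dataclass below is an illustrative sketch of the information a single thread turn might capture for debugging and compliance; the field names are assumptions rather than the Threads schema.

```python
# Hedged sketch of a thread turn record; field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class ThreadTurn:
    user_query: str
    assistant_response: str
    retrieved_documents: list[dict] = field(default_factory=list)  # content, source, relevance score
    tool_invocations: list[dict] = field(default_factory=list)     # tool name, parameters, result
    prompt_tokens: int = 0
    completion_tokens: int = 0
    latency_ms: float = 0.0
```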
Analysis Capabilities
- Response quality evaluation
- Knowledge gap identification
- Performance bottleneck detection
- User satisfaction tracking
System Monitoring
Operational metrics ensure reliable service delivery and capacity planning.
Monitoring Areas
- Knowledge base query patterns and performance
- Model endpoint utilization and latency
- Tool API success rates and response times
- Resource consumption trends
Alerting Scenarios
- High latency exceeding SLA thresholds
- Failed tool invocations affecting functionality
- Resource exhaustion risks
- Unusual usage patterns indicating issues
Implementation Guide
Getting Started
Organizations should begin with focused use cases that demonstrate value before expanding deployments.
Initial Setup
- Deploy embedding model for knowledge base creation
- Configure data sources for initial knowledge base
- Deploy LLM for generation tasks
- Create simple assistant with basic retrieval
Validation Steps
- Test retrieval quality with sample queries
- Verify response accuracy against source documents
- Measure performance under expected load
- Review security controls and access patterns
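A simple retrieval-quality check can be sketched as below: for a set of sample queries with known source documents, measure how often the expected source appears in the top-k results. The `retrieve()` helper and the sample pairs are assumptions.

```python
# Sketch of a retrieval-quality check. retrieve(query, k) is assumed to
# return (content, source) pairs from the knowledge base.
SAMPLES = [
    ("How do I reset my password?", "auth-guide"),
    ("What is the return window for electronics?", "returns-policy"),
]

def hit_rate(retrieve, k: int = 4) -> float:
    hits = 0
    for query, expected_source in SAMPLES:
        sources = {source for _content, source in retrieve(query, k)}
        hits += expected_source in sources
    return hits / len(SAMPLES)
```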
Best Practices
Knowledge Base Design
- Maintain clear document boundaries for attribution
- Implement regular update cycles for fresh content
- Use metadata effectively for filtering
- Monitor retrieval quality metrics
Assistant Configuration
- Start with restrictive directives and expand gradually
- Test edge cases and failure scenarios
- Implement fallback strategies for tool failures
- Version control assistant configurations
Production Deployment
- Establish baseline performance metrics
- Implement gradual rollout strategies
- Monitor user feedback and satisfaction
- Maintain configuration backups
Integration with Hybrid Manager
Project Organization
Gen AI resources deploy within Hybrid Manager projects, inheriting namespace isolation and resource controls.
Project Resources
- Knowledge bases as PostgreSQL databases
- Assistants as Kubernetes deployments
- Tools as service integrations
- Threads as persistent storage
UI Workflows
Hybrid Manager provides visual interfaces for Gen AI development without requiring deep technical expertise.
Visual Capabilities
- Drag-and-drop assistant configuration
- Knowledge base creation wizards
- Tool registration interfaces
- Conversation testing environments
Programmatic Access
SDKs and APIs enable integration with existing applications and automation workflows.
Integration Options
- Python SDK for application development
- REST APIs for service integration
- Kubernetes operators for GitOps workflows
- CLI tools for automation scripts
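Purely as an illustration of REST integration, an application might call an assistant as sketched below. The URL path, payload shape, and authentication header are hypothetical; consult the Gen AI Builder API reference for the actual interface.

```python
# Hypothetical REST call to a deployed assistant from an existing application.
import os
import requests

resp = requests.post(
    "https://hybrid-manager.example.com/api/genai/assistants/customer-support/chat",  # hypothetical path
    headers={"Authorization": f"Bearer {os.environ['HM_API_TOKEN']}"},
    json={"message": "Where is order 12345?", "thread_id": None},
    timeout=30,
)
print(resp.json())
```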
Next Steps
Begin exploring Gen AI Builder capabilities:
- Review Concepts: Understand core concepts and architecture
- Create Knowledge Base: Follow the knowledge base guide
- Build Assistant: Use the assistant creation guide
- Deploy Application: Implement using the quickstart UI
For detailed documentation: