Gen AI Builder on Hybrid Manager v1.3

Overview

Gen AI Builder within Hybrid Manager enables organizations to create intelligent applications that combine Large Language Models with organizational data, all running within your controlled Kubernetes infrastructure. This platform provides visual and programmatic interfaces for building retrieval-augmented generation (RAG) systems, conversational assistants, and automated workflows without external dependencies. See the Gen AI hub.

The integration ensures complete data sovereignty—your knowledge bases, model interactions, and conversation histories remain within your infrastructure while leveraging state-of-the-art AI capabilities through locally deployed models.

Core Architecture

LLM Integration

Gen AI Builder connects to models deployed through Hybrid Manager's Model Serving infrastructure, providing seamless access to language models for text generation, embeddings, and reranking operations.

Model Access Patterns

  • Internal endpoints via cluster DNS for low-latency inference
  • External endpoints through secured gateways for application integration
  • OpenAI-compatible APIs enabling standard client libraries (see OpenAI API compatibility)
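
Because the serving layer speaks the OpenAI wire format, any standard HTTP client can reach it. A minimal sketch using only the Python standard library; the endpoint hostname is a hypothetical cluster DNS name, so substitute the service address and model name from your own deployment:

```python
import json
import urllib.request

# Assumed in-cluster endpoint; the hostname below is a hypothetical cluster
# DNS name, so substitute the service address of your deployed model.
ENDPOINT = "http://llm-service.models.svc.cluster.local/v1/chat/completions"

def build_request(prompt: str, model: str = "llama-3-8b-instruct") -> dict:
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str) -> str:
    """POST the payload to the endpoint and return the reply text."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the payload is the standard chat-completions format, the same client code works unchanged against any model exposed through an OpenAI-compatible endpoint.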

Supported Operations

  • Chat completions for conversational interactions
  • Embeddings for semantic search and knowledge base creation
  • Reranking for result optimization in retrieval workflows

Knowledge Base Infrastructure

Knowledge bases transform organizational data into searchable, AI-ready formats through embedding and indexing within PostgreSQL databases enhanced with the pgvector extension and the Vector Engine. See Knowledge Bases and Knowledge Base pipeline.

Data Processing Pipeline

  1. Ingestion: Connect to data sources including databases, documents, and APIs
  2. Chunking: Split content into semantically meaningful segments
  3. Embedding: Generate vector representations using deployed models
  4. Indexing: Store vectors in PostgreSQL with metadata for filtering
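
The four steps above can be sketched in Python. The chunker is a simple word-window splitter; the `embed()` callable, `kb_chunks` table, and connection object are hypothetical stand-ins for your deployed embedding model and PostgreSQL instance:

```python
import json

def chunk(text: str, max_words: int = 200, overlap: int = 20) -> list:
    """Step 2: split content into overlapping word-window segments."""
    words = text.split()
    step = max_words - overlap
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), step)]

# Steps 3-4, sketched. The table name, embed() callable, and connection object
# are hypothetical; embed() is assumed to return a pgvector-compatible literal
# such as "[0.1, 0.2, ...]".
INSERT_SQL = """
INSERT INTO kb_chunks (content, embedding, metadata)
VALUES (%s, %s::vector, %s::jsonb)
"""

def index_document(conn, embed, doc_text: str, metadata: dict) -> int:
    """Chunk a document, embed each chunk, and store it for retrieval."""
    chunks = chunk(doc_text)
    with conn.cursor() as cur:
        for c in chunks:
            cur.execute(INSERT_SQL, (c, embed(c), json.dumps(metadata)))
    conn.commit()
    return len(chunks)
```

The overlap between adjacent chunks preserves context that would otherwise be cut at segment boundaries; tune `max_words` and `overlap` to your content.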

Storage Architecture

  • Vector embeddings stored in PostgreSQL with pgvector extension
  • Document metadata maintained for filtering and attribution
  • Original content preserved for reference and regeneration
  • Incremental updates supported for dynamic knowledge bases

Building Intelligent Applications

Assistant Development

Assistants orchestrate the interaction between users, knowledge bases, and external systems through configurable workflows that combine retrieval, reasoning, and action. See Assistants.

Configuration Components

  • Directive: Define the assistant's purpose and behavior
  • Knowledge Integration: Connect relevant Knowledge Bases for context
  • Tool Binding: Enable external system interactions via Tools
  • Model Selection: Choose appropriate LLM for generation tasks

Example: Customer Service Assistant

assistant:
  name: customer-support
  directive: "Help customers with product inquiries and order status"
  knowledge_bases:            # knowledge bases searched for context
    - product-documentation
    - order-database
  tools:                      # external actions the assistant may invoke
    - order-tracking-api
    - ticket-creation
  model: llama-3-8b-instruct  # LLM deployed through Model Serving

Retrieval-Augmented Generation

RAG patterns ground LLM responses in organizational knowledge, reducing hallucination while maintaining conversational capabilities. See Retrievers.

Retrieval Flow

  1. User query triggers semantic search across knowledge bases
  2. Relevant documents retrieved based on vector similarity
  3. Context provided to LLM along with user query
  4. Response generated using retrieved information
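
Steps 2 and 3 of the flow above can be sketched as a pgvector similarity query plus a prompt-assembly helper; the table and column names are illustrative:

```python
# Step 2, sketched: nearest-neighbour search with pgvector's cosine-distance
# operator (table and column names are illustrative).
SEARCH_SQL = """
SELECT content FROM kb_chunks
ORDER BY embedding <=> %s::vector
LIMIT %s
"""

def build_rag_prompt(query: str, passages: list) -> str:
    """Step 3: pack retrieved passages and the user query into one prompt."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below, citing passage numbers.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

Numbering the passages lets the model cite its sources, which supports the attribution and auditability goals discussed elsewhere in this document.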

Optimization Strategies

  • Hybrid search combining vector and keyword matching
  • Metadata filtering for domain-specific retrieval
  • Reranking to improve result relevance (see Using NVIDIA NIM)
  • Context window management for large documents
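
Hybrid search and metadata filtering can be combined in a single PostgreSQL query. The sketch below (held as a Python string, with illustrative table, column, and weight values) blends pgvector cosine similarity with full-text rank:

```python
# Hybrid search sketch: blend pgvector cosine similarity with full-text rank,
# plus a metadata filter. Table, column, and weight values are illustrative.
HYBRID_SQL = """
SELECT content,
       1 - (embedding <=> %(qvec)s::vector) AS vec_score,
       ts_rank(to_tsvector('english', content),
               plainto_tsquery('english', %(qtext)s)) AS kw_score
FROM kb_chunks
WHERE metadata->>'department' = %(dept)s
ORDER BY 0.7 * (1 - (embedding <=> %(qvec)s::vector))
       + 0.3 * ts_rank(to_tsvector('english', content),
                       plainto_tsquery('english', %(qtext)s)) DESC
LIMIT 10
"""
```

Weighting the two scores (0.7/0.3 here) is a tuning decision; keyword rank helps with exact identifiers such as SKUs or error codes that pure vector similarity can miss.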

Tool Integration

Tools extend assistant capabilities by enabling interactions with external systems and APIs while maintaining security boundaries. See Tools.

Tool Types

  • REST APIs: Connect to external services with authentication
  • Database Queries: Execute SQL against operational databases
  • Internal Services: Call Kubernetes services within the cluster
  • Custom Functions: Deploy specialized logic as containerized services
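
A REST tool registration might look like the following; the field names, endpoint, and parameter schema are illustrative, not a fixed format:

```yaml
tool:
  name: order-tracking-api
  type: rest
  endpoint: https://orders.internal.example.com/v1/status
  auth:
    kind: bearer
    secret_ref: order-api-token   # Kubernetes secret holding the credential
  parameters:
    - name: order_id
      type: string
      required: true
```

Keeping credentials behind a `secret_ref` rather than inline in the tool definition is what allows the security controls below to apply uniformly.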

Security Controls

  • Credential management through Kubernetes secrets
  • Rate limiting to prevent abuse
  • Audit logging for compliance tracking
  • Scope restrictions per assistant configuration

Common Use Cases

Enterprise Knowledge Management

Organizations consolidate disparate information sources into unified, searchable knowledge bases accessible through natural language interfaces.

Implementation Pattern

  • Ingest documentation from multiple repositories (Confluence, SharePoint, Google Drive)
  • Create semantic indices for cross-source search
  • Deploy assistants for different departments with filtered access
  • Maintain audit trails for compliance requirements

Example: Technical Documentation Assistant

A software company indexes all technical documentation, API references, and support tickets. Engineers query the system using natural language, receiving contextual answers with source citations.

Customer Service Automation

Intelligent assistants handle customer inquiries by combining product knowledge with real-time system access.

Capabilities

  • Answer product questions using documentation knowledge base
  • Check order status through API integration
  • Create support tickets for complex issues
  • Escalate to human agents when necessary

Example: E-commerce Support Bot

An online retailer deploys an assistant that handles 70% of customer inquiries automatically, accessing order systems, product catalogs, and return policies through unified interfaces.

Compliance and Audit Support

Organizations leverage Gen AI to navigate complex regulatory requirements and maintain compliance documentation.

Features

  • Query regulatory knowledge bases for specific requirements
  • Generate compliance reports from operational data
  • Track decision rationale through conversation threads
  • Maintain immutable audit logs for review

Example: Financial Compliance Assistant

A bank creates an assistant that helps compliance officers interpret regulations, generate required reports, and document decision-making processes with full traceability.

Operational Considerations

Performance Optimization

System performance depends on multiple factors requiring careful tuning for production deployments.

Key Metrics

  • Retrieval Latency: Time to fetch relevant documents
  • Generation Latency: LLM inference time for responses
  • End-to-End Latency: Total time from query to response
  • Throughput: Concurrent requests handled

Optimization Techniques

  • Knowledge base partitioning for parallel search
  • Response caching for frequently asked questions
  • Model quantization for improved inference speed
  • Connection pooling for database access
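
Response caching for frequently asked questions can be as simple as a TTL-keyed dictionary in front of the generation step; a minimal sketch, with normalization and TTL choices that are illustrative:

```python
from __future__ import annotations

import time

class ResponseCache:
    """TTL cache for frequently asked questions (a minimal sketch)."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    def _key(self, query: str) -> str:
        # Light normalization so trivially different phrasings share an entry.
        return " ".join(query.lower().split())

    def get(self, query: str) -> str | None:
        """Return a cached response if present and not expired."""
        entry = self._store.get(self._key(query))
        if entry and time.monotonic() - entry[0] < self.ttl:
            return entry[1]
        return None

    def put(self, query: str, response: str) -> None:
        self._store[self._key(query)] = (time.monotonic(), response)
```

A short TTL keeps cached answers from drifting out of date when the underlying knowledge base is updated incrementally.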

Resource Management

Gen AI applications require careful resource allocation across compute, memory, and storage dimensions.

Resource Requirements

  • Knowledge Base Storage: PostgreSQL capacity for vectors and metadata
  • Model Serving: GPU resources for LLM inference
  • Application Pods: CPU and memory for orchestration logic
  • Network Bandwidth: Data transfer between components

Scaling Strategies

  • Horizontal scaling of retrieval services
  • GPU sharing for development workloads
  • Dedicated resources for production assistants
  • Auto-scaling based on request patterns
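
Auto-scaling of stateless retrieval services can use a standard Kubernetes HorizontalPodAutoscaler; the deployment name and thresholds below are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: retrieval-service        # illustrative deployment name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: retrieval-service
  minReplicas: 2                 # keep headroom for steady traffic
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

GPU-backed model serving typically scales separately from the retrieval tier, since its capacity is bounded by accelerator availability rather than CPU.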

Security Framework

Comprehensive security controls protect sensitive data throughout the application lifecycle.

Access Control

  • Role-based permissions for assistant access
  • Knowledge base filtering based on user context
  • Tool authorization per assistant configuration
  • API key management for external access

Data Protection

  • Encryption at rest for knowledge base content
  • TLS for all inter-service communication
  • Sanitization of sensitive information in logs
  • Secure credential storage in Kubernetes secrets
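
Credentials referenced by tools and assistants can live in standard Kubernetes secrets; a minimal example (name and key are illustrative):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: order-api-token          # referenced by tool configuration
type: Opaque
stringData:
  token: "<paste credential here>"
```

Storing the credential here, rather than in the assistant or tool definition, keeps it out of version control and lets etcd encryption at rest cover it.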

Monitoring and Observability

Conversation Tracking

Thread management provides comprehensive visibility into assistant interactions for debugging and compliance. See Threads.

Thread Components

  • User queries and assistant responses
  • Retrieved documents with relevance scores
  • Tool invocations with parameters and results
  • Token usage and latency metrics

Analysis Capabilities

  • Response quality evaluation
  • Knowledge gap identification
  • Performance bottleneck detection
  • User satisfaction tracking

System Monitoring

Operational metrics ensure reliable service delivery and capacity planning.

Monitoring Areas

  • Knowledge base query patterns and performance
  • Model endpoint utilization and latency
  • Tool API success rates and response times
  • Resource consumption trends

Alerting Scenarios

  • High latency exceeding SLA thresholds
  • Failed tool invocations affecting functionality
  • Resource exhaustion risks
  • Unusual usage patterns indicating issues

Implementation Guide

Getting Started

Organizations should begin with focused use cases that demonstrate value before expanding deployments.

Initial Setup

  1. Deploy embedding model for knowledge base creation
  2. Configure data sources for initial knowledge base
  3. Deploy LLM for generation tasks
  4. Create simple assistant with basic retrieval

Validation Steps

  • Test retrieval quality with sample queries
  • Verify response accuracy against source documents
  • Measure performance under expected load
  • Review security controls and access patterns
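
Retrieval quality can be spot-checked with a small harness that measures how often the expected source document appears in the results for a set of sample queries; a minimal sketch:

```python
def hit_rate(results: dict, expected: dict) -> float:
    """Fraction of sample queries whose expected source document appears in
    the retrieved list; a rough first-pass retrieval-quality metric."""
    if not expected:
        return 0.0
    hits = sum(1 for q, doc in expected.items() if doc in results.get(q, []))
    return hits / len(expected)
```

Running this against a fixed query set before and after knowledge base changes gives an early signal of retrieval regressions.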

Best Practices

Knowledge Base Design

  • Maintain clear document boundaries for attribution
  • Implement regular update cycles for fresh content
  • Use metadata effectively for filtering
  • Monitor retrieval quality metrics

Assistant Configuration

  • Start with restrictive directives and expand gradually
  • Test edge cases and failure scenarios
  • Implement fallback strategies for tool failures
  • Version control assistant configurations

Production Deployment

  • Establish baseline performance metrics
  • Implement gradual rollout strategies
  • Monitor user feedback and satisfaction
  • Maintain configuration backups

Integration with Hybrid Manager

Project Organization

Gen AI resources deploy within Hybrid Manager projects, inheriting namespace isolation and resource controls.

Project Resources

  • Knowledge bases as PostgreSQL databases
  • Assistants as Kubernetes deployments
  • Tools as service integrations
  • Threads as persisted conversation records

UI Workflows

Hybrid Manager provides visual interfaces for Gen AI development without requiring deep technical expertise.

Visual Capabilities

  • Drag-and-drop assistant configuration
  • Knowledge base creation wizards
  • Tool registration interfaces
  • Conversation testing environments

Programmatic Access

SDKs and APIs enable integration with existing applications and automation workflows.

Integration Options

  • Python SDK for application development
  • REST APIs for service integration
  • Kubernetes operators for GitOps workflows
  • CLI tools for automation scripts

Next Steps

Begin exploring Gen AI Builder capabilities:

  1. Review Concepts: Understand core concepts and architecture
  2. Create Knowledge Base: Follow the knowledge base guide
  3. Build Assistant: Use the assistant creation guide
  4. Deploy Application: Implement using the quickstart UI

For detailed documentation: