AI Factory Models
AI Factory Models provides comprehensive lifecycle management for AI models within EDB Postgres AI environments, delivering enterprise-grade governance and scalable inference capabilities through deep integration with Hybrid Manager infrastructure.
System Overview
AI Factory Models operates as a unified platform that bridges model governance and production deployment, ensuring organizational control over AI assets while enabling seamless integration across EDB PG AI capabilities. The system manages the complete model lifecycle from acquisition through retirement, providing consistent operational procedures and security controls.
Core Components
Model Library: Centralized governance system that manages model image repositories, approval workflows, and metadata tracking. The library ensures only validated models reach production environments while maintaining comprehensive audit trails for compliance requirements.
Model Serving Infrastructure: KServe-based deployment engine that transforms approved models into scalable inference endpoints within Kubernetes clusters managed by Hybrid Manager. The serving layer provides enterprise-grade operational characteristics including auto-scaling, health monitoring, and resource optimization.
Integration Framework: Standardized interfaces that connect model inference capabilities with AI Accelerator Pipelines, Knowledge Bases, and Gen AI Builder applications, ensuring consistent model access patterns across all EDB PG AI workloads.
Strategic Benefits
Sovereign AI Implementation
Organizations achieve complete control over their AI infrastructure by deploying models within their own Kubernetes environments rather than relying on external API services. This approach ensures sensitive data and proprietary models remain within organizational boundaries while maintaining enterprise-grade operational capabilities.
Operational Consistency
Standardized deployment patterns and governance frameworks eliminate the complexity of managing diverse model types and frameworks. Teams benefit from unified operational procedures regardless of the underlying model architecture or intended use case.
Scalable Resource Management
Efficient GPU and compute resource allocation across multiple models optimizes infrastructure costs while maintaining performance guarantees. The system prevents resource underutilization through dynamic allocation strategies based on actual workload demands.
Architecture Integration
AI Factory Models integrates deeply with Hybrid Manager to provide comprehensive model lifecycle management within existing Kubernetes infrastructure. The system leverages Hybrid Manager's cluster orchestration capabilities while extending them with AI-specific governance and deployment features.
Hybrid Manager Dependencies
Asset Library Integration: Model images, metadata, and version information are managed through Hybrid Manager's Asset Library, providing centralized storage and governance capabilities that extend beyond traditional container image management.
Kubernetes Orchestration: Model serving workloads deploy as native Kubernetes resources through Hybrid Manager's cluster management capabilities, ensuring consistent operational behavior and resource allocation across all deployed models.
Security Framework: Access control, network policies, and audit logging leverage Hybrid Manager's security infrastructure while adding model-specific governance controls and compliance features.
Operational Workflows
Model Lifecycle Management
Acquisition and Validation: Models enter the system through automated synchronization from private registries or manual registration processes. Each model undergoes security scanning, performance validation, and compliance verification before approval for production use.
Deployment and Scaling: Approved models deploy as InferenceServices that automatically scale based on demand patterns. The system optimizes resource allocation across multiple models while maintaining isolation and performance guarantees.
Monitoring and Maintenance: Comprehensive observability provides visibility into model performance, resource utilization, and operational health. Automated alerting and recovery procedures minimize service disruptions while maintaining detailed audit trails.
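As a concrete illustration of the deployment and scaling step above, the following sketch creates a minimal autoscaling InferenceService through the official Kubernetes Python client. The namespace, model name, model format, and storage URI are illustrative assumptions rather than values defined by AI Factory; the how-to guides later on this page cover the supported specifications.

```python
# Minimal sketch: deploy an approved model as an autoscaling KServe
# InferenceService using the official `kubernetes` Python client.
# Namespace, names, model format, and storageUri are illustrative placeholders.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running in a pod

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "sentiment-classifier", "namespace": "models"},
    "spec": {
        "predictor": {
            "minReplicas": 1,   # scaling bounds enforced by KServe based on demand
            "maxReplicas": 4,
            "model": {
                "modelFormat": {"name": "sklearn"},
                "storageUri": "s3://models/sentiment-classifier/v1",
            },
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="models",
    plural="inferenceservices",
    body=inference_service,
)
```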
Governance Framework
Approval Workflows: Configurable approval processes ensure models meet organizational standards before production deployment. The system supports complex multi-stakeholder workflows while automating routine approvals for models meeting predefined criteria.
Policy Enforcement: Security policies, resource constraints, and compliance requirements are enforced automatically across all model deployments. This ensures consistent application of organizational standards without manual intervention.
Audit and Compliance: Detailed logging and reporting capabilities support regulatory compliance requirements while providing comprehensive visibility into model usage patterns and operational activities.
Learning Pathways
Foundation Understanding
Begin with core concepts to establish architectural understanding before progressing to practical implementation:
Conceptual Framework
- Model Serving Concepts - Understanding inference deployment patterns
- Architecture Overview - System design and component relationships
- Model Library Explained - Governance and lifecycle management principles
System Design
- Deployment Strategies - Production deployment patterns and considerations
- Observability Framework - Monitoring and operational visibility
Practical Implementation
Progress through hands-on implementation after establishing conceptual understanding:
Initial Setup
- Python Inference Quickstart - Basic client integration
- GPU Resource Setup - Infrastructure preparation for model workloads
- ServingRuntime Configuration - Framework-specific deployment settings
Model Deployment
- InferenceService Creation - Basic model deployment procedures
- NVIDIA NIM Integration - Specialized deployment patterns
- Endpoint Access Configuration - Internal and external connectivity
Advanced Operations
- Private Registry Integration - Custom model source configuration
- Repository Rules Management - Automated model discovery
- Infrastructure Verification - Deployment validation and troubleshooting
Reference Materials
Technical References
- Model Serving Manual - Comprehensive deployment and configuration reference
- Model Library Manual - Complete governance framework documentation
- FAQ - Common issues and implementation considerations
Integration Guides
- AI Accelerator Pipelines - Pipeline model integration
- Gen AI Builder - Application development with served models
- Vector Engine - Embedding model integration
Use Case Applications
Enterprise RAG Systems
Deploy embedding models for semantic search and retrieval-augmented generation (RAG) workflows that require consistent model versions across development and production environments.
Private LLM Deployment
Implement internal language model services that process sensitive data within organizational boundaries while maintaining enterprise-grade operational capabilities.
Multi-Modal AI Applications
Integrate vision, language, and embedding models into unified applications that require coordinated inference capabilities across different model types.
Hybrid Model Architectures
Combine private and external model services within applications that require both proprietary models for sensitive operations and external services for general capabilities.
Implementation Considerations
Infrastructure Requirements
Model serving requires substantial compute resources, particularly GPU infrastructure for large language models and vision workloads. Organizations should plan capacity based on anticipated model sizes and throughput requirements.
Network Architecture
Hybrid Manager cluster connectivity affects both model deployment and runtime access patterns. Consider network latency, bandwidth requirements, and security boundaries when designing model serving architectures.
Security Framework
Model governance extends beyond traditional container security to include AI-specific concerns such as model provenance, data privacy, and inference audit trails. Implement comprehensive security controls aligned with organizational AI governance policies.
Operational Complexity
Managing multiple models across different frameworks requires substantial operational expertise. Establish clear procedures for deployment, monitoring, and troubleshooting before scaling to production workloads.
AI Factory Models provides the foundation for enterprise AI operations through comprehensive model lifecycle management and scalable inference deployment within your controlled infrastructure environments.
Getting started
Concepts
Learn how Model Serving in AI Factory delivers scalable, Kubernetes-native model inferencing for intelligent applications and data products.
Architecture
How Model Library and Model Serving work together in AI Factory to deliver enterprise LLM inference capabilities through governed model management and scalable deployment infrastructure.
Deployment Overview
Understand how model deployment works in AI Factory and how to deploy your models as scalable inference services.
Use Cases and Personas
Real-world implementation scenarios for AI Factory Models across different organizational roles and technical requirements.
FAQ
Frequently Asked Questions about using Model Serving in AI Factory with KServe.
Manual
Model Serving
Comprehensive reference for deploying and managing scalable AI model inference services within EDB Postgres AI environments.
Model Library
Comprehensive reference for understanding and implementing centralized AI model governance through the EDB Postgres AI Model Library.
Observability
Learn how to monitor and observe your Model Serving workloads in AI Factory.
Explainers
Model Serving Explained In Depth
Understand how Model Serving works in AI Factory, using KServe to provide scalable, production-ready model inference.
Model Library Explained In Depth
Understand the Model Library in AI Factory, how it works, and how it is powered by the Hybrid Manager Image and Model Library.
How-to (Set Up Serving)
Integrate Private Registry
How to connect a private container registry to the AI Factory Model Library to enable the use of your own model images.
Define Repository Rules
How to define and manage repository rules to control which model images appear in the Model Library.
Manage Metadata
How to manage metadata for repositories and image tags in the Image and Model Library.
Set up GPU resources
How to provision and configure GPU resources to support KServe model serving.
Configure ServingRuntime
Learn how to configure a ClusterServingRuntime in KServe to define an AI model serving environment on Kubernetes.
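The sketch below illustrates, under assumed names, what registering such a runtime can look like when driven from the Kubernetes Python client. The container image, model format name, and resource figures are placeholders to adapt to your cluster; the linked guide documents the supported configuration.

```python
# Sketch: register a cluster-scoped ServingRuntime that InferenceServices can
# reference. Image, model format name, and GPU counts are illustrative.
from kubernetes import client, config

config.load_kube_config()

runtime = {
    "apiVersion": "serving.kserve.io/v1alpha1",
    "kind": "ClusterServingRuntime",
    "metadata": {"name": "my-llm-runtime"},
    "spec": {
        "supportedModelFormats": [{"name": "my-llm-format", "autoSelect": True}],
        "containers": [
            {
                "name": "kserve-container",
                "image": "registry.example.com/serving/llm-runtime:latest",
                "resources": {
                    "requests": {"cpu": "4", "memory": "16Gi", "nvidia.com/gpu": "1"},
                    "limits": {"nvidia.com/gpu": "1"},
                },
            }
        ],
    },
}

client.CustomObjectsApi().create_cluster_custom_object(
    group="serving.kserve.io",
    version="v1alpha1",
    plural="clusterservingruntimes",
    body=runtime,
)
```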
How-to (Deploy Models)
Deploy AI Models
Step-by-step guide to deploying AI models from the Model Library to Model Serving in Hybrid Manager.
Create InferenceService
How to create an InferenceService to deploy an NVIDIA NIM container with KServe on Kubernetes.
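As a rough sketch only, an InferenceService that references a previously registered runtime and requests a GPU might look like the following. The runtime name, model format, and resource values are assumptions; the linked guide remains the authoritative reference for NIM deployments.

```python
# Sketch: an InferenceService for a GPU-backed model (for example an NVIDIA NIM
# container) that references a registered runtime. All names are placeholders.
from kubernetes import client, config

config.load_kube_config()

nim_isvc = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "llama-3-8b-instruct", "namespace": "models"},
    "spec": {
        "predictor": {
            "minReplicas": 1,
            "model": {
                "modelFormat": {"name": "nim"},   # format handled by the runtime
                "runtime": "my-llm-runtime",      # ClusterServingRuntime to use
                "resources": {
                    "requests": {"nvidia.com/gpu": "1"},
                    "limits": {"nvidia.com/gpu": "1"},
                },
            },
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="models",
    plural="inferenceservices",
    body=nim_isvc,
)
```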
Deploy NIM Container
Learn how to deploy an NVIDIA NIM microservice container (available on build.nvidia.com) using KServe on a Kubernetes cluster. Understand the core concepts and prepare to use this capability in EDB Hybrid Manager AI Factory.
How-to (Integrate with Your Apps)
Python Quickstart Tutorial - LLM Inference Example
Call your InferenceService endpoint to chat with a model.
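A minimal sketch of such a client call, assuming the InferenceService exposes an OpenAI-compatible chat completions API (as NVIDIA NIM and similar LLM runtimes do). The endpoint URL, model name, and API key are placeholders for your environment.

```python
# Minimal sketch: chat with a model behind a KServe InferenceService.
# Assumes an OpenAI-compatible /v1/chat/completions endpoint; the URL,
# model name, and API key below are placeholders for your environment.
import os

import requests

ENDPOINT = os.environ.get(
    "INFERENCE_ENDPOINT",
    "https://models.example.com/v1/chat/completions",  # hypothetical endpoint
)
API_KEY = os.environ.get("INFERENCE_API_KEY", "")  # token issued by your portal, if required

payload = {
    "model": "llama-3-8b-instruct",  # replace with the model your InferenceService serves
    "messages": [
        {"role": "user", "content": "Summarize what KServe does in one sentence."}
    ],
    "max_tokens": 128,
}

headers = {"Authorization": f"Bearer {API_KEY}"} if API_KEY else {}
response = requests.post(ENDPOINT, json=payload, headers=headers, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```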
Access KServe endpoints
How to call KServe InferenceService endpoints from inside your Kubernetes cluster and from outside via a secured portal, with examples and security considerations.
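The sketch below contrasts the two access paths, assuming an InferenceService named my-model in namespace models that speaks the Open Inference (v2) protocol. Hostnames, paths, and the Host header value are illustrative and depend on how your Hybrid Manager cluster exposes the service.

```python
# Sketch of both access paths for a KServe InferenceService named "my-model"
# in namespace "models" using the Open Inference (v2) protocol.
# Hostnames, paths, and the Host header are illustrative.
import requests

payload = {
    "inputs": [
        {"name": "input-0", "shape": [1, 4], "datatype": "FP32",
         "data": [0.1, 0.2, 0.3, 0.4]}
    ]
}

# 1) In-cluster access: call the service's cluster-local DNS name directly.
internal_url = "http://my-model.models.svc.cluster.local/v2/models/my-model/infer"
# requests.post(internal_url, json=payload, timeout=30)  # resolvable only inside the cluster

# 2) External access: go through the cluster's ingress/gateway and identify
#    the InferenceService with a Host header (or a dedicated external hostname).
external_url = "https://gateway.example.com/v2/models/my-model/infer"
resp = requests.post(
    external_url,
    json=payload,
    headers={"Host": "my-model.models.example.com"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["outputs"])
```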
How-to (Operate & Observe)
Verify deployments & GPUs
How to verify InferenceService deployments and GPU resource usage.
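A sketch of what such a verification pass can look like with the Kubernetes Python client, assuming KServe's default pod labels and an illustrative service name and namespace.

```python
# Sketch: check InferenceService readiness and the GPU requests of its pods.
# Namespace, service name, and label selector are illustrative.
from kubernetes import client, config

config.load_kube_config()
namespace = "models"

# 1) List InferenceServices and report their Ready condition and URL.
isvcs = client.CustomObjectsApi().list_namespaced_custom_object(
    group="serving.kserve.io", version="v1beta1",
    namespace=namespace, plural="inferenceservices",
)
for isvc in isvcs.get("items", []):
    name = isvc["metadata"]["name"]
    conditions = isvc.get("status", {}).get("conditions", [])
    ready = any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions)
    print(f"{name}: ready={ready} url={isvc.get('status', {}).get('url')}")

# 2) Inspect the GPU limits of the pods backing one InferenceService.
pods = client.CoreV1Api().list_namespaced_pod(
    namespace, label_selector="serving.kserve.io/inferenceservice=llama-3-8b-instruct"
)
for pod in pods.items:
    for container in pod.spec.containers:
        limits = container.resources.limits or {}
        print(pod.metadata.name, container.name, "gpu:", limits.get("nvidia.com/gpu", "0"))
```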
Monitor model serving
Learn how to monitor deployed AI models using KServe, check status and resource utilization, and prepare for integration with Hybrid Manager AI Factory observability.
Update GPU Resources
How to update GPU resource allocation for an NVIDIA NIM InferenceService deployed with KServe.
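A sketch of such an update using a merge patch through the Kubernetes Python client. The service name, namespace, and field path assume a predictor defined via spec.predictor.model and are placeholders; the linked guide describes the supported procedure.

```python
# Sketch: raise the GPU allocation of an existing InferenceService with a
# merge patch. Name, namespace, and GPU counts are illustrative.
from kubernetes import client, config

config.load_kube_config()

patch = {
    "spec": {
        "predictor": {
            "model": {
                "resources": {
                    "requests": {"nvidia.com/gpu": "2"},
                    "limits": {"nvidia.com/gpu": "2"},
                }
            }
        }
    }
}

client.CustomObjectsApi().patch_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="models",
    plural="inferenceservices",
    name="llama-3-8b-instruct",
    body=patch,
)
```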
Use Air‑Gapped Model Cache
Build a model profile cache in a connected environment, upload it to object storage, and use it from an air‑gapped HM cluster.