Model Serving in Hybrid Manager v1.2
Model Serving in Hybrid Manager provides a scalable, Kubernetes-native way to serve AI models as production-grade inference services.
It is implemented using KServe and runs on GPU-enabled nodes in your Hybrid Manager project’s Kubernetes cluster. Model Serving enables Gen AI applications, Knowledge Bases, and custom pipelines to use high-performance models under your control.
How Model Serving fits in the Hybrid Manager architecture
Model Serving is a core capability of Hybrid Manager’s AI Factory workload:
- Models are deployed as KServe InferenceServices within the project’s Kubernetes cluster.
- Model Serving is powered by GPU-enabled infrastructure that you provision and manage.
- Model images come from the Asset Library (formerly Model Library), backed by Hybrid Manager’s image governance.
- Model endpoints (HTTP/gRPC) are available to:
  - Gen AI Builder Assistants.
  - AIDB Knowledge Bases.
  - External applications and APIs.
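The deployment unit for each model is a KServe InferenceService. A minimal sketch, assuming a hypothetical embedding model pulled from the Asset Library and one NVIDIA GPU per replica (the names, namespace, registry URI, and node label below are illustrative, not Hybrid Manager defaults):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-embedding-model          # illustrative name
  namespace: my-project             # your Hybrid Manager project namespace (assumption)
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface           # matched to a ClusterServingRuntime (assumption)
      storageUri: oci://registry.example.com/models/my-embedding-model:v1  # illustrative Asset Library image
      resources:
        limits:
          nvidia.com/gpu: "1"       # request a GPU so the pod lands on a GPU node
    nodeSelector:
      nvidia.com/gpu.present: "true"  # example GPU node label; use your cluster's labels
```

Applying this manifest (for example with `kubectl apply`) creates the endpoint that Gen AI Builder Assistants, Knowledge Bases, and external applications consume.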
 
All model serving in Hybrid Manager is governed, auditable, and runs securely within your own infrastructure, enabling Sovereign AI patterns.
How it works in Hybrid Manager
- KServe is installed and managed by Hybrid Manager within your project’s Kubernetes cluster.
- You must provision GPU node groups or node pools to support high-performance model serving.
- GPU nodes must be correctly labeled and configured to support KServe workloads.
- Models are deployed from the Asset Library via ClusterServingRuntime and InferenceService definitions.
- Your applications and AI Factory workloads can invoke model endpoints via REST or gRPC.
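As a sketch of the REST path, the snippet below targets KServe's v1 inference protocol (`POST /v1/models/<name>:predict` with a JSON `instances` payload). The base URL and model name are assumptions standing in for a real InferenceService; the helper only builds the request, so it can be exercised without a live endpoint.

```python
import json
import urllib.request


def build_predict_request(base_url: str, model_name: str, instances: list) -> urllib.request.Request:
    """Build a KServe v1-protocol predict request (no network I/O here)."""
    url = f"{base_url}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical in-cluster endpoint and model name; replace with your InferenceService's URL.
req = build_predict_request(
    "http://my-embedding-model.my-project.svc.cluster.local",
    "my-embedding-model",
    ["hello world"],
)
# Sending it requires a reachable endpoint:
#   with urllib.request.urlopen(req) as resp:
#       predictions = json.load(resp)["predictions"]
```

The same endpoint also serves gRPC for clients that need lower overhead; the JSON/REST path shown here is the simpler starting point.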
 
Key Hybrid Manager considerations
- GPU infrastructure is required for most advanced models, such as LLMs, embeddings, and vision models.
- Hybrid Manager enables full observability of model serving, including Prometheus metrics and Kubernetes-native monitoring.
- Model serving endpoints are secured and managed within your Hybrid Manager project scope.
- Governance for model images and deployment comes from Hybrid Manager’s integrated Asset Library and image controls.
 
Typical use cases
- Power Gen AI Builder Assistants with LLM or embedding models.
- Enable AIDB Knowledge Bases with GPU-accelerated embedding pipelines.
- Serve image models (OCR, vision) as part of multi-modal retrieval systems.
- Expose enterprise-grade model APIs to downstream applications.
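For the embedding-driven use cases above, a served embedding model returns vectors that downstream retrieval compares. A minimal sketch of that comparison step, with made-up vectors standing in for actual model output:

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, e.g. extracted from a predict response.
query_vec = [0.1, 0.8, 0.3]
doc_vec = [0.2, 0.7, 0.4]
score = cosine_similarity(query_vec, doc_vec)  # higher means more semantically similar
```

In a Knowledge Base pipeline this scoring typically happens inside the vector store rather than in application code, but the principle is the same.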