Prerequisites for AI Factory on Hybrid Manager v1.3

Infrastructure Requirements

Kubernetes Cluster Foundation

AI Factory requires a properly configured Hybrid Manager Kubernetes cluster with sufficient resources for AI workloads. The cluster must support GPU scheduling and have appropriate node groups configured for different workload types.

Cluster Requirements

  • Kubernetes 1.27 or later with GPU device plugin support
  • NVIDIA GPU operator installed for GPU node management
  • Sufficient CPU and memory for orchestration components
  • Network policies supporting service mesh communication
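A quick way to sanity-check these requirements on an existing cluster is to query the server version and GPU capacity. The `gpu-operator` namespace below is the NVIDIA GPU operator's default and may differ in your installation:

```shell
# Confirm the Kubernetes server version is 1.27 or later
kubectl version

# Verify the device plugin is exposing GPUs as a schedulable resource
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'

# Check that the GPU operator components are running
kubectl get pods -n gpu-operator
```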

GPU Node Configuration

GPU resources are essential for model serving operations. Node configuration must align with model requirements and expected workload characteristics.

GPU Node Requirements

  • NVIDIA GPUs with CUDA 12.1+ support
  • GPU nodes labeled with nvidia.com/gpu=true
  • GPU taint nvidia.com/gpu for dedicated scheduling
  • Sufficient GPU memory for target model sizes
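If your provisioning tooling does not apply the label and taint above, they can be set manually. Replace `<gpu-node-name>` with your node; the `NoSchedule` effect shown is an assumption, so align it with your scheduling policy:

```shell
# Label the node so GPU workloads can target it
kubectl label node <gpu-node-name> nvidia.com/gpu=true

# Taint the node so only pods tolerating nvidia.com/gpu land on it
kubectl taint node <gpu-node-name> nvidia.com/gpu=present:NoSchedule
```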

Model Resource Requirements

Model Type       | Example Model                | GPU Requirements | Container Image
-----------------|------------------------------|------------------|----------------------------------------------------------
Text Completion  | llama-3.3-nemotron-super-49b | 4 x L40S         | nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1:1.8.5
Text Embeddings  | llama-3.2-nemoretriever-300m | 1 x L40S         | nvcr.io/nim/nvidia/llama-3.2-nemoretriever-300m-embed-v1:latest
Image Embeddings | nvclip                       | 1 x L40S         | nvcr.io/nim/nvidia/nvclip:latest
OCR              | paddleocr                    | 1 x L40S         | nvcr.io/nim/baidu/paddleocr:latest
Text Reranking   | llama-3.2-nv-rerankqa-1b     | 1 x L40S         | nvcr.io/nim/nvidia/llama-3.2-nv-rerankqa-1b-v2:latest

Refer to the NVIDIA NIM documentation for detailed per-model resource specifications.

Registry Configuration

Internet-Connected Deployments

For clusters with internet access, configure NVIDIA NGC registry authentication to enable model image pulls.

NGC API Key Configuration

Obtain an NGC API key following the NVIDIA NGC documentation.

Create the required secrets:

# Set your NGC API key
NGC_API_KEY=<your-ngc-api-key>

# Create model runtime secret
kubectl -n default create secret generic nvidia-nim-secrets \
    --from-literal=NGC_API_KEY=${NGC_API_KEY}

# Enable secret replication across namespaces
kubectl -n default annotate secret nvidia-nim-secrets \
    replicator.v1.mittwald.de/replicate-to='m-.*'

Image Pull Secret Configuration

Configure Docker registry authentication for image pulls:

# Create image pull secret
kubectl -n default create secret docker-registry ngc-cred \
    --docker-server=nvcr.io \
    --docker-username='$oauthtoken' \
    --docker-password=${NGC_API_KEY}

# Enable secret replication
kubectl -n default annotate secret ngc-cred \
    replicator.v1.mittwald.de/replicate-to='m-.*'
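To confirm that both secrets were created and that the replicator has copied them into the model namespaces (namespaces matching `m-.*` must already exist for copies to appear):

```shell
# Confirm the source secrets exist in the default namespace
kubectl -n default get secret nvidia-nim-secrets ngc-cred

# Look for replicated copies across all namespaces
kubectl get secret --all-namespaces | grep -E 'nvidia-nim-secrets|ngc-cred'
```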

Air-Gapped Deployments

For environments without internet access, use the hub guides to prepare images and deployment assets in advance, and configure private registry access in Hybrid Manager.

Model Image Migration

Prepare and mirror the required model images to your private registry, following the hub's image migration references.
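As one possible mirroring approach, images can be copied between registries with `skopeo`, which needs no local container daemon. The private registry hostname `registry.example.com` is a placeholder:

```shell
# Authenticate to the source and destination registries
skopeo login nvcr.io --username '$oauthtoken' --password "${NGC_API_KEY}"
skopeo login registry.example.com

# Copy a model image directly from NGC to the private registry
skopeo copy \
    docker://nvcr.io/nim/nvidia/nvclip:latest \
    docker://registry.example.com/nim/nvidia/nvclip:latest
```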

Model Registry Updates

Update the default model references to point to your private registry, using the Model Library configuration together with the Hybrid Manager API or console.

Profile Caching (NIM)

Some NVIDIA NIM models use runtime profiles that must be available locally for offline operation. Follow NVIDIA’s documentation for profile discovery and caching strategies.
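NIM containers ship helper utilities for profile discovery and caching; the commands below follow the pattern in NVIDIA's documentation, but verify the exact utility names and flags against the NIM release you are running:

```shell
# Inside a running NIM container: list the runtime profiles the model supports
list-model-profiles

# Download a specific profile into the local cache for offline operation
download-to-cache --profiles <profile-id>
```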

Network Requirements

Service Communication

AI Factory components require specific network configurations for inter-service communication.

Network Policies

  • Allow traffic between model serving pods and application namespaces
  • Enable ingress for external model endpoint access
  • Support service mesh communication for observability
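A minimal sketch of the first policy, assuming model serving pods carry an `app=model-server` label in a `models` namespace and application namespaces are labeled `ai-factory/apps=true` (all three names are illustrative, not Hybrid Manager defaults):

```shell
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-to-model
  namespace: models
spec:
  podSelector:
    matchLabels:
      app: model-server
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              ai-factory/apps: "true"
EOF
```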

DNS Resolution

  • Cluster-local DNS for internal service discovery
  • External DNS for public endpoint access if required

Storage Access

Ensure network connectivity to required storage systems.

Object Storage

  • S3-compatible storage for model artifacts and profiles
  • Sufficient bandwidth for model download operations
  • IAM roles or credentials for storage access
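Connectivity and credentials can be smoke-tested with the AWS CLI; the bucket name and endpoint below are placeholders for your S3-compatible storage:

```shell
# Confirm credentials and the network path to the artifact bucket
aws s3 ls s3://my-model-artifacts --endpoint-url https://s3.example.com

# Time a sample download to gauge available bandwidth
time aws s3 cp s3://my-model-artifacts/sample-model.bin /tmp/ \
    --endpoint-url https://s3.example.com
```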

Security Prerequisites

RBAC Configuration

Configure role-based access control for AI Factory operations.

Required Roles

  • Namespace administrator for project-level management
  • Model operator for deployment operations
  • Read-only access for monitoring and observability
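These roles can be mapped onto Kubernetes RBAC with the built-in `admin` and `view` ClusterRoles; the namespace and group names below are assumptions to substitute with your own:

```shell
# Namespace administrator for project-level management
kubectl -n my-project create rolebinding project-admins \
    --clusterrole=admin --group=ai-factory-admins

# Read-only access for monitoring and observability users
kubectl -n my-project create rolebinding project-viewers \
    --clusterrole=view --group=ai-factory-viewers
```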

Certificate Management

Prepare SSL certificates for secure communication.

Certificate Requirements

  • TLS certificates for model endpoint exposure
  • Internal CA for service mesh communication
  • Certificate rotation procedures for compliance
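Once issued, a TLS certificate for endpoint exposure can be loaded as a Kubernetes TLS secret (file paths and the secret name are placeholders; certificate issuance itself is outside this guide):

```shell
kubectl -n default create secret tls model-endpoint-tls \
    --cert=endpoint.crt --key=endpoint.key
```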

Next Steps

With prerequisites satisfied, proceed to:

  1. GPU Setup Guide for detailed GPU configuration
  2. Model Library Configuration for registry integration
  3. Create First InferenceService to deploy your initial model

For planning assistance, consult: