Prerequisites for AI Factory on Hybrid Manager v1.3
Infrastructure Requirements
Kubernetes Cluster Foundation
AI Factory requires a properly configured Hybrid Manager Kubernetes cluster with sufficient resources for AI workloads. The cluster must support GPU scheduling and have appropriate node groups configured for different workload types.
Cluster Requirements
- Kubernetes 1.27 or later with GPU device plugin support
- NVIDIA GPU operator installed for GPU node management
- Sufficient CPU and memory for orchestration components
- Network policies supporting service mesh communication
GPU Node Configuration
GPU resources are essential for model serving operations. Node configuration must align with model requirements and expected workload characteristics.
GPU Node Requirements
- NVIDIA GPUs with CUDA 12.1+ support
- GPU nodes labeled with `nvidia.com/gpu=true`
- GPU taint `nvidia.com/gpu` for dedicated scheduling
- Sufficient GPU memory for target model sizes
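As a sketch of how the label and taint above can be applied, assuming kubectl access to the cluster (the node name is a placeholder):

```shell
# Label a GPU node so GPU workloads can be scheduled onto it (node name is a placeholder)
kubectl label node <gpu-node-name> nvidia.com/gpu=true

# Taint the node so only workloads that tolerate the taint land on it
kubectl taint node <gpu-node-name> nvidia.com/gpu=present:NoSchedule
```

The taint keeps general-purpose pods off expensive GPU nodes; GPU workloads add a matching toleration.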
Model Resource Requirements
| Model Type | Example Model | GPU Requirements | Container Image |
|---|---|---|---|
| Text Completion | llama-3.3-nemotron-super-49b | 4 x L40S | nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1:1.8.5 |
| Text Embeddings | llama-3.2-nemoretriever-300m | 1 x L40S | nvcr.io/nim/nvidia/llama-3.2-nemoretriever-300m-embed-v1:latest |
| Image Embeddings | nvclip | 1 x L40S | nvcr.io/nim/nvidia/nvclip:latest |
| OCR | paddleocr | 1 x L40S | nvcr.io/nim/baidu/paddleocr:latest |
| Text Reranking | llama-3.2-nv-rerankqa-1b | 1 x L40S | nvcr.io/nim/nvidia/llama-3.2-nv-rerankqa-1b-v2:latest |
Refer to the NVIDIA NIM documentation for detailed resource specifications per model.
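To illustrate how a row of the table maps onto a deployment, a KServe InferenceService for the nvclip image might request one GPU as follows; the service name, namespace, and toleration are illustrative assumptions, not a prescribed Hybrid Manager manifest:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: nvclip              # placeholder name
  namespace: my-project     # placeholder namespace
spec:
  predictor:
    tolerations:
      - key: nvidia.com/gpu # tolerate the dedicated-scheduling taint
        operator: Exists
        effect: NoSchedule
    containers:
      - name: kserve-container
        image: nvcr.io/nim/nvidia/nvclip:latest
        resources:
          limits:
            nvidia.com/gpu: "1"   # 1 x L40S, per the table above
```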
Registry Configuration
Internet-Connected Deployments
For clusters with internet access, configure NVIDIA NGC registry authentication to enable model image pulls.
NGC API Key Configuration
Obtain an NGC API key following the NVIDIA NGC documentation.
Create the required secrets:
```shell
# Set your NGC API key
NGC_API_KEY=<your-ngc-api-key>

# Create model runtime secret
kubectl -n default create secret generic nvidia-nim-secrets \
  --from-literal=NGC_API_KEY=${NGC_API_KEY}

# Enable secret replication across namespaces
kubectl -n default annotate secret nvidia-nim-secrets \
  replicator.v1.mittwald.de/replicate-to='m-.*'
```
Image Pull Secret Configuration
Configure Docker registry authentication for image pulls:
```shell
# Create image pull secret
kubectl -n default create secret docker-registry ngc-cred \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=${NGC_API_KEY}

# Enable secret replication
kubectl -n default annotate secret ngc-cred \
  replicator.v1.mittwald.de/replicate-to='m-.*'
```
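Workload pods then reference the replicated secret by name; a minimal pod spec fragment as a sketch, with placeholder names:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nim-example          # placeholder pod name
spec:
  imagePullSecrets:
    - name: ngc-cred         # the docker-registry secret created above
  containers:
    - name: nim
      image: nvcr.io/nim/nvidia/nvclip:latest
```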
Air-Gapped Deployments
For environments without internet access, use the hub guides to prepare images and deployment assets in advance, and configure private registry access in Hybrid Manager.
Model Image Migration
Prepare and mirror required model images to your private registry following these hub references:
- Private registries and image governance: Model Library
- KServe manifests and deployment flow: Using NVIDIA NIM in your environment
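A minimal mirroring sketch, assuming a private registry at registry.example.com (a placeholder) and docker CLI access to both registries from a connected staging host:

```shell
# Pull from NGC (requires a prior 'docker login nvcr.io' with the NGC API key)
docker pull nvcr.io/nim/nvidia/nvclip:latest

# Retag for the private registry (registry.example.com is a placeholder)
docker tag nvcr.io/nim/nvidia/nvclip:latest registry.example.com/nim/nvclip:latest

# Push into the air-gapped registry
docker push registry.example.com/nim/nvclip:latest
```

Repeat for each image listed in the model resource table, pinning explicit tags rather than `latest` where reproducibility matters.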
Model Registry Updates
Update default model references to point to your private registry. Work with your Model Library configuration and HM API/console paths.
Profile Caching (NIM)
Some NVIDIA NIM models use runtime profiles that must be available locally for offline operation. Follow NVIDIA’s documentation for profile discovery and caching strategies.
- NVIDIA NIM docs: https://docs.nvidia.com/nim/
- Hub usage patterns: Using NVIDIA NIM in your environment
- Air Gapped Cache
Network Requirements
Service Communication
AI Factory components require specific network configurations for inter-service communication.
Network Policies
- Allow traffic between model serving pods and application namespaces
- Enable ingress for external model endpoint access
- Support service mesh communication for observability
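As a sketch, a NetworkPolicy admitting traffic from an application namespace to model-serving pods might look like the following; all names, labels, and the port are illustrative placeholders:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-to-models     # placeholder
  namespace: models             # placeholder serving namespace
spec:
  podSelector:
    matchLabels:
      app: model-serving        # placeholder pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: my-app   # placeholder app namespace
      ports:
        - protocol: TCP
          port: 8080            # placeholder serving port
```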
DNS Resolution
- Cluster-local DNS for internal service discovery
- External DNS for public endpoint access if required
Storage Access
Ensure network connectivity to required storage systems.
Object Storage
- S3-compatible storage for model artifacts and profiles
- Sufficient bandwidth for model download operations
- IAM roles or credentials for storage access
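Where IAM roles are not available, static credentials can be supplied as a Kubernetes secret; the secret name, keys, and endpoint below are placeholder assumptions:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: s3-model-storage        # placeholder
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: <access-key-id>
  AWS_SECRET_ACCESS_KEY: <secret-access-key>
  AWS_ENDPOINT_URL: https://s3.example.com   # placeholder S3-compatible endpoint
```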
Security Prerequisites
RBAC Configuration
Configure role-based access control for AI Factory operations.
Required Roles
- Namespace administrator for project-level management
- Model operator for deployment operations
- Read-only access for monitoring and observability
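A read-only role for monitoring, for example, might be scoped as follows; the role name, namespace, and resource list are illustrative:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: model-viewer            # placeholder
  namespace: my-project         # placeholder
rules:
  - apiGroups: ["serving.kserve.io"]
    resources: ["inferenceservices"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list"]
```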
Certificate Management
Prepare TLS certificates for secure communication.
Certificate Requirements
- TLS certificates for model endpoint exposure
- Internal CA for service mesh communication
- Certificate rotation procedures for compliance
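If cert-manager is available in the cluster (an assumption, not a stated requirement), endpoint certificates and their rotation can be declared like this; hostnames, issuer, and durations are placeholders:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: model-endpoint-tls      # placeholder
  namespace: models             # placeholder
spec:
  secretName: model-endpoint-tls
  duration: 2160h               # 90 days
  renewBefore: 360h             # renew 15 days early, covering rotation
  dnsNames:
    - models.example.com        # placeholder endpoint hostname
  issuerRef:
    name: internal-ca           # placeholder issuer
    kind: ClusterIssuer
```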
Next Steps
With prerequisites satisfied, proceed to:
- GPU Setup Guide for detailed GPU configuration
- Model Library Configuration for registry integration
- Create First InferenceService to deploy initial model
For planning assistance, consult: