Frequently Asked Questions - AI Factory on Hybrid Manager v1.3

Table of Contents

  • Platform Capabilities
  • Installation and Setup
  • Model Management
  • Gen AI Builder
  • Operations and Maintenance
  • Troubleshooting
  • Security and Compliance
  • Performance and Scaling
  • Additional Resources

Platform Capabilities

What types of models does HM 1.3 support?

Hybrid Manager 1.3 supports Large Language Model (LLM) deployments exclusively through NVIDIA NIM containers. Traditional machine learning models (classification, regression, time-series forecasting) are not supported in this release.

Supported NVIDIA NIM model categories:

  • Text Generation: Large language models for chat and completion tasks
  • Text Embeddings: Models for semantic search and RAG applications
  • Text Reranking: Models for search result optimization
  • Multimodal Models: Vision models including CLIP and OCR capabilities

Can I deploy custom models?

Custom models must be packaged as NVIDIA NIM containers to be compatible with HM 1.3. Standard machine learning frameworks (scikit-learn, XGBoost, TensorFlow for traditional ML) are not supported. Custom LLMs can be deployed if they conform to NIM container specifications and API standards.

See Private Registry Integration for custom NIM deployment procedures.

What distinguishes AI Factory from cloud AI services?

AI Factory provides complete sovereignty over AI operations:

  • Models execute within your Kubernetes infrastructure
  • Data remains within organizational boundaries
  • No external API dependencies for inference
  • Complete audit trails for regulatory compliance

Installation and Setup

What are the minimum infrastructure requirements?

Core Requirements

  • Kubernetes 1.27+ with NVIDIA GPU operator
  • NVIDIA GPUs compatible with NIM containers (L40S, A100, H100)
  • 100GB+ object storage for model artifacts
  • Network connectivity to NVIDIA NGC registry (or air-gapped configuration)

GPU Requirements by NIM Model Type

  • Text completion (Llama 3.3 70B): 4 x L40S GPUs
  • Text embeddings: 1 x L40S GPU
  • Text reranking: 1 x L40S GPU
  • Vision models: 1 x L40S GPU
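
Before deploying, it can help to confirm how many GPUs each node actually advertises against the figures above. One quick check, assuming the NVIDIA GPU operator is already installed and exposing the nvidia.com/gpu resource:

kubectl describe nodes | grep -i "nvidia.com/gpu"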

Consult Prerequisites Guide for comprehensive specifications.

How do I configure GPU nodes for NIM models?

GPU node preparation involves:

  1. Install NVIDIA GPU operator on the cluster
  2. Label GPU nodes with nvidia.com/gpu=true
  3. Apply GPU taints for dedicated scheduling
  4. Verify CUDA compatibility for NIM requirements
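
A minimal command sketch for steps 2 and 3; the node name is a placeholder, and your cluster may use different label and taint conventions:

# Label the node so NIM workloads can target it
kubectl label node <gpu-node-name> nvidia.com/gpu=true
# Taint the node so only workloads that tolerate the taint are scheduled there
kubectl taint node <gpu-node-name> nvidia.com/gpu=true:NoSchedule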

Detailed instructions available in GPU Setup Documentation.

Can AI Factory operate in air-gapped environments?

Yes. Air-gapped deployments require advance preparation:

  1. Mirror NVIDIA NIM images to private registry
  2. Download and cache model profiles
  3. Upload profiles to object storage
  4. Configure Model Library for private registry access
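
As an illustrative sketch of step 1, the commands below mirror a NIM image from the NVIDIA NGC registry (nvcr.io) into a private registry. The image path and registry host are placeholders; your organization may use skopeo or another mirroring tool instead of the Docker CLI.

# Authenticate to NGC with your API key, then copy the image into the private registry
docker login nvcr.io
docker pull nvcr.io/nim/<publisher>/<model>:<tag>
docker tag nvcr.io/nim/<publisher>/<model>:<tag> registry.example.com/nim/<model>:<tag>
docker push registry.example.com/nim/<model>:<tag>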

Complete procedures documented in Air-Gap Configuration.

Model Management

How do I deploy NVIDIA NIM models?

NIM model deployment workflow:

  1. Access Model Library in HM console
  2. Select NVIDIA NIM model from catalog
  3. Configure resources (GPU allocation, memory, replicas)
  4. Deploy InferenceService to project namespace
  5. Access through generated endpoints
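
For reference, a manifest roughly equivalent to what the console generates in step 4 is sketched below. It assumes a KServe-style InferenceService resource; the name, namespace, image path, and resource figures are placeholders, and the console-generated manifest is authoritative.

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-33-nemotron-super-49b          # placeholder service name
  namespace: my-ai-project                   # target project namespace
spec:
  predictor:
    minReplicas: 1
    containers:
      - name: kserve-container
        image: registry.example.com/nim/llama-3.3-nemotron-super-49b:latest   # placeholder image
        resources:
          limits:
            nvidia.com/gpu: "4"              # GPU allocation per replica
            memory: 64Gi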

Step-by-step guide: Create InferenceService.

Which NVIDIA NIM models are available by default?

Default NIM models in HM 1.3:

  • llama-3.3-nemotron-super-49b: Advanced reasoning and chat
  • llama-3.2-nemoretriever-300m-embed: Text embeddings
  • llama-3.2-nv-rerankqa-1b: Query-document reranking
  • nvclip: Multimodal embeddings
  • paddleocr: Optical character recognition

How do I manage NIM model versions?

Version management strategies:

  • Model Library maintains version tags for each NIM image
  • Blue-green deployments enable zero-downtime updates
  • Canary deployments allow gradual traffic shifting
  • Rollback through InferenceService configuration updates
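
As an illustrative fragment, and assuming the underlying InferenceService follows KServe conventions, a canary rollout can be driven by updating the model image and setting canaryTrafficPercent on the predictor. Raising the percentage (or removing the field) completes the rollout; reverting the image rolls back.

spec:
  predictor:
    canaryTrafficPercent: 10                 # route 10% of traffic to the updated revision
    containers:
      - name: kserve-container
        image: registry.example.com/nim/llama-3.3-nemotron-super-49b:1.1.0   # new version tag (placeholder)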

Gen AI Builder

Where do I find Gen AI documentation?

Primary Gen AI resources, including Gen AI Builder guides, are collected in the AI Factory Hub (see Additional Resources at the end of this document).

How do I create knowledge bases for RAG?

Knowledge base creation process:

  1. Configure data sources (databases, documents, APIs)
  2. Process content through NIM embedding models
  3. Store vectors in PostgreSQL with pgvector
  4. Configure retrieval strategies for search
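
As a rough illustration of steps 2 and 3, embeddings produced by a NIM embedding model can be stored and queried with pgvector as sketched below. The table name, embedding dimension, and distance operator are assumptions that depend on the chosen model and retrieval strategy.

-- Enable pgvector and create a table for document chunks (dimension is model-dependent)
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE kb_chunks (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    embedding vector(1024)
);

-- Retrieve the five chunks closest to a query embedding ($1) by cosine distance
SELECT id, content
FROM kb_chunks
ORDER BY embedding <=> $1::vector
LIMIT 5;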

Implementation guide: Knowledge Base Creation.

What are assistants and how do they work?

Assistants orchestrate interactions between users, knowledge bases, and external systems. They leverage NIM models for generation while maintaining conversation context through threads. Assistants differ from simple chatbots by incorporating retrieval, tool use, and structured reasoning capabilities.

Operations and Maintenance

How do I monitor NIM model performance?

Monitoring encompasses:

Metrics Collection

  • Prometheus metrics for inference latency
  • GPU utilization and memory consumption
  • Token generation throughput
  • Request success rates

Visualization

  • Grafana dashboards integrated in HM console
  • Custom panels for model-specific metrics
  • Alert configuration for SLA breaches
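
For example, a latency panel or SLA alert can be built from a request-latency histogram exposed by the serving layer. The metric and label names below are placeholders; the exact series available depend on the NIM version and scrape configuration.

# Placeholder PromQL: p95 inference latency over the last 5 minutes for one service
histogram_quantile(0.95,
  sum(rate(request_latency_seconds_bucket{service="llama-33-nemotron"}[5m])) by (le))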

Reference: Model Observability.

How should I handle NIM model updates?

Update procedure for production deployments:

  1. Validation: Deploy new version in development namespace
  2. Testing: Execute performance and accuracy tests
  3. Deployment: Implement canary or blue-green strategy
  4. Monitoring: Track metrics during transition
  5. Decision: Complete rollout or rollback based on metrics

Troubleshooting

What are the diagnostic steps when a NIM model fails to start?

Common initialization failures:

  1. GPU unavailability: Verify GPU resources match model requirements
  2. Image pull failures: Check NGC credentials and network connectivity
  3. Profile cache missing: Ensure profiles available in air-gapped setups
  4. Insufficient memory: Validate memory allocation for model size

Diagnostic commands:

# Inspect the InferenceService status, conditions, and recent changes
kubectl describe inferenceservice <name> -n <namespace>
# Review container logs from the model pod for startup errors
kubectl logs <pod-name> -n <namespace>
# List recent scheduling, image-pull, and out-of-memory events in the namespace
kubectl get events -n <namespace>

What optimization strategies reduce high inference latency?

Performance optimization approaches:

  • Batch processing: Increase batch size for throughput optimization
  • Model quantization: Use INT8 quantization where supported
  • Response caching: Cache frequent queries at application layer
  • Horizontal scaling: Deploy additional replicas for load distribution
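
For the horizontal scaling item, one way to add replicas is sketched below. It assumes the InferenceService exposes KServe-style minReplicas and maxReplicas fields; the name, namespace, and replica counts are placeholders.

kubectl patch inferenceservice <name> -n <namespace> --type merge \
  -p '{"spec":{"predictor":{"minReplicas":2,"maxReplicas":6}}}'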

How do I improve poor retrieval quality in RAG applications?

Retrieval troubleshooting:

  1. Embedding quality: Verify appropriate NIM embedding model selection
  2. Document chunking: Adjust chunk size and overlap parameters
  3. Search parameters: Tune top-k and similarity thresholds
  4. Index completeness: Confirm all documents processed successfully
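
Continuing the pgvector sketch from the knowledge base section, top-k and the similarity threshold map directly onto the retrieval query. The distance cutoff below is purely illustrative and should be tuned against your own relevance checks.

-- Return up to 10 chunks, discarding anything beyond an illustrative distance cutoff
SELECT id, content, embedding <=> $1::vector AS distance
FROM kb_chunks
WHERE embedding <=> $1::vector < 0.35
ORDER BY distance
LIMIT 10;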

Security and Compliance

How do I implement access control?

Role-based access control for AI resources involves:

  • Kubernetes RBAC for namespace and resource permissions
  • Model Library access controls for deployment authorization
  • API key management for external endpoint access
  • Network policies for inter-service communication
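
As a hedged sketch of the Kubernetes RBAC piece, the manifests below grant a hypothetical ai-team group permission to manage InferenceServices in a single project namespace. The names, the group, and the serving.kserve.io API group are assumptions to adapt to your environment.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: inferenceservice-editor              # hypothetical role name
  namespace: my-ai-project
rules:
  - apiGroups: ["serving.kserve.io"]
    resources: ["inferenceservices"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ai-team-inferenceservice-editor
  namespace: my-ai-project
subjects:
  - kind: Group
    name: ai-team                            # hypothetical group from your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: inferenceservice-editor
  apiGroup: rbac.authorization.k8s.io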

What encryption is implemented?

Encryption coverage:

  • At rest: Kubernetes secrets encryption, database encryption
  • In transit: TLS for API calls, mTLS within service mesh
  • Model artifacts: Encrypted object storage
  • Knowledge bases: Encrypted vector storage in PostgreSQL

Which operations are audited?

Audit logging captures:

  • NIM model deployment and configuration changes
  • Inference requests (configurable detail level)
  • Knowledge base queries and updates
  • Assistant conversations via thread tracking
  • Administrative operations on AI resources

Performance and Scaling

How do I configure resource quotas?

Resource quotas prevent a single project from exhausting shared capacity at the namespace level. Configure GPU quotas, memory limits, and storage constraints based on project requirements and available infrastructure capacity.
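
A minimal sketch of such a quota follows; the figures and names are illustrative, not recommendations.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: ai-project-quota
  namespace: my-ai-project
spec:
  hard:
    requests.nvidia.com/gpu: "4"             # cap total GPUs requested in the namespace
    limits.memory: 256Gi                     # cap aggregate memory limits
    requests.storage: 500Gi                  # cap total persistent volume claims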

When should I scale horizontally versus vertically?

Horizontal Scaling (additional replicas):

  • High concurrent request volume
  • Stateless inference workloads
  • Load distribution requirements

Vertical Scaling (increased resources per instance):

  • Large model memory requirements
  • Batch processing optimization
  • Single-request latency minimization

What model sizes can HM 1.3 support?

Model size constraints:

  • Single GPU: Models up to 13B parameters
  • Multi-GPU: Models up to 70B+ parameters using tensor parallelism
  • Memory limits: 80GB (A100), 48GB (L40S) per GPU

NVIDIA NIM handles model sharding and parallelism automatically based on available resources.

Additional Resources

For issues not addressed here, contact EDB support or consult the AI Factory Hub.