AI Factory Model Use Cases and Personas v1.3
This guide outlines practical implementation scenarios for AI Factory Models across different organizational personas and use cases. Each scenario provides specific implementation pathways aligned with role-based requirements.
Persona-Based Implementation Scenarios
Platform Engineers
Platform Engineers focus on infrastructure provisioning, system reliability, and operational excellence across AI model deployments.
Infrastructure Management Scenario
Objective: Establish reliable, scalable model serving infrastructure that supports multiple development teams and production workloads.
Key Requirements:
- Multi-tenant resource isolation for different teams and projects
- Automated scaling based on inference demand patterns
- Comprehensive monitoring and alerting for proactive issue resolution
- Security controls that align with organizational policies
Implementation Approach:
- Infrastructure Foundation
  - Configure GPU-enabled Kubernetes nodes with appropriate taints and labels
  - Implement network policies for secure multi-tenant isolation
  - Establish resource quotas and limits aligned with organizational capacity planning (see the quota sketch after this list)
- Operational Framework
  - Deploy monitoring stack integration for inference service observability
  - Configure automated backup procedures for model metadata and configurations
  - Implement disaster recovery procedures for critical model serving workloads
- Governance Integration
  - Establish approval workflows for production model deployments
  - Configure security scanning pipelines for model image validation
  - Implement audit logging for compliance and operational forensics
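As a concrete instance of the quota step above, the sketch below uses the Kubernetes Python client to create a per-tenant ResourceQuota with a GPU ceiling. The namespace name, limit values, and the object-count key for InferenceServices are illustrative assumptions, not recommendations.

```python
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when running inside the cluster
core = client.CoreV1Api()

# Hypothetical per-team namespace; the hard limits are placeholders to align with your capacity plan.
quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name="team-a-inference-quota", namespace="team-a"),
    spec=client.V1ResourceQuotaSpec(
        hard={
            "requests.cpu": "64",
            "requests.memory": "256Gi",
            "requests.nvidia.com/gpu": "4",  # cap GPU consumption for this tenant
            # Object-count quota; assumes a KServe-style InferenceService CRD is installed.
            "count/inferenceservices.serving.kserve.io": "10",
        }
    ),
)
core.create_namespaced_resource_quota(namespace="team-a", body=quota)
```

Pairing quotas like this with taints on GPU node pools and per-namespace network policies gives each team an isolated, bounded slice of the serving infrastructure.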
AI Engineers
AI Engineers require streamlined model deployment workflows that abstract infrastructure complexity while providing comprehensive control over model performance characteristics.
Model Deployment Optimization Scenario
Objective: Deploy and optimize large language models for production inference with minimal operational overhead.
Key Requirements:
- Framework-agnostic deployment procedures that support diverse model types
- Performance optimization capabilities including quantization and batch processing
- A/B testing frameworks for comparing model variants
- Integration with existing AI pipelines and workflows
Implementation Strategy:
- Model Preparation
  - Register model images in the Asset Library with comprehensive metadata
  - Configure ServingRuntime optimizations for specific model frameworks
  - Establish performance baselines through systematic benchmarking
- Deployment Configuration
  - Create InferenceServices with resource allocation aligned to model requirements (see the deployment sketch after this list)
  - Implement auto-scaling policies based on inference demand patterns
  - Configure canary deployment strategies for safe production updates
- Performance Monitoring
  - Establish model-specific metrics for latency, throughput, and accuracy tracking
  - Implement automated performance regression detection
  - Configure alerting for model performance degradation
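A minimal deployment sketch, assuming a KServe-style InferenceService CRD (serving.kserve.io/v1beta1) and the Kubernetes Python client. The service name, namespace, model format, runtime, storage URI, and resource figures are placeholders; use whatever your registered ServingRuntimes expect.

```python
from kubernetes import client, config

config.load_kube_config()
custom = client.CustomObjectsApi()

# Hypothetical LLM deployment; the replica bounds provide demand-based scaling at the predictor level.
inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "chat-llm", "namespace": "ai-team"},
    "spec": {
        "predictor": {
            "minReplicas": 1,
            "maxReplicas": 4,
            "model": {
                "modelFormat": {"name": "huggingface"},  # depends on your ServingRuntime
                "runtime": "vllm-runtime",               # hypothetical runtime name
                "storageUri": "oci://registry.example.com/models/chat-llm:1.0",
                "resources": {
                    "requests": {"cpu": "4", "memory": "32Gi", "nvidia.com/gpu": "1"},
                    "limits": {"nvidia.com/gpu": "1"},
                },
            },
        }
    },
}

custom.create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="ai-team",
    plural="inferenceservices",
    body=inference_service,
)
```

Canary strategies typically layer on top of a manifest like this by routing a fraction of traffic to the updated revision before promoting it.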
Application Developers
Application Developers integrate model inference capabilities into business applications while maintaining development velocity and operational simplicity.
RAG Application Development Scenario
Objective: Build retrieval-augmented generation applications that leverage private embedding models and language models within organizational infrastructure.
Key Requirements:
- Standardized API interfaces compatible with existing application frameworks
- Consistent model versions across development and production environments
- Low-latency inference for interactive user experiences
- Integration with vector databases and knowledge management systems
Development Workflow:
- Model Integration
  - Identify approved embedding and language models through the Model Library
  - Configure application clients using standardized OpenAI-compatible endpoints (see the client sketch after this list)
  - Implement error handling and fallback strategies for model availability
- Performance Optimization
  - Implement caching strategies for frequently requested embeddings
  - Configure batch processing for bulk document embedding operations
  - Optimize request patterns to minimize inference latency
- Production Deployment
  - Establish environment-specific model endpoint configurations
  - Implement monitoring for application-specific model usage patterns
  - Configure automated failover between model instances
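A small client sketch tying together three items from the workflow above: an OpenAI-compatible endpoint, an in-process cache for repeated embeddings, and failover to a secondary endpoint. The endpoint URLs, environment variables, and model name are assumptions to replace with your approved models and configuration.

```python
import os
from functools import lru_cache

from openai import OpenAI

# Placeholder endpoints and model name; point these at your approved, environment-specific services.
PRIMARY_URL = os.environ.get("EMBEDDINGS_ENDPOINT", "https://models.internal.example.com/v1")
FALLBACK_URL = os.environ.get("EMBEDDINGS_FALLBACK_ENDPOINT", PRIMARY_URL)
EMBEDDING_MODEL = "private-embedding-model"
API_KEY = os.environ.get("MODEL_API_KEY", "not-needed")

primary = OpenAI(base_url=PRIMARY_URL, api_key=API_KEY)
secondary = OpenAI(base_url=FALLBACK_URL, api_key=API_KEY)


@lru_cache(maxsize=10_000)
def embed(text: str) -> tuple[float, ...]:
    """Return an embedding vector, caching repeated inputs and failing over between endpoints."""
    last_error: Exception | None = None
    for endpoint in (primary, secondary):
        try:
            response = endpoint.embeddings.create(model=EMBEDDING_MODEL, input=text)
            return tuple(response.data[0].embedding)
        except Exception as exc:  # broad on purpose: any endpoint failure triggers failover
            last_error = exc
    raise RuntimeError("All embedding endpoints are unavailable") from last_error
```

Bulk document loads would instead pass a list of texts in a single embeddings.create call rather than looping per document.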
DevOps Engineers
DevOps Engineers focus on CI/CD integration, deployment automation, and operational reliability for model serving workloads.
Automated Model Deployment Pipeline Scenario
Objective: Implement continuous deployment pipelines that automatically update model serving infrastructure based on model registry changes.
Key Requirements:
- Automated model validation and deployment workflows
- Integration with existing CI/CD infrastructure and approval processes
- Rollback capabilities for failed deployments or performance regressions
- Comprehensive logging and audit trails for deployment activities
Pipeline Implementation:
- Automation Framework
  - Configure webhook triggers for model registry updates
  - Implement automated validation pipelines for model compatibility testing
  - Establish deployment gates based on performance and security criteria
- Deployment Orchestration
  - Create infrastructure-as-code templates for InferenceService deployments
  - Implement blue-green deployment strategies for zero-downtime updates
  - Configure automated rollback procedures based on health check failures (see the rollback sketch after this list)
- Monitoring Integration
  - Establish deployment success/failure metrics and alerting
  - Implement performance comparison dashboards for pre/post deployment analysis
  - Configure audit logging for deployment activities and approval workflows
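A sketch of the health-gated rollback item, assuming a KServe-style InferenceService whose status exposes a Ready condition. The group/version, the timeouts, and the idea of keeping the last known-good manifest in the pipeline are assumptions about your setup.

```python
import time

from kubernetes import client, config

config.load_kube_config()
custom = client.CustomObjectsApi()

GROUP, VERSION, PLURAL = "serving.kserve.io", "v1beta1", "inferenceservices"


def is_ready(namespace: str, name: str) -> bool:
    """Read the service's Ready condition (assumes a KServe-style status block)."""
    obj = custom.get_namespaced_custom_object(GROUP, VERSION, namespace, PLURAL, name)
    conditions = obj.get("status", {}).get("conditions", [])
    return any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions)


def deploy_with_rollback(namespace: str, name: str, new_manifest: dict, last_good: dict,
                         timeout_s: int = 600, poll_s: int = 15) -> bool:
    """Merge-patch the new manifest; roll back to the last known-good one if Ready never appears."""
    custom.patch_namespaced_custom_object(GROUP, VERSION, namespace, PLURAL, name, new_manifest)
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if is_ready(namespace, name):
            return True
        time.sleep(poll_s)
    # Health gate failed: restore the previous manifest and report failure to the pipeline.
    custom.patch_namespaced_custom_object(GROUP, VERSION, namespace, PLURAL, name, last_good)
    return False
```

In a pipeline this step would run after the registry webhook and validation stages, with the return value driving deployment success/failure metrics and audit log entries.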
Security Engineers
Security Engineers implement comprehensive security controls for model serving infrastructure while ensuring compliance with organizational and regulatory requirements.
Secure Model Serving Architecture Scenario
Objective: Establish security controls that protect model assets and inference data throughout the model serving lifecycle.
Key Requirements:
- Zero-trust network architecture for model inference traffic
- Comprehensive audit logging for model access and usage patterns
- Vulnerability management for model images and serving infrastructure
- Data privacy controls for inference requests and responses
Security Implementation:
- Access Control Framework
  - Implement role-based access control for model registration and deployment operations (see the RBAC sketch after this list)
  - Configure network policies for secure multi-tenant model isolation
  - Establish API authentication and authorization for inference endpoints
- Vulnerability Management
  - Configure automated security scanning for model images during registration
  - Implement vulnerability remediation workflows for identified security issues
  - Establish security approval gates for production model deployments
- Audit and Compliance
  - Configure comprehensive audit logging for all model operations
  - Implement data retention policies aligned with regulatory requirements
  - Establish compliance reporting capabilities for security assessments
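A sketch of the RBAC item above, using the Kubernetes Python client to create a namespaced Role that lets a team manage InferenceServices while only reading the curated ServingRuntimes. The namespace, role name, and verb set are assumptions to adapt to your own access model.

```python
from kubernetes import client, config

config.load_kube_config()
rbac = client.RbacAuthorizationV1Api()

role = client.V1Role(
    metadata=client.V1ObjectMeta(name="model-deployer", namespace="ai-team"),
    rules=[
        # Deployers may manage InferenceServices within their own namespace...
        client.V1PolicyRule(
            api_groups=["serving.kserve.io"],
            resources=["inferenceservices"],
            verbs=["get", "list", "watch", "create", "update", "patch", "delete"],
        ),
        # ...but only read the ServingRuntimes curated by platform engineering.
        client.V1PolicyRule(
            api_groups=["serving.kserve.io"],
            resources=["servingruntimes"],
            verbs=["get", "list", "watch"],
        ),
    ],
)
rbac.create_namespaced_role(namespace="ai-team", body=role)
# A RoleBinding (not shown) would then grant this Role to the team's identity-provider group.
```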
Cross-Functional Use Cases
Enterprise Sovereign AI Implementation
Scenario: Large enterprise implementing comprehensive AI governance while maintaining operational efficiency across multiple business units.
Stakeholders: Platform Engineers, Security Engineers, Compliance Teams
Implementation Requirements:
- Multi-tenant model serving infrastructure supporting diverse business requirements
- Comprehensive governance framework ensuring regulatory compliance
- Integration with existing enterprise identity and access management systems
- Performance optimization across different model types and usage patterns
Solution Architecture:
- Infrastructure Layer
  - Deploy Hybrid Manager across multiple Kubernetes clusters for geographic distribution
  - Implement federated model registry with synchronized governance policies
  - Configure cross-cluster networking for model sharing and load balancing
- Governance Integration
  - Establish enterprise-wide approval workflows with business unit delegation
  - Implement automated compliance reporting for regulatory requirements (see the inventory sketch after this list)
  - Configure security policies aligned with enterprise risk management frameworks
- Operational Excellence
  - Deploy comprehensive monitoring across all model serving infrastructure
  - Implement automated capacity planning based on historical usage patterns
  - Establish incident response procedures for model serving disruptions
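One way to approach the automated compliance reporting item is a scheduled inventory job; the sketch below lists every InferenceService in a cluster and writes an ownership report. The business-unit label key and the CSV format are assumptions about your governance conventions.

```python
import csv

from kubernetes import client, config

config.load_kube_config()
custom = client.CustomObjectsApi()

# Cluster-wide inventory of deployed model services (assumes a KServe-style CRD).
services = custom.list_cluster_custom_object("serving.kserve.io", "v1beta1", "inferenceservices")

with open("model-inventory.csv", "w", newline="") as report:
    writer = csv.writer(report)
    writer.writerow(["namespace", "name", "business_unit", "storage_uri", "ready"])
    for svc in services.get("items", []):
        meta = svc["metadata"]
        model = svc.get("spec", {}).get("predictor", {}).get("model", {})
        conditions = svc.get("status", {}).get("conditions", [])
        ready = any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions)
        writer.writerow([
            meta["namespace"],
            meta["name"],
            meta.get("labels", {}).get("business-unit", "unlabelled"),  # hypothetical label key
            model.get("storageUri", ""),
            ready,
        ])
```

Run per cluster (or through a federated registry) and aggregated centrally, a report like this gives compliance teams a current view of what is serving where and under which business unit.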
Hybrid Cloud Model Deployment
Scenario: Organization deploying models across on-premises and cloud infrastructure while maintaining consistent operational procedures.
Stakeholders: Platform Engineers, DevOps Engineers, Network Engineers
Implementation Considerations:
- Network connectivity and latency optimization between deployment environments
- Consistent security policies across different infrastructure providers
- Data locality requirements for model inference and training workloads
- Disaster recovery capabilities spanning multiple infrastructure tiers
Architecture Components:
- Federated Hybrid Manager deployment across cloud and on-premises infrastructure
- Unified model registry with environment-specific deployment configurations
- Cross-environment monitoring and alerting integration
- Consistent backup and disaster recovery procedures
Implementation Considerations
Resource Planning
Model serving infrastructure requires substantial computational resources, particularly for large language models and high-throughput inference workloads. Organizations should establish capacity planning procedures based on anticipated model sizes and usage patterns.
Resource Allocation Guidelines:
- GPU memory requirements scale with model parameter counts and numeric precision (see the sizing sketch after this list)
- CPU allocation affects preprocessing and postprocessing performance
- Network bandwidth impacts high-throughput inference scenarios
- Storage performance affects model loading and checkpoint operations
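As a rough illustration of the first guideline, serving memory can be approximated from parameter count and bytes per parameter plus an overhead factor for KV cache, activations, and runtime buffers. The 20% overhead below is an assumption; long-context or high-concurrency workloads need substantially more.

```python
def estimate_gpu_memory_gib(params_billion: float, bytes_per_param: float = 2.0,
                            overhead_fraction: float = 0.2) -> float:
    """Approximate serving memory: model weights plus a fractional runtime overhead."""
    weights_gib = params_billion * 1e9 * bytes_per_param / 2**30
    return weights_gib * (1 + overhead_fraction)


# Example: a 70B-parameter model in FP16 (~130 GiB of weights) comes out near 156 GiB,
# i.e. it already spans multiple 80 GiB GPUs before any concurrency headroom.
print(round(estimate_gpu_memory_gib(70), 1))
```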
Operational Complexity
Managing diverse model types across different frameworks requires substantial operational expertise. Establish comprehensive training programs and operational procedures before scaling to production workloads.
Operational Best Practices:
- Implement standardized deployment procedures across different model types
- Establish clear escalation procedures for operational issues
- Maintain comprehensive documentation for troubleshooting and maintenance
- Conduct regular operational reviews to identify improvement opportunities
Performance Optimization
Model inference performance depends on multiple factors including hardware configuration, model characteristics, and application usage patterns. Implement systematic performance testing and optimization procedures.
Optimization Strategies:
- Establish performance baselines for all deployed models
- Implement automated performance regression detection (see the sketch after this list)
- Configure resource allocation based on actual usage patterns
- Schedule regular performance reviews and optimization cycles
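For the regression-detection item, a minimal check compares a freshly observed p95 latency against the stored baseline with a fixed tolerance; the 15% threshold and the in-memory sample list are placeholders for whatever your metrics pipeline provides.

```python
def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank style p95 over a sample of request latencies."""
    ordered = sorted(latencies_ms)
    return ordered[int(0.95 * (len(ordered) - 1))]


def is_regression(baseline_p95_ms: float, recent_latencies_ms: list[float],
                  tolerance: float = 0.15) -> bool:
    """Flag a regression when observed p95 exceeds the baseline by more than the tolerance."""
    if not recent_latencies_ms:
        return False
    return p95(recent_latencies_ms) > baseline_p95_ms * (1 + tolerance)


# Example: baseline p95 of 120 ms; recent samples around 150 ms trip the alert.
print(is_regression(120.0, [130, 142, 155, 149, 151, 148, 160, 139, 152, 147]))
```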
These use cases provide concrete implementation pathways for different organizational roles while highlighting the comprehensive capabilities of AI Factory Models across diverse deployment scenarios.