Using NVIDIA NIM Microservices available on build.nvidia.com in your environment
EDB AI Factory deploys NVIDIA NIM microservices with KServe using Kubernetes manifests. This guide provides ready-to-use YAML templates for the following NIM types:
- Chat Completion (LLM)
- Text Embedding
- Text Reranker
- Image Embedding
- Image OCR
The process is:
- Set up an NVIDIA NGC account and API key.
- Create Kubernetes secrets for NGC access and image pulls.
- Create GPU nodes.
- Deploy an InferenceService for each model instance.
- Validate the deployment.
- Use the model in AIDB.
Supported Models and GPU Requirements
| Model Type | NIM Model | NVIDIA Documented Resource Requirements |
| --- | --- | --- |
| Text Completion | llama-3.3-70b-instruct | 4 × L40S |
| Text Embeddings | arctic-embed-l | 1 × L40S |
| Image Embeddings | nvclip | 1 × L40S |
| OCR | paddleocr | 1 × L40S |
| Text Reranking | llama-3.2-nv-rerankqa-1b-v2 | 1 × L40S |
1. Set Up NVIDIA NGC API Key
- Go to build.nvidia.com and log in with your NVIDIA account.
- In the top-right menu, select Setup → API Key.
- Click Generate API Key.
- Copy the key and store it securely. You will need it for the Kubernetes secrets.
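If you want to check the key before wiring it into Kubernetes, you can keep it in a shell variable and log in to the NVIDIA container registry with it. This is an optional convenience, not part of the required steps; note that `$oauthtoken` is the literal username NGC expects, not a placeholder:

```shell
# Keep the key handy for the commands in the following sections.
export NGC_API_KEY="<YOUR_NGC_API_KEY>"

# Optional sanity check: log in to nvcr.io with the key.
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
```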
2. Create Kubernetes Secrets
Create the NGC API key secret:
```shell
kubectl create secret generic nvidia-nim-secrets \
  --from-literal=NGC_API_KEY="<YOUR_NGC_API_KEY>"
```
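The overview also mentions a secret for image pulls. NIM container images are pulled from nvcr.io, and a registry secret for that typically looks like the sketch below. The secret name ngc-image-pull-secret is an assumption here; use whatever name the downloaded manifests reference in imagePullSecrets:

```shell
kubectl create secret docker-registry ngc-image-pull-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password="<YOUR_NGC_API_KEY>"
```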
3. Create GPU Nodes
Ensure your Kubernetes cluster has nodes that satisfy the minimum GPU requirement for your chosen NIM model(s).
Node label:

```yaml
nvidia.com/gpu: "true"
```

Node taint:

```yaml
key: nvidia.com/gpu
value: "true"
effect: NoSchedule
```
These settings ensure workloads requiring GPUs are scheduled correctly and only on GPU-enabled nodes.
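If your nodes don't carry the label and taint yet, they can be applied with standard kubectl commands; replace <gpu-node-name> with the name of each GPU node:

```shell
# Label the node so GPU workloads can be scheduled onto it.
kubectl label node <gpu-node-name> nvidia.com/gpu=true

# Taint the node so workloads without a matching toleration stay off it.
kubectl taint node <gpu-node-name> nvidia.com/gpu=true:NoSchedule
```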
4. Deploy the KServe Resources
Download the following YAML files:
- Chat Completion Runtime: llm-runtime.yaml
- Chat Completion Service: llm-service.yaml
- Text Embedding Runtime: embed-runtime.yaml
- Text Embedding Service: embed-service.yaml
- Text Reranker Runtime: rerank-runtime.yaml
- Text Reranker Service: rerank-service.yaml
- Image Embedding Runtime: image-embed-runtime.yaml
- Image Embedding Service: image-embed-service.yaml
- Image OCR Runtime: ocr-runtime.yaml
- Image OCR Service: ocr-service.yaml
If needed, update <your-storage-class> and any resource settings to match your environment.
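For orientation, the InferenceService manifests have roughly the shape sketched below. The names, model format, and GPU count are illustrative assumptions; treat the downloaded YAML as authoritative:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-3-3-70b-instruct        # illustrative name
spec:
  predictor:
    model:
      runtime: nvidia-nim-llm         # must match the ClusterServingRuntime in llm-runtime.yaml
      modelFormat:
        name: nvidia-nim-llm          # illustrative
      resources:
        limits:
          nvidia.com/gpu: "4"         # see the GPU requirements table above
    nodeSelector:
      nvidia.com/gpu: "true"
    tolerations:
      - key: nvidia.com/gpu
        operator: Equal
        value: "true"
        effect: NoSchedule
```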
Apply to your cluster:
```shell
kubectl apply -f llm-runtime.yaml
kubectl apply -f llm-service.yaml
```
Check deployment status:
```shell
kubectl get clusterservingruntime
kubectl get inferenceservice
```
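If an InferenceService stays in a non-ready state, describing it and checking its pods usually reveals the cause (for example an image pull failure or an unschedulable GPU request). NIM pods can also take several minutes on first start while the model downloads:

```shell
kubectl describe inferenceservice <name>
kubectl get pods
```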
Repeat for each model type you want to deploy. Verify endpoints using the provided curl commands. Integrate with EDB Postgres AI Accelerator using the SQL commands in the registration section.
5. Validation
Find the endpoint, using the InferenceService name defined in llm-service.yaml:

```shell
kubectl get inferenceservice llama33-8b-instruct -o jsonpath='{.status.url}'
```
Test it:
```shell
curl -X POST "<ENDPOINT>/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta/llama-3.3-70b-instruct", "messages": [{"role": "user", "content": "Tell me a story"}]}'
```
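The non-chat NIMs expose similar OpenAI-style endpoints. As an illustration, a request to the text embedding service might look like the following; the model name and the input_type field are assumptions based on the arctic-embed-l model card on build.nvidia.com, so verify them there:

```shell
curl -X POST "<EMBED_ENDPOINT>/v1/embeddings" \
  -H "Content-Type: application/json" \
  -d '{"model": "snowflake/arctic-embed-l", "input": ["Tell me a story"], "input_type": "query"}'
```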
6. Use the Model
1. Enable AIDB
```sql
CREATE EXTENSION aidb CASCADE;
```
2. Register the model
```sql
SELECT aidb.create_model(
  'my_nim_llm',
  'nim_completions',
  '{"model": "meta/llama-3.3-70b-instruct", "url": "http://<ENDPOINT>:8000/v1/chat/completions"}'::JSONB
);
```
3. Run the model
To interact with the model, execute the following query:
```sql
SELECT aidb.decode_text('my_nim_llm', 'Tell me a short, one sentence story');
```

```
                                       decode_text
------------------------------------------------------------------------------------------
 As the clock struck midnight, a single tear fell from the porcelain doll's glassy eye.
```
Your output may vary. You've successfully used NVIDIA NIM Microservices via the EDB AI Accelerator.
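The other model types are registered the same way, with the provider matching the NIM type. For example, if you also deployed the text embedding service, a registration and quick test could look like the sketch below; the nim_embeddings provider name and the aidb.encode_text call are assumptions to confirm against the AIDB documentation for your release:

```sql
-- Register the embedding endpoint (provider name assumed to be nim_embeddings).
SELECT aidb.create_model(
  'my_nim_embeddings',
  'nim_embeddings',
  '{"model": "snowflake/arctic-embed-l", "url": "http://<EMBED_ENDPOINT>:8000/v1/embeddings"}'::JSONB
);

-- Generate an embedding to confirm the endpoint is reachable.
SELECT aidb.encode_text('my_nim_embeddings', 'Tell me a short, one sentence story');
```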
Sources
- NVIDIA NIM KServe deployment resources: https://catalog.ngc.nvidia.com/orgs/nim/resources/llm-nim-kserve
- NVIDIA NIM API reference: https://docs.nvidia.com/nim/large-language-models/latest/api-reference.html
- KServe documentation: https://kserve.github.io/website/latest/modelserving/servingruntimes/
- EDB AI Factory deployment examples: https://www.enterprisedb.com/docs/edb-postgres-ai/ai-factory/learn/how-to/model-serving/deploy-nim-container/
- Model examples and tags: build.nvidia.com model cards