Using NVIDIA NIM microservices from build.nvidia.com in your environment

EDB AI Factory deploys NVIDIA NIM microservices with KServe using Kubernetes manifests. This guide provides ready-to-use YAML templates for the following NIM types:

  • Chat Completion (LLM)
  • Text Embedding
  • Text Reranker
  • Image Embedding
  • Image OCR

The process is:

  1. Set up an NVIDIA NGC account and API key.
  2. Create Kubernetes secrets for NGC access and image pulls.
  3. Create GPU nodes.
  4. Deploy an InferenceService for each model instance.
  5. Validate the deployment.
  6. Register and use the model in AIDB.

Supported Models and GPU Requirements

| Model Type       | NIM Model                   | NVIDIA Documented Resource Requirements |
|------------------|-----------------------------|-----------------------------------------|
| Text Completion  | llama-3.3-70b-instruct      | 4 × L40S                                |
| Text Embeddings  | arctic-embed-l              | 1 × L40S                                |
| Image Embeddings | nvclip                      | 1 × L40S                                |
| OCR              | paddleocr                   | 1 × L40S                                |
| Text Reranking   | llama-3.2-nv-rerankqa-1b-v2 | 1 × L40S                                |

1. Set Up NVIDIA NGC API Key

  1. Go to build.nvidia.com and log in with your NVIDIA account.
  2. In the top-right menu, select Setup → API Key.
  3. Click Generate API Key.
  4. Copy the key and store it securely. You will need it for the Kubernetes secrets.

2. Create Kubernetes Secrets

Create the NGC API key secret:

kubectl create secret generic nvidia-nim-secrets \
  --from-literal=NGC_API_KEY="<YOUR_NGC_API_KEY>"
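Most clusters also need an image pull secret, since NIM container images are pulled from NVIDIA's registry (nvcr.io) using the literal username $oauthtoken and your API key as the password. A minimal sketch; the secret name ngc-image-pull-secret is a placeholder and must match whatever your InferenceService manifests reference:

kubectl create secret docker-registry ngc-image-pull-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password="<YOUR_NGC_API_KEY>"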

3. Create GPU Nodes

Ensure your Kubernetes cluster has nodes that satisfy the minimum GPU requirement for your chosen NIM model(s).

Node label:

nvidia.com/gpu: "true"

Node taint:

Key: nvidia.com/gpu
Value: "true"
Effect: NoSchedule

These settings ensure workloads requiring GPUs are scheduled correctly and only on GPU-enabled nodes.
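As a concrete sketch, assuming a node named <node-name>, the label and taint can be applied with kubectl:

kubectl label node <node-name> nvidia.com/gpu=true
kubectl taint node <node-name> nvidia.com/gpu=true:NoSchedule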

4. Deploy the KServe Resources

Download the YAML files for your chosen model type (a ClusterServingRuntime and an InferenceService manifest).

If needed, update <your-storage-class> and any resource settings to match your environment.
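For orientation, here is a trimmed sketch of what the InferenceService manifest for the LLM typically looks like; the service name, modelFormat name, and runtime name below are placeholders, and the downloaded templates are authoritative:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama33-70b-instruct
spec:
  predictor:
    # Schedule onto the GPU nodes labeled and tainted in step 3
    nodeSelector:
      nvidia.com/gpu: "true"
    tolerations:
      - key: nvidia.com/gpu
        operator: Equal
        value: "true"
        effect: NoSchedule
    model:
      modelFormat:
        name: nvidia-nim-llm         # placeholder; must match the ClusterServingRuntime
      runtime: <your-runtime-name>   # the runtime defined in llm-runtime.yaml
      env:
        - name: NGC_API_KEY          # secret created in step 2
          valueFrom:
            secretKeyRef:
              name: nvidia-nim-secrets
              key: NGC_API_KEY
      resources:
        limits:
          nvidia.com/gpu: "4"        # llama-3.3-70b-instruct needs 4 × L40S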

Apply to your cluster:

kubectl apply -f llm-runtime.yaml
kubectl apply -f llm-service.yaml

Check deployment status:

kubectl get clusterservingruntime
kubectl get inferenceservice
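The InferenceService is ready once READY reports True; illustrative output (columns abbreviated, and the URL depends on your ingress and domain configuration):

NAME                   URL                                               READY   AGE
llama33-70b-instruct   http://llama33-70b-instruct.default.example.com   True    10m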

Repeat for each model type you want to deploy. Verify each endpoint using the curl commands in the validation section, then integrate with EDB Postgres AI Accelerator using the SQL commands in the registration section.


5. Validation

Find the endpoint:

kubectl get inferenceservice llama33-70b-instruct -o jsonpath='{.status.url}'

Test it:

curl -X POST "<ENDPOINT>/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta/llama-3.3-70b-instruct", "messages": [{"role": "user", "content": "Tell me a story"}]}'

6. Use the Model

1. Enable AIDB

CREATE EXTENSION aidb CASCADE;
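A quick way to confirm the extension is installed is to query the catalog:

SELECT extname, extversion FROM pg_extension WHERE extname = 'aidb';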

2. Register the model

SELECT aidb.create_model(
  'my_nim_llm',
  'nim_completions',
  '{"model": "meta/llama-3.3-70b-instruct", "url": "http://<ENDPOINT>:8000/v1/chat/completions"}'::JSONB
);

3. Run the model

To interact with the model, execute the following query:

SELECT aidb.decode_text('my_nim_llm', 'Tell me a short, one sentence story');
Output:

                                       decode_text
----------------------------------------------------------------------------------------
 As the clock struck midnight, a single tear fell from the porcelain doll's glassy eye.

Your output may vary. You've successfully used NVIDIA NIM microservices via EDB Postgres AI Accelerator.
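The other NIM types are registered the same way with their matching AIDB providers. As a sketch for the arctic-embed-l text-embedding service, assuming a nim_embeddings provider and the aidb.encode_text function (check the AIDB model reference for the exact provider and function names in your version):

SELECT aidb.create_model(
  'my_nim_embeddings',
  'nim_embeddings',
  '{"model": "snowflake/arctic-embed-l", "url": "http://<EMBEDDING_ENDPOINT>:8000/v1/embeddings"}'::JSONB
);

SELECT aidb.encode_text('my_nim_embeddings', 'Hello world');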
