Using NVIDIA NIM microservices from build.nvidia.com in your environment

EDB AI Factory deploys NVIDIA NIM microservices with KServe using Kubernetes manifests. This guide provides ready-to-use YAML templates for the following NIM types:

  • Chat Completion (LLM)
  • Text Embedding
  • Text Reranker
  • Image Embedding
  • Image OCR

The process is:

  1. Set up an NVIDIA NGC account and API key.
  2. Create Kubernetes secrets for NGC access and image pulls.
  3. Create GPU nodes.
  4. Deploy an InferenceService for each model instance.
  5. Validate the deployment.
  6. Register and use the model in AIDB.

Supported Models and GPU Requirements

| Model Type       | NIM Model                   | NVIDIA Documented Resource Requirements |
|------------------|-----------------------------|-----------------------------------------|
| Text Completion  | llama-3.3-70b-instruct      | 4 × L40S                                |
| Text Embeddings  | arctic-embed-l              | 1 × L40S                                |
| Image Embeddings | nvclip                      | 1 × L40S                                |
| OCR              | paddleocr                   | 1 × L40S                                |
| Text Reranking   | llama-3.2-nv-rerankqa-1b-v2 | 1 × L40S                                |

1. Set Up NVIDIA NGC API Key

  1. Go to build.nvidia.com and log in with your NVIDIA account.
  2. In the top-right menu, select Setup → API Key.
  3. Click Generate API Key.
  4. Copy the key and store it securely. You will need it for the Kubernetes secrets.

2. Create Kubernetes Secrets

Create the NGC API key secret:

kubectl create secret generic nvidia-nim-secrets \
  --from-literal=NGC_API_KEY="<YOUR_NGC_API_KEY>"
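Most clusters also need an image pull secret, since NIM container images are pulled from NVIDIA's registry (nvcr.io) using the literal username $oauthtoken and your API key as the password. A minimal sketch; the secret name ngc-image-pull-secret is a placeholder and must match whatever your InferenceService manifests reference:

kubectl create secret docker-registry ngc-image-pull-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password="<YOUR_NGC_API_KEY>"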

3. Create GPU Nodes

Ensure your Kubernetes cluster has nodes that satisfy the minimum GPU requirement for your chosen NIM model(s).

Node label:

nvidia.com/gpu: "true"

Node taint:

Key: nvidia.com/gpu
Value: "true"
Effect: NoSchedule

These settings ensure workloads requiring GPUs are scheduled correctly and only on GPU-enabled nodes.
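As a concrete sketch, assuming a node named <node-name>, the label and taint can be applied with kubectl:

kubectl label node <node-name> nvidia.com/gpu=true
kubectl taint node <node-name> nvidia.com/gpu=true:NoSchedule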

4. Deploy the KServe Resources

Download the YAML files for your chosen model type (a ClusterServingRuntime and an InferenceService manifest).

If needed, update <your-storage-class> and any resource settings to match your environment.
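For orientation, here is a trimmed sketch of what the InferenceService manifest for the LLM typically looks like; the service name, modelFormat name, and runtime name below are placeholders, and the downloaded templates are authoritative:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama33-70b-instruct
spec:
  predictor:
    # Schedule onto the GPU nodes labeled and tainted in step 3
    nodeSelector:
      nvidia.com/gpu: "true"
    tolerations:
      - key: nvidia.com/gpu
        operator: Equal
        value: "true"
        effect: NoSchedule
    model:
      modelFormat:
        name: nvidia-nim-llm         # placeholder; must match the ClusterServingRuntime
      runtime: <your-runtime-name>   # the runtime defined in llm-runtime.yaml
      env:
        - name: NGC_API_KEY          # secret created in step 2
          valueFrom:
            secretKeyRef:
              name: nvidia-nim-secrets
              key: NGC_API_KEY
      resources:
        limits:
          nvidia.com/gpu: "4"        # llama-3.3-70b-instruct needs 4 × L40S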

Apply to your cluster:

kubectl apply -f llm-runtime.yaml
kubectl apply -f llm-service.yaml

Check deployment status:

kubectl get clusterservingruntime
kubectl get inferenceservice
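The InferenceService is ready once READY reports True; illustrative output (columns abbreviated, and the URL depends on your ingress and domain configuration):

NAME                   URL                                               READY   AGE
llama33-70b-instruct   http://llama33-70b-instruct.default.example.com   True    10m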

Repeat for each model type you want to deploy. Verify each endpoint using the curl commands in the validation section, then integrate with EDB Postgres AI Accelerator using the SQL commands in the registration section.


5. Validation

Find the endpoint:

kubectl get inferenceservice llama33-70b-instruct -o jsonpath='{.status.url}'

Test it:

curl -X POST "<ENDPOINT>/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta/llama-3.3-70b-instruct", "messages": [{"role": "user", "content": "Tell me a story"}]}'

6. Use the Model

1. Enable AIDB

CREATE EXTENSION aidb CASCADE;
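A quick way to confirm the extension is installed is to query the catalog:

SELECT extname, extversion FROM pg_extension WHERE extname = 'aidb';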

2. Register the model

SELECT aidb.create_model(
  'my_nim_llm',
  'nim_completions',
  '{"model": "meta/llama-3.3-70b-instruct", "url": "http://<ENDPOINT>:8000/v1/chat/completions"}'::JSONB
);

3. Run the model

To interact with the model, execute the following query:

SELECT aidb.decode_text('my_nim_llm', 'Tell me a short, one sentence story');
Output:

                                       decode_text
----------------------------------------------------------------------------------------
 As the clock struck midnight, a single tear fell from the porcelain doll's glassy eye.

Your output may vary. You've successfully used NVIDIA NIM microservices via EDB Postgres AI Accelerator.
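The other NIM types are registered the same way with their matching AIDB providers. As a sketch for the arctic-embed-l text-embedding service, assuming a nim_embeddings provider and the aidb.encode_text function (check the AIDB model reference for the exact provider and function names in your version):

SELECT aidb.create_model(
  'my_nim_embeddings',
  'nim_embeddings',
  '{"model": "snowflake/arctic-embed-l", "url": "http://<EMBEDDING_ENDPOINT>:8000/v1/embeddings"}'::JSONB
);

SELECT aidb.encode_text('my_nim_embeddings', 'Hello world');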
