Enabling a self-hosted model for the Migration Portal AI Copilot v1.3
You can use a self-hosted AI Factory model to serve the AI Copilot. This example uses NVIDIA NIM to serve the requests and Llama 3 models to process them and generate answers:
- Chat completions: `nvidia/llama-3.3-nemotron-super-49b-v1`
- Embeddings: `nvidia/llama-3.2-nv-embedqa-1b-v2`
Warning
There are significant safety implications to consider when using self-hosted models with Migration Copilot.
The models provided by third-party vendors like OpenAI and Azure OpenAI include content filtering and other safeguards designed to reduce the risk of the model responding to, generating, or contributing to unsafe content. When you use self-hosted models, these additional protections are no longer present.
In addition, because you are hosting the models, you now bear responsibility for the risks and potential liability associated with any unsafe behavior.
Prerequisites
Prepare the resources your environment requires to deploy the Migration Portal AI Copilot with a self-hosted solution.
- You have administrative access to the HM environment.
- Your organization has created a chat completion and a text embeddings model with the Hybrid Manager's AI Factory and has provided the endpoints for each model, which you can set as environment variables:
```shell
export COMPLETIONS_SVC=llama-3-3-nemotron-super-49b-v1
export EMBEDDINGS_SVC=llama-3-2-nv-embedqa-1b-v2
export COMPLETIONS_ENDPOINT=$(kubectl get inferenceservice $COMPLETIONS_SVC -o jsonpath='{.status.url}')
export EMBEDDINGS_ENDPOINT=$(kubectl get inferenceservice $EMBEDDINGS_SVC -o jsonpath='{.status.url}')
```
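Optionally, verify that each endpoint is reachable and reports the model names you expect, since those exact names are reused in the configuration below. This is only a sketch: it assumes the endpoints are reachable from your shell, that no API key is required, and that `jq` is installed.

```shell
# List the model IDs served by each endpoint.
curl -s "${COMPLETIONS_ENDPOINT}/v1/models" | jq -r '.data[].id'   # expect: nvidia/llama-3.3-nemotron-super-49b-v1
curl -s "${EMBEDDINGS_ENDPOINT}/v1/models" | jq -r '.data[].id'    # expect: nvidia/llama-3.2-nv-embedqa-1b-v2
```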
Enabling the AI Copilot
Check if the `edb-migration-copilot` namespace exists:

```shell
kubectl get namespaces edb-migration-copilot
```
The namespace is created during the installation of the Hybrid Manager. If you are enabling the AI Copilot before installing the HM, you must create the namespace in advance.
If the `edb-migration-copilot` namespace doesn't exist yet, create it:

```shell
kubectl create ns edb-migration-copilot
```
Set the following environment variables to link the secret with the model endpoints:
```shell
export OPENAI_API_BASE=${COMPLETIONS_ENDPOINT}/v1
export OPENAI_EMBEDDINGS_API_BASE=${EMBEDDINGS_ENDPOINT}/v1
export OPENAI_API_KEY=<openai api key> # Set to a placeholder value such as `noop` if the models are deployed in a way that requires no key.
```
Note
The AI Copilot uses OpenAI-compatible APIs to communicate with all models, including self-hosted ones. This is why some configuration parameters contain `openai` in their names, even when you're using a different model to serve queries.

Create the `ai-vendor-secrets` secret and configure it to point at the models' endpoints:

```shell
kubectl create secret generic ai-vendor-secrets \
  --namespace=edb-migration-copilot \
  --type=opaque \
  --from-literal=AI_VENDOR=NIM \
  --from-literal=RAGCHEW_OPENAI_API_BASE="${OPENAI_API_BASE}" \
  --from-literal=RAGCHEW_OPENAI_EMBEDDINGS_API_BASE="${OPENAI_EMBEDDINGS_API_BASE}" \
  --from-literal=OPENAI_API_KEY="${OPENAI_API_KEY}"
```
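Optionally, confirm the secret exists and contains the expected keys. This uses standard kubectl output, which lists key names and sizes without revealing the values:

```shell
# Verify the secret and list its keys (describe doesn't print the values).
kubectl describe secret ai-vendor-secrets -n edb-migration-copilot
```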
Create a new file called `migration-portal-values.yaml` with the following Helm value to override the default AI vendor secrets with the secret created in the previous step:

```yaml
parameters:
  edb-migration-copilot:
    ai_vendor_secrets: ai-vendor-secrets
```
Update the Hybrid Manager installation file to include the AI Copilot configuration. This involves either updating the YAML values you used for installation or running the `helm upgrade` command with the AI Copilot configuration parameters.

Add this configuration block to either the `values.yaml` file you used to install HM with Helm, or to the CRD you used to install HM with the operator:

```yaml
[...]
parameters:
  edb-migration-copilot:
    # Overrides the default AI vendor secrets with the secret created above.
    # This *must* match the name of that secret.
    ai_vendor_secrets: ai-vendor-secrets
    # Must match the model name as listed by ${COMPLETIONS_ENDPOINT}/v1/models.
    chat_model: nvidia/llama-3.3-nemotron-super-49b-v1
    # Must match the model name as listed by ${EMBEDDINGS_ENDPOINT}/v1/models.
    embeddings_model: nvidia/llama-3.2-nv-embedqa-1b-v2
    # Enables prompt rules and other parameters that help improve the quality of responses
    # from chat completion models of the llama3 family.
    chat_model_profile: llama3
    # Must match the size of the vectors generated by the embeddings model.
    embeddings_dimension: '"2048"'
    # Causes tokenizers from the Hugging Face Transformers library to be used for token counting.
    tokenizer: huggingface
    # Sets the specific tokenizer to use; it must correspond to a pretrained model on Hugging Face Hub.
    tokenizer_model: nvidia/llama-3.3-nemotron-super-49b-v1
    # Limits the total number of requested tokens to prevent excessively verbose responses.
    chat_request_max_tokens: '"1024"'
    # Limits the number of contextual chunks included in queries to prevent overwhelming the model with input.
    default_similarity_limit: '"10"'
    # Prevents OpenAI-specific stop sequences from being sent to the model. Always set this when using
    # NIM models to avoid duplicated final chunks in response streams.
    stream_no_stop_sequence: '"true"'
[...]
```
Alternatively, run the `helm upgrade` command while including the following configuration parameters. Remember to include these parameters each subsequent time you invoke `helm upgrade`; otherwise, your values are overridden with defaults.

```shell
helm upgrade \
  -n edbpgai-bootstrap \
  --install \
  [...]
  --set parameters.edb-migration-copilot.ai_vendor_secrets=ai-vendor-secrets \
  --set parameters.edb-migration-copilot.chat_model=nvidia/llama-3.3-nemotron-super-49b-v1 \
  --set parameters.edb-migration-copilot.embeddings_model=nvidia/llama-3.2-nv-embedqa-1b-v2 \
  --set parameters.edb-migration-copilot.chat_model_profile=llama3 \
  --set parameters.edb-migration-copilot.embeddings_dimension='"2048"' \
  --set parameters.edb-migration-copilot.tokenizer=huggingface \
  --set parameters.edb-migration-copilot.tokenizer_model=nvidia/llama-3.3-nemotron-super-49b-v1 \
  --set parameters.edb-migration-copilot.chat_request_max_tokens='"1024"' \
  --set parameters.edb-migration-copilot.default_similarity_limit='"10"' \
  --set parameters.edb-migration-copilot.stream_no_stop_sequence='"true"' \
  [...]
```
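If you need to confirm the correct `embeddings_dimension` value, one option is to request a single embedding from the endpoint and count the vector length. This is only a sketch: it assumes the endpoint is reachable from your shell, exposes the OpenAI-compatible `/v1/embeddings` route, and that `jq` is installed; some NIM embedding models expect extra fields such as `input_type`, so adjust the request body and add an `Authorization` header if your deployment requires them.

```shell
# Request one embedding and print its length; it should match embeddings_dimension (2048 here).
curl -s "${EMBEDDINGS_ENDPOINT}/v1/embeddings" \
  -H "Content-Type: application/json" \
  -d '{"model": "nvidia/llama-3.2-nv-embedqa-1b-v2", "input": ["dimension check"], "input_type": "query"}' \
  | jq '.data[0].embedding | length'
```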
Restart the `edb-migration-copilot` services to trigger a reconciliation of the new values with the system:

```shell
kubectl rollout restart deployment -n edb-migration-copilot
```
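To confirm the restart completed, check that the pods in the namespace return to a Running state. This uses standard kubectl output and assumes nothing Copilot-specific:

```shell
# Pods are recreated by the restarted deployments; all should report Running once reconciliation finishes.
kubectl get pods -n edb-migration-copilot
```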
Additional configuration for air-gapped installations (experimental)
When running in an air-gapped environment, Migration Copilot will fail when it tries to fetch the pre-trained tokenizer data from Hugging Face Hub. Set the following parameter to use a local snapshot of tokenizer data instead:
Add the following `airgapped_mode` parameter to the configuration block documented previously, in either the `values.yaml` file you used to install HM with Helm or the CRD you used to install HM with the operator:

```yaml
[...]
parameters:
  edb-migration-copilot:
    # Forces a local snapshot of tokenizer data to be used instead of fetching it from Hugging Face Hub.
    # Only works if `tokenizer` and `tokenizer_model` are set to `huggingface` and
    # `nvidia/llama-3.3-nemotron-super-49b-v1`, respectively.
    airgapped_mode: '"true"'
[...]
```
Alternatively, add the following `airgapped_mode` parameter to the `helm upgrade` command documented previously. Remember to include all parameters each subsequent time you invoke `helm upgrade`; otherwise, your values are overridden with defaults.

```shell
helm upgrade \
  -n edbpgai-bootstrap \
  --install \
  [...]
  --set parameters.edb-migration-copilot.airgapped_mode='"true"' \
  [...]
```
Restart the `edb-migration-copilot` services.
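For example, using the same rollout restart as in the earlier step:

```shell
kubectl rollout restart deployment -n edb-migration-copilot
```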
Important
Migration Copilot ships with tokenizer data only for the `nvidia/llama-3.3-nemotron-super-49b-v1` pre-trained tokenizer. Using `airgapped_mode: '"true"'` with `tokenizer_model` set to any other model causes the Migration Copilot to fail.