Preconfigure a Rancher RKE2 cluster for use with Hybrid Manager v1.3

After you've installed the necessary system tools, you're almost ready to set up a Rancher RKE2 cluster for use with Hybrid Manager (HM).

But first, you need the Helm chart on which both preconfiguration and installation depend.

Configure your Helm chart

The Helm chart (values.yaml) is the core configuration for your HM platform and central to installation.

Throughout the preconfiguration and installation phases, several steps require updating the Helm chart values.yaml. Completing these updates is critical to a successful installation after preconfiguration.

Adding the EDB Helm repo and retrieving the chart

You need only your EDB Cloudsmith token to add the repo and retrieve the chart for configuration.

  1. Add the Helm chart repo from EDB Cloudsmith:

    helm repo add enterprisedb-edbpgai "https://downloads.enterprisedb.com/<your-EDB-Cloudsmith-token>/pgai-platform/helm/charts/"

    ...where <your-EDB-Cloudsmith-token> is your EDB Cloudsmith repository access token.

  2. Update the repo:

    helm repo update
  3. Retrieve the default Helm chart values.yaml file:

    helm show values enterprisedb-edbpgai/edbpgai-bootstrap > values.yaml

Creating your RKE2 cluster

You need an RKE2 cluster on which to preconfigure and install HM. The RKE2 cluster must be preconfigured precisely for a successful HM installation.

The rest of this guide walks you through the complete preconfiguration process.

Supported RKE2:Kubernetes version pairs

  • HM currently supports:

    • RKE2 1.31.X with Kubernetes 1.31.X
    • RKE2 1.32.X with Kubernetes 1.32.X

Once you know which of these configurations you are going to implement, start the cluster creation process either using your CSP for cloud-based deployments or using your standard on-premises workflow.

Configure node size and resources

To support HM and its associated components, the nodes in your Rancher RKE2 cluster must meet the resource requirements.

See the Rancher RKE2 documentation for how to set up the master nodes before moving on to the worker nodes that run HM.

Node sizing

With the RKE2 master nodes and Kubernetes cluster created, you can move on to the two sets of worker nodes required for HM: the HM Control Plane nodes and the HM Postgres data nodes.

HM Control Plane nodes

HM Control Plane (CP) worker nodes must meet the minimum requirements for resource allocation (number of worker nodes and CPU, memory, and disk size for each) as outlined below.

The size of the telemetry stack (particularly Prometheus and Thanos) you require for your cluster is the primary factor driving scaling up/out for HM CP nodes. Because the telemetry stack scales with the number of Postgres databases the CP is monitoring, CP nodes scale with the number of databases being managed.

  • General recommendations:
    • 3 CP nodes each with:
      • CPU:
        • Minimum: 8 vCPUs (for up to 10 Postgres databases)
        • Recommended: 16 vCPUs for medium-sized clusters (10-50 Postgres databases)
        • For >50 Postgres databases: 16+ vCPUs
      • Memory:
        • Minimum: 32 GB RAM (for up to 10 Postgres databases)
        • Recommended: 64 GB RAM or more for larger workloads (10-50 Postgres databases)
        • For >50 Postgres databases: 64+ GB RAM
      • Disk
        • Minimum: 100 GB SSD (for up to 10 Postgres databases)
        • Recommended: 200 GB SSD (for 10-50 Postgres databases)
        • For >50 Postgres databases: >200 GB SSD

Postgres data nodes

At least three worker nodes for the Postgres data nodes are recommended for smaller workloads (5-20 Postgres databases). As you scale up Postgres databases beyond around 20, you may require more worker nodes for the Postgres workloads and eventually even more CP nodes for supporting the increasing number of Postgres databases.

  • General recommendations:

    • 6 Postgres data nodes each with:
      • CPU:
        • Recommended: 16 vCPU
      • Memory:
        • Recommended: 32 GB of RAM per node
      • Disk:
        • Minimum: 100 GB of persistent storage per node (adjust based on database and logging requirements). Use fast disks (SSD) for optimal performance.

GPU Node Provisioning and Operator Setup (Optional for AI/ML)

If you plan to leverage GPU capabilities for model deployment, follow these steps to provision and enable the GPU nodes:

Rancher Machine Pool Workaround

The Rancher UI may miss certain instance types with GPUs when creating a machine pool. Use this workaround to provision the required nodes:

  1. Create a machine pool with any standard instance type, and set the initial replica count to 0.

  2. Update the machine pool configuration manually to specify the required GPU instance type (e.g., g6e.12xlarge). Rancher should accept the missing instance type during this update phase.

  3. Scale up the machine pool to the desired number of GPU-enabled nodes.

Install NVIDIA GPU Operator

After the GPU nodes are provisioned, you must install the NVIDIA GPU Operator to make the GPU resources available to Kubernetes pods:

Install the NVIDIA GPU Operator following the official Rancher RKE2 documentation.

After installation, you can label these new nodes appropriately and leverage them for deploying AI/ML models.
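
For example, after the GPU Operator is installed, you can confirm that a node advertises the GPU resource and then apply a label of your choosing for scheduling AI/ML workloads. This is a sketch; the node name and the workload-type label key are placeholders, not values required by HM:

# Confirm the node advertises the nvidia.com/gpu resource
kubectl describe node <gpu-node-name> | grep -i nvidia.com/gpu

# Apply an arbitrary label you can target from AI/ML workload node selectors
kubectl label node <gpu-node-name> workload-type=gpu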

Taints and labels for CP vs Postgres data nodes machine sets

To use different types of nodes for the HM CP nodes and the HM Postgres data nodes hosting the Postgres workloads, RKE2 requires two different machine sets, each with its own taints and labels:

  • CP nodes taints and labels:

    spec:
      replicas: 3
      template:
        spec:
          metadata:
            labels:
              edbaiplatform.io/control-plane: "true"
          taints:
          - key: edbplatform.io/control-plane
            value: "true"
            effect: NoSchedule
  • Postgres data nodes taints and labels:

    spec:
      replicas: 3
      template:
        spec:
          metadata:
            labels:
              edbaiplatform.io/postgres: "true"
          taints:
          - key: edbplatform.io/postgres
            value: "true"
            effect: NoSchedule
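
Once both machine sets are provisioned and their nodes have joined the cluster, a quick check such as the following (using the label and taint keys defined above) confirms that the labels and taints were applied:

# List the nodes carrying each label
kubectl get nodes -l edbaiplatform.io/control-plane=true
kubectl get nodes -l edbaiplatform.io/postgres=true

# Inspect the taints on a specific node
kubectl describe node <node-name> | grep -A2 Taints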

Networking bandwidth

Ensure each node has adequate networking capacity to handle HM communication and external data transfer (for example, S3 backups).

Baseline recommendations:

| Component | Recommended bandwidth | Justification |
|---|---|---|
| K8s control plane nodes | 1 Gbps+ per node | Handles internal Kubernetes traffic, API server requests, logs/metrics, and orchestration tasks. |
| Worker nodes (HM) | 1 Gbps+ per node | For HM control components, PostgreSQL replication, internode communication, and metrics/logs. |
| External bandwidth | 1-20 Gbps (aggregated) | S3 backups and inter-cluster replication may require high throughput. |

Network configuration and requirements

Foundational planning and networking architecture

Before you finish creating your Kubernetes cluster, you must make a series of high-level network and firewall decisions. These choices form the foundation for the rest of the deployment process. The overall sequence is the same whether you are deploying in the cloud or on-premises, but the mechanics differ.

High-level network decisions

Decide on the fundamental characteristics of your network.

  • Network stack: Is your network IPv4 or Dual Stack (IPv4 + IPv6)?

  • CNI/network type: Choose your Container Network Interface (CNI) implementation, e.g. a standard SDN overlay (Calico, Cilium, Flannel) or an OVN-based network (a standard SDN overlay using Calico is recommended).

  • DHCP vs static: Confirm whether DHCP will assign node addresses or if you must assign static IPs.

For cloud-based deployments, these decisions often map to managed options in your CSP's cluster creation wizard.

For on-premises deployments, your network team must configure your VLANs, subnets, routing, and DHCP servers.

IP Address and CIDR planning

Based on your architecture, you must now allocate address space carefully to avoid overlaps. This is your network blueprint.

Static IPs

Reserve static IP addresses for:

  • API virtual IP (for HA control plane / API endpoint)

  • Ingress virtual IP (used by on-prem LBs like MetalLB, or by hardware LBs)

  • Default gateway IP

  • DNS server IP(s)

  • NTP server IP(s)

Network ranges
  • Machine Network CIDR: For the underlay network.

  • Management Network CIDR: For administrative access.

  • Cluster Network CIDR: For pod IP addresses within Kubernetes.

  • Service Network CIDR: For internal Kubernetes services.

Downstream Cluster Space

Reserve a /16 IPv4 block for each downstream cluster if your platform requires it.
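
If you are building the cluster directly with RKE2, the Cluster Network and Service Network CIDRs from this plan are passed to the RKE2 server configuration. The following is a minimal sketch using RKE2's default ranges as placeholders; confirm the option names and values against the RKE2 documentation for your version:

# /etc/rancher/rke2/config.yaml (server nodes)
cluster-cidr: "10.42.0.0/16"    # Cluster Network CIDR (pod IP addresses)
service-cidr: "10.43.0.0/16"    # Service Network CIDR (internal Kubernetes services)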

Core cluster networking

With your network plan in place, the next step is to enable the cluster’s internal communication fabric and secure it at the pod level. This involves installing or confirming your Container Network Interface (CNI) and applying baseline NetworkPolicies.

Configure container network interface (CNI)

A working CNI is essential for pods to communicate with each other and for network policies.

Every Kubernetes cluster requires a CNI to provide pod-to-pod networking.

Cloud deployments
  • EKS (AWS): Uses the AWS VPC CNI by default, which assigns pods secondary IPs from your VPC. This integrates natively with AWS networking and security groups. If you need advanced NetworkPolicy or eBPF features, you can add Calico (policy-only) or install Cilium.

  • GKE (Google Cloud): Uses Google’s managed dataplane by default, which supports NetworkPolicy and integrates with Google’s firewalling. Third-party CNIs are rarely required.

On-prem deployments

Ensure your chosen CNI (Calico is preferred) is installed and configured correctly according to its documentation.

RKE2 ships with Canal (Calico + Flannel) enabled by default. To use another CNI, disable Canal during installation and deploy your preferred CNI.

This step makes the cluster's internal network functional.
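
On RKE2, one way to do this is through the cni option in the server configuration. A minimal sketch, assuming you want Calico instead of the default Canal (verify the accepted values for your RKE2 version):

# /etc/rancher/rke2/config.yaml (server nodes)
cni: calico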

Warning

Do not proceed until all nodes report Ready, pods receive IPs from the Pod CIDR, and cross-node pod networking is working.

Apply baseline NetworkPolicies

Once the CNI is active, immediately establish baseline isolation policies before exposing services externally.

  • Start with a default deny ingress policy in each namespace.

  • Add allow rules for:

    • DNS egress to CoreDNS.

    • Control plane → Postgres data node communication.

    • Metrics and logging pipelines.

This ensures the cluster starts from a secure, least-privilege baseline.
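
As a starting point, the default-deny and DNS-egress rules might look like the following sketch. The namespace name is a placeholder, and you will need additional allow rules (for example, control plane to data node traffic and metrics/logging) tailored to your environment:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: <hm-namespace>
spec:
  podSelector: {}          # applies to all pods in the namespace
  policyTypes:
  - Ingress                # no ingress rules defined, so all ingress is denied
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: <hm-namespace>
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system   # CoreDNS runs here
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53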

For cloud-based deployments, your CNI may need policy enforcement enabled (e.g., Calico-policy mode for AWS VPC CNI).

For on-premises scenarios, Calico, Cilium, and OVN-K enforce NetworkPolicies natively.

Create a load balancer

If you are deploying HM to the cloud, install or enable the appropriate load balancer controller, then apply a Service or Ingress resource to trigger creation of the load balancer.

External access and firewall rules

Now that the cluster's internal network is running, configure how external traffic will reach the services running inside it.

On-premises deployments (NodePort or custom LB)

For on-premises deployments, external access is typically provided through NodePort services.

  • By default, HM components expose services on specific NodePort values.
  • You may use NodePort directly, or front these ports with MetalLB (software load balancer) or a hardware load balancer for a friendlier DNS name and better failover.

Required ports (default NodePort values):

  • 32542 – HM Portal (HTTP)
  • 30288 – HM Portal (HTTPS)
  • 30290 – Beacon gRPC
  • 30292 – Spire TLS

If you need to change these defaults, update your Helm chart values.yaml under parameters.upm-istio-gateway:

ingress_http_node_port: <port>
ingress_https_node_port: <port>
ingress_grpc_tls_node_port: <port>
ingress_spire_tls_node_port: <port>

Then open up your firewall for the redefined ports.
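
In context, these keys sit under parameters.upm-istio-gateway in values.yaml. The following sketch uses the default ports listed above; the mapping of each port to each key is an assumption based on that list, so verify it against your chart's default values.yaml:

parameters:
  upm-istio-gateway:
    ingress_http_node_port: 32542
    ingress_https_node_port: 30288
    ingress_grpc_tls_node_port: 30290
    ingress_spire_tls_node_port: 30292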

Cloud-based deployments (LoadBalancer)

If you are deploying HM in the cloud (EKS, GKE), use Kubernetes Service: LoadBalancer or an Ingress resource.

On EKS, install the AWS Load Balancer Controller; on GKE, the controller is built in.

Required ports (cloud load balancer)

  • 443 – HM Portal (HTTPS ingress)

  • 8444 – HM internal API

  • 9443 – Beacon gRPC API

  • 9445 – Spire TLS

DNS, TLS, and application configuration

With the external access path defined, you can now assign user-friendly names to your services and secure them.

DNS configuration

Load balancer or Node Port configuration

If you are deploying on-premises, do not set loadBalancersEnabled to true. Instead, configure your Helm chart values.yaml accordingly:

beaconAgent:
  provisioning:
    loadBalancersEnabled: false
    nodePortDomain: "<your-node-port-domain>"
Note

The nodePortDomain value is used as the URL for all Postgres instances. It should be a DNS name pointing to the IP addresses of nodes where Postgres clusters are running.

Note

If you are using an external on-premises load balancer strategy in combination with the NodePort strategy, nodePortDomain should point to the FQDN of a load balancer that routes to the Postgres node pools, so that the generated URL for each Postgres instance works. If a load balancer controller such as MetalLB (https://metallb.io/) is in use, configure it to route traffic to the appropriate node pool services.

Necessary DNS records

Point your public and internal domain names to the entry point you configured in the previous step.

  • Ensure your Helm chart values.yaml fields under parameters are configured after deciding on a root domain name for HM (a combined example follows this list):

  • global.portal_domain_name: "portal.<root_domain>": The host name for the HM Portal.

  • upm-beacon.server_host: "beacon.<root_domain>": The host name through which the Beacon server API is reachable.

  • transporter-rw-service:domain_name: "transporter.<root_domain>": The domain name for the internal Transporter migration read/write service.

  • transporter-dp-agent:rw_service_url: "transporter.<root_domain>/transporter": The URL for the internal Transporter migration read/write service.

  • Create a DNS A-record for your portal domain (portal_domain_name) pointing to your cloud-based load balancer IP or the Node IPs covered by your nodePortDomain.
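
Put together, the entries above might look like the following sketch in values.yaml, assuming a root domain of example.com. The exact nesting of these keys can differ between chart versions, so confirm it against the default values.yaml you retrieved earlier:

parameters:
  global:
    portal_domain_name: "portal.example.com"
  upm-beacon:
    server_host: "beacon.example.com"
  transporter-rw-service:
    domain_name: "transporter.example.com"
  transporter-dp-agent:
    rw_service_url: "transporter.example.com/transporter"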

TLS certificate management

Secure your endpoints (such as the HM Portal) using TLS certificates.

This requires DNS hostnames to be finalized (see previous step) so the certificates can be issued for the correct names.

Custom certificates are strongly suggested, with auto-generated self-signed certificates as the default fallback option.

You can also set up a custom cert-manager issuer for the HM Portal and even set up your own certificate authority.
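
If you use the cert-manager route with your own certificate authority, the issuer might be declared along these lines. This is a generic cert-manager sketch rather than an HM-specific resource; the issuer name and the secret holding your CA key pair are placeholders:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: hm-portal-ca-issuer
spec:
  ca:
    secretName: hm-portal-ca-key-pair   # secret containing tls.crt and tls.key for your CA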

User identity configuration

To configure user identity, choose between integrating your own identity provider (IdP) and using HM's native users feature.

Setting up your identity provider (IdP) with HM is strongly recommended for managing user access in production.

Native users are supported by default, but managing users this way is not recommended in production.

To create your first native user (user0), you need the user's email, hash, userID, and username:

  • email: Email address of the user. Also serves as the user's login identifier for accessing the Console.
  • hash: Bcrypt-hashed user password for the password store. To generate this value, use echo ${password} | htpasswd -BinC 10 userA | cut -d: -f2, where the actual password is stored in the ${password} variable. userA represents the username used during the password hashing process; it can be any arbitrary text, as it's not used elsewhere in the configuration, and only the resulting hash is used.
  • userID: A distinct, unique identifier (UUID) for the user. Each new user configured with a static password must have one. You can generate this value with a UUID generator tool or assign a random sequence of characters manually.
  • username: Unique username for the user. This is the primary identifier for logging into the Console. It can be the same as email.

Then set these values in the Helm chart values.yaml under pgai:portal:authentication:staticPasswords.
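
For example, a single native user entry might look like the following sketch. All values are placeholders, and the nesting follows the pgai:portal:authentication:staticPasswords path named above; confirm it against your chart's default values.yaml:

pgai:
  portal:
    authentication:
      staticPasswords:
      - email: "user0@example.com"
        hash: "<bcrypt-hash-generated-with-htpasswd>"
        userID: "a1b2c3d4-0000-0000-0000-000000000000"
        username: "user0"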

Network security policies hardening

With services exposed and TLS/DNS configured, refine your NetworkPolicies and firewall rules to enforce stricter boundaries.

Network policies

Create Kubernetes NetworkPolicy resources to control which pods can communicate with each other. This is crucial for securing traffic between the control plane, data nodes, and other system components.

Ingress/egress rules

Harden the boundaries. Validate and enforce ingress/egress rules on your firewall to ensure that only necessary traffic is allowed into and out of the cluster, restricting all other access.

Sync EDB Postgres AI Platform container images into a customer owned registry

The HM software stack is published to the EDB Cloudsmith registry to provide artifacts for you to sync into your own local registry: a customer-managed internal registry for Rancher RKE2 on-premises scenarios, or a customer-managed registry on your cloud service provider (CSP), such as AWS Elastic Container Registry (ECR) or Google Cloud Artifact Registry (AR), for HM on Rancher on CSP scenarios.

You must have your own secure, approved local registry (you use the URI, user, and password for your local container registry in the sync process below) and know the EDB PGAI version that you want to install. With this information, you sync all the artifacts from Cloudsmith into your local registry before installing or upgrading the software stack with the Helm chart.

The sync process must preserve the container images' SHA256 digests to ensure image security and immutability across different environments.

Sync using edbctl

  1. Ensure edbctl is installed and configured if it is not already.

  2. Configure the necessary environment variables:

  • Define the EDB PGAI release to install:

    export EDBPGAI_RELEASE=<EDB-pgai-release-version>
  • Define the EDB Cloudsmith access token:

    export CS_EDB_TOKEN=<your-Cloudsmith-token>
  • Define the EDB Cloudsmith registry source:

    export EDB_SOURCE_REGISTRY=pgai-platform
  • Define your local container registry URI, user, and password:

    export LOCAL_REGISTRY_URI=<your_local_container_registry_uri>
    export LOCAL_REGISTRY_USER=<your_local_registry_user>
    export LOCAL_REGISTRY_PWD=<your_local_registry_password_for_your_user>
  3. Run the sync-to-local-registry command:

    edbctl image sync-to-local-registry \
        --destination-registry-uri "${LOCAL_REGISTRY_URI}" \
        --version "${EDBPGAI_RELEASE}" \
        --source-registry-username "${EDB_SOURCE_REGISTRY}" \
        --source-registry-password "${CS_EDB_TOKEN}" \
        --destination-registry-username "${LOCAL_REGISTRY_USER}" \
        --destination-registry-password "${LOCAL_REGISTRY_PWD}"
  4. Sync the EDB PGAI Operator image to the destination registry:

    edbctl operator sync-to-local-registry \
        --destination-registry-uri "${LOCAL_REGISTRY_URI}" \
        --version "${EDBPGAI_RELEASE}" \
        --source-registry-username "${EDB_SOURCE_REGISTRY}" \
        --source-registry-password "${CS_EDB_TOKEN}" \
        --destination-registry-username "${LOCAL_REGISTRY_USER}" \
        --destination-registry-password "${LOCAL_REGISTRY_PWD}"

Your local registry is now synced with EDB's Cloudsmith registry.

Set the containerRegistryURL in your Helm chart

Be sure to set your containerRegistryURL to your now-synced local container registry's URL in the Helm chart values.yaml:

containerRegistryURL: "<your-local-container-registry-url>"

Image discovery configuration

Image discovery is a process that runs in the Agent (Beacon) to discover Postgres images. It connects to your customer-managed local container registry (see the previous step), which must be OCI compliant. Any OCI-compliant registry is supported with HM.

Configuring the Helm chart for image discovery

  1. To enable image discovery for HM, first change the value of beaconAgent.provisioning:imageDiscovery to true in your values.yaml.

  2. Set beaconAgent.provisioning:imagesetDiscoveryContainerRegistryURL to the local container registry you synced the EDB images to in the previous step, as this is the container registry from which HM discovers Postgres container images.

  3. Optionally, set the imagesetDiscoveryAllowInsecureRegistry option to true if you plan to establish a TLS connection without certificate validation.
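
Taken together, the image discovery settings might look like the following sketch in values.yaml. The key placement under beaconAgent.provisioning follows the paths named above; verify it against your chart's default values.yaml:

beaconAgent:
  provisioning:
    imageDiscovery: true
    imagesetDiscoveryContainerRegistryURL: "<your-local-container-registry-url>"
    imagesetDiscoveryAllowInsecureRegistry: false   # set to true only to skip TLS certificate validation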

Registry credentials

The image discovery process authenticates with your local container image registry using a Kubernetes image pull secret (see next step). Therefore, the service account or principal used must have permissions to:

  • list repositories
  • list tags
  • read tag manifests

For on-premises scenarios, see the documentation for your local internal container registry (such as Quay) for instructions on configuring the required registry permissions so that HM can access it.

If using a CSP-based local registry for your images (AWS Elastic Container Registry, Google Cloud's Artifact Registry (AR)), use the following examples as a guide:

AWS ECR

When using ECR with EKS, eks_managed_identity is the only supported authentication type. Before using eks_managed_identity, you must create a role with the AmazonEC2ContainerRegistryReadOnly policy attached, then associate that role with your EKS cluster pod identity:

EKS_CLUSTER_NAME="<eks_cluster_name>"
EKS_CLUSTER_REGION="<eks_cluster_region>"
IMAGE_DISCOVERY_IAM_ROLE_NAME="<iam_role_name>"

cat <<EOF > ./image-discovery-trust.json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowEksAuthToAssumeRoleForPodIdentity",
            "Effect": "Allow",
            "Principal": {
                "Service": "pods.eks.amazonaws.com"
            },
            "Action": [
                "sts:AssumeRole",
                "sts:TagSession"
            ]
        }
    ]
}
EOF

aws iam create-role --role-name "${IMAGE_DISCOVERY_IAM_ROLE_NAME}" \
    --assume-role-policy-document file://image-discovery-trust.json

aws iam attach-role-policy --role-name "${IMAGE_DISCOVERY_IAM_ROLE_NAME}" \
    --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly

IMAGE_DISCOVERY_IAM_ROLE_ARN=$(aws iam get-role --role-name ${IMAGE_DISCOVERY_IAM_ROLE_NAME} | jq -r '.Role.Arn')
aws eks create-pod-identity-association --cluster-name "${EKS_CLUSTER_NAME}" \
    --namespace upm-beacon \
    --service-account upm-beacon-agent-k8s \
    --role-arn "${IMAGE_DISCOVERY_IAM_ROLE_ARN}" \
    --region "${EKS_CLUSTER_REGION}"

Google Cloud AR

When using Google Cloud AR with Google Kubernetes Engine (GKE), the service account used for generating the image pull secret must have the following roles:

  • roles/artifactregistry.reader on the project level
  • roles/browser on the project level—in particular, the permission resourcemanager.projects.list is required to allow retrieving repositories within AR:
gcloud projects add-iam-policy-binding <PROJECT-ID> \
    --member="serviceAccount:<SERVICE-ACCOUNT-NAME>@<PROJECT-ID>.iam.gserviceaccount.com" \
    --role="roles/artifactregistry.reader"

gcloud projects add-iam-policy-binding <PROJECT-ID> \
    --member="serviceAccount:<SERVICE-ACCOUNT-NAME>@<PROJECT-ID>.iam.gserviceaccount.com" \
    --role="roles/browser"

For more information, see the Google Cloud AR roles documentation and Resource Manager roles documentation.

Setting the secret value

The final step in configuring image discovery is confirming parameters.upm-beacon:image_discovery_secret_name. This is the name of the Kubernetes secret containing the credentials of the container registry used for image discovery. By default, it is edb-cred.
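
In values.yaml this might appear as the following sketch; the default shown matches the edb-cred secret created in the next step, and the nesting should be confirmed against your chart's default values.yaml:

parameters:
  upm-beacon:
    image_discovery_secret_name: "edb-cred"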

Namespaces and preliminary secrets

Namespace for HM

Create a dedicated namespace for HM components to ensure isolation and manageability.
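
For example, using a namespace name of your choosing (not a value mandated by HM):

kubectl create namespace <hm-namespace>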

Preparation for object storage and general Kubernetes secrets

Create Kubernetes secrets for any required credentials, such as object storage credentials (example: aws_secret_access_key), database access tokens, or any other sensitive information.

ImagePullSecret namespace and required secrets

Use edbctl to create the ImagePullSecret namespaces and required secrets:

  1. Create the necessary namespaces and pull secrets:

    edbctl image-pull-secret create \
      --username <container registry username> \
      --password <container registry password> \
      --registry <local registry URI>
  2. When prompted with Proceed? [y/N] with the current Kubernetes context, select y.

    You should then see something like the following example:

    2025/02/10 10:10:10 Creating Kubernetes Namespaces and ImagePullSecrets with the provided credentials...
    2025/02/07 15:29:08 Namespaces and ImagePullSecrets creation completed
  3. List the ImagePullSecrets to verify they were created:

    edbctl image-pull-secret list

    You should then see something like the following example output:

    Current Kubernetes context is: <your-KubeContext>
    Namespace edbpgai-bootstrap: exists, all set!
      Secret edb-cred: exists, all set!
    Namespace upm-replicator: exists, all set!
      Secret edb-cred: exists, all set!

Authentication and security keys

HM and its underlying components require secure authentication mechanisms to ensure proper communication between components and to protect sensitive data.

Generate the key and store it in a secret

  1. Generate an AES-256 encryption key.

    HM uses AES-256 encryption to secure sensitive data during communication or at rest (for example, database credentials and tokens). To generate a random AES-256 encryption key:

    export AES_256_KEY=$(openssl rand -base64 32)
  2. Store the key in Kubernetes.

    To make the key accessible to HM and associated services, create a Kubernetes secret in the appropriate namespace:

    1. Run the following command to create the secret:

      kubectl create secret generic hm-auth-key \
          --namespace <hm-namespace-created-above> \
          --from-literal=aes-256-key=$AES_256_KEY
    2. Verify the secret:

      kubectl get secret hm-auth-key --namespace <hm-namespace>

Update the Helm chart

In the Helm chart values.yaml, set the field parameters.upm-istio-gateway.cooke_aeskey to the AES-256 encryption key you generated in the previous step.

Other necessary secrets

  1. Create a secret for GenAI Builder and configure Delta Lake object storage.

  2. Create a secret for Catalog.

Securing Migration Portal

Create custom secrets for Migration Portal if you want to secure internal communication for Migration Portal.

Storage and cluster preparation

Block storage configuration

HM uses Block Storage for all primary, stateful workloads. A strategic approach to Kubernetes Storage Classes is recommended to manage performance and cost.

Workload segregation

Define and use custom Storage Classes to match the storage I/O profile to the specific type of workloads:

| HM component | Workload type | I/O requirement | Example optimization |
|---|---|---|---|
| HM Control Plane (CP) | Internal DBs and microservices (such as Thanos) | Moderate IOPS, high throughput | Standard SSD tier for internal state. |
| HM Postgres data nodes | Primary database I/O | High IOPS, low latency | Premium/high-performance SSD tier (crucial for production). |

Snapshot class (optional, CSI-capable backends only)

In addition to Storage Classes for provisioning volumes, you may also configure a Kubernetes VolumeSnapshotClass if your CSI driver supports snapshots.

  • This defines how PersistentVolume snapshots are created and managed.
  • Only one or two snapshot classes are usually required (e.g., one per storage backend).
  • On-prem with TopoLVM/local CSI: snapshots are not supported. Use HM’s object-store backup mechanism instead.
  • Cloud deployments or enterprise on-prem CSI (e.g., Portworx, Ceph RBD) generally support snapshots.
Note

Snapshots complement object-store backups but do not replace them. Use snapshots for fast rollback and recovery within the same cluster; use object-store backups for long-term retention or disaster recovery.

Cluster prerequisites and driver verification

With your Block Storage strategy and optional snapshot strategy outlined (storage classes identified for the HM CP and Postgres data node workloads in the previous step, and any optional VolumeSnapshotClasses confirmed), you are ready to implement Block Storage and optional snapshots by installing the appropriate CSI driver.

Cluster prerequisites

Ensure your cluster supports dynamic volume provisioning.

  • In cloud deployments, this is typically enabled by default.
  • In on-premises deployments, verify that your chosen CSI driver (e.g., TopoLVM, Portworx, Ceph RBD) is properly installed and configured.

CSI driver installation

The chosen StorageClass dictates the required Container Storage Interface (CSI) driver, which enables the cluster to dynamically provision Persistent Volumes.

  • Persistent storage driver (required):

    • On-prem (primary use case): Use a local CSI driver such as TopoLVM for node-attached storage, or an enterprise driver such as Portworx or Ceph RBD if available.
    • Cloud deployments (alternative): Use the CSI driver provided by your CSP (e.g., AWS EBS, GCE PD).
  • Snapshot controller (optional, CSI-capable backends only):

    • If your CSI driver supports snapshots, you can enable the Kubernetes CSI Snapshot Controller and configure a VolumeSnapshotClass. This allows fast, volume-level snapshots for operational recovery.
    • On-prem (TopoLVM/local CSI): Snapshots are not supported. Use HM’s built-in object-store backup and restore for data protection.
    • On-prem (enterprise CSI with snapshot support): You may enable the snapshot controller if the driver supports it.
    • Cloud: Most cloud CSI drivers support snapshots. You can use snapshots for short-term rollback, but still use object-store backups for cross-cluster recovery and long-term retention.
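
As an illustration, a StorageClass for the high-performance Postgres tier on AWS EBS might look like the following sketch. The class name is a placeholder, and on-premises drivers such as TopoLVM, Portworx, or Ceph RBD use their own provisioner names and parameters:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: hm-postgres-ssd
provisioner: ebs.csi.aws.com          # AWS EBS CSI driver
parameters:
  type: gp3                           # general-purpose SSD; consider a higher-IOPS type for heavy workloads
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true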

Update the Helm chart

With your chosen StorageClasses created, update the Helm chart values.yaml to set the storage class used by HM internal services:

global:
  storage_class: <hm-cp-storage-class>

KMS key setup

If you are planning on using Transparent Data Encryption (TDE), configuring a Key Management Store (KMS) is required.

Object storage configuration (MinIO setup)

HM requires S3-compatible Object Storage (like MinIO) for data protection and archival. This bucket stores all non-block-storage data:

  • Postgres WALs and Backups (enabling Point-in-Time Recovery).

  • Managed Storage Locations.

  • Archived Logs and Metrics.

MinIO user and policy creation

Set up a dedicated bucket, user, and policy for the HM platform.

  1. Download the MinIO Client (mc) and configure it to manage your MinIO instance.

  2. Define environment variables. Replace the bracketed values with your desired names and credentials:

    export MINIO_DEPLOYMENT_NAME=<minio_deployment_name> # MinIO deployment name (used as the mc alias)
    export BUCKET_NAME=<minio_bucket_name> # bucket name
    export AWS_ACCESS_KEY_ID=<minio_user_name> # user name
    export AWS_SECRET_ACCESS_KEY=<minio_user_password> # user's password
    export MINIO_POLICY_NAME=<minio_policy_name> # MinIO policy name
  3. Create a MinIO bucket using the mc mb command (see the example after this list).

  4. Create a new user:

mc admin user add ${MINIO_DEPLOYMENT_NAME} ${AWS_ACCESS_KEY_ID} ${AWS_SECRET_ACCESS_KEY}
  5. Create one MinIO policy:
cat << EOF > policy.json
{
    "Version" : "2012-10-17",
    "Statement": [
        {
            "Effect" : "Allow",
            "Action" : [
                "s3:*"
            ],
            "Resource" : [
                "arn:aws:s3:::${BUCKET_NAME}",
                "arn:aws:s3:::${BUCKET_NAME}/*"
            ]
        }
    ]
}
EOF
  6. Apply the policy:
mc admin policy create ${MINIO_DEPLOYMENT_NAME} ${MINIO_POLICY_NAME} ./policy.json
  7. Attach the policy to the user:

    mc admin policy attach ${MINIO_DEPLOYMENT_NAME} ${MINIO_POLICY_NAME} --user ${AWS_ACCESS_KEY_ID}
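
For reference, configuring the mc alias (step 1) and creating the bucket (step 3) might look like the following, where <minio-endpoint-url> and the admin credentials are placeholders for your MinIO deployment:

mc alias set ${MINIO_DEPLOYMENT_NAME} <minio-endpoint-url> <minio-admin-user> <minio-admin-password>
mc mb ${MINIO_DEPLOYMENT_NAME}/${BUCKET_NAME}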

Apply the secret for bucket access

After preparing the dedicated user for HM access to the bucket, create and apply the following secret, which associates the created user with HM to provide access to object storage:

apiVersion: v1
kind: Secret
metadata:
  name: edb-object-storage # name cannot be changed
  namespace: default # namespace cannot be changed
stringData:
    auth_type: credentials

    # Optional: Used only when the object storage server's certificate is not issued by a well-known CA
    #
    # Base64 string of the CA bundle for the certificate used by the object storage server
    aws_ca_bundle_base64: <aws_ca_bundle_base64>

    # Required: Endpoint URL to the object storage
    aws_endpoint_url_s3: <endpoint-url-to-object-storage>

    # Required: AWS Static Credentials - AWS_ACCESS_KEY_ID
    aws_access_key_id: <AWS_ACCESS_KEY_ID>

    # Required: AWS Static Credentials - AWS_SECRET_ACCESS_KEY
    aws_secret_access_key: <AWS_SECRET_ACCESS_KEY>

    # Required: Bucket name
    bucket_name: <bucket_name>

    # Required: Region of the bucket
    aws_region: <aws_region>

    # Optional: true or false
    # When server-side encryption is disabled, set this to true. By default, its value is false, indicating that server-side encryption is enabled.
    server_side_encryption_disabled: <boolean>
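
Save the manifest to a file (the file name is arbitrary), apply it, and confirm the secret exists:

kubectl apply -f edb-object-storage-secret.yaml
kubectl get secret edb-object-storage --namespace default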

Proceed to Installation

With the cluster configured, proceed to the installation phase (see ../installing.mdx).