Planning your architecture Innovation Release

Overview

Role: CTO / Architect / Lead Engineer

Prerequisites

  • A list of the business goals that your architecture plan should enable (examples: budget constraints, desired uptime, and desired latency)

Outcomes

  • An Architectural Decision Record (ADR) defining the topology, locality, and redundancy model of your Hybrid Manager (HM) architecture (at minimum, architecture diagrams with notes).

  • Initial inputs for the HM configuration — either a HybridControlPlane Custom Resource manifest (Operator method, recommended) or a values.yaml file (Bootstrap method).

Note

You, as the customer, ultimately own your deployment architecture. While EDB's Sales Engineering, Professional Services, Support Team, or documentation can be consulted, the final architectural decisions rest with your team.

Next phase: Phase 2: Gathering your system requirements

Architectural discovery

The goal of architectural discovery is to navigate and then document the necessary decisions to successfully deploy Hybrid Manager (HM). These decisions form the blueprint for Gathering your system requirements (Phase 2) and Preparing your environment (Phase 3).

The accompanying questions cover a broad set of considerations extending beyond just the database layer. This guide should be viewed from two perspectives:

  • Current state: Where are your existing database and application workloads today?

  • Target state: Where do you intend to deploy HM immediately, and where do you plan to expand over the next 1-2 years?

Recommendation: Acquiring and reviewing diagrams of your current and target state is the most efficient way to complete this phase.

Locality: Where will HM live?

Understanding the physical or logical locations of your database and dependent applications is crucial for determining the necessary architecture.

  • Questions to answer:

    • Where is the current database solution located in terms of cloud regions (CSP) or physical data centers (on-premises)?

    • Where are the dependent application workloads for these databases located?

    • Are there upstream layers of dependency, and where are those located?

  • Analysis:

    • Locality determines the initial scope of the deployment (for example, single cloud region vs. multi-region).
    • If you plan to span multiple regions, clouds, or hybrid cloud environments, Postgres Distributed is likely the appropriate database service recommendation.
    • The locality of upstream applications is key to minimizing network latency.

Disaster recovery (hot/cold)

Disaster recovery (DR) ensures business continuity across different locations.

  • Questions to answer:

    • How is disaster recovery, as a subset of business continuity, accomplished across these locations?
    • Is there an additional location assigned specifically as DR?
    • How is DR capability validated, and how often?
  • Analysis:

    • Having a dedicated secondary location indicates a strong architectural requirement.
    • If no formal DR practice exists, the HM DBaaS far-away replica solution may provide new capabilities.

Activeness (Active/Passive vs. Active/Active)

Activeness describes how your distributed locations are utilized for critical workloads.

  • Questions to answer:

    • If you have multiple locations, how does the critical dependent workload utilize these systems?
    • Is one location active and the other passive for transaction processing (OLTP)?
    • Is one location active for OLTP, and the other active for analytical processing (OLAP/BI)?
  • Analysis:

    • If your target state requires simultaneous writes to multiple database instances (that is, true active/active across locations), Postgres Distributed is the required solution due to its multi-writer capability.
    • Understanding whether a location is passively waiting (cold standby) or actively running (hot standby) helps define resource requirements and recovery time objectives (RTO).
    • Business continuity: The architectural choices around active/passive, active/active, and standby models must balance the organization's tolerance for downtime/data loss against the cost of maintaining redundant systems.

The following topics naturally follow the discussion of activeness and help complete the picture of your application ecosystem:

  • Ingress traffic routing in terms of the consuming application.
  • Replication at various application layers.
  • Caching layers (and their location relative to the database).
  • Session demands (for example, is session replication handled at the application layer?).

Lifecycle operations

Understanding your operations practices helps determine the complexity of the Kubernetes environment required to manage the database service.

  • Questions to answer:

    • Do you use lifecycle operation patterns such as Blue/Green or Canary?
    • How do you handle DML/DDL updates (data and schema) vs. engine upgrades (major versions)?
    • What pre-production environments (staging, development, testing) are required?
  • Analysis:

    • Practices like Blue/Green deployment align well with the zero-downtime features offered by EDB's database solutions.
    • The number of pre-production environments directly influences the total cluster count and resource sizing defined in Gathering your system requirements.

Supported platforms

HM and Kubernetes have a 1:1 relationship: each HM deployment requires its own Kubernetes cluster. In the current version, the cluster must be dedicated to HM; sharing it with other workloads isn't supported. HM supports the following Kubernetes platforms:

  • Amazon EKS (Elastic Kubernetes Service)
  • Microsoft Azure AKS (Azure Kubernetes Service)
  • Google GKE (Google Kubernetes Engine)
  • ROSA (Red Hat OpenShift Service on AWS)
  • RHOS (Red Hat OpenShift Container Platform)
  • Rancher RKE (Rancher Kubernetes Engine / RKE2)
Note

The customer is responsible for the full life cycle management of the Kubernetes cluster: provisioning, deploying, upgrading, and scaling.

HM distributed reference architecture

The HM distributed reference architecture represents the target design for achieving the highest levels of scale and the most demanding SLAs. It typically spans multiple data centers.

HM reference architecture

Diagram legend reference

The legend defines the colors and logical groupings used in the architecture diagram:

  • Locality: The highest-level physical or logical grouping, such as a physical data center or a geographical region (for example, "City 1" and "Data Center 1").
  • Kubernetes Cluster: The complete Kubernetes environment, including all Kubernetes control plane and worker nodes, hosting the entire platform.
  • EDB HM: The logical boundary for the core HM components. This is typically implemented as a dedicated Kubernetes namespace (for example, control-plane).
  • Compute Machine: The virtual machines (for example, vm01, vm02, vm03) that serve as the Kubernetes worker nodes, providing the CPU, memory, and storage for the cluster.
  • Infrastructure Abstraction: This critical layer represents Kubernetes-native resources that abstract underlying physical or virtual infrastructure. These resources must be provided by the Kubernetes cluster's environment.
    • Example 1: type: LoadBalancer: This is a Kubernetes Service type that requests an external load balancer. In public cloud environments (like AWS, GCP, Azure), this is automatically provisioned as a managed service. In on-premises or bare-metal deployments, you must provide a solution (like MetalLB) to fulfill these LoadBalancer requests.
    • Example 2: StorageClass: This resource abstracts the "Block Storage" and "Object Storage" requirements. It maps Kubernetes storage requests (Persistent Volume Claims) to actual, provisioned storage hardware or software (like local-pv, Ceph, vSphere, or cloud-based disks).
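
As a rough illustration of the two examples above, the following manifests sketch what such requests look like in plain Kubernetes. The resource names (example-ingress, example-data), the pod label, the ports, the 10Gi size, and the StorageClass name fast-block are placeholders assumed for illustration, not values that HM defines.

```yaml
# Example 1: a Service asking the environment for an external load balancer.
apiVersion: v1
kind: Service
metadata:
  name: example-ingress          # hypothetical name
spec:
  type: LoadBalancer
  selector:
    app: example-gateway         # hypothetical pod label
  ports:
    - name: https
      protocol: TCP
      port: 443
      targetPort: 8443
---
# Example 2: a PersistentVolumeClaim that obtains storage through a StorageClass.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data             # hypothetical name
spec:
  storageClassName: fast-block   # hypothetical; must match a StorageClass in your cluster
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

On a cluster with no load balancer implementation, the Service's external address stays pending, and a claim that names a StorageClass the cluster doesn't have stays unbound. This is why both abstractions need an owner in your architecture plan.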

HM Control Plane vs. Data Plane

| Attribute | Control Plane (CP) | Data Plane (DP) |
| --- | --- | --- |
| Role | Orchestration, configuration, and observability | Postgres cluster hosting and replication |
| Protocol | HTTP (REST / gRPC) via Istio service mesh | Postgres wire protocol (TCP) per PG cluster |
| Multi-location topology & traffic | Star: Primary is mandatory hub across locations; Primary pushes Control outbound to each Secondary, Secondaries return Logs to Primary | Optional mesh: locations may be fully isolated or directly peered for Postgres replication; no hub required |
| Lifecycle isolation | All CPs share management context; CP upgrades affect the full estate | Each DP is an independent failure, security, and upgrade domain |
| DR role | Secondary CPs provide management continuity | DPs are the replication targets; hot/cold standby model determined per deployment |
| Scenarios | core, migration, ai | dbaas, analytics, ai |

Protocol and ingress

The simplest way to distinguish the two planes is by protocol.

The Control Plane is entirely EDB product, and every component in it speaks HTTP. That HTTP traffic is served and secured by the Istio service mesh, which provides a single, centralized ingress surface across all CP services. Authentication, mTLS, and traffic policy are all enforced at the Istio layer before a request reaches any CP component.

The Data Plane is customer-provisioned Postgres, and every cluster in it speaks the Postgres wire protocol (PSQL). There's no middleware in the path. Each Postgres cluster listens on its own dedicated TCP endpoint; no shared proxy or mesh aggregates these connections. This is a deliberate design choice: dedicated ingress per cluster maximizes both availability and the security isolation boundary between customer workloads.

In short: HTTP CP ingress is centralized via Istio; PSQL DP ingress is dedicated per Postgres cluster.


Lifecycle isolation

Lifecycle isolation is the most operationally significant distinction between the two planes.

The CP and every DP it manages are on independent upgrade lifecycles. EDB publishes patches on a monthly cadence for the Long Term Support (LTS) release. Customers are strongly encouraged to apply these promptly, and the architecture makes this practical: upgrading the CP has no effect on running customer Postgres services on the DP. There's no maintenance window for Postgres availability when the CP is patched.

These monthly CP patches carry two categories of content:

  1. CP component security patches — hardening the orchestration layer itself.
  2. New Postgres container images — updated images for Postgres engine versions, extensions, and the container host OS, all with current security patches applied.

Once new images are available in the CP, it's the customer's decision when to use the CP's orchestration capabilities to roll those images out to their DP Postgres deployments. This preserves full customer control over Postgres upgrade timing while ensuring that hardened images are always available on demand.


Availability and disaster recovery

CP availability and DP availability are independent concerns, and the two planes warrant different recovery strategies.

Control Plane

The CP doesn't sit in the data path of customer Postgres traffic. A CP outage doesn't interrupt connections to provisioned Postgres clusters — running workloads continue unaffected. The CP has a robust DR capability: with disciplined operational practice (regular backups, documented runbooks), a CP can be restored in under one hour.

Data Plane

Because the DP carries live Postgres traffic, higher availability targets require active replication across failure domains — optimally across regions. Postgres replication across DP locations (where network peering is provisioned) allows the DP to survive a regional failure without any impact to application connectivity.

Availability zones

Both planes can take full advantage of availability zones within a location. Every HM deployment location may contain CP elements, DP elements, or both. Any location can be recovered from disaster using backups. The difference is the availability ceiling: the CP's recovery is a restore operation measured in minutes to an hour, while the DP's Postgres clusters have the architectural capacity to remain continuously available through a disaster by maintaining live replicas in other failure domains.

Deployment architectures

Use reference architectures A-D below as reference models to decide which topology matches your "Target state."

Note

The legend above also applies to reference architectures A-D below.

Minimum Control Plane

The minimum install colocates the HM Control Plane (CP) on the Kubernetes control nodes.

This is fully functional for:

  • Centralizing a view of your Postgres/Oracle Estate.
  • Database migration capabilities.
  • GenAI (limited capabilities due to lack of managed Postgres instances).

HM minimum

Internal architecture: HM Control Plane

HM is composed of several core microservices running within the Kubernetes cluster. Understanding these components is helpful for planning resource allocation and security boundaries.

Architectural dependencies

The architecture diagrams above reference several external components. While you verify the specific hardware/software requirements for these in Phase 2: Gathering your system requirements, you must account for their connectivity in your architectural design.

  • Identity provider (IdP): Required for user authentication. The architecture relies on OIDC (LDAP/SAML) for all human access.
  • Key Management Service (KMS): (Optional) Required only if your security policy demands Transparent Data Encryption (TDE).
  • Object Storage: Required for system resilience. It hosts backups, logs, and facilitates data replication for Multi-Location topologies.
  • Block Storage: Required for database performance. Your storage architecture must provide persistent volumes (PVCs) for the Postgres data layer.
  • Local network: The fabric connecting the CP to the Data Plane. Latency here drives your Locality decisions.
  • Container Registry: The source of truth for application images. For air-gapped designs, this represents your local synchronized registry.

HM Control Plane + Data Plane

Sitting alongside the HM CP is the HM Data Plane (DP). This is where your actual database workloads reside.

  • Postgres clusters: The actual database instances (Primary and Standbys).

  • Extensions: PostGIS, PGVector, and other database extensions.

  • Backup agents: Local tools (like Barman) managing WAL archiving to your Object Storage.

HM Data Plane

This view shows a fully capable HM deployment, including resources like GPU acceleration for AI workloads.

HM fully featured


Multi-location topologies in EDB Hybrid Manager

EDB Hybrid Manager supports multi-location deployments through two distinct but complementary network topologies — one for the Control Plane and one for the Data Plane. Each topology reflects the role and protocol of its plane.

The multi-location capability is a DBaaS offering following a hub and spoke model.

  1. As a DBaaS offering, secondary HMs have a reduced capability set compared to the primary.
  2. The primary HM controls the Secondary.
  3. Connectivity is established using load-balanced endpoints, not a network mesh service such as Submariner.

HM multi-location


Control Plane: star topology

CP star topology

The Control Plane is arranged in a star topology: one designated Primary CP sits at the center and up to five Secondary CPs radiate outward from it.

Traffic follows a strict directional pattern. Control flows outward from the Primary to each Secondary — configuration, orchestration commands, and lifecycle operations all originate at the hub. Logs flow inward, returning from each Secondary back to the Primary. Secondaries have no lateral visibility: they can't control one another, and they can't self-govern. All authority is centralized in the Primary.

This hub-and-spoke authority model has an important consequence. Because each Secondary CP is managed by the Primary but is otherwise autonomous in terms of what workloads it hosts, any Secondary can serve an entirely different engagement. Common examples include:

  • Dev/test vs. production: separate Secondaries enforce environment isolation at the infrastructure level
  • One customer vs. another: multi-tenant or MSP deployments where workload separation is a contractual or security requirement
  • On-premises vs. cloud: a single Primary can govern CPs deployed in a private data center alongside CPs deployed in one or more CSPs
  • Geo-distributed regions: each Secondary can reside in a different geographic region, managed from a single operational center of gravity

The star is connected exclusively over private network paths: internal LoadBalancer endpoints accessed via VPC/VNet peering, Transit Gateway, VPN, or equivalent. No Secondary is reachable over a public endpoint.
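
To make the private-path requirement concrete, here is a minimal sketch of how a Kubernetes Service of type LoadBalancer is commonly kept internal on the major managed platforms. It illustrates the general Kubernetes mechanism only. The Service name, label, and ports are hypothetical, and HM's own installation settings may configure internal endpoints for you, so treat this as background for network planning rather than something to apply as is.

```yaml
# Sketch only: a LoadBalancer Service that is not exposed on a public address.
# Keep the one annotation that matches your platform and remove the others.
apiVersion: v1
kind: Service
metadata:
  name: example-internal-endpoint   # hypothetical name
  annotations:
    # Azure AKS
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
    # Google GKE
    # networking.gke.io/load-balancer-type: "Internal"
    # Amazon EKS with the AWS Load Balancer Controller
    # service.beta.kubernetes.io/aws-load-balancer-scheme: "internal"
spec:
  type: LoadBalancer
  selector:
    app: example-gateway            # hypothetical pod label
  ports:
    - name: https
      protocol: TCP
      port: 443
      targetPort: 8443
```

An internal load balancer address is reachable only from networks routed to it, which is why the peering, Transit Gateway, or VPN path between the Primary and each Secondary has to be part of the design.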

Data Plane: optional mesh topology

DP optional mesh topology

The Data Plane topology is optional mesh: any two Data Plane locations may be left entirely isolated, or they may be directly interconnected via VPC/VNet peering for Postgres-level replication. There's no hub, no mandatory center, and no required connectivity between locations.

When two Data Planes are peered, the network path enables direct Postgres replication between clusters hosted in each location. This is the foundation for two distributed Postgres topologies supported by EDB:

  • EDB Postgres Distributed (PGD): active/active multi-writer replication across locations, suitable for applications requiring zero RPO and near-zero RTO across regions. See the PGD documentation.
  • CloudNativePG distributed: streaming replication across Kubernetes clusters and failure domains using CloudNativePG's distributed topology. See the CloudNativePG distributed documentation.

Data Planes that aren't peered remain fully independent, with separate security domains, separate failure domains, and separate upgrade timelines. Peering is additive: it introduces a replication path without changing the isolation characteristics of either plane for any other purpose.

The optional nature of the mesh means the Data Plane topology is shaped entirely by the customer's replication and availability requirements, not by any architectural mandate from HM itself.


Relationship between the two topologies

The CP star and DP mesh are independent overlays on the same set of physical or cloud locations. A location can host a Secondary CP, a Data Plane, or both. The CP star governs what is deployed and how it is managed; the DP mesh governs how Postgres data moves between locations. Neither topology constrains the other.

| Attribute | Control Plane | Data Plane |
| --- | --- | --- |
| Shape | Star: mandatory hub | Optional mesh: peer-to-peer |
| Hub required | Yes: Primary CP | No |
| Cross-node traffic | Control out, Logs in | Postgres replication (when peered) |
| Isolation model | Secondaries are independent engagement boundaries | Each DP is an independent failure and security domain |
| Network requirement | Private path to Primary required for all Secondaries | Private path only between peered DPs |

Choosing an installation scenario

HM provides a comprehensive suite of capabilities by default. To meet strict security standards and organizational governance, HM supports a modular installation through "scenarios". This allows you to deploy a curated subset of features, effectively reducing the software footprint for security audits and streamlining the UI by removing unauthorized or unlicensed components.

Note

Scenarios are intended for advanced production deployment planning. For pilots, proofs of concept (PoCs), and initial evaluations, we recommend a full installation to maintain maximum optionality and ensure all integrated capabilities are available for testing. For a full installation, omit the scenarios configuration parameter; this enables all available scenarios by default.

Available scenarios

The five available scenarios are:

| Module | Description | Included capabilities |
| --- | --- | --- |
| core | Required. The foundational layer for all deployments. | Estate management, Observability, DBaaS provisioning, and system services (Cert-manager, Istio, etc.). |
| migration | Enables schema and data migration tools. | Migration Portal integration, Data Migration Service (DMS), and other migration services. |
| analytics | Large-scale data processing and cataloging. | Lakehouse cluster management, Data Catalog. |
| ai | Tools for building and serving generative AI. | Sovereign AI, model serving (kserve), Langflow, and GenAI builders. |
| dbaas | Database clusters management. | Postgres cluster management, including provisioning, scaling, and updates. |
Note

From 2026.4.0 onwards, database cluster management capabilities are controlled by a dedicated dbaas scenario, separate from the core scenario. Existing installations upgrading from 2026.3 must explicitly add dbaas to their scenarios list to retain Postgres cluster management functionality.
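
For example, an installation that ran with core and migration on 2026.3 would need a scenarios list along these lines after upgrading. This is a sketch using the Operator method's spec.scenarios field. The choice of core and migration as the scenarios already in use is an assumption for illustration.

```yaml
# Fragment of a HybridControlPlane manifest (Operator method).
spec:
  scenarios:
    - core
    - migration   # assumed to be in use before the upgrade
    - dbaas       # added explicitly to retain Postgres cluster management
```

With the Bootstrap method, the same selection is expressed as a comma-separated string: scenarios: "core,migration,dbaas".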

Planning your selection

When choosing which scenarios to install, consider the following architectural and operational factors:

Functional footprint: While the baseline resource usage is similar across scenarios, each scenario introduces specific services and endpoints. Limiting your installation to required scenarios simplifies security audits and reduces the "attack surface" of your production environment.

UI and feature governance: The HM console dynamically hides navigation links and tools (such as GenAI Builder or Data Catalog) for disabled scenarios. This ensures users only interact with authorized and licensed capabilities.

Default behavior: If the scenarios parameter is omitted from your configuration, the system installs all scenarios by default to maintain backward compatibility.

Decide whether your environment requires a full installation or a targeted subset.

Important

Currently, modifying installation scenarios after deployment isn't supported. Ensure you install all required capabilities, as you can't add or remove them once HM is deployed.

Impact on configuration

The decisions made during this discovery process directly determine some of the root parameters of your installation configuration.

While you don't need to create the file yet, your Architecture Decision Record should specify the values for these keys. The SRE/Admin builds on these inputs, either by recording them and starting the configuration file in Phase 2: Gathering your system requirements, or by using them to build the configuration file in Phase 4: Preparing your environment.

Values needed for the two installation methods

As noted, HM supports two installation methods: the recommended Operator method and the legacy Bootstrap method. The information and field values you gather in Phases 1–2 are the same regardless of which method you choose. Only the fields where you enter those values differ, because the final configuration file format depends on the installation method:

  • Operator method (recommended): Uses a HybridControlPlane Custom Resource manifest with the edb-hcp-operator helm chart.
  • Bootstrap method (deprecated): Uses a values.yaml file with the edbpgai-bootstrap helm chart.

Configuration details

| Architecture decision | HybridControlPlane CR field (Operator) | Config parameter (values.yaml) (Bootstrap) | Example value |
| --- | --- | --- | --- |
| Kubernetes Platform | spec.flavour | system | aks, eks, gke, rhos |
| Target location | spec.componentsParameters.upm-beacon.beacon_location_id | parameters.upm-beacon.beacon_location_id | aws-us-east-1 |
| Provisioning mode | spec.beaconAgent.provisioning.provider | beaconAgent.provisioning.provider | azure, aws, or gcp |
| Installation scenarios | spec.scenarios (YAML list) | scenarios (comma-separated string, e.g. "core,migration,ai,analytics,dbaas") | core, migration, ai, analytics, dbaas |
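
One lightweight way to capture the decisions in the table above is a structured block inside your ADR. The sketch below is only an assumption about how such a record could look. The key names (adr, decisions, topology, and so on) are hypothetical and aren't an HM file format, and the values are examples taken from this page.

```yaml
# Hypothetical ADR fragment: key names are illustrative, not an HM schema.
adr:
  title: HM deployment topology
  status: proposed
  decisions:
    kubernetes_platform: eks          # feeds spec.flavour (Operator) or system (Bootstrap)
    target_location: aws-us-east-1    # feeds beacon_location_id
    provisioning_provider: aws        # feeds beaconAgent.provisioning.provider
    scenarios: [core, migration, dbaas]
  topology:
    locality: single-region           # example answer from the Locality questions
    redundancy: active/passive        # example answer from the Activeness questions
```

Keeping the record method-neutral like this lets the SRE/Admin translate it into either configuration format later.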

Impact on configuration file

The following YAML snippets show how your architectural decisions above map to the final configuration file structures. There's an example for each installation type.

Operator method (HybridControlPlane CR)

apiVersion: edbpgai.edb.com/v1alpha1
kind: HybridControlPlane
metadata:
  name: edbpgai
spec:
  flavour: <Kubernetes_Flavor> # for example, rhos, rke2, aks, eks, gke
  imageRegistry: <Container_Registry_Domain>/pgai-platform
  version: <Version>
  scenarios: # Omit to install all scenarios by default
    - core
    - migration # remove if not needed
    - ai        # remove if not needed
    - analytics # remove if not needed
    - dbaas     # remove if not needed
  componentsParameters:
    upm-beacon:
      beacon_location_id: <Deployment_Location_Name>
  beaconAgent:
    provisioning:
      provider: <Provider_Name> # azure, aws, or gcp
      openshift: <Boolean_Value> # Defaults to false; set to true if deploying on RHOS or ROSA

Bootstrap method (values.yaml)

system: <Kubernetes_Flavor> # for example, rhos, rke2, aks, eks, gke
bootstrapImageName: <Container_Registry_Domain>/pgai-platform/edbpgai-bootstrap/bootstrap-<Kubernetes_Flavor>
bootstrapImageTag: <image-tag-version>
scenarios: "core,migration,ai,analytics,dbaas"
parameters:
  upm-beacon:
    beacon_location_id: "<Deployment_Location_Name>" # Identified in Phase 1: a simple string which will be a hint in the UI to identify this location.
beaconAgent:
  provisioning:
    provider: <Provider_Name> # azure, aws, or gcp
    openshift: <Boolean_Value> # Defaults to false; set to true if deploying on RHOS or ROSA

Next phase

Your architecture is defined and ideally recorded in an ADR for reference.

Proceed to Phase 2: Gathering your system requirements to verify that your infrastructure can support the design in your ADR.