Architecture overview v6.1.0

Suggest edits

EDB Postgres Distributed (PGD) is a distributed database solution that extends PostgreSQL's capabilities, enabling highly available and fault-tolerant database deployments across multiple nodes. PGD provides data distribution with advanced conflict management, data-loss protection, high availability up to five 9s, and throughput up to 5X faster than native logical replication.

PGD is built on a multi-master foundation (bi-directional replication, or BDR) which is then optimized for performance and availability through Connection Manager. You can run PGD without Connection Manager if you need a custom deployment better utilizing the multi-master functionality. When running without Connection Manager, writes are distributed among the nodes and replicated to one another, and conflict resolution is relied upon for maintaining consistency. This can be more efficient depending on your architectural needs. However, Connection Manager ensures lower contention and conflict through the use of a write leader. Raft is implemented to help the system make important decisions, like deciding which node is the Raft election leader and which node is the write leader.

High-level architecture

At the highest level, PGD comprises two main components: Bi-Directional Replication (BDR) and Connection Manager. BDR is a Postgres extension that enables a multi-master replication mesh between different BDR-enabled Postgres instances/nodes. Connection Manager sends requests to the write leader—ensuring a lower risk of conflicts (stronger consistency) between nodes.

Diagram showing 3 application nodes and 3 PGD nodes. Traffic is being directed from each of the PGD nodes by the connection manatger to the write leader node.

Changes are replicated directly, row-by-row between all nodes. Logical replication in PGD is asynchronous by default, so only eventual consistency is guaranteed (within seconds usually). However, commit scope options offer immediate consistency and durability guarantees via CAMO, group and synchronous commits.

The Raft algorithm provides a mechanism for electing leaders (both Raft leader and write leader), deciding which nodes to add or subtract from the cluster. It generally ensures that the distributed system remains consistent and fault tolerant, even in the face of node failures.

Architectural elements

PGD comprises several key architectural elements that work together to provide its distributed database solution:

PGD nodes: These are individual Postgres instances that store and manage data. They are the basic building blocks of a PGD cluster.
Groups: By default, all nodes are also members of a top-level group with its own Raft leader but without a write leader. PGD nodes can be further organized into subgroups, which enhance manageability and high availability. Each group can contain multiple nodes, allowing for redundancy and failover within the group. Groups facilitate organized replication and data consistency among nodes within the same group and across different groups. Each group has its own write leader.
Replication mechanisms: PGD's replication mechanisms include BDR for efficient replication across nodes, enabling multi-master replication. BDR supports asynchronous replication by default but can be configured for varying levels of synchronicity, such as Group Commit or Synchronous Commit, to enhance data durability.
Monitoring tools: To monitor performance, health, and usage with PGD, you can use its built-in command-line interface (CLI), which offers several useful commands. For example:
- The pgd nodes list command provides a summary of all nodes in the cluster, including their state and status.
- The pgd cluster show --health command checks the health of the cluster, reporting on node accessibility, replication slot health, and other critical metrics.
- The pgd events show command lists significant events like background worker errors and node membership changes, which helps in tracking the operational status and issues within the cluster.
Furthermore, the BDR extension allows for monitoring your cluster using SQL using the bdr.monitor role.

Node types

All nodes in PGD are effectively data nodes. They vary only in their purpose in the cluster.

Data nodes: Store and manage data, handle read and write operations, and participate in replication.

There are then three types of nodes which, although built like a data node, have a specific purpose. These are:

Subscriber-only nodes: Subscribe to changes from data nodes for read-only purposes. Used in reporting or analytics.
Witness nodes: Participate in the consensus process without storing data, aiding in achieving quorum and maintaining high availability.
Logical standby nodes: Act as standby nodes that can be promoted to data nodes if needed, ensuring high availability and disaster recovery.

Node roles

Data nodes in a group can also take on particular roles to enable particular features. These roles are transient and can be transferred to any other capable node in the group if needed. These roles can include:

Raft leader: Arbitrates and manages consensus between a group's nodes.
Write leader: Receives all write operations when applications connect through the connection manager.

Architectural flexibility

PGD offers flexible options with how its architecture can be deployed, maintained, and scaled to meet various performance, availability, and compliance needs.

PGD supports rolling maintenance, including blue/green deployments for both Postgres upgrades and other system or application-level changes. This approach ensures that the database remains available during routine tasks, such as minor or major version upgrades, schema changes, and vacuuming operations. The system seamlessly switches between active database versions, achieving zero downtime.

PGD provides automatic failover to ensure high availability. If a node in the cluster becomes unavailable, another node takes over its responsibilities, minimizing downtime. Also, PGD includes self-healing capabilities, where nodes that have failed or disconnected reconnect to the cluster and resume normal operations once the issue is resolved.

PGD allows for selective replication, enabling users to replicate only a subset of data to specific nodes. This feature can be used to optimize performance by reducing unnecessary data traffic between nodes or to meet regulatory requirements, such as geographical data restrictions. For instance, a healthcare application might only replicate patient data within a specific region to comply with local data privacy laws.

With commit scopes, PGD also provides configurable durability. Accordingly, durability can be increased from the default asynchronous behavior and tuned using various configurable commit scopes:

Synchronous Commit: Works a lot like PostgreSQL’s synchronous_commit option in its underlying operation. Requires writing to at least one other node at COMMIT time but can be tuned to require all nodes.
CAMO (Commit At Most Once): Works by tracking each transaction with a unique ID and using a pair of nodes to confirm the transaction's outcome, ensuring the application knows whether to retry the transaction or not.
Group Commit: An experimental commit scope, the goal of which is to protect against data loss in case of single-node failures of temporary outages by requiring more than one PGD node to successfully confirm a transaction at COMMIT time.
Lag Control: If replication is running outside of set limits (taking too long for another node to be replicated to), a delay is injected into the node that originally received the transaction, slowing things down until other nodes have caught up.