PGD settings v5
You can set PGD-specific configuration settings. Unless noted otherwise, you can set the values at any time.
Conflict handling
bdr.default_conflict_detection
Sets the default conflict detection method for newly created tables. Accepts same values as bdr.alter_table_conflict_detection().
Global sequence parameters
bdr.default_sequence_kind
Sets the default sequence kind.
The default is distributed
, which means snowflakeid
is used for int8
sequences (that is, bigserial
) and galloc
sequence for int4
(that is, serial
)
and int2
sequences.
DDL handling
bdr.default_replica_identity
Sets the default value for REPLICA IDENTITY
on newly created tables. The
REPLICA IDENTITY
defines the information written to the write-ahead log to
identify rows that are updated or deleted.
The accepted values are:
Value | Description |
---|---|
default | Records the old values of the columns of the primary key, if any (this is the default PostgreSQL behavior). |
full | Records the old values of all columns in the row. |
nothing | Records no information about the old row. |
auto | Tables with PK are created with REPLICA IDENTITY DEFAULT, and tables without PK are created with REPLICA IDENTITY FULL. This is the default PGD behavior. |
See the PostgreSQL documentation for more details.
PGD can't replicate UPDATE
and DELETE
operations on tables without a
PRIMARY KEY
or UNIQUE
constraint. The exception is when the replica identity
for the table is FULL
, either by table-specific configuration or by
bdr.default_replica_identity
.
If bdr.default_replica_identity
is default
and there is a UNIQUE
constraint included in the table definition, it won't be automatically picked up as REPLICA IDENTITY
.
You need to set the REPLICA IDENTITY
explicitly using ALTER TABLE ... REPLICA IDENTITY ...
.
Setting the replica identity of tables to full
increases the volume of
WAL written and the amount of data replicated on the wire for the table.
On setting bdr.default_replica_identity to default
When setting bdr.default_replica_identity
to default
using ALTER SYSTEM
,
always quote the value, like this:
You need to include the quotes because default, unquoted, is a special value to the ALTER SYSTEM command that triggers the removal of the setting from the configuration file. When the setting is removed, the system uses the PGD default setting, which is auto
.
bdr.ddl_replication
Automatically replicates DDL across nodes (default is on
).
This parameter can be set only by bdr_superuser or superuser roles.
Running DDL or calling PGD administration functions with bdr.ddl_replication =
off
can create situations where replication stops until an administrator can
intervene. See DDL replication for details.
A LOG
-level log message is emitted to the PostgreSQL server logs whenever
bdr.ddl_replication
is set to off
. Additionally, a WARNING-level
message
is written whenever replication of captured DDL commands or PGD replication
functions is skipped due to this setting.
bdr.role_replication
Automatically replicates ROLE commands across nodes (default is on
). Only a
superuser can set this parameter. This setting works only if
bdr.ddl_replication
is turned on as well.
Turning this parameter off without using external methods to ensure roles are in sync across all nodes might cause replicated DDL to interrupt replication until the administrator intervenes.
See Role manipulation statements for details.
bdr.ddl_locking
Configures the operation mode of global locking for DDL.
This parameter can be set only by bdr_superuser or superuser roles.
Possible options are:
Value | Description |
---|---|
all | Use global locking for all DDL operations. (Default) |
dml | Use global locking only for DDL operations that need to prevent writes by taking the global DML lock for a relation. |
off | Don't use global locking for DDL operations. |
Default is all
.
A LOG
-level log message is emitted to the PostgreSQL server logs whenever
bdr.ddl_replication
is set to off
. Additionally, a WARNING
message is
written whenever any global locking steps are skipped due to this setting. It's
normal for some statements to result in two WARNING
messages: one for skipping
the DML lock and one for skipping the DDL lock.
For backward compatibility, bdr.ddl_locking
supports aliases.
on
and true
are an alias for all
.
false
is an alias for off
.
See also Global locking.
bdr.truncate_locking
Sets the TRUNCATE command's locking behavior (default is off
).
When true
, TRUNCATE obeys the bdr.ddl_locking
setting.
Global locking
DDL locking is controlled by bdr.ddl_locking
. Other global
locking settings include the following.
bdr.global_lock_max_locks
Sets the maximum number of global locks that can be held on a node (default is 1000). Can be set only at Postgres server start.
bdr.global_lock_timeout
Sets the maximum allowed duration of any wait for a global lock (default is 10 minutes). A value of zero disables this timeout.
bdr.global_lock_statement_timeout
Sets the maximum allowed duration of any statement holding a global lock (default is 60 minutes). A value of zero disables this timeout.
bdr.global_lock_idle_timeout
Sets the maximum allowed duration of idle time in a transaction holding a global lock (default is 10 minutes). A value of zero disables this timeout.
bdr.lock_table_locking
Sets locking behavior for LOCK TABLE statement (default is off). When enabled, LOCK TABLE statement also takes a global DML lock on the cluster, blocking other locking statements.
Value | Description |
---|---|
on | Use global locking for all table locks. |
off | Don't use global locking for table locks. (Default) |
bdr.predictive_checks
Sets the log level for predictive checks (currently used only by global locks).
Can be DEBUG
, LOG
, WARNING
(default), or ERROR
. Predictive checks are
early validations for expected cluster state when doing certain operations. You
can use them for those operations for fail early rather than wait for timeouts.
In global lock terms, PGD checks that there are enough nodes connected and
withing reasonable lag limit for getting the quorum needed by the global lock.
Node management
bdr.replay_progress_frequency
Sets the interval for sending replication position info to the rest of the cluster (default is 1 minute).
bdr.standby_slot_names
Sets the slots required to receive and confirm replication changes before any other ones. This setting is useful primarily when using physical standbys for failover or when using subscribe-only nodes.
Generic replication
bdr.writers_per_subscription
Sets the default number of writers per subscription. (In PGD, you can also change
this with bdr.alter_node_group_option
for a group.)
bdr.max_writers_per_subscription
Maximum number of writers per subscription (sets upper limit for the bdr.writers_per_subscription
setting).
bdr.xact_replication
Replicates current transaction (default is on
).
Turning this off makes the whole transaction local only, which means the transaction isn't visible to logical decoding by PGD and all other downstream targets of logical decoding. Data isn't transferred to any other node, including logical standby nodes.
This parameter can be set only by the bdr_superuser or superuser roles.
This parameter can be set only inside the current transaction using the
SET LOCAL
command unless bdr.permit_unsafe_commands = on
.
Note
Even with transaction replication disabled, WAL is generated, but those changes are filtered away on the origin.
Warning
Turning off bdr.xact_replication
leads to data
inconsistency between nodes. Use it only to recover from
data divergence between nodes or in
replication situations where changes on single nodes are required for
replication to continue. Use at your own risk.
bdr.permit_unsafe_commands
Overrides safety check on commands that are deemed unsafe for general use.
Requires bdr_superuser or PostgreSQL superuser.
Warning
The commands that are normally not considered safe can either produce inconsistent results or break replication altogether. Use at your own risk.
bdr.batch_inserts
Number of consecutive inserts to one table in a single transaction that turns on batch processing of inserts for that table.
This setting allows replication of large data loads as COPY internally, rather than as a set of inserts. It's also how the initial data during node join is copied.
bdr.maximum_clock_skew
Specifies the maximum difference between
the incoming transaction commit timestamp and the current time on the
subscriber before triggering bdr.maximum_clock_skew_action
.
It checks if the timestamp of the currently replayed transaction is in the
future compared to the current time on the subscriber. If it is, and the
difference is larger than bdr.maximum_clock_skew
, it performs the action
specified by the bdr.maximum_clock_skew_action
setting.
The default is -1
, which means ignore clock skew (the check is turned
off). It's valid to set 0 as when the clocks on all servers are synchronized.
The fact that the transaction is being replayed means it was committed in
the past.
bdr.maximum_clock_skew_action
Specifies the action to take if a clock skew higher than
bdr.maximum_clock_skew
is detected.
There are two possible values for this setting:
Value | Description |
---|---|
WARN | Log a warning about this fact. The warnings are logged once per minute at the maximum to prevent flooding the server log. |
WAIT | Wait until the current local timestamp is no longer older than remote commit timestamp minus the bdr.maximum_clock_skew . |
bdr.accept_connections
Enables or disables connections to PGD (default is on
).
Requires bdr_superuser or PostgreSQL superuser.
bdr.standby_slot_names
This setting is typically used in failover configurations to ensure that the failover-candidates streaming physical replicas for this PGD node have received and flushed all changes before they ever become visible to subscribers. That guarantees that a commit can't vanish on failover to a standby for the provider.
Replication slots whose names are listed in the comma-separated
bdr.standby_slot_names
list are treated specially by the WAL sender on a PGD
node.
PGD's logical replication WAL senders ensure that all local changes are sent and
flushed to the replication slots in bdr.standby_slot_names
before the node
sends those changes to any other PGD replication clients. Effectively, it
provides a synchronous replication barrier between the named list of slots and
all other replication clients.
Any replication slot can be listed in bdr.standby_slot_names
. Both logical and
physical slots work, but it's generally used for physical slots.
Without this safeguard, two anomalies are possible where a commit can be received by a subscriber and then vanish from the provider on failover because the failover candidate hadn't received it yet:
For 1+ subscribers, the subscriber might have applied the change but the new provider might execute new transactions that conflict with the received change, as it never happened as far as the provider is concerned.
For 2+ subscribers, at the time of failover, not all subscribers have applied the change. The subscribers now have inconsistent and irreconcilable states because the subscribers that didn't receive the commit have no way to get it.
Setting bdr.standby_slot_names
by design causes other subscribers not listed
in there to lag behind the provider if the required number of listed nodes are
not keeping up. Monitoring is thus essential.
Another use case where bdr.standby_slot_names
is useful is when using a
subscriber-only node, to ensure that it doesn't move ahead of any of the
regular PGD nodes. This can best be achieved by listing the logical slots of all
regular PGD peer nodes in combination with setting
bdr.standby_slots_min_confirmed
to at least 1.
bdr.standby_slots_min_confirmed
Controls how many of the bdr.standby_slot_names
have to confirm before
sending data to PGD subscribers.
bdr.writer_input_queue_size
Specifies the size of the shared memory queue used by the receiver to send data to the writer process. If the writer process is stalled or making slow progress, then the queue might get filled up, stalling the receiver process too. So it's important to provide enough shared memory for this queue. The default is 1 MB, and the maximum allowed size is 1 GB. While any storage size specifier can be used to set the GUC, the default is KB.
bdr.writer_output_queue_size
Specifies the size of the shared memory queue used by the receiver to receive data from the writer process. Since the writer isn't expected to send a large amount of data, a relatively smaller sized queue is enough. The default is 32 KB, and the maximum allowed size is 1 MB. While any storage size specifier can be used to set the GUC, the default is KB.
bdr.min_worker_backoff_delay
Allows for rate limiting of PGD background worker launches by
preventing a given worker from being relaunched more often than every
bdr.min_worker_backoff_delay
milliseconds. On repeated errors, the backoff
increases exponentially with added jitter up to a maximum of
bdr.max_worker_backoff_delay
.
Time-unit suffixes are supported.
Note
This setting currently affects only receiver worker, which means it primarily affects how fast a subscription tries to reconnect on error or connection failure.
The default for bdr.min_worker_backoff_delay
is 1 second. For
bdr.max_worker_backoff_delay
, it's 1 minute.
If the backoff delay setting is changed and the PostgreSQL configuration is
reloaded, then all current backoffs wait for reset. Additionally, the
bdr.worker_task_reset_backoff_all()
function is provided to allow the
administrator to force all backoff intervals to immediately expire.
A tracking table in shared memory is maintained to remember the last launch time of each type of worker. This tracking table isn't persistent. It's cleared by PostgreSQL restarts, including soft restarts during crash recovery after an unclean backend exit.
You can use the view bdr.worker_tasks
to
inspect this state so the administrator can see any backoff rate limiting
currently in effect.
For rate-limiting purposes, workers are classified by task. This key consists of
the worker role, database OID, subscription ID, subscription writer ID,
extension library name and function name, extension-supplied worker name, and
the remote relation ID for sync writers. NULL
is used where a given classifier
doesn't apply, for example, when manager workers don't have a subscription ID and
receivers don't have a writer ID.
CRDTs
bdr.crdt_raw_value
Sets the output format of CRDT data types.
The default output (when this setting is off
) is to return only the current
value of the base CRDT type, for example, a bigint for crdt_pncounter
.
When set to on
, the returned value represents the full representation of
the CRDT value, which can, for example, include the state from multiple nodes.
Commit scope
bdr.commit_scope
Sets the current (or default) commit scope (default is an empty string).
Commit At Most Once
bdr.camo_local_mode_delay
The commit delay that applies in CAMO's asynchronous mode to emulate the
overhead that normally occurs with the CAMO partner having to confirm
transactions (default is 5 ms). Set to 0
to disable this feature.
bdr.camo_enable_client_warnings
Emits warnings if an activity is carried out in the database for which CAMO properties can't be guaranteed (default is enabled). Well-informed users can choose to disable this setting to reduce the amount of warnings going into their logs.
Transaction streaming
bdr.default_streaming_mode
Controls transaction streaming by the subscriber node. Possible values
are: off
, writer
, file
, and auto
. Defaults to auto
. If set to off
,
the subscriber doesn't request transaction streaming. If set to one of the other
values, the subscriber requests transaction streaming and the publisher provides
it if it supports them and if configured at group level. For more details, see
Transaction streaming.
Lag Control
bdr.lag_control_max_commit_delay
Maximum acceptable post-commit delay that can be tolerated, in fractional milliseconds.
bdr.lag_control_max_lag_size
Maximum acceptable lag size that can be tolerated, in kilobytes.
bdr.lag_control_max_lag_time
Maximum acceptable lag time that can be tolerated, in milliseconds.
bdr.lag_control_min_conforming_nodes
Minimum number of nodes required to stay below acceptable lag measures.
bdr.lag_control_commit_delay_adjust
Commit delay micro adjustment measured as a fraction of the maximum commit delay time. At a default value of 0.01%, it takes 100 net increments to reach the maximum commit delay.
bdr.lag_control_sample_interval
Minimum time between lag samples and commit delay micro adjustments, in milliseconds.
bdr.lag_control_commit_delay_start
The lag threshold at which commit delay increments start to be applied, expressed as a fraction of acceptable lag measures. At a default value of 1.0%, commit delay increments don't begin until acceptable lag measures are breached.
By setting a smaller fraction, it might be possible to prevent a breach by "bending the lag curve" earlier so that it's asymptotic with the acceptable lag measure.
Timestamp-based snapshots
bdr.timestamp_snapshot_keep
Time to keep valid snapshots for the timestamp-based snapshot use (default is
0
, meaning don't keep past snapshots).
Monitoring and logging
bdr.debug_level
Defines the log level that PGD uses to write its debug messages. The default
value is debug2
. If you want to see detailed PGD debug output, set
bdr.debug_level = 'log'
.
bdr.trace_level
Similar to bdr.debug_level
, defines the log level to use for PGD trace messages.
Enabling tracing on all nodes of an EDB Postgres Distributed cluster might help
EDB Support to diagnose issues. You can set this parameter only at Postgres server start.
Warning
Setting bdr.debug_level
or bdr.trace_level
to a value >=
log_min_messages
can produce a very large volume of log output. Don't
enabled it long term in production unless plans are in place for log filtering,
archival, and rotation to prevent disk space exhaustion.
bdr.track_subscription_apply
Tracks apply statistics for each subscription with bdr.stat_subscription
(default is on
).
bdr.track_relation_apply
Tracks apply statistics for each relation with bdr.stat_relation
(default is off
).
bdr.track_apply_lock_timing
Tracks lock timing when tracking statistics for relations with bdr.stat_relation
(default is off
).
Decoding worker
bdr.enable_wal_decoder
Enables logical change record (LCR) sending on a single node with a decoding worker (default is false). When set to true, a decoding worker process starts, and WAL senders send the LCRs it produces. If set back to false, any WAL senders using LCR are restarted and use the WAL directly.
Note
You also need to enable this setting on all nodes in the PGD group and
set the enable_wal_decoder
option to true on the group.
bdr.receive_lcr
When subscribing to another node, this setting enables the node to request the
use of logical change records (LCRs) for the subscription (default is false).
When this setting is true on a downstream node, the node
requests that upstream nodes use LCRs when sending to it. If you set
bdr.enable_wal_decoder
to true on a node, also set this setting to true.
Note
You also need to enable this setting on all nodes in the PGD group and
set the enable_wal_decoder
option to true on the group.
bdr.lcr_cleanup_interval
Logical change record (LCR) file cleanup interval (default is 3 minutes). When the decoding worker is enabled, the decoding worker stores LCR files as a buffer. These files are periodically cleaned, and this setting controls the interval between any two consecutive cleanups. Setting it to zero disables cleanup.
Connectivity settings
The following are a set of connectivity settings affecting all cross-node
libpq
connections. The defaults are set to fairly conservative values and
cover most production needs. All variables have SIGHUP
context, meaning
changes are applied upon reload.
bdr.global_connection_timeout
Maximum time to wait while connecting, in seconds (default is 15 seconds). Write as a decimal integer, for example, 10. Zero, negative, or not specified means wait indefinitely. The minimum allowed timeout is 2 seconds, therefore a value of 1 is interpreted as 2.
bdr.global_keepalives
Controls whether TCP keepalives are used (default is 1, meaning on). If you don't want keepalives, you can change this to 0, meaning off. This parameter is ignored for connections made by a Unix-domain socket.
bdr.global_keepalives_idle
Controls the number of seconds of inactivity after which TCP sends a keepalive
message to the server (default is 1 second). A value of zero uses the system default. This parameter
is ignored for connections made by a Unix-domain socket or if keepalives are
disabled. It's supported only on systems where TCP_KEEPIDLE
or an equivalent
socket option is available. On other systems, it has no effect.
bdr.global_keepalives_interval
Controls the number of seconds after which to retransmit a TCP keepalive message
that isn't acknowledged by the server (default is 2 seconds). A value of zero uses the system default.
This parameter is ignored for connections made by a Unix-domain socket or if
keepalives are disabled. It's supported only on systems where TCP_KEEPINTVL
or
an equivalent socket option is available. On other systems, it has no effect.
bdr.global_keepalives_count
Controls the number of TCP keepalives that can be lost before the client's
connection to the server is considered dead (default is 3). A value of zero uses the system
default. This parameter is ignored for connections made by a Unix-domain socket
or if keepalives are disabled. It's supported only on systems where
TCP_KEEPCNT
or an equivalent socket option is available. On other systems, it
has no effect.
bdr.global_tcp_user_timeout
Controls the number of milliseconds that transmitted data can remain
unacknowledged before a connection is forcibly closed (default is 5000, that is, 5 seconds).
A value of zero uses the
system default. This parameter is ignored for connections made by a Unix-domain
socket. It's supported only on systems where TCP_USER_TIMEOUT
is available. On
other systems, it has no effect.
Internal settings - Raft timeouts
bdr.raft_global_election_timeout
To account for network failures, the Raft consensus protocol implements timeouts for elections and requests. This value is used when a request is being sent to the global (top-level) group. The default is 6 seconds (6s).
bdr.raft_group_election_timeout
To account for network failures, the Raft consensus protocol implements timeouts for elections and requests. This value is used when a request is being sent to the sub-group. The default is 3 seconds (3s).
bdr.raft_response_timeout
For responses, the settings of
bdr.raft_global_election_timeout
and
bdr.raft_group_election_timeout
are used
as appropriate. You can override this behavior by setting
this variable. The setting of bdr.raft_response_timeout
must be less than
either of the election timeout values. Set this variable to -1 to disable the override.
The default is -1.
Internal settings - Other Raft values
bdr.raft_keep_min_entries
The minimum number of entries to keep in the Raft log when doing log compaction
(default is 1000
; PGD 5.3 and earlier: 100
). The value of 0
disables log
compaction. You can set this parameter only at Postgres server start.
Warning
If log compaction is disabled, the log grows in size forever.
bdr.raft_log_min_apply_duration
To move the state machine forward, Raft appends entries to its internal log.
During normal operation, appending takes only a few milliseconds. This poses an
upper threshold on the duration of that append action, above which an INFO
message is logged. This can indicate a problem. Default
is 3000 ms.
bdr.raft_log_min_message_duration
When to log a consensus request. Measures roundtrip time of a PGD consensus
request and logs an INFO
message if the time exceeds this parameter (default is 5000 ms).
bdr.raft_group_max_connections
The maximum number of connections across all PGD groups for a Postgres server (default is 100 connections). These connections carry PGD consensus requests between the groups' nodes. You can set this parameter only at Postgres server start.
Internal settings - Other values
bdr.backwards_compatibility
Specifies the version to be backward compatible to, in the same numerical format
as used by bdr.bdr_version_num
, for example, 30618
. (Default is the
current PGD version.) Enables exact behavior of a
former PGD version, even if this has generally unwanted effects.
Since this changes from release to release, we advise
against explicit use in the configuration file unless the value is different
from the current version.
bdr.track_replication_estimates
Tracks replication estimates in terms of apply rates and catchup intervals for peer nodes. Protocols like CAMO can use this information to estimate the readiness of a peer node. This parameter is enabled by default.
bdr.lag_tracker_apply_rate_weight
PGD monitors how far behind peer nodes are in terms of applying WAL from the local node and calculate a moving average of the apply rates for the lag tracking. This parameter specifies how much contribution newer calculated values have in this moving average calculation. Default is 0.1.
bdr.enable_auto_sync_reconcile
When enabled, nodes perform automatic synchronization of data from a node that is furthest ahead with respect to the down node. Default (from 5.5.1) is off.
- On this page
- Conflict handling
- Global sequence parameters
- DDL handling
- Global locking
- Node management
- Generic replication
- bdr.writers_per_subscription
- bdr.max_writers_per_subscription
- bdr.xact_replication
- bdr.permit_unsafe_commands
- bdr.batch_inserts
- bdr.maximum_clock_skew
- bdr.maximum_clock_skew_action
- bdr.accept_connections
- bdr.standby_slot_names
- bdr.standby_slots_min_confirmed
- bdr.writer_input_queue_size
- bdr.writer_output_queue_size
- bdr.min_worker_backoff_delay
- CRDTs
- Commit scope
- Commit At Most Once
- Transaction streaming
- Lag Control
- Timestamp-based snapshots
- Monitoring and logging
- Decoding worker
- Connectivity settings
- Internal settings - Raft timeouts
- Internal settings - Other Raft values
- Internal settings - Other values