Bootstrap v1
Note
When referring to "PostgreSQL cluster" in this section, the same concepts apply to both PostgreSQL and EDB Postgres Advanced Server, unless stated otherwise.
This section describes the options you have to create a new PostgreSQL cluster and the design rationale behind them. There are primarily two ways to bootstrap a new cluster:
- from scratch (initdb)
- from an existing PostgreSQL cluster, either directly (pg_basebackup) or indirectly through a physical base backup (recovery)
The initdb bootstrap also lets you import one or more databases from an existing Postgres cluster, even one outside Kubernetes or running a different major version of Postgres. For more detailed information about this feature, please refer to the "Importing Postgres databases" section.
Important
Bootstrapping from an existing cluster makes it possible to create a replica cluster, that is, an independent PostgreSQL cluster that is in continuous recovery, synchronized with the source, and accepting read-only connections.
Warning
EDB Postgres for Kubernetes requires both the postgres user and the postgres database to always exist. Using the local Unix Domain Socket, it needs to connect as the postgres user to the postgres database via peer authentication in order to perform administrative tasks on the cluster.
DO NOT DELETE the postgres user or the postgres database!
Info
EDB Postgres for Kubernetes is gradually introducing support for Kubernetes' native VolumeSnapshot API for both incremental and differential copy in backup and recovery operations, if supported by the underlying storage classes. Please see "Recovery from Volume Snapshot objects" for details.
The bootstrap section
The bootstrap method can be defined in the bootstrap section of the cluster specification. EDB Postgres for Kubernetes currently supports the following bootstrap methods:
- initdb: initialize a new PostgreSQL cluster (default)
- recovery: create a PostgreSQL cluster by restoring from a base backup of an existing cluster and, if needed, replaying all the available WAL files or up to a given point in time
- pg_basebackup: create a PostgreSQL cluster by cloning an existing one of the same major version using pg_basebackup via the streaming replication protocol; useful if you want to migrate databases to EDB Postgres for Kubernetes, even from outside Kubernetes.
Unlike the initdb method, both recovery and pg_basebackup create a new cluster based on another one (either offline or online) and can be used to spin up replica clusters. They both rely on the definition of external clusters.
Given that there are several possible backup methods and combinations of backup storage that the EDB Postgres for Kubernetes operator provides, please refer to the "Recovery" section for guidance on each method.
API reference
Please refer to the "API reference for the bootstrap section" for more information.
The externalClusters section
The externalClusters section provides a mechanism for specifying one or more PostgreSQL clusters associated with the current configuration. Its primary use cases include:
- Importing Databases: Specify an external source to be used when importing databases via logical backup and restore, as part of the initdb bootstrap method.
- Cross-Region Replication: Define a cross-region PostgreSQL cluster employing physical replication, capable of extending across distinct Kubernetes clusters or traditional VM/bare-metal environments.
- Recovery from Physical Base Backup: Recover, fully or at a given Point-In-Time, a PostgreSQL cluster by referencing a physical base backup.
Info
Ongoing development will extend the functionality of externalClusters to accommodate additional use cases, such as logical replication and foreign servers, in future releases.
As far as bootstrapping is concerned, externalClusters can be used to define the source PostgreSQL cluster for either the pg_basebackup method or the recovery one. An external cluster needs to have:
- a name that identifies the origin cluster, to be used as a reference via the source option
- at least one of the following:
  - information about streaming connection
  - information about the recovery object store, which is a Barman Cloud compatible object store that contains:
    - the WAL archive (required for Point In Time Recovery)
    - the catalog of physical base backups for the Postgres cluster
Note
A recovery object store is normally an AWS S3, Azure Blob Storage, or Google Cloud Storage source that is managed by Barman Cloud.
When only the streaming connection is defined, the source can be used for the pg_basebackup method. When only the recovery object store is defined, the source can be used for the recovery method. When both are defined, either of the two bootstrap methods can be chosen.
Furthermore, in the case of pg_basebackup or a full recovery (as opposed to a point-in-time recovery), the cluster is eligible for replica cluster mode. This means that the cluster is continuously fed from the source, either via streaming, via WAL shipping through PostgreSQL's restore_command, or both.
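As a rough sketch of the structure described above, an externalClusters entry that defines both a streaming connection and a recovery object store could look like the following. The cluster name, host, bucket path, and secret names are hypothetical placeholders, and the S3 credentials layout is only one of the possible object store options.

```yaml
# Fragment of a Cluster spec; all names and paths are illustrative placeholders
externalClusters:
  - name: origin                      # referenced via the "source" option of a bootstrap method
    # Streaming connection: makes the source usable by pg_basebackup
    connectionParameters:
      host: origin.example.com        # hypothetical host
      user: streaming_replica
    password:
      name: origin-replica-user       # hypothetical secret holding the password
      key: password
    # Recovery object store: makes the source usable by recovery
    barmanObjectStore:
      destinationPath: s3://backups/  # hypothetical bucket
      s3Credentials:
        accessKeyId:
          name: aws-creds             # hypothetical secret
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: ACCESS_SECRET_KEY
```

With both blocks present, either bootstrap method can reference this entry by setting source to origin.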
API reference
Please refer to the "API reference for the externalClusters section" for more information.
Password files
Whenever a password is supplied within an externalClusters entry, EDB Postgres for Kubernetes automatically manages a PostgreSQL password file for it, residing at /controller/external/NAME/pgpass in each instance.
This approach allows EDB Postgres for Kubernetes to securely establish connections with an external server without exposing any passwords in the connection string. Instead, the connection references that file through the passfile connection parameter.
Bootstrap an empty cluster (initdb)
The initdb bootstrap method is used to create a new PostgreSQL cluster from scratch. It is the default method unless you specify a different one.
The following example contains the full structure of the initdb configuration:
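A minimal sketch of such a manifest, consistent with the description in this section. The cluster name, the number of instances, and the storage size are illustrative assumptions, as is the postgresql.k8s.enterprisedb.io/v1 API group.

```yaml
apiVersion: postgresql.k8s.enterprisedb.io/v1  # assumed API group for the Cluster resource
kind: Cluster
metadata:
  name: cluster-example-initdb                 # illustrative name
spec:
  instances: 3                                 # illustrative value

  bootstrap:
    initdb:
      database: app        # application database to create
      owner: app           # unprivileged user owning the database
      secret:
        name: app-secret   # basic-auth secret holding the owner's credentials

  storage:
    size: 1Gi                                  # illustrative value
```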
The above example of bootstrap will:
- create a new PGDATA folder using PostgreSQL's native initdb command
- create an unprivileged user named app
- set the password of the latter (app) using the one in the app-secret secret (make sure that username matches the name of the owner)
- create a database called app owned by the app user.
Thanks to the convention over configuration paradigm, you can let the operator choose a default database name (app) and a default application user name (same as the database name), as well as randomly generate a secure password for both the superuser and the application user in PostgreSQL.
Alternatively, you can generate your own password, store it as a secret, and use it in the PostgreSQL cluster, as described in the above example.
The supplied secret must comply with the specifications of the kubernetes.io/basic-auth type. As a result, the username in the secret must match the owner (for the application secret) or be postgres (for the superuser secret).
The following is an example of a basic-auth secret:
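A minimal sketch of such a secret for the app owner. The password value is a placeholder, and stringData is used here only to avoid base64-encoding the values by hand.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: app-secret
type: kubernetes.io/basic-auth
stringData:
  username: app               # must match the owner of the application database
  password: change-me-please  # placeholder, replace with your own password
```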
The application database is the one that should be used to store application data. Applications should connect to the cluster with the user that owns the application database.
Important
If you need to create additional users, please refer to "Declarative database role management".
If you don't supply a database name, the operator proceeds by convention: it creates the app database and adds it to the cluster definition using a defaulting webhook. The user that owns the database defaults to the database name.
The application user is not used internally by the operator, which instead relies on the superuser to reconcile the cluster with the desired status.
Passing Options to initdb
The PostgreSQL data directory is initialized using the initdb PostgreSQL command.
EDB Postgres for Kubernetes enables you to customize the behavior of initdb to modify settings such as default locale configurations and data checksums.
Warning
EDB Postgres for Kubernetes acts only as a direct proxy to initdb for locale-related options, due to the ongoing and significant enhancements in PostgreSQL's locale support. It is your responsibility to ensure that the correct options are provided, following the PostgreSQL documentation, and to verify that the bootstrap process completes successfully.
To include custom options in the initdb command, you can use the following parameters:
builtinLocale
: When builtinLocale is set to a value, EDB Postgres for Kubernetes passes it to the --builtin-locale option in initdb. This option controls the builtin locale, as defined in "Locale Support" from the PostgreSQL documentation (default: empty). Note that this option requires localeProvider to be set to builtin. Available from PostgreSQL 17.
dataChecksums
: When dataChecksums is set to true, EDB Postgres for Kubernetes invokes the -k option in initdb to enable checksums on data pages, which helps detect corruption by the I/O system that would otherwise be silent (default: false).
encoding
: When encoding is set to a value, EDB Postgres for Kubernetes passes it to the --encoding option in initdb, which selects the encoding of the template database (default: UTF8).
icuLocale
: When icuLocale is set to a value, EDB Postgres for Kubernetes passes it to the --icu-locale option in initdb. This option controls the ICU locale, as defined in "Locale Support" from the PostgreSQL documentation (default: empty). Note that this option requires localeProvider to be set to icu. Available from PostgreSQL 15.
icuRules
: When icuRules is set to a value, EDB Postgres for Kubernetes passes it to the --icu-rules option in initdb. This option specifies additional ICU collation rules that customize the default collation, as defined in "Locale Support" from the PostgreSQL documentation (default: empty). Note that this option requires localeProvider to be set to icu. Available from PostgreSQL 16.
locale
: When locale is set to a value, EDB Postgres for Kubernetes passes it to the --locale option in initdb. This option controls the locale, as defined in "Locale Support" from the PostgreSQL documentation. By default, the locale parameter is empty. In this case, environment variables such as LANG are used to determine the locale. Be aware that these variables can vary between container images, potentially leading to inconsistent behavior.
localeCollate
: When localeCollate is set to a value, EDB Postgres for Kubernetes passes it to the --lc-collate option in initdb. This option controls the collation order (LC_COLLATE subcategory), as defined in "Locale Support" from the PostgreSQL documentation (default: C).
localeCType
: When localeCType is set to a value, EDB Postgres for Kubernetes passes it to the --lc-ctype option in initdb. This option controls character classification (LC_CTYPE subcategory), as defined in "Locale Support" from the PostgreSQL documentation (default: C).
localeProvider
: When localeProvider is set to a value, EDB Postgres for Kubernetes passes it to the --locale-provider option in initdb. This option controls the locale provider, as defined in "Locale Support" from the PostgreSQL documentation (default: empty, which means libc for PostgreSQL). Available from PostgreSQL 15.
walSegmentSize
: When walSegmentSize is set to a value, EDB Postgres for Kubernetes passes it to the --wal-segsize option in initdb (default: not set, defined by PostgreSQL as 16 megabytes).
Note
The only two locale subcategories that EDB Postgres for Kubernetes implements during the initdb bootstrap are LC_COLLATE and LC_CTYPE. The remaining locale subcategories can be configured directly in the PostgreSQL configuration, using the lc_messages, lc_monetary, lc_numeric, and lc_time parameters.
The following example enables data checksums and sets the default encoding to LATIN1:
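A minimal sketch of the corresponding bootstrap fragment, assuming the default app database and owner; only dataChecksums and encoding are the options under discussion.

```yaml
# Fragment of a Cluster spec
bootstrap:
  initdb:
    database: app
    owner: app
    dataChecksums: true  # passes -k to initdb
    encoding: 'LATIN1'   # passes --encoding to initdb
```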
Warning
EDB Postgres for Kubernetes supports another way to customize the behavior of the initdb invocation, using the options subsection. However, given that there are options that can break the behavior of the operator (such as --auth or -d), this technique is deprecated and will be removed from future versions of the API.
Executing Queries After Initialization
You can specify a custom list of queries that will be executed once, immediately after the cluster is created and configured. These queries will be executed as the superuser (postgres) against three different databases, in this specific order:
- The postgres database (postInit section)
- The template1 database (postInitTemplate section)
- The application database (postInitApplication section)
For each of these sections, EDB Postgres for Kubernetes provides two ways to specify custom queries, executed in the following order:
- As a list of SQL queries in the cluster's definition (postInitSQL, postInitTemplateSQL, and postInitApplicationSQL stanzas)
- As a list of Secrets and/or ConfigMaps, each containing a SQL script to be executed (postInitSQLRefs, postInitTemplateSQLRefs, and postInitApplicationSQLRefs stanzas). Secrets are processed before ConfigMaps.
Objects in each list will be processed sequentially.
Warning
Use the postInit, postInitTemplate, and postInitApplication options with extreme care, as queries are run as a superuser and can disrupt the entire cluster. An error in any of those queries will interrupt the bootstrap phase, leaving the cluster incomplete and requiring manual intervention.
Important
Make sure that the entries referenced in postInitSQLRefs, postInitTemplateSQLRefs, and postInitApplicationSQLRefs actually exist inside the specified ConfigMaps or Secrets, otherwise the bootstrap will fail. Errors in any of those SQL files will prevent the bootstrap phase from completing successfully.
The following example runs a single SQL query as part of the postInitSQL stanza:
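A minimal sketch of such a fragment. The query itself (creating an extra database named angus) is a hypothetical example chosen only for illustration.

```yaml
# Fragment of a Cluster spec
bootstrap:
  initdb:
    database: app
    owner: app
    postInitSQL:
      # Executed once as the postgres superuser, against the postgres database
      - CREATE DATABASE angus
```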
The example below relies on postInitApplicationSQLRefs to specify a secret and a ConfigMap containing the queries to run after the initialization on the application database:
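A minimal sketch of such a fragment. The Secret and ConfigMap names and their keys are hypothetical placeholders; each key is expected to contain a SQL script.

```yaml
# Fragment of a Cluster spec
bootstrap:
  initdb:
    database: app
    owner: app
    postInitApplicationSQLRefs:
      secretRefs:               # processed first
        - name: my-secret       # hypothetical Secret
          key: secret.sql       # key holding the SQL script
      configMapRefs:            # processed after the Secrets
        - name: my-configmap    # hypothetical ConfigMap
          key: configmap.sql
```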
Note
Within SQL scripts, each SQL statement is executed in a single exec on the server according to the PostgreSQL semantics. Comments can be included, but client-side commands, such as psql meta-commands, cannot.
Compatibility Features
EDB Postgres Advanced Server adds many compatibility features to the plain community PostgreSQL. You can find more information about them in the EDB Postgres Advanced Server documentation.
Those features are already enabled during cluster creation on EPAS and are not supported on the community PostgreSQL image. To disable them, you can use the redwood flag in the initdb section, as in the following example:
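A minimal sketch of such a fragment, assuming the default app database and owner:

```yaml
# Fragment of a Cluster spec
bootstrap:
  initdb:
    database: app
    owner: app
    redwood: false  # disables the EPAS compatibility features
```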
Bootstrap from another cluster
EDB Postgres for Kubernetes enables the bootstrap of a cluster starting from another one of the same major version. This operation can happen by connecting directly to the source cluster via streaming replication (pg_basebackup), or indirectly via an existing physical base backup (recovery).
The source cluster must be defined in the externalClusters section, identified by name (our recommendation is to use the same name as the origin cluster).
Important
By default, the recovery method strictly uses the name of the cluster in the externalClusters section to locate the main folder of the backup data within the object store, which is normally reserved for the name of the server. You can specify a different one with the barmanObjectStore.serverName property (by default assigned to the value of name in the external cluster definition).
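As an illustration of the serverName override, the sketch below bootstraps via recovery from an external cluster whose entry name differs from the folder used in the object store. All names, the bucket path, and the credentials secret are hypothetical placeholders.

```yaml
# Fragment of a Cluster spec; names and paths are illustrative placeholders
bootstrap:
  recovery:
    source: origin                    # refers to the externalClusters entry below

externalClusters:
  - name: origin
    barmanObjectStore:
      destinationPath: s3://backups/  # hypothetical bucket
      serverName: cluster-example     # folder in the object store, overriding "name"
      s3Credentials:
        accessKeyId:
          name: aws-creds             # hypothetical secret
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-creds
          key: ACCESS_SECRET_KEY
```

Without serverName, the backup data would be looked up under a folder named origin.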
Bootstrap from a backup (recovery)
Given the several possibilities, methods, and combinations that the EDB Postgres for Kubernetes operator provides in terms of backup and recovery, please refer to the "Recovery" section.
Bootstrap from a live cluster (pg_basebackup)
The pg_basebackup bootstrap mode allows you to create a new cluster (target) as an exact physical copy of an existing and binary-compatible PostgreSQL instance (source) managed by EDB Postgres for Kubernetes, using a valid streaming replication connection. The source instance can either be a primary or a standby PostgreSQL server. It's crucial to thoroughly review the requirements section below, as the pros and cons of PostgreSQL physical replication fully apply.
The primary use cases for this method include:
- Reporting and business intelligence clusters that need to be regenerated periodically (daily, weekly)
- Test databases containing live data that require periodic regeneration (daily, weekly, monthly) and anonymization
- Rapid spin-up of a standalone replica cluster
- Physical migrations of EDB Postgres for Kubernetes clusters to different namespaces or Kubernetes clusters
Important
Avoid using this method, based on physical replication, to migrate an existing PostgreSQL cluster outside of Kubernetes into EDB Postgres for Kubernetes unless you are completely certain that all requirements are met and the operation has been thoroughly tested. The EDB Postgres for Kubernetes community does not endorse this approach for such use cases and recommends using logical import instead. It is exceedingly rare that all requirements for physical replication are met in a way that seamlessly works with EDB Postgres for Kubernetes.
Warning
In its current implementation, this method clones the source PostgreSQL instance, thereby creating a snapshot. Once the cloning process has finished, the new cluster is immediately started. Refer to "Current limitations" for more details.
Similar to the recovery bootstrap method, once the cloning operation is complete, the operator takes full ownership of the target cluster, starting from the first instance. This includes overriding certain configuration parameters as required by EDB Postgres for Kubernetes, resetting the superuser password, creating the streaming_replica user, managing replicas, and more. The resulting cluster operates independently from the source instance.
Important
Configuring the network connection between the target and source instances lies outside the scope of EDB Postgres for Kubernetes documentation, as it depends heavily on the specific context and environment.
The streaming replication client on the target instance, managed transparently by pg_basebackup, can authenticate on the source instance using one of the following methods:
- username and password
- TLS client certificate
Both authentication methods are detailed below.
Requirements
The following requirements apply to the pg_basebackup bootstrap method:
- target and source must have the same hardware architecture
- target and source must have the same major PostgreSQL version
- target and source must have the same tablespaces
- source must be configured with enough max_wal_senders to grant access from the target for this one-off operation by providing at least one walsender for the backup plus one for WAL streaming
- the network between source and target must be configured to enable the target instance to connect to the PostgreSQL port on the source instance
- source must have a role with REPLICATION LOGIN privileges and must accept connections from the target instance for this role in pg_hba.conf, preferably via TLS (see "About the replication user" below)
- target must be able to successfully connect to the source PostgreSQL instance using a role with REPLICATION LOGIN privileges
Seealso
For further information, please refer to the "Planning" section for Warm Standby, the pg_basebackup page, and the "High Availability, Load Balancing, and Replication" chapter in the PostgreSQL documentation.
About the replication user
As explained in the requirements section, you need to have a user with either the SUPERUSER or, preferably, just the REPLICATION privilege in the source instance.
If the source database is created with EDB Postgres for Kubernetes, you can reuse the streaming_replica user and take advantage of client TLS certificate authentication (which, by default, is the only allowed connection method for streaming_replica).
For all other cases, including outside Kubernetes, please verify that you already have a user with the REPLICATION privilege, or create a new one by following the instructions below.
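As the postgres user on the source system, run PostgreSQL's createuser command with the -P (password prompt) and --replication options, for example:
createuser -P --replication streaming_replica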
Enter the password at the prompt and save it for later, as you will need to add it to a secret in the target instance.
Note
Although the name is not important, we will use streaming_replica for the sake of simplicity. Feel free to change it as you like, provided you adapt the instructions in the following sections.
Username/Password authentication
The first authentication method supported by EDB Postgres for Kubernetes with the pg_basebackup bootstrap is based on username and password matching.
Make sure you have the following information before you start the procedure:
- location of the source instance, identified by a hostname or an IP address and a TCP port
- replication username (streaming_replica for simplicity)
- password
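You might need to add a line similar to the following to the pg_hba.conf file on the source PostgreSQL instance (adapt the address and the authentication method to your environment):
host replication streaming_replica all md5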
The following manifest creates a new PostgreSQL 17.2 cluster, called target-db, using the pg_basebackup bootstrap method to clone an external PostgreSQL cluster defined as source-db (in the externalClusters array). As you can see, the source-db definition points to the source-db.foo.com host and connects as the streaming_replica user, whose password is stored in the password key of the source-db-replica-user secret.
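A minimal sketch of such a manifest, built from the names given in the paragraph above. The API group, the image reference, the number of instances, and the storage size are assumptions made for illustration.

```yaml
apiVersion: postgresql.k8s.enterprisedb.io/v1       # assumed API group
kind: Cluster
metadata:
  name: target-db
spec:
  instances: 3                                      # illustrative value
  imageName: quay.io/enterprisedb/postgresql:17.2   # assumed image reference for PostgreSQL 17.2

  bootstrap:
    pg_basebackup:
      source: source-db          # refers to the externalClusters entry below

  storage:
    size: 1Gi                                       # illustrative value

  externalClusters:
    - name: source-db
      connectionParameters:
        host: source-db.foo.com
        user: streaming_replica
      password:
        name: source-db-replica-user   # secret holding the replication password
        key: password
```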
All the requirements must be met for the clone operation to work, including the same PostgreSQL version (in our case 17.2).
TLS certificate authentication
The second authentication method supported by EDB Postgres for Kubernetes with the pg_basebackup bootstrap is based on TLS client certificates. This is the recommended approach from a security standpoint.
The following example clones an existing PostgreSQL cluster (cluster-example) in the same Kubernetes cluster.
Note
This example can be easily adapted to cover an instance that resides outside the Kubernetes cluster.
The manifest defines a new PostgreSQL 17.2 cluster called cluster-clone-tls, which is bootstrapped using the pg_basebackup method from the cluster-example external cluster. The host is identified by the read/write service in the same cluster, while the streaming_replica user is authenticated thanks to the provided keys, certificate, and certification authority information (respectively in the cluster-example-replication and cluster-example-ca secrets).
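A minimal sketch of such a manifest, built from the names given above. The API group, the image reference, the default namespace in the service host name, the sslmode value, and the secret key names (tls.key, tls.crt, ca.crt) are assumptions made for illustration.

```yaml
apiVersion: postgresql.k8s.enterprisedb.io/v1       # assumed API group
kind: Cluster
metadata:
  name: cluster-clone-tls
spec:
  instances: 3                                      # illustrative value
  imageName: quay.io/enterprisedb/postgresql:17.2   # assumed image reference for PostgreSQL 17.2

  bootstrap:
    pg_basebackup:
      source: cluster-example

  storage:
    size: 1Gi                                       # illustrative value

  externalClusters:
    - name: cluster-example
      connectionParameters:
        host: cluster-example-rw.default.svc   # read/write service, "default" namespace assumed
        user: streaming_replica
        sslmode: verify-full
      sslKey:
        name: cluster-example-replication      # client private key
        key: tls.key
      sslCert:
        name: cluster-example-replication      # client certificate
        key: tls.crt
      sslRootCert:
        name: cluster-example-ca               # certification authority
        key: ca.crt
```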
Configure the application database
You can also configure the application database for a cluster bootstrapped from a live cluster, just as with the initdb and recovery bootstrap methods. If the new cluster is created as a replica cluster (with replica mode enabled), the application database configuration is skipped.
Important
While the Cluster is in recovery mode, no changes to the database, including the catalog, are permitted. This restriction includes any role overrides, which are deferred until the Cluster transitions to primary. During the recovery phase, roles remain as defined in the source cluster.
The example below configures the app database with the owner app and the password stored in the provided secret app-secret, following the bootstrap from a live cluster.
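A minimal sketch of such a fragment. The source name cluster-example is an assumption and must match an entry in the externalClusters section.

```yaml
# Fragment of a Cluster spec
bootstrap:
  pg_basebackup:
    database: app
    owner: app
    secret:
      name: app-secret         # basic-auth secret for the app user
    source: cluster-example    # assumed externalClusters entry
```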
With the above configuration, the following will happen only after recovery is completed:
- If the app database does not exist, it will be created.
- If the app user does not exist, it will be created.
- If the app user is not the owner of the app database, ownership will be granted to the app user.
- If the username value matches the owner value in the secret, the password for the application user (the app user in this case) will be updated to the password value in the secret.
Current limitations
Snapshot copy
The pg_basebackup method takes a snapshot of the source instance in the form of a PostgreSQL base backup. All transactions written from the start of the backup to the correct termination of the backup will be streamed to the target instance using a second connection (see the --wal-method=stream option for pg_basebackup).
Once the backup is completed, the new instance will be started on a new timeline and diverge from the source. For this reason, it is advised to stop all write operations to the source database before migrating to the target database.
Important
Before you attempt a migration, you must test both the procedure and the applications. In particular, it is fundamental that you run the migration procedure as many times as needed to systematically measure the downtime of your applications in production.