Troubleshooting v1
This page provides basic information on how to troubleshoot EDB Postgres for Kubernetes in your Kubernetes cluster deployment.
Hint
As a Kubernetes administrator, you should have the kubectl Cheat Sheet page bookmarked!
Before you start
Kubernetes environment
Providing clear information about the underlying Kubernetes system can make a real difference when troubleshooting.
Make sure you know:
- the Kubernetes distribution and version you are using
- the specifications of the nodes where PostgreSQL is running
- as much as you can about the actual storage, including storage class and benchmarks you have done before going into production
- which relevant Kubernetes applications you are using in your cluster (e.g. Prometheus, Grafana, Istio, cert-manager, ...)
- the situation of continuous backup, in particular whether it is in place and working correctly: if it is not, make sure you take an emergency backup before performing any potentially disruptive operation
Useful utilities
On top of the mandatory kubectl utility, for troubleshooting, we recommend the following plugins/utilities to be available in your system:
- cnp plugin for kubectl
- jq, a lightweight and flexible command-line JSON processor
- grep, which searches one or more input files for lines containing a match to a specified pattern. It is already available in most *nix distros. If you are on Windows OS, you can use findstr as an alternative to grep, or directly use wsl, install your preferred *nix distro, and use the tools mentioned above.
First steps
To quickly get an overview of the cluster or installation, the kubectl plugin is the primary tool to use:
- the status subcommand provides an overview of a cluster
- the report subcommand provides the manifests for clusters and the operator deployment. It can also include logs using the --logs option. The report generated via the plugin will include the full cluster manifest.
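As a minimal sketch of the report subcommand, the wrappers below collect a cluster report and an operator report, both including logs. The function names, the default archive names, and the use of `-n` and `-f` are illustrative assumptions; check the plugin documentation for the options available in your version.

```shell
# Hypothetical helpers; names and default file names are illustrative only.

# Collect manifests and logs for one cluster into an archive.
# Usage: cluster_report <namespace> <cluster> [output-file]
cluster_report() {
  kubectl cnp report cluster "$2" -n "$1" --logs -f "${3:-cluster-report.zip}"
}

# Collect manifests and logs for the operator deployment.
# Usage: operator_report [output-file]
operator_report() {
  kubectl cnp report operator --logs -f "${1:-operator-report.zip}"
}
```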
The plugin can be installed on air-gapped systems via packages. Please refer to the plugin documentation for complete instructions.
Are there backups?
After getting the cluster manifest with the plugin, you should verify if backups are set up and working.
In a cluster with backups set up, you will find, in the cluster Status, the fields lastSuccessfulBackup and firstRecoverabilityPoint. You should make sure there is a recent lastSuccessfulBackup.
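If you prefer to query the two fields directly instead of reading the full manifest, a sketch along these lines may help. It assumes both fields live under `.status` of the Cluster resource; the function name is hypothetical.

```shell
# Hypothetical helper: print the backup-related status fields of a Cluster,
# one per line (first recoverability point, then last successful backup).
# Usage: backup_status <namespace> <cluster>
backup_status() {
  kubectl get cluster "$2" -n "$1" \
    -o jsonpath='{.status.firstRecoverabilityPoint}{"\n"}{.status.lastSuccessfulBackup}{"\n"}'
}
```

Empty lines in the output suggest that no backup has completed yet.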
A cluster lacking the .spec.backup stanza won't have backups.
An insistent message will appear in the PostgreSQL logs:
Before proceeding with troubleshooting operations, it may be advisable to perform an emergency backup depending on your findings regarding backups. Refer to the following section for instructions.
It is extremely risky to operate a production database without keeping regular backups.
Emergency backup
In some emergency situations, you might need to take an emergency logical backup of the main app database.
Important
The instructions below must be executed only in emergency situations, and the temporary backup files must be kept under the data protection policies that are effective in your organization. The dump file is stored on the client machine that runs the kubectl command, so make sure that all protections are in place and that you have enough space to store the backup file.
The following example shows how to take a logical backup of the app database in the cluster-example Postgres cluster, from the cluster-example-1 pod:
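A minimal sketch of such a backup is shown here. It assumes the PostgreSQL container in the pod is named `postgres` and that the dump is written to a local file called `app.dump`; the function name is hypothetical.

```shell
# Hypothetical helper wrapping the emergency logical backup.
# Assumptions: the PostgreSQL container is named "postgres"; the dump is
# written to a file on the client machine (app.dump by default).
# Usage: emergency_dump <pod> <database> [output-file]
emergency_dump() {
  kubectl exec "$1" -c postgres -- pg_dump -Fc -d "$2" > "${3:-app.dump}"
}

# Example for the objects named in this section:
#   emergency_dump cluster-example-1 app app.dump
```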
Note
You can easily adapt the above command to back up your cluster, by providing the names of the objects you have used in your environment.
The above command issues a pg_dump command in custom format, which is the most versatile way to take logical backups in PostgreSQL.
The next step is to restore the database. We assume that you are operating on a new PostgreSQL cluster that has just been initialized (so the app database is empty).
The following example shows how to restore the above logical backup in the app database of the new-cluster-example Postgres cluster, by connecting to the primary (new-cluster-example-1 pod):
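A minimal sketch of such a restore is shown here. It assumes the container is named `postgres`, that the dump file is `app.dump` on the client machine, and that restored objects should be owned by a role passed as the third argument (`app` in the example); the function name is hypothetical.

```shell
# Hypothetical helper wrapping the restore of a custom-format dump.
# Assumptions: the PostgreSQL container is named "postgres"; objects are
# restored without their original ownership and assigned to <owner-role>.
# Usage: emergency_restore <pod> <database> <owner-role> [dump-file]
emergency_restore() {
  kubectl exec -i "$1" -c postgres -- \
    pg_restore --no-owner --role="$3" -d "$2" --verbose < "${4:-app.dump}"
}

# Example for the objects named in this section:
#   emergency_restore new-cluster-example-1 app app app.dump
```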
Important
The example in this section assumes that you have no other global objects
(databases and roles) to dump and restore, as per our recommendation. In case
you have multiple roles, make sure you have taken a backup using pg_dumpall -g
and you manually restore them in the new cluster. In case you have multiple
databases, you need to repeat the above operation one database at a time, making
sure you assign the right ownership. If you are not familiar with PostgreSQL,
we advise that you do these critical operations under the guidance of
a professional support company.
The above steps might be integrated into the cnp plugin at some stage in the future.
Logs
All resources created and managed by EDB Postgres for Kubernetes log to standard output in accordance with Kubernetes conventions, using JSON format.
While logs are typically processed at the infrastructure level and include those from EDB Postgres for Kubernetes, accessing logs directly from the command line interface is critical during troubleshooting. You have three primary options for doing so:
- Use the kubectl logs command to retrieve logs from a specific resource, and apply jq for better readability.
- Use the kubectl cnp logs command for EDB Postgres for Kubernetes-specific logging.
- Leverage specialized open-source tools like stern, which can aggregate logs from multiple resources (e.g., all pods in a PostgreSQL cluster by selecting the k8s.enterprisedb.io/clusterName label), filter log entries, customize output formats, and more.
Note
The following sections provide examples of how to retrieve logs for various resources when troubleshooting EDB Postgres for Kubernetes.
Operator information
By default, the EDB Postgres for Kubernetes operator is installed in the postgresql-operator-system namespace in Kubernetes as a Deployment (see the "Details about the deployment" section for details).
You can get a list of the operator pods by running:
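A minimal sketch, assuming the default namespace named above (pass a different one if your installation differs); the function name is hypothetical.

```shell
# Hypothetical helper: list the operator pods.
# Usage: operator_pods [namespace]
operator_pods() {
  kubectl get pods -n "${1:-postgresql-operator-system}"
}
```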
Note
Under normal circumstances, you should have one pod where the operator is running, identified by a name starting with postgresql-operator-controller-manager-.
In case you have set up your operator for high availability, you should have more entries.
Those pods are managed by a deployment named postgresql-operator-controller-manager.
Collect the relevant information about the operator that is running in pod <POD> with:
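A minimal sketch, assuming the default operator namespace; the function name is hypothetical and the pod name is passed as the first argument.

```shell
# Hypothetical helper: show details and recent events for an operator pod.
# Usage: operator_pod_describe <pod> [namespace]
operator_pod_describe() {
  kubectl describe pod -n "${2:-postgresql-operator-system}" "$1"
}
```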
Then get the logs from the same pod by running:
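A minimal sketch, under the same assumptions (default operator namespace, hypothetical function name):

```shell
# Hypothetical helper: print the logs of an operator pod.
# Usage: operator_pod_logs <pod> [namespace]
operator_pod_logs() {
  kubectl logs -n "${2:-postgresql-operator-system}" "$1"
}
```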
Gather more information about the operator
Get logs from all pods in the EDB Postgres for Kubernetes operator Deployment (in case you have a multi-operator deployment) by running:
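A minimal sketch that targets the Deployment by the name given earlier, assuming the default namespace; the function name is hypothetical. Be aware that `kubectl logs deployment/...` may pick a single pod of the Deployment, so with several replicas you might need a label selector (`-l`) matching the labels used in your installation.

```shell
# Hypothetical helper: print operator logs by addressing the Deployment.
# Usage: operator_deployment_logs [namespace]
operator_deployment_logs() {
  kubectl logs -n "${1:-postgresql-operator-system}" \
    deployment/postgresql-operator-controller-manager --all-containers=true
}
```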
Tip
You can add the -f flag to the above command to follow logs in real time.
Save logs to a JSON file by running:
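A minimal sketch that pipes the operator logs through `jq` and writes them to a local file. The file name `cnp_logs.json` and the function name are illustrative assumptions.

```shell
# Hypothetical helper: save operator logs, pretty-printed by jq, to a file.
# Usage: save_operator_logs [output-file] [namespace]
save_operator_logs() {
  kubectl logs -n "${2:-postgresql-operator-system}" \
    deployment/postgresql-operator-controller-manager --all-containers=true \
    | jq -r . > "${1:-cnp_logs.json}"
}
```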
Get the EDB Postgres for Kubernetes operator version by using the kubectl-cnp plugin:
Output:
Cluster information
You can check the status of the <CLUSTER> cluster in the <NAMESPACE> namespace with:
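A minimal sketch, with the namespace and cluster name passed as arguments; the function name is hypothetical.

```shell
# Hypothetical helper: one-line summary of a Cluster resource.
# Usage: cluster_summary <namespace> <cluster>
cluster_summary() {
  kubectl get cluster -n "$1" "$2"
}
```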
Output:
The above example reports a healthy PostgreSQL cluster of 3 instances, all in ready state, and with <CLUSTER>-1 being the primary.
In case of unhealthy conditions, you can discover more by getting the manifest of the Cluster resource:
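A minimal sketch that prints the manifest as YAML; the function name is hypothetical.

```shell
# Hypothetical helper: dump the full manifest of a Cluster resource as YAML.
# Usage: cluster_manifest <namespace> <cluster>
cluster_manifest() {
  kubectl get cluster -o yaml -n "$1" "$2"
}
```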
Another important command to run is the status one, as provided by the cnp plugin:
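A minimal sketch of the status call; the function name is hypothetical, and any extra arguments are passed through to the plugin unchanged.

```shell
# Hypothetical helper: show the plugin's status report for a cluster.
# Extra arguments after the cluster name are forwarded as-is.
# Usage: cluster_status <namespace> <cluster> [plugin options...]
cluster_status() {
  ns="$1"
  cluster="$2"
  shift 2
  kubectl cnp status -n "$ns" "$cluster" "$@"
}

# Example:
#   cluster_status default cluster-example --verbose
```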
Tip
You can print more information by adding the --verbose option.
Get EDB PostgreSQL Advanced Server (EPAS) / PostgreSQL container image version:
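One possible way to do this is to filter the description of the Cluster resource with `grep`. The sketch assumes the image appears on a line labelled "Image Name" in the `kubectl describe` output; the function name is hypothetical.

```shell
# Hypothetical helper: print the container image line of a Cluster.
# Assumption: "kubectl describe" labels the image field "Image Name".
# Usage: cluster_image <namespace> <cluster>
cluster_image() {
  kubectl describe cluster "$2" -n "$1" | grep "Image Name"
}
```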
Output:
Note
You can also use kubectl-cnp status -n <NAMESPACE> <CLUSTER_NAME> to get the same information.
Pod information
You can retrieve the list of instances that belong to a given PostgreSQL cluster with:
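A minimal sketch that selects pods by label. It assumes instance pods carry a `k8s.enterprisedb.io/cluster` label whose value is the cluster name; this label is not stated in this section, so verify it against your pods (for example with `kubectl get pods --show-labels`). The function name is hypothetical.

```shell
# Hypothetical helper: list the instance pods of a PostgreSQL cluster.
# Assumption: pods are labelled k8s.enterprisedb.io/cluster=<cluster>.
# Usage: cluster_pods <namespace> <cluster>
cluster_pods() {
  kubectl get pod -l "k8s.enterprisedb.io/cluster=$2" -n "$1"
}
```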
Output:
You can check if/how a pod is failing by running:
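A minimal sketch that prints the pod manifest, including its status and container states, as YAML; the function name is hypothetical.

```shell
# Hypothetical helper: dump a pod's manifest and status as YAML.
# Usage: pod_manifest <namespace> <pod>
pod_manifest() {
  kubectl get pod -n "$1" -o yaml "$2"
}
```

`kubectl describe pod` is an alternative when you are mostly interested in recent events.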
You can get all the logs for a given PostgreSQL instance with:
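A minimal sketch, with the namespace and pod name passed as arguments; the function name is hypothetical.

```shell
# Hypothetical helper: print all logs of a PostgreSQL instance pod.
# Usage: instance_logs <namespace> <pod>
instance_logs() {
  kubectl logs -n "$1" "$2"
}
```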
If you want to limit the search to the PostgreSQL process only, you can run:
The following example also adds the timestamp:
If the timestamp is displayed in Unix Epoch time, you can convert it to a user-friendly format:
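The three filtering steps described above can be sketched with `jq` as follows. The sketch assumes each log line is a JSON object with a `logger` field, a `ts` field, and the PostgreSQL message under `record.message`; these field names and the function names are assumptions to check against your own log output.

```shell
# Hypothetical helpers. Assumption: log lines are JSON objects with
# "logger", "ts" and "record.message" fields.

# Messages from the PostgreSQL process only.
postgres_messages() {
  kubectl logs -n "$1" "$2" \
    | jq 'select(.logger=="postgres") | .record.message'
}

# Same selection, with the raw timestamp, emitted as CSV.
postgres_messages_ts() {
  kubectl logs -n "$1" "$2" \
    | jq -r 'select(.logger=="postgres") | [.ts, .record.message] | @csv'
}

# Same selection, converting a numeric Unix Epoch timestamp to local time.
postgres_messages_localtime() {
  kubectl logs -n "$1" "$2" \
    | jq -r 'select(.logger=="postgres") | [(.ts|strflocaltime("%Y-%m-%dT%H:%M:%S %Z")), .record.message] | @csv'
}
```

Each function takes the namespace and the pod name, for example `postgres_messages_ts default cluster-example-1`. The last variant only applies when `ts` is numeric.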