Encrypting columns in PGD v6.3.1

Column encryption protects sensitive fields in a table by converting them into ciphertext, a scrambled and unreadable form, while leaving the rest of the record in its original, human-readable form. Because encryption happens at the SQL level during INSERT or UPDATE, only ciphertext ever reaches the Write-Ahead Log (WAL) and gets replicated across PGD nodes, keeping sensitive data protected at rest, in backups, and in transit.

In a distributed cluster, column encryption also provides the following benefits:

  • Node independence: Applications can provide the key to any node for consistent decryption.
  • Reduced attack surface: Keys never enter the WAL in plaintext. Only ciphertext is replicated.
  • Memory isolation: Keys are cleared when the session closes.

PGD supports column-level encryption through the following:

  • pgcrypto: A community Postgres extension, not shipped as an EDB package, but supported when used with PostgreSQL, EDB Postgres Extended (PGE), and EDB Postgres Advanced Server (EPAS).
  • DBMS_CRYPTO: An Oracle-compatibility package available exclusively on EPAS.

Both are fully compatible with PGD replication. When using pgcrypto or DBMS_CRYPTO, data is transformed using cryptographic functions before being written to the heap. pgcrypto stores the resulting ciphertext as bytea, and DBMS_CRYPTO uses RAW, BLOB, or CLOB, all of which PGD can replicate natively.

Comparing encryption methods

Column encryption is one of several approaches to protecting data in PGD. Each operates at a different level of the stack and guards against different threats.

Transparent Data Encryption (TDE) is an optional feature of PGE and EPAS that encrypts data at the storage level, making database files on disk unreadable without the encryption key. TDE operates transparently alongside PGD replication, but it offers no protection against a DBA who has access to the running database, who can still query all data in plaintext.

Data redaction is a feature of EPAS that masks sensitive values in query results based on policy. The underlying data on disk is unchanged, so it must be paired with TDE or column encryption to provide meaningful protection.

Column encryption protects at the field level, which is more granular than either of the above. A DBA without the session-level decryption key sees only ciphertext, even in a live query. The trade-off is that the application must manage keys and handle encryption and decryption explicitly.

TechniqueProtection levelKey managed byProtects againstVisible to DBAs
TDEBlock/storage-levelDatabase/KMIPPhysical disk theft or OS-level accessYes
Data redactionPresentation-levelDatabase policyUnauthorized app usersYes
Column encryptionField-levelApplicationCompromised DBAs or system adminsNo (ciphertext only)

Planning for encryption in a distributed cluster

Encrypting columns in PGD introduces considerations that don't arise in a single-node setup. Before implementing column encryption, make sure you've accounted for indexing, CPU overhead, and the effect on storage and replication.

Handling indexing and search on encrypted columns

Both pgcrypto and DBMS_CRYPTO use probabilistic (randomized) encryption by default, meaning encrypting the same plaintext multiple times produces a different ciphertext each time. pgcrypto achieves this behavior via pgp_sym_encrypt(), and DBMS_CRYPTO does this via AES in CBC mode.

Randomized encryption has two key implications:

  • Index incompatibility: A standard B-tree index on a ciphertext column can't support plaintext equality lookups.
  • Query failure: The following common pattern returns zero results even if the record exists, because each call to pgp_sym_encrypt() produces different ciphertext that will never match what's stored in the column. No error is raised, making the failure easy to mistake for the record not existing:
SELECT * FROM customers
WHERE email = pgp_sym_encrypt('alice@example.com', 'your_key');

With pgcrypto specifically, if the column type is text rather than bytea, Postgres raises a type mismatch error instead:

ERROR: operator does not exist: text = bytea

To search encrypted columns by exact value, use one of these strategies:

  • Deterministic encryption: Configure the encryption algorithm to always produce the same ciphertext for a given plaintext.
  • Blind index (recommended): Store an additional column containing a one-way cryptographic hash of the plaintext, using a function like hmac() from pgcrypto, and index that column instead. Queries hash the search value and compare against the index rather than the encrypted column.

For implementation details on both approaches, see the pgcrypto documentation or the DBMS_CRYPTO reference.

Warning

Both approaches reveal equality patterns: an attacker who can observe the stored values can infer when two rows contain the same data, and on low-variety fields like status flags or country codes, can deduce the plaintext entirely.

Accounting for CPU overhead

Encryption and decryption are computationally expensive operations that run on the database server. Ensure your nodes have sufficient CPU headroom. You can significantly reduce this impact by using algorithms that leverage hardware acceleration, such as AES on processors that support the AES-NI instruction set.

Planning for storage and WAL growth

Ciphertext is larger than the original plaintext, which increases table size, WAL volume, and the replication bandwidth required for PGD to synchronize nodes. Plan capacity accordingly.

Encrypting columns using pgcrypto

pgcrypto is a community Postgres extension that provides cryptographic functions including pgp_sym_encrypt() and pgp_sym_decrypt(). It works with PostgreSQL, PGE, and EPAS. Views handle transparent decryption and triggers handle automatic encryption on write, so applications can read and write plaintext without calling the cryptographic functions directly.

  1. Create the extension:

    CREATE EXTENSION IF NOT EXISTS pgcrypto;
  2. Create the customers table with the encrypted column defined as bytea:

    CREATE TABLE customers (
        customer_id   uuid    DEFAULT uuidv7() PRIMARY KEY,
        username      TEXT    NOT NULL,
        email         BYTEA   NOT NULL
    );
  3. Create a view for transparent decryption:

    -- Note: requires SET app.pgc_key = 'your_key' to be run in the session first
    CREATE OR REPLACE VIEW customers_view AS
    SELECT
        customer_id,
        username,
        pgp_sym_decrypt(email, current_setting('app.pgc_key')) AS email
    FROM customers;
  4. Create a trigger to encrypt data automatically before it's written:

    CREATE OR REPLACE FUNCTION encrypt_customer_pii_trigger()
    RETURNS TRIGGER AS $$
    BEGIN
        IF current_setting('app.pgc_key', true) IS NULL OR current_setting('app.pgc_key') = '' THEN
            RAISE EXCEPTION 'Encryption key not set. Please run SET app.pgc_key = ...';
        END IF;
    
        IF (TG_OP = 'INSERT' OR NEW.email != OLD.email) THEN
            NEW.email := pgp_sym_encrypt(
                convert_from(NEW.email, 'utf8'),
                current_setting('app.pgc_key'),
                'cipher-algo=aes256'
            );
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
    
    CREATE TRIGGER trg_encrypt_customer_pii
    BEFORE INSERT OR UPDATE ON customers
    FOR EACH ROW EXECUTE FUNCTION encrypt_customer_pii_trigger();
  5. With the view and trigger in place, set the session key and interact with the table normally:

    -- 1. Initialize session key
    SET app.pgc_key = 'your_secret_key';
    
    -- 2. Trigger handles encryption automatically
    INSERT INTO customers (username, email) VALUES ('dolores', 'dolores@example.com');

Querying the raw customers table reveals that the email column stores ciphertext, while customers_view returns the decrypted plaintext:

SELECT customer_id, username, email FROM customers;
Output
 customer_id                         | username | email
-------------------------------------+----------+------------------------------------------------
 01966e42-1af7-7a2a-b4b9-9cf8e0d3c2a | dolores  | \xc30d04070302e4a7f23b8c12d956aa...
(1 row)
SELECT customer_id, username, email FROM customers_view;
Output
 customer_id                         | username | email
-------------------------------------+----------+---------------------
 01966e42-1af7-7a2a-b4b9-9cf8e0d3c2a | dolores  | dolores@example.com
(1 row)

Encrypting columns using DBMS_CRYPTO

DBMS_CRYPTO is an Oracle-compatible package available exclusively in EPAS. It provides encryption and decryption functions that mirror the Oracle DBMS_CRYPTO API. As with pgcrypto, a view handles transparent decryption and a trigger handles automatic encryption on write.

  1. Create the customers table with the encrypted column defined as RAW:

    CREATE TABLE customers (
        customer_id   uuid     DEFAULT uuidv7() PRIMARY KEY,
        username      TEXT     NOT NULL,
        email         RAW(128)
    );
  2. Create a view for transparent decryption:

    CREATE OR REPLACE VIEW customers_view AS
    SELECT
        customer_id,
        username,
        UTL_I18N.RAW_TO_CHAR(
            DBMS_CRYPTO.DECRYPT(
                src => email,
                typ => 258 + 256 + 4096, -- AES256 + CBC + PKCS7
                key => UTL_I18N.STRING_TO_RAW(current_setting('app.pgc_key'), 'AL32UTF8')
            ),
            'AL32UTF8'
        ) AS email
    FROM customers;
  3. Create a trigger to encrypt data automatically before it's written:

    CREATE OR REPLACE FUNCTION epas_encrypt_customer_pii_trigger()
    RETURNS TRIGGER AS $$
    DECLARE
        l_typ INTEGER := 258 + 256 + 4096; -- AES256 + CBC + PKCS7
        l_key RAW(32) := UTL_I18N.STRING_TO_RAW(current_setting('app.pgc_key'), 'AL32UTF8');
    BEGIN
        IF (TG_OP = 'INSERT' OR NEW.email IS DISTINCT FROM OLD.email) THEN
            NEW.email := DBMS_CRYPTO.ENCRYPT(
                src => UTL_I18N.STRING_TO_RAW(convert_from(NEW.email, 'utf8'), 'AL32UTF8'),
                typ => l_typ,
                key => l_key
            );
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
    
    CREATE TRIGGER trg_epas_encrypt_customer_pii
    BEFORE INSERT OR UPDATE ON customers
    FOR EACH ROW EXECUTE FUNCTION epas_encrypt_customer_pii_trigger();
  4. With the view and trigger in place, set the session key and interact with the table normally:

    -- 1. Initialize session key
    SET app.pgc_key = 'my_secure_32_byte_key_1234';
    
    -- 2. Trigger handles encryption automatically
    INSERT INTO customers (username, email) VALUES ('dolores', 'dolores@example.com');

Querying the raw customers table reveals that the email column stores ciphertext as a RAW hex string, while customers_view returns the decrypted plaintext:

SELECT customer_id, username, email FROM customers;
Output
 customer_id                         | username | email
-------------------------------------+----------+------------------------------------------------
 01966e42-1af7-7a2a-b4b9-9cf8e0d3c2a | dolores  | A3F2C8D14E7B0591...
(1 row)
SELECT customer_id, username, email FROM customers_view;
Output
 customer_id                         | username | email
-------------------------------------+----------+---------------------
 01966e42-1af7-7a2a-b4b9-9cf8e0d3c2a | dolores  | dolores@example.com
(1 row)

Managing encryption keys securely

PGD replicates data but not keys, which means all nodes must have access to the same key material. Follow these practices to manage keys safely:

  • Avoid hardcoding keys: Never include keys in SQL scripts or CREATE VIEW definitions, as they're stored in pg_catalog and are visible to anyone with access to system catalogs.

  • Use a session-level configuration parameter to pass the key: Setting it at the session level means the key exists only in memory and is cleared when the session closes. The parameter name can be anything in the app.* namespace. For example, app.pgc_key:

    SET app.pgc_key = 'your_ultra_secure_aes_256_key_here';
    Important

    Configuration parameters are local to the specific node and session where the command was executed. Because PGD operates as a multi-node cluster, your application must ensure the SET command is issued to the specific node handling the current connection. If your application switches nodes (for example, via Connection Manager), the parameter must be re-set on the new node before attempting to read or write encrypted data.

  • Use an external key management service for production deployments: Services such as HashiCorp Vault or AWS KMS let the application fetch the key at runtime and provide it to PGD for the duration of the session, so keys are never persisted to local disk on any PGD node.