Seamless Data Synchronization Across Databases with PostgreSQL Logical Replication
Olivia Novak
Dev Intern · Leapcell

Introduction
In today's data-driven world, applications often span multiple databases, requiring efficient mechanisms to keep data consistent and provide near real-time updates. Whether it's for building data warehouses, enabling microservices communication, or ensuring high availability through read replicas, the challenge of synchronizing data across disparate systems is a ubiquitous one. Traditional methods like batch jobs or custom application-level CDC often introduce complexity, latency, and operational overhead. This is where PostgreSQL's Logical Replication emerges as a powerful and elegant solution. It offers a native, robust, and highly configurable way to capture and apply changes, significantly simplifying the architecture for distributed data management. This article will explore the intricacies of PostgreSQL Logical Replication, demonstrating how it can be leveraged for seamless cross-database data synchronization and efficient Change Data Capture (CDC).
Understanding Logical Replication
To fully grasp the capabilities of PostgreSQL Logical Replication, it's essential to first understand a few core concepts:
- Logical Replication: A method of replicating data objects and their changes, based on their logical representation (INSERT, UPDATE, DELETE statements), rather than their physical storage blocks. This allows for replication between different PostgreSQL major versions, or even to non-PostgreSQL systems using custom decoders.
- Physical Replication: The traditional method of replicating the entire database cluster, including the operating system files, at the block level. This is primarily used for disaster recovery and read-only replica scenarios where the replica is an exact copy of the primary.
- Write-Ahead Log (WAL): PostgreSQL's core mechanism for ensuring data integrity and durability. Every change to the database is first written to the WAL before being applied to the actual data files. Logical Replication extracts data changes directly from the WAL.
- Publication: A set of tables (and optionally all tables in a schema or all tables in the database) on the publisher database that are marked for replication. Publishers define what data to send.
- Subscription: A connection to a publication on a subscriber database. Subscribers receive and apply the changes from the publication.
- Logical Decoding: The process of translating the binary WAL format into a logical format (e.g., SQL statements, JSON, Avro) that can be easily understood and processed for replication.
The Mechanism of Logical Replication
PostgreSQL Logical Replication works by capturing changes from the publisher's WAL and transmitting them to the subscriber. This process can be broken down into several steps:
- WAL Generation: Any DML operation (INSERT, UPDATE, DELETE) on the publisher database generates corresponding entries in the WAL.
- Logical Decoding: A logical decoding plugin (like
pgoutput
, which is the default for built-in logical replication) reads these WAL entries and translates them into a logical stream of changes. - Publication: The publisher creates a publication, specifying which tables or schemas it wants to replicate. This acts as a filter for the logical change stream.
- Subscription: The subscriber creates a subscription, pointing to a specific publication on the publisher.
- Data Transfer: The subscriber connects to the publisher. The publisher then sends a snapshot of the initial data for the subscribed tables to the subscriber. After the initial sync, it continuously streams the logical changes (INSERTs, UPDATEs, DELETEs) to the subscriber.
- Application: The subscriber receives these logical changes and applies them to its local tables, effectively synchronizing the data.
Implementation and Example
Let's walk through a practical example of setting up logical replication between two PostgreSQL instances.
Prerequisites:
You need two PostgreSQL instances (e.g., pg1
and pg2
). Ensure they can communicate over the network. For this example, we'll assume pg1
is the publisher and pg2
is the subscriber.
Step 1: Configure PostgreSQL on Publisher (pg1
)
Edit postgresql.conf
on pg1
and set the following parameters:
# /path/to/pg1/data/postgresql.conf wal_level = logical max_replication_slots = 10 # Adjust as needed max_wal_senders = 10 # Adjust as needed
Restart the pg1
instance for changes to take effect.
Next, open pg_hba.conf
on pg1
to allow connections from the subscriber. Add a line similar to this (replace subscriber_ip
with the actual IP address or range):
# /path/to/pg1/data/pg_hba.conf
host replication all subscriber_ip/32 md5
Restart pg1
again if you modified pg_hba.conf
.
Step 2: Create a Publication on Publisher (pg1
)
Connect to pg1
as a superuser or a user with REPLICATION
privilege.
-- Connect to pg1 psql -h localhost -p 5432 -U postgres -- Create a database and table for demonstration CREATE DATABASE publisher_db; \c publisher_db; CREATE TABLE products ( product_id SERIAL PRIMARY KEY, name VARCHAR(255) NOT NULL, price DECIMAL(10, 2) NOT NULL, last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); -- Create a publication for the 'products' table CREATE PUBLICATION my_publication FOR TABLE products; -- Optionally, to replicate all tables in the current database: -- CREATE PUBLICATION my_all_tables_publication FOR ALL TABLES;
Step 3: Prepare the Subscriber Database (pg2
)
Connect to pg2
as a superuser.
-- Connect to pg2 psql -h localhost -p 5433 -U postgres -- Create a database and table with the same schema as on the publisher -- It's crucial for the subscriber table schema to be compatible (at least have the same primary key and column names/types) CREATE DATABASE subscriber_db; \c subscriber_db; CREATE TABLE products ( product_id SERIAL PRIMARY KEY, name VARCHAR(255) NOT NULL, price DECIMAL(10, 2) NOT NULL, last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP );
Step 4: Create a Subscription on Subscriber (pg2
)
From the subscriber_db
on pg2
, create the subscription:
-- Connect to subscriber_db on pg2 psql -h localhost -p 5433 -U postgres -d subscriber_db CREATE SUBSCRIPTION my_subscription CONNECTION 'host=localhost port=5432 user=postgres dbname=publisher_db password=your_pg1_password' PUBLICATION my_publication;
Replace localhost
and 5432
with the actual host and port of your pg1
instance, and your_pg1_password
with the password for the postgres
user on pg1
.
Step 5: Test the Replication
Now, let's insert some data into the publisher and observe it on the subscriber.
On pg1
(in publisher_db
):
INSERT INTO products (name, price) VALUES ('Laptop', 1200.00); INSERT INTO products (name, price) VALUES ('Mouse', 25.50); UPDATE products SET price = 1150.00 WHERE name = 'Laptop'; DELETE FROM products WHERE name = 'Mouse';
On pg2
(in subscriber_db
):
SELECT * FROM products;
You should see the Laptop
entry with the updated price, and the Mouse
entry should be absent, demonstrating successful replication.
Application Scenarios
Logical Replication is incredibly versatile and can be applied in numerous scenarios:
- Change Data Capture (CDC): By monitoring the logical replication stream, external applications can react to database changes in real-time. This is fundamental for event-driven architectures, auditing, or stream processing.
- Cross-Database Data Synchronization: Maintaining consistent data across different databases for microservices, data marts, or geographically distributed applications.
- Data Warehousing and ETL: Feeding changes incrementally into data warehouses, reducing the need for bulk data loads.
- Zero-Downtime Upgrades: Replicating data from an older PostgreSQL version to a newer one, minimizing downtime during major version upgrades.
- Multi-Master (Active-Active) Replication (with care): While complex and usually requiring additional conflict resolution logic at the application layer, logical replication can form the backbone for custom multi-master setups.
- Selective Replication: Replicating only specific tables or schemas, allowing for finer-grained control over what data is synchronized, which is not possible with physical replication.
- Data Sharing with External Systems: With custom logical decoding plugins, data can be replicated to non-PostgreSQL databases or message queues, acting as a universal data change stream.
Conclusion
PostgreSQL Logical Replication offers a robust, efficient, and native solution for critical database operations like cross-database data synchronization and Change Data Capture. By leveraging the WAL and a publication/subscription model, it provides fine-grained control over data flow, enabling highly resilient and distributed data architectures. Its versatility makes it an indispensable tool for modern application development and data management, empowering developers to build scalable systems with consistent data across their entire ecosystem.