PostgreSQL partitioning, logical replication and other Q&A about PostgreSQL Superpowers
Welcome to the new week!
Last Tuesday, I presented a new version of the “PostgreSQL Superpowers” webinar, thanks to an invitation from the good folks at Particular. It’s available here:
Or check:
I’m happy with how it went; judging by the number of questions, I managed to pique people's curiosity about this topic!
Curious enough that I didn’t manage to cover all the questions during the webinar, which is why I decided to answer them today as part of my humble newsletter.
I encourage you to watch the recording, but to recap what I showed:
that PostgreSQL is not only a rock-solid Open Source database, but also has a rich plugin ecosystem, and how that impacts its success,
native PostgreSQL partitioning for efficient data storage management,
how to compose plugins like Timescale for time-series data and PostGIS for spatial data (GPS tracking, etc.),
in general, how plugins can cut the time needed to deliver essential features,
how to automate reporting with continuous aggregations without having to poll the database,
how to stream database changes in real time by using logical replication as a push-notification system,
how to detect issues as they happen, for example, spotting unusual fleet activity, without constant polling.
All of that is based on a simple but real-world app for tracking business trips and managing reporting and alerting around them.
Ok, let’s move on to the questions that I didn’t manage to cover during the webinar.
Questions & Answers
Partitioning
Michał: How many partitions is too much partitions? talking about weekly partitions for ~10 years?
If we follow the example from our webinar, the trips table looks like this:
CREATE TABLE trips (
    trip_time TIMESTAMPTZ NOT NULL,
    vehicle_id INT NOT NULL,
    driver_name VARCHAR(255) NOT NULL,
    start_location TEXT NOT NULL,
    end_location TEXT NOT NULL,
    distance_kilometers NUMERIC(10,2) NOT NULL,
    fuel_used_liters NUMERIC(10,2) NOT NULL,
    PRIMARY KEY (trip_time, vehicle_id)
) PARTITION BY RANGE (trip_time);
-- set up monthly partitions for the whole of 2023
DO $$
DECLARE
    month_start_date DATE := '2023-01-01';
    month_end_date DATE := '2024-01-01';
BEGIN
    WHILE month_start_date < month_end_date LOOP
        EXECUTE format('
            CREATE TABLE trips_%s PARTITION OF trips
            FOR VALUES FROM (%L) TO (%L);',
            TO_CHAR(month_start_date, 'YYYY_MM'),
            month_start_date,
            month_start_date + INTERVAL '1 month'
        );
        month_start_date := month_start_date + INTERVAL '1 month';
    END LOOP;
END$$;
Then, depending on the partition granularity, in 10 years we’d get:
monthly - 120 partitions (12 * 10),
weekly - 520 partitions (52 * 10),
daily - 3650 partitions (365 * 10).
Obviously, 520 and 3650 sound like potentially a lot and should raise our eyebrows. PostgreSQL can handle up to a few thousand partitions fairly well, provided that typical queries allow the query planner to prune all but a small number of partitions. But here's the catch: each partition requires its metadata to be loaded into the local memory of each session that touches it. See this nice research on how the number of partitions can impact performance: Partitioning in Postgres and the risk of high partition counts.
Nevertheless, we may notice that for our case, partitions fall into three groups:
the current partition - where new trips are added,
active period partitions - e.g., the last 3 months, which are interesting for us for alerting and reporting, or maybe for backfilling missing trips,
old partitions - those that we don’t touch at all, or only sporadically (e.g. older than 3 months).
This split is also essential from the perspective of another question:
Serge: Is there a notion of hot partition?
Yes, depending on our scenarios, some partitions may be used much more heavily than others. That’s also why partition planning is critical, and it should be based on the business scenario. For our case, we should definitely keep our focus on the current partition (the hot one, as it’s the most frequently used) and the active period.
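If you’re not sure which partitions are actually hot, the standard statistics views can tell you. Here’s a quick sketch (assuming the trips_ naming from our example):
-- Access counters and sizes per partition - the "hot" ones bubble up to the top
SELECT relname,
       coalesce(seq_scan, 0) + coalesce(idx_scan, 0) AS total_scans,
       n_tup_ins AS rows_inserted,
       pg_size_pretty(pg_total_relation_size(relid)) AS size
FROM pg_stat_user_tables
WHERE relname LIKE 'trips_%'
ORDER BY total_scans DESC;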
What about the rest? They can be moved to cheaper, slower storage. PostgreSQL allows this through setting up a so-called tablespace.
We can do it by calling a SQL statement:
-- Create cheap storage
CREATE TABLESPACE cold_storage LOCATION '/mnt/slow_disk';
-- Detach the partition
ALTER TABLE trips DETACH PARTITION trips_2025_01;
-- Set the cheaper disk for this table
ALTER TABLE trips_2025_01 SET TABLESPACE cold_storage;
We could also use the pg_partman extension to automate it:
-- Tell partman to keep only 3 months attached
UPDATE partman.part_config
SET retention = '3 months',
    retention_keep_table = true -- detach but don't drop
WHERE parent_table = 'public.trips';
-- Run maintenance (or let background worker do it)
SELECT partman.run_maintenance();
That also answers the next question:
Karol: How can we “move” older partitions to some less available/slower storage? Is available in postgres itself or some postgres cloud providers?
So it’s a native mechanism, and it’s also supported by the cloud RDS offerings (pg_partman is usually available there too).
Worth noting is that detaching means removing a partition from the parent table while keeping it as a standalone table: it continues to exist, but no longer has any ties to the table from which it was detached. Read more in the partitioning docs.
We can attach it again if we want, or just delete the content, or even drop the whole partition table if we no longer need to access it.
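Re-attaching is a single statement, for example:
-- Bring the archived partition back under the partitioned table
ALTER TABLE trips ATTACH PARTITION trips_2025_01
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');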
That’s also important for the next question:
Matteo: what about a time range query? like time >= x AND time < y?. will Postgres touch only the relevant partitions?
Yes, PostgreSQL uses partition pruning to automatically skip irrelevant partitions when you query with conditions on the partition key.
Here's how it works with our trips table when using the Explain tool:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM trips
WHERE trip_time >= '2025-06-15'
AND trip_time < '2025-07-15';
We’ll get something close to:
Append  (cost=0.00..241.00 rows=4800 width=64) (actual time=0.015..0.467 rows=4752 loops=1)
  Subplans Removed: 10  -- ← 10 partitions skipped
  ->  Seq Scan on trips_2025_06  (cost=0.00..48.20 rows=1200 width=64) (actual time=0.014..0.128 rows=1188 loops=1)
        Filter: ((trip_time >= '2025-06-15'::timestamp) AND (trip_time < '2025-07-15'::timestamp))
  ->  Seq Scan on trips_2025_07  (cost=0.00..48.20 rows=1200 width=64) (actual time=0.012..0.115 rows=1176 loops=1)
        Filter: ((trip_time >= '2025-06-15'::timestamp) AND (trip_time < '2025-07-15'::timestamp))
PostgreSQL will only query (scan) partition tables trips_2025_06 and trips_2025_07 because our date range spans mid-June to mid-July. The other 10 monthly partitions were pruned away. Pruned means skipped.
Two types of pruning occur:
planning-time pruning - when query values are constants (like above),
execution-time pruning - with prepared statements or subqueries, where values aren't known until runtime (see the sketch below).
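To see execution-time pruning in action, here’s a minimal sketch with a prepared statement (trips_in_range is just an example name):
-- Force a generic plan so the parameter values aren't known at planning time
-- (by default, PostgreSQL builds custom plans for the first five executions)
SET plan_cache_mode = force_generic_plan;

PREPARE trips_in_range (timestamptz, timestamptz) AS
    SELECT * FROM trips WHERE trip_time >= $1 AND trip_time < $2;

EXPLAIN (ANALYZE)
EXECUTE trips_in_range ('2025-06-15', '2025-07-15');
-- The plan now shows a "Subplans Removed" line for partitions
-- pruned at executor startup rather than at planning time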
This is why partitioning by time works well for time-series data: PostgreSQL only touches partitions that overlap with your date range. Still, the recommendation is to try to avoid queries that span many partitions. We should make our partitions as autonomous as possible, as querying across partitions is costly (it may even mean accessing different disks, etc.).
That’s also how we’re coming to the next two questions:
Barry: What happens when you run this on a table with data already in it? Will it automatically split the data into the relevant partition?
And:
Nikita: Please share your experience with rebalancing of partitions.
PostgreSQL doesn't have automatic partition rebalancing like distributed databases. Once data lands in a partition, it stays there unless you manually move it. This becomes a problem when your partitioning strategy no longer fits your data patterns.
The most common issue? You chose the wrong partition size. Maybe you started with yearly partitions, but now queries are slow because each partition has too much data. You need to split them, e.g.:
-- Can't just alter boundaries - must detach and recreate
ALTER TABLE trips DETACH PARTITION trips_2025;
-- Create monthly partitions instead
CREATE TABLE trips_2025_01 PARTITION OF trips
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
-- ... create all 12 months
-- Move data from the old yearly partition
INSERT INTO trips SELECT * FROM trips_2025;
-- Drop the old yearly partition
DROP TABLE trips_2025;
Another case is when you have data in the wrong partitions. Since PostgreSQL 11, updating the partition key through the parent table moves the row to the correct partition (internally as a delete plus an insert), but if you’re working directly on a partition table, or want to do the move explicitly, you can insert the record into the new partition with the new key and delete it from the old one. You can do it as follows:
WITH moved AS (
    DELETE FROM trips_2024_12
    -- condition to find record(s) you want to move
    WHERE trip_time = '2024-12-31' AND vehicle_id = 123
    RETURNING
        vehicle_id,
        driver_name,
        start_location,
        end_location,
        distance_kilometers,
        fuel_used_liters
)
INSERT INTO trips (trip_time, vehicle_id, driver_name, start_location,
                   end_location, distance_kilometers, fuel_used_liters)
SELECT '2025-01-01', 123, driver_name, start_location,
       end_location, distance_kilometers, fuel_used_liters
FROM moved;
In the same way, if you have a table that’s not yet partitioned but already has data, PostgreSQL won’t let you convert it in place. You have two options: either manually create a new partitioned table and migrate the data first, or use pg_partman, which can partition existing tables:
SELECT partman.create_parent('public.trips', 'trip_time', 'native', 'monthly');
CALL partman.partition_data_proc('public.trips');
So, again, the PostgreSQL extension ecosystem FTW!
And final question on partitioning:
Dillan: Does ef core support partitions
Not at the moment; you’ll need to set up partitions on your own through a custom migration, and AFAIK it’s the same story for other popular ORMs. Still, that’s not a big issue, as once you’ve done the setup, you can query a partitioned table like any other table.
Logical Replication
I wrote in more detail about PostgreSQL logical replication and the Write-Ahead Log in:
Check also a great article from Gunnar Morling, an authority in this space:
They should answer some of the questions. Let’s do a super quick TLDR of the key concepts.
The Write-Ahead Log (WAL) is a structure on disk to which all statements representing changes (INSERT, UPDATE, DELETE) are appended before they’re applied to tables upon transaction commit. Most databases clean those entries soon after they’re committed; PostgreSQL allows us to keep them for as long as we tell it to.
This enables Logical replication, which streams database changes from a publisher to subscribers using the Write-Ahead Log. Unlike physical replication that copies everything byte-by-byte, logical replication sends actual SQL-like changes (thus logical, as they represent the logical change, not serialised bytes of data).
Key components:
Publication: Defines which tables to replicate. You set it up by calling:
CREATE PUBLICATION outbox_pub FOR TABLE outbox;
Replication Slot: Tracks the subscriber's progress in the WAL and ensures no data loss (pg_create_logical_replication_slot),
Subscriber: Connects to the slot, receives changes, and sends acknowledgements.
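To make that concrete, here’s a minimal SQL-level sketch (the slot name is just an example) using the built-in test_decoding output plugin; it requires wal_level = logical:
-- Create a logical replication slot
SELECT pg_create_logical_replication_slot('trips_slot', 'test_decoding');

-- Peek at pending changes without consuming them
SELECT * FROM pg_logical_slot_peek_changes('trips_slot', NULL, NULL);

-- Consume the changes and advance the slot's position
SELECT * FROM pg_logical_slot_get_changes('trips_slot', NULL, NULL);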
So, answering Matteo’s question:
Matteo: How does logical replication relate to Change Data Capture?
Logical replication IS PostgreSQL's native CDC. It captures changes from WAL and streams them out, the same as Debezium or other CDC tools.
The difference is scope. Debezium uses logical replication internally, providing a higher abstraction level and features such as connectors to Kafka. It handles batching, retries, and monitoring. With native logical replication, you build that plumbing yourself.
Muhammad: I assume WAL notifications are ordered?
Yes, strictly ordered by transaction commit order. WAL is append-only. PostgreSQL tracks subscriber progress through replication slots using LSN (Log Sequence Number - read more here).
Ordering is per replication slot.
Muhammad: What happens when there are multiple subscribers to WAL notifications?
Each subscriber typically has its own replication slot (a slot can only be consumed by one active connection at a time), and different subscribers process at different speeds, so "ordered" messages might arrive out of order at different downstream systems. It’s similar to RabbitMQ guarantees for multiple consumers of the same queue.
If you need global ordering across systems, you either need a single subscriber (and a single replication slot) or additional coordination.
Read also more in The Order of Things: Why You Can't Have Both Speed and Ordering in Distributed Systems.
Muhammad: Do we get publisher confirms with this Postgres Publication
Not in the traditional message queue sense. The subscriber sends acknowledgements after processing batches of changes, not individual messages - that's why we call SendStatusUpdate() in the code example.
PostgreSQL gives you position-based tracking. If your subscriber crashes after processing message 100 but before sending the status update, it'll reprocess from maybe message 80. Your consumer must be idempotent.
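One simple way to get that idempotency is to deduplicate on the change’s LSN on the consumer side. A sketch (the processed_messages table is a hypothetical name for illustration):
-- Hypothetical deduplication table keyed by the change's LSN
CREATE TABLE IF NOT EXISTS processed_messages (lsn pg_lsn PRIMARY KEY);

-- The insert is a no-op if we've already handled this change
INSERT INTO processed_messages (lsn)
VALUES ('0/16B3748')
ON CONFLICT (lsn) DO NOTHING;
-- 0 rows inserted means "already processed" - skip the side effects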
Muhammad: Does WAL get auto flushed if there is no subscriber confirms?
No. WAL segments stick around until the subscriber confirms them. Replication slots prevent the removal of required WAL even when there is no active connection. But that also brings a cost:
Nikita: Does WAL grow indefinitely when a subscriber goes down? Can we limit the growth?
Yes, indefinitely by default. To understand why, you need to know how WAL cleanup works.
PostgreSQL continuously writes changes to WAL files in the pg_wal directory. Each file is 16MB by default. Without replication slots, PostgreSQL deletes old WAL files after:
They've been checkpointed (data written to main tables)
They've been archived (if archiving is enabled)
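A quick way to check how much WAL is currently sitting on disk (it needs superuser or the pg_monitor role):
-- Total size of the files in the pg_wal directory
SELECT pg_size_pretty(sum(size)) AS wal_size
FROM pg_ls_waldir();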
Let’s compare that to how messaging systems deal with retention.
In Kafka, you set retention policies like "keep data for 7 days" or "keep 100GB". Data gets deleted after this period, regardless of whether consumers read it. Late consumers simply miss old data.
PostgreSQL works differently. WAL retention is controlled by replication slots, not publications. A publication defines which tables to replicate - it has no retention policy. The slot tracks the consumer's position and prevents WAL deletion until consumed.
With slots, PostgreSQL also checks every slot position before deletion. One slow slot blocks WAL cleanup for everyone.
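You can check which slot is holding WAL back with a query like this (a sketch over the standard system view):
-- How much WAL each replication slot is still retaining
SELECT slot_name,
       active,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;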
If you want Kafka-like behaviour where data is available for future subscribers, you need WAL archiving:
-- Archive WAL to long-term storage
archive_mode = on
archive_command = 'cp %p /mnt/wal_archive/%f'
This copies WAL files before they’re deleted. Future subscribers can replay from the archive, not the active WAL. But logical replication doesn't automatically use archives - you'd need to restore them yourself, e.g. via point-in-time recovery (restore_command) or backup tooling that manages WAL archives.
You can also set the maximum size of data to keep for the slot:
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
This works like a circuit breaker - slots that retain too much WAL get invalidated rather than filling your disk. But unlike Kafka, where late consumers just miss data, an invalidated PostgreSQL slot is permanently broken: it has to be dropped and recreated, and the subscriber loses its position.
The fundamental difference: Kafka decouples retention from consumption. PostgreSQL couples them tightly - data stays until consumed or the slot breaks. You can't have "keep WAL for 7 days for future subscribers" without archiving.
So again, just like with partitioning, archiving strategy is the key!
Trivia
What are core key features not present in SQL Server?
Lots to mention! Jokes aside, MSSQL is a decent database, but from my perspective, Microsoft is lagging behind and underfunding it. Even at their own conferences, they announce more news around PostgreSQL than MSSQL. The biggest difference is in the ecosystem and extensions being first-class citizens.
Still, here’s a quick list from the top of my head:
Extension system - The ability to add functionality without touching core code. That's how we get pg_partman, PostGIS, TimescaleDB. SQL Server has nothing comparable.
Logical replication with custom output plugins - SQL Server has CDC, but for traditional replication, you can't customise or plug into it as described for PostgreSQL.
Partitioning in MSSQL was Enterprise-only for a long time (it became available in all editions with SQL Server 2016 SP1), and IMHO it’s more complicated than in PostgreSQL.
Foreign Data Wrappers - Query external databases, CSV files, APIs as if they were local tables. SQL Server has linked servers but they're not as flexible.
Custom types and operators - Define your own data types and how they behave. SQL Server's user-defined types are far more limited.
JSONB with GIN indexing - While SQL Server has JSON support, PostgreSQL's JSONB with indexing is more mature and performant.
Tablespaces - Place specific tables/partitions on different disks. SQL Server uses filegroups, but PostgreSQL's approach is simpler for partitioning.
Arrays as first-class citizens - you can store and index arrays directly.
Rich ecosystem of hosting services - besides the cloud RDSes, there's a variety of bare-metal and VPS options, providers like Supabase and Neon, and cloud-native databases compatible with PostgreSQL syntax, such as AWS Aurora, Google Spanner (via its PostgreSQL interface), CockroachDB and YugabyteDB.
The PostgreSQL Docker image is lighter and much easier to use for running integration tests.
Peter: What misuse of PostgreSQL (feature or in general) has turned out to be useful?
I’d say that:
Using PostgreSQL as a document database - I wrote about it in PostgreSQL JSONB - Powerful Storage for Semi-Structured Data. What seemed to be a wild idea worked really well. For years, I was co-maintaining the .NET tool Marten, which allows using PostgreSQL as a document database; now I’m also building Pongo - a MongoDB-compatible alternative in Node.js. Tools like FerretDB took the same approach, and even cloud providers did, e.g. AWS with DocumentDB and Microsoft with CosmosDB.
Using PostgreSQL as a queue - that’s not something I’d recommend as the safe default, but if a throughput of a few thousand messages per second is fine for you, then why not? That’s what Particular did with their NServiceBus transport, and there are popular tools like Supabase Queues and pg-boss. Beware of the “just use PostgreSQL for everything” advice - “just use X” is dangerous advice in general - but it can be a good choice.
Using PostgreSQL as an event store - that’s what I did in Marten, what I do in Emmett, and what other tools are doing too. Don’t write your own; there are now tools in all popular environments that’ll do it for you.
Full-Text Search Engine - Using PostgreSQL's full-text search capabilities instead of Elasticsearch or Solr might not always be the best idea, but for many use cases, PostgreSQL's tsvector and tsquery are more than sufficient and can eliminate operational complexity if you’re already using PostgreSQL and don’t have sophisticated needs.
Peter: What is a common thing you see as a consultant that many teams+orgs get wrong with postgres?
I think the issues are similar to those in other environments:
not using EXPLAIN and not tuning queries,
not using prepared statements for commonly used queries,
missing indexes or over-indexing,
not using optimistic concurrency (see the sketch after this list),
not thinking about transaction isolation and causing deadlocks.
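As an example of the optimistic concurrency point, the classic version-column pattern is usually enough (the table and column names below are hypothetical):
-- Optimistic concurrency with a version column
UPDATE documents
   SET body = '{"status": "approved"}',
       version = version + 1
 WHERE id = 42
   AND version = 7;
-- 0 rows updated means someone modified the record in the meantime:
-- reload and retry instead of silently overwriting their change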
From the issues that are special for PostgreSQL:
not knowing how PostgreSQL sequences work and losing messages when using the outbox pattern or PostgreSQL as a message queue (sequence values aren’t gapless and can become visible out of commit order, so naive polling by id can skip records),
Not understanding VACUUM and autovacuum: PostgreSQL's MVCC model means dead tuples accumulate and need cleanup. Many teams either disable autovacuum (terrible idea) or don't tune it properly for their workload. This leads to table bloat and degraded query performance. Check more about it in:
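A quick sketch to spot bloat candidates before they bite:
-- Tables with the most dead tuples and when autovacuum last ran on them
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;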
Woof, that was a lot! I hope this was useful and I didn’t mess anything up! How would you answer those questions? Or maybe you have more of them?
Cheers!
Oskar
p.s. Ukraine is still under brutal Russian invasion. A lot of Ukrainian people are hurt, without shelter and need help. You can help in various ways, for instance, directly helping refugees, spreading awareness, and putting pressure on your local government or companies. You can also support Ukraine by donating e.g. to Red Cross, Ukraine humanitarian organisation or donate Ambulances for Ukraine.