The Order of Things: Why You Can't Have Both Speed and Ordering in Distributed Systems
Picture this scenario: An online retailer runs a flash sale with 500 units of a hot product. Within minutes, they've processed 742 orders. Their inventory shows -242 units. Customer service is about to have a very bad day.
This isn't a story about bad code. The engineers did everything right: database transactions, inventory checks, proper error handling. They discovered that in distributed systems, you must choose between global ordering and performance.
When you have multiple servers processing requests simultaneously, "check inventory then charge customer" becomes a race condition. The milliseconds between "check" and "charge" are windows where other orders slip through.
You might think stronger locks could fix this. They could, but at the cost of destroying your system's performance and/or its availability. Let's discuss why.
PostgreSQL: When Correctness Is King
Let's start with PostgreSQL, the database that prioritises correctness above all else. Consider this seemingly bulletproof code that could be generated by the ORM used in your application:
BEGIN;
SELECT quantity FROM inventory WHERE product_id = 'console-1';
-- ⌛ Application checks if quantity > 0
INSERT INTO orders (user_id, product_id, amount) VALUES (123, 'console-1', 599.99);
UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 'console-1';
COMMIT;
With one server, this works perfectly. With ten servers hitting the same database? You've got a problem.
The issue is that BEGIN doesn't mean "stop the world while I work." It means "group these operations together." Between your SELECT and UPDATE, nine other transactions can read the same inventory count. All ten transactions see "5 units available." All ten proceed with the sale.
The Locking Solution
PostgreSQL offers a solution: SELECT FOR UPDATE. This actually locks the row:
BEGIN;
SELECT quantity FROM inventory WHERE product_id = 'console-1' FOR UPDATE;
-- ⛔ Now other transactions must wait
INSERT INTO orders (user_id, product_id, amount) VALUES (123, 'console-1', 599.99);
UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 'console-1';
COMMIT;
But what actually happens inside PostgreSQL when you use FOR UPDATE? Let's look at the mechanics.
When Transaction A executes SELECT FOR UPDATE, PostgreSQL:
Acquires a RowExclusiveLock on the row with product_id = 'console-1'
Records this lock in the lock manager
Other transactions attempting to read with FOR UPDATE or write must wait
You can see this in action:
-- ✅ Session 1
BEGIN;
SELECT * FROM inventory WHERE product_id = 'console-1' FOR UPDATE;
-- ⛔ Session 2 (in another connection)
SELECT pid, relation::regclass, mode, granted
FROM pg_locks
WHERE relation = 'inventory'::regclass;
-- Output shows something close to:
-- pid | relation | mode | granted
-- 1234 | inventory | RowExclusiveLock | t
-- 5678 | inventory | RowExclusiveLock | f -- waiting!
The waiting transaction appears in pg_stat_activity:
SELECT pid, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event IS NOT NULL;
-- Shows: wait_event_type = 'Lock', wait_event = 'tuple'
This creates a queue. If 100 concurrent users attempt to buy the same product, 99 wait while one is processed. The 100th user might wait several seconds—an eternity in web applications.
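You can watch that queue form with pg_blocking_pids (available since PostgreSQL 9.6), which shows exactly which session each waiting backend is stuck behind:
-- Who is waiting, and which session is blocking them?
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       state,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;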
The performance impact is severe. During high traffic, you might process 50 orders per second instead of 1,000 per second. Every popular product becomes a serialisation point for your entire system.
For more details on PostgreSQL locking, see the official documentation on explicit locking.
The Sequence Problem
As I explored in my article about PostgreSQL sequences and their quirks, even the basic building blocks of ordering have surprises. PostgreSQL sequences—used for generating IDs—can have gaps even when all transactions succeed. Why? Because sequences grab their values before knowing if the transaction will commit.
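The mechanism is easy to see with a rollback; a minimal sketch, assuming a hypothetical outbox table:
CREATE TABLE outbox (id bigserial PRIMARY KEY, payload text);
BEGIN;
INSERT INTO outbox (payload) VALUES ('first');   -- nextval() hands out id = 1
ROLLBACK;                                        -- the transaction is gone, but id 1 is not given back
INSERT INTO outbox (payload) VALUES ('second');  -- gets id = 2, leaving a permanent gap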
Consider this race:
Transaction A starts at 10:00:00.000, gets sequence value 1
Transaction B starts at 10:00:00.001, gets sequence value 2
Transaction B commits at 10:00:00.100
Transaction A commits at 10:00:00.200
Your records appear "out of order" relative to their IDs. Record 2 exists before record 1 in the commit order. This becomes critical when you're using sequences for ordering in patterns like the Outbox pattern.
The problem runs deeper. PostgreSQL allocates transaction IDs (XIDs) from a global counter. When combined with MVCC (Multi-Version Concurrency Control), this creates interesting effects:
-- Transaction XIDs determine visibility
SELECT xmin, xmax, * FROM inventory WHERE product_id = 'console-1';
-- xmin: transaction that created this row version
-- xmax: transaction that deleted/updated this row version
A row is visible in your transaction if:
Its xmin is committed and < your transaction's XID
Its xmax is either null or > your transaction's XID
This is why PostgreSQL can show you different data than another concurrent transaction—each sees a different "snapshot" of the database based on transaction ordering. Even PostgreSQL makes trade-offs between strict ordering and performance.
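A quick two-session sketch makes the snapshot behaviour visible (REPEATABLE READ keeps the snapshot for the whole transaction):
-- Session 1
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT quantity FROM inventory WHERE product_id = 'console-1';  -- say it returns 5

-- Session 2 (separate connection)
UPDATE inventory SET quantity = 0 WHERE product_id = 'console-1';

-- Session 1 again: still sees 5, because its snapshot predates Session 2's commit
SELECT quantity FROM inventory WHERE product_id = 'console-1';
COMMIT;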
The Replication Lag
The problems compound with replication. Your primary database knows the correct order. But replicas apply changes asynchronously. A customer might:
Place an order (writes to the primary instance)
Check their order history (reads from replica)
See nothing—the replica hasn't caught up
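You can check how far behind a replica actually is (PostgreSQL 10+ views and functions):
-- On the primary: how far has each replica replayed?
SELECT client_addr, state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- On a replica: how old is the last replayed transaction?
SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay;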
You can force reads from the primary, but now you've eliminated the performance benefit of having replicas.
MongoDB: The Speed Trap
MongoDB follows a different philosophy: speed first, consistency when you ask for it.
Here's our inventory check in MongoDB:
const product = await db.products.findOne({ _id: 'console-1' });
if (product.quantity > 0) {
await db.orders.insertOne({ userId: 123, productId: 'console-1', amount: 599.99 });
await db.products.updateOne(
{ _id: 'console-1' },
{ $inc: { quantity: -1 } }
);
}
This has the same race condition as our PostgreSQL example, but MongoDB makes it worse through its replica set architecture.
How MongoDB Replication Actually Works
MongoDB uses replica sets where one primary accepts writes and secondaries asynchronously replicate from the primary's oplog (operations log). As I explained in the Write-Ahead Log article, the oplog is MongoDB's version of a write-ahead log:
// Actual oplog entry structure
{
"ts": Timestamp(1626900000, 1), // Operation timestamp
"t": NumberLong(10), // Term (for elections)
"h": NumberLong(0), // Hash
"v": 2, // Version
"op": "u", // Operation type (update)
"ns": "shop.products", // Namespace
"ui": UUID("..."), // Collection UUID
"o": { "$v": 1, "$set": { "quantity": 4 } }, // Operation
"o2": { "_id": "console-1" } // Query
}
Depending on the configured read preference, MongoDB reads might hit different replica set members. The primary might have a quantity of 5, while the secondary still shows a quantity of 10. Your application checks inventory on the secondary (seeing 10), while another server updates the primary (setting it to 0).
You can see the replication lag yourself:
db.adminCommand({ replSetGetStatus: 1 }).members.forEach(member => {
console.log(`${member.name}: ${member.optime.ts.getTime()}`);
});
// Primary: Timestamp(1626900000, 1)
// Secondary1: Timestamp(1626899995, 1) -- 5 seconds behind!
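If a read has to reflect the customer's own write, you can route it to the primary explicitly; a sketch, assuming the Node.js driver:
// Read-your-own-writes: send this query to the primary and require majority-committed data
const product = await db.collection('products').findOne(
  { _id: 'console-1' },
  { readPreference: 'primary', readConcern: { level: 'majority' } }
);
Just like forcing reads from the PostgreSQL primary, this hands back the read-scaling benefit you added secondaries for.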
For deep details on MongoDB replication mechanics, see the MongoDB Architecture Guide.
MongoDB's Atomic Operations
MongoDB offers findOneAndUpdate
for atomic operations:
const result = await db.products.findOneAndUpdate(
{ _id: 'console-1', quantity: { $gt: 0 } },
{ $inc: { quantity: -1 } },
{ returnDocument: 'after' }
);
if (result.value) {
// Inventory decreased successfully, proceed with order
await db.orders.insertOne({ userId: 123, productId: 'console-1', amount: 599.99 });
}
This is atomic for the inventory update. But now you have a new problem: what if the order insert fails? You've decremented inventory but have no order record.
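One common workaround is to compensate manually when the follow-up insert fails, but the compensation itself can fail too; a sketch reusing the calls above:
if (result.value) {
  try {
    await db.orders.insertOne({ userId: 123, productId: 'console-1', amount: 599.99 });
  } catch (error) {
    // Best-effort compensation: give the unit back (this call can also fail, leaving inventory wrong)
    await db.products.updateOne({ _id: 'console-1' }, { $inc: { quantity: 1 } });
    throw error;
  }
}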
The Session Solution
MongoDB 4.0 added multi-document transactions:
const session = await client.startSession();
await session.withTransaction(async () => {
const product = await db.products.findOne(
{ _id: 'console-1' },
{ session }
);
if (product.quantity > 0) {
await db.products.updateOne(
{ _id: 'console-1' },
{ $inc: { quantity: -1 } },
{ session }
);
await db.orders.insertOne(
{ userId: 123, productId: 'console-1', amount: 599.99 },
{ session }
);
}
});
You've just made MongoDB behave like PostgreSQL, inheriting PostgreSQL's performance characteristics. MongoDB's own documentation warns that transactions impact performance—in benchmarks, throughput drops by 50-60% with transactions enabled.
Why? Because MongoDB must now coordinate:
Acquire locks across multiple documents
Ensure all replica set members can see a consistent snapshot
Maintain undo logs for potential rollback
Block concurrent operations on the same documents
The MongoDB transactions documentation explicitly states: "In most cases, multi-document transaction incurs a greater performance cost over single document writes."
The Revision Pattern
In my article about handling projections in eventually consistent stores, I explored another challenge: eventual consistency in reads. When you write to MongoDB and immediately read, you might get stale data or even null results.
The solution involves tracking revisions and retrying until you see the expected state:
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryIfNotFound(find, options = { maxRetries: 5, delay: 100 }) {
  let retryCount = 0;
  do {
    const result = await find();
    if (result !== null) return result;
    if (retryCount === options.maxRetries) {
      throw new Error('Document not found after max retries');
    }
    // Exponential backoff: delay, 2 * delay, 4 * delay, ...
    const sleepTime = Math.pow(2, retryCount) * options.delay;
    await sleep(sleepTime);
    retryCount++;
  } while (true);
}
Usage with revision tracking:
const { productItems, revision } = await retryIfNotFound(() =>
shoppingCarts.findOne(
{
shoppingCartId: event.data.shoppingCartId,
revision: { $gte: expectedRevision },
},
{ projection: { productItems: 1, revision: 1 } }
)
);
This pattern, retrying with exponential backoff until you see the version you expect, works. Normally one call is enough, but sometimes it takes multiple retries to get the expected state.
Such an approach mitigates the issue, but it adds complexity and may increase latency. You're fighting MongoDB's natural behaviour instead of working with it. Each retry might take hundreds of milliseconds, turning a simple read into a multi-second operation under load.
Kafka: Distributed by Design
Kafka takes yet another approach: it embraces distribution but partitions the problem.
As I explained in The Write-Ahead Log article, Kafka is built entirely around the concept of logs. Topics provide logical grouping, while partitions are the physical append-only logs. Each partition guarantees ordering, but there's no global order across partitions.
In Kafka, you might model our inventory system as events. The order service publishes an event:
producer.send({
topic: 'orders',
key: productId, // Ensures all events for a product go to same partition
value: {
type: 'OrderPlaced',
userId: 123,
productId: 'console-1',
amount: 599.99
}
});
The inventory service consumes the event and updates the stock:
consumer.on('message', async (message) => {
if (message.value.type === 'OrderPlaced') {
const inventory = await getInventory(message.value.productId);
if (inventory > 0) {
await decrementInventory(message.value.productId);
await publishInventoryReserved(message.value);
} else {
await publishOrderRejected(message.value);
}
}
});
By using the product ID as the partition key, Kafka guarantees all events for a product are processed in order. But this only works within a single partition.
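Under the hood, the default partitioner hashes the key and takes it modulo the partition count. Kafka uses murmur2 for this; the toy hash below is only to illustrate the routing idea:
// Same key => same hash => same partition => per-product ordering
function partitionFor(key, numPartitions) {
  let hash = 0;
  for (const ch of key) {
    hash = (hash * 31 + ch.charCodeAt(0)) | 0;  // not Kafka's real hash, just deterministic
  }
  return Math.abs(hash) % numPartitions;
}

partitionFor('console-1', 10); // always the same partition for 'console-1'
partitionFor('console-2', 10); // possibly a different one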
The Distributed Transaction Problem
What happens when a customer orders two different products in one checkout?
Order contains: Product A and Product B
Event 1: "Reserve inventory for A" → partition 1 (based on hash of "product-A")
Event 2: "Reserve inventory for B" → partition 7 (based on hash of "product-B")
Event 3: "Process payment for order" → partition 3 (based on hash of "order-123")
These events may be processed in any order across partitions. To understand why, look at how Kafka consumer groups work:
// Consumer group with 10 instances, 10 partitions
// Each consumer gets 1 partition
Consumer 1: Reading partition 0 at offset 1000
Consumer 2: Reading partition 1 at offset 5000
Consumer 3: Reading partition 2 at offset 2000
// ... and so on
Each consumer processes its partition independently at its own pace. Partition 1 might be at offset 5000 while partition 7 is still at offset 100. Your payment might process before inventory is even checked.
Kafka preserves that per-partition ordering across replicas through its ISR (In-Sync Replica) mechanism:
# Check partition details
kafka-topics --describe --topic orders --bootstrap-server localhost:9092
# Output shows:
# Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
# Partition: 1 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
The ISR ensures all in-sync replicas have the same messages in the same order. But this is per partition—there's no coordination between partitions.
For more details, see Kafka's design documentation.
Even when events land on the same partition, there is always latency between publishing a message and handling it. Data may not appear immediately in the downstream database, which can cause discrepancies. Kafka also doesn't support optimistic concurrency.
That's why Kafka should not be treated as a database (nor as an event store).
The Saga Pattern
When developers reach these ordering limits, they often turn to the Saga pattern. As I explained in Saga and Process Manager - distributed processes in practice, Saga doesn't solve the ordering problem—it accepts that you can't have distributed transactions and builds compensations around that fact.
The typical solution models the process as a series of compensating transactions:
Reserve Product A (can be undone by releasing the reservation)
Reserve Product B (can be undone by releasing the reservation)
Process Payment (can be undone by refund)
Confirm Reservations (convert to permanent inventory decrease)
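A minimal orchestration sketch of those steps (all function names here are hypothetical):
async function placeOrderSaga(order) {
  const compensations = [];
  try {
    await reserveProduct(order.productA);
    compensations.push(() => releaseReservation(order.productA));

    await reserveProduct(order.productB);
    compensations.push(() => releaseReservation(order.productB));

    await processPayment(order);
    compensations.push(() => refundPayment(order));

    await confirmReservations(order);
  } catch (error) {
    // Undo whatever already succeeded, in reverse order
    for (const compensate of compensations.reverse()) {
      await compensate();
    }
    throw error;
  }
}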
But this creates new problems. You're now managing distributed state machines across multiple services. Each service maintains its own state, for instance:
InventoryService: { "product-A": "reserved", "product-B": "available" }
PaymentService: { "order-123": "pending" }
OrderService: { "order-123": "awaiting-payment" }
States can become inconsistent between services. What if payment succeeds but the inventory times out? What if a network partition happens during compensation?
Your simple inventory check has become a complex choreography of events, timeouts, and compensations. You haven't eliminated the distributed transaction problem - you've just made it explicit in your application code. That doesn't have to be bad, but introducing it for simple cases adds latency. Each step requires network calls, persistence, and coordination. You've traded database locks for distributed complexity.
Kafka's Performance Trade-offs
Even within single-partition ordering, Kafka makes trade-offs:
Without built-in idempotence on the producer, producing is faster, but retries can create duplicates:
const producer = new Kafka.Producer({
'client.id': 'inventory-service',
'metadata.broker.list': 'localhost:9092'
});
With an idempotent producer, we won't get duplicates, but it'll be slower:
// 20% performance hit
const producer = new Kafka.Producer({
'client.id': 'inventory-service',
'metadata.broker.list': 'localhost:9092',
'enable.idempotence': true, // Prevents duplicates
'max.in.flight.requests.per.connection': 5 // Must be ≤ 5
});
And it will be even slower with Kafka “transactions”:
const producer = new Kafka.Producer({
'client.id': 'inventory-service',
'metadata.broker.list': 'localhost:9092',
'transactional.id': 'inventory-tx-1' // Enables transactions
});
await producer.initTransactions();
await producer.beginTransaction();
await producer.produce({ topic: 'orders', messages: [...] });
await producer.commitTransaction();
Why such performance penalties? With transactions, Kafka must:
Coordinate with a transaction coordinator
Write markers to the log
Ensure all partitions involved commit atomically
Maintain transaction state across failures
And remember—this ordering only works within partitions. For global ordering, you'd need a single partition, which means single-threaded, sequential processing, one message after another. Your distributed system just became a very expensive single-threaded application. At that point, you might as well use PostgreSQL.
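For completeness, that "global order" setup is just a one-partition topic, created with the same CLI as before:
# One partition: global ordering, one consumer at a time, no parallelism
kafka-topics --create --topic orders-global --partitions 1 --replication-factor 3 \
  --bootstrap-server localhost:9092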
The truth we try to hide from ourselves
After exploring these three systems, a pattern emerges. Each makes different trade-offs, but they're all making trade-offs:
PostgreSQL chose correctness. You pay with lock contention and reduced throughput.
MongoDB chose speed. You pay with application complexity and eventual consistency.
Kafka chose scalability. You pay with architectural complexity and partial ordering.
There's no magic configuration or hidden feature that gives you everything. The trade-offs are fundamental.
Why Is Ordering So Expensive?
In a single-threaded application, ordering is free. Operations happen one after another. Period.
In a distributed system, establishing "after" requires coordination. Consider what happens at the network level:
Server A must send a message: "I did X" (minimum 1ms in the same datacenter)
Server B must acknowledge: "I heard you" (another 1ms)
Only then can Server B safely do Y "after" X
But it's worse with multiple participants:
3 servers need 3 message exchanges (A→B, B→C, C→A)
10 servers need 45 message exchanges (n*(n-1)/2)
100 servers need 4,950 message exchanges
Each message exchange is at least:
~0.5ms in the same datacenter
~50ms across data centres in the same region
~150ms across continents
For comparison:
Reading from CPU cache: 0.0000001ms
Reading from RAM: 0.0001ms
Reading from SSD: 0.01ms
Network coordination is 5,000 to 1,500,000 times slower than local operations. This is physics, not poor implementation.
The Two-Phase Commit Trap
You might think: "Use two-phase commit! Coordinate across databases!"
Two-phase commit (2PC) does guarantee consistency across systems. Here's the protocol:
Phase 1 - Voting:
Coordinator → Participant A: "PREPARE transaction-123"
Coordinator → Participant B: "PREPARE transaction-123"
Participant A → Coordinator: "READY" (locks resources)
Participant B → Coordinator: "READY" (locks resources)
Phase 2 - Completion:
Coordinator → Participant A: "COMMIT transaction-123"
Coordinator → Participant B: "COMMIT transaction-123"
Participant A → Coordinator: "DONE" (releases locks)
Participant B → Coordinator: "DONE" (releases locks)
The killer problem: those locks are held during network round-trip times. With just 2 participants:
2 round trips minimum
At 1ms per message: 4ms holding locks
At 50ms per message: 200ms holding locks
But the real problem is failure handling. If the coordinator crashes after Phase 1:
Participants have locked resources
They don't know whether to commit or abort
They must keep the locks until the coordinator recovers
This could be forever
This is called the "blocking problem" of 2PC. Your entire system stops because one component fails. Modern systems avoid 2PC for this reason. Even with perfect networks, the performance cost is prohibitive.
Google Spanner uses atomic clocks and GPS synchronisation to reduce coordination needs, and it still has uncertainty windows of about 7ms. Amazon's Dynamo used vector clocks and still required application-level conflict resolution. If companies with unlimited resources can't eliminate these trade-offs, the trade-offs are fundamental to distributed systems.
Making Peace with Physics
So what do you do? You can't beat physics, but you can work with it.
Pattern 1: Reduce Coordination Scope
Instead of coordinating everything, coordinate only what matters. In our inventory example, don’t coordinate this:
await checkUserAuthentication();
await validateShippingAddress();
await calculateTaxes();
But focus on the critical business check:
await atomicallyDecrementInventory();
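Under the hood, that hypothetical atomicallyDecrementInventory can be a single atomic statement, so no lock is held across an application round trip; a sketch in PostgreSQL:
UPDATE inventory
SET quantity = quantity - 1
WHERE product_id = 'console-1' AND quantity > 0
RETURNING quantity;
-- 1 row returned: the unit is reserved; 0 rows: sold out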
Most operations don't need global ordering. Identify the few that do and design around them.
Pattern 2: Business-Level Solutions
Sometimes the best solution isn't technical. Amazon shows "Only 3 left in stock!" not just for marketing—it's also setting expectations. If they occasionally oversell by one unit, customers understand.
Other business solutions:
Time-bound reservations: "Item held for 10 minutes"
Inventory buffers: Keep 5% hidden reserve
Graceful degradation: "High demand! Order confirmed pending inventory check"
Automatic reordering: if stock hits zero, reorder the items automatically and ask the user whether they want to wait or get their money back.
Pattern 3: Choose Your Guarantees
Not all operations need the same guarantees:
Financial transactions: Use a database with proper locking (e.g. PostgreSQL). Speed matters less than correctness.
Product browsing: You can use a database with eventual consistency (MongoDB, ElasticSearch). Speed and feature set may matter more than perfect accuracy.
Analytics events: Use Kafka with at-least-once delivery. Order within user sessions is enough.
Pattern 4: Embrace Idempotency
Make operations safe to retry:
-- 👎 Bad: changes meaning if applied twice
UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 'console-1';
-- 👍 Good: same result if applied multiple times
UPDATE inventory SET quantity = 5 WHERE product_id = 'console-1' AND version = 42;
With idempotent operations, you can retry without fear, compensate for reordering, and recover from partial failures.
Conclusion: Pick Your Poison
Every distributed system makes trade-offs between ordering and performance. There's no configuration flag or architectural pattern that eliminates this trade-off—only different ways to manage it.
PostgreSQL gives you correctness but makes you pay in lock contention. MongoDB gives you speed but makes you handle consistency yourself. Kafka gives you scale but requires complex coordination patterns.
The key is understanding which trade-off you're making and designing accordingly. Don't fight your database's natural behaviour—work with it.
Next time you see eventual consistent reads in MongoDB, you'll understand why. When PostgreSQL seems slow under load, you'll know what you're paying for. When Kafka requires complex event choreography, you'll see it as the price of distribution.
The speed of light isn't just a suggestion. Network calls aren't free. Coordination has a cost. Once you accept these constraints, you can stop looking for silver bullets and start making informed trade-offs.
The question isn't "How do I get both?" The question is "What can I afford to give up?"
Cheers!
Oskar
p.s. Ukraine is still under brutal Russian invasion. A lot of Ukrainian people are hurt, without shelter and need help. You can help in various ways, for instance, directly helping refugees, spreading awareness, and putting pressure on your local government or companies. You can also support Ukraine by donating, e.g. to the Ukraine humanitarian organisation, Ambulances for Ukraine or Red Cross.