Dealing with Race Conditions in Event-Driven…

Oct 20, 2025

My events came out of order! What should I do? Are you familiar with the term "phantom record" and its benefits? No? Let me explain it to you today. Let's discuss how to embrace the chaos and learn to deal with it.

8 Comments

Hubert

Oct 21

Hey!

Do you store the intermediate state in the storage or in the application's memory only?

What if the app restarts in the middle of processing the payment? Some of the information we have already received would be lost (since we acked those messages in the queue), and the remaining events would only arrive after that.

Oskar Dudycz

Oct 21

Hey Hubert, it depends on the exact use case, but my preference would be to use durable storage. Then I get proper guarantees and can use, e.g., optimistic concurrency to double-check that no other process tried to update the same document. That shouldn't happen, but in a distributed setup like Kubernetes, with multiple consumer groups, it could happen due to misconfiguration.

If I were using Event Sourcing and first storing the external events in a stream in the local event store, I could build this state in memory from all the events in that stream. I could then subscribe to that stream and, once some condition is met, store new events in a new, internal stream.

I'd also consider keeping it in memory as a performance optimisation, alongside durable storage.
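To make the optimistic concurrency check mentioned above concrete, here is a minimal sketch. It assumes a versioned document per payment; `InMemoryStore`, `Document`, and the field names are hypothetical stand-ins for a real durable store, not code from the article.

```python
from dataclasses import dataclass, field


class ConcurrencyError(Exception):
    """Raised when the stored version differs from the one the writer read."""


@dataclass
class Document:
    data: dict = field(default_factory=dict)
    version: int = 0


class InMemoryStore:
    """Hypothetical stand-in for a durable document store."""

    def __init__(self):
        self._docs = {}  # doc_id -> Document

    def load(self, doc_id: str) -> Document:
        doc = self._docs.get(doc_id, Document())
        # Return a copy so callers cannot mutate stored state directly.
        return Document(dict(doc.data), doc.version)

    def save(self, doc_id: str, data: dict, expected_version: int) -> int:
        current = self._docs.get(doc_id, Document())
        # Reject the write if someone else saved since this writer loaded.
        if current.version != expected_version:
            raise ConcurrencyError(
                f"{doc_id}: expected version {expected_version}, "
                f"found {current.version}"
            )
        new_version = expected_version + 1
        self._docs[doc_id] = Document(dict(data), new_version)
        return new_version


store = InMemoryStore()

# Two processes read the same document at version 0.
first = store.load("payment-1")
second = store.load("payment-1")

first.data["payment_initiated"] = True
store.save("payment-1", first.data, expected_version=first.version)

second.data["fraud_check_passed"] = True
try:
    store.save("payment-1", second.data, expected_version=second.version)
    conflict_detected = False
except ConcurrencyError:
    conflict_detected = True  # the second writer must reload and retry
```

In a real database the version comparison and the write would have to happen atomically (for example as a conditional update); the in-memory check here only illustrates the idea.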

Thank you Oskar!

Happy to help! 🫡

Thanks for sharing, Oskar, very interesting read! In my experience, workflows are usually triggered by a command or event, and then it's "easier" to handle out-of-order events related to that workflow. In the scenario you described, if the payment processing workflow were triggered by an OrderCreated event (for example), then we could use orderId as the correlationId for all events related to that payment processing workflow.

It's fascinating that everything you described is how Sagas in NServiceBus (.NET) work: define a read model (SagaData), then handle all the events and update the read model, with optimistic concurrency guarding against conflicting concurrent updates. In each event handler, implement the logic that checks whether the workflow has finished and, if so, publishes a completion event. The only difference I can think of is that Sagas are built on top of correlating events to Saga instances, and that piece is missing in your example. Would love to hear your thoughts.

https://docs.particular.net/nservicebus/sagas/#starting-a-saga-dealing-with-out-of-order-delivery
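The saga-style flow described in this comment can be sketched roughly as follows. This is not NServiceBus code and not the article's implementation; the event names, `WorkflowState`, and `Workflow` are hypothetical, and the order ID plays the role of the correlation ID.

```python
from dataclasses import dataclass, field

# Hypothetical event types; the thread does not list the real ones.
REQUIRED = frozenset({"OrderCreated", "PaymentAuthorized", "FraudCheckPassed"})


@dataclass
class WorkflowState:
    correlation_id: str
    seen: set = field(default_factory=set)
    completed: bool = False


class Workflow:
    def __init__(self):
        self.states = {}     # correlation_id -> WorkflowState
        self.published = []  # completion events "published" so far

    def handle(self, event_type: str, correlation_id: str) -> None:
        # Create the state on the first event, whichever one arrives first.
        state = self.states.setdefault(
            correlation_id, WorkflowState(correlation_id)
        )
        state.seen.add(event_type)
        # Every handler runs the same completion check.
        if not state.completed and REQUIRED <= state.seen:
            state.completed = True
            self.published.append(
                ("PaymentProcessingCompleted", correlation_id)
            )


workflow = Workflow()
# Events for order-1 arrive out of order and interleaved with order-2.
workflow.handle("FraudCheckPassed", "order-1")
workflow.handle("PaymentAuthorized", "order-2")
workflow.handle("PaymentAuthorized", "order-1")
workflow.handle("OrderCreated", "order-1")
workflow.handle("OrderCreated", "order-1")  # duplicate does not re-publish
```

Because the state is created by whichever event arrives first, no event type has to be "the" starting event, which is what makes out-of-order delivery tolerable here.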

Ben Virkler

Oct 29

Very helpful article Oskar, and timely for my project. You described the scenario where you know what events you should receive, just not the order. But what if you don't know that? For example, you get an ItemRemovedFromCart event, but the item doesn't exist in your view of the current state of the cart. Is it an invalid event? Or is there an ItemAddedToCart event that hasn't come through yet? Any thoughts on how to handle this scenario?

Darek Rojewski

Nov 2 (edited)

Sorry to jump in, but it's interesting and I just wanted to add my two cents.

First thought:

I know it's just your example, but ItemAdded/ItemRemoved look like "private" events, not public ones that would be meaningful to the "outside world" and would contain the information a consumer needs to make a decision.

Second thought, which I think is good to keep in mind:

Is this a case of one publisher and one active consumer instance, or is your consumer scaled out? [1]

Third thought:

Typically I’d use event versioning (though it depends on the context), which lets you take care of the processing order.

[1] If you scale out, that has to be taken into account. In RabbitMQ, for example, the consistent hash exchange works well: all events for the same object (say, contract 1234) will be handled by the same consumer instance.

I know I focused on the technical side, but these are just comments after all, and it's my perspective :-)
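The routing idea in footnote [1] can be illustrated with a small sketch. Note that this is plain modulo hashing, not the hash ring that RabbitMQ's consistent hash exchange uses, and `consumer_for` is a hypothetical helper: it only shows why every event for the same object ends up at the same consumer instance.

```python
import hashlib


def consumer_for(routing_key: str, consumer_count: int) -> int:
    """Map a routing key to a consumer index in a stable way.

    hashlib is used instead of the built-in hash(), because hash() of a
    string is randomised per process and would differ between publishers.
    """
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % consumer_count


# Every event for contract 1234 is routed to the same consumer index,
# so that consumer sees them in the order the broker delivers them.
events = [
    ("contract-1234", "ContractSigned"),
    ("contract-5678", "ContractSigned"),
    ("contract-1234", "ContractAmended"),
]
assignments = [(key, name, consumer_for(key, 3)) for key, name in events]
```

With modulo hashing, changing the number of consumers remaps most keys, which is one reason a real broker uses a consistent hashing scheme instead.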

Oskar Dudycz

Nov 3

Ben, that’s a good question. I decided to write a longer follow-up in the latest edition: https://www.architecture-weekly.com/p/handling-events-coming-in-an-unknown

It overlaps with the advice Darek gave about internal and external events (Darek, nice suggestions btw!).

I also showed how using a revision can help detect missing events when they’re coming from the same source.

Ben, feel free to follow up if I should expand more 🙂
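As a rough illustration of revision-based gap detection for Ben's cart scenario: the linked follow-up is not reproduced here, so this is a sketch under assumptions, not Oskar's implementation. It assumes each event from the source carries a per-cart `revision` that increases by one; `CartProjection` and the event shape are hypothetical.

```python
class CartProjection:
    def __init__(self):
        self.items = set()
        self.revision = 0  # revision of the last applied event
        self.pending = {}  # revision -> event parked until the gap closes

    def handle(self, event: dict) -> str:
        revision = event["revision"]
        if revision <= self.revision:
            return "duplicate"  # already applied, safe to ignore
        self.pending[revision] = event
        if revision > self.revision + 1:
            return "gap"  # an earlier event is missing, so park this one
        # Apply this event and any parked events that are now contiguous.
        while self.revision + 1 in self.pending:
            self._apply(self.pending.pop(self.revision + 1))
        return "applied"

    def _apply(self, event: dict) -> None:
        if event["type"] == "ItemAddedToCart":
            self.items.add(event["item"])
        elif event["type"] == "ItemRemovedFromCart":
            self.items.discard(event["item"])
        self.revision = event["revision"]


cart = CartProjection()
removed = {"type": "ItemRemovedFromCart", "item": "book", "revision": 2}
added = {"type": "ItemAddedToCart", "item": "book", "revision": 1}

first_result = cart.handle(removed)   # arrives early: revision 1 is missing
second_result = cart.handle(added)    # closes the gap, both get applied
third_result = cart.handle(added)     # redelivery of an applied event
```

The revision is what distinguishes "an ItemAddedToCart has not arrived yet" from "this removal is invalid": a removal at revision 2 with nothing applied yet signals a gap rather than a bad event. A real consumer would also need a policy for gaps that never close, such as a timeout or re-fetching from the source.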

Architecture Weekly
