This week, we'll discuss deduplication strategies. We'll see whether they're useful and consider scenarios where you may need them. We'll also do a reality check on the promises of exactly-once delivery made by messaging vendors. TLDR: they're broken.
Great read, thank you (a bit of a pity GCP is not included).
BTW, Kafka Streams supports full EOS (exactly-once semantics), which means dups should not be a concern if your processing logic is Kafka/KS (Kafka Streams) only. It fits most EDA processing patterns, at the cost of committing to the stateful and far-from-lightweight (especially OOTB) nature of KS, albeit it's tunable, especially in releases after 2.8.
Anyway, ALO (at-least-once) semantics still prevail, so (as pointed out in this and other articles) the key is to make business logic idempotent. In this light, I'm not quite sure what the concept of a (standalone, stateful) Broker can add on top of cloud-native architectures built on modern service buses. I.e., the broker can de-duplicate, right, but then it might face the same issue when delivering from the Broker to downstream services, which again requires idempotency on the consumer end?
> BTW, Kafka Streams supports full EOS (exactly-once semantics), which means dups should not be a concern if your processing logic is Kafka/KS (Kafka Streams) only.
Yup, that's the benefit if you fully control the processing. Another example is an event store implementation on top of a relational database. If you add a table with monotonic checkpoints, you can check whether an event's position has already been processed (that is, whether the checkpoint stored in the database is already at or past the event position). If it is, you can skip committing the transaction and avoid making duplicated changes.
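To make the checkpoint idea concrete, here is a minimal sketch. It assumes an in-memory stand-in for the database; the names `CheckpointedProjection`, `StoredEvent`, and `handle` are hypothetical and not taken from any library. In a real relational store, the state change and the checkpoint update would have to share one transaction.

```typescript
// Hypothetical event shape: a monotonically increasing position plus a payload.
type StoredEvent = { position: number; amount: number };

// In-memory stand-in for a checkpoints table and the state it protects.
class CheckpointedProjection {
  private checkpoints = new Map<string, number>(); // subscriptionId -> last processed position
  private balance = 0;
  private subscriptionId: string;

  constructor(subscriptionId: string) {
    this.subscriptionId = subscriptionId;
  }

  // Returns true if the event was applied, false if it was skipped as a duplicate.
  handle(event: StoredEvent): boolean {
    const lastProcessed = this.checkpoints.get(this.subscriptionId) ?? -1;

    // Checkpoint already at or past this position: the event was handled before.
    if (lastProcessed >= event.position) return false;

    // In a real database, both writes below belong to the same transaction.
    this.balance += event.amount;
    this.checkpoints.set(this.subscriptionId, event.position);
    return true;
  }

  currentBalance(): number {
    return this.balance;
  }
}

const projection = new CheckpointedProjection("balance-projection");
const applied = [
  { position: 0, amount: 10 },
  { position: 1, amount: 5 },
  { position: 1, amount: 5 }, // redelivered duplicate
  { position: 2, amount: 1 },
].map((e) => projection.handle(e));
```

The redelivered event at position 1 is skipped because the stored checkpoint has already reached that position, so the balance is only changed once per event.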
Still, that works only if the storage supports such usage and the changes are wrapped in the same storage. That's why Kafka Streams uses RocksDB for its local state stores: to handle the additional capabilities needed.
> Anyway, ALO (at-least-once) semantics still prevail, so (as pointed out in this and other articles) the key is to make business logic idempotent. In this light, I'm not quite sure what the concept of a (standalone, stateful) Broker can add on top of cloud-native architectures built on modern service buses.
Yup, that's why I'm claiming that Exactly-Once Delivery is a myth. It's only ever guaranteed in terms of the internal bus logic, not in terms of application processing. As you said, you always need to take care of it on your own.
In the Azure Service Bus example, I don't know what this block is supposed to be doing:
> if (this.sessionCache.hasMessage(sessionId, messageId)) {
> context.ack(deduplicationCache.get<T>(sessionId, messageId));
> return;
> }
Is deduplicationCache different than the sessionCache? Or was it a copy-paste error?
Essentially it's the same logic; the difference is the lifespan and scope of the two options. In Azure Service Bus, it's explicitly tied to the session. Idempotence is not guaranteed across sessions (or partitions, as ASB supports either session scope or partition scope).
But what about sessionCache and deduplicationCache being used at the same time? I don't get this part.
Ah ok, you're right, my bad! It's a copy-paste issue. Sorry for that; I just fixed it!
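For readers following this thread, here is a rough sketch of what a session-scoped check could look like when the same cache is used for both the check and the lookup. It is an in-memory illustration only: `SessionCache`, `handleOnce`, and `closeSession` are hypothetical names and do not come from the Azure Service Bus SDK or from the article's code.

```typescript
// Hypothetical session-scoped cache: message IDs are remembered per session,
// so the same ID seen in another session is not treated as a duplicate.
class SessionCache {
  private sessions = new Map<string, Map<string, unknown>>();

  hasMessage(sessionId: string, messageId: string): boolean {
    return this.sessions.get(sessionId)?.has(messageId) ?? false;
  }

  get<T>(sessionId: string, messageId: string): T | undefined {
    return this.sessions.get(sessionId)?.get(messageId) as T | undefined;
  }

  set<T>(sessionId: string, messageId: string, result: T): void {
    const session = this.sessions.get(sessionId) ?? new Map<string, unknown>();
    session.set(messageId, result);
    this.sessions.set(sessionId, session);
  }

  // Dropping a session forgets its message IDs: the guarantee ends with the session.
  closeSession(sessionId: string): void {
    this.sessions.delete(sessionId);
  }
}

const sessionCache = new SessionCache();
let handlerCalls = 0;

// Runs the (illustrative) business logic at most once per session and message ID.
function handleOnce(sessionId: string, messageId: string, payload: number): number {
  if (sessionCache.hasMessage(sessionId, messageId)) {
    // The check and the lookup go through the same cache.
    return sessionCache.get<number>(sessionId, messageId) as number;
  }
  handlerCalls++;
  const result = payload * 2;
  sessionCache.set(sessionId, messageId, result);
  return result;
}

const first = handleOnce("s1", "m1", 21);
const duplicate = handleOnce("s1", "m1", 21); // served from the cache
const otherSession = handleOnce("s2", "m1", 21); // different session, processed again
```

The last call shows the scope limit mentioned above: the same message ID in a different session runs the handler again, so the business logic still needs to be idempotent.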