We take a Kafka client, call the producer, send the message, and boom, we expect it to be delivered on the other end. And that's actually how it goes. But wouldn't it be nice to understand better what happens behind the scenes? How is this data actually stored on disk? Where? When? That's what I did today: I built a dummy Kafka Producer and I'm taking you on the journey as the message goes through the broker, the partition, and the disk. Bon Appétit!
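If you want a concrete starting point, here is roughly what that "call the producer, send the message" step looks like with the official Java client. The broker address, topic name, and payload are placeholders for illustration, not the code from the article.

```java
// A minimal sketch of the "send and expect delivery" happy path using the
// official Java client. Broker address, topic, and payload are illustrative.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class DummyProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // From the caller's point of view this is the whole story...
            producer.send(new ProducerRecord<>("orders", "order-123", "{\"status\":\"confirmed\"}"));
            // ...while under the hood the record is batched, routed to the
            // partition leader, appended to a log segment, and eventually
            // flushed to disk.
            producer.flush();
        }
    }
}
```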
That's an amazing post, Oskar. I'm impressed by its depth. This is CodeCrafters-worthy (https://app.codecrafters.io/catalog) material - have you considered creating a course from this type of content? It might reach an even bigger audience that wants a deeper understanding of event streaming.
Thank you, Thiago! I have been thinking for a long time about an online course on Event Sourcing, but I haven't decided to do it yet. Maybe, indeed, I could turn this blog article series into a course or e-book. I'll think about it :)
I like this post very much! The most interesting part for me was the one where you showed how the high-level concept of the WAL is implemented while staying aware of the physical constraints of fsync concurrency. With workloads where I expect high load, it’s important to me to ensure I’ve configured the crucial parts of the app settings correctly (like the batch or replica sync configuration), and this post was one of the most concise guides on basic producer/broker configuration I’ve read in recent months.
Having this kind of cross-layer understanding helps build significantly more robust software in fewer iterations than usual.
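For context, the batch and replica-sync knobs mentioned above live mostly on the producer side. Below is a sketch using the official Java client; the values are arbitrary examples for illustration, not recommendations from the post.

```java
// Illustrative only: producer-side batching and durability settings.
// Values are arbitrary examples, not tuning advice from the article.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class TunedProducerConfig {
    public static KafkaProducer<String, String> create() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Batching: accumulate up to 64 KB per partition, or wait at most 10 ms
        // before sending a batch to the broker.
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);

        // Durability: acknowledge only once all in-sync replicas have the record;
        // on the broker/topic side this pairs with min.insync.replicas.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);

        return new KafkaProducer<>(props);
    }
}
```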
Thank you! Such a comment is exactly the one I hoped to get 😅 My goal was precisely to show that the magic is that there's no magic, and the laws of physics still apply to Kafka and the other tools we use. I personally like to learn new things this way: zoom in, zoom out, to understand both the micro and macro scale.
As you said, understanding how fsync works can be seen as a technical detail, but it's not, as it helps to understand where the limitations are and how much we can bend the tool to our will. Plus, it helps to understand the original use case behind it. 🙂
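To make that concrete: by default Kafka leans on replication and the OS page cache rather than fsync-ing every record, and it exposes that trade-off as topic-level flush settings. Here's a sketch (not from the article, and with illustrative names and values) of setting them when creating a topic with the admin client:

```java
// A sketch of how the fsync trade-off surfaces as topic-level settings.
// Topic name and values are illustrative, not recommendations.
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class CreateTopicWithFlushSettings {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("orders", 3, (short) 3)
                .configs(Map.of(
                    "min.insync.replicas", "2",  // the replica-sync side of durability
                    "flush.messages", "10000",   // fsync after this many messages...
                    "flush.ms", "1000"           // ...or at least this often
                ));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```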
Great explanation, Oskar! I would really like to read the ending - the consumer view :)
Thank you, happy that it was helpful!
I’ll try to cover that on Monday 🙂 Are there any specific parts that I should expand on?
I'm also curious which part of this article clicked with you the most 🙂
Mostly the part about file segments and the broker's perspective.
Thank you for expanding. 🙂
Such feedback is helpful to fine-tune the scope and style of further releases.