Show me the money! Practically navigating the Cloud Costs Complexity
In my last article, I explored how Amazon S3’s new conditional writes feature can be used to implement a strongly consistent event store. I think that this feature opens up powerful architectural possibilities for distributed systems. But as much as we’d love to dive right into the code, there’s a bigger question to answer first: how much will it cost?
This article cannot start in any other way than with one of my favourite movie scenes:
Show you the money! Not you, show me the money!
We’ve all seen cloud bills get out of hand, often because the true infrastructure costs are harder to predict than they seem at first glance.
Today, we’ll grab a calculator to discuss the costs of building an event store on S3. I’m not an expert here; there are smarter and more skilled people than me. It might be that you’re one of them.
That’s why I’m counting on your feedback, not only the money. I’ll show you the money (and show myself the money) so you can see how I typically calculate, manage, and optimise costs in the cloud. Treat it as food for thought.
You’ll see that this is not your typical “pay for storage and requests” breakdown. We’ll examine the specific challenges of request patterns, working set size, DELETE operations, and load spikes, and how to make smarter decisions based on AWS pricing models. I’ll use AWS, but similar calculations can and should be done for other cloud providers.
Using S3 Conditional Writes for Event Stores
Before we dive into the math, a quick recap: an event store is a key-value database where the results of all business operations are recorded as immutable events. A traditional record is represented as a sequence of events called an event stream. Each operation loads all events from the stream, builds the state in memory, and appends a new fact as an event.
As event stores are databases, they should give strong consistency guarantees, like supporting optimistic concurrency. The new conditional writes feature raised the question of whether S3 can give such guarantees now, and it can! The If-None-Match header support in S3 ensures that only a single event can be appended with a specific stream (record) version.
As the If-None-Match header works only during object creation, we need the following naming schema (or similar) to guarantee that:
{streamPrefix}/{streamType}/{streamId}/{streamVersion}.{chunkVersion}
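To make this more concrete, here is a minimal sketch of such a conditional append using the AWS SDK for JavaScript v3. It assumes an SDK version recent enough to expose the IfNoneMatch parameter on PutObjectCommand; the bucket, key and appendChunk names are illustrative, not part of the design described above.

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Appends a new chunk as a brand-new object. If an object with this key
// already exists, S3 rejects the write with 412 Precondition Failed,
// which is exactly the optimistic-concurrency check we need.
async function appendChunk(
  bucket: string,
  key: string, // e.g. "claims/claim/123/0000000042.001" (illustrative)
  body: string // serialised events, stream metadata and (optionally) a snapshot
): Promise<boolean> {
  try {
    await s3.send(
      new PutObjectCommand({
        Bucket: bucket,
        Key: key,
        Body: body,
        IfNoneMatch: "*", // create-only: fail if the key already exists
      })
    );
    return true;
  } catch (error) {
    if ((error as any)?.$metadata?.httpStatusCode === 412) {
      return false; // someone else appended this stream version first
    }
    throw error;
  }
}
```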
Effectively, each new event is a new file in the S3 bucket. And S3 is not optimised for such usage, as it favours a smaller number of bigger files. We discussed how to overcome that.
The key design is around active chunks. In this system, new events are written to a chunk (file) that remains active while being appended. A chunk can contain (a rough shape sketch follows this list):
one or more events as a result of business operations,
stream metadata (id, version, etc.)
snapshot - full representation of our entity/aggregate.
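To visualise that, here is a rough sketch of what a single chunk could hold. The property names are assumptions made for illustration, not a fixed format from the design above.

```typescript
// Illustrative shape of a single chunk stored as one S3 object.
type Event = {
  type: string; // e.g. "ClaimSubmitted"
  data: Record<string, unknown>;
  timestamp: string;
};

type Chunk = {
  streamId: string;
  streamVersion: number; // stream version after applying this chunk's events
  events: Event[]; // one or more events from business operations
  snapshot?: Record<string, unknown>; // trimmed state used by business logic
};
```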
Replaying all events from the beginning to rebuild an entity’s state can be costly regarding GET requests (as we’re paying for each request). To reduce this, the design incorporates snapshots, which store a full representation of the current state within a chunk. By fetching the latest snapshot, the system can avoid replaying the entire event history, minimizing the number of GET operations.
We also need to pay for a PUT operation to append each event.
As each chunk is named with an autoincremented stream version, we must also pay for a LIST request to find the latest one.
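Put together, reading the current state costs one LIST plus one GET per operation. Here is a minimal sketch of that read path, assuming chunk keys sort lexicographically by a zero-padded stream version and that chunk bodies are JSON; readLatestChunk and streamPrefix are illustrative names.

```typescript
import {
  S3Client,
  ListObjectsV2Command,
  GetObjectCommand,
} from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// One LIST to find the newest chunk, one GET to fetch it.
async function readLatestChunk(bucket: string, streamPrefix: string) {
  const listed = await s3.send(
    new ListObjectsV2Command({ Bucket: bucket, Prefix: streamPrefix })
  );

  // Keys like ".../0000000042.001" sort by stream version when zero-padded.
  const latestKey = (listed.Contents ?? [])
    .map((object) => object.Key!)
    .sort()
    .at(-1);

  if (!latestKey) return null; // the stream does not exist yet

  const chunk = await s3.send(
    new GetObjectCommand({ Bucket: bucket, Key: latestKey })
  );

  // The chunk carries events, metadata and (optionally) a snapshot,
  // so a single GET is enough to rebuild the entity state.
  return JSON.parse(await chunk.Body!.transformToString());
}
```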
We also discussed compacting event stream data to reduce storage costs. Chunks can be compacted periodically, meaning old events are merged and unnecessary chunks are deleted, further lowering storage and request costs. Sealed chunks can be deleted or moved to lower-cost storage tiers like S3 Intelligent-Tiering or S3 Glacier, reducing storage costs.
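As an illustration, such tiering can be automated with a bucket lifecycle rule. This is just a sketch: the sealed/ prefix and the day thresholds are assumptions, presuming compaction moves sealed chunks under a dedicated prefix; INTELLIGENT_TIERING could be used instead of GLACIER.

```typescript
import {
  S3Client,
  PutBucketLifecycleConfigurationCommand,
} from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Moves sealed chunks to cheaper storage after 30 days and deletes them
// after a year; adjust the thresholds to your retention requirements.
async function configureSealedChunkLifecycle(bucket: string) {
  await s3.send(
    new PutBucketLifecycleConfigurationCommand({
      Bucket: bucket,
      LifecycleConfiguration: {
        Rules: [
          {
            ID: "tier-and-expire-sealed-chunks",
            Filter: { Prefix: "sealed/" },
            Status: "Enabled",
            Transitions: [{ Days: 30, StorageClass: "GLACIER" }],
            Expiration: { Days: 365 },
          },
        ],
      },
    })
  );
}
```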
This approach leverages S3's conditional writes to ensure consistency and manages costs by strategically using snapshots, chunking, and storage tiers.
So ok, how much will it cost?
Basic cost calculations
When designing an event-sourced system, it’s easy to assume that keeping track of system changes is cheap and negligible nowadays. You just log some events and store them, right? But what happens when your event payloads grow larger than expected? Suddenly, those small changes to event size can greatly impact your storage costs. Let’s break down three scenarios, starting with a common event size, then looking at what happens if things go wrong.
4KB Events: The Lean and Efficient System
Let’s start with an efficient design. In this system, each event logs essential information about an insurance claim: claim creation, adjustments, approvals, and payments. It includes the necessary details like timestamps, customer info, and claim data. In such a case, 4KB on average should be enough to also keep the snapshot and stream metadata (as snapshots should be trimmed to keep only the data used in business logic; they don’t need complete information).
Here’s what the system could look like:
500,000 active insurance policies.
Each policy generates 1 claim per year.
Each claim has ten events (e.g., submission, review, payment).
Total: 5 million events per year.
S3 Standard request pricing:
1000 GET Requests: $0.0004
1000 POST/PUT/DELETE/LIST Requests: $0.005
Costs for 4KB Events
Let’s ignore the free tier and other promotions (for now). The costs could look as follows.
Storage:
5 million events per year × 4KB = 20GB of data per year.
417 thousand events per month × 4KB = 1.67 GB of increment data per month.
S3 Standard storage: $0.023 per GB per month.
Monthly increment cost: $0.038 (storage cost of new events within a month).
Total accumulated yearly storage cost: $2.99 (we only pay for what we accumulate: 1.67 GB in the first month, 3.33 GB in the second, etc.).
Requests:
5 million GET requests (one per event) = $2.
5 million PUT requests (one per event) = $25
5 million LIST requests (one per event) = $25 (we need it to find the active stream chunk)
Total Year 1 Costs:
Storage: $2.99
Requests: $52
Total: $54.99
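If you want to play with these numbers yourself, here is a minimal sketch of the arithmetic behind them, using the prices quoted above (decimal gigabytes, free tier ignored); the function name is illustrative.

```typescript
// Year-one cost model matching the calculations above.
const KB_PER_GB = 1_000_000; // decimal GB, as in the 20 GB/year figure

const storagePricePerGbMonth = 0.023; // S3 Standard
const getPricePer1k = 0.0004;
const putOrListPricePer1k = 0.005;

function yearOneCosts(eventsPerYear: number, eventSizeKb: number) {
  const monthlyIncrementGb = ((eventsPerYear / 12) * eventSizeKb) / KB_PER_GB;

  // Storage accumulates: month 1 holds one increment, month 12 holds twelve,
  // so over the year we pay for 1 + 2 + ... + 12 = 78 monthly increments.
  const storage = monthlyIncrementGb * 78 * storagePricePerGbMonth;

  // One GET, one PUT and one LIST per appended event.
  const requests =
    (eventsPerYear / 1000) *
    (getPricePer1k + putOrListPricePer1k + putOrListPricePer1k);

  return { storage, requests, total: storage + requests };
}

console.log(yearOneCosts(5_000_000, 4)); // ≈ { storage: 2.99, requests: 52, total: 54.99 }
```

Re-running it with 40KB or 400KB events reproduces the numbers in the next two scenarios.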
40KB Events: things start to escalate
While 4KB events are a more reasonable size for most event sourcing systems, it’s possible that due to poor design, additional (meta)data, or overly verbose data formats, events could balloon to 40KB. I wrote about this in Anti-patterns in event modelling - I'll just add one more field.
Costs for 40KB Events
Storage:
5 million events per year × 40KB = 200GB of data per year.
417 thousand events per month × 40KB = 16.67 GB of increment data per month.
S3 Standard storage: $0.023 per GB per month.
Monthly increment cost: $0.38 (storage cost of new events within a month).
Total accumulated yearly storage cost: $29.90 (we only pay for what we accumulate: 16.67 GB in the first month, 33.33 GB in the second, etc).
Requests:
5 million GET requests (one per event) = $2.
5 million PUT requests (one per event) = $25
5 million LIST requests (one per event) = $25 (we need it to find the active stream chunk)
Total Year 1 Costs:
Storage: $29.90
Requests: $52
Total: $81.90
It’s visible that the request costs stay the same, and that’s a significant strength of S3. You also don’t pay transfer costs as long as you stay inside the AWS network (in the same region).
Still, storage costs went up ten times, overtaking the request costs. Let’s check the next scenario.
400KB Events: The Worst-Case Scenario
Now, let’s assume something went wrong in the design process. Instead of storing references to large documents (like PDFs or images), someone included the entire file within each event. The event size skyrockets to 400KB—an inefficient bloat.
With S3, the math is relatively simple; we can multiply the previous storage results by 10 and get:
Total Year 1 Costs:
Total Storage: 2TB (2,000GB) of data per year
Storage: $299
Requests: $52
Total: $351
Now, the storage costs skyrocketed.
Optimising storage costs
The key lesson is that storage costs grow with event size, but request costs stay flat. The number of events, on the other hand, adds to both the request and storage costs.
Notice that the request costs remain the same no matter how much the event size increases. AWS charges for requests based on the number of PUT, GET or LIST operations, not the size of the data being sent or retrieved. This means your request costs remain flat, but your storage costs balloon as your event size grows.
If you stick with 4KB events, storage costs are tiny. However, as we saw in the 40KB and 400KB examples, larger events can increase your storage costs by 10x or even 100x.
Here are the things we can learn from it: