Architecture Weekly #191 - Is It Production Ready?
Welcome to the new week!
Why is connection pooling important? How can queuing, backpressure, and single-writer patterns help? If you don't know the answer, then go go go, and check the previous two editions (here and there)!
Jokes aside, we did some learning by doing. We built a simple connection pool, added queuing with backpressure, and implemented patterns like single-writer. All of that was to show, on a smaller scale, how those patterns can be extrapolated to broader usage.
We wrote it in TypeScript, and the code looks solid; it sounds like we could deploy it to production or maybe package it and release it as an open-source library, right?
Hold your horses! Before we hit deploy, there's a usual question we need to ask:
Is this ready for production?
If we don't ask it ourselves now, we'll either be asked soon or learn the answer the hard way.
This is the usual border between the proof of concept and reality. Even if:
something conceptually makes sense,
we talked through it with our colleagues,
we tested it on a smaller scale,
we should validate it before applying it.
How to do it? We'll discuss that today.
What does production-readiness even mean?
Does this mean it's feature-complete, has no bugs, or has no security vulnerabilities? What about working as expected? Expected by whom? Users, customers, operations?
What's your definition?
Google invented the whole concept of the Production Readiness Review. The name is so posh that it clearly says it's hard to define what production means in general.
We cannot define a simple checklist that fits all cases, but we can decide if our software matches our contextual criteria.
In our case, the code was production-ready as it matched the criteria of being an illustration for a discussion about the architecture concept. Still, I tested it today, and it didn't even compile (sloppy me!). And that's not the end of the list.
Let's think about how to verify whether our connection pool and queue are production-ready in the more common sense.
1. Unit Testing, so how to verify that?
Testing is more than just verifying that the code works - it's about ensuring it holds up under all specified conditions. Some say that we should always write the tests first. A test is then both the specification and the verification that the code matches it. We didn't do that, as the code was more an illustration of the idea, but if we wanted to make it work, we'd eventually need to run it. Of course, we could write an application using our connection pool, run it and manually test it. Yet, that wouldn't be repeatable and wouldn't be a good measure. We could easily skip some edge cases. We need a more systematic approach.
We could start with unit testing. What a unit means is an endless discussion. For the sake of this article, let's assume that unit tests are where we validate the smallest possible components of our system in isolation (or actually their behaviour). We need to ensure that each function does what it's supposed to, without surprises.
In our case, this could mean:
QueueBroker Initialization: The QueueBroker is the heart of our queuing system. When it starts, it should initialise with the correct options, have an empty queue, and be ready to accept tasks immediately. If it fails here, nothing else will work correctly. We'll test to confirm this foundational behaviour.
Task Enqueuing: Tasks need to be added to the queue correctly. The system should respect the maxQueueSize limit: if we try to enqueue beyond that, it should reject the tasks cleanly. This is crucial to ensure that the system won't be overwhelmed under high load, which could lead to a crash.
Task Processing: The processQueue method is responsible for managing active tasks. It must handle tasks without exceeding the maxActiveTasks limit. This prevents resource exhaustion and ensures that the system can maintain performance under load. We'll test how it deals with successful and failed tasks to ensure resilience.
Graceful Shutdown: When the system needs to shut down, the end method should stop accepting new tasks while allowing current tasks to finish. This is critical to avoid leaving work in an inconsistent state, which could lead to data corruption. We'll ensure the shutdown process is smooth, even when end is called multiple times.
Connection Management: We'll test how the connection pool handles opening and releasing connections. It must respect the maxConnections limit and trigger the next task as soon as a connection is released. This ensures that the system uses resources efficiently, without bottlenecks.
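To make the list above more tangible, here is a minimal sketch of what such checks could look like. It is not the implementation from the previous editions: QueueBroker below is a hypothetical stand-in that reuses only the names mentioned in the text (maxQueueSize, maxActiveTasks, processQueue, end). Everything else is an assumption made for illustration: the queuedCount and activeCount getters, the deferred and verifyQueueBroker helpers, the rule that a task which can start immediately doesn't take a queue slot, and the choice that end lets both active and already queued tasks finish. The connection pool and maxConnections are left out.

```typescript
// Hypothetical stand-in for the article's QueueBroker. Only the names
// maxQueueSize, maxActiveTasks, processQueue and end come from the text.
type QueueOptions = { maxQueueSize: number; maxActiveTasks: number };

class QueueBroker {
  private readonly options: QueueOptions;
  private queue: Array<() => Promise<void>> = [];
  private active = 0;
  private ended = false;
  private idleWaiters: Array<() => void> = [];

  constructor(options: QueueOptions) { this.options = options; }

  get queuedCount(): number { return this.queue.length; }
  get activeCount(): number { return this.active; }

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    if (this.ended) return Promise.reject(new Error('Queue is shut down'));
    // Assumption: a task that can start immediately needs no queue slot.
    const mustWait = this.active >= this.options.maxActiveTasks;
    if (mustWait && this.queue.length >= this.options.maxQueueSize)
      return Promise.reject(new Error('Queue is full'));
    return new Promise<T>((resolve, reject) => {
      // The wrapping promise also captures synchronous throws from the task.
      this.queue.push(() => new Promise<T>((r) => r(task())).then(resolve, reject));
      this.processQueue();
    });
  }

  private processQueue(): void {
    while (this.active < this.options.maxActiveTasks && this.queue.length > 0) {
      const run = this.queue.shift();
      if (!run) break;
      this.active++;
      // run() never rejects: failures are routed to the caller's promise.
      void run().then(() => {
        this.active--;
        this.processQueue();
        if (this.active === 0 && this.queue.length === 0)
          for (const notify of this.idleWaiters.splice(0)) notify();
      });
    }
  }

  // Assumption: end() lets already accepted tasks (active and queued) finish.
  end(): Promise<void> {
    this.ended = true;
    if (this.active === 0 && this.queue.length === 0) return Promise.resolve();
    return new Promise<void>((resolve) => { this.idleWaiters.push(() => resolve()); });
  }
}

// Helper: a promise settled by hand, which keeps the checks deterministic.
function deferred<T>() {
  let resolve: (value: T) => void = () => {};
  let reject: (error: Error) => void = () => {};
  const promise = new Promise<T>((res, rej) => { resolve = res; reject = rej; });
  return { promise, resolve, reject };
}

async function verifyQueueBroker(): Promise<string[]> {
  const passed: string[] = [];
  const check = (condition: boolean, name: string): void => {
    if (!condition) throw new Error(`Check failed: ${name}`);
    passed.push(name);
  };
  const outcome = (p: Promise<unknown>) => p.then(() => 'resolved', () => 'rejected');

  const broker = new QueueBroker({ maxQueueSize: 1, maxActiveTasks: 1 });
  check(broker.queuedCount === 0 && broker.activeCount === 0, 'starts empty');

  const first = deferred<string>();
  const second = deferred<string>();
  const firstResult = outcome(broker.enqueue(() => first.promise));
  const secondResult = broker.enqueue(() => second.promise);
  check(broker.activeCount === 1 && broker.queuedCount === 1, 'respects maxActiveTasks');
  check((await outcome(broker.enqueue(async () => 'x'))) === 'rejected', 'rejects when full');

  first.reject(new Error('boom'));
  second.resolve('done');
  check((await firstResult) === 'rejected', 'reports a failed task');
  check((await secondResult) === 'done', 'continues after a failure');

  const third = deferred<string>();
  const thirdResult = broker.enqueue(() => third.promise);
  let shutDown = false;
  const ended = broker.end().then(() => { shutDown = true; });
  check((await outcome(broker.enqueue(async () => 'y'))) === 'rejected', 'rejects after end');
  check(!shutDown, 'end waits for accepted tasks');
  third.resolve('late');
  await ended;
  check((await thirdResult) === 'late', 'accepted task finishes');
  await broker.end();
  check(shutDown, 'repeated end resolves once idle');
  return passed;
}

verifyQueueBroker().then((passed) => console.log(passed.join('\n')));
```

The checks use hand-settled promises instead of timers, so they don't depend on timing and give the same result on every run. In a real project, the same scenarios would more likely live in a test runner as separate test cases, one per behaviour from the list.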
The next part of the article is for paid users. If you're not one yet, you can use a free one-month trial until the end of August: https://www.architecture-weekly.com/b3b7d64d. You can check it out and decide whether you like it and want to stay. I hope that you will!