Welcome to the new week!
Did you watch or listen to the latest releases with our special guests?
Maybe you thought they were sidetracks from the observability series we ran in the previous editions?
If you did, that wasn't quite the intention I had behind them. Okay, so what was it?
I wanted to look at observability, measurements, and instrumentation from different perspectives. Observability is not only about technical CPU metrics or memory usage; it’s about getting insights into our system's behaviour. Moreover, those insights are not worth a penny if we cannot benefit from them.
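To make that more tangible, here’s a minimal sketch (in TypeScript, assuming the OpenTelemetry API; the meter and metric names are purely illustrative) of recording a business-level signal alongside the usual technical ones:

```typescript
import { metrics } from '@opentelemetry/api';

// Hypothetical domain-level metric: it tells us what users are doing,
// not just how busy the CPU is or how much memory we burn.
const meter = metrics.getMeter('file-sharing');

const rejectedUploads = meter.createCounter('uploads_rejected_total', {
  description: 'Uploads rejected by validation, grouped by reason',
});

export function recordRejectedUpload(reason: string, fileExtension: string) {
  // The attributes let us ask questions later, like:
  // "why did people suddenly start uploading MP3 files?"
  rejectedUploads.add(1, { reason, 'file.extension': fileExtension });
}
```

Nothing fancy, but the signal is phrased in domain terms (uploads, reasons, file types), so it can drive product decisions, not only dashboards.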
Before I tell you the story of my project, let’s see what our guests said.
Gojko highlighted that we live in an unpredictable world and need to accept it. We cannot predict everything, but we can prepare our systems by making them observable, which will at least give us tools for investigation.
Some things, you will never be able to explain, like why people, I mean, why people are uploading MP3 files, I don't know. I've tried to get in touch with a few of them. They don't respond to my emails. Some of these things just happen.
Some of these things are weird things that are caused by third-party browser extensions or things like that. There's going to be a lot of noise. Any kind of exception tracking and error tracking, there will be a lot of noise, and then the question is whether you want to dig into that and figure out something. It's just one tool. It's not the only tool. It's not the best tool for everything. It's one tool that I thought would be interesting for people to learn about.
Primarily because I think when something like that happens and when we make a discovery like that, people usually attribute it to luck or serendipity. And I think there's a process to make these kinds of lucky accidents systematic. I think lucky accidents happen to everybody, but you need to understand that a lucky accident happened to you.
And you need to understand the context of it to be able to benefit from it. Otherwise, you're missing the opportunity. And I think that's kind of my lizard optimization is my attempt to make that systematic so that people can approach it in a more systematic way.
In my opinion, that also goes pretty well with the Cynefin framework.
Knowns and unknowns
It states that we have four types of decision-making contexts (or domains): clear, complicated, complex, and chaotic. The framework also adds a fifth one, confusion, to the mix. And that sounds like a fair categorisation of the problems we face.
Clear issues we solve on autopilot, chaotic ones we tend to ignore, and complex ones sound like a nice challenge. And complicated problems? We call them tedious.
Complex problems are the unknown unknowns. This is the place where we feel creative; we do an explorer's job. We probe, sense, and respond. The design emerges, and Agile shines. We solve it, and then we ride off into the sunset like a lonesome cowboy. Off we go to the next exciting problem.
Complicated issues, on the other hand, are known unknowns. They represent something that has to be done. If we have the expertise, we can sense it with our educated gut feeling, analyse it and respond with a solution. In other words, we usually know what we need to do, but we still need to figure out exactly how to do it. The unknowns are more tactical than strategic. (Read more in my article Not all issues are complex, some are complicated. Here's how to deal with them.)
While doing design sessions, building our user personas, and interviewing domain experts and potential users, we try to discover as much as we can and make some predictions about the outcomes. By that, we’re trying to reduce our Complex and Chaotic problems into smaller, manageable, Clear or Complicated features.
Still, as Gojko nicely explained in his recent article, there’s always a potential mismatch between our and users’ expectations:
The best case is when we’re aligned with our users and reach an acceptable outcome. The other combinations are less desirable (I'll sketch below how we might spot them); we’re getting:
a bug if, together with the users, we observe unexpected behaviour. That’s still relatively easy, as we both agree that it’s unexpected.
an exploit if users’ expectations are far from ours. This can be a security breach or an abuse of our system in an unpredicted way. It can make our pricing strategy ineffective or even kill our business. The most famous example is the Knight Capital case, where one mistake in deployment caused a $440 million loss in 45 minutes.
a mismatch if what we expected is unexpected by the users. That means what we deliver is either not aligned with user needs or solves them incorrectly.
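One way to gather that data is to emit a structured event whenever we observe behaviour outside the path we designed for, and classify it later, together with users, as a bug, an exploit, or a mismatch. A rough sketch, with a purely hypothetical event shape and names, could look like this:

```typescript
// Hypothetical event shape: capture enough context to decide later
// whether the behaviour was a bug, an exploit, or a mismatch.
type UnexpectedBehaviour = {
  occurredAt: Date;
  feature: string;                  // e.g. 'file-sharing'
  expected: string;                 // what we assumed users would do
  observed: string;                 // what actually happened
  context: Record<string, unknown>; // anything useful for the follow-up
};

export function reportUnexpectedBehaviour(event: UnexpectedBehaviour): void {
  // In a real system this would go to the regular telemetry pipeline;
  // a structured log line is enough to show the idea.
  console.log(JSON.stringify({ type: 'unexpected-behaviour', ...event }));
}

// Usage: someone shares a multi-gigabyte file with hundreds of recipients,
// a scenario we might never have designed (or priced) for.
reportUnexpectedBehaviour({
  occurredAt: new Date(),
  feature: 'file-sharing',
  expected: 'share a project file with a handful of collaborators',
  observed: 'bulk share with hundreds of recipients',
  context: { fileSizeGb: 5, recipients: 500 },
});
```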
We need data to understand whether our expectations about complexity and user needs match reality. And here’s my story.
A clickbait feature goes rogue
We were building a cloud version of our legacy product. By legacy, I mean that it was written in an old tech stack but still used and paid for by its users. It was a system for managing, sharing and exchanging big project files (a single file could exceed a few gigabytes).