Architecture Weekly #148 - 9th October 2023

Oct 09, 2023

Sponsor: Do you build complex software systems? See how NServiceBus makes it easier to design, build, and manage software systems that use message queues to achieve loose coupling. Get started for free.

Welcome to the new week!

I’ve run numerous workshops in recent years. It’s intriguing to see the different ways people solve the same problem. Some start from a general vision and go into details; others go the other way.

Based on those observations, and mostly on my own experience, I decided to share my process of working with design and software architecture:

How to design software architecture pragmatically

I tried to show, step by step, the factors I take into account and the tools I use.

I explained where and when I use tools like EventStorming and the C4 model, but also ye olde methods like 5 Whys, mind maps, whiteboards, etc. They're still valuable! I showed my approach as it is, so pragmatically, without blurring it with gaudy buzzwords.

Years ago, I was fascinated by the Lambda Architecture. I read a Big Data book by Nathan Marz where he neatly explained the idea of embracing the fact that data processing may have different dynamics. For big amounts of data, it’s usually fine to process it on some cadence, in batches. For some use cases, we may need data faster, but we can live with it being imprecise for a short time. The idea was tempting, and some companies adopted it, yet the tooling was not there yet, and the approach was hard to maintain, as batch and speed processing required totally different sets of tooling. Nathan Marz also went a bit silent. But now he’s back.

Nathan Marz - How we reduced the cost of building Twitter at Twitter-scale by 100x

It sounds like he has been working on a large-scale Java development platform, which he called Rama. He just published a long article making a bold statement:

We built a Twitter-scale Mastodon instance from scratch in only 10k lines of code. This is 100x less code than the ~1M lines Twitter wrote to build and scale their original consumer product, which is very similar to Mastodon. (…)The instance has 100M bots posting 3,500 times per second at 403 average fanout to demonstrate its scale.

I’m not convinced yet, as it’s still more enigma and vision. The code is not open-sourced, but I’m curious to learn how his vision evolved. I think it’s an interesting read from the tech stack, architecture and product-building perspectives.
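
As a refresher on the Lambda Architecture mentioned above, here is a minimal Python sketch of the batch/speed split. It is not taken from Nathan Marz's book or from Rama; the `LambdaCounter` class and its method names are made up for illustration, and a real speed layer would usually be approximate and run on separate tooling.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LambdaCounter:
    """Toy page-view counter with a batch view and a speed view (hypothetical)."""

    # Append-only log of raw events: the source of truth.
    master_log: list = field(default_factory=list)
    # Precise view, recomputed from the whole log on some cadence.
    batch_view: Counter = field(default_factory=Counter)
    # Fast view covering only events the last batch run has not seen yet.
    speed_view: Counter = field(default_factory=Counter)

    def record(self, page: str) -> None:
        """Store the raw event and update the speed view right away."""
        self.master_log.append(page)
        self.speed_view[page] += 1

    def run_batch(self) -> None:
        """Recompute the batch view from scratch and reset the speed view."""
        self.batch_view = Counter(self.master_log)
        self.speed_view.clear()

    def query(self, page: str) -> int:
        """Serve a query by merging both views."""
        return self.batch_view[page] + self.speed_view[page]
```

The maintenance pain shows even in this toy: the same counting logic lives in both `record` and `run_batch`, and in a real system those two paths would be written against two different toolsets.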

If creating a gigantic artificial Mastodon server to prove the scale capabilities doesn’t sound interesting enough, check:

Engineer's Codex - How Instagram scaled to 14 million users with only 3 engineers

It’s a flashback from 2010 and 2011 showing how Instagram started and ran forward with its design. It’s yet another proof that choosing a boring stack is a win. They used:
- Python, Django,
- Postgres,
- connection pooling with PgBouncer and load balancing with AWS ELB,
- S3 and CloudFront for file storage,
- Redis and Memcached for cache,
- Sentry for monitoring.

Sounds easy? Simple is not easy, but it’s another example that choosing technologies you know, proper sharding techniques and focusing on business can take you far.

The same conclusion can be found in:

Cycle - Scaling GraphQL with Postgres - Lessons learned from our database timeout issues

It’s a nice article showing how to scale the GraphQL approach on Postgres. The final thought starts with:

All I said above is not rocket science. This is common sense, but often times we focus on cutting-edge solutions to solve what seem like very complex problems, while the root cause is actually much simpler.

and then the solution:

- Extensive use of dataloaders to batch and cache the query results
- Pagination of every potentially unlimited list, including all things related to view configuration
- Rework of the indexing logic, specifically for deeply linked data
- Setup of a queuing system to batch and mitigate intensive mutations
- Rate and complexity limits to avoid a single user/app using too many API ressources
- Custom subscription service to have a complete control of the data we publish
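
To make the first item on that list more concrete, here is a minimal sketch of the dataloader idea: collect the keys requested within one event-loop tick, fetch them in a single batch, and cache the results. It is written in Python with asyncio and is not Cycle's implementation; `DataLoader`, `fetch_users` and `main` are hypothetical names, and the batch function is assumed to return results in the same order as the keys it receives.

```python
import asyncio


class DataLoader:
    """Batches and caches key lookups made within one event-loop tick."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn  # async function: list of keys -> list of values
        self._cache = {}           # key -> Future holding the value
        self._queue = []           # keys waiting for the next batch
        self._task = None          # reference kept so the task is not collected
        self._scheduled = False

    def load(self, key):
        """Return an awaitable for `key`, reusing a cached one if present."""
        if key in self._cache:
            return self._cache[key]
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self._cache[key] = future
        self._queue.append(key)
        if not self._scheduled:
            # The task starts only after the current synchronous code yields,
            # so every load() made in the meantime joins the same batch.
            self._scheduled = True
            self._task = loop.create_task(self._dispatch())
        return future

    async def _dispatch(self):
        keys, self._queue = self._queue, []
        self._scheduled = False
        try:
            values = await self._batch_fn(keys)
        except Exception as exc:
            # Do not cache failures; propagate the error to every waiter.
            for key in keys:
                self._cache.pop(key).set_exception(exc)
            return
        for key, value in zip(keys, values):
            self._cache[key].set_result(value)


calls = []  # records each batch sent to the fake backend


async def fetch_users(ids):
    """Stand-in for one `WHERE id = ANY(...)` query (hypothetical)."""
    calls.append(list(ids))
    return [{"id": i, "name": f"user-{i}"} for i in ids]


async def main():
    loader = DataLoader(fetch_users)
    # Three lookups, two distinct keys: the backend is hit once.
    return await asyncio.gather(loader.load(1), loader.load(2), loader.load(1))
```

Running `asyncio.run(main())` resolves all three lookups while `fetch_users` is called a single time with the two distinct ids, which is the N+1 query reduction that dataloaders are used for.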

One more link related to Postgres. Oracle started to provide PostgreSQL on their cloud. Yes, they have a cloud, and yes, they want to host Postgres now. Going to Oracle Cloud is not something I’d recommend, but it’s interesting to see where this company has ended up. The funny thing is that they provide PG 14.9, while the latest is 16, but still.

Oracle - Experience the best of PostgreSQL with OCI Database with PostgreSQL

Speaking about quirks and weird things, Slack will pause normal business operations for one week on Monday because employees have fallen behind on internal training. It’s not a half-year late April Fool’s joke; that really happened.

Yahoo - Slack will pause normal business operations for one week on Monday because employees have fallen behind on internal training

That sounds like a Monty Python joke:

A large percent of Slack’s roughly 3,000 staff have neglected to hit the target, according to sources inside the company. And since Salesforce provides Trailhead to other businesses as a way to “upskill” employees, some speculate that the slackers at Slack make for bad optics.

In a message to employees in mid-September, Slack CEO Lidiane Jones wrote that the one week shutdown, dubbed “Ranger Week,” is intended to give everyone “dedicated time to make a lot of progress towards the goal.”

I don’t even know how to comment on that, but I don’t think I need to, as it speaks for itself.

Probably, John Cutler should add one more metaphor to his list:

John Cutler - 15 Metaphors for Waste In Product Development

Last week, I sent my article about the Strategy Pattern. It turned out that, at the same time, Jeremy D. Miller wrote his take on it.

Jeremy D. Miller - The Lowly Strategy Pattern is Still Useful

And that coincidence is a good thing, as together those perspectives and examples should give you the full picture.

Last but not least, the US National Security Agency and the Cybersecurity and Infrastructure Security Agency published their findings on the most common security misconfigurations.

NSA and CISA - Red and Blue Teams Share Top Ten Cybersecurity Misconfigurations

Those are:

1. Default configurations of software and applications
2. Improper separation of user/administrator privilege
3. Insufficient internal network monitoring
4. Lack of network segmentation
5. Poor patch management
6. Bypass of system access controls
7. Weak or misconfigured multifactor authentication (MFA) methods
8. Insufficient access control lists (ACLs) on network shares and services
9. Poor credential hygiene
10. Unrestricted code execution

The points are, of course, not extremely surprising. Still, the report is interesting as it shows not only general information but also examples and references to other materials about how to deal with those issues.

Check also other links!
Oskar

p.s. I invite you to join the paid version of Architecture Weekly. It already contains the exclusive Discord channel for subscribers (and my GitHub sponsors), monthly webinars, etc. It is a vibrant space for knowledge sharing. Don’t wait to be a part of it!

p.s.2. Ukraine is still under brutal Russian invasion. A lot of Ukrainian people are hurt, without shelter and need help. You can help in various ways, for instance, directly helping refugees, spreading awareness, and putting pressure on your local government or companies. You can also support Ukraine by donating, e.g. to the Ukraine humanitarian organisation, Ambulances for Ukraine or Red Cross.

Architecture

DevOps

Databases

Azure

Martin Thwaites - Creating an AKS cluster with WebApplication Routing using Pulumi
