Architecture Weekly #170 - 11th March 2024
Welcome to the new week!
Sometimes, you feel you've learned more about a problem than you ever wanted to. I felt that when I fixed the ECMAScript Modules compatibility in Emmett. The struggle was not the complexity of the fix itself, but getting the first reproducible failure. As you know, that's the first and, too often, the hardest step.
As always, so as not to forget, I wrote down all my notes on how I fixed it. Even if you're not in JS/TS land, I hope the explained approach gives you a general mental framework for tackling compatibility issues in other environments. Read more:
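To give a flavour of what "the first reproducible failure" can mean in practice, here's a minimal sketch (the package name and export below are placeholders, and this is not necessarily how I reproduced it in Emmett): a test that loads the published package through a dynamic import, going through its exports map the same way an ESM consumer's runtime would, so a broken ESM entry point fails right there.

```typescript
// Hypothetical reproduction sketch: import the published package from an ESM
// context, exactly like a consumer would. "some-lib" and "someExport" are
// placeholders, not Emmett's real API.
import { test } from 'node:test';
import assert from 'node:assert';

test('can be imported from an ESM context', async () => {
  const imported = await import('some-lib');

  // If the package's ESM entry point is broken (wrong file extension, missing
  // "exports" condition, CJS-only build), the import above throws and the
  // failure is finally reproducible.
  assert.strictEqual(typeof imported.someExport, 'function');
});
```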
Read also Mathias Verraes' excellent write-up on how to tackle bug fixes systematically:
Google released a new whitepaper, this time on how they tackle the Secure by Design approach.
Google's strategy shifts the focus of software security from individual developers to the broader development ecosystem. They explain that the potential for vulnerabilities is significantly reduced by embedding security directly into the development tools and languages, such as enforcing memory safety using languages like Rust. The concept of 'safe coding' effectively mandates that certain security practices are inherently followed due to the architectural and design choices of the development tools themselves.
Additionally, Google introduces 'well-lit paths'—predefined routes through the development process that utilize vetted libraries and frameworks, ensuring developers are naturally guided towards more secure coding practices without requiring extensive security knowledge. This method leverages the ecosystem to minimize common security risks by design rather than relying on post-development security patches or interventions. This approach represents a shift towards a more systemic security integration within the software development lifecycle, aiming to reduce vulnerabilities through the environment developers work within rather than through individual actions alone.
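Memory safety needs language-level support, but the broader "safe coding" and "well-lit paths" ideas also apply to everyday application code. Here's a minimal TypeScript sketch of the principle (the names are mine, not Google's actual libraries): the vetted API is the only way to produce a value the sink accepts, so the insecure path doesn't even compile.

```typescript
// A branded type: the only way to obtain a SafeHtml is through the vetted
// constructor below, so "safe" is a property of the type, not of developer
// discipline. Illustrative names, not Google's real libraries.
type SafeHtml = string & { readonly __brand: 'SafeHtml' };

const escapeHtml = (text: string): SafeHtml =>
  text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;') as SafeHtml;

// The "well-lit path": the rendering sink only accepts SafeHtml, so raw user
// input cannot reach innerHTML by construction.
const render = (element: HTMLElement, html: SafeHtml): void => {
  element.innerHTML = html;
};

// render(container, userInput);             // compile error: string is not SafeHtml
// render(container, escapeHtml(userInput)); // fine: goes through the vetted path
```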
Interestingly, the US White House presented its recommendations on this topic:
The White House document and Google's approach advocate for memory-safe programming to tackle software vulnerabilities, highlighting a shared belief in preemptive security measures. However, the White House document calls for adopting specific cybersecurity metrics, such as vulnerability frequency and severity, diverging from Google's broader focus on secure development practices. It explicitly mentions employing formal methods like sound static analysis and model checking to verify code security before deployment, providing a concrete strategy for security integration not specifically outlined by Google.
Additionally, the White House introduced the idea of enhancing security through memory-safe hardware solutions, such as memory-tagging extensions. This suggests a comprehensive approach to cybersecurity, incorporating both software and hardware solutions. A key difference lies in the emphasis on quantifiable security improvements, with the White House advocating for measurable security outcomes. This approach aims to establish a more accountable framework for cybersecurity, broadening the scope beyond software to include policy and hardware considerations, unlike Google's primary focus on development environments and practices.
Both documents are well thought out. Yet, we should remember that we’re not Google. I’ve seen the approach of recommended and core libraries skewed into a pastiche of security. There’s a broad spectrum between “YOLO, use whatever you like” and “you can only use this library.” I agree with the general sentiment, yet we should align that process with our capabilities. Still, security is not something you’d like to make rotten compromises on.
I think an important part is ensuring ownership and accountability in teams. So, recommend practices and build an environment that promotes them, but allow teams to diverge as long as they can own their custom solution and prove that it’ll be sustainable.
Security by default is essential today, when companies sell and push our data without control or a second thought. Read about the latest example of that, what Tumblr and WordPress did:
Speaking of costs: Cast.ai published their report on Kubernetes costs. Not surprisingly, it seems that we’re overprovisioning our clusters.
They wrote:
In clusters with 50 CPUs or more, only 13% of the CPUs that were provisioned were utilized, on average. Memory utilization was slightly higher at 20%, on average.
It’s intriguing, as cloud and container technologies were meant to improve resource utilisation and cut costs, but we’re still falling into the same trap. The conclusion is also saddening:
The trend appears unlikely to change in the near future given the widening gap between provisioned and requested CPUs between 2022 and 2023 (37% versus 43%). As more companies adopt Kubernetes, cloud waste will likely continue to grow.
Of course, remember that the report was prepared by a vendor whose tool is built to detect exactly this kind of underutilisation, so it’s in their interest to prove the point. Plus, they only analysed clusters they had access to (still, a few thousand of them). So, as always, think for yourself.
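To put the 13% figure into perspective, here's a back-of-the-envelope calculation. The per-vCPU price is an assumption I picked for illustration; it's not from the report.

```typescript
// Rough illustration of what 13% CPU utilisation means in money.
// The price per vCPU-month is an assumed, illustrative cloud rate.
const provisionedCpus = 50;
const utilisation = 0.13;        // average from the Cast.ai report
const pricePerCpuMonth = 30;     // assumption, not a figure from the report

const idleCpus = provisionedCpus * (1 - utilisation);  // 43.5 vCPUs sitting idle
const monthlyWaste = idleCpus * pricePerCpuMonth;       // ~$1,305 per 50-CPU cluster

console.log(`~${idleCpus} idle vCPUs, roughly $${monthlyWaste.toFixed(0)}/month wasted`);
```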
Speaking of CPUs, memory, and utilisation: check out a great case study (with a lot of technical details) on how Allegro (the biggest Polish e-commerce platform) troubleshot Kafka tail latency with eBPF:
Getting back to Google, now on the less positive side. Some time ago, I wrote an article with my thoughts on diversity issues in IT (read more in Women in IT). Now we have another unfortunate example:
SkyNews covers:
Google has agreed to pay $118m (£96m) to approximately 15,500 employees to settle a lawsuit over gender discrimination in pay.
The claim against the tech giant was first brought in 2017 by former employees, all of whom worked for Google in California, and who alleged that they were being paid less than their male counterparts.
It could be treated as both a negative and a positive sign. The negative is obvious, but the positive is that something is slowly changing in our industry. Still, read the comment from Google's spokesperson:
"While we strongly believe in the equity of our policies and practices, after nearly five years of litigation, both sides agreed that resolution of the matter, without any admission or findings, was in the best interest of everyone, and we're very pleased to reach this agreement.
"We are absolutely committed to paying, hiring and levelling all employees fairly and equally and for the past nine years we have run a rigorous pay equity analysis to make sure salaries, bonuses and equity awards are fair,"
So yeah, again: Sorry, Not Sorry… There’s still a long way in front of us.
Ian Cartwright, Rob Horn, and James Lewis presented a new legacy modernisation technique they called Event Interception:
Unlike the broader Strangler Fig Pattern, which incrementally replaces or builds around old systems, Event Interception works on the flow of events between components: it intercepts and, where needed, reroutes events to new functionality. The technique is instrumental in scenarios where making direct changes to the legacy system is impractical, offering a path to introduce new components by using existing integration points such as messaging systems or API gateways. However, its success relies on the availability and accessibility of these integration points, and it introduces an extra layer of complexity to the system architecture, which could complicate maintenance and debugging.
The value of Event Interception lies in its ability to facilitate the iterative addition of new features, aligning with agile practices by minimizing the risks associated with large-scale system overhauls. For architects, this means a strategic tool for gradually transitioning to more modern, service-oriented architectures while maintaining system integrity. Yet, this approach requires a careful evaluation of the legacy system to ensure it's a good fit, considering the ease of identifying integration points and managing the added complexity. When applied thoughtfully, Event Interception can smooth the path towards system modernization. Still, it demands detailed planning and a solid understanding of the existing system's architecture to navigate potential challenges and maintain system reliability.
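A minimal sketch of how such an interception point could look in code (names and types are illustrative, not taken from the article): the interceptor wraps the existing integration point, routes selected event types to the new component, and lets everything else keep flowing to the legacy handler.

```typescript
// Illustrative types; in practice these would match your messaging system.
type IntegrationEvent = { type: string; payload: unknown };
type EventHandler = (event: IntegrationEvent) => Promise<void>;

// The interceptor wraps the existing subscription: intercepted event types are
// rerouted to the new component, everything else keeps flowing to the legacy system.
const intercept =
  (
    legacyHandler: EventHandler,
    modernHandler: EventHandler,
    interceptedTypes: Set<string>,
  ): EventHandler =>
  async (event) => {
    if (interceptedTypes.has(event.type)) {
      await modernHandler(event);
      return;
    }

    await legacyHandler(event);
  };

// Usage: swap the legacy handler for the interceptor on the existing subscription, e.g.
// messageBus.subscribe(intercept(legacy.handle, invoicing.handle, new Set(['InvoiceIssued'])));
```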
There are two types of people: those who do backups and those who will be doing backups. Okay, there’s also a third group: scientific paper publishers.
Martin Eve analysed the archiving strategies of scientific paper publishers. And:
When Eve broke down the results by publisher, less than 1 percent of the 204 publishers had put the majority of their content into multiple archives. (The cutoff was 75 percent of their content in three or more archives.) Fewer than 10 percent had put more than half their content in at least two archives. And a full third seemed to be doing no organized archiving at all.
At the individual publication level, under 60 percent were present in at least one archive, and over a quarter didn't appear to be in any of the archives at all. (Another 14 percent were published too recently to have been archived or had incomplete records.)
The good news is that large academic publishers appear to be reasonably good about getting things into archives; most of the unarchived issues stem from smaller publishers.
Gulp…
Check also other links!
Cheers
Oskar
p.s. I invite you to join the paid version of Architecture Weekly. It already contains the exclusive Discord channel for subscribers (and my GitHub sponsors), monthly webinars, etc. It is a vibrant space for knowledge sharing. Don’t wait to be a part of it!
p.s.2. Ukraine is still under brutal Russian invasion. A lot of Ukrainian people are hurt, without shelter and need help. You can help in various ways, for instance, directly helping refugees, spreading awareness, and putting pressure on your local government or companies. You can also support Ukraine by donating, e.g. to the Ukraine humanitarian organisation, Ambulances for Ukraine or Red Cross.
Architecture
DevOps
Allegro Tech Blog - Unlocking Kafka's Potential: Tackling Tail Latency with eBPF
The Register - Companies flush money down the drain with overfed Kubernetes cloud clusters
Databases
AI
Elixir
Java
.NET
Andrew Lock - An introduction to the heap data structure and .NET's priority queue
Antão Almada - Measuring .NET Performance: Unleashing the Power of BenchmarkDotNet
Node.js
TypeScript
Coding Life
Mathias Verraes - How to Fix a Bug: Tests, Hypotheses, Timeboxes
Hillel Wayne - How to argue for something without any scientific evidence
Management
Industry
SkyNews - Google agrees $118m payout to female staff who were paid less than male colleagues
The Record - After decades of memory-related software bugs, White House calls on industry to act
404 Media - Tumblr and WordPress to Sell Users’ Data to Train AI Tools
ArsTechnica - Study finds that we could lose science if publishers go bankrupt