Architecture Weekly #186 - 1st July 2024
Welcome to the new week!
Regular expressions are one of the classic examples of hate and hate relationships. Yes, it’s not a typo; hate and hate. Do you know anyone who loves or knows how to write moderately complex regex? And can they keep their skill for longer than two weeks without forgetting how to do it? Maybe this whole wave of Large Language Models is about having someone who will show us how to write regexes. Maybe we hate regular expressions so much that we don’t care about hallucinations.
Still, undeniably, regular expressions are useful and powerful. Let me show you an example, but be careful; I warned you already!
In my recent article, I showed an example of using regular expressions to filter EventStoreDB catch-up subscriptions by event types. Thanks to that, you can reduce network traffic by getting notifications only about the new events you actually care about.
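To make the idea more tangible, here is a minimal sketch of filtering events by type with a regular expression. It runs purely in memory and doesn't use the EventStoreDB client API at all; the event type names, the pattern and the helper functions are made up for illustration.

```python
import re

# Hypothetical pattern: accept only event types that start with
# "ShoppingCart" or "Order" (names invented for this illustration).
EVENT_TYPE_FILTER = re.compile(r"(ShoppingCart|Order)\w+")


def matches_filter(event_type: str) -> bool:
    """Return True when the whole event type name matches the pattern."""
    return EVENT_TYPE_FILTER.fullmatch(event_type) is not None


def filter_events(events: list[dict]) -> list[dict]:
    """Keep only the events whose 'type' passes the regex filter."""
    return [event for event in events if matches_filter(event["type"])]


# Hypothetical sample of events arriving on a subscription.
incoming = [
    {"type": "ShoppingCartOpened"},
    {"type": "PaymentReceived"},
    {"type": "OrderPlaced"},
    {"type": "$metadata"},
]

for event in filter_events(incoming):
    print(event["type"])
```

In a real subscription, the filter would be evaluated on the server side, which is where the network traffic savings come from; the sketch only shows the matching logic.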
Speaking of EventStoreDB, their blog also has an interesting case study written by one of their customers on how they combined Event Sourcing capabilities with Machine Learning. It’s an intriguing, detailed write-up showing how they translated a business use case into this mixture:
And as we’re into Event-Driven solutions, check a nice list of common misconceptions around the guarantees you may expect from them:
Sometimes, I get the feeling that I could rename this newsletter to Supply Chain Attack Weekly. Those types of attacks are getting so popular and spectacular. We talked about the tooling infiltration in the SolarWinds case and the OSS maintainer injection with the xz library; today, I have the next variation.
The rapid pace of new improvements in JavaScript tooling required the development of custom polyfills to align implementations where environments (e.g., browser type) can’t keep up with standard enhancements. One of the most popular was Polyfill JS. It was distributed in multiple ways; one of the most popular was through its own CDN: cdn.polyfill.io.
Yet, in February, an unexpected thing happened: a Chinese company bought the project together with the CDN domain. And now, bang! It turned out the domain was injecting malware. It was enough to include the link to the script from the CDN to be affected.
The issue was found and explained by Sansec, a company specialising in e-commerce security, and quickly afterwards by Bleeping Computer.
It seems that even more CDNs were used as attack vectors (even Cloudflare was probably used). Both the Sansec and Bleeping Computer sites were targeted by DDoS attacks, either to slow down the spread of the news about the issue or as revenge.
The two most popular CDNs, Cloudflare and Fastly, are now automatically replacing the malicious redirects:
All of that points to a Chinese hacker group. Funnily enough, the owner claims that no malware was distributed…
Ok, but as it’s the Architecture Weekly, what should we, the people responsible for architecture, get out of that? Security should definitely be one of our main concerns and part of the design and implementation process. Obviously, we should ensure that our dependencies are continuously updated. We should also invest in being able to deploy quickly, to guard ourselves if something like that happens. And we should use trustworthy CDNs. Of course, trust is not easy to verify.
It’s also interesting how our decentralised web is decentralised only in theory. Having central trustworthy vendors makes it harder to commit a breach, as they’re investing heavily in security, but… if the breach is committed, it’ll likely spread much, much further and faster.
Read also about another dangerous breach made by Russian hackers:
Cyber wars are definitely real nowadays and tightly intertwined with geopolitics.
Jumping to other industry news. Uber wrote that they're moving their Big Data and Machine Learning to Google Cloud.
The article itself is not that interesting. It's mostly marketing news, but what’s most interesting there is that they’re still using HDFS, Hadoop, and Spark, which have gone out of fashion recently. They’re planning to move to Google Cloud services like Google BigQuery. It’ll also be interesting to see whether taking on such a heavy workload from such a popular platform will help Google increase adoption of its ML/AI offering, which has been losing publicity amid GenAI advances and marketing.
Datadog released their annual report on Cloud Costs:
Here are the most important points from it:
Spending on GPU instances now makes up 14 percent of compute costs.
Arm spending as a proportion of compute costs has doubled in the past year.
Container costs comprise one third of EC2 spend.
More than 80 percent of container spend is wasted on idle resources.
Previous-generation technologies are still widely used.
Cross-AZ traffic makes up half of data transfer costs.
A decreasing percentage of organizations use commitment-based discounts.
More than four times as many organizations use Savings Plans vs. Reserved Instances.
The amount of money wasted on idle containers is crazy. Also, the cross-AZ costs mean that we’re getting better at designing with redundancy, and cloud providers know how to charge us for it. I’m wondering what the real GPU usage would be if it wasn’t so hard to get them.
Check also interesting write-ups on documenting our architectures:
And check the free ebook from ScyllaDB on Database performance:
I haven’t read it fully yet, but from what I’ve skimmed so far, it’s a decent read.
Check also other links!
Cheers
Oskar
p.s. I invite you to join the paid version of Architecture Weekly. It already contains the exclusive Discord channel for subscribers (and my GitHub sponsors), monthly webinars, etc. It is a vibrant space for knowledge sharing. Don’t wait to be a part of it!
p.s.2. Ukraine is still under brutal Russian invasion. A lot of Ukrainian people are hurt, without shelter and need help. You can help in various ways, for instance, directly helping refugees, spreading awareness, and putting pressure on your local government or companies. You can also support Ukraine by donating, e.g. to the Ukraine humanitarian organisation, Ambulances for Ukraine or Red Cross.
Architecture
Indu Alagarsamy - Document your product and software architecture decisions.
Loïc Carr - Falsehoods Software Developers Believe About Event-Driven Systems
Uber - Modernizing Uber’s Batch Data Infrastructure with Google Cloud Platform
📺 Michael Staib - Why you should consider using persisted queries with GraphQL
Databases
Oskar Dudycz - Filtering EventStoreDB subscriptions by event types
F. Cardeneti Mendes, P. Sarna, P. Emelyanov, C. Dunlop - Database Performance at Scale
DevOps
AI
Kaan Can Fidan - How Event Sourcing Can Power Machine Learning
Firefox - Choose how you want to navigate the web with Firefox
AWS
.NET
linux-dev-certs - global tool that creates and installs a developer certificate on Linux
Microsoft - Announcement: Swashbuckle.AspNetCore is being removed in .NET 9
David Fowler - "Eventing framework" postponed and won't be a part of .NET 9
Coding Life
Industry
PC World - Microsoft blocks Windows 11 workaround that enabled local accounts
ArsTechnica - Internet Archive forced to remove 500,000 books after publishers’ court win
Security
Bleeping Computer - Polyfill.io, BootCDN, Bootcss, Staticfile attack traced to 1 operator
The Register - Polyfill.io owner punches back at 'malicious defamation' amid domain shutdown
Guardian - NHS patients affected by cyber-attack may face six-month wait for blood test