"You Should Write An Agent" by Thomas Ptacek
fly.io/blog/everyon...
@mccricardo.bsky.social
Senior Principal Engineer, tech speaker & writer, @DevOpsPorto and @DevOpsDaysPT, @CDeliveryFdn Ambassador, martial arts amateur, and metal lover. Opinions are my own. mccricardo.com
"You Should Write An Agent" by Thomas Ptacek
fly.io/blog/everyon...
"Faster root cause for slow traces with ClickStack Event Deltas" by Dale McDiarmid
clickhouse.com/blog/%20fast...
Revision 149 is out!
@koslib.com
#devops #sre #platformengineering
embracerisk.substack.com/p/revision-149
"How Databricks Implemented Intelligent Kubernetes Load Balancing" by ByteByteGo
blog.bytebytego.com/p/how-databr...
"Announcing Istio 1.28.0"
istio.io/latest/news/...
"Cloud Native Computing Foundation Announces Graduation of Crossplane"
www.cncf.io/announcement...
TicketOps is perfectly fine for relatively stable stuff.
At scale, it breaks.
"SQL expressions in Grafana: Combine and manipulate data from multiple sources" by Sam Jewell and Kyle Brandt
grafana.com/blog/2025/10...
In the dawn of a new wave of AI, if you're still thinking about infrastructure as code and not infrastructure as software, you're living in the past.
07.11.2025 12:56 β π 0 π 0 π¬ 0 π 0SRE is much more than just incident response.
I thought this needed to be highlighted since many are talking about "AI SRE", which mostly focuses on incident response.
"OTel Updates: Consistent Probability Sampling Fixes Fragmented Traces" by Anjali Udasi
last9.io/blog/consist...
Consistency is underrated.
Many people believe in a "big bang" event that propels their career. And while there are certain cases where that's true, consistency is usually a better investment of your time.
Invest in being consistent and you'll reap rewards.
"Introducing Agent HQ: Any agent, any way you work" by Kyle Daigle
github.blog/news-insight...
"Effortless Observability - Integrating CloudWatch Application Signals with OpenTelemetry" by Tobias Schmidt
awsfundamentals.com/blog/cloudwa...
"Go and enhance your calm: demolishing an HTTP/2 interop problem" by Lucas Pardue and Zak Cutner
blog.cloudflare.com/go-and-enhan...
"From Signals to Reliability: SLOs, Runbooks and Post-Mortems" by Fatih KoΓ§
fatihkoc.net/posts/sre-ob...
Reliability, like any other feature, needs to be prioritised accordingly.
There will be times where reliability work will be the priority. Other times, product features will be the priority.
And so on.
If one topic massively overshadows all the others, problems will arise.
"Quick thoughts on the recent AWS outage" by Lorin Hochstein
surfingcomplexity.blog/2025/10/25/q...
I've seen a few of those and I've built a few as well π
Once, my Tech Lead at the time, architected an 8 microservice system for something not complex that our company wasn't even sure we were going to pursue, and that, at most, would have a couple of hundred users.
For platforms to be valuable they need to be force multipliers.
That means being more than the sum of its parts.
You always need to take roles and titles with a grain of salt.
I often meet DevOps/SREs/PlatEng all doing very similar jobs.
I also often meet groups of DevOps doing quite different jobs. The same applies for SREs and PlatEngs.
Context is crucial.
Some people look down on or think of quality assurance and security as annoyances.
In the age of AI, if they continue to have that perspective, they'll have a rude awakening.
Important: hire adults.
Also important: treat them like adults.
Strive for civil discourse on your teams.
Some of the most creative solutions I've seen were born from discussions between people with completely different views on how to approach a problem.
Promoting diversity lays a good foundation for this to happen organically.
People that say "that's a DevOps team problem" have absolutely no clue what DevOps is about.
30.10.2025 18:02 β π 1 π 0 π¬ 0 π 0For complex issues, I like runbooks because they allow me to really understand the problem before trying to automate it.
In the long-run, for most issues, I strive for automation. But starting with runbooks allows me to understand the quirks before automation.
More often than not, when people reach out to me at events to ask "should I use Kubernetes", the answer is "no".
That's because, usually, people approach it from the tech side, not a problem they need fixing.
Focus on the problem and only apply tech that helps you address it.
My face when I hear people say Platform Engineering replaces DevOps.
Let's be clear, Platform Engineering *enables* DevOps.
If it doesn't, something's wrong.
Whether you like it or not, reliability and security aren't non-functional requirements.
They're features!
Imagine storing your money in a non-secure bank.
And, as features, they need to be prioritized accordingly.
The best on-call is when you don't get called.
For that to happen, you need to put some serious effort into it.