Ricardo Castro @mccricardo - Bluesky Profile

You Should Write An Agent They're like riding a bike: easy, and you don't get it until you try.

"You Should Write An Agent" by Thomas Ptacek

fly.io/blog/everyon...

10.11.2025 17:01 — 👍 0 🔁 0 💬 0 📌 0

Faster root cause for slow traces with ClickStack Event Deltas Read how ClickStack's improved Event Deltas make it effortless to pinpoint the root causes of performance outliers in observability data - turning complex trace analysis into instant, actionable…

"Faster root cause for slow traces with ClickStack Event Deltas" by Dale McDiarmid

clickhouse.com/blog/%20fast...

10.11.2025 13:01 — 👍 0 🔁 0 💬 0 📌 0

Revision 149 Articles and updates:

Revision 149 is out!

@koslib.com

#devops #sre #platformengineering

embracerisk.substack.com/p/revision-149

10.11.2025 10:23 — 👍 2 🔁 0 💬 0 📌 0

How Databricks Implemented Intelligent Kubernetes Load Balancing The Databricks Engineering Team needed something smarter: a Layer 7, request-level load balancer that could react dynamically to real service conditions instead of relying on connection-level routing…

"How Databricks Implemented Intelligent Kubernetes Load Balancing" by ByteByteGo

blog.bytebytego.com/p/how-databr...

09.11.2025 18:01 — 👍 0 🔁 0 💬 0 📌 0

Announcing Istio 1.28.0 Istio 1.28 Release Announcement.

"Announcing Istio 1.28.0"

istio.io/latest/news/...

08.11.2025 18:01 — 👍 0 🔁 0 💬 0 📌 0

Crossplane’s Graduation Announcement Graduation marks Crossplane’s readiness for widespread use and its evolution from a control plane framework to groundwork for intelligent, secure, and scalable cloud operations and platform…

"Cloud Native Computing Foundation Announces Graduation of Crossplane"

www.cncf.io/announcement...

07.11.2025 17:01 — 👍 0 🔁 0 💬 0 📌 0

TicketOps is perfectly fine for relatively stable stuff.

At scale, it breaks.

07.11.2025 14:51 — 👍 0 🔁 0 💬 0 📌 0

SQL expressions in Grafana: Combine and manipulate data from multiple sources | Grafana Labs SQL expressions are a versatile and powerful feature that opens up all sorts of creative possibilities by manipulating and combining data from different data sources.

"SQL expressions in Grafana: Combine and manipulate data from multiple sources" by Sam Jewell and Kyle Brandt

grafana.com/blog/2025/10...

07.11.2025 13:01 — 👍 0 🔁 0 💬 0 📌 0

In the dawn of a new wave of AI, if you're still thinking about infrastructure as code and not infrastructure as software, you're living in the past.

07.11.2025 12:56 — 👍 0 🔁 0 💬 0 📌 0

SRE is much more than just incident response.

I thought this needed to be highlighted since many are talking about "AI SRE", which mostly focuses on incident response.

06.11.2025 18:03 — 👍 0 🔁 0 💬 0 📌 0

OTel Updates: Consistent Probability Sampling Fixes Fragmented Traces | Last9 One sampling decision, propagated everywhere. OpenTelemetry's Consistent Probability Sampling fixes fragmented traces across services.

"OTel Updates: Consistent Probability Sampling Fixes Fragmented Traces" by Anjali Udasi

last9.io/blog/consist...

06.11.2025 13:01 — 👍 1 🔁 0 💬 0 📌 0

Consistency is underrated.

Many people believe in a "big bang" event that propels their career. And while there are certain cases where that's true, consistency is usually a better investment of your time.

Invest in being consistent and you'll reap rewards.

05.11.2025 18:02 — 👍 0 🔁 0 💬 0 📌 0

Introducing Agent HQ: Any agent, any way you work At Universe 2025, GitHub's next evolution introduces a single, unified workflow for developers to be able to orchestrate any agent, any time, anywhere.

"Introducing Agent HQ: Any agent, any way you work" by Kyle Daigle

github.blog/news-insight...

05.11.2025 17:01 — 👍 0 🔁 0 💬 0 📌 0

How to Use AWS CloudWatch Application Signals with OpenTelemetry on ECS Fargate and Lambda This guide shows how to connect CloudWatch Application Signals with OpenTelemetry. See simple steps for ECS Fargate and Lambda. Example code included. Get clear metrics and traces fast.

"Effortless Observability - Integrating CloudWatch Application Signals with OpenTelemetry" by Tobias Schmidt

awsfundamentals.com/blog/cloudwa...

05.11.2025 13:01 — 👍 0 🔁 0 💬 0 📌 0

Go and enhance your calm: demolishing an HTTP/2 interop problem HTTP/2 implementations often respond to suspected attacks by closing the connection with an ENHANCE_YOUR_CALM error code. Learn how a common pattern of using Go's HTTP/2 client can lead to unintended…

"Go and enhance your calm: demolishing an HTTP/2 interop problem" by Lucas Pardue and Zak Cutner

blog.cloudflare.com/go-and-enhan...

04.11.2025 17:04 — 👍 0 🔁 0 💬 0 📌 0

From Signals to Reliability: SLOs, Runbooks and Post-Mortems Build reliability with SLOs, runbooks and post-mortems. Turn observability into systematic incident response and learning. Practical examples for Kubernetes environments.

"From Signals to Reliability: SLOs, Runbooks and Post-Mortems" by Fatih Koç

fatihkoc.net/posts/sre-ob...

04.11.2025 13:02 — 👍 0 🔁 0 💬 0 📌 0

Reliability, like any other feature, needs to be prioritised accordingly.

There will be times when reliability work will be the priority. Other times, product features will be the priority.
And so on.

If one topic massively overshadows all the others, problems will arise.

03.11.2025 18:03 — 👍 0 🔁 0 💬 0 📌 0

Quick thoughts on the recent AWS outage AWS recently posted a public write-up of the us-east-1 incident that hit them this past Monday. Here are a couple of quick thoughts on it. Reliability → Automation → Complexity → New failure modes …

"Quick thoughts on the recent AWS outage" by Lorin Hochstein

surfingcomplexity.blog/2025/10/25/q...

03.11.2025 17:02 — 👍 0 🔁 0 💬 0 📌 0

I've seen a few of those and I've built a few as well 😜

Once, my Tech Lead at the time architected an 8-microservice system for something that wasn't complex, that our company wasn't even sure we were going to pursue, and that, at most, would have a couple of hundred users.

03.11.2025 14:52 — 👍 1 🔁 0 💬 0 📌 0

For platforms to be valuable they need to be force multipliers.

That means being more than the sum of their parts.

03.11.2025 13:02 — 👍 0 🔁 0 💬 0 📌 0

You always need to take roles and titles with a grain of salt.

I often meet DevOps/SREs/PlatEng all doing very similar jobs.

I also often meet groups of DevOps doing quite different jobs. The same applies to SREs and PlatEngs.

Context is crucial.

31.10.2025 18:03 — 👍 0 🔁 0 💬 0 📌 0

Some people look down on quality assurance and security, or think of them as annoyances.

In the age of AI, if they continue to have that perspective, they'll have a rude awakening.

31.10.2025 13:04 — 👍 0 🔁 0 💬 0 📌 0

Important: hire adults.

Also important: treat them like adults.

31.10.2025 12:50 — 👍 0 🔁 0 💬 0 📌 0

Strive for civil discourse on your teams.

Some of the most creative solutions I've seen were born from discussions between people with completely different views on how to approach a problem.

Promoting diversity lays a good foundation for this to happen organically.

31.10.2025 09:46 — 👍 0 🔁 0 💬 0 📌 0

People that say "that's a DevOps team problem" have absolutely no clue what DevOps is about.

30.10.2025 18:02 — 👍 1 🔁 0 💬 0 📌 0

For complex issues, I like runbooks because they allow me to really understand the problem before trying to automate it.

In the long run, for most issues, I strive for automation. But starting with runbooks allows me to understand the quirks before automating.

30.10.2025 13:05 — 👍 0 🔁 0 💬 0 📌 0

More often than not, when people reach out to me at events to ask "should I use Kubernetes", the answer is "no".

That's because, usually, people approach it from the tech side, not from a problem they need to fix.

Focus on the problem and only apply tech that helps you address it.

29.10.2025 18:06 — 👍 0 🔁 0 💬 0 📌 0

My face when I hear people say Platform Engineering replaces DevOps.

Let's be clear, Platform Engineering *enables* DevOps.

If it doesn't, something's wrong.

29.10.2025 14:25 — 👍 0 🔁 0 💬 0 📌 0

Whether you like it or not, reliability and security aren't non-functional requirements.

They're features!

Imagine storing your money in a non-secure bank.

And, as features, they need to be prioritized accordingly.

28.10.2025 18:05 — 👍 0 🔁 0 💬 0 📌 0

The best on-call is when you don't get called.

For that to happen, you need to put some serious effort into it.

28.10.2025 16:53 — 👍 0 🔁 0 💬 0 📌 0

Ricardo Castro

Latest posts by mccricardo.bsky.social on Bluesky
