While the actual usefulness of LLMs is still debated, one thing is certain: engineers are being asked to do more with less.
By @sylvainkalache.bsky.social
@sylvainkalache.bsky.social
Leading the AI Labs @rootly.com - Former LinkedIn SRE and Founder of Holberton School
Join us for a roundtable conversation "From Weak Signals to Confident Fixes"
We'll tackle everything from cutting alert noise to underrated and overrated alert signals, as well as bug validation workflows before triggering an incident.
Happening tomorrow at 12PM ET lu.ma/aq5mp94m
Thank you for having me.
17.06.2025 19:14
MLOps got you down? Sick of wrestling with Dockerfiles?
@sylvainkalache.bsky.social unpacks a streamlined approach to MLOps, showing you how to automate your training pipeline with a clean, reproducible, and cloud-native workflow.
🧲 Microsoft and Google report that AI writes 30% of their code. Is AI-assisted coding becoming an incident magnet?
Are SREs about to be overwhelmed with the volume and complexity of incidents?
That's what I explore in my latest for @leaddev.com: leaddev.com/software-qua...
SREs: not all traffic drops are outages. Sometimes it's Diwali. Or the World Cup.
@rootly.com is digging into that with LLMs. Check out what their Head of DevRel @sylvainkalache.bsky.social had to say about it at KubeCon.
3️⃣ An older version, Llama 3.3 70B-Versatile, performed even better than Llama 4 Maverick.
The benchmark, designed by the @rootly.com AI Labs, tests models' ability to pick the correct pull request for a given bug description. The full findings: rootly.com/blog/llama-4...
2️⃣ Second, we wanted to test it against models tailored for coding tasks. Unsurprisingly, it performed well below them. Llama 4 Maverick achieved only a 70% accuracy score. Alibaba's Qwen2.5-Coder-32B ranked best (90%), closely followed by OpenAI's o3-mini (89%).
14.04.2025 16:22
1️⃣ First, we wanted to reproduce Meta's findings that Llama 4 outperformed GPT-4o, Gemini 2.0 Flash, and DeepSeek v3.1. We found the exact opposite.
It came last, 6% behind the next model up (DeepSeek) and 18% behind the top performer (GPT-4o).
There's been a lot of controversy with the launch of Llama 4 and its performance. So we decided to do our own benchmark, and here is what we found:
14.04.2025 16:22
Just finished building the @rootly.com MCP server: go from incident to resolution in under a minute. ⏱️
-Plug it into your IDE
-Import an incident in Cursor's chat
-Cursor investigates the issue based on the metadata
-Cursor suggests a fix; you review and save
github.com/Rootly-AI-La...
Or just meet for a coffee; DM me.
06.03.2025 15:24
Interview guests wanted: speak about your favorite AI tool.
scheduler.default.com/7992/member/...
Join our Code to Clarity event: The Future of Monitoring, Observability, and Reliability with our friends at Checkly & @coralogix.bsky.social
lu.ma/fhl522f4
Are you going to #SREcon Americas? I'll be there with @rootly.com, so let's meet! (4 ways)
🕹️ Join our SREcon Arcade Happy Hour with our friends @sentry.io, Stanza, and Cortex
lu.ma/hid3pwq4
Join me for the next @rootly.com Roundtable to discuss AI in Incident Management.
Note: we won't share a video recording of the event. Like Las Vegas, what happens at the roundtable stays at the roundtable.
lu.ma/march_rootly...
We are hiring across the board and are looking for contractors for the AI Lab. Shoot me a DM if you are interested!
19.02.2025 20:07
The AI Lab's mission is to leverage AI to improve incident management and systems operations. We'll be building POCs, open-sourcing tools, and benchmarking models.
19.02.2025 20:07
👨‍💻 Joining Rootly feels like the perfect next step. My career has always been about SREs: I worked as one, trained them, and helped startups engage with them.
19.02.2025 20:07
I've joined @rootly.com, where I will lead developer relations and the AI Lab.
My first project was a hackathon distilling DeepSeek R1 and proving it could outperform GPT-4o and Llama 3 on system log analysis.
Read more:
rootly.com/blog/classif...
Obviously, @MistralAI promoted how good Le Chat is at finding food pairings for wine 🇫🇷.
That should be included in all model benchmarks.
SlideShare's founders are at it again with @jaunthq.bsky.social, a document-based social site to read, share & post.
Congratulations @jboutelle.bsky.social, @rashmi.bsky.social & @amitranjan.bsky.social on the launch.
techcrunch.com/2024/12/12/t...
Heard about Flux? The incubating @cncf.bsky.social project simplifies continuous delivery for K8s and strengthens supply chain security.
In this episode of @thelandscape.bsky.social, Flux maintainer @stefanprodan.com shares his favorite feature and more
In this episode with @joaquimrocha.com, we speak about Headlamp.
The sandboxed @cncf.bsky.social project provides a powerful and flexible UI for Kubernetes.
Watch the full episode.
Perses, a sandboxed @cncf.bsky.social project, provides standards for visualization and dashboards for metrics monitoring.
@schabell.org is sharing everything you need to know about the project
Looking for where to store your AI assets? Harbor - the @cncf.bsky.social incubating project - might be what you are looking for.
Learn more from Harbor maintainer Vadim Bauer by watching the full episode.
View from Hawaii Diamond Head
14.11.2024 16:56