Siobhan Rossi @shivros - Bluesky Profile

It's finally happened - I spend more time talking to my computer than typing on it.

19.01.2026 14:42 — 👍 0 🔁 0 💬 0 📌 0

Beyond Code Mode: Agentica
Build agents that interact with runtime objects through code.

Agentica sandboxes agents in WASM-inside-microVMs so they can spawn sub-agents safely. Because nothing says "we trust AI" like two nested prison cells. #AISafety #AgenticAI #CodeExecution
zurl.co/HARtX

07.01.2026 14:56 — 👍 3 🔁 2 💬 0 📌 0

How to Build a Production-Ready Multi-Agent Incident Response System Using OpenAI Swarm and Tool-Augmented Agents

Multi-agent incident response using OpenAI Swarm, or, why pay humans to panic when algorithms can do it with 40% less screaming? #AI #DevOps

zurl.co/t3XdJ

06.01.2026 19:29 — 👍 1 🔁 0 💬 0 📌 0

OpenRouter has the rare "Wrapped" that didn’t make me roll my eyes too hard.

Personal usage only. No work or vibe coding included:

* 256.1M tokens routed
* 77 models used

Overdone trend, but this one made me grin a bit.

Embarrassed that a Grok model made it into my top 5.

18.12.2025 21:15 — 👍 0 🔁 0 💬 0 📌 0

Is this the future?

AI pretending to be humans using a UI.
Even better - AI pretending to be a human using an AI pretending to be humans using a UI.
Training on labeled OpenSCAD (zurl.co/1tG2v) code seems 100x easier if you want to make an AI make 3D models.

16.12.2025 22:50 — 👍 0 🔁 0 💬 0 📌 0

MIT’s VideoCAD (zurl.co/YwfkN) AI learned CAD by watching 41k+ videos of humans clicking, dragging, and typing.

Super cool, but positively unhinged. Why are we automating UI use?

16.12.2025 22:49 — 👍 0 🔁 0 💬 1 📌 0

Thanks for the suggestion. I hadn't tried the MCP server, but I'll give it a shot to see if it resolves the issues.

I'm not dogmatic about which model I'm using. It's only that the Codex CLI is working best for me in most contexts at the moment. Maybe the Claude 4.5 release changes that.

26.11.2025 12:36 — 👍 0 🔁 0 💬 1 📌 0

Do gpt-5.1-codex and gpt-5.1-codex-max not count as newer models? I see them constantly treating runes like functions and creating deeply nested derived values when it makes no sense to do so.

The only language where I've seen LLMs perform worse than in Svelte 5 is Rust.

26.11.2025 12:19 — 👍 0 🔁 0 💬 1 📌 0

Until we build tools that reduce the verification load — not just generate more code — the bottleneck is only going to get tighter.

26.11.2025 01:34 — 👍 0 🔁 0 💬 0 📌 0

And the ergonomics of a language — how easy it is for an LLM to use correctly — is becoming a serious factor. (Try using Svelte 5 runes with an LLM if you want a verifiable nightmare.)

LLMs are shifting the center of gravity of what engineering work actually is.

26.11.2025 01:34 — 👍 0 🔁 0 💬 2 📌 0

Fewer “let me write this module,” more “let me prevent this AI from quietly breaking our entire system.”

Test-driven development and automated quality checks are becoming more important.

26.11.2025 01:33 — 👍 0 🔁 0 💬 1 📌 0

The output firehose got bigger, and the review surface area grew with it.

The job is changing shape: less raw implementation, more steering and evaluation. More attention to failure modes than syntax.

26.11.2025 01:33 — 👍 0 🔁 0 💬 1 📌 0

The model writes most of the code — but someone still has to do the shaping, the corrections, the guardrails, and the judgment.

And that’s the real bottleneck.
We’ve accelerated code *production*, but not code *verification*.

26.11.2025 01:32 — 👍 0 🔁 0 💬 1 📌 0

* Let it execute
* Watch the diffs like I’m monitoring a toddler near an open flame
* Steer it back when it wanders
* Make it write tests if it “forgets”
* Then manually repair the subtle, end-to-end issues that only show up once everything is wired together

26.11.2025 01:32 — 👍 0 🔁 0 💬 1 📌 0

They can follow a plan for more steps and lose the plot less often. Endurance improved; the ceiling didn’t.

My workflow today is almost muscle memory:

* Write down the requirements and the approach
* Tell the model to generate a plan
* Fix the plan (always)

26.11.2025 01:31 — 👍 0 🔁 0 💬 1 📌 0

Since reasoning models dropped a year ago, I haven’t noticed the core complexity ceiling shifting much. Models aren’t solving meaningfully harder problems. What *has* changed is how long they can stay coherent without drifting into nonsense.

26.11.2025 01:31 — 👍 2 🔁 0 💬 1 📌 0

I’ve been using LLM coding tools seriously since mid-2023 — Cursor, Windsurf, VSCode + Roo Code, Claude Code, Gemini CLI, Codex CLI. At this point I’ve seen every phase of the hype cycle up close.

26.11.2025 01:30 — 👍 0 🔁 0 💬 1 📌 0

Checked out "AI's 70% Problem w/ Addy Osmani" (zurl.co/pyG2O) — an episode of Zed Industries' podcast. The TL;DR: AI can write about 70% of the code you need, while the remaining 30% demands nuance the models simply don’t have.

26.11.2025 01:30 — 👍 0 🔁 0 💬 1 📌 0

zurl.co/2MRfd

26.11.2025 00:01 — 👍 1 🔁 0 💬 0 📌 0

They validated some of the generated proteins in bacteria, including antitoxins that barely resemble anything in known biology.

Unfortunately, human genes are much tougher, but it’s a sign of where models in bio are drifting — from modeling biology to proposing it.

26.11.2025 00:00 — 👍 1 🔁 0 💬 1 📌 0

Stanford built a model (Evo 1.5) that works like an LLM, except it runs on bacterial DNA instead of text tokens. You give it some genomic context, and it can complete genes or even invent new ones.

26.11.2025 00:00 — 👍 0 🔁 0 💬 1 📌 0

Codex CLI down. Back to Stack Overflow copy-pasting like it's 2019.

22.11.2025 21:23 — 👍 1 🔁 0 💬 0 📌 0
