Andrew Stevens @andrewjstevens.com

PHANTOM RECALL: When Familiar Puzzles Fool Smart Models
Large language models (LLMs) such as GPT, Gemini, and Claude often appear adept at solving classic logic puzzles, but how much genuine reasoning underlies their answers? Recent evidence suggests that ...

Context drift: how models break when a problem looks the same but isn’t.

New research shows LLMs often “remember” logic puzzles instead of re-reasoning them.

Change a few names or numbers, and performance collapses, but confidence stays high.

🔗 arxiv.org/abs/2510.11812

16.10.2025 09:22 — 👍 0 🔁 0 💬 0 📌 0

Elon Musk’s xAI joins race to build ‘world models’ to power video games
Artificial intelligence group hired staff from Nvidia to work on advanced AI that can design and navigate physical spaces

www.ft.com/content/ac56...

13.10.2025 12:06 — 👍 0 🔁 0 💬 0 📌 0

xAI Grok World Model: Strategic Applications of Large World Models in Multi-Agent Interaction… Summary

medium.com/@Neural_netw...

13.10.2025 12:06 — 👍 0 🔁 0 💬 0 📌 0

A shift in AI: from systems that generate outputs to systems that model reality.

World models learn from video, sensors & robot data to understand space, time, & cause. The “physics” of the real world.

Robotics that predict reactions, games with real physics, and digital twins that reason.

13.10.2025 12:05 — 👍 1 🔁 0 💬 2 📌 0

Lumos: Performance Characterization of WebAssembly as a Serverless Runtime in the Edge-Cloud Continuum
WebAssembly has emerged as a lightweight and portable runtime to execute serverless functions, particularly in heterogeneous and resource-constrained environments such as the Edge-Cloud Continuum. How...

Can WebAssembly replace containers at the edge?

A new paper benchmarks Wasm vs containers across the Edge–Cloud Continuum. Gains in cold starts & image size, but major I/O & latency trade-offs.

Read here: arxiv.org/abs/2510.05118

#WebAssembly #EdgeComputing #Serverless #CloudNative

08.10.2025 09:18 — 👍 1 🔁 0 💬 1 📌 0

Agent Identity & Attestation
Go beyond API keys. Learn to engineer trustworthy AI agents with verifiable identity and attestation using the SPIFFE framework and a Python example.

How do you trust an autonomous AI agent?

In our latest post, we look at workload identity as another missing primitive for trustworthy AI.

Read more on our blog: www.sakurasky.com/blog/missing...

#AI #AISecurity #SPIFFE #WorkloadIdentity #DevSecOps

07.10.2025 08:01 — 👍 2 🔁 1 💬 0 📌 0

"Grit" doesn't build a lasting tech services company. Deliberate structure does.

The choices matter:

Reusable IP > Individual heroes
Deep specialization > Chasing low rates
A balanced client portfolio > Relying on one huge account

These are what separate a true partner from a temporary vendor.

01.10.2025 09:33 — 👍 0 🔁 0 💬 0 📌 0

Your AI moat isn't the model. It's the data.

But a data moat requires serious engineering:
* Reliable Pipelines
* Clear Lineage
* Automated Quality Gates
* Strong Security

Without these, your proprietary data is a liability, not a defensible asset. Moats are built, not found.

#AI #DataEngineering

30.09.2025 12:48 — 👍 0 🔁 0 💬 0 📌 0

Breakthroughs excite investors. Smart innovation sustains organisations.

The hardest call in tech leadership? Knowing when to push a bold idea vs. double down on iteration.

Big wins need both.

#TechLeadership #Innovation #Cloud #Data #Security

23.09.2025 08:53 — 👍 1 🔁 0 💬 0 📌 0

Technical debt always gets paid. The only question is when, and who pays it.

Shortcuts show up as:
* Slower velocity
* Security risk
* Talent drain

Treat debt pay-down like security: non-negotiable, budgeted, and strategic.

The speed of next year depends on the cleanup you invest in today.

22.09.2025 15:09 — 👍 0 🔁 0 💬 0 📌 0

End-to-End Encryption (Part 1)
Part 0 of a 13-part series on trustworthy AI agents: an overview of 12 missing engineering primitives (encryption, identity, guardrails, audit, governance) required for production at scale.

Are your AI agents actually secure?

In this instalment of our blog series on Trustworthy AI, we explain why true End-to-End Encryption (E2EE) is non-negotiable and provide a hands-on Python example to fix it.

www.sakurasky.com/blog/missing...

19.09.2025 09:52 — 👍 1 🔁 1 💬 0 📌 0

Your ping-pong table isn't culture.

For tech teams, real culture is a system built on psychological safety, a clear mission, and accountability.

It’s not a soft skill - it’s a core requirement for building reliable and secure systems.

#TechCulture #Leadership

18.09.2025 09:31 — 👍 1 🔁 0 💬 0 📌 0

A new paper on hallucination detection has a clever idea: probe all LLM layers at once, not just one (Cross-Layer Attention Probing).

Absolutely worth reading: arxiv.org/pdf/2509.09700

#AI #AIGovernance #LLM

17.09.2025 09:47 — 👍 0 🔁 0 💬 0 📌 0

This paper has a pattern for making LLMs reliable for structured data extraction: wrap the model with a domain ontology to define the rules and an automated correction loop to enforce them.
The study is tiny (only 50 test logs), but the architectural pattern is the takeaway.

arxiv.org/pdf/2509.00081

10.09.2025 08:28 — 👍 1 🔁 0 💬 0 📌 0
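The wrap-and-correct pattern from the post above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `ONTOLOGY` fields, the `extract` callable signature, and the `fake_model` stand-in are all hypothetical, and a real system would call an LLM where `fake_model` is used.

```python
from typing import Callable

# Hypothetical mini-ontology: field name -> set of allowed values.
ONTOLOGY = {
    "severity": {"INFO", "WARN", "ERROR"},
    "component": {"auth", "db", "api"},
}


def violations(record: dict) -> list[str]:
    """Return a human-readable list of rule violations for one record."""
    problems = []
    for field, allowed in ONTOLOGY.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif record[field] not in allowed:
            problems.append(f"{field}={record[field]!r} not in {sorted(allowed)}")
    return problems


def extract_with_correction(
    log_line: str,
    extract: Callable[[str, list[str]], dict],
    max_attempts: int = 3,
) -> dict:
    """Call the extractor, feeding violations back in until the record is valid."""
    feedback: list[str] = []
    for _ in range(max_attempts):
        record = extract(log_line, feedback)
        feedback = violations(record)
        if not feedback:
            return record
    raise ValueError(f"no valid extraction after {max_attempts} attempts: {feedback}")


def fake_model(log_line: str, feedback: list[str]) -> dict:
    """Stand-in for an LLM call: wrong on the first try, fixed once given feedback."""
    if feedback:
        return {"severity": "ERROR", "component": "db"}
    return {"severity": "critical", "component": "db"}


result = extract_with_correction("db connection refused", fake_model)
```

The design point is that the model never gets the last word: only records that pass `violations` leave the loop, and a persistent failure raises instead of returning bad data.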

Shadow AI is the new shadow IT.

Teams are spinning up LLMs + pipelines outside governance.
The risks? Data leakage, privacy violations, compliance failures.
The challenge? People can build AI faster than you can regulate it.

#AI #Privacy #Compliance

09.09.2025 05:11 — 👍 0 🔁 0 💬 0 📌 0

The Missing Primitives for Trustworthy AI Agents
Part 0 of a 13-part series on trustworthy AI agents: an overview of 12 missing engineering primitives (encryption, identity, guardrails, audit, governance) required for production at scale.

Most AI agents today? Great demos, mostly just prototypes.

I’m starting a 13-part series on the missing primitives agents need to be production-ready.

Part 0 (overview) is out now → www.sakurasky.com/blog/missing...

First deep dive (Part 1: encryption) drops soon.

08.09.2025 08:42 — 👍 0 🔁 0 💬 0 📌 0

A new large-scale study on AI vs. human code reveals a critical trade-off. It shows AI code is simpler & less complex. However, it also contains more high-risk security vulnerabilities than human-written code.

Security playbooks need to adapt for this new reality.

Read it: arxiv.org/pdf/2508.21634

08.09.2025 05:16 — 👍 0 🔁 0 💬 0 📌 0

Watching ceramic artisans in Italy, a powerful reminder that building enduring technology follows the same ancient rules as their pottery:

Master the fundamentals.
Reliability over aesthetics.
Embrace the discarded prototypes.

Craftsmanship is timeless.
#TechLeadership #Craftsmanship

06.09.2025 09:05 — 👍 1 🔁 0 💬 0 📌 0

“Human-in-the-loop” doesn’t scale. Millions of autonomous agents can’t wait for approval on every action.

The future is “human-on-the-loop”: humans as supervisors, not gatekeepers. We set guardrails, monitor, and step in only when it truly matters.

#AIGovernance #AI

02.09.2025 09:14 — 👍 0 🔁 0 💬 0 📌 0

Your job as a CTO in the boardroom isn't to be the lead architect; it's to be the lead translator.

Don't say: "We're refactoring the monolith."
Say: "We're de-risking the business."

Risk, Cost, Growth. That's the language they speak.

#CTO #TechLeadership #Startup

28.08.2025 05:29 — 👍 0 🔁 0 💬 0 📌 0

From CSPM to CNAPP and Why Cloud Security Is Converging
Cloud security today feels like flying a plane with a dozen blinking warning lights, each important in isolation, but overwhelming in aggregate. A CSPM tells you a security group is misconfigured.

Too many dashboards. Too many alerts.
Attackers connect the dots instantly; our tools don’t.

That’s why cloud security is converging from CSPM + CWPP into CNAPP.

Wrote my first LinkedIn article on why this shift matters now:

www.linkedin.com/pulse/from-c...

27.08.2025 06:37 — 👍 0 🔁 0 💬 0 📌 0

Investing in your internal platform is the highest-leverage way to accelerate your entire org.

#PlatformEngineering #TechLeadership

26.08.2025 05:37 — 👍 0 🔁 0 💬 0 📌 0

Your internal tech stack isn't a project you finish. It's a product you own.

Treating it like a project creates tech debt and slows your team down. Treating it like a product, with a roadmap and real investment, turns it into a competitive advantage.

26.08.2025 05:37 — 👍 0 🔁 0 💬 1 📌 0

Navigating AI risk is chaotic. The "AI Risk Atlas" by IBM Research offers a unified map, organising risks into 5 core categories: Training Data, Inference, Output, Non-Technical, & Agentic.

A shared vocabulary for real AI governance.

github.com/IBM/risk-atl...

#AIGovernance #ResponsibleAI

25.08.2025 05:09 — 👍 1 🔁 0 💬 0 📌 0

The technical founder's playbook:

PMF Superpowers: Rapid prototyping, centralised control.
Scaling Traps: Founder centrality, resisting process.
The Fix: Delegate. Hire for depth. Build real systems.

The core of it is shifting from building a product to building a company.

#Founder #Startup

24.08.2025 07:53 — 👍 0 🔁 0 💬 0 📌 0

Your AI starts drifting the day it’s deployed.

Data shifts.
Behaviours change.
Scammers adapt.

Without monitoring + retraining, your “cutting-edge” model quietly becomes a fossil.

MLOps isn’t optional, it’s survival.

#AI #MLOps #DataScience

23.08.2025 09:13 — 👍 1 🔁 0 💬 0 📌 0
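One common way to put a number on the drift described above is the Population Stability Index (PSI), which compares how a feature is distributed at training time versus in production. The sketch below is a simplified, dependency-free version; the function name, the equal-width binning, and the `1e-6` floor for empty bins are illustrative choices, not a standard implementation.

```python
import math


def psi(reference, live, bins=10):
    """Population Stability Index between a reference sample and a live sample.

    Bins are equal-width over the reference range; live values outside that
    range are clamped into the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def shares(sample):
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(sample), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(live)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


training_scores = list(range(100))
stable_scores = list(range(100))
shifted_scores = [x + 50 for x in range(100)]

stable_psi = psi(training_scores, stable_scores)
shifted_psi = psi(training_scores, shifted_scores)
```

A frequently cited rule of thumb treats a PSI above roughly 0.25 as a significant shift worth investigating, but the right threshold depends on the feature and the cost of a missed alert.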

The model learns the flaw as truth and makes terrible decisions with absolute certainty.

That’s why responsible AI isn’t just about outputs, it’s about uncertainty signals, data validation, and human-in-the-loop checks.

#AI #MachineLearning #DataQuality #ResponsibleAI

22.08.2025 08:15 — 👍 0 🔁 1 💬 0 📌 0

An AI model that’s wrong is bad.
An AI model that’s confidently wrong is a disaster.

The trickiest data issues aren’t messy nulls or outliers. They’re the subtle ones that look “clean” but are contextually wrong, like a fintech risk score logged on a 1–10 scale instead of 1–100.

22.08.2025 08:15 — 👍 1 🔁 0 💬 1 📌 0
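The scale mix-up in the post above passes an ordinary range check, because every value on a 1–10 scale is also valid on a 1–100 scale. A minimal sketch of a check that can catch it, assuming a hypothetical risk score expected on 1–100; the function name and the `coverage` threshold are illustrative and would need tuning on real data:

```python
def looks_rescaled(values, expected_min=1.0, expected_max=100.0, coverage=0.2):
    """Flag a batch that passes the range check but uses only a small slice
    of the expected range, which can indicate a unit or scale error.

    Out-of-range batches return False here; they are a separate failure
    that a plain range check already catches.
    """
    if not values:
        return False
    in_range = all(expected_min <= v <= expected_max for v in values)
    span = max(values) - min(values)
    return in_range and span <= coverage * (expected_max - expected_min)


suspicious_batch = [3, 7, 10, 1, 5]   # looks clean, but logged on 1-10
normal_batch = [12, 55, 97, 30]       # spread across the 1-100 range

flag_suspicious = looks_rescaled(suspicious_batch)
flag_normal = looks_rescaled(normal_batch)
```

A flag like this is a prompt for a human to look, not proof of an error: a genuinely low-risk cohort would trip it too.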

Your brilliant model is a useless science experiment without the boring, unsexy work of MLOps to make it reliable, repeatable, and auditable. It's the unseen 90%.

#MLOps #AI #ProductionAI

21.08.2025 14:01 — 👍 0 🔁 1 💬 0 📌 0

Your Most Powerful User Is Your Growing Security Blind Spot
AI agents are a powerful new tool, but they also represent a growing security blind spot. Traditional security models are failing and a Zero Trust architecture is essential to mitigate this new inside...

We're giving AI agents the keys to the kingdom and securing them with decade-old models. Your shiny new copilot is a massive, overlooked insider threat.

Legacy security is not built for this. It's time for Zero Trust. I wrote about it: www.sakurasky.com/blog/your-mo...

20.08.2025 18:15 — 👍 0 🔁 0 💬 0 📌 0


Latest posts by andrewjstevens.com on Bluesky
