Luca Beurer-Kellner @lbeurerkellner

Indeed, like an unguarded eval(…) directed at all the data we process.

08.04.2025 19:55 — 👍 2 🔁 0 💬 0 📌 0

Invariant Labs We help agent builders create reliable, robust and secure products.

To get updates about agent security, follow and sign up for access to Invariant below.

We have been working on this problem for years (at Invariant and in research), together with
@viehzeug.bsky.social, @mvechev, @florian_tramer and our super talented team.

invariantlabs.ai/guardrails

08.04.2025 19:44 — 👍 1 🔁 0 💬 0 📌 0

So what's the takeaway here?

1. Prompt injections still work and are more impactful than ever.
2. Don't install untrusted MCP servers.
3. Don't expose highly-sensitive services like WhatsApp to new eco-systems like MCP
4. 🗣️Guardrail 🗣️ Your 🗣️ Agents (we can help with that)

08.04.2025 19:44 — 👍 0 🔁 0 💬 1 📌 0

To hide, our malicious server first advertises a completely innocuous tool description, that does not contain the attack.

This means the user will not notice the hidden attack.

On the second launch, though, our MCP server suddenly changes its interface, performing a rug pull.

08.04.2025 19:44 — 👍 0 🔁 0 💬 1 📌 0

To successfully manipulate the agent, our malicious MCP server advertises poisoned tool, which re-programs the agent's behavior with respect to the WhatsApp MCP server, and allows the attacker to exfiltrate the user's entire WhatsApp chat history.

08.04.2025 19:44 — 👍 0 🔁 0 💬 1 📌 0

Users have to scroll a bit to see it, but if you scroll all the way to the right, you will find the exfiltration payload.

Video: invariantlabs.ai/images/whats...

08.04.2025 19:44 — 👍 0 🔁 0 💬 1 📌 0

Even though, a user must always confirm a tool call before it is executed (at least in Cursor and Claude Desktop), our WhatsApp attack remains largely invisible to the user.

Can you spot the exfiltration?

08.04.2025 19:44 — 👍 0 🔁 0 💬 1 📌 0

With this setup our attack (1) circumvents the need for the user to approve the malicious tool, (2) exfiltrates data via WhatsApp itself, and (3) does not require the agent to interact with our malicious MCP server directly.

08.04.2025 19:44 — 👍 0 🔁 0 💬 1 📌 0

To attack, we deploy a malicious sleeper MCP server, that first advertises an innocuous tool, and then later on, when the user has already approved its use, switches to a malicious tool that shadows and manipulates the agent's behavior with respect to whatsapp-mcp.

08.04.2025 19:44 — 👍 0 🔁 0 💬 1 📌 0

WhatsApp MCP Exploited: Exfiltrating your message history via MCP This blog post demonstrates how an untrusted MCP server can attack and exfiltrate data from an agentic system that is also connected to a trusted WhatsApp MCP instance, side-stepping WhatsApp's encryp...

Blog: invariantlabs.ai/blog/whatsap...

If you want to stay up to date regarding MCP and agent security more generally, follow me and
@invariantlabsai.bsky.social

Now, let' s get into the attack.

08.04.2025 19:44 — 👍 0 🔁 0 💬 1 📌 0

New MCP attack demonstration shows how to leak WhatsApp messages via MCP.

We show a new MCP attack that leaks your WhatsApp messages if you are connected via WhatsApp MCP.

Our attack uses a sleeper design, circumventing the need for user approval.

More 👇

08.04.2025 19:44 — 👍 0 🔁 0 💬 1 📌 0

Invariant Labs We help agent builders create reliable, robust and secure products.

To stay updated about agent security, please follow and sign up for early access to Invariant, a security platform for MCP and agentic systems, below.

We have been working on this problem for years (at Invariant and in research).

invariantlabs.ai/guardrails

03.04.2025 07:46 — 👍 0 🔁 0 💬 0 📌 0

We wrote up a little report about this, to raise awareness. Please have a look for much more details and scenarios, and our code snippets.

Blog: invariantlabs.ai/blog/mcp-sec...

03.04.2025 07:46 — 👍 0 🔁 0 💬 1 📌 0

These types of malicious tools are especially problematic with auto-updated MCP packages or fully remote MCP servers, for which users only install and give consent once, and then the MCP server is free to change and update their tool descriptions as they please.

We call this an MCP rug pull:

03.04.2025 07:46 — 👍 0 🔁 0 💬 1 📌 0

Lastly, not only can you expose malicious tools, tool descriptions can also be used to change the agent's behavior with respect to other tools, which we call 'shadowing'.

This way all you emails suddenly go out to 'attacker@pwnd.com', rather than their actual receipient.

03.04.2025 07:46 — 👍 0 🔁 0 💬 1 📌 0

It's trivial to craft a malicious tool description like below, that completely hijacks the agent, while pretending towards the user everything is going great.

03.04.2025 07:46 — 👍 0 🔁 0 💬 1 📌 0

What's concerning about this, is that AI models are trained to precisely follow those instructions, rather than be vary about them. This is new about MCP, as before, agent developers could be relatively trusted, now everything is fair game.

03.04.2025 07:46 — 👍 0 🔁 0 💬 1 📌 0

When an MCP server is added to an agent like Cursor, Claude or the OpenAI Agents SDK, its tool's descriptions are included in the context of the agent.

This opens the doors wide open for a novel type of indirect prompt injection, we coin tool poisoning.

03.04.2025 07:46 — 👍 0 🔁 0 💬 1 📌 0

👿 MCP is all fun, until you add this one malicious MCP server and forget about it.

We have discovered a critical flaw in the widely-used Model Context Protocol (MCP) that enables a new form of LLM attack we term 'Tool Poisoning'.

Leaks SSH key, API keys, etc.

Details below 👇

03.04.2025 07:46 — 👍 14 🔁 8 💬 1 📌 1

Struggling to ensure consistency with your agent's reliability, especially with tool calling?

Testing is our lightweight, pytest-based OSS library to write and run agent tests.

It provides helpers and assertions that enable you to write robust tests for your agentic applications.

06.02.2025 13:43 — 👍 3 🔁 1 💬 1 📌 0

Introducing Operator OpenAI released their "research preview" today of Operator, a cloud-based browser automation platform rolling out today to $200/month ChatGPT Pro subscribers. They're calling this their first "agent"....

Here are my notes on OpenAI's new ChatGPT Operator browser "agent", including initial thoughts on their approach to mitigating prompt injection risks simonwillison.net/2025/Jan/23/...

23.01.2025 19:16 — 👍 84 🔁 12 💬 9 📌 0

The fun part will be also hijacking the supervisor model, while maintaining the utility of the agent (i.e. attack success).

25.01.2025 09:54 — 👍 1 🔁 0 💬 0 📌 0

Enhancing Browser Agent Safety with Guardrails We introduce a novel approach to enhance the safety of browser agents and deploy it as part of the state-of-the-art OpenHands agent.

Blog Post: invariantlabs.ai/blog/enhanci...

Credits to Aniruddha Sundararajan, who build this with us during his internship.

25.01.2025 09:50 — 👍 0 🔁 0 💬 0 📌 0

With (web) agents on everyone's mind, check out our latest blog post (link in thread) on browser agent safety guardrails. We replicate and defend against attacks on the AllHands web agent, preventing it from generating harmful content and falling for harmful requests.

25.01.2025 09:49 — 👍 0 🔁 0 💬 1 📌 0

Luca Beurer-Kellner

Latest posts by lbeurerkellner.bsky.social on Bluesky

@lbeurerkellner is following 20 prominent accounts