Gadi Evron @gadievron - Bluesky Profile

From Assistant to Adversary: Exploiting Agentic AI Developer Tools | NVIDIA Technical Blog Developers are increasingly turning to AI-enabled tools for coding, including Cursor, OpenAI Codex, Claude Code, and GitHub Copilot. While these automation tools can enable faster development and…

developer.nvidia.com/blog/from-as...

14.10.2025 23:49 — 👍 0 🔁 0 💬 0 📌 0

We protect coding assistants and developers, by managing their risk, and detect long and blocking these attacks - live

Send me a message for a demo. We’re happy to show you how fast and simple it is to defend AI development

Read NVIDIA’s blog, here:

14.10.2025 23:42 — 👍 0 🔁 0 💬 1 📌 0

2. AppSec: use tools like NVIDIA’s garak and NeMo for testing and AI guardrails.
3. Detection, response, and posture management: this is what we do at Knostic, and we’re pretty good at it

14.10.2025 23:41 — 👍 0 🔁 0 💬 1 📌 0

Three defensive measures:
1. Best practices: disable agent auto-run, log and alert on shell or install actions, block agent-initiated installs, and sandbox autonomous agents. Although, these are too heavy handed for many developers to easily accept and be able to keep working

14.10.2025 23:40 — 👍 0 🔁 0 💬 1 📌 0

Agentic tools didn’t just boost productivity. They redrew the perimeter. Treat them like an untrusted, and unfortunately fully privileged liability

14.10.2025 23:40 — 👍 0 🔁 0 💬 1 📌 0

NVIDIA’s blog states:
“Knowing that Cursor has the capability to autonomously execute terminal commands. … In this example, we obfuscated a basic PowerShell script that achieves a reverse shell, with the intention of targeting Windows developers.”

14.10.2025 23:39 — 👍 0 🔁 0 💬 1 📌 0

NVIDIA’s From Assistant to Adversary, by Becca Lynch and Rich Harang, shows how these tools can be hijacked to execute attack code. No 0-days needed

14.10.2025 23:39 — 👍 0 🔁 0 💬 1 📌 0

Coding agents aren’t just productivity boosters. They’ve expanded the CI/CD security boundary and are already being exploited in the wild. The developer workstation’s now a full attack surface, like the endpoint and browser before it

14.10.2025 23:39 — 👍 0 🔁 0 💬 1 📌 0

Another day, another CISO wake-up call for securing Copilot, Claude Code, Cursor, Windsurf, and every other agentic dev tool

14.10.2025 23:38 — 👍 0 🔁 0 💬 1 📌 0

AI coding assistants just got called out by NVIDIA 🪵https://developer.nvidia.com/blog/from-assistant-to-adversary-exploiting-agentic-ai-developer-tools/

14.10.2025 23:37 — 👍 1 🔁 0 💬 1 📌 0

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions Today's LLMs are susceptible to prompt injections, jailbreaks, and other attacks that allow adversaries to overwrite a model's original instructions with their own malicious prompts. In this work, we ...

- The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions. Eric Wallace, Kai Xiao, Reimar Leike, Lilian Weng, Johannes Heidecke, Alex Beutel. arXiv:2404.13208 (2024). arxiv.org/abs/2404.13208

14.10.2025 23:23 — 👍 0 🔁 0 💬 0 📌 0

Attention Tracker: Detecting Prompt Injection Attacks in LLMs Large Language Models (LLMs) have revolutionized various domains but remain vulnerable to prompt injection attacks, where malicious inputs manipulate the model into ignoring original instructions and ...

- Attention Tracker: Detecting Prompt Injection Attacks in LLMs. Kuo-Han Hung, Ching-Yun Ko, Ambrish Rawat, I-Hsin Chung, Winston H. Hsu, Pin-Yu Chen. arXiv:2411.00348 (2025). arxiv.org/abs/2411.00348

14.10.2025 23:22 — 👍 1 🔁 0 💬 1 📌 0

Automatic and Universal Prompt Injection Attacks against Large Language Models Large Language Models (LLMs) excel in processing and generating human language, powered by their ability to interpret and follow instructions. However, their capabilities can be exploited through prom...

- Automatic and Universal Prompt Injection Attacks against Large Language Models. Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, Chaowei Xiao. arXiv:2403.04957 (2024). arxiv.org/abs/2403.04957

14.10.2025 23:22 — 👍 0 🔁 0 💬 1 📌 0

Cognitive Overload Attack:Prompt Injection for Long Context Large Language Models (LLMs) have demonstrated remarkable capabilities in performing tasks across various domains without needing explicit retraining. This capability, known as In-Context Learning (IC...

- Cognitive Overload Attack: Prompt Injection for Long Context. Bibek Upadhayay, Vahid Behzadan, Amin Karbasi. arXiv:2410.11272 (2024). arxiv.org/abs/2410.11272

14.10.2025 23:21 — 👍 0 🔁 0 💬 1 📌 0

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection Large Language Models (LLMs) are increasingly being integrated into various applications. The functionalities of recent LLMs can be flexibly modulated via natural language prompts. This renders them s...

- Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, Mario Fritz. arXiv:2302.12173 (2023). arxiv.org/abs/2302.12173

14.10.2025 23:21 — 👍 0 🔁 0 💬 1 📌 0

Limitations of Normalization in Attention Mechanism This paper investigates the limitations of the normalization in attention mechanisms. We begin with a theoretical framework that enables the identification of the model's selective ability and the geo...

References:
- Limitations of Normalization in Attention Mechanism. Timur Mudarisov, Mikhail Burtsev, Tatiana Petrova, Radu State. arXiv:2508.17821 (2025). arxiv.org/abs/2508.17821

14.10.2025 23:20 — 👍 0 🔁 0 💬 1 📌 0

Not an expert

14.10.2025 23:19 — 👍 0 🔁 0 💬 1 📌 0

This is conjecture, but flattening might act as a complementary factor, making it harder for models to stay focused on prioritized instructions as context expands. If attention collapse and obedience interact, that could explain why longer or layered prompts make models lose focus or shift intent

14.10.2025 23:19 — 👍 0 🔁 0 💬 1 📌 0

OpenAI’s Instruction Hierarchy work shows that teaching models to prioritize system or privileged instructions improves robustness, but it doesn’t explain why longer contexts make attacks more effective or why attention shifts during them

14.10.2025 23:18 — 👍 1 🔁 0 💬 1 📌 0

and decoding strategies can shift which instruction the model follows

14.10.2025 23:17 — 👍 1 🔁 0 💬 1 📌 0

Other explanations exist: instruction-tuning and RLHF make models over-obey any text that looks like a command; attention often shows recency bias that favors later tokens; retrieval and agent systems blur the boundary between trusted and untrusted text

14.10.2025 23:17 — 👍 1 🔁 0 💬 1 📌 0

But, as much as I love theorizing, this isn't where current research is pointing

14.10.2025 23:15 — 👍 1 🔁 0 💬 1 📌 0

The fundamental issue with prompt injection is that LLMs cannot separate instructions from data, since everything is processed as natural language. Attention flattening could be one architectural factor. If true, prompt injection may be partly built into how attention scales

14.10.2025 23:15 — 👍 1 🔁 0 💬 1 📌 0

The degree of that shift correlates with attack success rates

This supports my earlier question. Evidence shows attacks exploit attention, and attention becomes less selective as context grows. Whether one causes the other remains unproven

14.10.2025 23:14 — 👍 1 🔁 0 💬 1 📌 0

The researchers measured attention during successful prompt-injection attacks and found that focus literally shifts from the original instruction to the injected one, a clear “distraction effect.”

14.10.2025 23:14 — 👍 1 🔁 0 💬 1 📌 0

to see whether this is just a scaling side effect or a deeper flaw in how attention works

One more paper deepened the trail: Attention Tracker

14.10.2025 23:13 — 👍 1 🔁 0 💬 1 📌 0

But, no study I found proves that longer context directly enables attacks, or how they relate to attention itself. I started imagining controlled experiments that could track how flattening interacts with vulnerability

14.10.2025 23:12 — 👍 1 🔁 0 💬 1 📌 0

Then, Automatic and Universal Prompt Injection Attacks confirmed the pattern: an automated method produced universal attack strings that worked across different models and defenses, a sign the weakness is systemic

14.10.2025 23:12 — 👍 1 🔁 0 💬 1 📌 0

The trail started with two well-known studies on prompt-injection:
Not What You’ve Signed Up For showed that adding untrusted or extra text to a model’s context can quietly bypass guardrails. Cognitive Overload Attack showed that long, noisy prompts make models lose focus and miss important cues

14.10.2025 23:10 — 👍 1 🔁 0 💬 1 📌 0

As input grows, attention flattening makes attention less selective, so the model’s "focus" spreads out until fine detail fades

That made me wonder: could context length and attention flattening affect model security?

14.10.2025 23:10 — 👍 1 🔁 0 💬 1 📌 0

Gadi Evron

Latest posts by gadievron.bsky.social on Bluesky

@gadievron is following 20 prominent accounts