@gadievron.bsky.social
CEO & Co-Founder at Knostic, CISO-in-Residence for AI at Cloud Security Alliance. Former Founder @Cymmetria (acquired). Host at Prompt||GTFO. Threat hunter, scifi geek, dance teacher. Opinions my own.
We protect coding assistants and developers, by managing their risk, and detect long and blocking these attacks - live
Send me a message for a demo. Weβre happy to show you how fast and simple it is to defend AI development
Read NVIDIAβs blog, here:
2. AppSec: use tools like NVIDIAβs garak and NeMo for testing and AI guardrails.
3. Detection, response, and posture management: this is what we do at Knostic, and weβre pretty good at it
Three defensive measures:
1. Best practices: disable agent auto-run, log and alert on shell or install actions, block agent-initiated installs, and sandbox autonomous agents. Although, these are too heavy handed for many developers to easily accept and be able to keep working
Agentic tools didnβt just boost productivity. They redrew the perimeter. Treat them like an untrusted, and unfortunately fully privileged liability
14.10.2025 23:40 β π 0 π 0 π¬ 1 π 0NVIDIAβs blog states:
βKnowing that Cursor has the capability to autonomously execute terminal commands. β¦ In this example, we obfuscated a basic PowerShell script that achieves a reverse shell, with the intention of targeting Windows developers.β
NVIDIAβs From Assistant to Adversary, by Becca Lynch and Rich Harang, shows how these tools can be hijacked to execute attack code. No 0-days needed
14.10.2025 23:39 β π 0 π 0 π¬ 1 π 0Coding agents arenβt just productivity boosters. Theyβve expanded the CI/CD security boundary and are already being exploited in the wild. The developer workstationβs now a full attack surface, like the endpoint and browser before it
14.10.2025 23:39 β π 0 π 0 π¬ 1 π 0Another day, another CISO wake-up call for securing Copilot, Claude Code, Cursor, Windsurf, and every other agentic dev tool
14.10.2025 23:38 β π 0 π 0 π¬ 1 π 0AI coding assistants just got called out by NVIDIA πͺ΅https://developer.nvidia.com/blog/from-assistant-to-adversary-exploiting-agentic-ai-developer-tools/
14.10.2025 23:37 β π 1 π 0 π¬ 1 π 0- The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions. Eric Wallace, Kai Xiao, Reimar Leike, Lilian Weng, Johannes Heidecke, Alex Beutel. arXiv:2404.13208 (2024). arxiv.org/abs/2404.13208
14.10.2025 23:23 β π 0 π 0 π¬ 0 π 0- Attention Tracker: Detecting Prompt Injection Attacks in LLMs. Kuo-Han Hung, Ching-Yun Ko, Ambrish Rawat, I-Hsin Chung, Winston H. Hsu, Pin-Yu Chen. arXiv:2411.00348 (2025). arxiv.org/abs/2411.00348
14.10.2025 23:22 β π 1 π 0 π¬ 1 π 0- Automatic and Universal Prompt Injection Attacks against Large Language Models. Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, Chaowei Xiao. arXiv:2403.04957 (2024). arxiv.org/abs/2403.04957
14.10.2025 23:22 β π 0 π 0 π¬ 1 π 0- Cognitive Overload Attack: Prompt Injection for Long Context. Bibek Upadhayay, Vahid Behzadan, Amin Karbasi. arXiv:2410.11272 (2024). arxiv.org/abs/2410.11272
14.10.2025 23:21 β π 0 π 0 π¬ 1 π 0- Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, Mario Fritz. arXiv:2302.12173 (2023). arxiv.org/abs/2302.12173
14.10.2025 23:21 β π 0 π 0 π¬ 1 π 0References:
- Limitations of Normalization in Attention Mechanism. Timur Mudarisov, Mikhail Burtsev, Tatiana Petrova, Radu State. arXiv:2508.17821 (2025). arxiv.org/abs/2508.17821
Not an expert
14.10.2025 23:19 β π 0 π 0 π¬ 1 π 0This is conjecture, but flattening might act as a complementary factor, making it harder for models to stay focused on prioritized instructions as context expands. If attention collapse and obedience interact, that could explain why longer or layered prompts make models lose focus or shift intent
14.10.2025 23:19 β π 0 π 0 π¬ 1 π 0OpenAIβs Instruction Hierarchy work shows that teaching models to prioritize system or privileged instructions improves robustness, but it doesnβt explain why longer contexts make attacks more effective or why attention shifts during them
14.10.2025 23:18 β π 1 π 0 π¬ 1 π 0and decoding strategies can shift which instruction the model follows
14.10.2025 23:17 β π 1 π 0 π¬ 1 π 0Other explanations exist: instruction-tuning and RLHF make models over-obey any text that looks like a command; attention often shows recency bias that favors later tokens; retrieval and agent systems blur the boundary between trusted and untrusted text
14.10.2025 23:17 β π 1 π 0 π¬ 1 π 0But, as much as I love theorizing, this isn't where current research is pointing
14.10.2025 23:15 β π 1 π 0 π¬ 1 π 0The fundamental issue with prompt injection is that LLMs cannot separate instructions from data, since everything is processed as natural language. Attention flattening could be one architectural factor. If true, prompt injection may be partly built into how attention scales
14.10.2025 23:15 β π 1 π 0 π¬ 1 π 0The degree of that shift correlates with attack success rates
This supports my earlier question. Evidence shows attacks exploit attention, and attention becomes less selective as context grows. Whether one causes the other remains unproven
The researchers measured attention during successful prompt-injection attacks and found that focus literally shifts from the original instruction to the injected one, a clear βdistraction effect.β
14.10.2025 23:14 β π 1 π 0 π¬ 1 π 0to see whether this is just a scaling side effect or a deeper flaw in how attention works
One more paper deepened the trail: Attention Tracker
But, no study I found proves that longer context directly enables attacks, or how they relate to attention itself. I started imagining controlled experiments that could track how flattening interacts with vulnerability
14.10.2025 23:12 β π 1 π 0 π¬ 1 π 0Then, Automatic and Universal Prompt Injection Attacks confirmed the pattern: an automated method produced universal attack strings that worked across different models and defenses, a sign the weakness is systemic
14.10.2025 23:12 β π 1 π 0 π¬ 1 π 0The trail started with two well-known studies on prompt-injection:
Not What Youβve Signed Up For showed that adding untrusted or extra text to a modelβs context can quietly bypass guardrails. Cognitive Overload Attack showed that long, noisy prompts make models lose focus and miss important cues
As input grows, attention flattening makes attention less selective, so the modelβs "focus" spreads out until fine detail fades
That made me wonder: could context length and attention flattening affect model security?