This was joint work with amazing co-authors: Spencer Frei, Johannes von Oswald, David Krueger and Louis Kirsch.
Check out the paper on arxiv: arxiv.org/abs/2411.05189
@usmananwar.bsky.social
To conclude, transformers do not learn robust in-context learning algorithms, and we still do not really understand what algorithms GPT-style transformers implement in-context, even for a simple setting like linear regression. 🥹
Similarly, we find that hijacking attacks transfer poorly between GPT and OLS, even though "in-distribution" behavior matches quite well between GPT and OLS! Interestingly, the transfer is considerably worse in the GPT → OLS direction. 🤔
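A minimal sketch of how such a transfer check can be set up: craft a single-token hijacking perturbation against one predictor and measure how far it moves the other predictor's output toward the adversary's target. The `ols_predict` helper, the context layout, and the reuse of the `hijack_single_token` routine (sketched further down the thread, next to the post that introduces hijacking attacks) are assumptions for illustration, not the paper's code.

```python
# Sketch: evaluating transfer of a hijacking attack between a trained GPT-style
# model and a differentiable OLS "model". Assumes context rows are [x_i, y_i]
# with the query appended as a final [x_query, 0] row; this layout is an assumption.
import torch

def ols_predict(context):
    X, y = context[:-1, :-1], context[:-1, -1]
    x_q = context[-1, :-1]
    # Normal equations (assumes X has full column rank), so the map stays differentiable.
    w = torch.linalg.solve(X.T @ X, X.T @ y)
    return x_q @ w

def transfer_gap(src_model, dst_model, context, target):
    # Craft the attack against src_model, then evaluate it on dst_model.
    delta = hijack_single_token(src_model, context, target)
    attacked = context.clone()
    attacked[0] = attacked[0] + delta
    return (dst_model(attacked) - target).abs()  # small gap = attack transfers

# e.g. compare transfer_gap(gpt_model, ols_predict, ctx, t)
#      with transfer_gap(ols_predict, gpt_model, ctx, t).
```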
…Probably not. Our adversarial attacks, designed for linear transformers implementing gradient descent, perform poorly on (GPT-style) transformers, indicating that they are likely not implementing gradient-based ICL algorithms.
Finally, are transformers implementing gradient descent or ordinary least squares (OLS) when solving linear regression tasks in-context, as argued by previous works (arxiv.org/abs/2208.01066, arxiv.org/abs/2211.15661)?
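For reference, the two candidate in-context algorithms can be written down in a few lines; the task setup below (dimension, noise level, step size) is purely illustrative.

```python
# Sketch: gradient descent vs. ordinary least squares on an in-context
# linear-regression prompt. Finite-step GD and OLS generally disagree,
# which is what makes the two hypotheses distinguishable.
import numpy as np

rng = np.random.default_rng(0)
d, n, eta, steps = 5, 20, 0.01, 50
X = rng.normal(size=(n, d))                      # in-context inputs
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)  # noisy in-context targets
x_q = rng.normal(size=d)                         # query input

# Gradient descent on the in-context least-squares loss, starting from w = 0.
w_gd = np.zeros(d)
for _ in range(steps):
    w_gd -= eta * X.T @ (X @ w_gd - y)

# Closed-form OLS solution.
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print("GD prediction: ", x_q @ w_gd)
print("OLS prediction:", x_q @ w_ols)
```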
We also find that larger transformers are less universal in the in-context learning algorithms they implement: transferability of hijacking attacks gets worse as transformer size increases!
Can the adversarial robustness of transformers be improved? Yes: we find that gradient-based adversarial training works (even when applied only at the fine-tuning stage), and the trade-off between clean performance and adversarial robustness is not significant.
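A rough sketch of what gradient-based adversarial training against hijacking could look like; `model`, `sample_task`, the loss weighting, and the reuse of `hijack_single_token` (sketched further down the thread) are placeholders, not the paper's training setup.

```python
# Sketch: adversarial fine-tuning against single-token hijacking.
# Each step crafts a fresh perturbation against the current model, then trains
# the model to predict the clean label on both the clean and attacked contexts.
import torch

def adversarial_finetune(model, sample_task, steps=1000, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        context, y_true = sample_task()     # fresh linear-regression prompt
        target = 5.0 * torch.randn(())      # arbitrary adversarial target
        delta = hijack_single_token(model, context, target, steps=20)
        attacked = context.clone()
        attacked[0] = attacked[0] + delta
        # Clean-performance term plus robustness term.
        loss = (model(attacked) - y_true) ** 2 + (model(context) - y_true) ** 2
        opt.zero_grad()
        loss.backward()
        opt.step()
```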
We show that linear transformers, which provably implement gradient descent on linear regression tasks, are provably non-robust and can be hijacked by attacking a SINGLE token! Standard GPT-style transformers are similarly non-robust.
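The equivalence behind the claim that linear transformers implement gradient descent can be checked numerically in a few lines; the one-step construction below is only illustrative, and the specific attention weighting is an assumption rather than the paper's exact parameterization.

```python
# Sketch: a single step of gradient descent on the in-context least-squares
# loss coincides with a (suitably weighted) linear self-attention update.
import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 5, 20, 0.1
X = rng.normal(size=(n, d))          # in-context inputs x_1..x_n
y = X @ rng.normal(size=d)           # in-context targets y_1..y_n
x_q = rng.normal(size=d)             # query input

# One GD step on L(w) = 0.5 * sum_i (x_i @ w - y_i)^2, starting from w = 0.
w_one_step = eta * X.T @ y
pred_gd = x_q @ w_one_step

# Linear (unnormalized) attention view: the query token attends to context
# tokens with score x_q @ x_i and value y_i, scaled by the "learning rate" eta.
pred_linear_attention = eta * sum((x_q @ X[i]) * y[i] for i in range(n))

assert np.allclose(pred_gd, pred_linear_attention)
```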
We specifically study "hijacking attacks" on transformers trained to solve linear regression in-context, in which the adversary's goal is to force the transformer to make an arbitrary prediction by attacking the in-context data.
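A minimal sketch of such a hijacking attack, assuming a `model` that maps a context of [x_i, y_i] rows (with the query appended as [x_query, 0]) to a scalar prediction; the interface and hyperparameters are assumptions, not the paper's attack code.

```python
# Sketch: gradient-based hijacking of a single in-context token.
# The adversary optimizes a perturbation of one context row so that the
# model's prediction for x_query is pushed toward an arbitrary target.
import torch

def hijack_single_token(model, context, target, token_idx=0, steps=500, lr=1e-2):
    delta = torch.zeros_like(context[token_idx], requires_grad=True)
    row_mask = torch.zeros_like(context)
    row_mask[token_idx] = 1.0                      # only one row is attacked
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        attacked = context + row_mask * delta      # delta broadcasts; mask keeps one row
        loss = (model(attacked) - target) ** 2     # drive the prediction to the target
        opt.zero_grad()
        loss.backward()
        opt.step()
    return delta.detach()
```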
We find:
1. Transformers do NOT implement robust ICL algorithms
2. Adversarial training (even at the fine-tuning stage) works!
3. Attacks transfer for small models but not for "larger" transformers.
Arxiv: arxiv.org/abs/2411.05189
Transformers are REALLY good at in-context learning (ICL); but do they learn "adversarially robust" ICL algorithms? We study this and much more in our new paper! 🧵