
@usmananwar.bsky.social

34 Followers  |  125 Following  |  11 Posts  |  Joined: 16.09.2024

Latest posts by usmananwar.bsky.social on Bluesky

Preview: Adversarial Robustness of In-Context Learning in Transformers for Linear Regression (arxiv.org/abs/2411.05189)

This was joint work with amazing co-authors: Spencer Frei, Johannes von Oswald, David Krueger, and Louis Kirsch.
Check out the paper on arXiv: arxiv.org/abs/2411.05189


To conclude: transformers do not learn robust in-context learning algorithms, and we still do not really understand what algorithms GPT-style transformers implement in-context, even for a setting as simple as linear regression. 🥹


Similarly, we find that hijacking attacks transfer poorly between GPT and OLS in either direction, even though their 'in-distribution' behavior matches quite well! Interestingly, the transfer is considerably worse in the GPT → OLS direction. 🤔


...Probably not. Our adversarial attacks designed for linear transformers implementing gradient descent do poorly against (GPT-style) transformers, indicating that they are likely not implementing gradient-based ICL algorithms.


Finally, are transformers implementing gradient descent or ordinary least squares (OLS) when solving linear regression tasks in-context, as argued by previous works (arxiv.org/abs/2208.01066, arxiv.org/abs/2211.15661)?
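For reference, the two candidate algorithms are easy to state. Here is a minimal numpy sketch of both predictors (the step size, step count, and initialization are illustrative defaults, not the exact experimental settings):

```python
import numpy as np

def ols_predict(X, y, x_q):
    # Ordinary least squares: closed-form (min-norm) solution.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return x_q @ w

def gd_predict(X, y, x_q, eta=0.01, steps=500):
    # Plain gradient descent on the squared loss, starting from w = 0.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= eta * X.T @ (X @ w - y)
    return x_q @ w
```

Comparing a trained transformer's in-context predictions against each of these baselines (e.g., mean squared difference over many sampled tasks) is one way to probe which algorithm it is closer to.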


We also find that larger transformers are less universal in which in-context learning algorithms they implement: the transferability of hijacking attacks gets worse as transformer size increases!
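One way to quantify this transferability (a hypothetical evaluation harness, not the paper's exact code; `craft_attack` stands in for whatever attack is run against the source model):

```python
import torch

def transfer_rate(source_model, target_model, craft_attack,
                  tasks, y_target, tol=0.5):
    """Fraction of tasks where a hijack crafted on `source_model` also
    drives `target_model`'s prediction to within `tol` of `y_target`."""
    hits = 0
    for xs, ys, x_q in tasks:
        ys_adv = craft_attack(source_model, xs, ys, x_q, y_target)
        with torch.no_grad():
            pred = target_model(xs, ys_adv, x_q)   # evaluate on the target
        hits += abs(pred.item() - y_target) < tol  # did the hijack transfer?
    return hits / len(tasks)
```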


Can the adversarial robustness of transformers be improved? Yes: we found that gradient-based adversarial training works (even when applied only at the fine-tuning stage), and the tradeoff between clean performance and adversarial robustness is not significant.
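A sketch of what gradient-based adversarial training can look like here (a generic PGD-style inner loop that perturbs the context labels; the attack budget, loss, and schedule are illustrative assumptions, not the paper's exact recipe):

```python
import torch

def adversarial_finetune_epoch(model, loader, opt,
                               eps=0.5, inner_steps=10, inner_lr=0.1):
    """One epoch of adversarial training: the inner PGD loop maximizes the
    loss by perturbing context labels, the outer step minimizes it."""
    for xs, ys, x_q, y_q in loader:            # batches of regression tasks
        # Inner maximization: find a worst-case perturbation of the context.
        delta = torch.zeros_like(ys, requires_grad=True)
        for _ in range(inner_steps):
            loss = (model(xs, ys + delta, x_q) - y_q).pow(2).mean()
            (grad,) = torch.autograd.grad(loss, delta)
            with torch.no_grad():
                delta += inner_lr * grad.sign()  # ascend the adversary's loss
                delta.clamp_(-eps, eps)          # stay inside the budget
        # Outer minimization: standard update on the perturbed tasks.
        loss = (model(xs, ys + delta.detach(), x_q) - y_q).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
```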


We show that linear transformers, which provably implement gradient descent on linear regression tasks, are provably non-robust and can be hijacked by attacking a SINGLE token! Standard GPT-style transformers are similarly non-robust.
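To see why a single token suffices for the gradient-descent predictor: a trained one-layer linear-attention transformer provably computes one step of gradient descent from w = 0 (arxiv.org/abs/2211.15661), i.e. y_hat = eta * x_q @ X.T @ y, which is linear in every context label. A minimal numpy sketch (eta and the dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 5, 20, 1.0 / 20
X = rng.normal(size=(n, d))        # in-context inputs
w_true = rng.normal(size=d)
y = X @ w_true                     # in-context labels
x_q = rng.normal(size=d)           # query input

def gd_step_predict(X, y, x_q, eta):
    # One gradient-descent step from w = 0 on the squared loss:
    # w1 = eta * X^T y, so the prediction is x_q^T w1.
    return eta * x_q @ (X.T @ y)

clean = gd_step_predict(X, y, x_q, eta)

# The prediction is linear in each label y_k, so perturbing ONE context
# label by delta shifts the output by eta * (x_q @ X[k]) * delta.
k, target = 0, 100.0
delta = (target - clean) / (eta * (x_q @ X[k]))
y_adv = y.copy()
y_adv[k] += delta
print(gd_step_predict(X, y_adv, x_q, eta))   # prints ~100.0
```

For GPT-style transformers no such closed form is available, but empirically they can be hijacked as well.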


We specifically study "hijacking attacks" on transformers trained to solve linear regression in-context, in which the adversary's goal is to force the transformer to make an arbitrary prediction by perturbing the in-context data.
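Concretely, a gradient-based hijacking attack can be sketched as follows (assuming a `model(xs, ys, x_q)` interface that returns the in-context prediction; perturbing the labels and using Adam are illustrative choices rather than the paper's exact setup):

```python
import torch

def hijack(model, xs, ys, x_q, y_target, eps=1.0, steps=200, lr=0.05):
    """Perturb the in-context labels (within an L-inf ball of radius eps)
    so the model's prediction at x_q is driven toward y_target."""
    delta = torch.zeros_like(ys, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        pred = model(xs, ys + delta, x_q)     # in-context prediction
        loss = (pred - y_target) ** 2         # adversary's objective
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)           # keep the attack bounded
    return (ys + delta).detach()
```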


We find:
1. Transformers do NOT implement robust ICL algorithms.
2. Adversarial training (even at the finetuning stage) works!
3. Attacks transfer for small models but not for 'larger' transformers.
arXiv: arxiv.org/abs/2411.05189


Transformers are REALLY good at in-context learning (ICL), but do they learn 'adversarially robust' ICL algorithms? We study this and much more in our new paper! 🧵

