This was joint work with amazing co-authors: Spencer Frei, Johannes von Oswald, David Krueger and Louis Kirsch.
Check out the paper on arxiv: arxiv.org/abs/2411.05189
@usmananwar.bsky.social
To conclude, transformers do not learn robust in-context learning algorithms, and we still do not really understand what algorithms GPT-style transformers implement in-context, even for a simple setting like linear regression. 🥹
Similarly, we find that hijacking attacks transfer poorly between GPT and OLS, even though "in-distribution" behavior matches quite well between GPT and OLS! Interestingly, the transfer is considerably worse in the GPT → OLS direction. 🤔
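A minimal sketch of how such a transfer check can be set up: craft a single-token hijacking perturbation against one predictor and measure how far it moves the other predictor's output toward the adversary's target. The `ols_predict` helper, the context layout, and the reuse of the `hijack_single_token` routine (sketched further down the thread, next to the post that introduces hijacking attacks) are assumptions for illustration, not the paper's code.

```python
# Sketch: evaluating transfer of a hijacking attack between a trained GPT-style
# model and a differentiable OLS "model". Assumes context rows are [x_i, y_i]
# with the query appended as a final [x_query, 0] row; this layout is an assumption.
import torch

def ols_predict(context):
    X, y = context[:-1, :-1], context[:-1, -1]
    x_q = context[-1, :-1]
    # Normal equations (assumes X has full column rank), so the map stays differentiable.
    w = torch.linalg.solve(X.T @ X, X.T @ y)
    return x_q @ w

def transfer_gap(src_model, dst_model, context, target):
    # Craft the attack against src_model, then evaluate it on dst_model.
    delta = hijack_single_token(src_model, context, target)
    attacked = context.clone()
    attacked[0] = attacked[0] + delta
    return (dst_model(attacked) - target).abs()  # small gap = attack transfers

# e.g. compare transfer_gap(gpt_model, ols_predict, ctx, t)
#      with transfer_gap(ols_predict, gpt_model, ctx, t).
```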
…Probably not. Our adversarial attacks, designed for linear transformers implementing gradient descent, perform poorly on (GPT-style) transformers, indicating that they are likely not implementing gradient-based ICL algorithms.
Finally, are transformers implementing gradient descent or ordinary least squares (OLS) when solving linear regression tasks in-context, as argued by previous works (arxiv.org/abs/2208.01066, arxiv.org/abs/2211.15661)?
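For reference, the two candidate in-context algorithms can be written down in a few lines; the task setup below (dimension, noise level, step size) is purely illustrative.

```python
# Sketch: gradient descent vs. ordinary least squares on an in-context
# linear-regression prompt. Finite-step GD and OLS generally disagree,
# which is what makes the two hypotheses distinguishable.
import numpy as np

rng = np.random.default_rng(0)
d, n, eta, steps = 5, 20, 0.01, 50
X = rng.normal(size=(n, d))                      # in-context inputs
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)  # noisy in-context targets
x_q = rng.normal(size=d)                         # query input

# Gradient descent on the in-context least-squares loss, starting from w = 0.
w_gd = np.zeros(d)
for _ in range(steps):
    w_gd -= eta * X.T @ (X @ w_gd - y)

# Closed-form OLS solution.
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print("GD prediction: ", x_q @ w_gd)
print("OLS prediction:", x_q @ w_ols)
```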
We also find that larger transformers are less universal in the in-context learning algorithms they implement: transferability of hijacking attacks gets worse as transformer size increases!
Can the adversarial robustness of transformers be improved? Yes: we find that gradient-based adversarial training works (even when applied only at the fine-tuning stage), and the trade-off between clean performance and adversarial robustness is not significant.
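A rough sketch of what gradient-based adversarial training against hijacking could look like; `model`, `sample_task`, the loss weighting, and the reuse of `hijack_single_token` (sketched further down the thread) are placeholders, not the paper's training setup.

```python
# Sketch: adversarial fine-tuning against single-token hijacking.
# Each step crafts a fresh perturbation against the current model, then trains
# the model to predict the clean label on both the clean and attacked contexts.
import torch

def adversarial_finetune(model, sample_task, steps=1000, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        context, y_true = sample_task()     # fresh linear-regression prompt
        target = 5.0 * torch.randn(())      # arbitrary adversarial target
        delta = hijack_single_token(model, context, target, steps=20)
        attacked = context.clone()
        attacked[0] = attacked[0] + delta
        # Clean-performance term plus robustness term.
        loss = (model(attacked) - y_true) ** 2 + (model(context) - y_true) ** 2
        opt.zero_grad()
        loss.backward()
        opt.step()
```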
We show that linear transformers, which provably implement gradient descent on linear regression tasks, are provably non-robust and can be hijacked by attacking a SINGLE token! Standard GPT-style transformers are similarly non-robust.
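The equivalence behind the claim that linear transformers implement gradient descent can be checked numerically in a few lines; the one-step construction below is only illustrative, and the specific attention weighting is an assumption rather than the paper's exact parameterization.

```python
# Sketch: a single step of gradient descent on the in-context least-squares
# loss coincides with a (suitably weighted) linear self-attention update.
import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 5, 20, 0.1
X = rng.normal(size=(n, d))          # in-context inputs x_1..x_n
y = X @ rng.normal(size=d)           # in-context targets y_1..y_n
x_q = rng.normal(size=d)             # query input

# One GD step on L(w) = 0.5 * sum_i (x_i @ w - y_i)^2, starting from w = 0.
w_one_step = eta * X.T @ y
pred_gd = x_q @ w_one_step

# Linear (unnormalized) attention view: the query token attends to context
# tokens with score x_q @ x_i and value y_i, scaled by the "learning rate" eta.
pred_linear_attention = eta * sum((x_q @ X[i]) * y[i] for i in range(n))

assert np.allclose(pred_gd, pred_linear_attention)
```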
We specifically study "hijacking attacks" on transformers trained to solve linear regression in-context, in which the adversary's goal is to force the transformer to make an arbitrary prediction by attacking the in-context data.
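A minimal sketch of such a hijacking attack, assuming a `model` that maps a context of [x_i, y_i] rows (with the query appended as [x_query, 0]) to a scalar prediction; the interface and hyperparameters are assumptions, not the paper's attack code.

```python
# Sketch: gradient-based hijacking of a single in-context token.
# The adversary optimizes a perturbation of one context row so that the
# model's prediction for x_query is pushed toward an arbitrary target.
import torch

def hijack_single_token(model, context, target, token_idx=0, steps=500, lr=1e-2):
    delta = torch.zeros_like(context[token_idx], requires_grad=True)
    row_mask = torch.zeros_like(context)
    row_mask[token_idx] = 1.0                      # only one row is attacked
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        attacked = context + row_mask * delta      # delta broadcasts; mask keeps one row
        loss = (model(attacked) - target) ** 2     # drive the prediction to the target
        opt.zero_grad()
        loss.backward()
        opt.step()
    return delta.detach()
```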
We find:
1. Transformers do NOT implement robust ICL algorithms
2. Adversarial training (even at the fine-tuning stage) works!
3. Attacks transfer for small models but not for "larger" transformers.
Arxiv: arxiv.org/abs/2411.05189
Transformers are REALLY good at in-context learning (ICL); but do they learn "adversarially robust" ICL algorithms? We study this and much more in our new paper! 🧵