This was an unfortunate mistake, sorry about that.
But the conclusions of our paper don't change drastically: there is significant gradient masking (as shown by the transfer attack) and the CIFAR robustness is at most in the 15% range. Still cool though!
We'll see if we can fix the full attack.
12.12.2024 16:38 —
👍 5
🔁 1
💬 0
📌 0
I discovered a fatal flaw in a paper by @floriantramer.bsky.social et al. claiming to break our Ensemble Everything Everywhere defense. Due to a coding error, they used attacks 20x above the standard 8/255. They confirmed this, but the paper is already out & quoted on OpenReview. What should we do now?
12.12.2024 16:29 —
👍 11
🔁 4
💬 2
📌 1
🚨Unlearned hazardous knowledge can be retrieved from LLMs 🚨
Our results show that current unlearning methods for AI safety only obfuscate dangerous knowledge, just like standard safety training.
Here's what we found👇
06.12.2024 17:47 —
👍 12
🔁 3
💬 1
📌 0
Come do open AI with us in Zurich!
We're hiring PhD students, postdocs (and faculty!)
04.12.2024 13:49 —
👍 11
🔁 3
💬 0
📌 1
Yeah they mostly are
25.11.2024 10:12 —
👍 1
🔁 0
💬 0
📌 0
Gradient Masking All-at-Once: Ensemble Everything Everywhere Is Not Robust
Ensemble everything everywhere is a defense to adversarial examples that was recently proposed to make image classifiers robust. This defense works by ensembling a model's intermediate representations...
Ensemble Everything Everywhere is a defense against adversarial examples that people got quite excited about a few months ago (in particular, the defense causes "perceptually aligned" gradients, just like adversarial training).
Unfortunately, we show it's not robust...
arxiv.org/abs/2411.14834
25.11.2024 08:38 —
👍 28
🔁 9
💬 1
📌 0
probably -> provably...
23.11.2024 08:43 —
👍 1
🔁 0
💬 0
📌 0