(9/n) Finally, I would like to thank all my amazing co-authors: Avinava, @abeirami.bsky.social, Rahul, Nicholas, Amr, Snigdha.
cc @unccs.bsky.social
@somnathbrc.bsky.social
Research Scientist at Google Research https://www.cs.unc.edu/~somnath/
(8/n) Here is a blog post with a simplified overview of our work: www.cs.unc.edu/~somnath/blo...
Code: github.com/brcsomnath/pef
Paper link: arxiv.org/abs/2503.20098
(7/n) We would like to highlight great prior work, like LEACE, which perfectly erases concepts to protect against linear adversaries. In our work, we improve upon this line of work and present a technique that can protect against any adversary.
x.com/norabelrose/...
(6/n) We also visualize the learned representations from different erasure methods. We observe that PEF perfectly erases group (or concept) information without losing other information (i.e., without collapsing the representation space).
02.04.2025 16:03 · (5/n) Empirically, we observe that PEF reaches the theoretical limits of erasure even in challenging settings where other methods struggle, including both linear (INLP, LEACE) and non-linear techniques (FaRM, KRaM).
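A common way to measure erasure empirically is to train a probing adversary on the erased representations and check whether its accuracy drops to chance. A minimal, self-contained sketch with synthetic data, a least-squares linear probe standing in for the adversary, and a crude zero-out "erasure" (illustrative only, not PEF):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "representations": the concept label leaks into dimension 0.
labels = rng.integers(0, 2, size=1000).astype(float)
reps = rng.normal(size=(1000, 16))
reps[:, 0] += 3.0 * labels

# Crude "erasure" for illustration only: zero out the leaking direction.
erased = reps.copy()
erased[:, 0] = 0.0

def linear_probe_accuracy(X, y, split=800):
    # Least-squares linear probe (with bias) as a simple linear adversary.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb[:split], y[:split], rcond=None)
    preds = (Xb[split:] @ w > 0.5).astype(float)
    return (preds == y[split:]).mean()

print(linear_probe_accuracy(reps, labels))    # high: concept is recoverable
print(linear_probe_accuracy(erased, labels))  # near chance (~0.5)
```

If the probe cannot beat chance on the erased representations, the concept is hidden from this (linear) adversary; stronger erasure guarantees require testing against non-linear adversaries as well.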
(4/n) When the distributions are unequal, we still achieve perfect erasure, but with slightly reduced utility. The erasure function for this setting is shown below.
(3/n) From the above limits, we show that optimal perfect concept erasure is only feasible when the underlying distributions are equal up to permutation. In such scenarios, the erasure function is shown in the diagram.
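The "equal up to permutation" condition is easy to check for discrete distributions: two distributions match up to a relabeling of their support exactly when their sorted probability vectors coincide. A small sketch (the distributions and helper name are illustrative, not from the paper):

```python
import numpy as np

def equal_up_to_permutation(p, q, tol=1e-9):
    # Distributions are equal up to a permutation of their support
    # iff their probability masses agree after sorting.
    return bool(np.allclose(np.sort(p), np.sort(q), atol=tol))

p = np.array([0.5, 0.3, 0.2])   # e.g., P(X | group A)
q = np.array([0.2, 0.5, 0.3])   # same masses, reordered support
r = np.array([0.6, 0.3, 0.1])   # genuinely different masses

print(equal_up_to_permutation(p, q))  # True
print(equal_up_to_permutation(p, r))  # False
```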
(2/n) We study the fundamental limits of concept erasure. Building on the work of @FlavioCalmon et al. in the information theory literature, we characterize the erasure capacity and the maximum utility that can be retained during concept erasure.
How can we perfectly erase concepts from LLMs?
Our method, Perfect Erasure Functions (PEF), erases concepts perfectly from LLM representations. We analytically derive PEFs without parameter estimation. PEFs achieve a Pareto-optimal erasure-utility tradeoff backed by theoretical guarantees. #AISTATS2025 🧵
Please stop by our posters if you're interested. Feel free to reach out if you're interested in AI safety or efficiency, or just want to chat!
CC: @unccs.bsky.social
(3/3) Towards Scalable Exact Machine Unlearning Using PEFT
I'm also presenting my ongoing unlearning work at the SafeGenAI Workshop. It uses a novel PEFT training approach to improve exact unlearning efficiency.
arxiv.org/abs/2406.16257
(2/3) Fast Tree-Field Integrator
An efficient method for graph field integration (a special case of matrix-vector multiplication) using integrator trees. FTFI enables polylog-linear time multiplication with performance boosts in vision transformers.
arxiv.org/abs/2406.15881
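For intuition, graph field integration computes y_i = sum_j f(d(i, j)) * x_j over a graph metric; the naive baseline below is O(n^2), which is what FTFI's integrator trees accelerate to polylog-linear time on trees. A sketch using hop distances and an exponential-decay kernel (both illustrative choices, not the paper's exact setup):

```python
import numpy as np
from collections import deque

def tree_distances(n, edges):
    # All-pairs hop distances on a tree via BFS from each node: O(n^2).
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    D = np.zeros((n, n))
    for s in range(n):
        dist = {s: 0}
        dq = deque([s])
        while dq:
            u = dq.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    dq.append(v)
        for v, d in dist.items():
            D[s, v] = d
    return D

# Naive field integration: y = f(D) @ x, the O(n^2) baseline.
edges = [(0, 1), (1, 2), (1, 3)]      # a small 4-node tree
D = tree_distances(4, edges)
x = np.array([1.0, 2.0, 3.0, 4.0])    # field values at the nodes
f = lambda d: np.exp(-d)              # an arbitrary decay kernel
y = f(D) @ x
```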
🚨 I'm traveling to #NeurIPS2024 next week to present these papers.
(1/3) Structured Unrestricted-Rank Matrices for PEFT
A new PEFT method replacing low-rank matrices (LoRA) with more expressive structured matrices.
arxiv.org/abs/2406.17740
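To see the contrast with LoRA: a rank-r update B @ A costs 2dr parameters and is rank-limited, while a structured matrix such as a circulant (one illustrative structured family, not necessarily the one used in the paper) is generically full-rank with only d parameters and applies in O(d log d) via the FFT:

```python
import numpy as np

d, r = 256, 8
rng = np.random.default_rng(0)
x = rng.normal(size=d)

# LoRA-style low-rank update: 2*d*r trainable parameters, rank <= r.
A = rng.normal(size=(r, d))
B = rng.normal(size=(d, r))
lora_out = B @ (A @ x)

# Circulant alternative: d parameters; matvec is a circular convolution,
# computed in O(d log d) with the FFT.
c = rng.normal(size=d)  # first column defines the circulant matrix
circ_out = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real

print("LoRA params:", 2 * d * r, "| circulant params:", d)
```

The FFT trick works because multiplying by a circulant matrix with first column c is exactly circular convolution with c, which the FFT diagonalizes.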