Grateful to the institutions that supported this work:
@tuberlin.bsky.social
@bifold.berlin
UMI Lab
@fraunhoferhhi.bsky.social
@unipotsdam.bsky.social
@leibnizatb.bsky.social
(7/7)
@lkopf.bsky.social
PhD student in Interpretable Machine Learning at TU Berlin & BIFOLD
Many thanks to my amazing co-authors:
@nfel.bsky.social
@kirillbykov.bsky.social
@philinelb.bsky.social
Anna Hedström
Marina M.-C. Höhne
@eberleoliver.bsky.social
(6/7)
Our results highlight that the PRISM framework not only provides multiple human-interpretable descriptions for neurons but also aligns with the human interpretation of polysemanticity. (5/7)
19.06.2025 15:18
In exploring the concept space, we use PRISM to characterize more complex components, finding and interpreting patterns that specific attention heads or groups of neurons respond to. (4/7)
19.06.2025 15:18
We benchmark PRISM across layers and architectures, showing how polysemanticity and interpretability shift through the model. (3/7)
19.06.2025 15:18
PRISM samples sentences from the top percentile of the activation distribution, clusters them in embedding space, and uses an LLM to generate a label for each concept cluster. (2/7)
19.06.2025 15:18
When do neurons encode multiple concepts?
We introduce PRISM, a framework for extracting multi-concept feature descriptions to better understand polysemanticity.
Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework
arxiv.org/abs/2506.15538
🧵 (1/7)
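For readers who want a concrete picture of the pipeline in (2/7), here is a minimal Python sketch built on several stated assumptions. It is not PRISM's implementation, and every helper below is hypothetical. `top_percent` keeps the highest-activating sentences. `embed` is a toy bag-of-words encoder standing in for a real sentence-embedding model. `kmeans` is plain k-means with farthest-first initialisation, used here as the clustering step. `label_cluster` returns the most frequent word, standing in for the LLM call that PRISM uses to name each cluster.

```python
import math
from collections import Counter


def top_percent(sentences, activations, percent):
    """Keep the `percent`% of sentences with the highest activations (at least one)."""
    n_keep = max(1, math.ceil(len(sentences) * percent / 100))
    ranked = sorted(zip(activations, sentences), reverse=True)
    return [s for _, s in ranked[:n_keep]]


def embed(sentences):
    """Hypothetical stand-in for a sentence encoder: L2-normalised bag-of-words vectors."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    vectors = []
    for s in sentences:
        counts = Counter(s.lower().split())
        v = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-first initialisation."""
    k = min(k, len(vectors))
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        far = max(vectors, key=lambda v: min(sqdist(v, c) for c in centroids))
        centroids.append(list(far))
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        assign = [min(range(k), key=lambda c: sqdist(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def label_cluster(sentences):
    """Hypothetical placeholder for the LLM labelling step: the most frequent word."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    return counts.most_common(1)[0][0]


def describe_neuron(sentences, activations, k=2, percent=1.0):
    """Sample top-activating sentences, cluster them, and label each cluster."""
    top = top_percent(sentences, activations, percent)
    clusters = {}
    for s, c in zip(top, kmeans(embed(top), k)):
        clusters.setdefault(c, []).append(s)
    return sorted(label_cluster(members) for members in clusters.values())


# Toy "polysemantic neuron" that fires on both cat sentences and stock sentences.
sentences = [
    "my cat purrs", "a cat sleeps", "the cat meows",
    "stock prices rose", "stock market fell", "stock traders sold",
    "the weather is mild", "trains run late", "soup is hot",
    "rain falls softly", "bread is fresh", "roads are busy",
]
activations = [0.9, 0.8, 0.85, 0.95, 0.7, 0.75, 0.1, 0.05, 0.2, 0.15, 0.12, 0.08]
print(describe_neuron(sentences, activations, k=2, percent=50))  # ['cat', 'stock']
```

The toy corpus is tiny, so the example keeps the top 50% instead of the top percentile. On a real corpus, `percent` would be small and `k` would control how many concepts are reported per neuron.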
Huge thanks to my incredible supervisor @kirillbykov.bsky.social, who laid the foundation for this project and provided brilliant guidance, and to @philinelb.bsky.social and Sebastian Lapuschkin, who unfortunately couldn't be there.
Still overwhelmed by the amazing response to our poster session at @neuripsconf.bsky.social with Anna Hedström and Marina Höhne! It was incredible to have such lively and inspiring discussions with brilliant people whose work I admire. ✨
13.12.2024 02:48
Thanks for putting together this amazing list, Margaret! I would love to be added if you still have space :)
12.12.2024 08:24
Want to know more about CoSy?
Paper: arxiv.org/abs/2405.20331
Code: github.com/lkopf/cosy
Poster: neurips.cc/virtual/2024...
#NeurIPS2024 #MechInterp #ExplainableAI #Interpretability
Special thanks to our supporting institutions: UMI Lab, @xtraexer.bsky.social, @tuberline.bsky.social, Uni Potsdam, ATB Potsdam, and Fraunhofer Heinrich-Hertz-Institut.
11.12.2024 06:43
My co-authors Anna Hedström and Marina Höhne will also be at @neuripsconf.bsky.social. A big thank you to my other co-authors @kirillbykov.bsky.social, @philinelb.bsky.social and Sebastian Lapuschkin, who unfortunately couldn't be there.
11.12.2024 06:43
I'll be presenting our work at @neuripsconf.bsky.social in Vancouver!
Join me this Thursday, December 12th, in East Exhibit Hall A-C, Poster #3107, from 11 a.m. to 2 p.m. PST. I'll be discussing our paper "CoSy: Evaluating Textual Explanations of Neurons."