Laura Kopf

@lkopf.bsky.social

PhD student in Interpretable Machine Learning at TU Berlin & BIFOLD

325 Followers  |  380 Following  |  17 Posts  |  Joined: 06.12.2024  |  1.6361

Latest posts by lkopf.bsky.social on Bluesky

Overview of descriptions for model components (neurons, attention heads) and model abstractions (SAE features, circuits).

🔍 Are you curious about uncovering the underlying mechanisms and identifying the roles of model components (neurons, …) and abstractions (SAEs, …)?

We provide the first survey of concept description generation and evaluation methods.

Joint effort w/ @lkopf.bsky.social

📄 arxiv.org/abs/2510.01048

02.10.2025 09:13 | 👍 17    🔁 3    💬 1    📌 0

Many thanks as well to the institutions that supported this research:
@tuberlin.bsky.social
@bifold.berlin
UMI Lab
@fraunhoferhhi.bsky.social
@unipotsdam.bsky.social
@leibnizatb.bsky.social

19.09.2025 12:01 | 👍 1    🔁 0    💬 0    📌 0

I’m very grateful to my amazing collaborators @nfel.bsky.social, @kirillbykov.bsky.social, @philinelb.bsky.social, Anna Hedström, Marina M.-C. Höhne, and @eberleoliver.bsky.social 🙏

19.09.2025 12:01 | 👍 1    🔁 0    💬 1    📌 0

Happy to share that our PRISM paper has been accepted at #NeurIPS2025 🎉

In this work, we introduce a multi-concept feature description framework that can identify and score polysemantic features.

📄 Paper: arxiv.org/abs/2506.15538

#NeurIPS #MechInterp #XAI

19.09.2025 12:01 | 👍 25    🔁 3    💬 1    📌 3

Grateful to the institutions that supported this work:
@tuberlin.bsky.social
@bifold.berlin
UMI Lab
@fraunhoferhhi.bsky.social
@unipotsdam.bsky.social
@leibnizatb.bsky.social

(7/7)

19.06.2025 15:18 | 👍 1    🔁 0    💬 0    📌 0

Many thanks to my amazing co-authors:
@nfel.bsky.social
@kirillbykov.bsky.social
@philinelb.bsky.social
Anna Hedström
Marina M.-C. Höhne
@eberleoliver.bsky.social

(6/7)

19.06.2025 15:18 | 👍 3    🔁 0    💬 1    📌 0

Our results highlight that the PRISM framework not only provides multiple human interpretable descriptions for neurons but also aligns with the human interpretation of polysemanticity. (5/7)

19.06.2025 15:18 | 👍 1    🔁 0    💬 1    📌 0

In exploring the concept space, we use PRISM to characterize more complex components, finding and interpreting patterns that specific attention heads or groups of neurons respond to. (4/7)

19.06.2025 15:18 | 👍 1    🔁 0    💬 1    📌 0

We benchmark PRISM across layers and architectures, showing how polysemanticity and interpretability shift through the model. (3/7)

19.06.2025 15:18 | 👍 1    🔁 0    💬 1    📌 0

PRISM samples sentences from the top percentile activation distribution, clusters them in embedding space, and uses an LLM to generate labels for each concept cluster. (2/7)

19.06.2025 15:18 | 👍 1    🔁 0    💬 1    📌 0
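
The three steps named in the (2/7) post (sample from the top activation percentile, cluster in embedding space, label each cluster) could be sketched roughly as follows. This is a toy illustration and not the authors' implementation: the function names, the percentile value, the plain k-means clustering, and the `label_fn` callback that stands in for the LLM call are all assumptions, since the post does not specify them.

```python
import numpy as np

def top_percentile_indices(activations, percentile=99.0):
    """Indices of samples whose activation is at or above the given percentile."""
    threshold = np.percentile(activations, percentile)
    return np.flatnonzero(activations >= threshold)

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns a cluster id per point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every center, then nearest-center assignment.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def describe_feature(sentences, activations, embeddings, label_fn,
                     k=2, percentile=99.0):
    """Return one description per non-empty cluster of highly activating sentences.

    label_fn is a hypothetical stand-in for the LLM labeling step: it takes a
    list of sentences and returns a short text description.
    """
    top = top_percentile_indices(activations, percentile)
    cluster_ids = kmeans(embeddings[top], k)
    descriptions = []
    for j in range(k):
        members = [sentences[i] for i in top[cluster_ids == j]]
        if members:
            descriptions.append(label_fn(members))
    return descriptions
```

In this sketch a feature that fires on two unrelated groups of sentences yields two descriptions, one per cluster, which is the multi-concept behavior the thread describes. Swapping `label_fn` for a real LLM call and the random embeddings for a sentence encoder is left open.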

🔍 When do neurons encode multiple concepts?

We introduce PRISM, a framework for extracting multi-concept feature descriptions to better understand polysemanticity.

📄 Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework
arxiv.org/abs/2506.15538

🧵 (1/7)

19.06.2025 15:18 | 👍 36    🔁 12    💬 1    📌 3

Huge thanks to my incredible supervisor @kirillbykov.bsky.social, who laid the foundation for this project and provided brilliant guidance 🙏, and to @philinelb.bsky.social and Sebastian Lapuschkin, who unfortunately couldn’t be there.

13.12.2024 02:48 | 👍 5    🔁 0    💬 0    📌 0

Still overwhelmed by the amazing response to our poster session at @neuripsconf.bsky.social with Anna Hedström and Marina Höhne! It was incredible to have such lively and inspiring discussions with brilliant people whose work I admire. ✨

13.12.2024 02:48 | 👍 11    🔁 2    💬 1    📌 0

Thanks for putting together this amazing list Margaret! I would love to be added if you still have space :)

12.12.2024 08:24 | 👍 2    🔁 0    💬 1    📌 0

Want to know more about CoSy?
📄 Paper: arxiv.org/abs/2405.20331
💻 Code: github.com/lkopf/cosy
🔗 Poster: neurips.cc/virtual/2024...

#NeurIPS2024 #MechInterp #ExplainableAI #Interpretability

11.12.2024 06:43 | 👍 0    🔁 0    💬 0    📌 0

Special thanks to our supporting institutions: UMI Lab, @xtraexer.bsky.social, @tuberlin.bsky.social, Uni Potsdam, ATB Potsdam, and Fraunhofer Heinrich-Hertz-Institut.

11.12.2024 06:43 | 👍 0    🔁 0    💬 1    📌 0

My co-authors Anna Hedström and Marina Höhne will also be at @neuripsconf.bsky.social. A big thank you to my other co-authors @kirillbykov.bsky.social, @philinelb.bsky.social and Sebastian Lapuschkin, who unfortunately couldn’t be there.

11.12.2024 06:43 | 👍 0    🔁 0    💬 1    📌 0

I’ll be presenting our work at @neuripsconf.bsky.social in Vancouver! 🎉
Join me this Thursday, December 12th, in East Exhibit Hall A-C, Poster #3107, from 11 a.m. to 2 p.m. PST. I'll be discussing our paper “CoSy: Evaluating Textual Explanations of Neurons.”

11.12.2024 06:43 | 👍 10    🔁 1    💬 1    📌 0