Check out @mryskina.bsky.social's talk and poster at COLM on Tuesday: we present a method to identify 'semantically consistent' brain regions (responding to concepts across modalities) and show that more semantically consistent brain regions are better predicted by LLMs.
04.10.2025 12:43
1/ New preprint
How do #LLMs' inner features change as they train? Using #crosscoders + a new causal metric, we map when features appear, strengthen, or fade across checkpoints, opening a new lens on training dynamics beyond loss curves & benchmarks.
#interpretability
25.09.2025 14:02
Employing mechanistic interpretability to study how models learn, not just where they end up
2 papers find:
There are phase transitions where features emerge and stay throughout learning
alphaxiv.org/pdf/2509.17196
@amuuueller.bsky.social @abosselut.bsky.social
alphaxiv.org/abs/2509.05291
26.09.2025 15:27
Paper title "The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?" with the paper's graphical abstract showing how more powerful alignment maps between a DNN and an algorithm allow more complex features to be found and more "accurate" abstractions.
Mechanistic interpretability often relies on *interventions* to study how DNNs work. Are these interventions enough to guarantee the features we find are not spurious? No! In our new paper, we show many mech interp methods implicitly rely on the linear representation hypothesis.
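A minimal sketch of the kind of linear intervention this critique targets: overwriting an activation's component along a single direction. The direction, dimensionality, and target value below are hypothetical placeholders, not the paper's setup.

```python
import numpy as np

def patch_along_direction(activation, v, target_value):
    """Replace the component of `activation` along unit direction `v` with
    `target_value`, leaving the orthogonal part untouched. This is only a
    faithful intervention if the feature really is encoded linearly along `v`
    (the linear representation hypothesis)."""
    v = v / np.linalg.norm(v)
    current = activation @ v                        # scalar projection onto v
    return activation + (target_value - current) * v

# Toy usage: a 4-d "activation" and a hypothetical feature direction.
act = np.array([0.5, -1.0, 2.0, 0.3])
feature_dir = np.array([1.0, 0.0, 1.0, 0.0])
patched = patch_along_direction(act, feature_dir, target_value=0.0)
```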
14.07.2025 12:15
I was part of an interesting panel discussion yesterday at an ARC event. Maybe everybody knows this already, but I was quite surprised by how "general" intelligence was conceptualized in relation to human intelligence and the ARC benchmarks.
28.09.2025 10:06
From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit
Motivated by the hypothesis that neural network representations encode abstract, interpretable features as linearly accessible, approximately orthogonal directions, sparse autoencoders (SAEs) have bec...
Phenomenology → principle → method.
From observed phenomena in representations (conditional orthogonality) we derive a natural instantiation.
And it turns out to be an old friend: Matching Pursuit!
arxiv.org/abs/2506.03093
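For readers who want the 'old friend' spelled out, here is a minimal sketch of plain matching pursuit over a fixed unit-norm dictionary; the dictionary, step budget, and toy data are illustrative assumptions, not the paper's exact instantiation.

```python
import numpy as np

def matching_pursuit(x, D, n_steps=8):
    """Greedily decompose x into a sparse combination of dictionary atoms.
    D holds unit-norm atoms as columns; returns the sparse code and residual."""
    residual = x.copy()
    code = np.zeros(D.shape[1])
    for _ in range(n_steps):
        scores = D.T @ residual            # correlation with every atom
        k = np.argmax(np.abs(scores))      # pick the best-matching atom
        code[k] += scores[k]               # accumulate its coefficient
        residual = residual - scores[k] * D[:, k]
    return code, residual

# Toy usage: random unit-norm dictionary, sparse-code a random vector.
rng = np.random.default_rng(0)
D = rng.normal(size=(64, 256))
D /= np.linalg.norm(D, axis=0)
code, res = matching_pursuit(rng.normal(size=64), D)
```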
See you in San Diego,
@neuripsconf.bsky.social
#interpretability
28.09.2025 14:01
Our preprint is online!
www.biorxiv.org/content/10.1...
How do #dopamine neurons perform the key calculations in reinforcement #learning?
Read on to find out more!
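For context, the textbook temporal-difference reward prediction error that dopamine firing is classically modeled as (the standard formulation, not necessarily the paper's exact computation):

```latex
% TD reward prediction error at time t, with reward r_t,
% discount factor \gamma, and learned state value V:
\delta_t = r_t + \gamma \, V(s_{t+1}) - V(s_t)
```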
19.09.2025 13:05
Are there conceptual directions in VLMs that transcend modality? Check out our COLM oral spotlight paper! We use SAEs to analyze the multimodality of linear concepts in VLMs
with @chloesu07.bsky.social, @thomasfel.bsky.social, @shamkakade.bsky.social and Stephanie Gil
arxiv.org/abs/2504.11695
17.09.2025 19:12
One interesting result: our "Bridge Score" points to concept pairs that connect vision & language.
In the demo you can explore these bridges (links) and see how multimodality shows up! :)
with @isabelpapad.bsky.social, @chloesu07.bsky.social, @shamkakade.bsky.social and Stephanie Gil
17.09.2025 19:42
Check out our COLM 2025 (oral)!
SAEs reveal that VLM embedding spaces aren't just "image vs. text" cones.
They contain stable conceptual directions, some forming surprising bridges across modalities.
arxiv.org/abs/2504.11695
Demo: vlm-concept-visualization.com
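A minimal sketch of the kind of sparse autoencoder used to pull concept directions out of an embedding space; the width, sparsity weight, and single training step below are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete SAE: each latent's decoder weights give a candidate
    concept direction in the embedding space."""
    def __init__(self, d_model=768, n_latents=8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_latents)
        self.decoder = nn.Linear(n_latents, d_model)

    def forward(self, x):
        z = torch.relu(self.encoder(x))    # sparse, non-negative codes
        return self.decoder(z), z

# One toy training step on stand-ins for real VLM embeddings.
sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
embeddings = torch.randn(32, 768)
x_hat, z = sae(embeddings)
loss = ((x_hat - embeddings) ** 2).mean() + 1e-3 * z.abs().mean()  # recon + L1
opt.zero_grad()
loss.backward()
opt.step()
```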
17.09.2025 19:42
First Workshop on Interpreting Cognition in Deep Learning Models (NeurIPS 2025)
Excited to announce the first workshop on CogInterp: Interpreting Cognition in Deep Learning Models @ NeurIPS 2025!
How can we interpret the algorithms and representations underlying complex behavior in deep learning models?
coginterp.github.io/neurips2025/
1/4
16.07.2025 13:08
How do language models generalize from information they learn in-context vs. via finetuning? In arxiv.org/abs/2505.00661 we show that in-context learning can generalize more flexibly, illustrating key differences in the inductive biases of these modes of learning, and ways to improve finetuning. 1/
02.05.2025 17:02
Our work finding universal concepts in vision models is accepted at #ICML2025!!!
My first major conference paper with my wonderful collaborators and friends @matthewkowal.bsky.social @thomasfel.bsky.social
@Julian_Forsyth
@csprofkgd.bsky.social
Working with y'all is the best
Preprint below!!
01.05.2025 22:57
Accepted at #ICML2025! Check out the preprint.
HUGE shoutout to Harry (1st PhD paper, in 1st year), Julian (1st ever, done as an undergrad), Thomas and Matt!
@hthasarathan.bsky.social @thomasfel.bsky.social @matthewkowal.bsky.social
01.05.2025 15:03
<proud advisor>
Hot off the arXiv! "Appa: Bending Weather Dynamics with Latent Diffusion Models for Global Data Assimilation". Appa is our novel 1.5B-parameter probabilistic weather model that unifies reanalysis, filtering, and forecasting in a single framework. A thread.
29.04.2025 04:48
Have you considered that, in computer memory, model weights are stored as discrete values anyway? So why not do probabilistic inference directly on the discrete (quantized) parameters? @trappmartin.bsky.social is presenting our work at #AABI2025 today. [1/3]
29.04.2025 06:58
PINEAPPLE, LIGHT, HAPPY, AVALANCHE, BURDEN
Some of these words are consistently remembered better than others. Why is that?
In our paper, just published in J. Exp. Psychol., we provide a simple Bayesian account and show that it explains >80% of variance in word memorability: tinyurl.com/yf3md5aj
10.04.2025 14:38
Recordings from our
@cosynemeeting.bsky.social
#COSYNE2025 workshop on "Agent-Based Models in Neuroscience: Complex Planning, Embodiment, and Beyond" are now online: neuro-agent-models.github.io
07.04.2025 20:57
[...] overall, we argue an SAE does not just reveal concepts; it determines what can be seen at all."
We propose to examine how constraints on SAEs impose dual assumptions on the data, led by the amazing
@sumedh-hindupur.bsky.social
07.03.2025 03:27
New paper, accepted as *spotlight* at #ICLR2025!
We show that a competition dynamic between several algorithms splits a toy model's ICL abilities into four broad phases of train/test settings! This means ICL is akin to a mixture of different algorithms, not a monolithic ability.
16.02.2025 18:57
Want strong SSL, but not the complexity of DINOv2?
CAPI: Cluster and Predict Latent Patches for Improved Masked Image Modeling.
14.02.2025 18:04
New Paper!
Can neuroscience localizers uncover brain-like functional specializations in LLMs?
Yes! We analyzed 18 LLMs and found units mirroring the brain's language, theory of mind, and multiple demand networks!
w/ @gretatuckute.bsky.social, @abosselut.bsky.social, @mschrimpf.bsky.social
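A rough sketch of the localizer logic under stated assumptions: the stimuli, activation matrices, and top-1% cutoff below are placeholders for illustration, not the paper's actual protocol.

```python
import numpy as np

def localize_units(act_sentences, act_control, top_frac=0.01):
    """Select 'language-selective' units: those whose mean activation on
    sentence stimuli most exceeds their mean activation on control stimuli
    (e.g. non-word strings), mirroring an fMRI-style functional localizer.
    Inputs are (n_stimuli, n_units) activation matrices."""
    contrast = act_sentences.mean(axis=0) - act_control.mean(axis=0)
    n_select = max(1, int(top_frac * contrast.size))
    return np.argsort(contrast)[-n_select:]    # indices of the top units

# Toy usage with random stand-ins for unit activations.
rng = np.random.default_rng(0)
units = localize_units(rng.normal(size=(100, 4096)), rng.normal(size=(100, 4096)))
```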
19.12.2024 15:06
I'm delighted to share this latest research, led by the talented @hthasarathan.bsky.social
and Julian. Their work uncovered both universal concepts shared across models and unique concepts specific to DINOv2 and SigLIP!
07.02.2025 23:39
Wanna know which features are universal vs unique in your models and how to find them? Excited to share our preprint: "Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment"!
arxiv.org/abs/2502.03714
(1/9)
07.02.2025 15:15
Using mechanistic #interpretability to advance scientific #discovery & capture striking biology?
Come see @jhartford.bsky.social's oral presentation @ #NeurIPS2024 Interpretable AI workshop to learn more about extracting features from large MAEs! Paper: openreview.net/forum?id=jYl...
15.12.2024 14:30
Did you know that @PyTorch applies Bessel's correction to the standard deviation by default, but NumPy and JAX do not?
A possible source of disagreements when porting models to PyTorch! @numpy_team
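A quick way to see the discrepancy, assuming a recent PyTorch where torch.std accepts a correction keyword:

```python
import numpy as np
import torch

x = [1.0, 2.0, 3.0, 4.0]

# NumPy default: population std, divides by N.
print(np.std(np.array(x)))                        # ~1.1180

# PyTorch default: sample std with Bessel's correction, divides by N - 1.
print(torch.std(torch.tensor(x)))                 # ~1.2910

# Making them agree explicitly:
print(np.std(np.array(x), ddof=1))                # ~1.2910, matches torch default
print(torch.std(torch.tensor(x), correction=0))   # ~1.1180, matches numpy default
```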
03.02.2025 12:52
PhD candidate at LMU Munich. Representations, model and data attribution, training dynamics.
Strong opinions on coffee and tea
https://florian-eichin.com
Assistant Professor at UCSD Cognitive Science and CSE (affiliate) | Past: Postdoc @MIT, PhD @Cornell, B. Tech @IITKanpur | Interested in Biological and Artificial Intelligence
Postdoc at ETH. Formerly, PhD student at the University of Cambridge :)
ML meets Neuroscience #NeuroAI, Full Professor at the Institute of Cognitive Science (Uni Osnabrück), prev. @ Donders Inst., Cambridge University
M.Sc student in Cognitive Science @UniOsnabrück. Interested in computational modeling, machine learning, and cognitive neuroscience.
PhD student in NLP at Cambridge | ELLIS PhD student
https://lucasresck.github.io/
PhD student at @bifold.berlin, Machine Learning Group, TU Berlin.
Automatic Differentiation, Explainable AI and #JuliaLang.
Open source person: adrianhill.de/projects
Researcher: noorsajid.com
PhD @Stanford working w Noah Goodman
Studying in-context learning and reasoning in humans and machines
Prev. @UofT CS & Psych
Workshop on Fine-Grained Visual Categorization (FGVC) - CVPR
Nashville, June 11, 9am-5pm
Room: 104 E
https://sites.google.com/view/fgvc12
Head of XAI research at Fraunhofer HHI
Google Scholar: https://scholar.google.de/citations?user=wpLQuroAAAAJ
Associate professor in machine learning at the University of Amsterdam. Topics: (online) learning theory and the mathematics of interpretable AI.
www.timvanerven.nl
Theory of Interpretable AI seminar: https://tverven.github.io/tiai-seminar
computational neuroscience | neurophenomenology 2.0?
Science Writer, Kempner Institute for the Study of Natural and Artificial Intelligence at Harvard University
(views here are my own)
Cognitive Neuroscientist and Associate Professor of Psychology at George Mason University. Perception of Time, Memory, & Action. Exec Director @ http://timingforum.org
research technician @ relational cognition lab, uci | ombbhatt.github.io | that blue pi in 3b1b videos is my spirit animal
Senior Director of AI/ML Research Engineering, Kempner Institute @Harvard , views are my own
Postdoctoral Fellow at Harvard Kempner Institute. Trying to bring natural structure to artificial neural representations. Prev: PhD at UvA. Intern @ Apple MLR, Work @ Intel Nervana
Chief Models Officer @ Stealth Startup; Inria & MVA - Ex: Llama @AIatMeta & Gemini and BYOL @GoogleDeepMind