
Thomas Fel

@thomasfel.bsky.social

Explainability, Computer Vision, Neuro-AI. 🪴 Kempner Fellow @Harvard. Prev. PhD @Brown, @Google, @GoPro. Crêpe lover. 📍 Boston | 🔗 thomasfel.me

1,334 Followers  |  340 Following  |  15 Posts  |  Joined: 16.11.2024

Latest posts by thomasfel.bsky.social on Bluesky

Check out @mryskina.bsky.social's talk and poster at COLM on Tuesday—we present a method to identify 'semantically consistent' brain regions (responding to concepts across modalities) and show that more semantically consistent brain regions are better predicted by LLMs.

04.10.2025 12:43 — 👍 14    🔁 4    💬 0    📌 0
Post image

1/ 🚨 New preprint

How do #LLMs' inner features change as they train? Using #crosscoders + a new causal metric, we map when features appear, strengthen, or fade across checkpoints—opening a new lens on training dynamics beyond loss curves & benchmarks.

#interpretability

25.09.2025 14:02 — 👍 13    🔁 6    💬 2    📌 0
Post image

Employing mechanistic interpretability to study how models learn, not just where they end up.
Two papers find that there are phase transitions where features emerge and then persist throughout learning.
🤖📈🧠
alphaxiv.org/pdf/2509.17196
@amuuueller.bsky.social @abosselut.bsky.social
alphaxiv.org/abs/2509.05291

26.09.2025 15:27 — 👍 6    🔁 1    💬 2    📌 0
Paper title "The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?" with the paper's graphical abstract showing how more powerful alignment maps between a DNN and an algorithm allow more complex features to be found and more "accurate" abstractions.


Mechanistic interpretability often relies on *interventions* to study how DNNs work. Are these interventions enough to guarantee the features we find are not spurious? No! ⚠️ In our new paper, we show many mech int methods implicitly rely on the linear representation hypothesis 🧵

14.07.2025 12:15 — 👍 62    🔁 12    💬 1    📌 1

I was part of an interesting panel discussion yesterday at an ARC event. Maybe everybody knows this already, but I was quite surprised by how "general" intelligence was conceptualized in relation to human intelligence and the ARC benchmarks.

28.09.2025 10:06 — 👍 23    🔁 3    💬 2    📌 1
Preview
From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit Motivated by the hypothesis that neural network representations encode abstract, interpretable features as linearly accessible, approximately orthogonal directions, sparse autoencoders (SAEs) have bec...

Phenomenology → principle → method.

From observed phenomena in representations (conditional orthogonality) we derive a natural instantiation.

And it turns out to be an old friend: Matching Pursuit!

📄 arxiv.org/abs/2506.03093

See you in San Diego,
@neuripsconf.bsky.social
🎉

#interpretability

28.09.2025 14:01 — 👍 5    🔁 0    💬 0    📌 0
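For readers meeting the "old friend" for the first time: Matching Pursuit (Mallat & Zhang, 1993) greedily rewrites a signal as a sparse combination of unit-norm dictionary atoms. A minimal plain-Python sketch; the dictionary and signal below are toy values for illustration, not the paper's actual setup:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(signal, atoms, n_iters=2):
    """Greedy sparse decomposition of `signal` over unit-norm `atoms`.

    Returns (code, residual): code is a list of (atom_index, coefficient)
    pairs; the residual shrinks each step because atoms are unit-norm.
    """
    residual = list(signal)
    code = []
    for _ in range(n_iters):
        # Pick the atom most correlated with the current residual.
        idx = max(range(len(atoms)), key=lambda i: abs(dot(residual, atoms[i])))
        coeff = dot(residual, atoms[idx])
        code.append((idx, coeff))
        # Subtract the projection onto the chosen atom.
        residual = [r - coeff * a for r, a in zip(residual, atoms[idx])]
    return code, residual

# Toy dictionary: three unit-norm atoms in R^2.
atoms = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8)]
signal = (3.0, 4.0)
code, residual = matching_pursuit(signal, atoms)
```

On this toy signal the third atom is selected first (it is perfectly aligned with the signal), and the residual vanishes after one step.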

🚨Our preprint is online!🚨

www.biorxiv.org/content/10.1...

How do #dopamine neurons perform the key calculations in reinforcement #learning?

Read on to find out more! 🧵

19.09.2025 13:05 — 👍 190    🔁 67    💬 10    📌 3
Post image

Are there conceptual directions in VLMs that transcend modality? Check out our COLM oral spotlight 🔦 paper! We use SAEs to analyze the multimodality of linear concepts in VLMs.

with @chloesu07.bsky.social, @thomasfel.bsky.social, @shamkakade.bsky.social and Stephanie Gil
arxiv.org/abs/2504.11695

17.09.2025 19:12 — 👍 25    🔁 6    💬 1    📌 1
Video thumbnail

One interesting result: our "Bridge Score" points to concept pairs that connect vision & language.

In the demo you can explore these bridges (links) and see how multimodality shows up! :)

with @isabelpapad.bsky.social, @chloesu07.bsky.social, @shamkakade.bsky.social and Stephanie Gil

17.09.2025 19:42 — 👍 2    🔁 0    💬 0    📌 0

Check out our COLM 2025 (oral) 🎤

SAEs reveal that VLM embedding spaces aren't just "image vs. text" cones.
They contain stable conceptual directions, some forming surprising bridges across modalities.

arxiv.org/abs/2504.11695
Demo 👉 vlm-concept-visualization.com

17.09.2025 19:42 — 👍 5    🔁 0    💬 1    📌 0
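For context, the sparse autoencoders (SAEs) used in this line of work learn to re-express an embedding as a sparse combination of learned "concept" directions. A minimal plain-Python sketch of a ReLU SAE forward pass; the weights and the 2-d embedding are toy values chosen for illustration, not the paper's trained model:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sae_encode(x, W_enc, b_enc):
    # Sparse code: ReLU keeps only positively activated concept units.
    return relu([h + b for h, b in zip(matvec(W_enc, x), b_enc)])

def sae_decode(z, W_dec):
    # Reconstruction: a sparse combination of decoder rows ("concepts").
    d = len(W_dec[0])
    out = [0.0] * d
    for zi, row in zip(z, W_dec):
        for j in range(d):
            out[j] += zi * row[j]
    return out

# Toy setup: 2-d embedding, 3 learned "concept" directions.
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_enc = [0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0], [0.0, 1.0], [-0.5, -0.5]]
x = [0.7, 0.2]
z = sae_encode(x, W_enc, b_enc)   # third unit is inactive (sparse code)
x_hat = sae_decode(z, W_dec)      # reconstruction of the embedding
```

Interpreting which inputs activate each unit of `z` is what lets one talk about stable conceptual directions in the embedding space.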
First Workshop on Interpreting Cognition in Deep Learning Models (NeurIPS 2025)

Excited to announce the first workshop on CogInterp: Interpreting Cognition in Deep Learning Models @ NeurIPS 2025! 📣

How can we interpret the algorithms and representations underlying complex behavior in deep learning models?

🌐 coginterp.github.io/neurips2025/

1/4

16.07.2025 13:08 — 👍 58    🔁 19    💬 1    📌 3

How do language models generalize from information they learn in-context vs. via finetuning? In arxiv.org/abs/2505.00661 we show that in-context learning can generalize more flexibly, illustrating key differences in the inductive biases of these modes of learning — and ways to improve finetuning. 1/

02.05.2025 17:02 — 👍 78    🔁 21    💬 4    📌 4

Our work finding universal concepts in vision models is accepted at #ICML2025!!!

My first major conference paper with my wonderful collaborators and friends @matthewkowal.bsky.social @thomasfel.bsky.social
@Julian_Forsyth
@csprofkgd.bsky.social

Working with y'all is the best 🥹

Preprint ⬇️!!

01.05.2025 22:57 — 👍 15    🔁 4    💬 0    📌 1

Accepted at #ICML2025! Check out the preprint.

HUGE shoutout to Harry (1st PhD paper, in 1st year), Julian (1st ever, done as an undergrad), Thomas and Matt!

@hthasarathan.bsky.social @thomasfel.bsky.social @matthewkowal.bsky.social

01.05.2025 15:03 — 👍 35    🔁 7    💬 2    📌 0
Video thumbnail

<proud advisor>
Hot off the arXiv! 🦬 "Appa: Bending Weather Dynamics with Latent Diffusion Models for Global Data Assimilation" 🌍 Appa is our novel 1.5B-parameter probabilistic weather model that unifies reanalysis, filtering, and forecasting in a single framework. A thread 🧵

29.04.2025 04:48 — 👍 50    🔁 15    💬 2    📌 3
Post image

Have you ever considered that, in computer memory, model weights are stored as discrete values anyway? So why not do probabilistic inference directly on the discrete (quantized) parameters? @trappmartin.bsky.social is presenting our work at #AABI2025 today. [1/3]

29.04.2025 06:58 — 👍 46    🔁 11    💬 3    📌 1
Preview
Interpreting the Linear Structure of Vision-Language Model Embedding Spaces - Kempner Institute Using sparse autoencoders, the authors show that vision-language embeddings boil down to a small, stable dictionary of single-modality concepts that snap together into cross-modal bridges. This resear...

New in the Deeper Learning blog: Kempner researchers show how VLMs speak the same semantic language across images and text.

bit.ly/KempnerVLM

by @isabelpapad.bsky.social, Chloe Huangyuan Su, @thomasfel.bsky.social, Stephanie Gil, and @shamkakade.bsky.social

#AI #ML #VLMs #SAEs

28.04.2025 16:57 — 👍 9    🔁 3    💬 0    📌 0
Preview
Firing rates in visual cortex show representational drift, while temporal spike sequences remain stable Neural firing-rate responses to sensory stimuli show progressive changes both within and across sessions, raising the question of how the brain mainta…

Firing rates in visual cortex show representational drift, while temporal spike sequences remain stable

www.sciencedirect.com/science/arti...

Great work by Boris Sotomayor and with @battaglialab.bsky.social

10.04.2025 10:44 — 👍 70    🔁 20    💬 0    📌 1
APA PsycNet

PINEAPPLE, LIGHT, HAPPY, AVALANCHE, BURDEN

Some of these words are consistently remembered better than others. Why is that?
In our paper, just published in J. Exp. Psychol., we provide a simple Bayesian account and show that it explains >80% of variance in word memorability: tinyurl.com/yf3md5aj

10.04.2025 14:38 — 👍 40    🔁 14    💬 1    📌 0

๐Ÿ“ฝ๏ธRecordings from our
@cosynemeeting.bsky.social
#COSYNE2025 workshop on โ€œAgent-Based Models in Neuroscience: Complex Planning, Embodiment, and Beyond" are now online: neuro-agent-models.github.io
๐Ÿง ๐Ÿค–

07.04.2025 20:57 โ€” ๐Ÿ‘ 36    ๐Ÿ” 11    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0

[...] overall, we argue an SAE does not just reveal concepts—it determines what can be seen at all."

We propose to examine how constraints on SAEs impose dual assumptions on the data, led by the amazing
@sumedh-hindupur.bsky.social 😎

07.03.2025 03:27 — 👍 6    🔁 1    💬 0    📌 0
Post image

New paper–accepted as *spotlight* at #ICLR2025! 🧵👇

We show that a competition dynamic between several algorithms splits a toy model's ICL abilities into four broad phases of train/test settings! This means ICL is akin to a mixture of different algorithms, not a monolithic ability.

16.02.2025 18:57 — 👍 32    🔁 5    💬 2    📌 1
Post image

Want strong SSL, but not the complexity of DINOv2?

CAPI: Cluster and Predict Latent Patches for Improved Masked Image Modeling.

14.02.2025 18:04 — 👍 49    🔁 10    💬 1    📌 1
Post image

🚨 New Paper!

Can neuroscience localizers uncover brain-like functional specializations in LLMs? 🧠🤖

Yes! We analyzed 18 LLMs and found units mirroring the brain's language, theory of mind, and multiple demand networks!

w/ @gretatuckute.bsky.social, @abosselut.bsky.social, @mschrimpf.bsky.social
🧵👇

19.12.2024 15:06 — 👍 105    🔁 27    💬 2    📌 5
Post image Post image Post image Post image

Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More

Feng Wang, Yaodong Yu, Guoyizhe Wei, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie

tl;dr: we trained a 1px-patch-size ViT so you don't have to. It improves results, but is costly.

arxiv.org/abs/2502.03738

08.02.2025 17:10 — 👍 19    🔁 1    💬 1    📌 0
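The token count in the title is just patchification arithmetic: a 224×224 image (the standard ViT input size, assumed here) cut into P×P patches yields (224/P)² tokens, so 1-pixel patches give 224² = 50,176. A quick sketch:

```python
def num_tokens(image_size=224, patch_size=16):
    """Number of patch tokens for a square image split into PxP patches."""
    per_side = image_size // patch_size
    return per_side * per_side

standard = num_tokens(patch_size=16)  # ViT default: 14 x 14 = 196 tokens
extreme = num_tokens(patch_size=1)    # 1-pixel patches: 224 x 224 = 50,176 tokens
```

Since self-attention cost grows quadratically in the token count, this is why the 1-pixel setting is so expensive despite the accuracy gains.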

I'm delighted to share this latest research, led by the talented @hthasarathan.bsky.social
and Julian. Their work uncovered not only universal concepts shared across models but also unique concepts specific to DINOv2 and SigLIP! 🔥

07.02.2025 23:39 — 👍 4    🔁 0    💬 0    📌 0
Post image

🌌🛰️🔭 Wanna know which features are universal vs unique in your models and how to find them? Excited to share our preprint: "Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment"!

arxiv.org/abs/2502.03714

(1/9)

07.02.2025 15:15 — 👍 56    🔁 17    💬 1    📌 5
Post image

Using mechanistic #interpretability 💻 to advance scientific #discovery 🧪 & capture striking biology? 🧬
Come see @jhartford.bsky.social's oral presentation 👨‍🏫 @ #NeurIPS2024 Interpretable AI workshop 🦾 to learn more about extracting features from large 🔬 MAEs! Paper 📄 ➡️: openreview.net/forum?id=jYl...

15.12.2024 14:30 — 👍 22    🔁 2    💬 1    📌 0
Post image

Did you know that @PyTorch applies Bessel's correction when computing the standard deviation by default, but numpy and jax do not?

A possible source of disagreement when porting models to PyTorch! @numpy_team

03.02.2025 12:52 — 👍 22    🔁 3    💬 1    📌 0
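Concretely: torch.std divides by n−1 (Bessel's correction, the unbiased estimator) by default, while numpy.std and jax.numpy.std divide by n (ddof=0). A small plain-Python illustration of the two estimators:

```python
def std(xs, ddof=0):
    """Standard deviation with a configurable divisor, like numpy's ddof.

    ddof=0 divides by n (numpy/jax default); ddof=1 divides by n-1
    (Bessel's correction, PyTorch's default).
    """
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - ddof)
    return var ** 0.5

data = [1.0, 2.0, 3.0, 4.0]
biased = std(data)            # numpy/jax convention: sqrt(5/4)
unbiased = std(data, ddof=1)  # PyTorch convention: sqrt(5/3)
```

When porting, `np.std(x, ddof=1)` reproduces PyTorch's default; the gap shrinks as n grows but is very visible for small batches.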
