
Chandan Singh

@csinva.bsky.social

Seeking superhuman explanations. Senior researcher at Microsoft Research, PhD from UC Berkeley, https://csinva.io/

648 Followers  |  97 Following  |  9 Posts  |  Joined: 05.11.2024

Latest posts by csinva.bsky.social on Bluesky

Post image

How can an imitative model like an LLM outperform the experts it is trained on? Our new COLM paper outlines three types of transcendence and shows that each one relies on a different aspect of data diversity. arxiv.org/abs/2508.17669

29.08.2025 21:45 – 👍 95    🔁 17    💬 3    📌 5

New paper with @rjantonello.bsky.social, @csinva.bsky.social, Suna Guo, Gavin Mischler, Jianfeng Gao, & Nima Mesgarani: We use LLMs to generate VERY interpretable embeddings where each dimension corresponds to a scientific theory, & then use these embeddings to predict fMRI and ECoG. It WORKS!

18.08.2025 18:33 – 👍 16    🔁 8    💬 1    📌 0
Video thumbnail

In our new paper, we explore how we can build encoding models that are both powerful and understandable. Our model uses an LLM to answer 35 questions about a sentence's content. The answers linearly contribute to our prediction of how the brain will respond to that sentence. 1/6

18.08.2025 09:44 – 👍 25    🔁 9    💬 1    📌 1
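
A minimal sketch of this style of encoding model, assuming a placeholder ask_llm stub in place of a real LLM call; the example questions, the stub, and the regression settings below are illustrative assumptions, not the paper's actual pipeline:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Illustrative questions only; the paper uses a curated set of 35.
QUESTIONS = [
    "Does the sentence mention a physical location?",
    "Does the sentence describe an emotion?",
]

def ask_llm(question: str, sentence: str) -> int:
    """Placeholder for a yes/no query to an LLM (1 = yes, 0 = no).
    A real implementation would prompt a model; this stub answers
    pseudo-randomly so the sketch runs end to end."""
    rng = np.random.default_rng(abs(hash((question, sentence))) % (2**32))
    return int(rng.integers(0, 2))

def qa_embed(sentences):
    """Interpretable embedding: one binary dimension per question."""
    return np.array([[ask_llm(q, s) for q in QUESTIONS] for s in sentences])

def fit_encoding_model(sentences, brain_responses):
    """Ridge regression from question answers to brain responses."""
    X = qa_embed(sentences)              # (n_sentences, n_questions)
    Y = np.asarray(brain_responses)      # (n_sentences, n_voxels)
    return RidgeCV(alphas=np.logspace(-3, 3, 7)).fit(X, Y)
```

Because the model is linear in the answers, the fitted coefficients can be read directly as one weight per (voxel, question) pair, which is what the weight maps described elsewhere in this thread show.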

This was a huge effort with a wonderful team:
@rjantonello.bsky.social (co-first), Suna Guo, Gavin Mischler, Jianfeng Gao, Nima Mesgarani, & @alexanderhuth.bsky.social. Excited to see how folks use it!

14.08.2025 14:06 – 👍 0    🔁 0    💬 0    📌 0

These maps largely agree with prior findings (from Neurosynth, neurosynth.org) and new findings (from a follow-up fMRI experiment using generative causal testing, arxiv.org/abs/2410.00812), suggesting this method is an effective, *automated* way to test new hypotheses!

14.08.2025 14:06 – 👍 0    🔁 0    💬 1    📌 0
Post image

The model is small enough that we can visualize the whole thing. No feature importances or post-hoc summaries, just 35 questions and a map showing their linear weights for each brain voxel.

14.08.2025 14:06 – 👍 0    🔁 0    💬 1    📌 0

We scale up our prior method that builds interpretable embeddings by asking LLMs yes/no questions. We use bigger LLMs, more data, and stability selection to build a 35-question model that generalizes across subjects and modalities bsky.app/profile/csin...

14.08.2025 14:06 – 👍 0    🔁 0    💬 1    📌 0
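
For readers unfamiliar with stability selection, here is a generic sketch of the idea, not the paper's exact procedure: the base learner, subsample scheme, and thresholds below are assumptions, and y is a single response variable (e.g., one voxel's responses).

```python
import numpy as np
from sklearn.linear_model import Lasso

def stability_select(X, y, n_rounds=100, alpha=0.05, threshold=0.6, seed=0):
    """Generic stability selection: refit an L1-penalized model on random
    half-subsamples and keep features whose coefficients are nonzero in at
    least `threshold` of the rounds. All defaults here are illustrative."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    counts = np.zeros(n_features)
    for _ in range(n_rounds):
        idx = rng.choice(n_samples, size=n_samples // 2, replace=False)
        coef = Lasso(alpha=alpha, max_iter=5000).fit(X[idx], y[idx]).coef_
        counts += np.abs(coef) > 1e-8  # count features kept in this round
    selection_freq = counts / n_rounds
    keep = np.where(selection_freq >= threshold)[0]
    return keep, selection_freq
```
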
Post image

New paper: Ask 35 simple questions about sentences in a story and use the answers to predict brain responses. Interpretable, compact, & surprisingly high performance in both fMRI and ECoG. 🧵 biorxiv.org/content/10.1...

14.08.2025 14:06 – 👍 1    🔁 0    💬 1    📌 1

We’ve discovered a literal miracle with almost unlimited potential and it’s being scrapped for *no reason whatsoever*. This isn’t even nihilism, it’s outright worship of death and human suffering.

05.08.2025 23:09 – 👍 10456    🔁 3344    💬 49    📌 164
Post image

Binz et al. (in press, Nature) developed an LLM called Centaur that predicts human responses better than existing cognitive models in 159 of 160 behavioural experiments. See: arxiv.org/abs/2410.20268

26.06.2025 20:29 – 👍 64    🔁 23    💬 3    📌 7
Post image

We're excited about self-play unlocking continuously improving agents. RL selects CoT patterns from LLMs. Games=perfect testing grounds.
SPIRAL: models learn via self-competition. Kuhn Poker → +8.7% math, +18.1 Minerva Math! 🃏
Paper: huggingface.co/papers/2506....
Code: github.com/spiral-rl/spiral

01.07.2025 20:11 – 👍 17    🔁 5    💬 2    📌 1
Post image

🚨 New preprint 🚨

Prior work has mapped how the brain encodes concepts: If you see fire and smoke, your brain will represent the fire (hot, bright) and smoke (gray, airy). But how do you encode features of the fire-smoke relation? We analyzed fMRI with embeddings extracted from LLMs to find out 🧵

24.06.2025 13:49 – 👍 32    🔁 8    💬 1    📌 2
Preview
Dimensions underlying the representational alignment of deep neural networks with humans - Nature Machine Intelligence An interpretability framework that compares how humans and deep neural networks process images has been presented. Their findings reveal that, unlike humans, deep neural networks focus more on visual ...

What makes humans similar or different to AI? In a paper out in @natmachintell.nature.com led by @florianmahner.bsky.social & @lukasmut.bsky.social, w/ Umut Güçlü, we took a deep look at the factors underlying their representational alignment, with surprising results.

www.nature.com/articles/s42...

23.06.2025 20:02 – 👍 103    🔁 36    💬 2    📌 3
Cortex Feature Visualization

🚨 Paper alert! 🚨
TL;DR first: We used a pre-trained deep neural network to model fMRI data and to generate images predicted to elicit a large response for many different parts of the brain. We aggregate these into an awesome interactive brain viewer: piecesofmind.psyc.unr.edu/activation_m...

12.06.2025 16:33 – 👍 10    🔁 6    💬 2    📌 0
Video thumbnail

What are the organizing dimensions of language processing?

We show that voxel responses during comprehension are organized along 2 main axes: processing difficulty & meaning abstractness, revealing an interpretable, topographic representational basis for language processing shared across individuals

23.05.2025 16:59 – 👍 71    🔁 30    💬 3    📌 0
Post image

When it comes to text prediction, where does one LM outperform another? If you've ever worked on LM evals, you know this question is a lot more complex than it seems. In our new #acl2025 paper, we developed a method to find fine-grained differences between LMs:

🧵1/9

09.06.2025 13:47 – 👍 70    🔁 21    💬 2    📌 2
Preview
Novel color via stimulation of individual photoreceptors at population scale Image display by cell-by-cell retina stimulation, enabling colors impossible to see under natural viewing.

Five people have seen a color never before visible to the naked human eye, thanks to a new retinal stimulation technique called Oz.

Learn more in #ScienceAdvances: scim.ag/442Hjn6

21.04.2025 17:56 – 👍 74    🔁 11    💬 4    📌 6
This is figure 2 from “Phase I/II trial of iPS-cell-derived dopaminergic cells for Parkinson’s disease,” which shows chronological changes in clinical end points.


Two clinical trials reported in Nature demonstrate the safety of stem cell therapies for Parkinson’s disease. The papers investigate the use of cells derived from human induced pluripotent stem cells and human embryonic stem cells. go.nature.com/4ikcJc2
go.nature.com/4jfSRYX 🧪

16.04.2025 22:18 – 👍 42    🔁 9    💬 0    📌 0
Preview
Accelerated learning of a noninvasive human brain-computer interface via manifold geometry Brain-computer interfaces (BCIs) promise to restore and enhance a wide range of human capabilities. However, a barrier to the adoption of BCIs is how long it can take users to learn to control them. W...

New preprint! Excited to share our latest work “Accelerated learning of a noninvasive human brain-computer interface via manifold geometry” ft. outstanding former undergraduate Chandra Fincke, @glajoie.bsky.social, @krishnaswamylab.bsky.social, and @wutsaiyale.bsky.social's Nick Turk-Browne 1/8

03.04.2025 23:04 – 👍 66    🔁 20    💬 2    📌 3
Post image

New preprint “Monkey See, Model Knew: LLMs accurately predict visual responses in humans AND NHPs”
Led by Colin Conwell with @emaliemcmahon.bsky.social Akshay Jagadeesh, Kasper Vinken @amrahs-inolas.bsky.social @jacob-prince.bsky.social George Alvarez @taliakonkle.bsky.social & Marge Livingstone 1/n

14.03.2025 16:14 – 👍 50    🔁 19    💬 1    📌 0
Preview
Human neural dynamics of real-world and imagined navigation - Nature Human Behaviour Seeber et al. studied brain recordings from implanted electrodes in freely moving humans. Neural dynamics encoded actual and imagined routes similarly, demonstrating parallels between navigational, im...

🚨 New lab paper! 🚨

A dream study of mine for nearly 20 yrs, not possible until now thanks to NIH 🧠 funding & 1st-author lead @seeber.bsky.social

We tracked hippocampal activity as people walked memory-guided paths & imagined them again. Did brain patterns reappear? 🧵👇

www.nature.com/articles/s41...

10.03.2025 16:52 – 👍 270    🔁 81    💬 10    📌 11
Sanity Checks for Saliency Maps
Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, Been Kim
Saliency methods have emerged as a popular tool to highlight features in an input deemed relevant for the prediction of a learned model. Several saliency methods have been proposed, often guided by visual appeal on image data. In this work, we propose an actionable methodology to evaluate what kinds of explanations a given method can and cannot provide. We find that reliance, solely, on visual assessment can be misleading. Through extensive experiments we show that some existing saliency methods are independent both of the model and of the data generating process. Consequently, methods that fail the proposed tests are inadequate for tasks that are sensitive to either data or model, such as, finding outliers in the data, explaining the relationship between inputs and outputs that the model learned, and debugging the model. We interpret our findings through an analogy with edge detection in images, a technique that requires neither training data nor model. Theory in the case of a linear model and a single-layer convolutional neural network supports our experimental findings.


Sparse Autoencoders Can Interpret Randomly Initialized Transformers
Thomas Heap, Tim Lawson, Lucy Farnik, Laurence Aitchison
Sparse autoencoders (SAEs) are an increasingly popular technique for interpreting the internal representations of transformers. In this paper, we apply SAEs to 'interpret' random transformers, i.e., transformers where the parameters are sampled IID from a Gaussian rather than trained on text data. We find that random and trained transformers produce similarly interpretable SAE latents, and we confirm this finding quantitatively using an open-source auto-interpretability pipeline. Further, we find that SAE quality metrics are broadly similar for random and trained transformers. We find that these results hold across model sizes and layers. We discuss a number of interesting questions that this work raises for the use of SAEs and auto-interpretability in the context of mechanistic interpretability.


2018: Saliency maps give plausible interpretations of random weights, triggering skepticism and catalyzing the mechinterp cultural movement, which now advocates for SAEs.

2025: SAEs give plausible interpretations of random weights, triggering skepticism and ...

03.03.2025 18:42 – 👍 95    🔁 15    💬 2    📌 0
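
For context, the sparse autoencoders in question are small reconstruction models with a sparsity penalty, trained on a transformer's activations; the punchline above is that the same recipe applied to a randomly initialized transformer yields latents that look similarly "interpretable". A minimal PyTorch sketch, not either paper's implementation:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder with ReLU latents; sparsity comes from an
    L1 penalty on the latent activations (see sae_loss below)."""
    def __init__(self, d_model: int, d_latent: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, acts):
        latents = torch.relu(self.encoder(acts))
        recon = self.decoder(latents)
        return recon, latents

def sae_loss(recon, acts, latents, l1_coef=1e-3):
    # Reconstruction error plus sparsity penalty on the latent code.
    return ((recon - acts) ** 2).mean() + l1_coef * latents.abs().mean()

# The activations fed in can come from a trained LLM or from one whose
# weights are freshly sampled at random; the training loop is identical.
```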

Yi Ma & colleagues managed to simplify DINO & DINOv2 by removing many ingredients and adding a robust regularization term from information theory (coding rate) that learns informative, decorrelated features. Happy to see principled approaches advance deep representation learning!

18.02.2025 14:24 – 👍 6    🔁 3    💬 1    📌 0
Post image

Can LLMs be used to discover interpretable models of human and animal behavior? 🤔

Turns out: yes!

Thrilled to share our latest preprint where we used FunSearch to automatically discover symbolic cognitive models of behavior.
1/12

10.02.2025 12:21 – 👍 134    🔁 44    💬 3    📌 11
Post image

"...responses of V4 neurons under naturalistic conditions can be explained by a hierarchical three-stage model where each stage consists entirely of units like those found in area V1"

#NeuroAI

www.biorxiv.org/content/10.1...

24.12.2024 08:55 – 👍 39    🔁 16    💬 0    📌 1

Hi, could you possibly add me? Thanks!

08.12.2024 22:57 – 👍 2    🔁 0    💬 1    📌 0
Post image Post image

At NeurIPS this week, presenting our work on crafting *interpretable embeddings* by asking yes/no questions to black-box LLMs.

Drop me a message if you want to chat about interpretability/language neuroscience!

07.12.2024 15:05 – 👍 6    🔁 0    💬 0    📌 1

I tried to find everyone who works in the area but I certainly missed some folks so please lmk...
go.bsky.app/BYkRryU

23.11.2024 05:11 – 👍 53    🔁 18    💬 32    📌 0

Science faces an explainability crisis: ML models can predict many natural phenomena but can't explain them

We tackle this issue in language neuroscience by using LLMs to generate *and validate* explanations with targeted follow-up experiments

20.11.2024 19:31 – 👍 1    🔁 0    💬 0    📌 0

Mechanistic interp has made cool findings but struggled to make them useful

We show that "induction heads" found in LLMs can be reverse-engineered to yield accurate & interpretable next-word prediction models

20.11.2024 19:28 – 👍 1    🔁 0    💬 0    📌 0
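
Not the paper's actual model, but the core "induction" pattern it builds on can be sketched in a few lines: to predict the next token, find the most recent earlier occurrence of the current token and propose whatever followed it there.

```python
def induction_predict(tokens):
    """Sketch of an induction-style next-token rule (illustrative only):
    look back for the last earlier occurrence of the current token and
    return the token that followed it."""
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None  # no earlier match; a real model would fall back to other predictors

# Example: the prefix "... Harry Potter ... Harry" yields "Potter".
print(induction_predict(["Harry", "Potter", "went", "to", "see", "Harry"]))
```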
