First Workshop on Interpreting Cognition in Deep Learning Models (NeurIPS 2025)
Excited to announce the first workshop on CogInterp: Interpreting Cognition in Deep Learning Models @ NeurIPS 2025! 📣
How can we interpret the algorithms and representations underlying complex behavior in deep learning models?
coginterp.github.io/neurips2025/
1/4
16.07.2025 13:08 · 52 likes · 18 reposts · 1 reply · 1 quote
How do language models generalize from information they learn in-context vs. via finetuning? In arxiv.org/abs/2505.00661 we show that in-context learning can generalize more flexibly, illustrating key differences in the inductive biases of these modes of learning, and ways to improve finetuning. 1/
02.05.2025 17:02 · 78 likes · 21 reposts · 4 replies · 4 quotes
Our work finding universal concepts in vision models is accepted at #ICML2025!!!
My first major conference paper with my wonderful collaborators and friends @matthewkowal.bsky.social @thomasfel.bsky.social
@Julian_Forsyth
@csprofkgd.bsky.social
Working with y'all is the best 🥹
Preprint ⬇️!!
01.05.2025 22:57 · 15 likes · 4 reposts · 0 replies · 1 quote
Accepted at #ICML2025! Check out the preprint.
HUGE shoutout to Harry (1st PhD paper, in 1st year), Julian (1st ever, done as an undergrad), Thomas and Matt!
@hthasarathan.bsky.social @thomasfel.bsky.social @matthewkowal.bsky.social
01.05.2025 15:03 · 35 likes · 7 reposts · 2 replies · 0 quotes
<proud advisor>
Hot off the arXiv! 🦬 "Appa: Bending Weather Dynamics with Latent Diffusion Models for Global Data Assimilation" Appa is our novel 1.5B-parameter probabilistic weather model that unifies reanalysis, filtering, and forecasting in a single framework. A thread 🧵
29.04.2025 04:48 · 49 likes · 15 reposts · 2 replies · 3 quotes
Have you ever considered that, in computer memory, model weights are stored as discrete values anyway? So why not do probabilistic inference directly on the discrete (quantized) parameters? @trappmartin.bsky.social is presenting our work at #AABI2025 today. [1/3]
29.04.2025 06:58 · 46 likes · 11 reposts · 3 replies · 1 quote
PINEAPPLE, LIGHT, HAPPY, AVALANCHE, BURDEN
Some of these words are consistently remembered better than others. Why is that?
In our paper, just published in J. Exp. Psychol., we provide a simple Bayesian account and show that it explains >80% of variance in word memorability: tinyurl.com/yf3md5aj
10.04.2025 14:38 · 40 likes · 15 reposts · 1 reply · 0 quotes
📽️ Recordings from our
@cosynemeeting.bsky.social
#COSYNE2025 workshop on "Agent-Based Models in Neuroscience: Complex Planning, Embodiment, and Beyond" are now online: neuro-agent-models.github.io
🧠🤖
07.04.2025 20:57 · 36 likes · 11 reposts · 1 reply · 0 quotes
[...] overall, we argue an SAE does not just reveal concepts; it determines what can be seen at all."
We propose to examine how constraints on an SAE impose dual assumptions on the data, led by the amazing
@sumedh-hindupur.bsky.social
07.03.2025 03:27 · 6 likes · 1 repost · 0 replies · 0 quotes
New paper, accepted as *spotlight* at #ICLR2025! 🧵👇
We show that a competition dynamic between several algorithms splits a toy model's ICL abilities into four broad phases of train/test settings! This means ICL is akin to a mixture of different algorithms, not a monolithic ability.
16.02.2025 18:57 · 32 likes · 5 reposts · 2 replies · 1 quote
Want strong SSL, but not the complexity of DINOv2?
CAPI: Cluster and Predict Latent Patches for Improved Masked Image Modeling.
14.02.2025 18:04 · 49 likes · 10 reposts · 1 reply · 1 quote
🚨 New Paper!
Can neuroscience localizers uncover brain-like functional specializations in LLMs? 🧠🤖
Yes! We analyzed 18 LLMs and found units mirroring the brain's language, theory of mind, and multiple demand networks!
w/ @gretatuckute.bsky.social, @abosselut.bsky.social, @mschrimpf.bsky.social
🧵👇
19.12.2024 15:06 · 105 likes · 27 reposts · 2 replies · 5 quotes
I'm delighted to share this latest research, led by the talented @hthasarathan.bsky.social
and Julian. Their work uncovered both universal concepts shared across models and unique concepts specific to DINOv2 and SigLIP! 🔥
07.02.2025 23:39 · 4 likes · 0 reposts · 0 replies · 0 quotes
🛰️ Wanna know which features are universal vs unique in your models and how to find them? Excited to share our preprint: "Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment"!
arxiv.org/abs/2502.03714
(1/9)
07.02.2025 15:15 · 56 likes · 17 reposts · 1 reply · 5 quotes
Using mechanistic #interpretability 💻 to advance scientific #discovery 🧪 & capture striking biology? 🧬
Come see @jhartford.bsky.social's oral presentation 👨‍🏫 @ #NeurIPS2024 Interpretable AI workshop 🦾 to learn more about extracting features from large 🔬 MAEs! Paper ➡️: openreview.net/forum?id=jYl...
15.12.2024 14:30 · 22 likes · 2 reposts · 1 reply · 0 quotes
Did you know that @PyTorch applies Bessel's correction when computing the standard deviation, but NumPy and JAX do not?
A possible source of disagreement when porting models to PyTorch! @numpy_team
03.02.2025 12:52 · 22 likes · 3 reposts · 1 reply · 0 quotes
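The discrepancy described in the post above is easy to demonstrate. A minimal sketch, assuming reasonably recent NumPy and PyTorch installs (the `correction=` keyword requires PyTorch ≥ 1.13):

```python
import numpy as np
import torch

x = [1.0, 2.0, 3.0, 4.0]

# NumPy (and jax.numpy.std) default to the biased estimator:
# ddof=0, i.e. divide the sum of squared deviations by N.
np_std = np.std(np.array(x))                   # sqrt(5/4) ~ 1.1180

# PyTorch defaults to Bessel's correction: divide by N - 1.
torch_std = torch.std(torch.tensor(x)).item()  # sqrt(5/3) ~ 1.2910

# To make the two agree, set the degrees of freedom explicitly
# on either side.
assert np.isclose(np.std(np.array(x), ddof=1), torch_std)
assert np.isclose(np_std, torch.std(torch.tensor(x), correction=0).item())
```

A silent ~15% difference on a 4-element sample, which is exactly the kind of thing that surfaces as a small but stubborn mismatch when porting a normalization layer between frameworks.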
Picture of neural manifolds for the two choices in a decision-making task, depicted in 3D and in 2D
New preprint: "The geometry of the neural state space of decisions", work by Mauro Monsalve-Mercado, buff.ly/42wVHD5. Surprising results & predictions! (Thread) We analyze neuropixel population recordings in macaque area LIP during a reaction time, random-dot motion 1/
31.01.2025 14:32 · 175 likes · 67 reposts · 3 replies · 2 quotes
How do tokens evolve as they are processed by a deep Transformer?
With José A. Carrillo, @gabrielpeyre.bsky.social and @pierreablin.bsky.social, we tackle this in our new preprint: A Unified Perspective on the Dynamics of Deep Transformers arxiv.org/abs/2501.18322
ML and PDE lovers, check it out!
31.01.2025 16:56 · 96 likes · 16 reposts · 2 replies · 0 quotes
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
The research introduces AxBench, demonstrating that prompting outperforms complex representation methods like sparse autoencoders. It features a new weakly-supervised method, ReFT-r1, which combines interpretability with competitive performance. https://arxiv.org/abs/2501.17148
29.01.2025 11:40 · 2 likes · 1 repost · 0 replies · 0 quotes
True haha!!
30.01.2025 15:21 · 1 like · 0 reposts · 0 replies · 0 quotes
Haha, could be, but I have another hypothesis; I'll show you next week
30.01.2025 15:21 · 1 like · 0 reposts · 0 replies · 0 quotes
DINOv2, C:5232... 😶‍🌫️
30.01.2025 02:32 · 1 like · 0 reposts · 2 replies · 0 quotes
Does the culture you grow up in shape the way you see the world? In a new Psych Review paper, @chazfirestone.bsky.social & I tackle this centuries-old question using the Müller-Lyer illusion as a case study. Come think through one of history's mysteries with us 🧵 (1/13):
25.01.2025 22:05 · 1095 likes · 419 reposts · 33 replies · 79 quotes
ICLR 2025 Workshop XAI4Science
Join us at our upcoming workshop at ICLR, XAI4Science: From Understanding Model Behavior to Discovering New Scientific Knowledge, Apr 27-28.
Submissions on a-priori (ante-hoc) & a-posteriori (post-hoc) interpretability and self-explainable models for understanding model behavior are welcome: tinyurl.com/3w8sddpm
20.01.2025 17:04 · 33 likes · 15 reposts · 1 reply · 0 quotes
The Mathematics of Artificial Intelligence: In this introductory and highly subjective survey, aimed at a general mathematical audience, I showcase some key theoretical concepts underlying recent advancements in machine learning. arxiv.org/abs/2501.10465
22.01.2025 09:11 · 140 likes · 40 reposts · 2 replies · 1 quote
🧠 Researcher: noorsajid.com
PhD @Stanford working w Noah Goodman
Studying in-context learning and reasoning in humans and machines
Prev. @UofT CS & Psych
Workshop on Fine-Grained Visual Categorization (FGVC) - CVPR
Nashville, June 11, 9:00am-5:00pm
Room: 104 E
https://sites.google.com/view/fgvc12
Head of XAI research at Fraunhofer HHI
Google Scholar: https://scholar.google.de/citations?user=wpLQuroAAAAJ
Associate professor in machine learning at the University of Amsterdam. Topics: (online) learning theory and the mathematics of interpretable AI.
www.timvanerven.nl
Theory of Interpretable AI seminar: https://tverven.github.io/tiai-seminar
🧠 computational neuroscience | neurophenomenology 2.0? 🤖
Science Writer, Kempner Institute for the Study of Natural and Artificial Intelligence at Harvard University
(views here are my own)
Cognitive Neuroscientist and Associate Professor of Psychology at George Mason University. Perception of Time, Memory, & Action. Exec Director @ http://timingforum.org
research tech @ relcog lab, uci | ombbhatt.github.io | that blue pi in 3b1b videos is my spirit animal
Senior Director of AI/ML Research Engineering, Kempner Institute @Harvard , views are my own
Postdoctoral Fellow at Harvard Kempner Institute. Trying to bring natural structure to artificial neural representations. Prev: PhD at UvA. Intern @ Apple MLR, Work @ Intel Nervana
Chief Models Officer @ Stealth Startup; Inria & MVA - Ex: Llama @AIatMeta & Gemini and BYOL @GoogleDeepMind
Computational and systems neuroscience, data analysis, machine learning
phd @ mit, research @ genlm, intern @ apple
https://benlipkin.github.io/
PhD Candidate at Hebart-Lab (Vision and Computational Cognition).
https://laurastoinski.com/
https://hebartlab.com/
The 2025 Conference on Language Modeling will take place at the Palais des Congrès in Montreal, Canada from October 7-10, 2025
Group Leader in Tübingen, Germany
I'm 🇫🇷 and I work on RL and lifelong learning. Mostly posting on ML related topics.
Grad Student at Harvard SEAS
Interested in ML Interpretability, Computational Neuroscience, Signal Processing
Tenured Researcher @INRIA, Ockham team. Teacher @Polytechnique
and @ENSdeLyon
Machine Learning, Python and Optimization