Are you at #NeurIPS2025? Check out the #KempnerInstitute's Day 2 presentations!
#AI #NeuroAI
@cpehlevan.bsky.social @kanakarajanphd.bsky.social @thomasfel.bsky.social @andykeller.bsky.social @binxuwang.bsky.social @njw.fish @yilundu.bsky.social
04.12.2025 14:02
Into the Rabbit Hull - Part I - Kempner Institute
This blog post offers an interpretability deep dive, examining the most important concepts emerging in one of today's central vision foundation models, DINOv2. This blogpost is the first of a […]
Into the Rabbit Hull - Part 1: A Deep Dive into DINOv2
Our latest Deeper Learning blog post is an #interpretability deep dive into one of today's leading vision foundation models: DINOv2.
Read now: bit.ly/4nNfq8D
Stay tuned: Part 2 coming soon.
#AI #VLMs #DINOv2
12.11.2025 15:49
The Bau lab is on fire!
06.11.2025 14:13
Interested in doing a PhD at the intersection of human and machine cognition? ✨ I'm recruiting students for Fall 2026! ✨
Topics of interest include pragmatics, metacognition, reasoning, & interpretability (in humans and AI).
Check out JHU's mentoring program (due 11/15) for help with your SoP
04.11.2025 14:44
Pleased to share new work with @sflippl.bsky.social @eberleoliver.bsky.social @thomasmcgee.bsky.social & undergrad interns at Institute for Pure and Applied Mathematics, UCLA.
Algorithmic Primitives and Compositional Geometry of Reasoning in Language Models
www.arxiv.org/pdf/2510.15987
🧵1/n
27.10.2025 18:13
Thrilled to share our NeuroView with Ellie Pavlick!
"From Prediction to Understanding: Will AI Foundation Models Transform Brain Science?"
AI foundation models are coming to neuroscience. If scaling laws hold, predictive power will be unprecedented.
But is that enough?
Thread 🧵
24.10.2025 11:22
Thx a lot Naomi!
16.10.2025 21:50
This is so cool. When you look at representational geometry, it seems intuitive that models are combining convex regions of "concepts", but I wouldn't have expected that this is PROVABLY true for attention or that there was such a rich theory for this kind of geometry.
16.10.2025 18:33
That concludes this two-part descent into the Rabbit Hull.
Huge thanks to all collaborators who made this work possible, and especially to @binxuwang.bsky.social, with whom this project was built, experiment after experiment.
🎮 kempnerinstitute.github.io/dinovision/
arxiv.org/pdf/2510.08638
15.10.2025 17:13
If this holds, three implications:
(i) Concepts = points (or regions), not directions
(ii) Probing is bounded: toward archetypes, not vectors
(iii) Can't recover the generating hulls from their sum: we should look deeper than a single layer's activations to recover the true latents
15.10.2025 17:13
Synthesizing these observations, we propose a refined view, motivated by Gärdenfors' theory and attention geometry.
Activations = multiple convex hulls simultaneously: a rabbit among animals, brown among colors, fluffy among textures.
The Minkowski Representation Hypothesis.
15.10.2025 17:13
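A minimal NumPy sketch of the picture in the post above, not the paper's code: one activation is built as a sum of convex combinations, one from each of several archetype sets (a point in the Minkowski sum of their convex hulls). The hull names, sizes, embedding dimension, and Dirichlet weights are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # hypothetical embedding dimension

# Hypothetical archetype sets, one per family of concepts.
hulls = {
    "animal": rng.normal(size=(4, dim)),
    "color": rng.normal(size=(3, dim)),
    "texture": rng.normal(size=(5, dim)),
}

def sample_activation(hulls, rng):
    """Draw one point from the Minkowski sum of the hulls.

    For each hull we pick convex weights (non-negative, summing to 1)
    over its archetypes, then add the resulting points together.
    """
    some_archetypes = next(iter(hulls.values()))
    activation = np.zeros(some_archetypes.shape[1])
    weights = {}
    for name, archetypes in hulls.items():
        w = rng.dirichlet(np.ones(len(archetypes)))  # a point on the simplex
        weights[name] = w
        activation += w @ archetypes  # convex combination inside this hull
    return activation, weights

activation, weights = sample_activation(hulls, rng)
```

In this toy setup the activation is simultaneously "somewhere among the animals", "somewhere among the colors", and "somewhere among the textures", but only the sum is observed.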
Taken together, the signs of partial density, local connectedness, and coherent dictionary atoms indicate that DINO's representations are organized beyond linear sparsity alone.
15.10.2025 17:13
Can position explain this?
We found that positional information collapses from high rank to a nearly 2-dimensional sheet. Early layers encode precise location; later ones retain abstract axes.
This compression frees dimensions for features, and *position doesn't explain PCA map smoothness*
15.10.2025 17:13
Patch embeddings form smooth, connected surfaces tracing objects and boundaries.
This may suggest interpolative geometry: tokens as mixtures between landmarks, shaped by clustering and spreading forces in the training objectives.
15.10.2025 17:13
We found antipodal feature pairs (dᵢ ≈ −dⱼ): vertical vs horizontal lines, white vs black shirts, left vs right…
Also, co-activation statistics only moderately shape geometry: concepts that fire together aren't necessarily nearby, nor are they orthogonal when they don't.
15.10.2025 17:13
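One way to look for antipodal pairs like those in the post above is to scan a dictionary for atoms whose cosine similarity is close to -1. This is a hedged sketch, not the authors' procedure: the helper name `antipodal_pairs`, the -0.9 threshold, and the toy dictionary are all assumptions made for illustration.

```python
import numpy as np

def antipodal_pairs(dictionary, threshold=-0.9):
    """Return (i, j, cosine) for atom pairs whose cosine similarity is below threshold.

    `dictionary` has one atom per row; the threshold is an arbitrary choice.
    """
    unit = dictionary / np.linalg.norm(dictionary, axis=1, keepdims=True)
    cosine = unit @ unit.T
    # Keep only the strict upper triangle so each pair is reported once.
    rows, cols = np.where(np.triu(cosine < threshold, k=1))
    return [(int(i), int(j), float(cosine[i, j])) for i, j in zip(rows, cols)]

# Toy dictionary: atoms 0 and 1 point in opposite directions, atom 2 is orthogonal.
toy_dictionary = np.array([
    [1.0, 0.0],
    [-2.0, 0.0],
    [0.0, 1.0],
])
pairs = antipodal_pairs(toy_dictionary)
```

On a real SAE dictionary with tens of thousands of atoms, the full cosine matrix gets large, so a blocked or nearest-neighbor search would likely be needed instead.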
Under the Linear Rep. Hypothesis, we'd expect the dictionary to be quasi-orthogonal.
Instead, training drives atoms from near-Grassmannian initialization to higher coherence.
Several concepts fire almost always: the embedding is partly dense (!), contradicting pure sparse coding.
15.10.2025 17:13
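To make "coherence" concrete, here is a small sketch under stated assumptions: mutual coherence is taken as the largest absolute cosine between two distinct atoms, and it is compared with the Welch bound, the lowest value any set of N unit vectors in d dimensions (N ≥ d) can reach. The function names and the random stand-in dictionary are hypothetical; the post does not say how coherence was measured.

```python
import numpy as np

def mutual_coherence(dictionary):
    """Largest absolute cosine similarity between two distinct atoms (rows)."""
    unit = dictionary / np.linalg.norm(dictionary, axis=1, keepdims=True)
    gram = np.abs(unit @ unit.T)
    np.fill_diagonal(gram, 0.0)  # ignore each atom's similarity with itself
    return float(gram.max())

def welch_bound(n_atoms, dim):
    """Lower bound on mutual coherence for n_atoms unit vectors in R^dim (n_atoms >= dim)."""
    return float(np.sqrt((n_atoms - dim) / (dim * (n_atoms - 1))))

rng = np.random.default_rng(0)
n_atoms, dim = 64, 16  # hypothetical sizes, far smaller than a 32k-concept SAE
random_dictionary = rng.normal(size=(n_atoms, dim))

mu = mutual_coherence(random_dictionary)
bound = welch_bound(n_atoms, dim)
```

Tracking `mu` against `bound` over training checkpoints would be one way to see a dictionary drift away from a near-Grassmannian start toward higher coherence.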
🕳️ Into the Rabbit Hull - Part II
Continuing our interpretation of DINOv2, the second part of our study concerns the *geometry of concepts* and the synthesis of our findings toward a new representational *phenomenology*:
the Minkowski Representation Hypothesis
15.10.2025 17:13
Huge thanks to all collaborators who made this work possible, and especially to @binxuwang.bsky.social. This work grew from a year of collaboration!
Tomorrow, Part II: geometry of concepts and the Minkowski Representation Hypothesis.
🕹️ kempnerinstitute.github.io/dinovision
arxiv.org/pdf/2510.08638
14.10.2025 21:00
Curious tokens, the registers.
DINO seems to use them to encode global invariants: we find concepts (directions) that fire exclusively (!) on registers.
Examples of such concepts include a motion-blur detector and style (game screenshots, drawings, paintings, warped images...)
14.10.2025 21:00
Now for depth estimation. How does DINO know depth?
It turns out it has discovered several human-like monocular depth cues: texture gradients resembling blurring or bokeh, shadow detectors, and projective cues.
Most units mix cues, but a few remain remarkably pure.
14.10.2025 21:00
Another surprise here: the most important concepts are not object-centric at all, but boundary detectors. Remarkably, these concepts coalesce into a low-dimensional subspace (see paper).
14.10.2025 21:00
This kind of concept breaks a key assumption in interpretability: that a concept is about the tokens where it fires. Here it is the opposite: the concept is defined by where it does not fire. An open question is how models form such concepts.
14.10.2025 21:00
Let's zoom in on classification.
For every class, we find two concepts: one fires on the object (e.g., "rabbit"), and another fires everywhere *except* the object -- but only when it's present!
We call them Elsewhere Concepts (credit: @davidbau.bsky.social).
14.10.2025 21:00
Assuming the Linear Rep. Hypothesis, SAEs arise naturally as instruments for concept extraction; they will be our companions in this descent.
An Archetypal SAE uncovered 32k concepts.
Our first observation: different tasks recruit distinct regions of this conceptual space.
14.10.2025 21:00
🕳️ Into the Rabbit Hull - Part I (Part II tomorrow)
An interpretability deep dive into DINOv2, one of vision's most important foundation models.
Today is Part I. Buckle up: we're exploring some of its most charming features. :)
14.10.2025 21:00
Really neat, congrats!
12.10.2025 00:59
Superposition disentanglement of neural representations reveals hidden alignment
The superposition hypothesis states that a single neuron within a population may participate in the representation of multiple features in order for the population to represent more features than the ...
Superposition has reshaped interpretability research. In our @unireps.bsky.social paper led by @andre-longon.bsky.social we show it also matters for measuring alignment! Two systems can represent the same features yet appear misaligned if those features are mixed differently across neurons.
08.10.2025 20:54
Explanations are a means to an end
Modern methods for explainable machine learning are designed to describe how models map inputs to outputs--without deep consideration of how these explanations will be used in practice. This paper arg...
In XAI it's often thought that explanations help a (boundedly rational) user "unlock" info in features for some decision. But no one says this; they say vaguer things like "supporting trust". We lay out some implicit assumptions that become clearer when you take a formal view here: arxiv.org/abs/2506.22740
08.10.2025 23:12
Beautiful work!
10.10.2025 00:00
Updated: "How far can we go with ImageNet for Text-to-Image generation?"
TL;DR: train a text2image model from scratch on ImageNet only and beat SDXL.
Paper, code, data available! Reproducible science FTW!
🧵
arxiv.org/abs/2502.21318
github.com/lucasdegeorg...
huggingface.co/arijitghosh/...
08.10.2025 20:40
• PhD student @ https://www.ucl.ac.uk/gatsby
• Masters Theoretical Physics UoM|UCLA
• Intern @zuckermanbrain.bsky.social|
@SapienzaRoma | @CERN | @EPFL
https://linktr.ee/Clementine_Domine
Assistant Teaching Professor, Northeastern University (Robotics)
PhD Student at the TU Berlin ML group + BIFOLD | BUA Fellow
Model robustness/correction
Understanding representation spaces ✨
Geometric deep learning + Computer vision
Assistant Professor at UCLA | Alum @MIT @Princeton @UC Berkeley | AI+Cognitive Science+Climate Policy | https://ucla-cocopol.github.io/
Linguistics PhD student at UT Austin
PhD student @ Linköping University
I like 3D vision and training neural networks.
Code: https://github.com/parskatt
Weights: https://github.com/Parskatt/storage/releases/tag/roma
|| assistant prof at University of Montreal || leading the systems neuroscience and AI lab (SNAIL: https://www.snailab.ca/) || associate academic member of Mila (Quebec AI Institute) || #NeuroAI || vision and learning in brains and machines
PhD student in CompNeuro - 🇲🇽@🇩🇪
Postdoc at Northeastern and incoming Asst. Prof. at Boston U. Working on NLP, interpretability, causality. Previously: JHU, Meta, AWS
PhD Student at Northeastern, working to make LLMs interpretable
M.Eng. Dartmouth | AI alignment | 🇸🇻
https://martinez-ml.github.io/
CS PhD Student, Northeastern University - Machine Learning, Interpretability https://ericwtodd.github.io
Explainable AI research from the machine learning group of Prof. Klaus-Robert Müller at @tuberlin.bsky.social & @bifold.berlin
ML Eng. and econometrics. Lot more left-posting than normal. Some hobby-level finance
Regrettably degen trading for the next 3 months, im sorry
Views dont reflect my employer
Mathematics professor at Collège de France and fellow of Trinity College Cambridge.
Principal Researcher @ Microsoft Research.
AI, RL, cog neuro, philosophy.
www.momen-nejad.org
Optical imaging, interested in STEM topics.
First generation graduate from ETH Zürich
Associate Professor at UT Southwestern Medical Center
Cognitive neuroscientist.
Professor at College de France in Paris.
Head of the NeuroSpin brain imaging facility in Saclay.
President of the Scientific Council of the French national education ministry (CSEN)
computational neuroscience, emergent behaviour, development economics, human and environmental rights; believes in mathematical structures and safe climbing