It’s a very exciting time to be thinking about the interaction of vision and language, and what we can find in (and learn from) VLMs. Looking forward to talking to people about this at COLM, and thanks to everyone doing awesome research on this topic!
Lastly, we didn’t just go blindly into batchtopk SAEs; we tried other SAEs and a semi-NMF, but they don’t work as well: batchtopk dominates the reconstruction-sparsity tradeoff
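For the curious, the batchtopk trick itself is tiny. A minimal PyTorch sketch (names, shapes, and the k value are illustrative, not our actual training code):

```python
import torch

# BatchTopK: keep the top (k * batch_size) pre-activations across the whole
# batch, rather than the top k per sample as in a plain TopK SAE.
def batchtopk(pre_acts: torch.Tensor, k: int) -> torch.Tensor:
    n_keep = k * pre_acts.shape[0]
    cutoff = pre_acts.flatten().topk(n_keep).values.min()  # batch-level threshold
    return torch.where(pre_acts >= cutoff, pre_acts, torch.zeros_like(pre_acts))

# A bare-bones SAE forward pass using it (W_enc: (d, m), W_dec: (m, d)):
def sae_forward(x, W_enc, b_enc, W_dec, b_dec, k=32):
    pre = torch.relu((x - b_dec) @ W_enc + b_enc)  # encode
    z = batchtopk(pre, k)                          # sparsify at the batch level
    return z @ W_dec + b_dec, z                    # decode
```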
Check out our interactive demo (by the amazing @napoolar), where bridges illustrate our BridgeScore metric: a combination of geometric alignment (cosine similarity) and statistical alignment (coactivation on image-caption pairs): vlm-concept-visualization.com
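The exact BridgeScore definition is in the paper; as a back-of-envelope sketch of combining those two signals (the product form and the Jaccard coactivation here are just illustrative assumptions):

```python
import numpy as np

def bridge_score(dir_i, dir_j, acts_i, acts_j):
    """dir_*: (d,) decoder directions for two concepts; acts_*: (n,) their
    activations on the same n image-caption pairs."""
    # geometric alignment: cosine between the concept directions
    cos = dir_i @ dir_j / (np.linalg.norm(dir_i) * np.linalg.norm(dir_j))
    # statistical alignment: how often the two concepts fire together (Jaccard)
    on_i, on_j = acts_i > 0, acts_j > 0
    coact = (on_i & on_j).mean() / max((on_i | on_j).mean(), 1e-9)
    return cos * coact
```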
And they’re stable *across training data mixtures*! If we train the SAEs with a 5:1 ratio of text to images, we get a lot more text concepts (makes sense!). But if we weight the points by activation scores (bottom), we see basically the same concepts across very different mixtures
But are the SAEs even stable? It wouldn’t be very enlightening if we were just analyzing a fluke of the SAE seed. Across seeds, we find that frequently-used concepts (the ones that account for 99% of the activation weight) are remarkably stable, but the rest are pretty darn unstable.
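One simple way to quantify that, sketched below (not necessarily the paper's exact procedure): match each concept to its best cosine partner in another seed's dictionary.

```python
import numpy as np

def seed_stability(W_dec_a, W_dec_b):
    """W_dec_*: (n_concepts, d) decoder matrices from two SAE seeds.
    Returns, for each concept in a, its best cosine match in b."""
    A = W_dec_a / np.linalg.norm(W_dec_a, axis=1, keepdims=True)
    B = W_dec_b / np.linalg.norm(W_dec_b, axis=1, keepdims=True)
    return (A @ B.T).max(axis=1)  # near 1.0 = the concept reappears across seeds
```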
How can this be? Because of the projection effect in SAEs! When we impose sparsity, the inputs that end up activated don’t necessarily reflect the whole story of which inputs align with that direction. Here, the batchtopk cutoff (dotted line) hides a multimodal story
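In code, the check is basically: score *every* input against the concept direction, not just the ones that survive the cutoff (a minimal sketch):

```python
import numpy as np

def raw_vs_surviving(X, direction, cutoff):
    """X: (n, d) embeddings; direction: (d,) SAE concept direction;
    cutoff: the batchtopk threshold (the dotted line)."""
    align = X @ direction        # raw projections, no sparsity imposed
    surviving = align >= cutoff  # what the SAE actually activates on
    return align, surviving

# Plotting `align` split by modality can show the direction is clearly
# bimodal even when everything above the cutoff is a single modality.
```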
At first blush, however, the concepts look pretty single-modality: see their modality scores here (how many of the top-activating inputs are images vs text). The classifier results above show us that the actual geometry is often much closer to modality-agnostic.
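Concretely, a modality score can be computed like this (minimal sketch; top_n is illustrative):

```python
import numpy as np

def modality_score(acts, is_image, top_n=100):
    """acts: (n,) one concept's activations over a mixed image/text set;
    is_image: (n,) bools. Fraction of the top activations that are images."""
    top = np.argsort(acts)[-top_n:]
    return is_image[top].mean()  # ~1 image-only, ~0 text-only, ~0.5 mixed
```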
In fact, they often can’t even act as good modality classifiers: if we take an SAE concept direction and check how well projecting onto it separates images from text, many of the concepts don’t get great accuracy.
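A minimal sketch of that test (brute-force threshold search, just for illustration):

```python
import numpy as np

def direction_accuracy(X, is_image, direction):
    """Best accuracy of a single threshold on the projection onto `direction`.
    X: (n, d) embeddings; is_image: (n,) bools."""
    proj = X @ direction
    best = 0.0
    for t in proj:  # try every projection value as the threshold
        pred = proj >= t
        # allow either polarity (images above or below the threshold)
        acc = max((pred == is_image).mean(), (pred != is_image).mean())
        best = max(best, acc)
    return best  # ~0.5 means the direction barely separates the modalities
```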
We trained SAEs on the embedding spaces of four VLMs and analyzed the resulting dictionaries of concepts. Even though image and text embeddings lie on separate anisotropic cones, the SAE concepts don’t lie within those cones.
Are there conceptual directions in VLMs that transcend modality? Check out our COLM oral spotlight 🔦 paper! We use SAEs to analyze the multimodality of linear concepts in VLMs
with @chloesu07.bsky.social, @thomasfel.bsky.social, @shamkakade.bsky.social and Stephanie Gil
arxiv.org/abs/2504.11695
How do people trade off between speed and accuracy in reasoning tasks without easy heuristics? Come to my talk, "Thinking fast, slow, and everywhere in between in humans and language models," in the Reasoning session this afternoon #CogSci2025 to find out!
paper: escholarship.org/uc/item/5td9...
When people form conventions in reference games, how easy are they for outsiders to interpret? (for values of "outsider" that include naïve humans and vision-language models) Check out @vboyce.bsky.social's poster today at #CogSci2025 to find out.
paper: escholarship.org/uc/item/16c4...
@antararb.bsky.social is applying for PhDs this fall! She’s super impressive and awesome to work with, and conceived of this project independently and carried it out very successfully! Keep an eye out 🙂
More in the preprint! arxiv.org/abs/2506.13886 This project was led by Antara, with @dmelis.bsky.social and Kate Davidson
So is it really the implicit operators that are tripping them up? We try many other ablations, looking at the effects of extra context in the prompt, numbers vs. words, left-to-right ordering, and subtractive systems, and none of them seem to affect the models much.
Our experiments are based on Linguistics Olympiad problems that deal with number systems, like the one here. We created additional hand-standardized versions of each puzzle in order to be able to do all of the operator ablations.
This points to the kinds of reasoning and variable-binding jumps that are hard for LMs: it’s hard to go one level up and bind a variable to the meaning of an operator, or to recognize that an operator is implicit.
If we alter the problems to make the operators explicit, the models can solve them pretty easily. But it’s still harder to bind a random symbol or word to an operator like +; it’s much easier with the familiar symbols, like + and x.
Our main finding: LMs find it hard when *operators* are implicit. We don’t say “5 times 100 plus 20 plus 3”, we say “five hundred and twenty-three”. The Linguistics Olympiad puzzles are pretty simple systems of equations that an LM should solve – but the operators aren’t explicit.
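To make the contrast concrete, here's a toy rendering of the conditions (invented formats for illustration, not the paper's actual puzzle text):

```python
value = 5 * 100 + 20 + 3                            # = 523
implicit = "five hundred twenty-three"              # operators hidden in the morphology
explicit_novel = "five @ hundred # twenty # three"  # @ and # must be bound to * and +
explicit_familiar = "5 * 100 + 20 + 3"              # familiar symbols: easiest for LMs
```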
Why can’t LMs solve puzzles about the number systems of languages, when they can solve really complex math problems? Our new paper, led by @antararb.bsky.social looks at why this intersection of language and math is difficult, and what this means for LM reasoning! arxiv.org/abs/2506.13886
ACL paper alert! What structure is lost when using linearizing interp methods like Shapley? We show that the nonlinear interactions between features reflect structures described by the sciences of syntax, semantics, and phonology.
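A classic toy case of what a linearizing method misses (generic illustration, not the paper's setup): for XOR, exact Shapley values give each feature zero credit, and all the structure sits in the interaction term.

```python
def f(x1, x2):
    return x1 ^ x2  # XOR

base, x = (0, 0), (1, 1)
# exact Shapley values (average over the two insertion orders)
phi1 = 0.5 * ((f(x[0], base[1]) - f(*base)) + (f(*x) - f(base[0], x[1])))
phi2 = 0.5 * ((f(base[0], x[1]) - f(*base)) + (f(*x) - f(x[0], base[1])))
# pairwise interaction term
inter = f(*x) - f(x[0], base[1]) - f(base[0], x[1]) + f(*base)
print(phi1, phi2, inter)  # 0.0 0.0 -2: the features alone explain nothing
```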
Congrats to Veronica Boyce on her dissertation defense! That’s three amazing talks by three great students in 8 days!
(the unfortunate truth is that I am really enjoying this mac and its battery life oops)
This work Mac (my first ever) is great because every time something seriously breaks, instead of becoming distressed and despondent like I usually do, it's just like "ooooooh yeahhh, yet another win for team Linux 😎😎😎🎉🐧"
😼SMOL DATA ALERT! 😼Announcing SMOL, a professionally-translated dataset for 115 very low-resource languages! Paper: arxiv.org/pdf/2502.12301
Huggingface: huggingface.co/datasets/goo...
New paper in Psychological Review!
In "Causation, Meaning, and Communication" Ari Beller (cicl.stanford.edu/member/ari_b...) develops a computational model of how people use & understand expressions like "caused", "enabled", and "affected".
📃 osf.io/preprints/ps...
📎 github.com/cicl-stanfor...
🧵
Where are all the phoneticians of the Boston area, and why isn’t there a storied subfield of fieldwork studying the Cambridge shopkeeper who seems to have a mix between a West Country (rhotic English!) and a Boston (non-rhotic American!) accent?
Apparently the shop's been open for decades, smh
Quanta write-up of our Mission: Impossible Language Models work, led by @juliekallini.bsky.social. As the photos suggest, Richard, @isabelpapad.bsky.social, and I do all our work sitting together around a single laptop and pointing at the screen.
My most controversial take is that you should never use commit -m: just let it open the damn vim file, let yourself think for a second, and then write something descriptive