⚛️ Introducing CREATE, a benchmark for creative associative reasoning in LLMs.
Making novel, meaningful connections is key for scientific & creative works.
We objectively measure how well LLMs can do this. 🧵👇
Martin = one of the kindest people I know! Don’t miss this opportunity to learn from one of the best in their field!
I'm hiring a new lab manager for my lab @ UCSD! For more info on the lab, check out our website: lillab.ucsd.edu
Target start date is June 1 (flexible) and application deadline is March 26. Please share with anyone you think might be a good fit!
Apply here: employment.ucsd.edu/laboratory-c...
📢 PhD position in Developmental Language Modelling
(PLZ RT)
What can human language acquisition teach us about training language models? Join us as a PhD student!
mpi.nl/career-education/vacancies/vacancy/fully-funded-4-year-phd-position-developmental-language @carorowland.bsky.social
@mpi-nl.bsky.social
Thanks to everyone who gave us feedback: @lampinen.bsky.social, Ellie Pavlick, @glupyan.bsky.social, @phillipisola.bsky.social, and others!
Work with Tianyang Xu, @mudtriangle.com, Karen Livescu, and Greg Shakhnarovich!
This relates more broadly to literature reconciling how meaning obtained from relational grounding in language interacts with meaning obtained from other forms of grounding (see Mollo and Millière/@raphaelmilliere.com), and lays out a research program on the role of category coherence in learning!
11/
This suggests that representations learned from language are structured to expect incoming category information to cohere in a specific way in order to support cross-modal generalization!
10/
If models were generalizing arbitrarily, then we shouldn’t see any differences in their performance across these settings (i.e., no matter what, crow == bird). However, we find that models seem to only generalize when the training data preserves category coherence!
9/
By coherence we mean the visual similarity between members of the same category, which we calculate using the DINOv2 embeddings used in our VLM training. Even in the original configuration, we found that models perform better on categories that are visually more coherent.
8/
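As a sketch, the within-category visual coherence described in the post above could be computed as the mean pairwise cosine similarity of a category's image embeddings (the random vectors below are stand-ins for DINOv2 features, not the paper's actual data):

```python
import numpy as np

def category_coherence(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine similarity among one category's image embeddings."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = len(embeddings)
    # Average the off-diagonal entries only (exclude each image's self-similarity).
    return (sims.sum() - n) / (n * (n - 1))

rng = np.random.default_rng(0)
tight = rng.normal(size=(20, 64)) * 0.1 + rng.normal(size=64)  # clustered category
loose = rng.normal(size=(20, 64))                              # scattered category

# A tightly clustered category scores higher coherence than a scattered one.
print(category_coherence(tight) > category_coherence(loose))  # → True
```

The same score can then be correlated with per-category accuracy, which is one way to operationalize the "models do better on visually coherent categories" observation.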
To test this, we created counterfactual data: 1) where category-label pairings were shuffled across categories (🪛= “robin”; 🎸= “crow”) and 2) where they were shuffled within categories (🦅=“robin”; 🦜=“crow”). These swaps also manipulate the categories’ visual coherence.
7/
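The two counterfactual shuffles can be sketched like this (toy taxonomy; the category names are illustrative, not the paper's actual dataset):

```python
import random

random.seed(0)

# Toy taxonomy: basic-level categories grouped under hypernyms.
taxonomy = {
    "bird": ["robin", "crow", "eagle", "parrot"],
    "tool": ["screwdriver", "hammer"],
    "instrument": ["guitar", "violin"],
}
categories = [c for members in taxonomy.values() for c in members]
hypernym_of = {c: h for h, members in taxonomy.items() for c in members}

def shuffle_across(cats):
    """Shuffle label pairings across ALL categories: breaks coherence,
    e.g. screwdriver images may end up labeled 'robin'."""
    shuffled = cats[:]
    random.shuffle(shuffled)
    return dict(zip(cats, shuffled))

def shuffle_within(tax):
    """Shuffle labels only WITHIN each hypernym: eagle images may be
    labeled 'robin', but bird labels never leave the birds."""
    mapping = {}
    for members in tax.values():
        shuffled = members[:]
        random.shuffle(shuffled)
        mapping.update(zip(members, shuffled))
    return mapping

across = shuffle_across(categories)
within = shuffle_within(taxonomy)

# The within-shuffle never moves a label out of its hypernym.
assert all(hypernym_of[c] == hypernym_of[lbl] for c, lbl in within.items())
```

Both shuffles are permutations of the same label set; they differ only in whether the permutation is allowed to cross hypernym boundaries.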
Are LMs simply executing something like “if crow, then bird” regardless of what the image shows? E.g., if during supervision we label images of kayaks as “crow”, would the model still generalize to birds, or does the model expect categories to have some level of coherence?
6/
Having established these preconditions to our task, we then find that models are also able to generalize (non-trivially) to hypernyms without ever having “seen” them explicitly, suggesting that LM representations support cross-modal generalization!
5/
We establish that this paradigm works in the first place with a vision encoder that has never been trained on language data (i.e., ❌ SigLIP ✅ DINO), that the models learn the task on the lower-level categories themselves, and that the LMs indeed have taxonomic knowledge.
4/
Taxonomic knowledge is interesting because of a number of hypotheses about the learnability of category knowledge from linguistic cues, for both computational models and humans. Evidence of cross-modal generalization would lend strong support to these hypotheses!
3/
We use a VLM-training paradigm (frozen vision encoder w/o language training mapped to frozen LM) where we partially supervise on lower-level categories during training, and then test if the LM recovers hypernymy knowledge from what it has seen in language data.
2/
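A minimal sketch of this setup, under toy assumptions (hypothetical dimensions, random stand-in features, and a simple squared-error objective rather than the paper's actual training details): only a linear projection from the frozen vision features into the frozen LM's embedding space is trained, and only a subset of the basic-level categories is ever supervised.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_LM, N_CAT = 768, 512, 10  # toy dimensions (hypothetical)

# Frozen components: vision-encoder outputs and the LM's label embeddings.
image_embeddings = rng.normal(size=(100, D_IMG))   # stand-in for DINO features
label_embeddings = rng.normal(size=(N_CAT, D_LM))  # stand-in for frozen LM rows
labels = rng.integers(0, N_CAT, size=100)

# The projection is the ONLY trainable parameter.
W = np.zeros((D_IMG, D_LM))

# Partial supervision: train only on a subset of the categories.
seen = labels < 7
X, Y = image_embeddings[seen], label_embeddings[labels[seen]]

lr = 1e-3
for _ in range(200):
    pred = X @ W
    grad = X.T @ (pred - Y) / len(X)  # gradient of mean squared error
    W -= lr * grad

# Held-out categories are never supervised; the question is whether the
# frozen LM's own structure lets their images generalize anyway.
held_out = image_embeddings[~seen] @ W
print(held_out.shape)
```

The design choice the thread highlights is that both endpoints stay frozen, so any hypernym generalization has to come from structure already present in the LM's language-derived representations, not from fine-tuning.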
What is the interplay between representations learned from (language) surface forms alone, and those learned from more grounded evidence (e.g., vision)?
Excited to share new work understanding “Cross-modal taxonomic generalization” in (V)LMs
arxiv.org/abs/2603.07474
1/
I want to unwatch this
@tylerachang.bsky.social and I will be presenting the Goldfish as an oral at #LREC2026 in Mallorca! 🌴
Short post on what I call the "no-magic approach to understanding intelligent systems" — the philosophy I think of as motivating our work on understanding intelligence without resorting to magical thinking about AI or humans!
infinitefaculty.substack.com/p/the-no-mag...
🚨New Paper!🚨 How do reasoning LLMs handle inferences that have no deterministic answer? We find that they diverge from humans in some significant ways, and fail to reflect human uncertainty… 🧵(1/10)
Check out our special theme: new missions for NLP research!
What’s a paper that made you think that way? 👀
I wrote a short article on AI Model Evaluation for the Open Encyclopedia of Cognitive Science 📕👇
Hope this is helpful for anyone who wants a super broad, beginner-friendly intro to the topic!
Thanks @mcxfrank.bsky.social and @asifamajid.bsky.social for this amazing initiative!
Congratulations Andreas!!
Some days you finish 5 meta-reviews in ~one go, and some days you take 1.5 days to complete one meta-review. Such is the AC life!
Woohoo, will be in touch soon!
Wow!! Good luck with whatever it is you do next — so excited for you!!
Watch Slow Horses already!!
Japonaise and Jahunger mentioned in same thread 😍 my fav places in Boston!
I'm looking forward to @jennhu.bsky.social's South by Semantics talk next week at UT Austin! She'll discuss "micro-pragmatics" inferences and world modeling in language models 🤖