so nice to see this out sush!!
19.11.2025 08:47
🚨 New Preprint!
How can we model natural scene representations in visual cortex? A solution lies in active vision: predict the features of the next glimpse! arxiv.org/abs/2511.12715
+ @adriendoerig.bsky.social , @alexanderkroner.bsky.social , @carmenamme.bsky.social , @timkietzmann.bsky.social
🧵 1/14
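A minimal sketch of the core objective as described here: given the current glimpse's features and the location of the next fixation, predict the next glimpse's features. The architecture, names, and dimensions below are illustrative assumptions, not the paper's actual model.

```python
import torch
import torch.nn as nn

# Illustrative sketch (not the paper's model): predict the feature vector
# of the next glimpse from the current glimpse's features plus the (x, y)
# location of the upcoming fixation.
class NextGlimpsePredictor(nn.Module):
    def __init__(self, feat_dim=512, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden),  # glimpse features + fixation (x, y)
            nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, glimpse_feats, next_xy):
        return self.net(torch.cat([glimpse_feats, next_xy], dim=-1))

predictor = NextGlimpsePredictor()
cur = torch.randn(8, 512)     # features of the current glimpse (from some encoder)
xy = torch.rand(8, 2)         # normalised coordinates of the next fixation
target = torch.randn(8, 512)  # encoder features of the actual next glimpse
loss = nn.functional.mse_loss(predictor(cur, xy), target)  # self-supervised target
loss.backward()
```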
archive.ph/smEj0 (or, unpaywalled 🤫)
07.11.2025 10:32
This is, without a doubt, the best popular article about the current state of AI, and on whether LLMs are truly 'thinking' or 'understanding' -- and what that question even means
www.newyorker.com/magazine/202...
omg. what journal? name and shame
19.09.2025 12:34
huh! if these effects are similar and consistent, I think it should work, but the question is how you get a vector representation for novel pseudowords. We currently use lexicosemantic word vectors, and they are undefined for novel words.
so how to represent the novel words? A very interesting test case.
@nicolecrust.bsky.social might be of interest
18.09.2025 11:52
New paper on memorability, with @davogelsang.bsky.social!
18.09.2025 10:45
New preprint out together with @mheilbron.bsky.social
We find that a stimulus' representational magnitude -- the L2 norm of its DNN representation -- predicts intrinsic memorability not just for images, but for words too.
www.biorxiv.org/content/10.1...
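The measure itself is simple to compute. A minimal sketch, assuming an off-the-shelf ImageNet encoder; the paper's exact models, layers, and preprocessing may differ:

```python
import torch
from torchvision import models
from scipy.stats import spearmanr

# "Representational magnitude" = L2 norm of a stimulus' DNN representation.
# Model and layer choice here are illustrative, not the paper's pipeline.
resnet = models.resnet50(weights="IMAGENET1K_V2").eval()
backbone = torch.nn.Sequential(*list(resnet.children())[:-1])  # drop the classifier

@torch.no_grad()
def representational_magnitude(images):         # images: (N, 3, 224, 224)
    feats = backbone(images).flatten(1)         # (N, 2048) penultimate features
    return feats.norm(dim=1)                    # one L2 norm per stimulus

images = torch.randn(16, 3, 224, 224)           # stand-in for preprocessed stimuli
memorability = torch.rand(16)                   # stand-in behavioural scores
rho, p = spearmanr(representational_magnitude(images).numpy(), memorability.numpy())
```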
Together, our results support a classic idea: cognitive limitations can be a powerful inductive bias for learning
Yet they also reveal a curious distinction: a model with more human-like *constraints* is not necessarily more human-like in its predictions
This paradox -- better language models yielding worse behavioural predictions -- was not captured by existing accounts: the mechanism appears distinct from those previously linked to superhuman training scale or memorisation
18.08.2025 12:40
However, we then used these models to predict human behaviour
Strikingly, the same models that were demonstrably better at the language task were worse at predicting human reading behaviour
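For context, the standard recipe for this kind of evaluation: extract per-word surprisal from the language model and test how much reading-time variance it explains alongside the usual covariates. A sketch with placeholder data, not the paper's exact pipeline:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000                                   # placeholder corpus of 1000 words
surprisal = rng.random(n)                  # -log p(word | context) from the LM
word_len = rng.integers(1, 12, size=n)     # standard covariates
log_freq = rng.random(n)
reading_time = rng.random(n)               # e.g. gaze durations from eye-tracking

X = np.column_stack([surprisal, word_len, log_freq])
# Cross-validated R^2: a "better" LM should in principle yield surprisals that
# explain more reading-time variance -- the finding here is the reverse.
scores = cross_val_score(LinearRegression(), X, reading_time, cv=5, scoring="r2")
print(scores.mean())
```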
The benefit was robust
Fleeting memory models achieved better next-token prediction (lower loss) and better syntactic knowledge (higher accuracy) on the BLiMP benchmark
This was consistent across seeds and for both 10M and 100M training sets
But we noticed this naive decay was too strong
Human memory has a brief 'echoic' buffer that perfectly preserves the immediate past. When we added this -- a short window of perfect retention before the decay -- the pattern flipped
Now, fleeting memory *helped* (lower loss)
Our first attempt, a "naive" memory decay starting from the most recent word, actually *impaired* language learning. Models with this decay had higher validation loss, and this worsened (even higher loss) as the decay became stronger
18.08.2025 12:40
To test this in a modern context, we propose the "fleeting memory transformer"
We applied a power-law memory decay to the self-attention scores, simulating how access to past words fades over time, and ran controlled experiments on the developmentally realistic BabyLM corpus
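A minimal sketch of how such a decay can be applied, with an illustrative parameterisation that is not necessarily the paper's exact one: adding the log of a power-law weight to the pre-softmax attention logits multiplies the attention weights by that power law, and a short perfect-retention window gives the 'echoic buffer' variant described above.

```python
import torch

# Illustrative "fleeting memory" attention bias. Adding log w(d) to the
# pre-softmax logits multiplies the attention weights by w(d), a power-law
# decay in the query-key distance d, with perfect retention inside a short
# buffer window. Parameterisation is an assumption, not the paper's exact one.
def fleeting_memory_bias(seq_len, alpha=1.0, buffer_len=4):
    q = torch.arange(seq_len).unsqueeze(1)          # query positions
    k = torch.arange(seq_len).unsqueeze(0)          # key positions
    dist = (q - k).float()                          # distance into the past
    eff = (dist - buffer_len + 1.0).clamp(min=1.0)  # within buffer -> weight 1
    bias = -alpha * eff.log()                       # log of w(d) = eff**(-alpha)
    return bias.masked_fill(k > q, float("-inf"))   # causal mask on future keys

scores = torch.randn(1, 8, 128, 128)         # (batch, heads, queries, keys) logits
scores = scores + fleeting_memory_bias(128)  # apply before the softmax
attn = scores.softmax(dim=-1)
```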
However, this appears difficult to reconcile with the success of transformers, which can learn language very effectively, despite lacking working memory limitations or other recency biases
Would the blessing of fleeting memory still hold in transformer language models?
A core idea in cognitive science is that the fleetingness of working memory isn't a flaw
It may actually help language learning by forcing a focus on the recent past and providing an incentive to discover abstract structure rather than surface details
New preprint! w/ @drhanjones.bsky.social
Adding human-like memory limitations to transformers improves language learning, but impairs reading time prediction
This supports ideas from cognitive science but complicates the link between architecture and behavioural prediction
arxiv.org/abs/2508.05803
On Wednesday, Maithe van Noort will present a poster on "Compositional Meaning in Vision-Language Models and the Brain"
First results from a much larger project on visual and linguistic meaning in brains and machines, with many collaborators -- more to come! ✨
t.ly/TWsyT
On Friday, during a contributed talk (and a poster), @wiegerscheurer will present the project he spearheaded: "A hierarchy of spatial predictions across human visual cortex during natural vision" ✨✨ (Full preprint soon)
t.ly/fTJqy
CCN has arrived here in Amsterdam!
Come find me to meet or catch up
Some highlights from students and collaborators:
Why do you forget names but remember exactly what someone does? And are memories ever really gone for good?
I spoke with Oplossing Gezocht about how our brain stores information and why forgetting is actually quite clever:
www.nemokennislink.nl/publicaties/...
Exciting new preprint from the lab: "Adopting a human developmental visual diet yields robust, shape-based AI vision". A most wonderful case where brain inspiration massively improved AI solutions.
Work with @zejinlu.bsky.social @sushrutthorat.bsky.social and Radek Cichy
arxiv.org/abs/2507.03168
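The general idea is a training curriculum that mimics infant vision: images start blurry and desaturated and sharpen over training. A hedged sketch of such a schedule; the preprint's actual diet and parameters may well differ:

```python
import torch
import torchvision.transforms.functional as TF

# Illustrative developmental curriculum: low acuity (blur) and low colour
# sensitivity (desaturation) early in training, annealed away over time.
# Schedule and parameters are assumptions, not the preprint's recipe.
def developmental_view(img, progress):               # progress in [0, 1]
    sigma = max(1e-3, 4.0 * (1.0 - progress))        # acuity improves with "age"
    img = TF.gaussian_blur(img, kernel_size=9, sigma=sigma)
    return TF.adjust_saturation(img, progress)       # colour sensitivity matures

batch = torch.rand(8, 3, 224, 224)
for step in range(1000):
    x = developmental_view(batch, progress=step / 999)
    # ... train the vision model on x as usual ...
```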
New preprint, w/ @predictivebrain.bsky.social!
We've found that visual cortex, even when just viewing natural scenes, predicts *higher-level* visual features
This aligns with developments in ML, but challenges some assumptions about early sensory cortex
www.biorxiv.org/content/10.1...
i'm all in the 'this is a neat way to help explain things' camp fwiw :)
23.05.2025 15:53
Our findings, together with some other recent studies, suggest the brain may use a similar strategy -- constantly predicting higher-level features -- to efficiently learn robust visual representations of (and from!) the natural world
23.05.2025 11:39
This preference for higher-level information departs from traditional predictive coding -- but aligns with recent, successful algorithms in AI for predictive self-supervised learning, which encourage predicting higher rather than lower-level visual features (e.g. MAE, CPC, JEPA)
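For readers outside ML: what these algorithms share is that they predict in feature space rather than pixel space. A toy sketch of that shared idea, with encoders and dimensions invented purely for illustration:

```python
import torch
import torch.nn as nn

# Toy sketch of predictive self-supervised learning in *feature* space
# (the shared idea behind JEPA-style methods). Architectures are invented
# for illustration only.
context_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 256))
target_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 256))
predictor = nn.Linear(256, 256)

visible = torch.randn(32, 3, 16, 16)   # the patch the network gets to see
hidden = torch.randn(32, 3, 16, 16)    # the patch whose features it must predict

with torch.no_grad():                  # target features carry no gradient
    target_feats = target_enc(hidden)

pred = predictor(context_enc(visible))
loss = nn.functional.mse_loss(pred, target_feats)  # loss in latent, not pixel, space
loss.backward()
```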
23.05.2025 11:39
So, what does this all mean?
The visual system seems to be constantly engaged in a sophisticated guessing game, predicting sensory input based on context
But interestingly, it seems to predict more abstract, higher-level properties, even in the earliest stages of cortex
Remarkably, these prediction effects appeared independent of recent experience with the specific images presented
This suggests they rely on long-term, ingrained priors about the statistical structure of the visual world, rather than on recent exposure to these specific images