🚨🚨New Preprint Alert!🚨🚨
www.biorxiv.org/content/10.6...
Animal learning is painfully slow (at least initially). Yet, well trained animals can learn very fast, sometimes displaying few-shot inference. How does this transition occur?
Thrilled to finally share this work! 🧠
Using a new reinforcement-free task, we show that mice (like humans) extract abstract structure from sound without supervision, and that dCA1 is causally required, building factorised, orthogonal subspaces of abstract rules.
Led by Dammy Onih!
www.biorxiv.org/content/10.6...
Code for our multi-region motor learning model is now available on GitHub!
github.com/jessegeerts/...
Code to run this model and reproduce figures is now public: github.com/jessegeerts/...
09.02.2026 09:09
Updated work from @jessegeerts.bsky.social extending his results on transitive inference in transformers (including LLMs!)
updated paper: arxiv.org/abs/2506.04289
bleeprint (what are we calling these?) below ⬇️
Updated paper: arxiv.org/abs/2506.04289. Joint work with @ndrewliu.bsky.social, @scychan.bsky.social, @clopathlab.bsky.social, and @neurokim.bsky.social
04.02.2026 15:10
This parallels our small transformer findings: when models must reason from context, representational geometry determines success or failure at transitive inference.
04.02.2026 15:10
This effect was strongest when models couldn't fall back on stored knowledge (incongruent/permuted items). For congruent items, where weight-stored knowledge helps, the geometric scaffold barely mattered.
04.02.2026 15:10
Across Gemini, Gemma, and GPT models, the linear scaffold consistently led to higher accuracy on transitive inference prompts.
04.02.2026 15:08
We then prompted LLMs with different geometric scaffolds: "imagine these items on a number line" (linear) vs "on a circle" (circular). Circular orderings violate transitivity because relationships can wrap around (A>B>C>A).
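For concreteness, here is a minimal sketch of how such scaffolded prompts could be assembled. The wording, item set, and `make_prompt` helper are illustrative assumptions, not the paper's exact stimuli:

```python
def make_prompt(items, query, scaffold):
    """Build a transitive-inference prompt with a geometric scaffold hint.

    `items` is ordered largest to smallest, so adjacent pairs give the
    premises; `scaffold` selects the mental-imagery instruction.
    """
    premises = ". ".join(f"{a} > {b}" for a, b in zip(items, items[1:]))
    hint = {
        "linear": "Imagine these items arranged on a number line.",
        # Circular arrangements can wrap around (A>B>C>A), violating transitivity.
        "circular": "Imagine these items arranged on a circle.",
    }[scaffold]
    a, b = query
    return f"{hint} {premises}. Is {a} > {b}? Answer yes or no."

# An incongruent ordering (goldfish > dolphin > whale) forces reasoning
# from context rather than from stored world knowledge.
prompt = make_prompt(["goldfish", "dolphin", "whale"],
                     ("goldfish", "whale"), "linear")
print(prompt)
```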
04.02.2026 15:07
We used the ReCogLab dataset (github.com/google-deepm...) to test transitive inference with items that are congruent with world knowledge (whale > dolphin > goldfish), incongruent (goldfish > dolphin > whale), or random. This lets us tease apart reasoning from context vs relying on stored knowledge.
04.02.2026 15:07
Quick recap: how a transformer is pre-trained determines whether it can do transitive inference (A>B, B>C → A>C).
In-weights learning → yes.
ICL trained on copying → no.
ICL pre-trained on linear regression → yes.
But these are small-scale toy models. What about in LLMs?
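As a minimal illustration of the task itself: transitive inference asks whether a novel pair follows from the premise pairs, i.e. whether it lies in their transitive closure. A toy check (not the models' training setup):

```python
def transitive_closure(premises):
    """All pairs (a, d) entailed by chaining the given '>' premises."""
    closed = set(premises)
    changed = True
    while changed:
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed

premises = {("A", "B"), ("B", "C"), ("C", "D")}
entailed = transitive_closure(premises)
print(("A", "D") in entailed)  # the novel pair A > D follows by transitivity
```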
Update on this work! We've extended our transitive inference study to large language models 🧵
04.02.2026 15:04
I'm excited to share my first PhD preprint!
We studied how interactions between medial entorhinal cortex (MEC) and hippocampus shape theta sequences during navigation, and asked whether some "planning-like" patterns in hippocampus could arise from upstream MEC dynamics. (1/8)
With some trepidation, I'm putting this out into the world:
gershmanlab.com/textbook.html
It's a textbook called Computational Foundations of Cognitive Neuroscience, which I wrote for my class.
My hope is that this will be a living document, continuously improved as I get feedback.
Just to add one thing to this discussion: in our paper, the "supervised" network predicts the action, which is internally generated by the actor; this is why we assume the agent has access to it. We toyed with calling this self-supervised but didn't want to cause confusion with other self-supervised work.
08.01.2026 11:03
Thanks for sharing that paper! I was unaware of it, but it's a cool result.
07.01.2026 16:34
New paper led by wonderful postdocs Francesca Greenstreet and @jessegeerts.bsky.social and @clopathlab.bsky.social, trying to understand why, in the "what for" sense, there are multiple motor learning systems (supervised and RL-based) in the brain.
Check out Jesse's 🧵
www.biorxiv.org/content/10.6...
Check out our new work on motor learning across multiple brain regions!
05.01.2026 17:00
Thank you! Feel free to get in touch with comments or questions.
05.01.2026 15:04
Many thanks to first author Francesca Greenstreet (equal contribution), and to @juangallego.bsky.social and @clopathlab.bsky.social!
05.01.2026 13:05
The key insight: supervised learning in ctx/cerebellum doesn't just predict actions; it builds a structured space that makes RL in the basal ganglia faster and enables generalization between similar movements. We make several predictions in the paper: www.biorxiv.org/content/10.6...
(8/9)
3. Limits on dual adaptation (shown by Woolley et al 2007) also emerge: learning opposite rotations for nearby targets fails because their policies overlap in embedding space. Distant targets adapt independently. (7/9)
05.01.2026 13:02
2. The model also captures classic behavioural findings such as fast visuomotor adaptation (e.g. Krakauer et al. 2000). In our model, this emerges from retraining only the linear decoder. The characteristic generalization profile falls out without additional assumptions. (6/9)
05.01.2026 13:02
1. Recent work shows similar striatal activity for similar reaches (Park et al. 2025), while classic work shows distinct activity for distinct choices. Our model captures both: if basal ganglia learn policies in a structured embedding, policy similarity scales with action similarity. (5/9)
05.01.2026 13:00
Our model combines two learning systems: a supervised encoder-decoder (ctx/cerebellum) learns embeddings where similar movements cluster together, by predicting actions via a bottleneck. An actor-critic network (basal ganglia) learns policies directly in this low-dimensional embedding space. (4/9)
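A toy sketch of that two-system idea (all dimensions, the low-rank action structure, and the REINFORCE-style update are illustrative assumptions, not the paper's implementation): a linear bottleneck fit to sampled actions plays the cortico-cerebellar role, and a simple policy-gradient learner searches only the 2-D embedding rather than the full 50-D action space.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, d_embed = 50, 2

# "Cortex/cerebellum": learn a low-dimensional embedding of actions.
# The sampled motor commands here have 2-D latent structure, and the
# top singular vectors recover that structure as the bottleneck.
mixing = rng.normal(size=(d_embed, n_actions))
actions = rng.normal(size=(500, d_embed)) @ mixing
_, _, Vt = np.linalg.svd(actions, full_matrices=False)
decoder = Vt[:d_embed]                        # embedding -> full action

# "Basal ganglia": REINFORCE-style search directly in the embedding,
# rewarded for decoding close to a target motor command.
target = rng.normal(size=d_embed) @ mixing
policy_mean = np.zeros(d_embed)
baseline, lr, sigma = 0.0, 0.1, 0.3

dist_before = np.linalg.norm(policy_mean @ decoder - target)
for _ in range(3000):
    noise = sigma * rng.normal(size=d_embed)  # explore in 2-D, not 50-D
    reward = -np.linalg.norm((policy_mean + noise) @ decoder - target)
    policy_mean += lr * (reward - baseline) * noise   # policy-gradient step
    baseline += 0.1 * (reward - baseline)             # running reward baseline
dist_after = np.linalg.norm(policy_mean @ decoder - target)
print(dist_before, dist_after)
```

Because exploration happens in the 2-D embedding, the learner never has to sample the 50-D action space directly; that is the speed-up the post describes.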
05.01.2026 12:59
We were inspired by recent machine learning approaches which learn structured action representations for tasks like robotics, where action spaces are vast 🤖 (3/9)
05.01.2026 12:57
The problem: learning motor skills means exploring vast action spaces (think: every muscle combination). Standard RL models treat each action independently, which is slow and scales badly with the size of the action space. (2/9)
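A back-of-envelope illustration of that scaling problem (the numbers are made up for illustration, not taken from the paper):

```python
# Discretising each of M muscles into K activation levels gives K**M
# distinct actions if every muscle combination is its own action.
levels, muscles = 5, 10
n_independent = levels ** muscles
print(n_independent)  # ~10 million actions for an RL agent to explore one by one
```

A learned embedding sidesteps this: the policy searches a low-dimensional space whose size does not grow combinatorially with the number of muscles.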
05.01.2026 12:57
🧠 New year, new preprint!
Why does motor learning involve multiple brain regions? We propose that the cortico-cerebellar system learns a "map" of actions where similar movements are nearby, while basal ganglia do RL in this simplified space.
www.biorxiv.org/content/10.6...
Thrilled to start 2026 as faculty in Psych & CS
@ualberta.bsky.social + Amii.ca Fellow! 🥳 Recruiting students to develop theories of cognition in natural & artificial systems 🤖🧠. Find me at #NeurIPS2025 workshops (speaking at coginterp.github.io/neurips2025 & organising @dataonbrainmind.bsky.social)