Andrew Saxe

@saxelab.bsky.social

Professor at the Gatsby Unit and Sainsbury Wellcome Centre, UCL, trying to figure out how we learn

4,245 Followers  |  476 Following  |  45 Posts  |  Joined: 23.09.2023

Posts by Andrew Saxe (@saxelab.bsky.social)

📢📢 Announcing this year's conference on the Mathematics of Neuroscience & AI (Rome, 9-12th June). We've got a stellar line-up and venue, and invite everyone to join:

www.neuromonster.org

05.03.2026 08:30 — 👍 19    🔁 14    💬 2    📌 0

📢 Job alert - Deep Learning Theory & AI Safety
Applications are open for a postdoctoral fellow in the @saxelab.bsky.social lab to study artificial deep networks using techniques from applied maths & stat physics.

⏰ Deadline: 26 Mar 2026
🤝 In collaboration with @stefsm.bsky.social
ℹ️ www.ucl.ac.uk/life-science...

25.02.2026 11:18 — 👍 10    🔁 4    💬 0    📌 1
Preview
learningfromscratch march 16th, workshop day 1 @ cosyne 2026

Excited to be co-organising a #cosyne2026 workshop with Alison Comrie on 'algorithms for learning from scratch'! With a great line-up of speakers, we'll be tackling the question of what processes enable naive biological & artificial agents to adapt to new situations. Info here: tinyurl.com/4u8enf7k

24.02.2026 18:33 — 👍 35    🔁 15    💬 1    📌 0

📢 We're now accepting applications for the 2026 School on Analytical Connectionism, dedicated this year to Language Acquisition.

📍 Gothenburg, Sweden

🗓️ August 17–28, 2026

☠️ Apply by April 17!

🔗 analytical-connectionism.net/school/2026/

👇 Meet the experts joining us this summer!

18.02.2026 11:42 — 👍 20    🔁 8    💬 1    📌 1

Thrilled to finally share this work! 🧠🔊

Using a new reinforcement-free task, we show that mice (like humans) extract abstract structure from sound without supervision, and that dCA1 is causally required, building factorised, orthogonal subspaces of abstract rules.

Led by Dammy Onih!
www.biorxiv.org/content/10.6...

16.02.2026 13:01 — 👍 150    🔁 52    💬 3    📌 2
Preview
[Hiring] Principia Research Fellows Principia Research Fellows: Theoretical Model Organisms for AI Safety Principia · London · Fixed-term (6 months) with potential extension · Starting ASAP We are launching Principia, a new technical re...

How to apply:

Salary: USD 80,000–100,000 (50-74k GBP) annualised
Initial contract: 6 months, w/ extension based on funding

Details: docs.google.com/document/d/1...
Application: forms.gle/xKukH74iX16p...

4

16.02.2026 09:27 — 👍 3    🔁 0    💬 0    📌 0

We're hiring postdocs/research scientists! Your interests can be anywhere on the spectrum from pure theory to empirically testing predictions relevant to AI safety.

Our theoretical work relies on dynamical systems and tools from statistical physics.

3

16.02.2026 09:27 — 👍 5    🔁 1    💬 1    📌 0

We avoid many unwanted outcomes in the physical world using our knowledge of physics, and basic deep learning theory should eventually enable the same for AI.

We focus on simple, analytically tractable "model organisms" that capture essential learning dynamics and behaviours.

2

16.02.2026 09:27 — 👍 4    🔁 0    💬 1    📌 0

Excited to launch Principia, a nonprofit research organisation at the intersection of deep learning theory and AI safety.

Our goal is to develop theory for modern machine learning systems that can help us understand complex network behaviors, including those critical for AI safety and alignment.

1

16.02.2026 09:27 — 👍 91    🔁 26    💬 1    📌 1

Our paper is out in @natneuro.nature.com!

www.nature.com/articles/s41...

We develop a geometric theory of how neural populations support generalization across many tasks.

@zuckermanbrain.bsky.social
@flatironinstitute.org
@kempnerinstitute.bsky.social

1/14

10.02.2026 15:56 — 👍 273    🔁 100    💬 7    📌 1

A great question; I'm not sure. It's important to understand whether Muon shares similar inductive biases.

05.02.2026 16:08 — 👍 1    🔁 0    💬 0    📌 0

I agree, there seem to be connections but it's not fully clear to me why. SLT is a static theory, and yet Daniel Murfet and others have shown that the stages we see also correspond to SLT posteriors of increasing complexity.

05.02.2026 16:05 — 👍 1    🔁 0    💬 1    📌 0
Preview
DLMath&Efficiency This reading group examines the interplay between the theoretical foundations of deep learning and the practical challenge of making machine learning efficient. On the theory side, we study mathematic...

Upcoming online talk next Monday 9th February, at the ELLIS Reading Group on Mathematics & Efficiency of Deep Learning!

Open to all. Info at
sites.google.com/view/efficie...

03.02.2026 16:19 — 👍 4    🔁 0    💬 0    📌 0

Equipped with this theory, we make new predictions about how network width, data distribution, and initialization affect learning dynamics. For example, increasing the number of attention heads in linear attention shortens the plateaus in learning.

03.02.2026 16:19 — 👍 6    🔁 0    💬 1    📌 0

So when progressing simple -> complex, linear networks learn solutions of increasing rank, ReLU networks learn solutions with increasing kinks, convolutional networks learn solutions with increasing convolutional kernels, and attention models learn solutions with increasing heads.

03.02.2026 16:19 — 👍 5    🔁 0    💬 1    📌 0
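
A minimal sketch of this stage-wise picture for the linear case (illustrative only, not code from the paper; the target matrix, learning rate, and rank threshold are arbitrary choices): a two-layer linear network trained by gradient descent from a small initialization on a low-rank target picks up the target's singular modes one at a time, so the effective rank of the end-to-end map grows in steps separated by loss plateaus.

```python
import numpy as np

# Hypothetical illustration (not the paper's code): a two-layer linear network
# W2 @ W1 is trained by gradient descent from a small random initialization to
# match a rank-3 target map with well-separated singular values. The end-to-end
# map acquires the target's singular modes one at a time, so its effective rank
# grows in steps separated by loss plateaus -- the saddle-to-saddle picture.

rng = np.random.default_rng(0)
d, h = 8, 8                                   # input/output dimension, hidden width
target = np.zeros((d, d))
target[0, 0], target[1, 1], target[2, 2] = 4.0, 2.0, 1.0   # separated singular values

W1 = 1e-3 * rng.standard_normal((h, d))       # small init: start near the saddle at the origin
W2 = 1e-3 * rng.standard_normal((d, h))
lr = 0.005

for step in range(3001):
    err = W2 @ W1 - target                    # residual of the end-to-end map
    gW1 = W2.T @ err                          # gradients of 0.5 * ||W2 W1 - target||_F^2
    gW2 = err @ W1.T
    W1 -= lr * gW1
    W2 -= lr * gW2
    if step % 250 == 0:
        s = np.linalg.svd(W2 @ W1, compute_uv=False)
        eff_rank = int((s > 0.1).sum())       # crude count of modes learned so far
        print(f"step {step:4d}   loss {0.5 * (err ** 2).sum():8.4f}   effective rank {eff_rank}")
```

Run as-is, the printed effective rank should climb from 0 to 3 one mode at a time, with the loss dropping in corresponding stages rather than all at once.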

Here the notion of simplicity is the number of effective units in the architecture: hidden neurons, convolutional kernels, or attention heads.

03.02.2026 16:19 — 👍 5    🔁 0    💬 1    📌 0

Finally, we demonstrate that gradient descent sometimes naturally evolves along the connecting paths between saddles, visiting them one after another and yielding saddle-to-saddle dynamics.

We identify two distinct mechanisms, depending on the architecture: timescale separation between directions, or between units.

03.02.2026 16:19 — 👍 8    🔁 0    💬 1    📌 0

We then show that saddles are connected by gradient descent paths (invariant manifolds).

Along these paths, a larger network behaves like a smaller one, retaining the same simplicity during a saddle-to-saddle transition.

03.02.2026 16:19 — 👍 8    🔁 2    💬 1    📌 0

We first show that saddle points are ubiquitous in the loss landscape: fixed points of smaller networks can be embedded as saddle points of larger networks, yielding a nested hierarchy of saddles.

These saddles exist in any network that contains a sum of repeated units.

03.02.2026 16:19 — 👍 9    🔁 0    💬 1    📌 0
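
A small hypothetical sketch of this embedding (not the paper's construction; the toy dataset, target, and hyperparameters are invented for illustration): a one-hidden-unit tanh network is trained to a stationary point, then duplicated into a two-unit network with its output weight split in half. The wider network computes the same function and its gradient also vanishes, so the fixed point of the smaller network reappears as a stationary point of the larger landscape even though width two could fit the data better; the paper's claim is that such embedded points are generically saddles.

```python
import numpy as np

# Hypothetical sketch (data, target, and hyperparameters invented for the demo).
# A width-1 tanh network f(x) = a * tanh(w * x) is trained to a stationary point,
# then embedded into a width-2 network by duplicating the unit and splitting its
# output weight. The wider network computes the same function and has the same
# (near-zero) gradient: the small network's fixed point reappears as a stationary
# point of the larger network's loss landscape.

rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=200)
y = np.tanh(3.0 * x) + 0.5 * np.tanh(x)        # a target one tanh unit cannot fit exactly

def loss_and_grads(w, a):
    """Mean-squared-error loss and gradients for f(x) = sum_i a_i * tanh(w_i * x)."""
    h = np.tanh(np.outer(x, w))                 # shape (n_samples, n_units)
    err = h @ a - y
    loss = 0.5 * np.mean(err ** 2)
    g_a = h.T @ err / len(x)
    g_w = ((err[:, None] * (1.0 - h ** 2) * x[:, None]) * a).sum(axis=0) / len(x)
    return loss, g_w, g_a

# Train the width-1 network to (near) convergence.
w, a = np.array([0.5]), np.array([0.5])
for _ in range(20000):
    _, g_w, g_a = loss_and_grads(w, a)
    w -= 0.1 * g_w
    a -= 0.1 * g_a
loss1, g_w, g_a = loss_and_grads(w, a)
print("width 1: loss", loss1, " grad norm", np.sqrt((g_w ** 2).sum() + (g_a ** 2).sum()))

# Embed: duplicate the unit and split its output weight in half.
w2 = np.array([w[0], w[0]])
a2 = np.array([a[0] / 2.0, a[0] / 2.0])
loss2, g_w2, g_a2 = loss_and_grads(w2, a2)
print("width 2: loss", loss2, " grad norm", np.sqrt((g_w2 ** 2).sum() + (g_a2 ** 2).sum()))
# Same loss, vanishing gradient: gradient descent started exactly here would not move,
# even though a width-2 network has the capacity to fit y better.
```
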
Preview
Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Neural Network Architectures We define a notion of simplicity that applies to a broad class of architectures, and show both analytically and with explicit examples that during learning simplicity decreases in a step-wise manner.

We present a theoretical framework that explains dynamical simplicity bias arising from saddle-to-saddle learning dynamics across neural network architectures:

Fully-connected, convolutional, attention-based, and more.

yedizhang.github.io/simplicity

03.02.2026 16:19 — 👍 8    🔁 2    💬 1    📌 0

Why don't neural networks learn all at once, but instead progress from simple to complex solutions? And what does "simple" even mean across different neural network architectures?

Sharing our new paper @iclr_conf led by Yedi Zhang with Peter Latham

arxiv.org/abs/2512.20607

03.02.2026 16:19 — 👍 154    🔁 41    💬 7    📌 3
Applications for 2026 entry to the Gatsby Bridging Programme (7-week maths summer school) will open on 19 Jan and close on 16 Feb. Designed for students who wish to pursue a postgrad research degree in theoretical neuroscience or foundational machine learning but whose degree programme lacks a strong maths focus. Applications from students in underrepresented groups in STEM strongly encouraged. A small number of bursaries available. Register for the information webinar on 23 Jan.

📢 Applications open on 19 Jan for the 7-week #Mathematics #SummerSchool in London. You will develop the maths skills and intuition necessary to enter the #TheoreticalNeuroscience / #MachineLearning field.

Find out more & register for the information webinar 👉 www.ucl.ac.uk/life-science...

15.01.2026 14:37 — 👍 25    🔁 27    💬 1    📌 1
Preview
A unifying account of replay as context-driven memory reactivation A context-driven memory model simulates a wide range of characteristics of waking and sleeping hippocampal replay, providing a new account of how and why replay occurs.

Really thrilled that this paper led by @neurozz.bsky.social is now published in its final version in @elife.bsky.social!!

This is a memory-focused (as opposed to RL-focused) account of the detailed characteristics of forward and backward awake and sleep replay!

elifesciences.org/articles/99931

15.01.2026 13:57 — 👍 140    🔁 53    💬 3    📌 1
The oneirogen hypothesis: modeling the hallucinatory effects of classical psychedelics in terms of replay-dependent plasticity mechanisms

Our paper on the "Oneirogen hypothesis" is now up in its revised form on eLife!

This is the hypothesis that psychedelics induce a dream-like state, which we show via modelling could explain a variety of perceptual and learning effects from such drugs.

elifesciences.org/reviewed-pre...

🧠📈 🧪

14.01.2026 15:32 — 👍 61    🔁 14    💬 2    📌 1
W. Jeffrey Johnston - Postdoctoral position ad

By the way, if you're interested in working together on problems like this, I'm starting my lab at UCSF this summer. Get in touch if you're interested in doing a postdoc! More info here: wj2.github.io/postdoc_ad (7/7)

09.01.2026 19:06 — 👍 29    🔁 14    💬 1    📌 3
Preview
Dorsal striatal dopamine integrates sensory and reward prediction errors to guide perceptual decisions Perceptual decisions are shaped by expectations about sensory stimuli and rewards, learned through sensory and reward prediction errors. Dopamine is known to convey reward prediction errors that shape...

New preprint. We show that in addition to reward prediction errors (RPEs), dorsal striatal dopamine signals encode sensory prediction errors (SPEs), the difference between sensory prior & observed stimulus. www.biorxiv.org/content/10.6...

05.01.2026 10:49 — 👍 86    🔁 26    💬 3    📌 1

Sleep-dependent consolidation and replay that doesn't require the hippocampus?

Very beautiful work by Marcus Stephenson-Jones' lab on sleep-driven sequential skill consolidation in the striatum.

www.biorxiv.org/content/10.1...

22.12.2025 11:59 — 👍 30    🔁 10    💬 0    📌 0

Thrilled to start 2026 as faculty in Psych & CS
@ualberta.bsky.social + Amii.ca Fellow! 🥳 Recruiting students to develop theories of cognition in natural & artificial systems 🤖💭🧠. Find me at #NeurIPS2025 workshops (speaking coginterp.github.io/neurips2025 & organising @dataonbrainmind.bsky.social)

06.12.2025 19:26 — 👍 103    🔁 27    💬 4    📌 1
Memory by accident: a theory of learning as a byproduct of network... Synaptic plasticity is widely considered to be crucial to the brain's ability to learn throughout life. Decades of theoretical work have therefore been invested in deriving and designing...

Find us at NeurIPS, Thur 4:30 pm #2115! We know networks have to be both plastic and stable but we're used to thinking about computations, such as memory, as additional requirements. Instead, we find that almost all stable & plastic networks display simple memory abilities.

01.12.2025 21:57 — 👍 17    🔁 2    💬 1    📌 0

1/6 New preprint 🚀 How does the cortex learn to represent things and how they move without reconstructing sensory stimuli? We developed a circuit-centric recurrent predictive learning (RPL) model based on JEPAs.
🔗 doi.org/10.1101/2025...
Led by @atenagm.bsky.social @mshalvagal.bsky.social

27.11.2025 08:24 — 👍 141    🔁 41    💬 3    📌 4