Andrew Saxe

@saxelab.bsky.social

Professor at the Gatsby Unit and Sainsbury Wellcome Centre, UCL, trying to figure out how we learn

4,245 Followers  |  476 Following  |  45 Posts  |  Joined: 23.09.2023

Posts by Andrew Saxe (@saxelab.bsky.social)

📢📢 Announcing this year's conference on the Mathematics of Neuroscience & AI (Rome, 9-12th June). We've got a stellar line-up and venue, and invite everyone to join:

www.neuromonster.org

05.03.2026 08:30 — 👍 19    🔁 14    💬 2    📌 0

📢 Job alert - Deep Learning Theory & AI Safety
Applications are open for a postdoctoral fellow in the @saxelab.bsky.social lab to study artificial deep networks using techniques from applied maths & stat physics.

⏰ Deadline: 26 Mar 2026
🤝 In collaboration with @stefsm.bsky.social
ℹ️ www.ucl.ac.uk/life-science...

25.02.2026 11:18 — 👍 10    🔁 4    💬 0    📌 1
Preview
learningfromscratch march 16th, workshop day 1 @ cosyne 2026

Excited to be co-organising a #cosyne2026 workshop with Alison Comrie on 'algorithms for learning from scratch'! With a great line-up of speakers, we'll be tackling the question of what processes enable naive biological & artificial agents to adapt to new situations. Info here: tinyurl.com/4u8enf7k

24.02.2026 18:33 — 👍 35    🔁 15    💬 1    📌 0

📢 We're now accepting applications for the 2026 School on Analytical Connectionism, dedicated this year to Language Acquisition.

📍 Gothenburg, Sweden

🗓️ August 17–28, 2026

☠️ Apply by April 17!

🔗 analytical-connectionism.net/school/2026/

👇 Meet the experts joining us this summer!

18.02.2026 11:42 — 👍 20    🔁 8    💬 1    📌 1

Thrilled to finally share this work! 🧠🔊

Using a new reinforcement-free task, we show that mice (like humans) extract abstract structure from sound without supervision, and that dCA1 is causally required, building factorised, orthogonal subspaces of abstract rules.

Led by Dammy Onih!
www.biorxiv.org/content/10.6...

16.02.2026 13:01 — 👍 150    🔁 52    💬 3    📌 2
Preview
[Hiring] Principia Research Fellows Principia Research Fellows: Theoretical Model Organisms for AI Safety Principia · London · Fixed-term (6 months) with potential extension · Starting ASAP We are launching Principia, a new technical re...

How to apply:

Salary: USD 80,000–100,000 (50-74k GBP) annualised
Initial contract: 6 months, w/ extension based on funding

Details: docs.google.com/document/d/1...
Application: forms.gle/xKukH74iX16p...

4

16.02.2026 09:27 — 👍 3    🔁 0    💬 0    📌 0

We're hiring postdocs/research scientists! Your interests can be anywhere on the spectrum from pure theory to empirically testing predictions relevant to AI safety.

Our theoretical work relies on dynamical systems and tools from statistical physics.

3

16.02.2026 09:27 — 👍 5    🔁 1    💬 1    📌 0

We avoid many unwanted outcomes in the physical world using our knowledge of physics, and basic deep learning theory should eventually enable the same for AI.

We focus on simple, analytically tractable "model organisms" that capture essential learning dynamics and behaviours.

2

16.02.2026 09:27 — 👍 4    🔁 0    💬 1    📌 0

Excited to launch Principia, a nonprofit research organisation at the intersection of deep learning theory and AI safety.

Our goal is to develop theory for modern machine learning systems that can help us understand complex network behaviors, including those critical for AI safety and alignment.

1

16.02.2026 09:27 — 👍 91    🔁 26    💬 1    📌 1

Our paper is out in @natneuro.nature.com!

www.nature.com/articles/s41...

We develop a geometric theory of how neural populations support generalization across many tasks.

@zuckermanbrain.bsky.social
@flatironinstitute.org
@kempnerinstitute.bsky.social

1/14

10.02.2026 15:56 — 👍 273    🔁 100    💬 7    📌 1

A great question; I'm not sure. It's important to understand whether Muon shares similar inductive biases.

05.02.2026 16:08 — 👍 1    🔁 0    💬 0    📌 0

I agree, there seem to be connections but it's not fully clear to me why. SLT is a static theory, and yet Daniel Murfet and others have shown that the stages we see also correspond to SLT posteriors of increasing complexity.

05.02.2026 16:05 — 👍 1    🔁 0    💬 1    📌 0
Preview
DLMath&Efficiency This reading group examines the interplay between the theoretical foundations of deep learning and the practical challenge of making machine learning efficient. On the theory side, we study mathematic...

Upcoming online talk next Monday 9th February, at the ELLIS Reading Group on Mathematics & Efficiency of Deep Learning!

Open to all. Info at
sites.google.com/view/efficie...

03.02.2026 16:19 — 👍 4    🔁 0    💬 0    📌 0

Equipped with this theory, we make new predictions about how network width, data distribution, and initialization affect learning dynamics. For example, increasing the number of attention heads in linear attention shortens the plateaus in learning.

03.02.2026 16:19 — 👍 6    🔁 0    💬 1    📌 0

So when progressing simple -> complex, linear networks learn solutions of increasing rank, ReLU networks learn solutions with increasing kinks, convolutional networks learn solutions with increasing convolutional kernels, and attention models learn solutions with increasing heads.

03.02.2026 16:19 — 👍 5    🔁 0    💬 1    📌 0
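
A minimal sketch of this stage-wise picture for the linear case (illustrative only, not code from the paper; the target matrix, learning rate, and rank threshold are arbitrary choices): a two-layer linear network trained by gradient descent from a small initialization on a low-rank target picks up the target's singular modes one at a time, so the effective rank of the end-to-end map grows in steps separated by loss plateaus.

```python
import numpy as np

# Hypothetical illustration (not the paper's code): a two-layer linear network
# W2 @ W1 is trained by gradient descent from a small random initialization to
# match a rank-3 target map with well-separated singular values. The end-to-end
# map acquires the target's singular modes one at a time, so its effective rank
# grows in steps separated by loss plateaus -- the saddle-to-saddle picture.

rng = np.random.default_rng(0)
d, h = 8, 8                                   # input/output dimension, hidden width
target = np.zeros((d, d))
target[0, 0], target[1, 1], target[2, 2] = 4.0, 2.0, 1.0   # separated singular values

W1 = 1e-3 * rng.standard_normal((h, d))       # small init: start near the saddle at the origin
W2 = 1e-3 * rng.standard_normal((d, h))
lr = 0.005

for step in range(3001):
    err = W2 @ W1 - target                    # residual of the end-to-end map
    gW1 = W2.T @ err                          # gradients of 0.5 * ||W2 W1 - target||_F^2
    gW2 = err @ W1.T
    W1 -= lr * gW1
    W2 -= lr * gW2
    if step % 250 == 0:
        s = np.linalg.svd(W2 @ W1, compute_uv=False)
        eff_rank = int((s > 0.1).sum())       # crude count of modes learned so far
        print(f"step {step:4d}   loss {0.5 * (err ** 2).sum():8.4f}   effective rank {eff_rank}")
```

Run as-is, the printed effective rank should climb from 0 to 3 one mode at a time, with the loss dropping in corresponding stages rather than all at once.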

Here the notion of simplicity is the number of effective units in the architecture: hidden neurons, convolutional kernels, or attention heads.

03.02.2026 16:19 — 👍 5    🔁 0    💬 1    📌 0

Finally, we demonstrate that gradient descent sometimes naturally evolves along the connecting paths between saddles, visiting them one after another and yielding saddle-to-saddle dynamics.

We identify two distinct mechanisms, depending on the architecture: timescale separation between directions, or between units.

03.02.2026 16:19 — 👍 8    🔁 0    💬 1    📌 0

We then show that saddles are connected by gradient descent paths (invariant manifolds).

Along these paths, a larger network behaves like a smaller one, retaining the same simplicity during a saddle-to-saddle transition.

03.02.2026 16:19 — 👍 8    🔁 2    💬 1    📌 0

We first show that saddle points are ubiquitous in the loss landscape: fixed points of smaller networks can be embedded as saddle points of larger networks, yielding a nested hierarchy of saddles.

These saddles exist in any network that contains a sum of repeated units.

03.02.2026 16:19 — 👍 9    🔁 0    💬 1    📌 0
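
A small hypothetical sketch of this embedding (not the paper's construction; the toy dataset, target, and hyperparameters are invented for illustration): a one-hidden-unit tanh network is trained to a stationary point, then duplicated into a two-unit network with its output weight split in half. The wider network computes the same function and its gradient also vanishes, so the fixed point of the smaller network reappears as a stationary point of the larger landscape even though width two could fit the data better; the paper's claim is that such embedded points are generically saddles.

```python
import numpy as np

# Hypothetical sketch (data, target, and hyperparameters invented for the demo).
# A width-1 tanh network f(x) = a * tanh(w * x) is trained to a stationary point,
# then embedded into a width-2 network by duplicating the unit and splitting its
# output weight. The wider network computes the same function and has the same
# (near-zero) gradient: the small network's fixed point reappears as a stationary
# point of the larger network's loss landscape.

rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=200)
y = np.tanh(3.0 * x) + 0.5 * np.tanh(x)        # a target one tanh unit cannot fit exactly

def loss_and_grads(w, a):
    """Mean-squared-error loss and gradients for f(x) = sum_i a_i * tanh(w_i * x)."""
    h = np.tanh(np.outer(x, w))                 # shape (n_samples, n_units)
    err = h @ a - y
    loss = 0.5 * np.mean(err ** 2)
    g_a = h.T @ err / len(x)
    g_w = ((err[:, None] * (1.0 - h ** 2) * x[:, None]) * a).sum(axis=0) / len(x)
    return loss, g_w, g_a

# Train the width-1 network to (near) convergence.
w, a = np.array([0.5]), np.array([0.5])
for _ in range(20000):
    _, g_w, g_a = loss_and_grads(w, a)
    w -= 0.1 * g_w
    a -= 0.1 * g_a
loss1, g_w, g_a = loss_and_grads(w, a)
print("width 1: loss", loss1, " grad norm", np.sqrt((g_w ** 2).sum() + (g_a ** 2).sum()))

# Embed: duplicate the unit and split its output weight in half.
w2 = np.array([w[0], w[0]])
a2 = np.array([a[0] / 2.0, a[0] / 2.0])
loss2, g_w2, g_a2 = loss_and_grads(w2, a2)
print("width 2: loss", loss2, " grad norm", np.sqrt((g_w2 ** 2).sum() + (g_a2 ** 2).sum()))
# Same loss, vanishing gradient: gradient descent started exactly here would not move,
# even though a width-2 network has the capacity to fit y better.
```
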
Preview
Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Neural Network Architectures We define a notion of simplicity that applies to a broad class of architectures, and show both analytically and with explicit examples that during learning simplicity decreases in a step-wise manner.

We present a theoretical framework that explains dynamical simplicity bias arising from saddle-to-saddle learning dynamics across neural network architectures:

Fully-connected, convolutional, attention-based, and more.

yedizhang.github.io/simplicity

03.02.2026 16:19 — 👍 8    🔁 2    💬 1    📌 0

Why don't neural networks learn all at once, but instead progress from simple to complex solutions? And what does "simple" even mean across different neural network architectures?

Sharing our new paper @iclr_conf led by Yedi Zhang with Peter Latham

arxiv.org/abs/2512.20607

03.02.2026 16:19 — 👍 154    🔁 41    💬 7    📌 3
Applications for 2026 entry to the Gatsby Bridging Programme (7-week maths summer school) will open on 19 Jan and close on 16 Feb. Designed for students who wish to pursue a postgrad research degree in theoretical neuroscience or foundational machine learning but whose degree programme lacks a strong maths focus. Applications from students in underrepresented groups in STEM strongly encouraged. A small number of bursaries available. Register for the information webinar on 23 Jan.

📢 Applications open on 19 Jan for the 7-week #Mathematics #SummerSchool in London. You will develop the maths skills and intuition necessary to enter the #TheoreticalNeuroscience / #MachineLearning field.

Find out more & register for the information webinar 👉 www.ucl.ac.uk/life-science...

15.01.2026 14:37 — 👍 25    🔁 27    💬 1    📌 1
Preview
A unifying account of replay as context-driven memory reactivation A context-driven memory model simulates a wide range of characteristics of waking and sleeping hippocampal replay, providing a new account of how and why replay occurs.

Really thrilled that this paper led by @neurozz.bsky.social is now published in its final version in @elife.bsky.social!!

This is a memory-focused (as opposed to RL-focused) account of the detailed characteristics of forward and backward awake and sleep replay!

elifesciences.org/articles/99931

15.01.2026 13:57 — 👍 140    🔁 53    💬 3    📌 1
The oneirogen hypothesis: modeling the hallucinatory effects of classical psychedelics in terms of replay-dependent plasticity mechanisms

Our paper on the "Oneirogen hypothesis" is now up in its revised form on eLife!

This is the hypothesis that psychedelics induce a dream-like state, which we show via modelling could explain a variety of perceptual and learning effects from such drugs.

elifesciences.org/reviewed-pre...

🧠📈 🧪

14.01.2026 15:32 — 👍 61    🔁 14    💬 2    📌 1
W. Jeffrey Johnston - Postdoctoral position ad

By the way, if you're interested in working together on problems like this, I'm starting my lab at UCSF this summer. Get in touch if you're interested in doing a postdoc! More info here: wj2.github.io/postdoc_ad (7/7)

09.01.2026 19:06 — 👍 29    🔁 14    💬 1    📌 3
Preview
Dorsal striatal dopamine integrates sensory and reward prediction errors to guide perceptual decisions Perceptual decisions are shaped by expectations about sensory stimuli and rewards, learned through sensory and reward prediction errors. Dopamine is known to convey reward prediction errors that shape...

New preprint. We show that in addition to reward prediction errors (RPEs), dorsal striatal dopamine signals encode sensory prediction errors (SPEs), the difference between sensory prior & observed stimulus. www.biorxiv.org/content/10.6...

05.01.2026 10:49 — 👍 86    🔁 26    💬 3    📌 1

Sleep-dependent consolidation and replay that doesn't require the hippocampus?

Very beautiful work by Marcus Stephenson-Jones' lab on sleep-driven sequential skill consolidation in the striatum.

www.biorxiv.org/content/10.1...

22.12.2025 11:59 — 👍 30    🔁 10    💬 0    📌 0

Thrilled to start 2026 as faculty in Psych & CS
@ualberta.bsky.social + Amii.ca Fellow! 🥳 Recruiting students to develop theories of cognition in natural & artificial systems 🤖💭🧠. Find me at #NeurIPS2025 workshops (speaking coginterp.github.io/neurips2025 & organising @dataonbrainmind.bsky.social)

06.12.2025 19:26 — 👍 103    🔁 27    💬 4    📌 1
Memory by accident: a theory of learning as a byproduct of network... Synaptic plasticity is widely considered to be crucial to the brain's ability to learn throughout life. Decades of theoretical work have therefore been invested in deriving and designing...

Find us at NeurIPS, Thur 4:30 pm #2115! We know networks have to be both plastic and stable but we're used to thinking about computations, such as memory, as additional requirements. Instead, we find that almost all stable & plastic networks display simple memory abilities.

01.12.2025 21:57 — 👍 17    🔁 2    💬 1    📌 0

1/6 New preprint 🚀 How does the cortex learn to represent things and how they move without reconstructing sensory stimuli? We developed a circuit-centric recurrent predictive learning (RPL) model based on JEPAs.
🔗 doi.org/10.1101/2025...
Led by @atenagm.bsky.social @mshalvagal.bsky.social

27.11.2025 08:24 — 👍 141    🔁 41    💬 3    📌 4