
Simon Schug

@smonsays.bsky.social

postdoc @princeton computational cognitive science ∪ machine learning https://smn.one

918 Followers  |  233 Following  |  21 Posts  |  Joined: 07.02.2024

Latest posts by smonsays.bsky.social on Bluesky

I am recruiting graduate students for the experimental side of my lab @mcgill.ca for admission in Fall 2026!
Get in touch if you're interested in how brain circuits implement distributed computation, including dopamine-based distributed RL and probabilistic representations.

19.11.2025 18:04 — 👍 33    🔁 24    💬 1    📌 0
Post image

Checking out the Princeton trails on our lab retreat

03.11.2025 17:44 — 👍 28    🔁 1    💬 0    📌 0
Post image

📄 Paper: arxiv.org/abs/2507.07207
💻 Code: github.com/smonsays/sca...

04.11.2025 14:34 — 👍 2    🔁 0    💬 0    📌 0
Post image

But not all training distributions enable compositional generalization -- even with scale.
Strategically choosing the training data matters a lot.

04.11.2025 14:34 — 👍 1    🔁 0    💬 1    📌 0
Post image

We prove that MLPs can implement a general class of compositional tasks ("hyperteachers") using only a linear number of neurons in the number of modules, beating the exponential!

04.11.2025 14:34 — 👍 1    🔁 0    💬 1    📌 0
Post image

It turns out that simply scaling multilayer perceptrons / transformers can lead to compositional generalization.

04.11.2025 14:34 — 👍 1    🔁 0    💬 1    📌 0
Post image

Most natural data has compositional structure. This leads to a combinatorial explosion that is impossible to fully cover in the training data.

It might be tempting to think that we need to equip neural network architectures with stronger symbolic priors to capture this compositionality, but do we?

04.11.2025 14:34 — 👍 1    🔁 0    💬 1    📌 0
Plots showing how scaling model size and data size leads to compositional generalization

A generated image composition of a clock inside a treasure chest inside a transparent cube.

Does scaling lead to compositional generalization?

Our #NeurIPS2025 Spotlight paper suggests that it can -- with the right training distribution.

🧵 A short thread:

04.11.2025 14:34 — 👍 14    🔁 1    💬 1    📌 0
Nassau Hall. Photo credit to Debbie and John O'Boyle

I'm joining Princeton University as an Associate Professor of Computer Science and Psychology this fall! Princeton is ambitiously investing in AI and Natural & Artificial Minds, and I'm excited for my lab to contribute. Recruiting postdocs and Ph.D. students in CS and Psychology — join us!

12.06.2025 14:29 — 👍 47    🔁 2    💬 4    📌 0
Post image

Are transformers smarter than you? Hypernetworks might explain why.

Come check out our Oral at #ICLR tomorrow (Apr 26th, poster at 10:00, Oral session 6C in the afternoon).

openreview.net/forum?id=V4K...

25.04.2025 04:50 — 👍 10    🔁 0    💬 1    📌 0
Preview
Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models Many recent studies have found evidence for emergent reasoning capabilities in large language models, but debate persists concerning the robustness of these capabilities, and the extent to which they ...

LLMs have shown impressive performance in some reasoning tasks, but what internal mechanisms do they use to solve these tasks? In a new preprint, we find evidence that abstract reasoning in LLMs depends on an emergent form of symbol processing arxiv.org/abs/2502.20332 (1/N)

10.03.2025 19:08 — 👍 115    🔁 33    💬 4    📌 3
Preview
The Principle of Neural Science I first encountered Principles of Neural Science as a young student of neuroscience. The book was filled with delightful narratives…

New blog post: The principle of neuroscience. medium.com/@kording/the...

14.02.2025 23:04 — 👍 46    🔁 5    💬 3    📌 1
Preview
Dynamic consensus-building between neocortical areas via long-range connections The neocortex is organized into functionally specialized areas. While the functions and underlying neural circuitry of individual neocortical areas are well studied, it is unclear how these regions op...

For my first Bluesky post, I'm very excited to share a thread on our recent work with Mitra Javadzadeh, investigating how connections between cortical areas shape computations in the neocortex! [1/7] www.biorxiv.org/content/10.1...

31.01.2025 02:57 — 👍 19    🔁 11    💬 1    📌 1
Post image

Pre-print 🧠🧪
Is mechanism modeling dead in the AI era?

ML models trained to predict neural activity fail to generalize to unseen opto perturbations. But mechanism modeling can solve that.

We say "perturbation testing" is the right way to evaluate mechanisms in data-constrained models.

1/8

08.01.2025 16:33 — 👍 116    🔁 46    💬 4    📌 2
Preview
2024: A Review of the Year in Neuroscience Feeling a bit wired

Cutting it a bit fine, but here’s my review of the year in neuroscience for 2024

The eighth of these, would you believe? We’ve got dark neurons, tiny monkeys, the most complete brain wiring diagram ever constructed, and much more…
Published on The Spike

Enjoy!

medium.com/the-spike/20...

30.12.2024 16:00 — 👍 190    🔁 73    💬 7    📌 18
An introduction to reinforcement learning for neuroscience | Published in Neurons, Behavior, Data analysis, and Theory By Kristopher T. Jensen. Reinforcement learning for neuroscientists

I wrote an introduction to RL for neuroscience last year that was just published in NBDT: tinyurl.com/5f58zdy3

This review aims to provide some intuition for and derivations of RL methods commonly used in systems neuroscience, ranging from TD learning through the SR to deep and distributional RL!
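As an illustration of the simplest method in that range, here is a minimal tabular TD(0) sketch. The two-state environment, step size, and discount factor are made up for this example and are not taken from the review:

```python
# Minimal tabular TD(0) sketch. The two-state chain, step size, and
# discount factor below are illustrative choices, not from the review.

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: nudge V[s] toward the bootstrapped target r + gamma * V[s_next]."""
    delta = r + gamma * V[s_next] - V[s]  # TD error (the classic dopamine-signal model)
    V[s] += alpha * delta
    return V, delta

# State 0 yields reward 1 and moves to the terminal state 1 (value fixed at 0),
# so V[0] should converge to 1.
V = [0.0, 0.0]
for _ in range(200):
    V, delta = td0_update(V, s=0, r=1.0, s_next=1)
```

The TD error `delta` shrinks toward zero as the value estimate converges, which is what makes it a useful learning signal.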

21.12.2024 17:59 — 👍 129    🔁 31    💬 6    📌 0
Preview
Monster Models Systems-level biology is hard because systems-level engineering is hard.

Stitching component models into system models has proven difficult in biology. But how much easier has it been in engineering? www.argmin.net/p/monster-mo...

20.12.2024 15:29 — 👍 12    🔁 2    💬 3    📌 1
Post image

🚨 New Paper!

Can neuroscience localizers uncover brain-like functional specializations in LLMs? 🧠🤖

Yes! We analyzed 18 LLMs and found units mirroring the brain's language, theory of mind, and multiple demand networks!

w/ @gretatuckute.bsky.social, @abosselut.bsky.social, @mschrimpf.bsky.social
🧵👇

19.12.2024 15:06 — 👍 105    🔁 27    💬 2    📌 5

1/ Okay, one thing that has been revealed to me from the replies to this is that many people don't know (or refuse to recognize) the following fact:

The units in ANNs are actually not a terrible approximation of how real neurons work!

A tiny 🧵.

🧠📈 #NeuroAI #MLSky

16.12.2024 20:03 — 👍 151    🔁 38    💬 21    📌 17

For my first post on Bluesky... I'll start by announcing our 2025 edition of EEML, which will be in Sarajevo :)! I'm really excited about it and hope to see many of you there. Please follow the website (and Bluesky account) for more details, which are coming soon.

15.12.2024 18:39 — 👍 32    🔁 7    💬 1    📌 0

Have you had private doubts about whether we'll ever understand the brain? Whether we'll be able to explain psychological phenomena in an exhaustive way that ranges from molecules to membranes to synapses to cells to cell types to circuits to computation to perception and behavior?

14.11.2024 05:18 — 👍 39    🔁 12    💬 1    📌 1
Preview
The broader spectrum of in-context learning The ability of language models to learn a task from a few examples in context has generated substantial interest. Here, we provide a perspective that situates this type of supervised few-shot learning...

What counts as in-context learning (ICL)? Typically, you might think of it as learning a task from a few examples. However, we’ve just written a perspective (arxiv.org/abs/2412.03782) suggesting interpreting a much broader spectrum of behaviors as ICL! Quick summary thread: 1/7

10.12.2024 18:17 — 👍 122    🔁 31    💬 2    📌 1
Post image

Thrilled to share our NeurIPS Spotlight paper with Jan Bauer*, @aproca.bsky.social*, @saxelab.bsky.social, @summerfieldlab.bsky.social, Ali Hummos*! openreview.net/pdf?id=AbTpJ...

We study how task abstractions emerge in gated linear networks and how they support cognitive flexibility.

03.12.2024 16:04 — 👍 65    🔁 15    💬 2    📌 1

Would love to be added as well :)

20.11.2024 20:50 — 👍 0    🔁 0    💬 0    📌 0

Great thread from @michaelhendricks.bsky.social!

Reminds me of something Larry Abbott once said to me at a summer school:

Many physicists come into neuroscience assuming that the failure to find laws of the brain was just because biologists aren't clever enough. In fact, there are no laws.

🧠📈 🧪

13.11.2024 18:49 — 👍 68    🔁 9    💬 4    📌 1
Post image

(1/5) Very excited to announce the publication of Bayesian Models of Cognition: Reverse Engineering the Mind. More than a decade in the making, it's a big (600+ pages) beautiful book covering both the basics and recent work: mitpress.mit.edu/978026204941...

18.11.2024 16:25 — 👍 521    🔁 119    💬 15    📌 15

🙋‍♂️

16.11.2024 15:30 — 👍 1    🔁 0    💬 0    📌 0

To help find people at the intersection of neuroscience and AI. Of course let me know if I missed someone or you’d like to be added 🧪 🧠

#neuroskyence

go.bsky.app/CAfmKQs

13.11.2024 15:26 — 👍 50    🔁 18    💬 33    📌 0

I think you are already part of it - just double checked :)

13.11.2024 15:20 — 👍 1    🔁 0    💬 0    📌 0
Preview
GitHub - smonsays/hypernetwork-attention: Official code for the paper "Attention as a Hypernetwork" Official code for the paper "Attention as a Hypernetwork" - smonsays/hypernetwork-attention

tl;dr: hypernetworks are hiding in our beloved transformers.

github.com/smonsays/hyp...
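A toy way to read single-head attention as a hypernetwork (a sketch of the general idea only, not the repository's implementation): the softmaxed attention matrix can be viewed as input-dependent weights of a linear layer that the network "generates" and then applies to the values.

```python
import numpy as np

# Sketch only: single-head attention viewed as a hypernetwork. The shapes
# and random inputs are arbitrary; this is not the repository's code.

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d))  # "generated" weights: a function of the input itself
    return A @ V, A                    # applying the generated linear map to the values

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 5, 8))  # three (5, 8) arrays: queries, keys, values
out, A = attention(Q, K, V)
# Each row of A sums to 1, so every output is a convex combination of value vectors.
```

The hypernetwork reading is simply that `A` is itself computed from the input rather than being a fixed parameter matrix.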

28.10.2024 15:27 — 👍 1    🔁 0    💬 0    📌 0
