Simon Schug

@smonsays.bsky.social

compositionality, meta-learning et al. at ETH Zürich https://smn.one

902 Followers  |  219 Following  |  15 Posts  |  Joined: 07.02.2024

Latest posts by smonsays.bsky.social on Bluesky

Nassau Hall. Photo credit to Debbie and John O'Boyle

I'm joining Princeton University as an Associate Professor of Computer Science and Psychology this fall! Princeton is ambitiously investing in AI and Natural & Artificial Minds, and I'm excited for my lab to contribute. Recruiting postdocs and Ph.D. students in CS and Psychology — join us!

12.06.2025 14:29 — 👍 46    🔁 2    💬 4    📌 0
Post image

Are transformers smarter than you? Hypernetworks might explain why.

Come check out our Oral at #ICLR tomorrow (Apr 26th, poster at 10:00, Oral session 6C in the afternoon).

openreview.net/forum?id=V4K...

25.04.2025 04:50 — 👍 8    🔁 0    💬 1    📌 0
Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models | Many recent studies have found evidence for emergent reasoning capabilities in large language models, but debate persists concerning the robustness of these capabilities, and the extent to which they ...

LLMs have shown impressive performance in some reasoning tasks, but what internal mechanisms do they use to solve these tasks? In a new preprint, we find evidence that abstract reasoning in LLMs depends on an emergent form of symbol processing arxiv.org/abs/2502.20332 (1/N)

10.03.2025 19:08 — 👍 113    🔁 33    💬 4    📌 4
The Principle of Neural Science | I first encountered Principles of Neural Science as a young student of neuroscience. The book was filled with delightful narratives…

New blog post: The principle of neuroscience. medium.com/@kording/the...

14.02.2025 23:04 — 👍 45    🔁 5    💬 3    📌 1
Dynamic consensus-building between neocortical areas via long-range connections | The neocortex is organized into functionally specialized areas. While the functions and underlying neural circuitry of individual neocortical areas are well studied, it is unclear how these regions op...

For my first Bluesky post, I'm very excited to share a thread on our recent work with Mitra Javadzadeh, investigating how connections between cortical areas shape computations in the neocortex! [1/7] www.biorxiv.org/content/10.1...

31.01.2025 02:57 — 👍 18    🔁 11    💬 1    📌 1
Post image

Pre-print 🧠🧪
Is mechanism modeling dead in the AI era?

ML models trained to predict neural activity fail to generalize to unseen opto perturbations. But mechanism modeling can solve that.

We say "perturbation testing" is the right way to evaluate mechanisms in data-constrained models

1/8

08.01.2025 16:33 — 👍 115    🔁 46    💬 4    📌 2
2024: A Review of the Year in Neuroscience | Feeling a bit wired

Cutting it a bit fine, but here’s my review of the year in neuroscience for 2024

The eighth of these, would you believe? We’ve got dark neurons, tiny monkeys, the most complete brain wiring diagram ever constructed, and much more…
Published on The Spike

Enjoy!

medium.com/the-spike/20...

30.12.2024 16:00 — 👍 191    🔁 74    💬 7    📌 18
An introduction to reinforcement learning for neuroscience | Published in Neurons, Behavior, Data analysis, and Theory | By Kristopher T. Jensen. Reinforcement learning for neuroscientists

I wrote an introduction to RL for neuroscience last year that was just published in NBDT: tinyurl.com/5f58zdy3

This review aims to provide some intuition for and derivations of RL methods commonly used in systems neuroscience, ranging from TD learning through the SR to deep and distributional RL!

21.12.2024 17:59 — 👍 130    🔁 32    💬 6    📌 0
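As a taste of the material the review above covers, here is a textbook tabular TD(0) value update on a toy three-state chain. This is a generic illustration, not code from the review; the chain, learning rate, and reward are invented for the sketch.

```python
import numpy as np

# Tabular TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
# Toy chain: s0 -> s1 -> s2 (terminal), reward 1 on entering s2.
n_states, alpha, gamma = 3, 0.1, 0.9
V = np.zeros(n_states)

transitions = [(0, 1, 0.0), (1, 2, 1.0)]  # (state, next_state, reward)
for _ in range(2000):  # repeated sweeps over the same episode
    for s, s_next, r in transitions:
        td_error = r + gamma * V[s_next] - V[s]
        V[s] += alpha * td_error
# V converges to V[1] = 1.0 and V[0] = gamma * V[1] = 0.9; terminal V[2] = 0.
```

The TD error bootstraps from the current estimate of the next state's value, which is the conceptual bridge the review draws from classical conditioning models to deep and distributional RL.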
Monster Models | Systems-level biology is hard because systems-level engineering is hard.

Stitching component models into system models has proven difficult in biology. But how much easier has it been in engineering? www.argmin.net/p/monster-mo...

20.12.2024 15:29 — 👍 12    🔁 2    💬 3    📌 1
Post image

🚨 New Paper!

Can neuroscience localizers uncover brain-like functional specializations in LLMs? 🧠🤖

Yes! We analyzed 18 LLMs and found units mirroring the brain's language, theory of mind, and multiple demand networks!

w/ @gretatuckute.bsky.social, @abosselut.bsky.social, @mschrimpf.bsky.social
🧵👇

19.12.2024 15:06 — 👍 105    🔁 27    💬 2    📌 5

1/ Okay, one thing that has been revealed to me from the replies to this is that many people don't know (or refuse to recognize) the following fact:

The units in ANNs are actually not a terrible approximation of how real neurons work!

A tiny 🧵.

🧠📈 #NeuroAI #MLSky

16.12.2024 20:03 — 👍 153    🔁 39    💬 21    📌 17

For my first post on Bluesky, I'll start by announcing our 2025 edition of EEML, which will be in Sarajevo :)! I'm really excited about it and hope to see many of you there. Please follow the website (and Bluesky account) for more details, which are coming soon.

15.12.2024 18:39 — 👍 31    🔁 7    💬 1    📌 0

Have you had private doubts whether we'll ever understand the brain? Whether we'll be able to explain psychological phenomena in an exhaustive way that ranges from molecules to membranes to synapses to cells to cell types to circuits to computation to perception and behavior?

14.11.2024 05:18 — 👍 39    🔁 12    💬 1    📌 1
The broader spectrum of in-context learning | The ability of language models to learn a task from a few examples in context has generated substantial interest. Here, we provide a perspective that situates this type of supervised few-shot learning...

What counts as in-context learning (ICL)? Typically, you might think of it as learning a task from a few examples. However, we’ve just written a perspective (arxiv.org/abs/2412.03782) suggesting interpreting a much broader spectrum of behaviors as ICL! Quick summary thread: 1/7

10.12.2024 18:17 — 👍 123    🔁 31    💬 2    📌 1
Post image

Thrilled to share our NeurIPS Spotlight paper with Jan Bauer*, @aproca.bsky.social*, @saxelab.bsky.social, @summerfieldlab.bsky.social, Ali Hummos*! openreview.net/pdf?id=AbTpJ...

We study how task abstractions emerge in gated linear networks and how they support cognitive flexibility.

03.12.2024 16:04 — 👍 64    🔁 15    💬 2    📌 1

Would love to be added as well :)

20.11.2024 20:50 — 👍 0    🔁 0    💬 0    📌 0

Great thread from @michaelhendricks.bsky.social!

Reminds me of something Larry Abbott once said to me at a summer school:

Many physicists come into neuroscience assuming that the failure to find laws of the brain was just because biologists aren't clever enough. In fact, there are no laws.

🧠📈 🧪

13.11.2024 18:49 — 👍 69    🔁 9    💬 4    📌 1
Post image

(1/5) Very excited to announce the publication of Bayesian Models of Cognition: Reverse Engineering the Mind. More than a decade in the making, it's a big (600+ pages) beautiful book covering both the basics and recent work: mitpress.mit.edu/978026204941...

18.11.2024 16:25 — 👍 522    🔁 120    💬 15    📌 16

🙋‍♂️

16.11.2024 15:30 — 👍 1    🔁 0    💬 0    📌 0

To help find people at the intersection of neuroscience and AI. Of course let me know if I missed someone or you’d like to be added 🧪 🧠

#neuroskyence

go.bsky.app/CAfmKQs

13.11.2024 15:26 — 👍 50    🔁 18    💬 33    📌 0

I think you are already part of it - just double checked :)

13.11.2024 15:20 — 👍 1    🔁 0    💬 0    📌 0
GitHub - smonsays/hypernetwork-attention | Official code for the paper "Attention as a Hypernetwork"

tl;dr: hypernetworks are hiding in our beloved transformers.

github.com/smonsays/hyp...

28.10.2024 15:27 — 👍 0    🔁 0    💬 0    📌 0
Attention as a Hypernetwork | Transformers can under some circumstances generalize to novel problem instances whose constituent parts might have been encountered during training but whose compositions have not. What mechanisms und...

With language being highly compositional itself, could the hypernetwork mechanism play a part in explaining the success of multi-head attention?

Maybe! Have a look at the paper in case you are curious!

arxiv.org/abs/2406.05816

28.10.2024 15:27 — 👍 1    🔁 0    💬 1    📌 0
Post image

Indeed, in line with the hypothesis that the hypernetwork mechanism supports compositionality, this modification (HYLA) improves performance on unseen tasks.

28.10.2024 15:26 — 👍 0    🔁 0    💬 1    📌 0
Post image

So what happens if we strengthen the hypernetwork mechanism?
Could we maybe further improve compositionality?

We can for instance make the value network nonlinear - without introducing additional parameters.

28.10.2024 15:26 — 👍 0    🔁 0    💬 1    📌 0
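One way to sketch "a nonlinear value network without additional parameters" is to insert an elementwise nonlinearity between the per-head value and output projections. This is my reading as a minimal illustration, not the paper's exact construction (the paper's modification may differ in detail); all weight shapes below are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d, H = 4, 8, 2            # tokens, model dim, heads
dh = d // H                  # head dimension

x = rng.normal(size=(T, d))
Wq, Wk = rng.normal(size=(2, H, d, dh))
Wv = rng.normal(size=(H, d, dh))
Wo = rng.normal(size=(H, dh, d))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

q = np.einsum('td,hde->hte', x, Wq)
k = np.einsum('td,hde->hte', x, Wk)
a = softmax(np.einsum('hte,hse->hts', q, k) / np.sqrt(dh))  # (H, T, T)

# Standard attention: the value pathway x -> Wv -> Wo is linear.
v_lin = np.einsum('td,hde->hte', x, Wv)
out_lin = np.einsum('hts,hse,hed->td', a, v_lin, Wo)

# Nonlinear variant: an elementwise GELU between Wv and Wo makes the
# generated value network a one-hidden-layer MLP, with zero new parameters.
gelu = lambda z: 0.5 * z * (1 + np.tanh(np.sqrt(2 / np.pi) * (z + 0.044715 * z**3)))
out_nl = np.einsum('hts,hse,hed->td', a, gelu(v_lin), Wo)
```

The head-wise attention scores still act as the latent code configuring the value network; only the network they configure is now nonlinear.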
Post image

Training a simple decoder on the latent codes of training tasks allows us to predict the operations performed by the network on unseen tasks - especially for later layers.

28.10.2024 15:25 — 👍 0    🔁 0    💬 1    📌 0
Post image

To test this hypothesis we train small transformer models to solve abstract reasoning tasks. When we look at the latent code of tasks they have never seen before, we find a highly structured space.

28.10.2024 15:25 — 👍 0    🔁 0    💬 1    📌 0
Post image

From the hypernetwork perspective, a compact latent code specifies key-query specific operations.
Importantly, these operations are reusable: the same hypernetwork is used across all key-query pairs.

Could their reuse allow transformers to compositionally generalize?

28.10.2024 15:24 — 👍 0    🔁 0    💬 1    📌 0
Post image

For a given query, multi-head attention can be rewritten as a sum over the outputs of key-query specific value networks configured by a hypernetwork.

These hypernetworks are comparably simple: Both the hypernetwork and its value network are linear.

So why could this matter?

28.10.2024 15:23 — 👍 0    🔁 0    💬 1    📌 0
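The rewriting described in the post above can be checked numerically: for each key-query pair, the head-wise attention scores form a latent code that linearly generates the weights of a value network, and summing its outputs over keys reproduces standard multi-head attention. A minimal numpy sketch (dimensions and weight names are illustrative, not taken from the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, H = 5, 8, 2            # tokens, model dim, heads
dh = d // H                  # head dimension

x = rng.normal(size=(T, d))
Wq, Wk = rng.normal(size=(2, H, d, dh))
Wv = rng.normal(size=(H, d, dh))
Wo = rng.normal(size=(H, dh, d))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# --- standard multi-head attention ---
q = np.einsum('td,hde->hte', x, Wq)                          # (H, T, dh)
k = np.einsum('td,hde->hte', x, Wk)
v = np.einsum('td,hde->hte', x, Wv)
a = softmax(np.einsum('hte,hse->hts', q, k) / np.sqrt(dh))   # (H, T, T)
out_std = np.einsum('hts,hse,hed->td', a, v, Wo)

# --- hypernetwork view: for key-query pair (i, j), the latent code
# a[:, i, j] linearly generates value-network weights
# W(a_ij) = sum_h a_h * (Wv_h @ Wo_h); both hypernetwork and value
# network are linear, and the same hypernetwork is reused for all pairs.
WvWo = np.einsum('hde,hef->hdf', Wv, Wo)                     # (H, d, d)
W_gen = np.einsum('hts,hdf->tsdf', a, WvWo)                  # weights per (i, j)
out_hyp = np.einsum('sd,tsdf->tf', x, W_gen)                 # sum over keys j
```

`out_std` and `out_hyp` agree up to floating-point reordering, which is the sense in which multi-head attention already has a hypernetwork built in.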
Post image

We know that hypernetworks - neural networks that generate the weights of another neural network - can compositionally generalize. So, should we build more hypernetworks into our transformers?

It turns out that attention with multiple heads already has them built-in!

28.10.2024 15:22 — 👍 0    🔁 0    💬 1    📌 0
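For readers new to the term, a hypernetwork in its simplest form is one network whose output is reshaped into the weights of another. A tiny illustrative sketch (all names and sizes invented, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_out, d_latent = 4, 3, 2

# Linear hypernetwork: maps a task latent code z to the weight matrix
# of a target linear network, W(z) = reshape(z @ H_gen).
H_gen = rng.normal(size=(d_latent, d_in * d_out))

def target_net(x, z):
    W = (z @ H_gen).reshape(d_in, d_out)   # weights generated from z
    return x @ W

x = rng.normal(size=(5, d_in))
z_task_a = rng.normal(size=(d_latent,))
z_task_b = rng.normal(size=(d_latent,))
# The same hypernetwork, driven by different latent codes, yields
# different target networks -- the basis for compositional reuse.
y_a, y_b = target_net(x, z_task_a), target_net(x, z_task_b)
```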
