
David Bau

@davidbau.bsky.social

Interpretable Deep Networks. http://baulab.info/ @davidbau

2,125 Followers  |  241 Following  |  140 Posts  |  Joined: 16.10.2023

Latest posts by davidbau.bsky.social on Bluesky

Looking forward to #COLM2025 tomorrow. DM me if you'll also be there and want to meet to chat.

06.10.2025 12:10 · 👍 5    🔁 0    💬 0    📌 0
Preview
NDIF Team We're a research computing project cracking open the mysteries inside large-scale AI systems. The NSF National Deep Inference Fabric consists of a unique combination of hardware and software that pr...

And kudos to @ndif-team.bsky.social for keeping up with weekly YouTube video posts on AI interpretability!

www.youtube.com/@NDIFTeam

03.10.2025 18:53 · 👍 2    🔁 0    💬 0    📌 0

There are a lot of interesting details that surface when you use SAEs to understand and control diffusion image synthesis models. Learn more in @wendlerc.bsky.social's talk.

03.10.2025 18:52 · 👍 5    🔁 0    💬 1    📌 0
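For readers new to the technique: a sparse autoencoder (SAE) learns a wide, sparsely-activating dictionary of features from a model's internal activations, which is what makes the kind of inspection and control described above possible. A minimal sketch, with illustrative sizes and loss weights (not from the talk):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Reconstruct activations through a wide, sparse feature bottleneck."""
    def __init__(self, d_model=1024, d_dict=16384):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x):
        f = torch.relu(self.encoder(x))  # sparse feature activations
        x_hat = self.decoder(f)          # reconstruction of the activation
        return x_hat, f

def sae_loss(x, x_hat, f, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that encourages each input
    # to be explained by only a few dictionary features.
    return ((x - x_hat) ** 2).mean() + l1_coeff * f.abs().mean()
```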
Preview
David Bau on How Artificial Intelligence Works Yascha Mounk and David Bau delve into the "black box" of AI.

On the Good Fight podcast with substack.com/@yaschamounk I give a quick but careful primer on how modern AI works.

I also chat about our responsibility as machine learning scientists, and what we need to fix to get AI right.

Take a listen and reshare -

www.persuasion.community/p/david-bau

03.10.2025 08:58 · 👍 6    🔁 2    💬 0    📌 0

I love the 'opinionated' approach taken by Aaron + team in this survey. It captures the ongoing work around the central causal puzzles we face in mechanistic interpretability.

01.10.2025 14:25 · 👍 3    🔁 0    💬 0    📌 0

Thanks @kmahowald.bsky.social!

bsky.app/profile/kmah...

28.09.2025 00:46 · 👍 2    🔁 0    💬 0    📌 0
Preview
The Dual-Route Model of Induction Do LLMs copy meaningful text by rote or by understanding meaning? Webpage for The Dual-Route Model of Induction (Feucht et al., 2025).

Read more at arxiv.org/abs/2504.03022 <- at COLM

footprints.baulab.info <- token context erasure
arithmetic.baulab.info <- concept parallelograms
dualroute.baulab.info <- the second induction route, with a neat Colab notebook.

@ericwtodd.bsky.social @byron.bsky.social @diatkinson.bsky.social

27.09.2025 20:54 · 👍 7    🔁 0    💬 1    📌 0
Post image

The takeaway for me: LLMs separate their token processing from their conceptual processing, akin to humans' dual-route processing of speech.

We need to be aware of whether an LM is thinking about tokens or concepts.

It does both, and it makes a difference which way it's thinking.

27.09.2025 20:54 · 👍 2    🔁 0    💬 1    📌 0
Post image

If token-processing and concept-processing are largely separate, does killing one kill the other? Chris Olah's team in Olsson 2022 hypothesized that in-context learning (ICL) emerges from token induction.

@keremsahin22.bsky.social + Sheridan are finding cool ways to look into Olah's induction hypothesis too!

27.09.2025 20:54 · 👍 2    🔁 0    💬 1    📌 0
Post image

The representation space within the concept induction heads also has a more "meaningful" geometry than the transformer as a whole.

Sheridan discovered (NeurIPS MechInt 2025) that semantic vector arithmetic works better in this space. (Token semantics work in token space.)

arithmetic.baulab.info/

27.09.2025 20:54 · 👍 2    🔁 0    💬 1    📌 0
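To make "semantic vector arithmetic" concrete: this is the classic parallelogram test, run on activations read out of the concept induction heads. A toy sketch of the procedure only; the vectors below are random stand-ins, so it shows the mechanics rather than the effect:

```python
import torch
import torch.nn.functional as F

# Stand-ins for hidden-state vectors that would really be extracted from
# the concept induction heads for each word (hypothetical setup).
h = {w: torch.randn(512) for w in ["Paris", "France", "Rome", "Italy"]}

# Parallelogram test: Paris - France + Italy should land nearest Rome
# if the space encodes a consistent "capital-of" direction.
query = h["Paris"] - h["France"] + h["Italy"]
for word, vec in h.items():
    print(word, F.cosine_similarity(query, vec, dim=0).item())
```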
Post image

If you disable token induction heads and ask the model to copy text with only the concept induction heads, it will NOT copy exactly. It will paraphrase the text.

That happens even for computer code: the model copies the BEHAVIOR of the code, but writes it in a totally different way!

27.09.2025 20:54 · 👍 2    🔁 0    💬 1    📌 0
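For readers wondering what "disable token induction heads" looks like in practice: a common ablation recipe is to zero a head's slice of the attention output with a forward hook. A minimal sketch for a GPT-2-style HuggingFace model; the (layer, head) pairs are placeholders, not the heads identified in the paper:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tok = GPT2Tokenizer.from_pretrained("gpt2")

HEADS_TO_ABLATE = {(5, 1), (6, 9)}  # placeholder (layer, head) pairs
head_dim = model.config.n_embd // model.config.n_head

def make_hook(layer_idx):
    def hook(module, inputs, output):
        attn_out = output[0]  # (batch, seq, hidden); heads laid side by side
        for layer, head in HEADS_TO_ABLATE:
            if layer == layer_idx:
                attn_out[..., head * head_dim:(head + 1) * head_dim] = 0.0
        return (attn_out,) + output[1:]
    return hook

handles = [model.transformer.h[i].attn.register_forward_hook(make_hook(i))
           for i in range(model.config.n_layer)]

ids = tok("to be or not to be, that is the question; to be", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=10)[0]))

for h in handles:
    h.remove()
```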
Post image

An amazing thing about the "concepts" in this 2nd route: they are *not* literal words. They are totally language-independent.

If the target context is in Chinese, they will copy the concept into Chinese. Or patch them between runs to get Italian. They mediate translation.

27.09.2025 20:54 · 👍 2    🔁 0    💬 1    📌 0
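"Patch them between runs" is standard activation patching: cache a hidden state from one forward pass and splice it into another, then watch how the output changes. A hedged sketch with plain PyTorch hooks; the layer choice and prompts are illustrative, not the paper's setup:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2Tokenizer.from_pretrained("gpt2")
LAYER = 6  # illustrative layer to patch
stash = {}

def record(module, inputs, output):
    stash["h"] = output[0].detach().clone()  # (batch, seq, hidden)

def patch(module, inputs, output):
    h = output[0].clone()
    n = min(h.shape[1], stash["h"].shape[1])
    h[:, :n] = stash["h"][:, :n]  # splice in the cached activations
    return (h,) + output[1:]

block = model.transformer.h[LAYER]

# Run A: record activations from a source prompt.
handle = block.register_forward_hook(record)
model(**tok("The capital of France is", return_tensors="pt"))
handle.remove()

# Run B: patch them into a different-language prompt.
handle = block.register_forward_hook(patch)
out = model(**tok("La capitale de l'Italie est", return_tensors="pt"))
handle.remove()
print(tok.decode([out.logits[0, -1].argmax().item()]))
```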
Post image

This second set of text-copying attention heads also shows up in every LLM we tested, and these heads work in a totally different way from token induction heads.

Instead of copying tokens, they copy *concepts*.

27.09.2025 20:54 · 👍 2    🔁 0    💬 1    📌 0
Post image

So Sheridan scrutinized copying mechanisms in LLMs and found a SECOND route.

Yes, the token induction of Elhage and Olsson is there.

But there is *another* route where the copying is done in a different way. It shows up in attention heads that do 2-ahead copying.
bsky.app/profile/sfe...

27.09.2025 20:54 · 👍 2    🔁 0    💬 1    📌 0
Post image

Sheridan's erasure is Bad News for induction heads.

Induction heads are how transformers copy text: they find earlier tokens in identical contexts. (Elhage 2021, Olsson 2022 arxiv.org/abs/2209.11895)

But when that context ("what token came before") is erased, how could induction possibly work?

27.09.2025 20:54 · 👍 2    🔁 0    💬 1    📌 0
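The token-induction rule itself fits in a few lines: to guess the next token, find the most recent earlier occurrence of the current token and copy whatever followed it. A toy restatement in plain Python, illustrating the pattern the heads implement rather than the heads themselves:

```python
def induction_predict(tokens):
    """Toy token induction: given [..., A, B, ..., A], predict B."""
    current = tokens[-1]
    # Scan backwards for the most recent earlier copy of the last token.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]  # copy the token that followed it
    return None

print(induction_predict(["Mr", "Dursley", "said", "Mr"]))  # -> "Dursley"
```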
Post image

Why is that weird? In a transformer, each token knows its context ("which tokens came before"), and in probes Sheridan found that info is always there when sequences are meaningLESS.

But in meaningFUL phrases, the LM often ERASES the context!!

Exactly opposite of what we expected.

27.09.2025 20:54 · 👍 2    🔁 0    💬 1    📌 0
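The probes here are the standard linear kind: train a simple classifier to read a property (such as the previous token's identity) out of a hidden state, and treat held-out accuracy as evidence the information is present. A minimal sketch with scikit-learn; the arrays are placeholders for real cached activations:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholders: hidden states (n_examples, d_model) and, as labels, the id
# of the token preceding each position. In a real experiment these come
# from running the LM over a corpus and caching activations per position.
X = np.random.randn(2000, 768)
y = np.random.randint(0, 50, size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# High accuracy => the hidden state still carries "what came before";
# chance-level accuracy => that context has been erased.
print("probe accuracy:", probe.score(X_te, y_te))
```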
Post image

The work starts with a mystery!

In footprints.baulab.info (EMNLP), while dissecting the problem of how LMs read badly tokenized words like " n.ort.he.astern", Sheridan found a huge surprise: they do it by _erasing_ contextual information.

27.09.2025 20:54 · 👍 2    🔁 0    💬 1    📌 0
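You can see the badly-tokenized-word problem directly with any subword tokenizer: an uncommon word often splits into fragments that mean little on their own. A quick check (the exact pieces depend on the tokenizer):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
for word in [" northeastern", " the", " interpretability"]:
    print(repr(word), "->", tok.tokenize(word))
# An uncommon word may come out as fragments like ' n', 'ort', 'he',
# 'astern', none of which means "northeastern" by itself.
```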
Post image

Who is going to be at #COLM2025?

I want to draw your attention to a COLM paper by my student @sfeucht.bsky.social that has totally changed the way I think and teach about LLM representations. The work is worth knowing.

And you can meet Sheridan at COLM, Oct 7!
bsky.app/profile/sfe...

27.09.2025 20:54 · 👍 38    🔁 8    💬 1    📌 2
Preview
NDIF Hot-swapping Beta Testing
Do you have a research project where you plan to study many different models? NDIF will soon be deploying model hot-swapping, which will enable users to access any HuggingFace model remotely via NDIF. We are soliciting applications for a pilot program to beta test our hot-swapping functionality on real research.
By participating, you will:
• Be in the first cohort of users to access models beyond our whitelist (including checkpoints)
• Directly control which models are hosted on the NDIF backend
• Receive 1:1 research and technical support from the NDIF team
• Give feedback to NDIF, guiding future user experience
Application Information:
• Apply by October 1st, 2025
• Acceptance by October 15th, 2025
• Applications will be reviewed based on impact, feasibility, and fit with the NDIF/NNsight platform
• We are particularly interested in supporting work on model checkpoints and training dynamics
Please email info@ndif.us with any questions.

You'll get early access to the new system.

We will work with you to make sure that the models you need work with our new system: dedicated support!

And by being an early adopter you will help us make NDIF more useful to the research community. Thank you!

t.co/Eo3HTC3bQn

26.09.2025 18:47 · 👍 2    🔁 0    💬 0    📌 0

What's new?

We are EXPANDING NDIF to support MANY more models!

But we need your help. If you are doing research on, e.g., OLMo checkpoints or any other model beyond this list (nnsight.net/status/), then sign up for the NDIF Hot-swapping pilot here:

t.co/Eo3HTC3bQn

26.09.2025 18:47 · 👍 2    🔁 0    💬 1    📌 0
Preview
NSF National Deep Inference Fabric NDIF is a research computing project that enables researchers and students to crack open the mysteries inside large-scale AI systems.

As you know, ndif.us is a "white-box" inference service.

It lets you crack open the model and trace and modify its internals. We run the models for you on NSF servers.

Up to now, NDIF has supported a small set of a dozen models.

26.09.2025 18:47 · 👍 2    🔁 0    💬 1    📌 0
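For the curious: the client library for this kind of white-box access is nnsight (nnsight.net). A minimal sketch along the lines of the published examples; the module path assumes a GPT-2-style model, and details vary by library version:

```python
from nnsight import LanguageModel

# Wrap a HuggingFace model; nnsight exposes its submodules for tracing.
model = LanguageModel("openai-community/gpt2", device_map="auto")

with model.trace("The Eiffel Tower is in the city of"):
    # Save a mid-layer hidden state and the final logits for inspection.
    hidden = model.transformer.h[8].output[0].save()
    logits = model.lm_head.output.save()

print(hidden)                 # hidden states, (batch, seq, d_model)
print(logits.argmax(dim=-1))  # greedy next-token ids
# Per the NDIF docs, passing remote=True to model.trace(...) runs the same
# trace on NSF-hosted servers instead of locally (API key required).
```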
Post image

Especially if you're curious about questions like:
• What is GPT-OSS 120B thinking inside?
• What does OLMo-32B learn between all its hundreds of checkpoints?
• Why do Qwen3 layers have such different roles from Llama's?
• How does Foundation-Sec reason about cybersecurity?

26.09.2025 18:47 · 👍 2    🔁 0    💬 1    📌 0
Post image

Announcing a broad expansion of the National Deep Inference Fabric.

This could be relevant to your research...

26.09.2025 18:47 · 👍 11    🔁 3    💬 1    📌 2

The NDIF YouTube talk series continues... Don't miss the fascinating talks by Xu Pan and Josh Engels on the NDIF YouTube channel.

www.youtube.com/channel/UCaQ...

20.09.2025 19:20 · 👍 4    🔁 1    💬 0    📌 0
davidbau.com The Truth is Our Superpower

In the wake of the Jimmy Kimmel firing: Do not underestimate the power of the truth.

The truth is our superpower.

davidbau.com/archives/202...

20.09.2025 19:17 · 👍 4    🔁 0    💬 0    📌 1
Preview
Historians See Autocratic Playbook in Trump's Attacks on Science

www.nytimes.com/2025/08/31/s...

31.08.2025 19:42 · 👍 3    🔁 0    💬 0    📌 0

Monday: Trump tries to fire Fed Governor Lisa Cook (first time in 111 years).
Thursday: CDC chief dismissed, four top scientists resign.

Discredit, dismiss, blame.

History shows exactly where this three-step pattern leads.

29.08.2025 02:04 · 👍 4    🔁 1    💬 1    📌 0
Preview
LIVE: CDC rallies outside Atlanta HQ after director's ousting YouTube video by Associated Press

www.youtube.com/live/KZ7QvPH...

29.08.2025 13:45 · 👍 1    🔁 0    💬 1    📌 0

Independent science is not about power-sharing. It is about survival.

In 1932, Soviet citizens learned that when scientists are silenced, starvation follows.

Every generation must choose.

davidbau.com/archives/20...

29.08.2025 02:04 · 👍 1    🔁 0    💬 1    📌 0

In 1932, Stalin's authoritarian central planning program wrecked Soviet agriculture, starving millions. As the crisis deepened, Trofim Lysenko, a mediocre agronomist backed by Stalin, rejected genetics as "bourgeois pseudoscience" and reorganized Soviet agriculture around his ideology...

29.08.2025 02:04 · 👍 1    🔁 0    💬 1    📌 0
