
David Bau

@davidbau.bsky.social

Interpretable Deep Networks. http://baulab.info/ @davidbau

2,086 Followers  |  240 Following  |  107 Posts  |  Joined: 16.10.2023

Latest posts by davidbau.bsky.social on Bluesky

Koyena Pal (@koyena.bsky.social) 🚨 Registration is live! 🚨 The New England Mechanistic Interpretability (NEMI) Workshop is happening Aug 22nd 2025 at Northeastern University! A chance for the mech interp community to nerd out on how models really work 🧠🤖 🌐 Info: nemiconf.github.io/summer25/ 📝 Register: https://forms.gle/v4kJCweE3UUHUE81A

The New England Mechanistic Interpretability Workshop, NEMI 2025 is August 22 in Boston.

Talks, posters, meals, discussion... Most of all, an excellent chance to chat about new ideas with other great researchers in the field!

Help spread the word - register and repost -

bsky.app/profile/koy...

01.07.2025 15:00 | 👍 11    🔁 2    💬 0    📌 0
@nikhil07prakash.bsky.social Step 2: The LM binds the character-object-state triplet by copying their OIs (source) to the state token. These OIs also flow to the last token via corresponding tokens in the query (pointer). Next, LM uses both copies to attend to the correct state from last token and fetch its state OI (payload).

LLM reasoning is hidden, sometimes illusory.

But careful experiments can reveal surprising structure, exposing nontrivial mechanisms like double pointers. Symbolic-program techniques that you might not expect in a neural network.

Explanations and paper links:

bsky.app/profile/nik...

25.06.2025 15:00 | 👍 2    🔁 0    💬 0    📌 0

Nikhil's paper has many other remarkable experiments, including interventions that reveal discrete reasoning steps where the two people in a story are aware or unaware of each others' actions.

The picture is not complete, but it's worth reading and contemplating.

25.06.2025 15:00 | 👍 1    🔁 0    💬 1    📌 0
Benjamin Riley (@benjaminjriley.bsky.social) This isn't what cognitive scientists mean by theory of mind, and LLMs are not "applying" theory of mind when they produce text that includes the words "birds" and "cat" and "mind."

Looking inside these models is a way to break into the Chinese room; it is a way to approach the puzzle of whether apparent skills like Theory-of-Mind are just an LLM Clever Hans trick, or whether the model contains reasonable representations.

bsky.app/profile/ben...

25.06.2025 15:00 | 👍 1    🔁 0    💬 1    📌 0

A double dereference can be seen at L35, where patching states on identical words between identical permuted stories changes the answer.

That's because "dereference thing 2" looks for the floating definition of "where is thing 2."

The patch redirects this second indirection!

25.06.2025 15:00 | 👍 1    🔁 0    💬 1    📌 0
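
For readers less familiar with pointers, the double dereference described in the post above can be pictured with a toy sketch. This is only an analogy, not the model's actual mechanism or the paper's code: the `memory` dictionary, the slot names (`p2`, `cup_a`, `cup_b`), and the drink values are all hypothetical, chosen to show how changing a stored pointer can change the answer while every stored value stays the same.

```python
# Toy analogy only: a dict stands in for memory, keys stand in for addresses.
# All names and values here are hypothetical illustrations.
memory = {
    "p2": "cup_b",      # a pointer: "thing 2" lives at address cup_b
    "cup_a": "coffee",  # value stored at cup_a
    "cup_b": "tea",     # value stored at cup_b
}


def double_deref(mem, pointer_address):
    """Follow two levels of indirection, like **pp in C."""
    address = mem[pointer_address]  # first dereference: fetch the stored pointer
    return mem[address]             # second dereference: fetch the value it names


original = double_deref(memory, "p2")

# "Patching" overwrites only the pointer; the stored values are untouched.
patched_memory = dict(memory, p2="cup_a")
patched = double_deref(patched_memory, "p2")

print(original, patched)
```

The point of the sketch is that the patch never edits "tea" or "coffee" themselves; it only redirects the second lookup.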

It is strong evidence that what's stored deeper than 50 is not a value but a POINTER dereferenced at 55.

Dai calls these OI's arxiv.org/abs/2409.05448 and
@fjiahai.bsky.social calls them binding IDs arxiv.org/abs/2310.17191

It is a general "lookback" pattern.

Next surprise is nested...

25.06.2025 15:00 | 👍 2    🔁 0    💬 1    📌 0

There is a lot to unpack in Nikhil's paper and it merits a close reading.

The first thing to understand is his remarkable Fig 2 experiment. Why does the patching of one state, which alters coffee->tea, switch to coffee->beer when you move states deeper than layer 55?

25.06.2025 15:00 | 👍 1    🔁 0    💬 1    📌 0
@nikhil07prakash.bsky.social How do language models track mental states of each character in a story, often referred to as Theory of Mind? We reverse-engineered how LLaMA-3-70B-Instruct handles a belief-tracking task and found something surprising: it uses mechanisms strikingly similar to pointer variables in C programming!

The new "Lookback" paper from @nikhil07prakash.bsky.social contains a surprising insight...

70b/405b LLMs use double pointers, akin to C programmers' double (**) pointers. They show up when the LLM is "knowing what Sally knows Ann knows", i.e., Theory of Mind.

bsky.app/profile/nik...

25.06.2025 15:00 | 👍 27    🔁 3    💬 1    📌 0

This is the book to read before protesting in LA

11.06.2025 06:46 | 👍 1    🔁 0    💬 0    📌 0

If you do NOT live in a red state, then please have a FRIENDLY chat with somebody who does, to make sure they are aware of what is happening and the stakes.

It does NO good to shout at MAGA. We need to talk to people. Here are some thoughts about engaging on X.

x.com/davidbau/st...

03.06.2025 16:15 | 👍 0    🔁 0    💬 0    📌 0

If you live in ME, KS, WV, IN, LA, MS, TX, ID, NC, AK, TN, MT, ND, AL, NE, SC, or any red state then your senator has outsized influence.

Read here about local impact and how to contact them. They DO listen to voters. They WILL listen to you!

thevisible.net/posts/005-a...

03.06.2025 16:15 | 👍 1    🔁 0    💬 1    📌 0

FRIENDS: American science is being decimated by Congress NOW.

Your help is needed to fix this. The current DC plan PERMANENTLY slashes NSF, NIH, all science training. Money isn't redirected; it's gone.

Please read+share what's happening

thevisible.net/posts/004-s...

03.06.2025 16:15 | 👍 5    🔁 0    💬 1    📌 1

Recognize that when you engage with the broad public, many will disbelieve you.

You will find many who will "mansplain" science back to you.

Push back with clarity and evidence. You will help reveal the TACO nature of authoritarian views.

Do not fear that. It is our job.

29.05.2025 19:53 | 👍 3    🔁 0    💬 0    📌 0
Pranksters vs. Autocrats: Why Dilemma Actions Advance Nonviolent Activism (Brown Democracy Medal), by Srdja Popovic and Sophia A. McClennen, on Amazon.com.

To understand the importance and role of controlling the narrative under an authoritarian regime, read Gene Sharp and Srdja Popovic.

Wrong: Trump is the next Hitler.
Right: Trump Always Chickens Out (TACO) and is screwing up our life.

www.amazon.com/dp/1501756052/

29.05.2025 19:52 | 👍 1    🔁 0    💬 2    📌 0

My concrete advice to PhD students:

(1) Do not be cowed by the fascist horde. Do engage with the public, especially skeptics.
(2) Speak on the things where you are expert, not where you are a dabbler. But recognize you are expert in many things.
(3) Be friendly, clear and firm.

29.05.2025 19:49 | 👍 2    🔁 0    💬 1    📌 0

We must not hide in our bluesky corner where voters are not. We need to flood the unfriendly airwaves on X, youtube, tiktok. And we must show up with our faces.

We need to be vulnerable, because no AI misinformation bot can match "being a real person."

Show your face. Defend your work.

29.05.2025 11:09 | 👍 2    🔁 0    💬 1    📌 0

We need to get our heads out of our *sses.

This is not the moment to focus on your personal ambition, to show why your latest sophisticated widget is better than doctor competitor's intricate theorem.

The whole scientific franchise is under attack. It is time to defend it to the public.

29.05.2025 11:04 | 👍 3    🔁 0    💬 1    📌 0

We cannot let our fear of political retribution lead us to cede the internet to stone-age propaganda. Academics: please stand up on social (and all) media. You are expert teachers.

Share your personal stories. Defend your work to the public. Defend your international students.

29.05.2025 10:54 | 👍 4    🔁 0    💬 1    📌 0

Even my lefty Boston neighbors do not know. They think Rubio is expelling troublemakers, radicals, communists. Or that just Harvard is in the crosshairs.

They are totally unaware that he has stopped all student visas, or why that kills US science.

If we are academics: we need to teach.

29.05.2025 10:48 | 👍 1    🔁 0    💬 1    📌 0

Because of propaganda Americans do not understand what Rubio is doing with visas. "I gave you a visa to come and study," they think.

x.com/CitizenFree...

NO, he has not!! Please help explain to X how Rubio has stopped *ALL* student visas, and how it is killing US science.

29.05.2025 10:27 | 👍 6    🔁 0    💬 1    📌 0

Here is some evidence. But it doesn't seem to support @ukraineman101.bsky.social.

www.economist.com/science-and-...

29.05.2025 10:13 | 👍 3    🔁 0    💬 0    📌 0

bsky.app/profile/chri...

29.05.2025 08:02 | 👍 2    🔁 0    💬 0    📌 0
First They Came - Wikipedia

To engage in the logic is to lose the game.

The fascist playbook: normalize hate, starting at the most vulnerable populations, until everyone is subjugated.

en.wikipedia.org/wiki/First_T...

28.05.2025 23:57 | 👍 0    🔁 0    💬 0    📌 0
Trump administration orders US embassies to stop student visa interviews Directive could severely delay visa processing and hurt universities that rely on foreign students for revenue

I am in Boston because I believe in American democracy. I love our freedoms and our culture. I want to be teaching US students.

And I want to live and teach here in my home, where I grew up.

Why are we setting our home on fire?

www.theguardian.com/us-news/202...

28.05.2025 13:29 | 👍 5    🔁 2    💬 1    📌 0

The USA is a magnet for AI talent! But with today's clampdown on international students our ecosystem is suddenly trashed.

Several of my projects have incoming PhD talent signed but frozen out in Germany, Denmark...

We are now discussing setting up shop in Toronto or London.

28.05.2025 13:29 | 👍 5    🔁 0    💬 1    📌 0

Here is some of what my lab does: baulab.info/

Just yesterday I had a conversation with other Boston natives about setting up a new AI incubator in town.

I am so excited by this. It is so important to figure out how to attract, nurture, and retain talent locally.

28.05.2025 13:29 | 👍 0    🔁 0    💬 1    📌 0

When setting up my AI lab I faced a choice between Toronto and Boston. I chose Boston, my home and the world's best incubator for research talent.

Here you can take a short stroll to meet with top minds in hundreds of fields from AI to astronomy, batteries to biotech.

28.05.2025 13:29 | 👍 12    🔁 0    💬 1    📌 0

The techno utopian philosophy!

But Opus didn't like that essay and recommended one about the venture capital system.

25.05.2025 14:48 | 👍 2    🔁 0    💬 0    📌 0

I found the essay gripping. Although I asked for it and critiqued it, it is not my own perspective.

I think Opus's advocacy about AI safety, with its specific diagnosis of the problem, is worth reading.

The essay can be found here.

davidbau.com/archives/20...

25.05.2025 13:12 | 👍 0    🔁 0    💬 0    📌 0

Black Box, Blood Money

Friday evening, an Italian tourist escaped a torturer in Manhattan who was after his crypto password. I asked Anthropic's Opus 4 to analyze and explain what the episode might teach us about AI.

It critiqued my guidance, instead proposing a focus on VCs:

25.05.2025 13:12 | 👍 2    🔁 0    💬 2    📌 0
