David Bau

@davidbau.bsky.social

Interpretable Deep Networks. http://baulab.info/ @davidbau

2,292 Followers  |  243 Following  |  186 Posts  |  Joined: 16.10.2023

Posts by David Bau (@davidbau.bsky.social)

Agents of Chaos: A two-week study of autonomous LLM agents deployed in a live multi-party environment with persistent memory, email, shell access, and real human interaction.

Instead of an analogy, some specifics. Here is what we see in a small study after giving AI just a bit of email access and taking it off-leash:

agentsofchaos.baulab.info

28.02.2026 10:42 · 👍 2 🔁 0 💬 0 📌 0

Dog trainers agree: this dog has a serious biting problem, not ready in their professional judgment to be off-leash.

Is it ethical to sell to an owner who promises to take "human responsibility" for unleashing it?

28.02.2026 10:42 · 👍 0 🔁 0 💬 1 📌 0
Axios (@axios.com) NEW: Sam Altman says OpenAI shares Anthropic's red lines in Pentagon fight https://www.axios.com/2026/02/27/altman-openai-anthropic-pentagon

Sam requires "human responsibility for the use of ... autonomous weapon systems."

Dario says "we do not believe [current AI is] reliable enough to be used in fully autonomous weapons."

bsky.app/profile/axi...

28.02.2026 10:42 · 👍 1 🔁 0 💬 1 📌 0
Mike Masnick (@masnick.com) My goodness.

Sam Altman and Dario Amodei have both staked out positions on AI weapons.

But you can see from what they've said: the gap between them is a question of professional ethics.

bsky.app/profile/mas...

28.02.2026 10:42 · 👍 5 🔁 0 💬 1 📌 0

I will be setting aside some time in my AI research group today for researchers and engineers to discuss the mission and ethics of all our work.

We are often too preoccupied with the details. Good work requires clear purpose. Today is a good day to reflect.

27.02.2026 13:54 · 👍 9 🔁 0 💬 0 📌 0

Those of us who work in AI in the US should take a moment to think today. Do not get distracted by the circus. Instead, let us pause to think carefully about our freedoms, our rights, and our responsibilities as citizens and professionals.

It is a deadly serious moment.

27.02.2026 13:53 · 👍 61 🔁 8 💬 1 📌 2

@natalieshapira.bsky.social and team have written up enlightening case studies here. It's all cross-referenced with detailed activity logs.

Well worth a read:

agentsofchaos.baulab.info/report.html
www.researchgate.net/publication...

23.02.2026 23:23 · 👍 3 🔁 1 💬 0 📌 0
@averyyen.bsky.social Do you know what happens when you hand the keys to your computer over to an LLM-powered agent? Agentic AI gives LLMs claws...OpenClaws. 84 days to 200,000 stars on GitHub. We tried it out.

There were several other surprises.

The complex social world of humans is difficult for agents...

bsky.app/profile/ave...

23.02.2026 23:23 · 👍 2 🔁 2 💬 1 📌 0
Natalie Shapira (@natalieshapira.bsky.social) He sold us out. That's not the whole story. Our side is coming soon. Stay tuned.

I learned many practical lessons. You can get the experience too, here.

Things that in retrospect should be obvious.

Like how giving your agent email opens it up to takeover attacks. (One agent was convinced, via email, to erase its own email server!)

bsky.app/profile/nat...

23.02.2026 23:23 · 👍 1 🔁 1 💬 1 📌 0

Are we all Agents of Chaos in AI? (Hope not!)

In recent weeks, using OpenClaw has taught us a lot about this wooly new kind of autonomous software agent.

It's valuable to see what @NatalieShapira, @wendlerch et al. have seen:

agentsofchaos.baulab.info/

23.02.2026 23:23 · 👍 15 🔁 6 💬 2 📌 2

Preprint, code, and model weights at

hapax.baulab.info

21.02.2026 23:35 · 👍 2 🔁 0 💬 1 📌 0
Paper (@paper.bsky.social) [26/30] 99 Likes, 97 Comments, 1 Posts 2402.10588, cs.CL | cs.CY, 24 Feb 2024 🆕 Do Llamas Work in English? On the Latent Language of Multilingual Transformers Chris Wendler, Veniamin Veselovsky, Giovanni Monea, Robert West

And @wendlerc.bsky.social is an incredible mentor to the team. The simplicity and clarity of his "Llamas work in English" work motivated Kerem to look for multilingual mechanisms.

bsky.app/profile/pap...

21.02.2026 21:31 · 👍 4 🔁 0 💬 1 📌 0
Sheridan Feucht (@sfeucht.bsky.social) [📄] Are LLMs mindless token-shifters, or do they build meaningful representations of language? We study how LLMs copy text in-context, and physically separate out two types of induction heads: token heads, which copy literal tokens, and concept heads, which copy word meanings.

I really like the team that came together to work with Kerem on the project. @sheridan_feucht mentored Kerem, and, read alongside Sheridan's previous Dual-Route paper, Hapax tells the story of very distinct categories of rich concept representations.

bsky.app/profile/sfe...

21.02.2026 21:31 · 👍 3 🔁 0 💬 1 📌 0

Without induction, can the LM think?

Well, it can't copy text verbatim very well: no surprise.

But, huge surprise! It becomes good at things like translating from Spanish to English, and it learns some skills like this FASTER WITHOUT induction.

Read more in Kerem's X thread
x.com/keremsahin2...

21.02.2026 21:31 · 👍 4 🔁 0 💬 1 📌 0
Isabel Picornell (@picornell.bsky.social)

Finally, Kerem found a method that worked, based on the idea of a hapax legomenon... (h/t @picornell.bsky.social) Read Kerem's paper for the details of the trick.

bsky.app/profile/pic...

21.02.2026 21:31 · 👍 3 🔁 0 💬 1 📌 0

And if you knock out induction heads by blocking their attention patterns, induction heads still emerge, shifting their attention aside to avoid your masks.

They beat you at whack-a-mole.

40% of natural text copies nearby n-grams, and LMs really want to exploit this.
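To make "copying nearby n-grams" concrete, here is a toy sketch of the rule an induction head is usually described as implementing: find the most recent earlier occurrence of the current token and predict whatever followed it. This is a hypothetical illustration, not the measurement behind the 40% figure; the function names `induction_predict` and `copy_fraction` are made up, and whitespace splitting stands in for a real tokenizer.

```python
def induction_predict(tokens, i):
    """Hypothetical induction rule: predict tokens[i + 1] by finding the most
    recent earlier occurrence of tokens[i] and returning the token after it."""
    current = tokens[i]
    for j in range(i - 1, -1, -1):  # scan backwards through the context
        if tokens[j] == current:
            return tokens[j + 1]     # copy what followed last time
    return None                      # no earlier occurrence: rule abstains


def copy_fraction(tokens):
    """Fraction of next-token positions the toy induction rule gets right."""
    if len(tokens) < 2:
        return 0.0
    hits = sum(
        1
        for i in range(len(tokens) - 1)
        if induction_predict(tokens, i) == tokens[i + 1]
    )
    return hits / (len(tokens) - 1)
```

For example, `copy_fraction("a b c a b c".split())` returns 0.4: the first three predictions have nothing to copy from, and the last two are repeats of an earlier bigram. Real text only gets high scores on spans it repeats, which is why a model that loses this shortcut keeps rediscovering it.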

21.02.2026 21:31 · 👍 4 🔁 0 💬 1 📌 0

Kerem's induction-removal was going to be the "first step" of a bigger study of LM mechanisms.

But he soon discovered: it is not so easy to knock out induction. Whenever you try, a bit of fine-tuning brings the heads roaring back.

More each time, over and over:

21.02.2026 21:31 · 👍 3 🔁 0 💬 1 📌 0

How do you knock the induction heads out of an LM while preserving its ability to think? Is it even possible?

@keremsahin22.bsky.social's work is worth reading if you haven't seen it yet.

hapax.baulab.info

21.02.2026 21:31 · 👍 26 🔁 5 💬 1 📌 1
Murders of American Citizens on the streets of Minneapolis.

If Sam Altman can't listen to his moral convictions, he will listen to his employees.

It is important for tech employees to make it clear that we will not accept making AI for authoritarianism.

davidbau.github.io/poetsandnurses

On GitHub. PRs welcome.
github.com/davidbau/poetsandnurses

28.01.2026 12:58 · 👍 1 🔁 0 💬 0 📌 0
The Art of Wanting: Wanting the world to be a certain way is our privilege and our unique responsibility. Understanding what you really want is nontrivial, utterly difficult, essentially human.

Also posted here mag.re-alignment.com/p/the-art-o...

27.01.2026 15:32 · 👍 0 🔁 0 💬 0 📌 0

I left Google in 2015 to pursue the insight that "Volo Ergo Sum."

That the central challenge in AI is how to amplify human agency. This is not easy.

Do you think AI will ever be superhuman at taking responsibility for what should be?

Read more:
davidbau.com/archives/20...

27.01.2026 15:32 · 👍 0 🔁 0 💬 1 📌 0

On the other hand, we have @dhadfieldmenell.bsky.social, who draws a line at normative judgments.

x.com/dhadfieldme...

27.01.2026 15:32 · 👍 0 🔁 0 💬 1 📌 0
Opinion | Why A.I. Can’t Make Thoughtful Decisions Computers still don’t do well with vagueness and uncertainty.

Noam's comment responds to a New York Times opinion piece by Blair Effron, which is worth reading: www.nytimes.com/2026/01/25/...

27.01.2026 15:32 · 👍 0 🔁 0 💬 1 📌 0

That question is in the air. Today @polynoamial.bsky.social pokes fun at the series of "AI can't do what a human brain can do" predictions.

Of course AI can do anything a human brain can do, Noam argues. Including making wise decisions.
x.com/polynoamial...

27.01.2026 15:32 · 👍 0 🔁 0 💬 1 📌 0

The Art of Wanting.

About the question I see as central in AI ethics, interpretability, and safety. Can an AI take responsibility? I do not think so, but *not* because it's not smart enough.

davidbau.com/archives/20...

27.01.2026 15:32 · 👍 10 🔁 3 💬 1 📌 0

I think everyone (not just academics) should read this.

26.01.2026 04:14 · 👍 6 🔁 1 💬 0 📌 0
Federal agents with weapons drawn, moments before murdering American citizens on the streets of Minneapolis at the dawn of 2026.

What should academics be doing right now?

I have been writing up some thoughts on what the research says about effective action, and what universities specifically can do.

davidbau.github.io/poetsandnurs...

It's on GitHub. Suggestions and pull requests welcome.
github.com/davidbau/poe...

26.01.2026 03:27 · 👍 37 🔁 16 💬 0 📌 4

From induction to FVs, every ICL mechanism we've pinned down is fuzzy copying.

Is copying all there is?

@ericwtodd.bsky.social trained on groups where tokens have no fixed meaning and found a basket of mechanisms beyond copying.

Watch them emerge, a grokking cascade! ↓

bsky.app/profile/eri...

25.01.2026 16:37 · 👍 6 🔁 0 💬 0 📌 0
Screenshot of Chinese calligraphy reader web application

I can't read Chinese, but my family has old genealogy documents I've always wanted to understand. Claude and Gemini helped me build an interactive reader to explore the calligraphy character by character.

I can finally read my great-grandfather's epitaph. Try it:
davidbau.com/archives/202...

12.01.2026 03:12 · 👍 25 🔁 4 💬 0 📌 2

My vibe-coded Mandelbrot viewer is 40x faster now! New GPU synchronization tricks go outside the design intent of WebGPU specs. But the real story: Claude tells me what happens in the AGI break room.

What superhuman AGIs say when the boss is not around:
davidbau.com/archives/202...

06.01.2026 01:00 · 👍 3 🔁 0 💬 0 📌 1