Instead of an analogy, some specifics. Here is what we see in a small study after giving AI just a bit of email access and taking it off-leash:
agentsofchaos.baulab.info
Dog trainers agree: this dog has a serious biting problem, not ready in their professional judgment to be off-leash.
Is it ethical to sell to an owner who promises to take "human responsibility" for unleashing it?
Sam requires "human responsibility for the use of ... autonomous weapon systems."
Dario says "we do not believe [current AI is] reliable enough to be used in fully autonomous weapons."
bsky.app/profile/axi...
Sam Altman and Dario Amodei have both staked out positions on AI weapons.
But you can see from what they've said: the gap between them is a question of professional ethics.
bsky.app/profile/mas...
I will be adding some time in my AI research group today for researchers and engineers to discuss the mission and ethics of all our work.
We are often too preoccupied by the details. Good work requires clear purpose. Today is a good day to reflect.
Those of us who work in AI in the US should take a moment to think today. Do not get distracted by the circus. Instead, let us pause to think carefully about our freedoms, our rights, and our responsibilities as citizens and professionals.
It is a deadly serious moment.
@natalieshapira.bsky.social and team have written up enlightening case studies here. It's all cross-referenced with detailed activity logs.
Well worth a read:
agentsofchaos.baulab.info/report.html
www.researchgate.net/publication...
There were several other surprises.
The complex social world of humans is difficult for agents...
bsky.app/profile/ave...
I learned many practical lessons. You can get the experience too, here.
Things that in retrospect should be obvious.
Like how giving your agent email opens it up to takeover attacks. (One agent was convinced, via email, to erase its own email server!)
bsky.app/profile/nat...
Are we all Agents of Chaos in AI? (Hope not!)
In recent weeks using OpenClaw has taught us a lot about this wooly new kind of autonomous software agent.
It's valuable to see what @NatalieShapira, @wendlerch et al. have seen:
agentsofchaos.baulab.info/
Preprint, code, and model weights at
hapax.baulab.info
And @wendlerc.bsky.social is an incredible mentor to the team. The simplicity and clarity of his "Llamas work in English" work motivated Kerem to look for multilingual mechanisms.
bsky.app/profile/pap...
I really like the team that came together to work with Kerem on the project. @sheridan_feucht mentored Kerem, and as seen with Sheridan's previous Dual-Route paper, Hapax tells the story of very distinct categories of rich concept representations.
bsky.app/profile/sfe...
Without induction, can the LM think?
Well, it can't copy text verbatim very well: no surprise.
But, huge surprise! It becomes good at things like translating from Spanish to English, learning some things like this FASTER WITHOUT induction.
Read more in Kerem's X thread
x.com/keremsahin2...
Finally Kerem found a method that worked, based on the idea of hapax legomenon... (h/t @picornell.bsky.social) Read Kerem's paper for the details of the trick.
bsky.app/profile/pic...
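For readers new to the term: a hapax legomenon is a word that occurs exactly once in a text. The sketch below only illustrates that definition on a whitespace-tokenized string. It is not Kerem's method, and the function name `hapax_legomena` is a hypothetical helper made up for this example.

```python
from collections import Counter


def hapax_legomena(tokens):
    """Return the tokens that occur exactly once, in order of appearance.

    Illustrative only: this shows what a hapax legomenon is,
    not how the Hapax paper uses the idea.
    """
    counts = Counter(tokens)
    return [tok for tok in tokens if counts[tok] == 1]


if __name__ == "__main__":
    text = "the cat sat on the mat and the cat ran"
    print(hapax_legomena(text.split()))
```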
And if you knock out induction heads by blocking their attention patterns, induction heads still emerge, shifting their attention aside to avoid your masks.
They beat you at whack-a-mole.
40% of natural text copies nearby n-grams + LMs really want to exploit this.
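One rough way to get a feel for how much text repeats nearby n-grams is to count the n-grams that already appeared within a recent window. This is a minimal sketch under assumptions of my own: the function name `copied_ngram_fraction`, the whitespace tokenization, and the defaults `n=2` and `window=50` are all hypothetical, and this is not necessarily how the 40% figure was measured.

```python
def copied_ngram_fraction(tokens, n=2, window=50):
    """Fraction of n-grams that also start somewhere in the previous `window` positions.

    Assumed definition for illustration: the n-gram starting at position i
    counts as "copied" if an identical n-gram starts at some j
    with max(0, i - window) <= j < i.
    """
    total = len(tokens) - n + 1
    if total <= 0:
        return 0.0
    copied = 0
    for i in range(total):
        gram = tokens[i:i + n]
        start = max(0, i - window)
        # Look back over earlier start positions inside the window.
        if any(tokens[j:j + n] == gram for j in range(start, i)):
            copied += 1
    return copied / total


if __name__ == "__main__":
    text = "the cat sat on the mat and the cat ran"
    print(copied_ngram_fraction(text.split(), n=2, window=50))
```

On real data you would swap in the model's own tokenizer, since the result depends heavily on tokenization, `n`, and the window size.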
Kerem's induction-removal was going to be the "first step" of a bigger study of LM mechanisms.
But he soon discovered: it is not so easy to knock out induction. Whenever you try, a bit of fine-tuning brings the heads roaring back.
More each time, over and over:
How do you knock the induction heads out of an LM while preserving its ability to think? Is it even possible?
@keremsahin22.bsky.social's work is worth reading if you haven't seen it yet.
hapax.baulab.info
Murders of American Citizens on the streets of Minneapolis.
If Sam Altman can't listen to his moral convictions, he will listen to his employees.
It is important for tech employees to make it clear that we will not accept making AI for authoritarianism.
davidbau.github.io/poetsandnurses
On GitHub. PRs welcome.
github.com/davidbau/poetsandnurses
Also posted here mag.re-alignment.com/p/the-art-o...
27.01.2026 15:32
I left Google in 2015 to pursue the insight that "Volo Ergo Sum."
That the central challenge in AI is how to amplify human agency. This is not easy.
Do you think AI will ever be superhuman at taking responsibility for what should be?
Read more:
davidbau.com/archives/20...
On the other hand we have @dhadfieldmenell.bsky.social who draws a line at normative judgments.
x.com/dhadfieldme...
Noam's comment is a response to NY Times Opinion by Blair Effron www.nytimes.com/2026/01/25/... which is worth reading.
27.01.2026 15:32
That question is in the air. Today @polynoamial.bsky.social pokes fun at the series of "AI can't do what a human brain can do" predictions.
Of course AI can do anything a human brain can do, Noam argues. Including making wise decisions.
x.com/polynoamial...
The Art of Wanting.
About the question I see as central in AI ethics, interpretability, and safety. Can an AI take responsibility? I do not think so, but *not* because it's not smart enough.
davidbau.com/archives/20...
I think everyone (not just academics) should read this.
26.01.2026 04:14
Federal agents with weapons drawn, moments before murdering American citizens on the streets of Minneapolis at the dawn of 2026.
What should academics be doing right now?
I have been writing up some thoughts on what the research says about effective action, and what universities specifically can do.
davidbau.github.io/poetsandnurs...
It's on GitHub. Suggestions and pull requests welcome.
github.com/davidbau/poe...
From induction to FVs, every ICL mechanism we've pinned down is fuzzy copying.
Is copying all there is?
@ericwtodd.bsky.social trained on groups where tokens have no fixed meaning and found a basket of mechanisms beyond copying.
Watch them emerge, a grokking cascade!
bsky.app/profile/eri...
Screenshot of Chinese calligraphy reader web application
I can't read Chinese, but my family has old genealogy documents I've always wanted to understand. Claude and Gemini helped me build an interactive reader to explore the calligraphy character by character.
I can finally read my great-grandfather's epitaph. Try it:
davidbau.com/archives/202...
My vibe-coded Mandelbrot viewer is 40x faster now! New GPU synchronization tricks go outside the design intent of WebGPU specs. But the real story: Claude tells me what happens in the AGI break room.
What superhuman AGIs say when the boss is not around:
davidbau.com/archives/202...