
Chris Olah

@colah.bsky.social

Reverse engineering neural networks at Anthropic. Previously Distill, OpenAI, Google Brain. Personal account.

6,437 Followers  |  9 Following  |  41 Posts  |  Joined: 10.07.2023

Latest posts by colah.bsky.social on Bluesky

Political violence is bad. It usually begets more political violence.

Celebrating political violence is bad. It usually encourages more political violence, against various targets.

Campus shootings are bad. They make everyone on campus less safe.

It's bad that what I wrote here is controversial.

10.09.2025 19:06 — 👍 9176    🔁 1758    💬 514    📌 137

The interpretability team will be mentoring more fellows this cycle, so if you're interested in interpretability, it might be worth applying!

Some of our fellows last cycle did this: arxiv.org/pdf/2507.21509

12.08.2025 18:54 — 👍 10    🔁 0    💬 0    📌 0
Anthropic AI Safety Fellow, US Remote-Friendly (Travel Required) | San Francisco, CA

Applications for Anthropic AI Safety Fellows are due Aug 17!

US: job-boards.greenhouse.io/anthropic/jo...
UK: job-boards.greenhouse.io/anthropic/jo...
CA: job-boards.greenhouse.io/anthropic/jo...

It's a great opportunity to get mentorship and funding to work on safety for ~2 months.

12.08.2025 18:54 — 👍 24    🔁 5    💬 1    📌 0

But more importantly, I hope it will just help clarify what we mean by interference weights!

29.07.2025 23:33 — 👍 1    🔁 0    💬 0    📌 0

Our new note shows that interference weights in toy models can exhibit phenomenology strikingly similar to that of Towards Monosemanticity...

29.07.2025 23:33 — 👍 4    🔁 0    💬 1    📌 0
Circuit Tracing: Revealing Computational Graphs in Language Models
We describe an approach to tracing the "step-by-step" computation involved when a model responds to a single prompt.

They've been an ongoing challenge in our work for a long time. In fact, our recent work on attribution graphs (transformer-circuits.pub/2025/attribu...) was partly designed as a method to sidestep this challenge!

29.07.2025 23:33 — 👍 0    🔁 0    💬 1    📌 0

The keen reader may recall all these plots referencing "interference weights??" in Towards Monosemanticity (transformer-circuits.pub/2023/monosem...).

29.07.2025 23:33 — 👍 1    🔁 0    💬 1    📌 0
A Toy Model of Interference Weights

I've been talking about interference weights as a challenge for mechanistic interpretability for a while.

A short note discussing them - transformer-circuits.pub/2025/interfe...
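To give a concrete feel for the term under the standard superposition picture (this is my own illustrative sketch, not the note's actual construction): when more features than dimensions are packed into a layer, their directions can't all be orthogonal, and reading one feature back out also picks up a little of every other feature. Those off-diagonal overlaps are the interference weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, d_model = 20, 5  # more features than dimensions -> superposition

# Random unit-norm feature directions packed into a small space.
W = rng.normal(size=(n_features, d_model))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Reading feature i back out along its own direction also picks up the
# other features: the off-diagonal entries of the Gram matrix act as
# "interference weights" between distinct features.
gram = W @ W.T
interference = gram.copy()
np.fill_diagonal(interference, 0.0)

print(interference.shape)  # (20, 20)
```

Because the directions are unit vectors that can't be mutually orthogonal in 5 dimensions, the off-diagonal entries are small but nonzero, which is the crux of why such weights complicate interpretation.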

29.07.2025 23:33 — 👍 28    🔁 2    💬 2    📌 0
Analogies between Biology and Deep Learning [rough note]
A list of advantages that make understanding artificial neural networks much easier than biological ones.

I should also mention that I wrote a blog post listing a bunch of specific analogies between deep learning and biology several years back. (It's probably of much narrower interest!)

colah.github.io/notes/bio-an...

13.05.2025 19:34 — 👍 11    🔁 0    💬 1    📌 0

Of course, I'd be remiss not to mention that many others have made analogies between work in machine learning and biology -- most notable for us is the "bertology" work, which framed itself as studying the biology of the BERT models.

13.05.2025 19:34 — 👍 7    🔁 0    💬 1    📌 0

But we also think it's important for such "biology" results (which are more foreign in style to machine learning) to be treated as worthy of publication independent of methods work (which looks more similar to normal machine learning).

13.05.2025 19:34 — 👍 7    🔁 0    💬 1    📌 0

This was partly a convenient way to handle the length (jointly, the two papers are ~150 pages!).

13.05.2025 19:34 — 👍 6    🔁 0    💬 1    📌 0

But why did the language come up in our paper title? There was actually a further reason, which is that we wanted to separate our "methods" work and what we called our "biology" work (i.e. the empirical research we did using our method).

13.05.2025 19:34 — 👍 6    🔁 0    💬 1    📌 0

Finally, you need to believe that a worthy mode of investigation is empirical (rather than theoretical), and that a style of empirical research open to the qualitative, not just the purely quantitative, is legitimate.

This evokes biology more than physics.

13.05.2025 19:34 — 👍 11    🔁 0    💬 1    📌 0

One further needs to believe that individual neural networks, and in fact sub-components of those networks, warrant investigation. That's more idiosyncratic!

13.05.2025 19:34 — 👍 6    🔁 0    💬 1    📌 0

At a basic level, one needs to believe deep learning warrants scientific investigation. This doesn't seem very controversial these days, but note that it's already kind of radical. See e.g. Herbert Simon's The Sciences of the Artificial.

13.05.2025 19:34 — 👍 6    🔁 0    💬 1    📌 0

I've written multiple papers characterizing (small sets of) individual neurons. Historically, this hasn't seemed like a worthy topic of a paper in ML – I've had to justify it!

13.05.2025 19:34 — 👍 9    🔁 0    💬 1    📌 0

One way in which this is important is that the *types of questions* we're interested in are quite bizarre from a traditional machine learning perspective, but natural under the biological frame.

13.05.2025 19:34 — 👍 7    🔁 0    💬 1    📌 0

I think there's a deep way in which the scientific aesthetic of biology is very relevant to deep learning and especially interpretability.

Biology is to evolution as interpretability is to gradient descent.

bsky.app/profile/cola...

13.05.2025 19:34 — 👍 14    🔁 0    💬 1    📌 0

Stepping back, "physics of neural networks" is a whole area of research. Of course, it isn't physics in a classical sense. It's bringing the methods and style of physics to deep learning.

We refer to the "biology" of neural networks in a similar spirit!

13.05.2025 19:34 — 👍 9    🔁 0    💬 1    📌 1

My colleagues and I have actually been using "biology" quite heavily as a metaphor and handle for several years now, beyond the title of this paper. There are a lot of reasons I think it's useful!

13.05.2025 19:34 — 👍 10    🔁 0    💬 1    📌 0

A number of people have asked me why we titled our recent paper "On the Biology of a Large Language Model".

Why call it "biology"?

13.05.2025 19:34 — 👍 26    🔁 6    💬 2    📌 1
Chris Olah on X: "The elegance of ML is the elegance of biology, not the elegance of math or physics. Simple gradient descent creates mind-boggling structure and behavior, just as evolution creates the awe inspiring complexity of nature."

(This is a cross-post of one of my favorite old twitter threads: x.com/ch402/status... )

13.05.2025 19:32 — 👍 1    🔁 0    💬 0    📌 0

Every model is its own entire world of beautiful structure waiting to be discovered, if only we care to look.

13.05.2025 19:32 — 👍 3    🔁 0    💬 1    📌 1

I wish people would spend more time looking at the models we create though. It's like we're launching expeditions with complex equipment to reach more and more remote islands and tall mountains... and the biology stops at measuring the size and weight of the animals we find.

13.05.2025 19:32 — 👍 2    🔁 0    💬 1    📌 0
Thread: Circuits
What can we learn if we invest heavily in reverse engineering a single neural network?

This aesthetic most obviously applies to interpretability (and explicitly animates distill.pub/2020/circuits/ and transformer-circuits.pub). But I think it applies to deep learning more broadly.

Training larger models is an expedition to a remote island to see the organisms there.

13.05.2025 19:32 — 👍 1    🔁 0    💬 1    📌 0

People often complain that modern ML is throwing GPUs at problems without new research ideas. This is like finding evolution ugly because it's just a simple algorithm run for a very long time.

13.05.2025 19:32 — 👍 1    🔁 0    💬 1    📌 0

My tools are those of a mathematician, but my aesthetic is that of an early natural scientist.

13.05.2025 19:32 — 👍 1    🔁 0    💬 1    📌 0
Naturally Occurring Equivariance in Neural Networks
Neural networks naturally learn many transformed copies of the same feature, connected by symmetric weights.

For example, I was very excited about using group theory to encode symmetry in neural networks. But it turns out gradient descent can *discover* group convolutions (distill.pub/2020/circuit...). I think that's actually way more beautiful!

13.05.2025 19:32 — 👍 3    🔁 0    💬 2    📌 0

I used to really want ML to be about complex math and clever proofs. But I've gradually come to think this is really the wrong aesthetic to bring.

13.05.2025 19:32 — 👍 1    🔁 0    💬 1    📌 0
