We are also in need of a lot of reviewers! All volunteers appreciated ❤️ Express interest: forms.gle/4euQVPdFEty...
See more info on our call for papers: mechinterpworkshop.com/cfp
And learn more about the workshop: mechinterpworkshop.com
All approaches encouraged, if they can give a convincing case that they further the field. Open source work, new methods, negative results, combining white + black box methods, from probing to circuit analysis, from real-world tasks to rigorous case studies - only quality matters
13.07.2025 13:00
The call for papers for the NeurIPS Mechanistic Interpretability Workshop is open!
Max 4 or 9 pages, due 22 Aug, NeurIPS submissions welcome
We welcome any works that further our ability to use the internals of a model to better understand it
Details: mechinterpworkshop.com
I'm curious how much other people encounter this kind of thing
01.06.2025 18:48
I've been really feeling how much the general public is concerned about AI risk...
In a *weird* amount of recent interactions with normal people (eg my hairdresser) when I say I do AI research (*not* safety), they ask if AI will take over
Alas, I have no reassurances to offer
The mindset of Socratic Persuasion is shockingly versatile. I use it on a near daily basis in my personal and professional life: conflict resolution, helping prioritize, correcting misconceptions, gently giving negative feedback. My post has 8 case studies, to give you a sense:
26.05.2025 18:37
Isn't this all obvious?
Maybe! Asking questions rather than lecturing people is hardly a novel insight
But people often assume the Qs must be neutral and open-ended. It can be very useful to be opinionated! But you need error correction mechanisms for when you're wrong.
Crucially, the goal is to give the right advice, not to *be* right. Asking Qs is far better received when you are genuinely listening to the answers, and open to changing your mind. No matter how much I know about a topic, they know more about their life, and I'm often wrong.
26.05.2025 18:37
Socratic persuasion is more effective if I'm right: they feel more like they generated the argument themselves, and are less defensive.
Done right, I think it's more collaborative - combining my perspective and their context to find the best advice. Better for both parties!
I can integrate the new info and pivot if needed, without embarrassment, and together we converge on the right advice. It's far more robust - since the other person is actively answering questions, disagreements surface fast
The post: www.neelnanda.io/blog/51-soc...
More thoughts in 🧵
Blog post: I often give advice, to mentees, friends, etc. This is hard! I'm often missing context
My favourite approach is Socratic persuasion: guiding them through my case via questions. If I'm wrong there's soon a surprising answer!
I can be opinionated *and* truth seeking
Note: This is optimised for “jump through the arbitrary hoops of conference publishing, to maximise your chance of getting in with minimal sacrifice to your scientific integrity”. I hope to have another post out soon on how to write *good* papers.
11.05.2025 22:47
The checklist is pretty long - this is deliberate, as there's lots of moving parts! I tried to cover literally everything I could think of that should be done when submitting. I tried to italicise everything that should be fast or skippable, and obviously use your own judgement.
11.05.2025 22:47
See the checklist here:
docs.google.com/document/d/...
And check out my research sequence for more takes on the broader research process:
x.com/NeelNanda5/...
There are many moving pieces when turning a project into a machine learning conference paper, and best practices/nuances no one writes up. I made a comprehensive paper writing checklist for my mentees and am sharing a public version below - hopefully it's useful, esp for NeurIPS!
11.05.2025 22:47
Check it out here - post 4 on how to choose your research problems coming out soon!
www.alignmentforum.org/posts/Ldrss...
I think of (intuitive) taste like a neural network - decisions are data points, outcomes are labels. You'll learn organically, but can speed up!
Supervision: Papers/Ask a mentor
Sample efficiency: Reflect on *why* you were wrong
Episode length: Taste for short tasks comes faster
Post 3: What is research taste?
This mystical notion separates new and experienced researchers. It's real and important. But what is it and how to learn it?
I break down taste as the mix of intuition/models behind good open-ended decisions and share tricks to speed up learning
x.com/NeelNanda5/...
Great work by @NunoSempere and co. Check out yesterday's here:
blog.sentinel-team.org/p/global-ri...
I'm very impressed with the Sentinel newsletter: by far the best aggregator of global news I've found
Expert forecasters filter for the events that actually matter (not just noise), and forecast how likely each is to affect e.g. war, pandemics, frontier AI, etc.
Highly recommended!
As a striking example of how effective this is, Matryoshka SAEs fairly reliably get better on most metrics as you make them wider, as neural networks should. Normal sparse autoencoders do not.
02.04.2025 13:01
This slightly worsens reconstruction (it's basically regularisation), but improves performance substantially on some downstream tasks and measurements of sparsity issues!
02.04.2025 13:01
With a small tweak to the loss, we can simultaneously train SAEs of several different sizes that all work together to reconstruct things. Small ones learn high-level features, while wide ones learn low-level features!
02.04.2025 13:01
In this specific case, the work focused on how sparsity is an imperfect proxy to optimize if we actually want interpretability. In particular, wide SAEs break apart high-level concepts into narrower ones via absorption, composition, and splitting.
02.04.2025 13:01
This was great to supervise - the kind of basic science of SAEs work that's most promising IMO! Finding a fundamental issue with SAEs, fixing it with an adjustment (here a different loss), and rigorously measuring how much it's fixed. I recommend using Matryoshka where possible.
x.com/BartBussman...
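To make the thread above concrete, here is a minimal PyTorch-style sketch of that kind of nested training loss. This is my own illustrative reconstruction under stated assumptions, not the paper's code: the class name, the choice of prefix sizes, and the L1 sparsity penalty are all assumptions.

```python
import torch
import torch.nn as nn


class MatryoshkaSAE(nn.Module):
    """Sketch: one SAE whose nested latent prefixes each reconstruct the input,
    so several 'sizes' of SAE are effectively trained at once."""

    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_model, d_sae) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.W_dec = nn.Parameter(torch.randn(d_sae, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))
        # Hypothetical nested widths; the last one is the full SAE.
        self.prefix_sizes = (d_sae // 16, d_sae // 4, d_sae)

    def loss(self, x: torch.Tensor, l1_coeff: float = 1e-3) -> torch.Tensor:
        # Standard ReLU SAE encoder.
        acts = torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)
        total = torch.zeros((), device=x.device)
        # The "small tweak": every nested prefix of the latents must also
        # reconstruct x on its own, pushing early latents towards general,
        # high-level features and later ones towards narrower ones.
        for m in self.prefix_sizes:
            recon = acts[:, :m] @ self.W_dec[:m] + self.b_dec
            total = total + ((recon - x) ** 2).mean()
        # Sparsity penalty (L1 here, purely as an assumption for the sketch).
        return total + l1_coeff * acts.abs().sum(dim=-1).mean()


# Illustrative usage: sae = MatryoshkaSAE(d_model=768, d_sae=8192)
# loss = sae.loss(batch_of_residual_stream_activations)
```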
Read it here (and thanks to my girlfriend for poking me to write the first post on my main blog in several years...)
www.neelnanda.io/blog/50-str...
New post: A weird phenomenon: because I have evidence of doing good interpretability research, people assume I have good big picture takes about how it matters for AGI X-risk. I try, but these are different skillsets! Before deferring, check for *relevant* evidence of skill.
22.03.2025 12:00
More details on applied interp soon!
More broadly, the AGI Safety team is keen to get applications from both strong ML engineers and strong ML researchers.
Please apply!
DeepMind AGI Safety is hiring! I think we're among the best places in the world to do technical work to make AGI safer. I'm keen to meet some great future colleagues! Due Feb 28
One role is applied interpretability: a new subteam of my team using interp for safety in production
x.com/rohinmshah/...
Obviously, small SAEs can't capture everything. But making it larger isn't enough, as sparsity incentivises high-level concepts to be absorbed into new ones.
But not all hope is lost - our forthcoming paper on Matryoshka SAEs seems to substantially improve these issues!
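For contrast with the Matryoshka sketch above, here is a minimal sketch of the standard SAE objective: the L1 term is the sparsity incentive being blamed for absorption. Again purely illustrative; the function name and the exact form of the penalty are assumptions, not any specific implementation.

```python
import torch


def vanilla_sae_loss(x, acts, W_dec, b_dec, l1_coeff=1e-3):
    """Illustrative standard SAE objective: reconstruction + sparsity."""
    recon = acts @ W_dec + b_dec              # decode from sparse latents
    mse = ((recon - x) ** 2).mean()           # reconstruction term
    sparsity = acts.abs().sum(dim=-1).mean()  # L1 sparsity term
    # Minimising the L1 term rewards explaining x with fewer active latents -
    # the pressure that can fold broad, high-level concepts into narrower ones.
    return mse + l1_coeff * sparsity
```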