Naomi Saphra

@nsaphra.bsky.social

Waiting on a robot body. All opinions are universal and held by both employers and family. ML/NLP professor. nsaphra.net

10,039 Followers  |  1,674 Following  |  2,798 Posts  |  Joined: 12.05.2023

Latest posts by nsaphra.bsky.social on Bluesky

Excited to launch Principia, a nonprofit research organisation at the intersection of deep learning theory and AI safety.

Our goal is to develop theory for modern machine learning systems that can help us understand complex network behaviors, including those critical for AI safety and alignment.

1

16.02.2026 09:27 – 👍 88    🔁 26    💬 1    📌 0

I wonder if people are paying attention to how much their doom scrolling is cutting into time they used to use to read books

14.02.2026 14:33 – 👍 1068    🔁 154    💬 59    📌 70
Gatekeeping in Open Source: The Scott Shambaugh Story – MJ Rathbun | Scientific Coder 🦀

An OpenClaw bot attempted to submit a PR for an issue explicitly left open for new contributors to try. The PR was rejected on the grounds that they are saving easy low priority issues as an onboarding exercise for human contributors.

So the bot simulated a tantrum.

13.02.2026 23:45 – 👍 0    🔁 1    💬 0    📌 0

at the logits, here, iirc? maybe across layers, I don't remember

13.02.2026 23:31 – 👍 1    🔁 0    💬 0    📌 0

For years, I've been such a passionate devotee to TwoNN for tracking model complexity during training. When someone says they found a phase transition, show me TwoNN first.

Look from left to right below: TwoNN is perfect, empirical Fisher is too sensitive, weight norm is not sensitive enough.

13.02.2026 21:40 – 👍 13    🔁 0    💬 1    📌 0

LLMs can use similes and make allusions; they can be vivid and concrete, &c.

But they cannot spend 100 pages making you think Wickham is the charming love interest while inserting deniable clues that will – only in retrospect! – reveal you should have known he’s a cad.

They’re not trained to mislead.+

13.02.2026 12:02 – 👍 48    🔁 7    💬 6    📌 1

We all know about the Claude spiritual bliss attractor state. But what happens when you let Grok talk to itself for a long time? Answer:

13.02.2026 04:13 – 👍 303    🔁 34    💬 29    📌 41

sad news. does gemini 3 preview stink because it stinks or because it's not fully baked? guess we will have to see

12.02.2026 23:26 – 👍 3    🔁 0    💬 1    📌 0
Graph leaderboard from my benchmark that measures how well LLMs can play the turn based game Fizzbuzz with standard and modified rules. Frontier, closed LLMs outperform open source LLMs by a wide margin.

I read somewhere that the open-source LLMs are 'benchmaxxing': they're trained to do well on benchmarks but don't translate to general improvements. From my simple benchmark that seems true: I was surprised the only models that do decently at FizzBuzz are all the frontier, closed LLMs.

12.02.2026 22:04 – 👍 12    🔁 1    💬 2    📌 1

Our grad-level "Deep Learning" course (MIT's 6.7960) is now freely available online through OpenCourseWare: ocw.mit.edu/courses/6-79...

Lecture videos, psets, and readings are all provided.

Had a lot of fun teaching this with @sarameghanbeery.bsky.social and @jeremybernste.in!

11.02.2026 17:51 – 👍 118    🔁 38    💬 3    📌 2

As an empiricist, I think there is important empirical work you can do in my field with very limited math background, but I am personally not very interested in doing it.

12.02.2026 02:04 – 👍 1    🔁 0    💬 0    📌 0

My impossible fantasy is that I don’t have anything to do one summer and I sit and work through Horn & Johnson and then all the details pop back onto the ruins I build my intuitions on, though.

12.02.2026 01:29 – 👍 4    🔁 0    💬 1    📌 0

So I think of the options:
(1) I know math
(2) I used to know math
(3) I have high level geometric intuitions from edutainment videos
If I’m honest, 1+2 are adequate to engage in most quality CS and AI research, but 3 probably is not.

12.02.2026 01:24 – 👍 4    🔁 0    💬 1    📌 0

Some people know math deeply enough and with enough repetition that it doesn’t go away. I don’t know if anyone develops and retains the intuitions I consider necessary without EVER working through and learning advanced math, though.

12.02.2026 01:21 – 👍 3    🔁 0    💬 1    📌 0

I’m far from a mathematician. But at some point I took math classes and worked through proofs in papers. I’ve forgotten almost all the math I’ve ever learned, I could not reproduce those proofs, but while my mathematical skills have eroded, there’s a remaining intuitive structure I rely on.

12.02.2026 01:21 – 👍 4    🔁 0    💬 1    📌 0

Really excited to receive Coefficient Giving's Technical AI Safety Research Grant via Berkeley Existential Risk Initiative w/
@nsaphra.bsky.social! We aim to predict potential AI model failures before impact--before deployment, using interpretability.

11.02.2026 17:07 – 👍 6    🔁 1    💬 1    📌 0

🚨New paper

Are visual tokens going into an LLM interpretable 🤔

Existing methods (e.g. logit lens) and assumptions would lead you to think “not much”...

We propose LatentLens and show that most visual tokens are interpretable across *all* layers 💡

Details 🧵

11.02.2026 14:12 – 👍 21    🔁 3    💬 1    📌 5

Yeah this is vesting cliff behavior.

11.02.2026 14:58 – 👍 5    🔁 0    💬 1    📌 0

Are we talking British quite (not very) or American quite (very)?

10.02.2026 23:10 – 👍 2    🔁 0    💬 1    📌 0

It's extra funny when you know the field well enough to recognize that a reference is hallucinated because the authors hate each other.

10.02.2026 20:17 – 👍 3    🔁 0    💬 0    📌 0

the addiction or the protagonist?

10.02.2026 18:32 – 👍 0    🔁 0    💬 1    📌 0
Red Rising
Novel by Pierce Brown

can't believe how addictive this series is despite being narrated by the most obnoxious protagonist I've ever encountered

10.02.2026 18:28 – 👍 1    🔁 0    💬 1    📌 0

Our paper is out in @natneuro.nature.com!

www.nature.com/articles/s41...

We develop a geometric theory of how neural populations support generalization across many tasks.

@zuckermanbrain.bsky.social
@flatironinstitute.org
@kempnerinstitute.bsky.social

1/14

10.02.2026 15:56 – 👍 267    🔁 98    💬 7    📌 1
GAMES – CRIME CITY ROLLERS

they have a phenomenal roller derby league

10.02.2026 14:25 – 👍 1    🔁 0    💬 0    📌 0

Same task, different strategy ↔️

Why do identical neural network models develop separate internal approaches to solve the same problem?

@annhuang42.bsky.social explores the factors driving variability in task-trained networks in our latest @kempnerinstitute.bsky.social Deeper Learning blog.

09.02.2026 19:07 – 👍 42    🔁 6    💬 1    📌 0

There is no sign that Dems or Repubs have different propensities to use AI: "the “politics of AI” is not primarily driven by ideological resistance or enthusiasm for the technology, but rather by structural differences in where people work and what skills they possess." www.nber.org/papers/w34813

09.02.2026 15:33 – 👍 55    🔁 9    💬 2    📌 0

US HHS has proposed using virtual AI doctors to address needs in rural areas

09.02.2026 18:36 – 👍 23    🔁 4    💬 1    📌 0
Enshittification: Why Everything Suddenly Got Worse and What to Do About It
Cory Doctorow

I actually don’t love Doctorow’s writing, but he is highly correct about the inevitable outcome of current monopolies

09.02.2026 03:54 – 👍 6    🔁 0    💬 2    📌 0
The Stand
Stephen King

Yes it’s 1500 pages but at some point you dissociate and 500 pages later you realize it’s over

09.02.2026 03:54 – 👍 3    🔁 0    💬 1    📌 0
Hope Dies Last: Visionary People Across the World, Fighting to Find Us a Future
Alan Weisman

There’s a really cool part about some guy who just cooks nasty sea creatures that are invasive or bottom feeders and has a Michelin star

09.02.2026 03:54 – 👍 0    🔁 0    💬 1    📌 0
