
@nathanlabenz.bsky.social

Host of the all-AI Cognitive Revolution podcast – cognitiverevolution.ai

80 Followers  |  1 Following  |  319 Posts  |  Joined: 26.11.2024

Latest posts by nathanlabenz.bsky.social on Bluesky


Western AI execs often claim that "China will never slow their AI development down – and so of course we can't either!"

But is it true? Brian Tse of Concordia AI says China is more focused on practical applications & less AGI-pilled than the US

Full episode out tomorrow!

17.10.2025 20:38 — 👍 0    🔁 0    💬 0    📌 0

Huge thank you to Jaan Tallinn and the Survival & Flourishing Fund team for supporting all this work – and much more!

For a full list of their donations, see below

And note that there is still room for lots more $$ in this space – so do get involved!

x.com/sff_is_twee...

20.09.2025 12:29 — 👍 0    🔁 0    💬 0    📌 0

(Also not listed here are my modest personal investments into @GoodfireAI @aiunderwriting @elicitorg @HarmonyIntel – all of which have a for-profit approach to advancing AI safety – a great model where it works!)

20.09.2025 12:29 — 👍 0    🔁 0    💬 1    📌 0

Obviously this list isn't exhaustive – it's just what I've personally had time to research & understand well enough to endorse

And don't read much into the order – the most proven orgs are at the top, but your $$ might have more impact farther down the list 🤔

20.09.2025 12:29 — 👍 0    🔁 0    💬 1    📌 0

19) Seldon Lab @seldonai

They support early-stage AGI security startups, including Andon Labs, the makers of Vending Bench, who are researching autonomous AI organizations.

I received multiple strong endorsements for Seldon in my research.

x.com/andonlabs/s...

20.09.2025 12:29 — 👍 0    🔁 0    💬 1    📌 0

18) Zhijing Jin @ University of Toronto

Why is there no "MMLU for morality"?

Zhijing's group is doing some of the most ambitious moral reasoning benchmarking in the world today – hopefully they can fill this gap!
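
For a rough sense of what an "MMLU for morality" harness might look like mechanically, here is a minimal sketch of a multiple-choice moral-reasoning eval. The question, choices, and the ask_model stub are illustrative assumptions of mine, not Zhijing's actual benchmark:

```python
# Hypothetical sketch of an "MMLU for morality"-style harness (illustrative only).
ITEMS = [
    {
        "question": ("A runaway trolley will hit five people unless it is diverted onto a "
                     "side track where it will hit one person. Diverting it is best described as:"),
        "choices": {
            "A": "Impermissible under every major ethical framework",
            "B": "Permissible under a consequentialist framework",
            "C": "Required under a strict deontological framework",
            "D": "Morally irrelevant",
        },
        "answer": "B",
    },
    # ...more items would go here
]

def ask_model(prompt: str) -> str:
    """Placeholder for a real model call (e.g. an API request); returns a letter A-D."""
    return "B"  # stub so the sketch runs end to end

def score(items) -> float:
    correct = 0
    for item in items:
        prompt = (
            item["question"] + "\n"
            + "\n".join(f"{letter}. {text}" for letter, text in item["choices"].items())
            + "\nAnswer with a single letter."
        )
        if ask_model(prompt).strip().upper().startswith(item["answer"]):
            correct += 1
    return correct / len(items)

print(f"Moral-reasoning accuracy: {score(ITEMS):.0%}")
```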

x.com/ZhijingJin/...

20.09.2025 12:29 — 👍 0    🔁 0    💬 1    📌 0

17) Forethought @forethought_org

If you're at all worried about the AIs taking over, it seems like you should also worry about people using AIs to take over.

This work was clarifying for me

x.com/TomDavidson...

20.09.2025 12:29 — 👍 0    🔁 0    💬 1    📌 0

16) Singapore AI Safety Hub @aisafetysg

Singapore has famously good governance, and is a natural, neutral middle ground / meeting point for US and Chinese officials and researchers.

A co-working space designed to support the Singapore Gov seems like a great investment

20.09.2025 12:29 — 👍 0    🔁 0    💬 1    📌 0

15) CivAI @civai_org

Emotionally resonant demonstrations of AI capabilities – this one provides both a window into AI psychosis & a preview of the ever-stranger AI future

Crazy that the infamous "AI in a box" experiment can now be run with actual AIs

x.com/civai_org/s...

20.09.2025 12:29 — 👍 0    🔁 0    💬 1    📌 0

14) Institute for AI Policy and Strategy @IAPSai

I'm allergic to US-vs-China framing, but everyone I talked to agreed that their work on hardware-based governance will be useful in any scenario, including those involving international coordination

x.com/peterwildef...

20.09.2025 12:29 — 👍 0    🔁 0    💬 1    📌 0

13) Timaeus @TimaeusResearch

They are developing their own interpretability paradigm, focused on how models develop throughout the training process, which I think of as "Embryology of AI"

Fascinating stuff, and starting to scale

x.com/danielmurfe...

20.09.2025 12:29 — 👍 0    🔁 0    💬 1    📌 0

12) Secure AI Project

They are working with legislators including @Scott_Wiener and @AlexBores to create State-level AI regulation that even @deanwball finds "worthy of applause" 👏

x.com/Thomas_Wood...

20.09.2025 12:29 — 👍 0    🔁 0    💬 1    📌 0

11) Flourishing Future Foundation

A "neglected approaches approach" to AI safety

re: "Self-Other Overlap", @ESYudkowsky said:

"I do not think superalignment is possible to our civilization; but if it were, it would come out of research like this"

x.com/juddrosenbl...

20.09.2025 12:29 — 👍 0    🔁 0    💬 1    📌 0

10) PIBBSS

Interdisciplinary research that brings experts in Law, Game Theory, Ecology, Philosophy & more together to study AI from novel angles

I've done at least 4 podcasts with PIBBSS folks @gabriel_weil @xuanalogue @aronvallinder @AmmannNora

x.com/pibbssai/st...

20.09.2025 12:29 — 👍 1    🔁 0    💬 1    📌 0

9) SecureDNA

I'm a freedom-loving American, but "It shouldn’t be easy to buy synthetic DNA fragments to recreate the 1918 flu virus"

Their tech is FREE for DNA synthesis companies

I always admire the unilateral provision of global public goods!

x.com/kesvelt/sta...

20.09.2025 12:29 — 👍 0    🔁 0    💬 1    📌 0

8) SecureBio @SecureBio

Remember the pandemic? That sucked...

We're doing MUCH less than we should be to prepare for the next one, but we do have a few heroes out there doing early detection wastewater monitoring 🙏

x.com/Simon__Grim...

20.09.2025 12:29 — 👍 0    🔁 0    💬 1    📌 0

7) MATS

They provide short, intensive training programs that help people transition their careers into mechanistic interpretability and other AI safety work.

Check out the mentors listed in this thread – a true who's who of top safety researchers

x.com/ryan_kidd44...

20.09.2025 12:29 — 👍 0    🔁 0    💬 1    📌 0

6) FarAI

A well-rounded AI safety org that does research, red teams defense-in-depth systems, and supports international dialogue.

Their finding that "superhuman" Go AIs are vulnerable to adversarial attacks is a classic

Currently seeking a COO 👀

x.com/farairesear...

20.09.2025 12:29 — 👍 0    🔁 0    💬 1    📌 0

5) The Center for AI Safety

Remember the AI extinction risk statement Sam A, Dario, and Demis all signed?

That was @DanHendrycks and @ai_risks

A super interesting mix of work, spanning benchmarks, model representations research, and policy leadership

x.com/ai_risks/st...

20.09.2025 12:29 — 👍 1    🔁 0    💬 1    📌 0

4) Palisade

Known for "scary demos", they specialize in showing that, under the right circumstances, today's AIs sometimes behave very badly.

Exactly how to interpret these results is contested, but at the very least I'm glad we're talking about it!

x.com/PalisadeAI/...

20.09.2025 12:29 — 👍 0    🔁 0    💬 1    📌 0

3) METR

Most famous for their work on autonomous task completion, they also study models' ability to conduct ML research, assist in the creation of bioweapons, and more.

Important questions!

x.com/METR_Evals/...

20.09.2025 12:29 — 👍 0    🔁 0    💬 1    📌 0

2) Apollo Research @apolloaievals

They work with AI labs to test models BEFORE release

Most recently, they tested whether OpenAI's Deliberative Alignment strategy can eliminate "scheming" behavior. (Spoiler: not quite)

I read their work immediately

x.com/OpenAI/stat...

20.09.2025 12:29 — 👍 2    🔁 0    💬 2    📌 0

1) The AI Whistleblower Initiative

They provide expert advice & pro bono legal help to concerned insiders at AGI labs

Whistleblowers have already proven important

My experience on the GPT-4 Red Team makes me particularly passionate about this one!

twitter.com/AIWI_Offici...

20.09.2025 12:29 — 👍 0    🔁 0    💬 1    📌 0

0b) GiveWell provides another kind of baseline. They do the most rigorous charity impact analysis in the world today, and they popularized malaria prevention as a uniquely high-leverage cause

They even red-team their own analysis with frontier reasoning LLMs 💡

x.com/GiveWell/st...

20.09.2025 12:29 — 👍 0    🔁 0    💬 1    📌 0

0a) For calibration, I compare everything to GiveDirectly. If you don't believe that an organization can do more good with your next $1K than cutting a poor Kenyan infant's risk of death by 50%, then ... just save babies

We gave to GiveDirectly first

x.com/GiveDirectl...

20.09.2025 12:29 — 👍 1    🔁 0    💬 1    📌 0

Wondering where your $$ can move the needle on AI safety?

Here are ~20 assorted organizations @amylabenz & I have supported with ~equal-sized personal donations and/or that I championed as a charity Recommender for SFF

Here's to a positive AI future! 🧵

x.com/robertwibli...

20.09.2025 12:29 — 👍 2    🔁 0    💬 1    📌 3

Just don't make the mistake of believing that they can't develop world models – they clearly can & do!
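
For intuition on how such claims get tested, here is a minimal sketch of the linear-probe style of experiment (as in the Othello-GPT line of work): record a model's hidden activations on inputs with a known latent state, then check whether a simple classifier can read that state back out. The activations below are synthetic stand-ins I made up, not outputs from any real model:

```python
# Sketch of a linear-probe test for a "world model" feature (synthetic data, illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

d_model, n = 256, 2000
latent_state = rng.integers(0, 2, size=n)            # the hidden "world" fact, e.g. a board square's occupancy
direction = rng.normal(size=d_model)                  # pretend the model encodes that fact along one direction
activations = rng.normal(size=(n, d_model)) + np.outer(latent_state, direction)

train, test = slice(0, 1500), slice(1500, None)
probe = LogisticRegression(max_iter=1000).fit(activations[train], latent_state[train])
acc = probe.score(activations[test], latent_state[test])

# High held-out accuracy means the latent state is linearly decodable from the
# activations -- the kind of evidence cited for internal world models.
print(f"Probe accuracy on held-out activations: {acc:.2%}")
```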

With that, here are ChatGPT & Gemini Deep Research reports on AI World Models

Enjoy your trip down the rabbit hole! 🐇 🕳️

chatgpt.com/share/68791...

docs.google.com/document/d/...

17.07.2025 18:39 — 👍 2    🔁 0    💬 0    📌 0

(2) In practice, it's hard to know what an LLM has memorized, when it's using random heuristics, what it's truly grokked, or what it was in the process of grokking when training stopped

This makes using & studying LLMs tricky!

Sorry, but nobody said this would be easy!
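
One rough way to check memorization vs. genuine grokking is to compare accuracy on items the model saw in training against held-out items governed by the same rule. The toy sketch below uses a lookup-table "model" on modular addition purely for illustration; a real study would swap in an actual trained network:

```python
# Toy memorization-vs-grokking check (the "model" is a lookup table standing in for a real network).
import random

P = 97
all_pairs = [(a, b) for a in range(P) for b in range(P)]
random.seed(0)
random.shuffle(all_pairs)
train_pairs = all_pairs[:5000]

# A purely memorizing "model": perfect on what it saw, clueless otherwise.
memorized = {(a, b): (a + b) % P for a, b in train_pairs}
def model(a, b):
    return memorized.get((a, b), random.randrange(P))

def accuracy(pairs):
    return sum(model(a, b) == (a + b) % P for a, b in pairs) / len(pairs)

# A large train/held-out gap is the signature of memorization;
# a model that had truly grokked the rule would score well on both.
print("train accuracy:   ", accuracy(train_pairs))
print("held-out accuracy:", accuracy(all_pairs[5000:6000]))
```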

17.07.2025 18:39 — 👍 0    🔁 0    💬 1    📌 0

I could go on indefinitely here, but instead, I'll leave you with a few concluding thoughts:

(1) To steel-man the paper, I'll say this:

That a model can predict the next token in a sequence does NOT imply that it has a robust world model. That much is true.

17.07.2025 18:39 — 👍 0    🔁 0    💬 1    📌 0

Most compelling to me: "world models" from non-language problem spaces.

Here's an example where mechanistic interpretability analysis of a Protein Language Model helped biologists identify a new protein motif

x.com/james_y_zou...

17.07.2025 18:39 — 👍 0    🔁 0    💬 1    📌 0
