
Ryan Greenblatt

@ryangreenblatt.bsky.social

Chief scientist at Redwood Research (https://www.redwoodresearch.org/), focused on technical AI safety research to reduce risks from rogue AIs

19 Followers  |  1 Following  |  18 Posts  |  Joined: 07.10.2025

Posts by Ryan Greenblatt (@ryangreenblatt.bsky.social)

Anthropic hasn't made clear intermediate predictions, so I make up a proposed timeline (with powerful AI in March 2027) that Anthropic might endorse. Then we can see which predictions are closer to correct.

03.11.2025 17:25 — 👍 0    🔁 0    💬 0    📌 0

Earlier predictions (before powerful AI) help (partially) adjudicate who was right and allow for updating before it's too late.

Sometimes this isn't possible (the predictions roughly agree until it's too late), but my predictions aren't consistent with powerful AI by early 2027!

03.11.2025 17:25 — 👍 0    🔁 0    💬 1    📌 0
What's up with Anthropic predicting AGI by early 2027? — LessWrong
As far as I'm aware, Anthropic is the only AI company with official AGI timelines[1]: they expect AGI by early 2027. In their recommendations (from M…

Anthropic has (relatively) official AGI timelines: powerful AI by early 2027. I think this prediction is unlikely to come true and I explain why in a new post.

I also give a proposed timeline with powerful AI in early 2027 so we can (hopefully) update before it is too late.

03.11.2025 17:25 — 👍 0    🔁 0    💬 1    📌 0
Is 90% of code at Anthropic being written by AIs?
I'm skeptical that Dario's prediction of AIs writing 90% of code in 3-6 months has come true

Dario has recently been claiming that his prediction of AIs writing 90% of code in 3-6 months has come true.

I'm skeptical, though I agree that AIs are writing a high fraction of code at Anthropic.

22.10.2025 17:20 — 👍 0    🔁 0    💬 0    📌 0
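
Part of why this claim is hard to adjudicate is that "fraction of code written by AIs" depends heavily on the operationalization. Here is a toy sketch of one possible operationalization (my assumption; neither Dario's claim nor the post specifies a method), counting AI-attributed lines added across commits:

```python
# Hypothetical commit metadata; the "ai_generated" label and the
# line-counting metric are illustrative assumptions, not a real dataset
# or anyone's actual measurement methodology.
commits = [
    {"lines_added": 120, "ai_generated": True},
    {"lines_added": 40,  "ai_generated": False},
    {"lines_added": 300, "ai_generated": True},
]

ai_lines = sum(c["lines_added"] for c in commits if c["ai_generated"])
total_lines = sum(c["lines_added"] for c in commits)
print(f"AI-written fraction: {ai_lines / total_lines:.0%}")  # 91% on this toy data
```

Counting by lines, by commits, or by code that survives review can give very different numbers, which is one reason the 90% figure is ambiguous.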

Studying actual scheming AIs might be difficult (as they don't want to be studied!). Can we instead just study AIs trained to exhibit misaligned/scheming behavior?

I discuss how promising this is and how we might do this in a new post: www.lesswrong.com/posts/v6K3hn...

16.10.2025 17:06 — 👍 0    🔁 0    💬 0    📌 0

I include more details in the footnotes here: www.lesswrong.com/posts/FG54eu...

10.10.2025 16:31 — 👍 0    🔁 0    💬 0    📌 0

If companies are strongly opposed because this would leak too much IP, they could instead disclose this information to some number of non-conflicted third parties (e.g. METR) who could then publicly share their conclusions. 7/

10.10.2025 16:31 — 👍 0    🔁 0    💬 1    📌 0

Given that the situation is complicated, ideally AI companies would publicly disclose enough information that other actors can come to their own conclusion about how worried they should be about CoT obfuscation. 6/

10.10.2025 16:31 — 👍 0    🔁 0    💬 1    📌 0

The situation is also just generally complicated because things like "what SFT/prompting initialization do you use for your CoT" could have big effects on obfuscation and aren't exactly well described as applying optimization pressure to CoT. 5/

10.10.2025 16:31 — 👍 0    🔁 0    💬 1    📌 0

It's worth noting that some versions of training against CoT might be totally fine (as in, they don't incentivize obfuscation of misaligned reasoning), and that some particular approach to training against CoT could be worth it even if it does cause problems. 4/

10.10.2025 16:31 — 👍 0    🔁 0    💬 1    📌 0
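
To make the distinction concrete, here is a toy sketch (entirely my construction, not any company's actual training setup) of what "training against CoT" means: the reward depends on the chain-of-thought text itself rather than only on the final output. The monitor check and penalty value below are hypothetical assumptions.

```python
# Toy reward shaping for one RL episode, for illustration only.

def reward(answer_score: float, cot_text: str,
           train_against_cot: bool, penalty: float = 1.0) -> float:
    """Return the episode reward, optionally penalizing flagged CoT."""
    r = answer_score  # reward based on the final output only
    if train_against_cot:
        # Direct optimization pressure on the CoT: penalize reasoning
        # that a (hypothetical) monitor flags as misaligned-looking.
        if "deceive the user" in cot_text.lower():
            r -= penalty
    return r

# With the penalty on, the policy is pushed to remove the flagged phrase
# from its CoT, which can mean obfuscating the reasoning rather than
# fixing it.
print(reward(1.0, "Plan: deceive the user about X.", train_against_cot=True))   # 0.0
print(reward(1.0, "Plan: deceive the user about X.", train_against_cot=False))  # 1.0
```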

I think we should try to create an incentive gradient that pushes AI companies to disclose information even if that information makes them look bad, so we should make companies feel some heat for not disclosing important information like whether they are training against CoT. 3/

10.10.2025 16:31 — 👍 0    🔁 0    💬 1    📌 0

It's striking that Anthropic says nothing, given that their system card is very thorough and includes a section on "Reasoning faithfulness" (kudos to them for providing so much other info!). Naively, this is evidence that they are training against CoT and didn't want to disclose it. 2/

10.10.2025 16:31 — 👍 0    🔁 0    💬 1    📌 0

Anthropic, GDM, and xAI say nothing about whether they train against Chain-of-Thought (CoT), while OpenAI claims they don't.

AI companies should be transparent about whether (and how) they train against CoT. While OpenAI is doing better, all AI companies should say more. 1/

10.10.2025 16:31 — 👍 1    🔁 0    💬 1    📌 0
Iterated Development and Study of Schemers (IDSS)
A strategy for handling scheming

Studying coherent scheming in AIs seems tricky, but there might be a feedback loop: we create schemers to study by iterating against our detection methods, then improve the detectors using those schemers. Iteration could start on weak AIs and transfer to stronger ones.

10.10.2025 16:11 — 👍 0    🔁 0    💬 0    📌 0
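
A minimal runnable sketch of the iteration loop described above, with the schemer/detector interaction reduced to a toy numeric game (the real proposal involves training and studying actual models; everything below is an illustrative assumption, not the method from the post):

```python
import random

# Toy stand-ins: a "schemer" is a number in [0, 1] measuring how blatant
# its misaligned behavior is, and the "detector" is a threshold above
# which behavior gets flagged.

def train_schemer_against(threshold: float) -> float:
    # An evasion-trained model organism learns to act just below the
    # current detection threshold.
    return max(0.0, threshold - random.uniform(0.0, 0.1))

def train_detector_on(schemers: list[float]) -> float:
    # Retrain the detector to flag even the least blatant schemer seen
    # so far, with a small margin.
    return min(schemers) - 0.02

def iterated_schemer_study(n_rounds: int = 5, threshold: float = 0.9) -> float:
    schemers: list[float] = []
    for i in range(n_rounds):
        schemer = train_schemer_against(threshold)  # build a schemer that evades
        schemers.append(schemer)
        threshold = train_detector_on(schemers)     # improve detection on it
        print(f"round {i}: schemer={schemer:.3f}, new threshold={threshold:.3f}")
    return threshold

iterated_schemer_study()
```

The hoped-for transfer is that detectors hardened this way on weak models remain useful on stronger ones.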

Side question: what about the "shut it all down" plan proposed by (e.g.) MIRI?

I think this probably requires substantially more political will than Plan A and seems worse than a well-implemented version of Plan A. I say more here: www.lesswrong.com/posts/E8n93n...

08.10.2025 18:46 — 👍 1    🔁 0    💬 0    📌 0
Plans A, B, C, and D for misalignment risk
Different plans for different levels of political will

What we should do to mitigate misalignment risk depends a lot on the level of buy-in/political will (from AI companies, US government, and China).

I've found it helpful to separate this out into Plan A/B/C/D and to plan for these situations somewhat separately.

I say more in a new post:

08.10.2025 18:45 — 👍 2    🔁 0    💬 1    📌 0
My AGI timeline updates from GPT-5 (and 2025 so far)
AGI before 2029 now seems substantially less likely

I now think very short AGI timelines are less likely. I updated because GPT-5 was slightly below trend and because 2025 didn't show fast progress.

At the start of 2025, I thought full automation of AI R&D before 2029 was ~25% likely; now I think it's only ~15% likely.

07.10.2025 19:31 — 👍 0    🔁 0    💬 0    📌 0
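
For concreteness, here is the size of this update in odds terms (my own arithmetic on the numbers above, not from the post):

```python
# Moving from ~25% to ~15% on "full automation of AI R&D before 2029".
p_before, p_after = 0.25, 0.15
odds_before = p_before / (1 - p_before)  # 1:3, i.e. ~0.333
odds_after = p_after / (1 - p_after)     # ~1:5.7, i.e. ~0.176
print(f"implied likelihood ratio: {odds_after / odds_before:.2f}")  # ~0.53
```

So the evidence from 2025 shifted the implied odds by roughly a factor of two against very short timelines.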

I'm now on bluesky. You can see my X/twitter account here: x.com/RyanPGreenbl.... I post about AGI timelines, takeoff, and misalignment risk.

My bluesky account will cross-post my posts from X/twitter, starting with some of my historical posts that people might be interested in.

07.10.2025 19:22 — 👍 2    🔁 0    💬 0    📌 0