
Toby Ord

@tobyord.bsky.social

Senior Researcher at Oxford University. Author — The Precipice: Existential Risk and the Future of Humanity. tobyord.com

917 Followers  |  36 Following  |  211 Posts  |  Joined: 20.11.2024

Latest posts by tobyord.bsky.social on Bluesky

Today is Giving Tuesday, and you can 100x the impact of your donations by finding the most effective charities.

This year, needs across global health, animal welfare, and catastrophic risk are rising while some major funders step back.

02.12.2025 14:42 — 👍 4    🔁 2    💬 1    📌 0
The abstract of the consistency training paper.

New Google DeepMind paper: "Consistency Training Helps Stop Sycophancy and Jailbreaks" by @alexirpan.bsky.social, me, Mark Kurzeja, David Elson, and Rohin Shah. (thread)

04.11.2025 00:18 — 👍 18    🔁 5    💬 1    📌 1

Frontier AI could reach or surpass human level within just a few years. This could help solve global issues, but also carries major risks. To move forward safely, we must develop robust technical guardrails and make sure the public has a much stronger say. superintelligence-statement.org

22.10.2025 16:24 — 👍 16    🔁 3    💬 0    📌 2
When it Comes to AI, What We Don't Know Can Hurt Us Yoshua Bengio and Charlotte Stix explain how companies' internal, often private, AI development is a threat to society.

In an op-ed published today in TIME, Charlotte Stix and I discuss the serious risks associated with internal deployment by frontier AI companies.
We argue that maintaining transparency and effective public oversight are essential to safely manage the trajectory of AI.
time.com/7327327/ai-w...

22.10.2025 20:06 — 👍 12    🔁 2    💬 1    📌 0

What ideas are already out there, just waiting on someone to really feel their power and bring them down from the ivory tower?

13.10.2025 17:11 — 👍 7    🔁 0    💬 1    📌 0

During the Q&A, someone asked what we can learn about how to write an influential paper. Equally important is what we can learn about reading such a paper. So many philosophers had read it in the intervening generation, but none had taken it seriously.

13.10.2025 17:05 — 👍 6    🔁 0    💬 1    📌 0

It made me realise for the first time that I was essential in making it so — that one Australian in Oxford in 1971 had thrown the ball far far down the field, to be received by another Australian in Oxford in 2004.

13.10.2025 17:01 — 👍 4    🔁 0    💬 1    📌 0
Death in a Shallow Pond From the bestselling coauthor of Wittgenstein’s Poker, a fascinating account of Peter Singer’s controversial “drowning child” thought experiment—and how it changed the way people think about charitabl...

The other evening I attended the launch of David Edmonds' book on Peter Singer's Shallow Pond. I was quite struck when he called it 'the most influential thought experiment in the history of moral philosophy' yet with no influence for its first 30 years…
🧵
press.princeton.edu/books/hardco...

13.10.2025 16:55 — 👍 10    🔁 1    💬 1    📌 0

We’re hiring!

Society isn’t prepared for a world with superhuman AI. If you want to help, consider applying to one of our research roles:
forethought.org/careers/res...

Not sure if you’re a good fit? See more in the reply (or just apply — it doesn’t take long)

13.10.2025 08:14 — 👍 6    🔁 3    💬 1    📌 0
Evidence that Recent AI Gains are Mostly from Inference-Scaling — Toby Ord In the last year or two, the most important trend in modern AI came to an end. The scaling-up of computational resources used to train ever-larger AI models through next-token prediction ( pre-trainin...

If you want to go a little deeper, see my full post:
www.tobyord.com/writing/most...
14/14

03.10.2025 19:40 — 👍 0    🔁 0    💬 0    📌 0
Inference Scaling Reshapes AI Governance — Toby Ord The shift from scaling up the pre-training compute of AI systems to scaling up their inference compute may have profound effects on AI governance. The nature of these effects depends crucially on whet...

So it looks like most of the gains are coming from the ability to spend more compute on each answer rather than from better ability to reason for the same token budget.
This shift to inference-scaling has big implications for AI business, governance, and risk:
www.tobyord.com/writing/infe...
13/

03.10.2025 19:39 — 👍 0    🔁 0    💬 1    📌 0

And here are the relative boosts.
Overall the inference scaling produced 82%, 63%, and 92% of the total performance gains on the different benchmarks.
12/

03.10.2025 19:38 — 👍 2    🔁 0    💬 1    📌 0

As you can see, most of the boost is coming from the inference-scaling that the RL training has enabled.
The same is true for the other benchmarks I examined. Here are the raw scatterplots:
11/

03.10.2025 19:37 — 👍 2    🔁 0    💬 1    📌 0

We can draw the trend on the chart, then divide the performance boost in two:
• the RL boost taking the base model to the trend line
• the inference-scaling boost taking it to the top of the trend
10/

03.10.2025 19:37 — 👍 2    🔁 0    💬 1    📌 0
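[Editor's note] The decomposition in the post above can be sketched numerically. The data points below are hypothetical, purely for illustration, and are not taken from the benchmark results in the thread: a reasoning model is evaluated at several inference budgets, a score-vs-log-tokens trend is fitted, and the total boost over the base model is split into an RL lift (base model up to the trend line) and an inference-scaling lift (sliding up the trend to the larger token budget).

```python
import math

# Hypothetical (tokens, score) points for a reasoning model at
# several inference budgets -- illustrative numbers only.
reasoning = [(1000, 0.40), (2000, 0.50), (4000, 0.60), (8000, 0.70)]
base_tokens, base_score = 800, 0.30  # hypothetical base model

# Fit score = a + b * log(tokens) by least squares over the reasoning points.
xs = [math.log(t) for t, _ in reasoning]
ys = [s for _, s in reasoning]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

def trend(tokens):
    """Predicted score on the inference-scaling trend line."""
    return a + b * math.log(tokens)

# Split the total boost of the best reasoning run over the base model:
best_tokens, best_score = reasoning[-1]
rl_boost = trend(base_tokens) - base_score                 # lift up to the trend line
inference_boost = trend(best_tokens) - trend(base_tokens)  # lift from more tokens
total = best_score - base_score
share = inference_boost / total  # fraction of the gain from inference-scaling

print(f"RL boost: {rl_boost:.3f}, inference boost: {inference_boost:.3f}, "
      f"inference share: {share:.0%}")
```

With these made-up numbers most of the gain comes from the inference-scaling term, mirroring the shape of the argument in the thread (though not its actual figures).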

Note how there is a clear trend line for the reasoning models, showing how their performance scales with more inference. The base model is slightly below this trend.
9/

03.10.2025 19:36 — 👍 1    🔁 0    💬 1    📌 0

I worked out a nice clean way to separate this out. Here is data from the MATH level 5 benchmark, showing performance vs token-use for a base model (Sonnet 3.6 – orange square) and its reasoning model (Sonnet 3.7 – red circles).
8/

03.10.2025 19:35 — 👍 2    🔁 0    💬 1    📌 0

But it turns out that even when reasoning is turned off, these models are using many more tokens to generate their answers, so even this boost is partly just from RL and partly from the inference-scaling.
7/

03.10.2025 19:35 — 👍 1    🔁 0    💬 1    📌 0

Often people assume it is mostly about the training. One piece of evidence for this is that even without reasoning turned on, a reasoning model seems to perform substantially better than its base model (i.e. a model that differs only in not having the RL training).
6/

03.10.2025 19:34 — 👍 1    🔁 0    💬 1    📌 0

But it is hard to tease out how much of the benefits of RL are coming directly from the training (1) and how much are coming from using far more tokens to run it (2).
5/

03.10.2025 19:33 — 👍 1    🔁 0    💬 1    📌 0

But (2) is less rosy.
For the largest AI companies, most costs come from deploying models to customers. If you need to 10x or 100x those costs, that is very expensive. And unlike training, it can't be made up in volume.
4/

03.10.2025 19:33 — 👍 1    🔁 0    💬 1    📌 0

Many people focus on (1).
This is the bull case for RL scaling — it started off small compared to internet-scale pre-training, so can be scaled 10x or 100x before doubling overall training compute.
3/

03.10.2025 19:32 — 👍 1    🔁 0    💬 1    📌 0

Scaling up AI using next-token prediction was the most important trend in modern AI. It stalled out over the last couple of years and has been replaced by RL scaling.

This has two parts:
1. Scaling RL training
2. Scaling inference compute at deployment

2/

03.10.2025 19:31 — 👍 1    🔁 0    💬 1    📌 0
Evidence that Recent AI Gains are Mostly from Inference-Scaling — Toby Ord In the last year or two, the most important trend in modern AI came to an end. The scaling-up of computational resources used to train ever-larger AI models through next-token prediction ( pre-trainin...

Evidence that Recent AI Gains are Mostly from Inference-Scaling
🧵
Here's a thread about my latest post on AI scaling …
1/14
www.tobyord.com/writing/most...

03.10.2025 19:31 — 👍 9    🔁 4    💬 1    📌 0

"It has gone largely unnoticed that time spent on social media peaked in 2022 and has since gone into steady decline."

By @jburnmurdoch.ft.com

www.ft.com/content/a072...

03.10.2025 12:04 — 👍 143    🔁 29    💬 1    📌 13
A bar chart illustrates the estimated lives saved each year by various American foreign aid programs, totaling approximately 3.3 million lives saved annually. The programs listed from top to bottom include:

- HIV/AIDS: 1.6 million lives saved per year
- Humanitarian aid: 550,000 lives saved per year
- Vaccines: 500,000 lives saved per year
- Tuberculosis: 310,000 lives saved per year
- Malaria: 290,000 lives saved per year

At the bottom, a note indicates that the figures represent central estimates and that actual estimates may range from 2.3 to 5.6 million lives saved. It clarifies that these numbers do not encompass other vital forms of aid such as water and sanitation, nutrition, and family planning. The source of the data is credited to Kenny & Sandefur, 2025. The visual includes a label stating "Our World in Data" and is presented under a creative commons attribution license (CC BY).


✍️ New article: “Foreign aid from the United States saved millions of lives each year”

For decades, these aid programs received bipartisan support and made a difference. Cutting them will cost lives.

30.09.2025 09:20 — 👍 72    🔁 28    💬 3    📌 1
AI isn't replacing radiologists Radiology combines digital images, clear benchmarks, and repeatable tasks. But demand for human radiologists is at an all-time high.

An insightful piece by Deena Mousa about how AI performs extremely well at benchmarks for reading medical scans, yet isn't putting radiologists out of work. Lots to learn for other knowledge-work professions here.
www.worksinprogress.news/p/why-ai-isn...

29.09.2025 09:01 — 👍 10    🔁 2    💬 0    📌 2

My best guess is that the solution doesn't lie in changing the values we assign to infinite (or very large) futures, but in adjusting morality to be less demanding. i.e. that it isn't *better* to prioritise our generation over the others, but it may be permissible.

26.09.2025 14:51 — 👍 0    🔁 0    💬 0    📌 0

I think the fanaticism problem (of concern for the whole future always winning out over everyday issues) is a real issue, and this paper doesn't try to resolve it. I think the issue is not due to the infinite per se as it comes up in astronomical finite cases too.

26.09.2025 14:49 — 👍 1    🔁 0    💬 1    📌 0
Evaluating the Infinite I present a novel mathematical technique for dealing with the infinities arising from divergent sums and integrals. It assigns them fine-grained infinite values from the set of hyperreal numbers in a ...

Please do check out the paper if you're interested!
20/20
arxiv.org/abs/2509.19389

25.09.2025 15:45 — 👍 3    🔁 0    💬 0    📌 0

That said, my method doesn't attempt to solve every problem that can arise from infinite value, and it does have some remaining issues which I outline in the paper. I think of it more as a proof-of-concept that there is a technical solution, than as a completed one.
19/

25.09.2025 15:44 — 👍 2    🔁 0    💬 1    📌 0
