akbir khan

@akbir.bsky.social

dumbest overseer at @anthropic https://www.akbir.dev

359 Followers  |  156 Following  |  30 Posts  |  Joined: 18.11.2024  |  1.721

Latest posts by akbir.bsky.social on Bluesky

We’ve added four new benchmarks to the Epoch AI Benchmarking Hub: Aider Polyglot, WeirdML, Balrog, and Factorio Learning Environment!

Before we only featured our own evaluation results, but this new data comes from trusted external leaderboards. And we've got more on the way 🧡

08.05.2025 15:00 - 👍 5    🔁 2    💬 1    📌 0
Factorio Learning Environment
Claude Sonnet 3.5 builds factories

4. Factorio Learning Environment by Jack Hopkins, Märt Bakler, and @akbir.bsky.social

This benchmark uses the factory-building game Factorio to test complex, long-term planning, with settings for lab-play (structured tasks) and open-play (unbounded growth).
jackhopkins.github.io/factorio-lea...

08.05.2025 15:00 - 👍 3    🔁 1    💬 1    📌 0

New Anthropic blog post: Subtle sabotage in automated researchers.

As AI systems increasingly assist with AI research, how do we ensure they're not subtly sabotaging that research? We show that malicious models can undermine ML research tasks in ways that are hard to detect.

25.03.2025 16:03 - 👍 4    🔁 3    💬 1    📌 0
Controlling powerful AI
YouTube video by Anthropic

control is a complementary approach to alignment.

it's really sensible, practical and can be done now, even before systems are superintelligent.

youtu.be/6Unxqr50Kqg?...

18.03.2025 15:22 - 👍 4    🔁 1    💬 0    📌 0

This is a crazy paper. Fine-tuning a big GPT-4o on a small amount of insecure code or even "bad numbers" (like 666) makes it misaligned in almost everything else. The model becomes more likely to offer misinformation, spout anti-human values, and talk about admiring dictators. Why is unclear.

25.02.2025 21:01 - 👍 214    🔁 43    💬 7    📌 19
Statement from Dario Amodei on the Paris AI Action Summit
A call for greater focus and urgency

www.anthropic.com/news/paris-a...

11.02.2025 21:08 - 👍 1    🔁 0    💬 0    📌 0

This is the entire goal

01.02.2025 02:13 - 👍 5    🔁 0    💬 0    📌 0
Dario Amodei - On DeepSeek and Export Controls

darioamodei.com/on-deepseek-...

30.01.2025 02:09 - 👍 3    🔁 1    💬 0    📌 0
Trump announces 500B in AI funding. Five days ago.

Deepseek r1 release. 8 days ago.

The fact that Deepseek R1 was released three days /before/ Stargate means these guys stood in front of Trump and said they needed half a trillion dollars while they knew R1 was open source and trained for $5M.

Beautiful.

28.01.2025 03:02 - 👍 13898    🔁 1772    💬 400    📌 120

Can anyone get a shorter DeepSeek R1 CoT than this?

24.01.2025 06:11 - 👍 17    🔁 1    💬 3    📌 0

Process based supervision done right, and with pretty CIDs to illustrate :)

23.01.2025 20:33 - 👍 8    🔁 1    💬 0    📌 0

I don’t really have the energy for politics right now. So I will observe without comment:

Executive Order 14110 was revoked (Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence)

21.01.2025 00:34 - 👍 96    🔁 37    💬 2    📌 6

R1 model is impressive

21.01.2025 22:21 - 👍 2    🔁 0    💬 0    📌 0
Post image

16.01.2025 18:33 - 👍 33014    🔁 6865    💬 429    📌 289
She Is in Love With ChatGPT
A 28-year-old woman with a busy social life spends hours on end talking to her A.I. boyfriend for advice and consolation. And yes, they do have sex.

fuck the tabloids were right

www.nytimes.com/2025/01/15/t...

16.01.2025 06:46 - 👍 2    🔁 0    💬 0    📌 0

New randomized, controlled trial by the World Bank of students using GPT-4 as a tutor in Nigeria. Six weeks of after-school AI tutoring = 2 years of typical learning gains, outperforming 80% of other educational interventions.

And it helped all students, especially girls who were initially behind.

15.01.2025 20:58 - 👍 354    🔁 88    💬 15    📌 27

Generative AI has flaws and biases, and there is a tendency for academics to fix on that (85% of equity LLM papers focus on harms)…

…yet in many ways LLMs are uniquely powerful among new technologies for helping people equitably in education and healthcare. We need an urgent focus on how to do that

14.01.2025 17:45 - 👍 69    🔁 11    💬 2    📌 4

On one hand, this paper finds adding inference-time compute (like o1 does) improves medical reasoning, which is an important finding suggesting a way to continue to improve AI performance in medicine

On the other hand, scientific illustrations are apparently just anime now arxiv.org/pdf/2501.06458

14.01.2025 05:56 - 👍 71    🔁 5    💬 2    📌 2

my metabolism is noticeably higher in london than the bay.

13.01.2025 15:49 - 👍 2    🔁 0    💬 0    📌 0
Recommendations for Technical AI Safety Research Directions

What can AI researchers do *today* that AI developers will find useful for ensuring the safety of future advanced AI systems? To ring in the new year, the Anthropic Alignment Science team is sharing some thoughts on research directions we think are important.
alignment.anthropic.com/2025/recomme...

10.01.2025 21:03 - 👍 22    🔁 7    💬 1    📌 1

My hottest take is that nothing makes any sense at all outside of the context of the constantly increasing value of human life, but that increase in value is so invisible (and exists in a world that was built for previous, lower values) that we constantly think the opposite has happened.

05.01.2025 19:08 - 👍 1772    🔁 88    💬 56    📌 3

wait what does that mean?

Does it mean there are bugs in Lean, or that it does too much work to check a proof?

05.01.2025 17:22 - 👍 0    🔁 0    💬 0    📌 0

wait isn’t everything just regularisation?

04.01.2025 19:42 - 👍 1    🔁 0    💬 0    📌 0
Dario Amodei - Machines of Loving Grace
How AI Could Transform the World for the Better

darioamodei.com/machines-of-...

04.01.2025 17:38 - 👍 3    🔁 1    💬 0    📌 0

no - why doesn’t Lean suffice?

04.01.2025 19:07 - 👍 2    🔁 0    💬 1    📌 0

like i really have outgrown most scenarios where i think my race has held me back but this one won’t let go

04.01.2025 04:13 - 👍 2    🔁 0    💬 0    📌 0

Nothing kills my excitement of returning to the US like the response i get from CBP officers.

04.01.2025 04:13 - 👍 7    🔁 0    💬 1    📌 0
Felix Hill and some other DMers and I after cold water swimming at Parliament Hill Lido a few years ago

Felix Hill was such an incredible mentor (and occasional cold water swimming partner) to me. He's a huge part of why I joined DeepMind and how I've come to approach research. Even a month later, it's still hard to believe he's gone.

02.01.2025 19:01 - 👍 123    🔁 17    💬 7    📌 5
Felix - Jane X. Wang
From the moment I heard him give a talk, I knew I wanted to work with Felix. His ideas about generalization and situatedness made explicit thoughts that had been swirling around in my head, incohe...

A brilliant colleague and wonderful soul Felix Hill recently passed away. This was a shock and in an effort to sort some things out, I wrote them down. Maybe this will help someone else, but at the very least it helped me. Rest in peace, Felix, you will be missed. www.janexwang.com/blog/2025/1/...

03.01.2025 04:02 - 👍 63    🔁 11    💬 2    📌 0

@akbir is following 20 prominent accounts