N/N
More on Game Arena and the upcoming chess matchups:
🔗 www.kaggle.com/blog/introdu...
Excited to see more AI evaluations move in this direction.
@matejjusup.bsky.social
• A PhD in multi-agent reinforcement learning at ETH Zurich • A chess enthusiast (2585 Elo @Chesscom) • Developed the first language model at Google DeepMind capable of playing the game at near super-human level (3200 Elo).
4/N The first tournament? Chess, with top models like Gemini 2.5 Pro, o3, and DeepSeek-R1. Matches will be covered by Magnus Carlsen, Hikaru Nakamura, and Levy Rozman (GothamChess).
05.08.2025 18:45
3/N
3. Resistance to benchmark saturation—many games remain unsolved by brute force or memorization
4. Strong emphasis on high-level behaviors: planning, reasoning, memory, adaptation, even deception
2/N By using head-to-head board game matchups, it offers several advantages over many existing evaluations:
1. Direct comparisons across a range of strategic games
2. Streamed, replayable matches that improve transparency and reproducibility
1/N I’ve long believed that board games should play a bigger role in AI evaluation. They naturally test strategic reasoning, long-term planning, adaptation—and they can’t be solved by brute force or memorization.
Game Arena is transparent, replayable, and tests actual behavioral intelligence.
A year after our trip to AAMAS in New Zealand, @sharky6000.bsky.social came back for more!
I should have planned my year not to miss @aamasconf.bsky.social…
Big congrats and keep up the amazing work! 🎉👏
Looking forward to speaking at the ML Pub Club on June 3rd!
I'll discuss how, during my time at DeepMind, we taught LLMs to play chess at a GM level and the broader implications for strategic AI.
If you're in Zagreb, join us at Mažuranićev trg 13 at 6 PM!
More info & RSVP: lu.ma/erjji5it
A paper from my time at Google was accepted for a spotlight presentation at ICML!
In “Mastering Board Games by External and Internal Planning with Language Models”, we show how language models can achieve grandmaster-level play using a search budget on par with humans.
arxiv.org/abs/2412.12119
Hive (and all of its expansions) has been added to OpenSpiel! 🎉🤩🐝🐜🕷️🐞🦟🪲
From Gen42: "Hive is an award-winning board game with a difference. There is no board. The pieces are added to the playing area thus creating the board. As more and more pieces are added the game becomes a fight to ...
🧵1/5
www.youtube.com/watch?v=9_Pe... An interview with Rich. The humility of Rich is truly inspiring: "There are no authorities in science". I wish people would listen and live by this.
06.03.2025 20:50
Sim agents are key for developing autonomous systems in safety-critical applications, like self-driving cars.
We're open-sourcing sim agents that achieve a 99.8% success rate with < 0.8% failures on the Waymo Dataset. These agents are built through scaling self-play.
We've released our lecture notes for the course Probabilistic AI at ETH Zurich, covering uncertainty in ML and its importance for sequential decision making. Thanks a lot to @jonhue.bsky.social for his amazing effort and to everyone who contributed! We hope this resource is useful to you!
17.02.2025 07:19
It won't, as far as I know. I will share the link here if anything changes.
14.02.2025 21:26
Join the conversation! We'll cover:
• The innovative search strategies we developed
• The implications of LLMs in strategic domains
• Q&A and networking with fellow AI enthusiasts
🗓️ 20th Feb 2025, 18:00-20:00
📍 Zürich, OAT ETH Zurich (14th floor)
🔗 www.zurichai.ch
LLMs Mastering Board Games: ZurichNLP Meetup - Feb 20th!
Excited to share insights from my student research at Google DeepMind at the upcoming ZurichNLP meetup! I'll present how we achieved high-level play in board games using LLMs with a search budget comparable to human chess grandmasters.
I've been talking about writing this paper to anyone who would listen since 2020. I bombed a bunch of job talks trying to convince companies to work on this. It's so nice to finally just be able to say, yes, self-play RL in a diverse world gives you immense capabilities
arxiv.org/abs/2502.03349
I am more than happy that @quantamagazine.bsky.social, which I have been reading since the first year of my Bachelor's degree, cited us:
www.quantamagazine.org/chatbot-soft...
More news about this work and a 2nd version are coming soon!
#machinelearning #deeplearning #cs #computerscience #tcs
Pet peeve: Calling something that’s not open source… open source. Open weight != open source
29.01.2025 21:04
Graphic full of statistics on the efficiency or inefficiency of cars
A typical European car is parked 92% of the time. It spends 1/5th of its driving time looking for parking. Its 5 seats only move 1.5 people. 86% of its fuel never reaches the wheels, and most of the energy that does, moves the car, not the people.
Sound efficient?
HT @ellenmacarthurfdn.bsky.social
An interesting idea that’s worth keeping an eye on!
26.01.2025 21:24
Demis Hassabis, James Manyika, and I wrote up an overview of the AI research work & advances across Google in 2024 (Gemini, NotebookLM, robotics, ML for science, & advances in responsible AI+more). 🎊
Give it a read or paste it into NotebookLM to listen, if you prefer!
blog.google/technology/a...
Check out the 16th Workshop on Optimization and Learning in Multiagent Systems (OptLearnMAS-25) at #AAMAS 2025!
Topics: distributed opt., coalition formation, opt. under uncertainty, winner determination algs in auctions and procurements, algs to compute equilibria in games.
optlearnmas.github.io
In December, I posted about our new paper on mastering board games using internal + external planning. 👇
Here's a talk about it, now on YouTube, given by my awesome colleague John Schultz!
www.youtube.com/watch?v=JyxE...
John's talk is now available online!
www.youtube.com/watch?v=JyxE...
Join John's talk to get insights on our paper on mastering board games with language models!
13.01.2025 13:35
Just a reminder that the AAMAS Doctoral Consortium deadline is next Friday!
Please consider submitting to this great venue or telling your students about it.
👇
The first round was a hard-fought win against a much lower-rated opponent, but it is a testament to the increased playing quality since the recent global chess boom!
lichess.org/broadcast/29...
After 15 years away from competitive chess, I forgot how much thrill and excitement the game gives! ♟️ I decided to attend a tournament with five grandmasters and numerous international, FIDE, and candidate masters.
@lichess.org broadcast: lichess.org/broadcast/29...
If you are into ML theory (RL or not) with a proven track record, and you are interested in an industry research position, PM me. Feel free to spread the word.
19.12.2024 00:55
After a slight delay, it is now also out on arXiv: arxiv.org/abs/2412.12119
18.12.2024 12:50