Matej Jusup matejjusup - Bluesky Statics

Introducing Kaggle Game Arena | Kaggle Watch models compete in complex games providing a verifiable and dynamic measure of their capabilities

N/N
More on Game Arena and the upcoming chess matchups:
🔗 www.kaggle.com/blog/introdu...

Excited to see more AI evaluations move in this direction.

05.08.2025 18:45 — 👍 1 🔁 0 💬 0 📌 0

4/N The first tournament? Chess, with top models like Gemini 2.5 Pro, o3, and DeepSeek-R1. Matches will be covered by Magnus Carlsen, Hikaru Nakamura, and Levy Rozman (GothamChess).

05.08.2025 18:45 — 👍 1 🔁 0 💬 1 📌 0

3/N
3. Resistance to benchmark saturation—many games remain unsolved by brute force or memorization
4. Strong emphasis on high-level behaviors: planning, reasoning, memory, adaptation, even deception

05.08.2025 18:45 — 👍 1 🔁 0 💬 1 📌 0

2/N By using head-to-head board game matchups, it offers several advantages over many existing evaluations:
1. Direct comparisons across a range of strategic games
2. Streamed, replayable matches that improve transparency and reproducibility

05.08.2025 18:45 — 👍 1 🔁 0 💬 1 📌 0

1/N I’ve long believed that board games should play a bigger role in AI evaluation. They naturally test strategic reasoning, long-term planning, adaptation—and they can’t be solved by brute force or memorization.

Game Arena is transparent, replayable, and tests actual behavioral intelligence.

05.08.2025 18:45 — 👍 3 🔁 1 💬 1 📌 0

A year after our trip to AAMAS in New Zealand, @sharky6000.bsky.social came back for more!

I should have planned my year so as not to miss @aamasconf.bsky.social…

Big congrats, and keep up the amazing work! 🎉👏

23.05.2025 22:52 — 👍 3 🔁 0 💬 1 📌 0

ML Pub Club #22: Superhuman Planning with LLMs · Luma What happens when a chess champion meets cutting-edge AI? Join us for an evening with Matej Jusup, as he unpacks how large language models (LLMs) can go from…

Looking forward to speaking at the ML Pub Club on June 3rd!

I'll discuss how, during my time at DeepMind, we taught LLMs to play chess at a GM level and the broader implications for strategic AI.

If you're in Zagreb, join us at Mažuranićev trg 13 at 6 PM!

More info & RSVP: lu.ma/erjji5it

20.05.2025 19:12 — 👍 3 🔁 0 💬 0 📌 0

Mastering Board Games by External and Internal Planning with Language Models Advancing planning and reasoning capabilities of Large Language Models (LLMs) is one of the key prerequisites towards unlocking their potential for performing reliably in complex and impactful domains...

A paper from my time at Google was accepted for a spotlight presentation at ICML!

In “Mastering Board Games by External and Internal Planning with Language Models”, we show how language models can achieve grandmaster-level play using a search budget on par with humans.

arxiv.org/abs/2412.12119

01.05.2025 20:35 — 👍 22 🔁 4 💬 0 📌 0

Hive (and all of its expansions) has been added to OpenSpiel! 🎉🤩🐝🐜🕷️🐞🦟🪲

From Gen42: "Hive is an award-winning board game with a difference. There is no board. The pieces are added to the playing area thus creating the board. As more and more pieces are added the game becomes a fight to ...

🧵1/5

28.04.2025 12:53 — 👍 14 🔁 3 💬 1 📌 2

YouTube video by Amii TURING AWARD WINNER Richard S. Sutton in Conversation with Cam Linke | No Authorities in Science

www.youtube.com/watch?v=9_Pe... An interview with Rich. The humility of Rich is truly inspiring: "There are no authorities in science". I wish people would listen and live by this.

06.03.2025 20:50 — 👍 40 🔁 13 💬 2 📌 1

Sim agents are key for developing autonomous systems for safety-critical systems, like self-driving cars.

We're open-sourcing sim agents that achieve a 99.8% success rate with < 0.8% failures on the Waymo Dataset. These agents are built through scaling self-play.

28.02.2025 17:19 — 👍 34 🔁 5 💬 3 📌 1

We've released our lecture notes for the course Probabilistic AI at ETH Zurich, covering uncertainty in ML and its importance for sequential decision making. Thanks a lot to @jonhue.bsky.social for his amazing effort and to everyone who contributed! We hope this resource is useful to you!

17.02.2025 07:19 — 👍 61 🔁 10 💬 1 📌 0

It won't, as far as I know. I will share the link here if anything changes.

14.02.2025 21:26 — 👍 1 🔁 0 💬 0 📌 0

ZurichAI | Largest ML meetup in Switzerland ZurichAI is the largest regularly scheduled machine learning meetup in Switzerland. We're in Zurich and host events for NLP, CV & more with 100+ regular attendees.

Join the conversation! We'll cover:
• The innovative search strategies we developed
• The implications of LLMs in strategic domains
• Q&A and networking with fellow AI enthusiasts

🗓️ 20th Feb 2025, 18:00-20:00
📍 Zürich, OAT ETH Zurich (14th floor)
🔗 www.zurichai.ch

14.02.2025 17:07 — 👍 3 🔁 0 💬 0 📌 0

LLMs Mastering Board Games: ZurichNLP Meetup - Feb 20th!

Excited to share insights from my student research at Google DeepMind at the upcoming ZurichNLP meetup! I'll present how we achieved high-level play in board games using LLMs with a search budget comparable to human chess grandmasters.

14.02.2025 17:07 — 👍 15 🔁 3 💬 2 📌 0

Robust Autonomy Emerges from Self-Play Self-play has powered breakthroughs in two-player and multi-player games. Here we show that self-play is a surprisingly effective strategy in another domain. We show that robust and naturalistic drivi...

I've been talking about writing this paper to anyone who would listen since 2020. I bombed a bunch of job talks trying to convince companies to work on this. It's so nice to finally just be able to say: yes, self-play RL in a diverse world gives you immense capabilities.
arxiv.org/abs/2502.03349

09.02.2025 20:01 — 👍 92 🔁 6 💬 3 📌 0

I am more than happy that @quantamagazine.bsky.social, which I have been reading since the first year of my Bachelor's degree, cited us:
www.quantamagazine.org/chatbot-soft...

More news about this work and its second version is coming soon!

#machinelearning #deeplearning #cs #computerscience #tcs

04.02.2025 14:59 — 👍 4 🔁 1 💬 0 📌 0

Pet peeve: Calling something that’s not open source… open source. Open weight != open source

29.01.2025 21:04 — 👍 30 🔁 3 💬 1 📌 1

Infographic of statistics on the efficiency, or inefficiency, of cars

A typical European car is parked 92% of the time. It spends 1/5th of its driving time looking for parking. Its 5 seats only move 1.5 people. 86% of its fuel never reaches the wheels, and most of the energy that does moves the car, not the people.

Sound efficient?

HT @ellenmacarthurfdn.bsky.social

25.01.2025 06:21 — 👍 1092 🔁 407 💬 30 📌 29

An interesting idea that’s worth keeping an eye on!

26.01.2025 21:24 — 👍 0 🔁 0 💬 0 📌 0

2024: A year of extraordinary progress and advancement in AI As we move into 2025, we’re looking back at the astonishing progress in AI in 2024.

Demis Hassabis, James Manyika, and I wrote up an overview of the AI research work & advances across Google in 2024 (Gemini, NotebookLM, robotics, ML for science, & advances in responsible AI+more). 🎊

Give it a read, or paste it into NotebookLM to listen if you prefer!

blog.google/technology/a...

24.01.2025 00:46 — 👍 125 🔁 22 💬 2 📌 0

Check out the 16th Workshop on Optimization and Learning in Multiagent Systems (OptLearnMAS-25) at #AAMAS 2025!

Topics: distributed opt., coalition formation, opt. under uncertainty, winner determination algs in auctions and procurements, algs to compute equilibria in games.

optlearnmas.github.io

20.01.2025 11:12 — 👍 11 🔁 2 💬 0 📌 0

In December, I posted about our new paper on mastering board games using internal + external planning. 👇

Here's a talk about it, now on YouTube, given by my awesome colleague John Schultz!

www.youtube.com/watch?v=JyxE...

17.01.2025 17:26 — 👍 35 🔁 11 💬 1 📌 0

John's talk is now available online!
www.youtube.com/watch?v=JyxE...

17.01.2025 15:27 — 👍 13 🔁 2 💬 0 📌 0

Join John's talk to get insights on our paper on mastering board games with language models!

13.01.2025 13:35 — 👍 6 🔁 1 💬 0 📌 1

Just a reminder that the AAMAS Doctoral Consortium deadline is next Friday!

Please consider submitting to this great venue or telling your students about it.

👇

10.01.2025 10:04 — 👍 8 🔁 3 💬 0 📌 0

29th BOŠNJACI Open • Round 1 9-round Swiss | 90 min + 30 sec / move | standard | Bošnjaci, Croatia | Saric, Culum, Pap, Zaja

The first round was a hard-fought win against a much lower-rated opponent, but it is a testament to the increased playing quality since the recent global chess boom!

lichess.org/broadcast/29...

03.01.2025 21:16 — 👍 1 🔁 0 💬 0 📌 0

After 15 years away from competitive chess, I had forgotten how much thrill and excitement the game brings! ♟️ I decided to attend a tournament with five grandmasters and numerous international, FIDE, and candidate masters.

@lichess.org broadcast: lichess.org/broadcast/29...

03.01.2025 21:16 — 👍 3 🔁 0 💬 1 📌 0

If you are into ML theory (RL or not) with a proven track record, and you are interested in an industry research position, PM me. Feel free to spread the word.

19.12.2024 00:55 — 👍 74 🔁 31 💬 2 📌 0

After a slight delay, it is now also out on arXiv: arxiv.org/abs/2412.12119

18.12.2024 12:50 — 👍 8 🔁 1 💬 0 📌 0

Posts by Matej Jusup (@matejjusup.bsky.social)