Marc Lanctot

@sharky6000.bsky.social

Research Scientist at Google DeepMind, interested in multiagent reinforcement learning, game theory, games, and search/planning. Lover of Linux 🐧, coffee ☕, and retro gaming. Big fan of open-source. #gohabsgo 🇨🇦 For more info: https://linktr.ee/sharky6000

8,734 Followers  |  425 Following  |  2,164 Posts  |  Joined: 29.12.2023
Posts by Marc Lanctot (@sharky6000.bsky.social)

+1 great journal, please consider volunteering for the role of Action Editor at TMLR!

05.03.2026 18:14 - 👍 4    🔁 1    💬 0    📌 0

Waiting for deepseek v4

05.03.2026 16:58 - 👍 34    🔁 2    💬 4    📌 0

Omg I literally laughed out loud 🤣, hilarious use of AI-modified meme, well done sir!

05.03.2026 18:11 - 👍 0    🔁 0    💬 0    📌 0

📣 Reinforcement Learning Summer School is returning to Milan in 2026!

Co-organized with @ellisunitmilan.bsky.social & designed for Master's and PhD students on RL theory, multi-agent systems, RL & LLMs, real-world applications...

๐Ÿ“ Milan ๐Ÿ‡ฎ๐Ÿ‡น
๐Ÿ“… 3-12 June
โฐ Apply by 27 March
๐Ÿ”— https://bit.ly/4b2Plhp

04.03.2026 14:49 โ€” ๐Ÿ‘ 17    ๐Ÿ” 7    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

Are AI models effective collaborators, or mere assistants awaiting your next command? (Preprint: arxiv.org/abs/2602.24188)

To find out, we make AI collaborate with itself, in private information games: tasks that require sharing private information, like this chess board ordering task.

04.03.2026 00:15 - 👍 54    🔁 21    💬 3    📌 1
ALT: a close up of a pikachu with its mouth open
04.03.2026 22:08 - 👍 3    🔁 0    💬 0    📌 0
Mark Zuckerberg is 'done with' the Meta’s highest-paid employee as company’s reorganisation proves - The Times of India
Tech News News: Meta CEO Mark Zuckerberg has quietly begun dismantling the power structure he built around Alexandr Wang, his $14 billion bet to lead the company's AI.

Interesting development… I guess Alexandr Wang is on the way out. That was a bit quicker than I expected. I would have thought he’d be given at least a year of runway.

timesofindia.indiatimes.com/technology/t...

04.03.2026 19:52 - 👍 12    🔁 3    💬 2    📌 1

I am a huge fan of that paper, and sequential equilibrium. The Guess the Ace example is a perfect motivation.

03.03.2026 19:31 - 👍 1    🔁 0    💬 1    📌 0

No no, the algorithm from that paper to compute sequential equilibrium would be very welcome.

I just felt that thread was the source of a lot of confusion. 'Subgame perfection' should be reserved for perfect info games but the thread was about LP solvers. So it wasn't clear what the proposal was.

03.03.2026 16:27 - 👍 1    🔁 0    💬 1    📌 0

Pretty nice looking thesis, thanks!

01.03.2026 22:23 - 👍 0    🔁 0    💬 0    📌 0

If they're basically solving a slightly perturbed game, that would be great news, because then I believe it would be easy to have an "active" version (in the sense of bsky.app/profile/shar...) based on adversarial bandits. Will have to dig into the details and ask Serena about it. 😀

01.03.2026 17:50 - 👍 1    🔁 0    💬 1    📌 0

Here's the overview. I highlighted one aspect of this that I really like, because vanilla VasE does not do anything special to handle the statistical uncertainty that is present in the scores out-of-the-box, which could be quite relevant when comparing agents.

01.03.2026 17:47 - 👍 1    🔁 0    💬 1    📌 0

Except it does it in a more sophisticated way with targeted ambiguity sets *and* it maintains some properties similar to the classical maximal lotteries.

01.03.2026 17:40 - 👍 1    🔁 0    💬 1    📌 0

If I am right about that, it is similar in spirit to the motivation behind "Projected Replicator Dynamics" in the PSRO paper, which simulated a constrained equation that had a lower bound on the probabilities. Or how in Nash averaging they use the maximum entropy Nash equilibrium.

01.03.2026 17:40 - 👍 1    🔁 0    💬 3    📌 0

Yes, we use LPs to solve the maximal lotteries objectives too (they are basically two-player zero-sum games). The problem is that this makes them sensitive to small changes. My first take was that this seems like a way to redesign the LP to spread weight elsewhere...? To avoid the sensitivities.

01.03.2026 17:40 - 👍 1    🔁 0    💬 1    📌 0
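
As a rough illustration of the point above that maximal lotteries reduce to a two-player zero-sum game, the sketch below builds a pairwise margin matrix from ranked ballots and approximates the maximal lottery as an equilibrium strategy of that symmetric game. The function names `margin_matrix` and `approx_maximal_lottery` and the three-ballot example are hypothetical. Fictitious play is swapped in for the LP solver mentioned in the post, purely to keep the example dependency-free; it is not the method from the work being discussed.

```python
def margin_matrix(ballots, num_alternatives):
    """Pairwise margins: m[i][j] = (#voters ranking i above j) - (#voters ranking j above i)."""
    m = [[0] * num_alternatives for _ in range(num_alternatives)]
    for ranking in ballots:  # each ranking lists alternatives from best to worst
        for pos, winner in enumerate(ranking):
            for loser in ranking[pos + 1:]:
                m[winner][loser] += 1
                m[loser][winner] -= 1
    return m


def approx_maximal_lottery(m, iterations=20000):
    """Approximate an equilibrium of the symmetric zero-sum game m via fictitious play.

    Assumption: this is an iterative approximation, not an exact LP solution.
    """
    n = len(m)
    counts = [0] * n   # how often each alternative has been played so far
    payoff = [0] * n   # cumulative payoff of each alternative against past play
    current = 0        # arbitrary starting alternative
    for _ in range(iterations):
        counts[current] += 1
        for i in range(n):
            payoff[i] += m[i][current]
        # Best response to the empirical distribution (ties go to the lowest index).
        current = max(range(n), key=lambda i: payoff[i])
    return [c / iterations for c in counts]


# Hypothetical example: a Condorcet cycle (0 beats 1, 1 beats 2, 2 beats 0).
ballots = [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
margins = margin_matrix(ballots, 3)
lottery = approx_maximal_lottery(margins)
print(margins)
print([round(p, 3) for p in lottery])
```

For this cyclic profile the margin matrix has the same structure as rock-paper-scissors, so the approximate lottery should come out close to uniform. An exact LP solution sits at a vertex of the feasible set and can jump when a margin changes slightly, which is the sensitivity the post refers to.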

Yeah, we have been in touch with Serena Wang so we know of this work, but I have only skimmed it so far. Looks neat!

01.03.2026 17:24 - 👍 1    🔁 0    💬 1    📌 0

Spoke to the authors this week! Nice work. They presented in London.

27.02.2026 22:10 - 👍 1    🔁 0    💬 0    📌 0

This is amazing.

www.getyourfuckingmoneyback.com

27.02.2026 17:49 - 👍 36968    🔁 11947    💬 495    📌 800

So indeed you can go a long way with just proper modeling. What I am curious about is whether predictive modeling like this will generalize outside RRPS. I expect that it will. And indeed maybe it already covers much of the gain we expect from search/reasoning.

27.02.2026 21:56 - 👍 2    🔁 0    💬 1    📌 0

Yeah, that is a great question! In our RRPS paper from '23, we ran RL in a self-play setting where the "predictive agent" was endowed with the ability to predict which bot it was playing against. And when we tested it against held-out, unknown bots it did much better than standard self-play bots.

27.02.2026 21:56 - 👍 1    🔁 0    💬 1    📌 0

Ah yes, that is a good point. The classical one is zero-sum, but a different one, which is not zero-sum (with the same equilibrium), was used when soliciting the human data.

27.02.2026 21:46 - 👍 1    🔁 0    💬 0    📌 0

Hey what leaderboard / site is this from?

25.02.2026 23:57 - 👍 2    🔁 0    💬 1    📌 0

Omg, I can def relate to this...

25.02.2026 23:42 - 👍 2    🔁 0    💬 0    📌 0
AI models that simulate internal debate dramatically improve accuracy on complex tasks
A new study reveals that top models like DeepSeek-R1 succeed by simulating internal debates. Here is how enterprises can harness this "society of thought" to build more robust, self-correcting agents.

Grateful to @venturebeat.com for featuring our Paradigms of Intelligence team’s research on “societies of thought,” or internal multi-agent dialogues.

Read the full piece, which includes a thoughtful quote from my friend & colleague James Evans: bit.ly/3ZN4oa5

25.02.2026 16:01 - 👍 6    🔁 1    💬 1    📌 0

What???

A Cirque show.. named Ludo.. at a conference banquet dinner!!

🤯🤯🤯

So cool! Can't wait for this! 🥰

25.02.2026 23:39 - 👍 9    🔁 0    💬 0    📌 0

Getting a lot easier to avoid tbh because the pro-AI contingent is greater in numbers than before.

But the last few times it happened, it was triggered exactly by this kind of question... but it was posed quite rudely, so maybe you have not (yet) triggered them...? 😅👍

25.02.2026 01:48 - 👍 3    🔁 0    💬 0    📌 0

💔

22.02.2026 15:54 - 👍 2    🔁 0    💬 1    📌 0

😱😭

22.02.2026 15:54 - 👍 0    🔁 0    💬 1    📌 0

💯!!

22.02.2026 15:51 - 👍 1    🔁 0    💬 0    📌 0
YouTube video by CBC Sports: Ridiculous save by Connor Hellebuyck keeps the game tied #milanocortina2026 #cbcsports

I know, insane!! Did you see that stick stop?? 🤯🤯🤯

youtube.com/shorts/UczOl...

22.02.2026 15:41 - 👍 2    🔁 0    💬 1    📌 0