+1 great journal, please consider volunteering for the role of Action Editor at TMLR!
05.03.2026 18:14
@sharky6000.bsky.social
Research Scientist at Google DeepMind, interested in multiagent reinforcement learning, game theory, games, and search/planning. Lover of Linux 🐧, coffee ☕, and retro gaming. Big fan of open-source. #gohabsgo 🇨🇦 For more info: https://linktr.ee/sharky6000
Waiting for deepseek v4
05.03.2026 16:58
Omg I literally laughed out loud 🤣, hilarious use of an AI-modified meme, well done sir!
05.03.2026 18:11
📣 Reinforcement Learning Summer School is returning to Milan in 2026!
Co-organized with @ellisunitmilan.bsky.social & designed for Master's and PhD students on RL theory, multi-agent systems, RL & LLMs, real-world applications...
📍 Milan 🇮🇹
📅 3-12 June
⏰ Apply by 27 March
🔗 https://bit.ly/4b2Plhp
Are AI models effective collaborators, or mere assistants awaiting your next command? (Preprint: arxiv.org/abs/2602.24188)
To find out, we make AI collaborate with itself, in private information games: tasks that require sharing private information, like this chess board ordering task.
Interesting developmentโฆ I guess Alexandr Wang is on the way out. That was a bit quicker than I expected. I would have thought heโd be given at least a year of runway.
timesofindia.indiatimes.com/technology/t...
I am a huge fan of that paper, and sequential equilibrium. The Guess the Ace example is a perfect motivation.
03.03.2026 19:31
No no, the algorithm from that paper to compute sequential equilibrium would be very welcome.
I just felt that thread was the source of a lot of confusion. 'Subgame perfection' should be reserved for perfect-information games, but the thread was about LP solvers, so it wasn't clear what the proposal was.
Pretty nice looking thesis, thanks!
01.03.2026 22:23
If they're basically solving a slightly perturbed game, that would be great news... because then I believe it would be easy to have an "active" version (in the sense of bsky.app/profile/shar...) based on adversarial bandits. Will have to dig into the details and ask Serena about it.
01.03.2026 17:50
Here's the overview. I highlighted one aspect of this that I really like, because vanilla VasE does not do anything special to handle the statistical uncertainty that is present in the scores out of the box, which could be quite relevant when comparing agents.
01.03.2026 17:47
Except it does it in a more sophisticated way, with targeted ambiguity sets *and* it maintains some properties similar to the classical maximal lotteries.
01.03.2026 17:40
If I am right about that, it is similar in spirit to the motivation behind "Projected Replicator Dynamics" in the PSRO paper, which simulated a constrained equation with a lower bound on the probabilities. Or how Nash averaging uses the maximum-entropy Nash equilibrium.
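For intuition, the lower-bound idea can be sketched as a discretized replicator step followed by a projection onto the simplex with a floor on each probability. Everything here is illustrative (the step size, the floor `gamma`, the sorting-based projection, and the rock-paper-scissors payoff matrix); it is a minimal sketch of the idea, not the PSRO paper's actual implementation:

```python
import numpy as np

def project_to_bounded_simplex(x, gamma):
    """Euclidean projection of x onto {p : sum(p) = 1, p_i >= gamma}.
    Shifts out the floor, then uses the standard sorting-based simplex
    projection onto a simplex of mass 1 - n*gamma."""
    n = len(x)
    y = x - gamma
    mass = 1.0 - n * gamma
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (mass - css) / np.arange(1, n + 1) > 0)[0][-1]
    theta = (css[rho] - mass) / (rho + 1)
    return np.maximum(y - theta, 0.0) + gamma

def projected_replicator_step(x, payoff, lr=0.05, gamma=0.01):
    """One Euler step of replicator dynamics, then project so every
    action keeps at least gamma probability (an exploration floor)."""
    fitness = payoff @ x
    avg = x @ fitness
    x_new = x + lr * x * (fitness - avg)
    return project_to_bounded_simplex(x_new, gamma)

# Toy symmetric zero-sum game (rock-paper-scissors), purely illustrative.
A = np.array([[0., 1., -1.], [-1., 0., 1.], [1., -1., 0.]])
x = np.array([0.8, 0.1, 0.1])
for _ in range(500):
    x = projected_replicator_step(x, A, lr=0.05, gamma=0.01)
```

The projection is what keeps the iterate a proper distribution while never letting any action's probability collapse below the floor.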
01.03.2026 17:40
Yes, we use LPs to solve the maximal lotteries objectives too (they are basically two-player zero-sum games). The problem is that this makes them sensitive to small changes. My first take was that this seems like a way to redesign the LP to spread weight elsewhere...? To avoid the sensitivities.
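For concreteness, here is one standard way to pose a maximal-lottery-style objective as the LP of a symmetric two-player zero-sum game over a skew-symmetric margin matrix. This is a sketch of the classical formulation, not the authors' code; the function name and the rock-paper-scissors margin matrix are illustrative:

```python
import numpy as np
from scipy.optimize import linprog

def maximal_lottery(margins):
    """Solve the symmetric zero-sum game given by a skew-symmetric margin
    matrix: find a probability vector x maximizing v subject to
    margins.T @ x >= v componentwise (the standard zero-sum LP)."""
    n = margins.shape[0]
    # Variables: [x_1..x_n, v]; linprog minimizes, so the objective is -v.
    c = np.zeros(n + 1)
    c[-1] = -1.0
    # margins.T @ x >= v  <=>  -margins.T @ x + v <= 0
    A_ub = np.hstack([-margins.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])  # sum(x) = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]  # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds)
    return res.x[:n], res.x[-1]

# Rock-paper-scissors margins: the maximal lottery is the uniform mix,
# and the game value is 0 (the matrix is skew-symmetric).
rps = np.array([[0., 1., -1.], [-1., 0., 1.], [1., -1., 0.]])
probs, value = maximal_lottery(rps)
```

The sensitivity issue mentioned above shows up here directly: small perturbations of the margin matrix can move the LP's optimal vertex discontinuously.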
01.03.2026 17:40
Yeah, we have been in touch with Serena Wang, so we know of this work, but I have only skimmed it so far. Looks neat!
01.03.2026 17:24
Spoke to the authors this week! Nice work. They presented in London.
27.02.2026 22:10
This is amazing.
www.getyourfuckingmoneyback.com
So indeed you can go a long way with just proper modeling. What I am curious about is whether predictive modeling like this will generalize outside RRPS. I expect that it will. And indeed maybe it already covers much of the gain we expect from search/reasoning.
27.02.2026 21:56
Yeah, that is a great question! In our RRPS paper from '23, we ran RL in a self-play setting where the "predictive agent" was endowed with the ability to predict which bot it was playing against. And when we tested it against held-out, unknown bots, it did much better than standard self-play bots.
27.02.2026 21:56
Ah yes, that is a good point. The classical one is zero-sum, but a different one, which is not zero-sum (with the same equilibrium), was used when soliciting the human data.
27.02.2026 21:46
Hey, what leaderboard / site is this from?
25.02.2026 23:57
Omg, I can def relate to this...
25.02.2026 23:42
Grateful to @venturebeat.com for featuring our Paradigms of Intelligence team's research on "societies of thought," or internal multi-agent dialogues.
Read the full piece, which includes a thoughtful quote from my friend & colleague James Evans: bit.ly/3ZN4oa5
What???
A Cirque show.. named Ludo.. at a conference banquet dinner!!
🤯🤯🤯
So cool! Can't wait for this! 🥰
Getting a lot easier to avoid tbh because the pro-AI contingent is greater in numbers than before.
But the last few times it happened, it was triggered exactly by this kind of question... but it was posed quite rudely, so maybe you have not (yet) triggered them...?
22.02.2026 15:54
22.02.2026 15:54
💯!!
22.02.2026 15:51
I know, insane!! Did you see that stick stop?? 🤯🤯🤯
youtube.com/shorts/UczOl...