I will be at @iclr-conf.bsky.social next week! If you are interested in any of these four postdoc positions, reach out to me via the Whova app so that we can meet during the conference!
18.04.2025 13:57
@sarath-chandar.bsky.social
Associate Professor at Polytechnique Montreal and Mila.
Join us on April 17th for Mila TechAide 2025! Hear from amazing speakers in #AI like Sara Hooker, Kyunghyun Cho, Golnoosh Farnadi, David Adelani, Derek Nowrouzezahrai, and Tegan Maharaj. All proceeds support #Centraide. Let's make a difference together!
Get your tickets now! t.co/EBsatAqGTA
We have an amazing lineup of speakers, who will share their experience in deploying RL for power systems, industrial processes, nuclear fusion, logistics and more!
Consider submitting!
Do you care about applying RL to real systems? You should consider attending our RLC 2025 workshop on RL for real systems!
08.04.2025 14:06
We know that policies learned through self-play can result in brittle, over-specialized strategies. But in our ICLR paper, we show that, given the right representation and network architecture, self-play can learn generalizable cooperative policies! Congrats to Arjun and Hadi for leading this work!
06.04.2025 14:44
Collaborative multi-agent reinforcement learning is key for the future of AI. Check out R3D2, a generalist agent that plays text-based Hanabi, accepted at ICLR 2025.
Website: chandar-lab.github.io/R3D2-A-Gener...
Excited to share our ICLR 2025 paper: A Generalist Hanabi Agent!
The R3D2 agent plays all Hanabi settings at once and coordinates zero-shot with novel partners (something SOTA LLMs can't do), powered by a flexible architecture that handles changing observation/action spaces.
This was a project led by my students Arjun and Hadi, in collaboration with Mathieu, Miao, and Janarthanan!
Paper: arxiv.org/abs/2503.14555
Code: github.com/chandar-lab/...
Models: huggingface.co/chandar-lab/...
Website: chandar-lab.github.io/R3D2-A-Gener... 7/7
Can LLMs master Hanabi without RL? We tested SOTA models by prompting them or finetuning them with expert trajectories. Results? Still far off: o3-mini scored only 12 out of 25 even with extensive prompting! #LLMs 6/n
Robustness in Cross-Play: We introduce a new inter-setting eval alongside inter/intra-algorithm cross-play. R3D2 adapts smoothly to new agents & envs, outperforming others. Even our 12 (!) reviewers liked the thorough eval! #CrossPlay
Openreview: openreview.net/forum?id=pCj...
5/n
Zero-Shot Coordination: R3D2 can be trained on multiple settings at once (e.g., 2p & 3p), using simpler games to boost learning in complex ones, letting it coordinate across unseen game settings and partners without explicit training on those settings! 4/n
04.04.2025 17:12
Self-Play & Generalization: R3D2 adapts to new partners & game setups without complex MARL tricks, just smart representations & architecture. Shows self-play alone can go far for generalizable policies! #ZeroShot #PolicyTransfer 3/n
04.04.2025 17:12
Dynamic observation- and action-space: We frame Hanabi as a text game and use a dynamic action-space architecture, DRRN (He et al., 2015), letting R3D2 adapt across settings via text input, drawing on the power of language as representation. #TextGames 2/n
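For readers who haven't seen DRRN, here is a minimal sketch of the core idea, not the actual R3D2 code: the observation text and each candidate action text are encoded separately, and Q(s, a) is their dot product, so the set of legal actions can change from turn to turn without changing the network. The vocabulary size, hidden sizes, and the toy bag-of-embeddings text encoder below are placeholder assumptions.

```python
import torch
import torch.nn as nn

class DRRNSketch(nn.Module):
    """Toy DRRN-style scorer: Q(s, a) = <f(state_text), g(action_text)>."""

    def __init__(self, vocab_size=10_000, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Separate heads for the observation text and the candidate action texts.
        self.state_encoder = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, hidden_dim))
        self.action_encoder = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, hidden_dim))

    def _encode(self, token_ids):
        # Bag-of-embeddings pooling stands in for a real text encoder.
        return self.embed(token_ids).mean(dim=-2)

    def forward(self, state_tokens, action_tokens):
        # state_tokens: (state_len,); action_tokens: (num_actions, action_len).
        s = self.state_encoder(self._encode(state_tokens))    # (hidden_dim,)
        a = self.action_encoder(self._encode(action_tokens))  # (num_actions, hidden_dim)
        return a @ s                                           # one Q-value per legal action

# The action set can have a different size on every turn; the network is unchanged.
model = DRRNSketch()
state = torch.randint(0, 10_000, (32,))      # tokenized observation text
actions = torch.randint(0, 10_000, (5, 8))   # 5 legal actions, 8 tokens each
greedy_action = model(state, actions).argmax().item()
```

Because actions are scored from their text descriptions, a different player count (and therefore new action strings) needs no new output head, which is the kind of flexibility the post is pointing to.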
04.04.2025 17:12
Can better architectures & representations make self-play enough for zero-shot coordination?
We explore this in our ICLR 2025 paper: A Generalist Hanabi Agent. We develop R3D2, the first agent to master all Hanabi settings and generalize to novel partners! #ICLR2025 1/n
Apply here: tinyurl.com/crl-postdoc
Lab website: chandar-lab.github.io
2/2
In my lab, we have not one but four open postdoc positions! If you have a PhD, strong research expertise in LLMs and foundation models, and a willingness to learn about domain-specific problems and collaborate with domain experts, this is an ideal position for you! 1/2
21.03.2025 14:49
I gave a talk on developing efficient foundation models for proteins and small molecules at the Helmholtz-ELLIS Workshop on Foundation Models in Science (www.mdc-berlin.de/news/events/...) today.
If you are interested in my spicy takes on ML for Biology, continue reading this thread! 1/n
I am excited to share that our BindGPT paper won the best poster award at #AAAI2025! Congratulations to the team! Work led by @artemzholus.bsky.social!
05.03.2025 14:54
The best part? We are open-sourcing everything, including the intermediate model checkpoints. The main model is already on HuggingFace; be sure to check it out! (6/n)
Model: huggingface.co/chandar-lab/...
Paper: arxiv.org/abs/2502.19587
Code and checkpoints to be released soon!
It is also worth mentioning ModernBERT (arxiv.org/abs/2412.13663), which is another great parallel effort to modernize BERT but with some differences, as highlighted in our paper! (6/n)
28.02.2025 16:30
After training for 1M steps with a maximum sequence length of 1024, we did a final 50k steps at 4096. This two-step training was a cost-efficient strategy to scale the model's maximum context window. For further scaling, our use of RoPE embeddings also lets us integrate YaRN! (5/n)
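For context, a rough sketch of why RoPE makes this kind of extension convenient: positions only enter attention through rotation angles, so those angles can be rescaled without adding parameters. The sketch below shows plain position interpolation rather than the full YaRN recipe, and the head size, base, and scaling rule are illustrative assumptions, not NeoBERT's actual configuration.

```python
import torch

def rope_angles(seq_len, head_dim, base=10_000.0, position_scale=1.0):
    """RoPE rotation angles: angle[t, i] = (t * position_scale) / base**(2i / head_dim).

    position_scale < 1 is plain position interpolation; YaRN refines this with
    per-frequency scaling and an attention-temperature correction.
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float() * position_scale
    return torch.outer(positions, inv_freq)          # (seq_len, head_dim // 2)

def apply_rope(x, angles):
    """Rotate consecutive channel pairs of x (seq_len, head_dim) by the given angles."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Pre-train at 1024 tokens, then squeeze 4096 positions into the angle range the
# model already saw, before (or alongside) a short long-context training stage.
q = torch.randn(4096, 64)                              # one attention head's queries
q_rot = apply_rope(q, rope_angles(4096, 64, position_scale=1024 / 4096))
```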
28.02.2025 16:30
Before pre-training NeoBERT, we conducted thorough ablations on all our design choices by pre-training 9 different models at the scale of BERT (1M steps, 131k batch size). (4/n)
28.02.2025 16:30
Its features include a native context length of 4,096 tokens, an optimal depth-to-width ratio at 250M parameters, and the most efficient inference of its kind! (3/n)
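For intuition on what a depth-to-width trade-off at a fixed budget looks like, here is a back-of-the-envelope count using the standard rough approximation (attention projections plus a 4x feed-forward block, ignoring biases, norms, and the output head); the depth/width/vocabulary values are hypothetical and are not NeoBERT's actual shape.

```python
def approx_encoder_params(depth, width, vocab_size=30_000, ffn_mult=4):
    """Rough transformer-encoder parameter count.

    Per layer: ~4*width^2 for the attention projections plus 2*ffn_mult*width^2
    for the feed-forward block; plus the token-embedding matrix.
    """
    per_layer = 4 * width ** 2 + 2 * ffn_mult * width ** 2
    return depth * per_layer + vocab_size * width

# Hypothetical shapes landing near a ~250M budget: deeper-and-narrower vs.
# shallower-and-wider models can hit the same total with different ratios.
for depth, width in [(17, 1024), (32, 768), (48, 640)]:
    total = approx_encoder_params(depth, width)
    print(f"depth={depth:2d}  width={width:4d}  ~{total / 1e6:.0f}M params")
```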
28.02.2025 16:30
With identical finetuning, NeoBERT outperforms all baselines on MTEB! (2/n)
28.02.2025 16:30
2025 BERT is NeoBERT! We have fully pre-trained a next-generation encoder on 2.1T tokens with the latest advances in data, training, and architecture. This is a heroic effort from my PhD student, Lola Le Breton, in collaboration with Quentin Fournier and Mariam El Mezouar. (1/n)
28.02.2025 16:30
Happy to share that one of my latest works has been accepted to AAAI 2025! BindGPT is a new foundation model for generative chemistry. It tackles various generative tasks, such as 3D molecule generation and 3D conformer generation, with a single model, something previously impossible. 1/2
bindgpt.github.io