Streaming Reinforcement Learning (RL) poses a huge challenge: each transition is used for a single update and then discarded immediately, which makes agents extremely sample-inefficient. But what if we could "squeeze" more information out of every single frame?
Check out our latest paper!
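To make the constraint concrete, here is a minimal sketch of the streaming setting the post describes: each transition drives exactly one update and is then thrown away, with no replay buffer. The toy environment, state space, and step sizes below are illustrative assumptions, not taken from the paper.

```python
import random

def td0_streaming(num_steps=1000, alpha=0.1, gamma=0.9, seed=0):
    """TD(0) under the streaming constraint: one update per transition,
    nothing stored for reuse."""
    rng = random.Random(seed)
    values = [0.0] * 5          # value estimate per toy state
    state = 0
    for _ in range(num_steps):
        next_state = rng.randrange(5)
        reward = 1.0 if next_state == 4 else 0.0
        # One TD(0) update from this single transition...
        td_error = reward + gamma * values[next_state] - values[state]
        values[state] += alpha * td_error
        # ...and then the transition is gone: no replay buffer to revisit it.
        state = next_state
    return values
```

The sample-inefficiency is visible here: a replay-based agent could revisit each reward-bearing transition many times, while this loop gets exactly one gradient's worth of information from it.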
24.02.2026 15:22
New work, just accepted @ICLR: "The Expressive Limits of Diagonal SSMs for State-Tracking"
We give a complete characterization of what diagonal SSMs can and cannot compute on state-tracking tasks, and the answer is deeply connected to group theory.
🧵👇
10.02.2026 16:54
Can LLMs play Hangman? Spoiler alert: Not yet.
Check out "LLMs Can't Play Hangman: On the Necessity of a Private Working Memory for Language Agents", led by Davide Baldelli, Ali Parviz, Amal Zouaq and Sarath Chandar.
27.01.2026 16:20
Can LLMs become CAD designers?
Check out "CADmium: Fine-Tuning Code Language Models for Text-Driven Sequential CAD Design", which is now published in Transactions on Machine Learning Research (TMLR)!
20.01.2026 15:55
I ran across a busy Sander at a #neurips party with a similar question; he was still patient enough to explain things. This talk clears up a good amount of my remaining doubts. Recommended watching if you're working on diffusion / LLMs for generation!
25.12.2024 23:43
I validate this
17.12.2024 07:28