@orenneumann.bsky.social
Doing RL on autonomous driving, supply chains and board games. Physics PhD from Goethe Uni Frankfurt.
Our paper on RL scaling laws got a NeurIPS spotlight!!
We added A LOT of new content, especially analyzing why inverse scaling happens. Check it out on arXiv!
Reviewer asks why we didn't cite a recent paper. That paper cites our paper that's being reviewed.
I wonder how common citation cycles are...
There are quite a few papers on supply chain management with RL, although only on toy problems. I'm currently writing a paper on doing it with real supply chains.
14.01.2025 07:30
Is it all related to dormant neurons, or is there other literature on why RL struggles with plasticity?
arxiv.org/abs/2302.12902
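The dormant-neuron paper linked above (Sokar et al., 2023) scores each neuron by its mean absolute activation normalized by the layer average, and calls a neuron τ-dormant when the score falls below a threshold. A minimal PyTorch sketch of that statistic; the threshold value and the toy layer here are my own choices, not the paper's setup:

```python
import torch

def dormant_fraction(acts: torch.Tensor, tau: float = 0.1) -> float:
    # acts: (batch, neurons) post-activation outputs for a batch of inputs.
    # Score s_i = E|h_i(x)| / mean_j E|h_j(x)|; a neuron is tau-dormant
    # if s_i <= tau (definition from Sokar et al., arxiv.org/abs/2302.12902).
    mean_abs = acts.abs().mean(dim=0)            # per-neuron E|h_i(x)|
    score = mean_abs / (mean_abs.mean() + 1e-9)  # normalize by the layer average
    return (score <= tau).float().mean().item()

# Toy usage: a freshly initialized ReLU layer on random inputs.
layer = torch.nn.Sequential(torch.nn.Linear(64, 256), torch.nn.ReLU())
with torch.no_grad():
    frac = dormant_fraction(layer(torch.randn(512, 64)))
print(f"dormant fraction: {frac:.3f}")
```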
Read the full paper for more details and results: 'AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws'.
arxiv.org/abs/2412.11979
Big thanks to
@ericjmichaud.bsky.social
for sharing his wisdom! This all started with our hallway chat at ICLR.
X:
x.com/neumann_oren...
There is: in those games, larger models improve overall accuracy by focusing on late-game positions, forgetting what they learned on opening positions. This directly harms performance, since mastering openings is crucial, while wrapping up a game can be done with blind MCTS.
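A cartoon of that capacity-allocation effect (purely illustrative, not the paper's analysis; every number below is invented): rank states by frequency, let a model of capacity n memorize the n most frequent ones, and compare how much of the opening it covers under a pure Zipf curve versus a curve with inflated end-game frequencies.

```python
import numpy as np

V = 10_000                        # toy number of distinct board states
ranks = np.arange(1, V + 1)
zipf = 1.0 / ranks                # pure Zipf frequencies (exponent 1)

# "Bent" curve: the last 20% of states (end-game) get a frequency boost,
# pushing them ahead of most opening states in the ranking.
bent = zipf.copy()
bent[int(0.8 * V):] *= 50

opening_ids = np.arange(V // 10)  # treat the top decile of true ranks as openings

for n in (100, 1000, 5000):       # model capacity: memorize the n most frequent states
    for name, f in (("zipf", zipf), ("bent", bent)):
        learned = np.argsort(-f)[:n]
        cov = np.isin(opening_ids, learned).mean()
        print(f"{name:4s} capacity={n:5d} opening coverage={cov:.2f}")
```

Under the bent curve, mid-sized capacity goes to end-game states that blind MCTS could handle anyway; the active forgetting of openings described above needs training dynamics beyond this cartoon.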
19.12.2024 14:16
AlphaZero doesn't always scale nicely. In some games, Elo goes up, then sharply degrades with model size. We noticed this happens in games whose rules bend the Zipf curve, since end-game board positions have a high frequency. Is there a connection?
19.12.2024 14:16
In line with the quantization model, we see that AlphaZero agents fit board states in decreasing order of frequency. This is very surprising: high-frequency opening moves are exponentially harder to model, since they depend on downstream positions.
19.12.2024 14:16
There is! Chess/Go tournament games famously follow Zipf's law: the frequency of each board position scales as a power of its rank.
We find that Zipf's law also emerges in RL self-play games. It's a direct result of universal board-game rules.
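Concretely, Zipf's law means frequency ∝ rank^(−α). A quick way to eyeball it from self-play records is to rank positions by count and fit the log-log slope; the sketch below uses random stand-ins for hashed board positions just to stay runnable:

```python
import numpy as np
from collections import Counter

# Placeholder data: `games` would be self-play records; Zipf-distributed
# integer IDs stand in for hashed board positions here.
rng = np.random.default_rng(0)
games = [[f"pos{rng.zipf(1.2)}" for _ in range(60)] for _ in range(1000)]

counts = Counter(pos for game in games for pos in game)
freqs = np.array(sorted(counts.values(), reverse=True), dtype=float)
ranks = np.arange(1, len(freqs) + 1)

# Zipf's law predicts log(freq) ~ -alpha * log(rank) + const.
# Fit only the head of the curve, where counts are well resolved.
k = 200
alpha, _ = np.polyfit(np.log(ranks[:k]), np.log(freqs[:k]), 1)
print(f"fitted Zipf exponent: {-alpha:.2f}")
```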
The quantization model suggests that LLM power-law scaling results from Zipf's law in natural language:
arxiv.org/abs/2303.13506
In RL, AlphaZero provides one of the few examples of power-law scaling:
arxiv.org/abs/2210.00849
But is there a Zipf's law in board games?
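To spell out the quantization-model link (schematic, following arxiv.org/abs/2303.13506, not an exact reproduction of its derivation): if the k-th most frequently used skill "quantum" appears with probability $p_k \propto k^{-(\alpha+1)}$, and a model of size $N$ learns the $n \propto N$ most frequent quanta, the remaining loss is the unlearned tail:

$$
L(N) \;\propto\; \sum_{k>n} p_k \;\approx\; \int_n^\infty k^{-(\alpha+1)}\,\mathrm{d}k \;\propto\; n^{-\alpha} \;\propto\; N^{-\alpha}.
$$

So a Zipf-like exponent in usage frequencies becomes the power-law exponent of the loss curve.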
Do RL scaling laws share the same origin as LLM scaling laws?
We show that AlphaZero scaling might be the result of Zipf's law, and that inverse scaling can result from unusual frequency curves.
arxiv.org/abs/2412.11979
A thread on scaling laws and board games!
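For the mechanical sense of "power-law scaling" used in this thread: fit strength = a·N^b, which is linear on a log-log scale. The size/strength numbers below are invented, only to show the fit:

```python
import numpy as np

# Invented (model size, playing strength) pairs, for illustration only.
sizes = np.array([1e4, 3e4, 1e5, 3e5, 1e6])
strength = np.array([105.0, 180.0, 310.0, 560.0, 980.0])

# Power law y = a * N^b is linear in log-log: log y = log a + b log N.
b, log_a = np.polyfit(np.log(sizes), np.log(strength), 1)
print(f"fitted exponent b = {b:.2f}, prefactor a = {np.exp(log_a):.3g}")
```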
I'm excited to share a new paper: "Mastering Board Games by External and Internal Planning with Language Models"
storage.googleapis.com/deepmind-med...
(also soon to be up on arXiv, once it's been processed there)