Oren Neumann @orenneumann - Bluesky Profile

Our paper on RL scaling laws got a NeurIPS spotlight!! 😄🥳🚨
We added A LOT of new content, especially analyzing why inverse scaling happens. Check it out on arXiv!

16.10.2025 08:45 — 👍 0 🔁 0 💬 0 📌 0

Reviewer asks why we didn't cite a recent paper. That paper cites our paper that's being reviewed 😅
I wonder how common citation cycles are...

26.07.2025 18:19 — 👍 0 🔁 0 💬 0 📌 0

There are quite a few papers on supply chain management with RL, although only on toy problems. I'm currently writing a paper on doing it with real supply chains.

14.01.2025 07:30 — 👍 2 🔁 0 💬 0 📌 0

The Dormant Neuron Phenomenon in Deep Reinforcement Learning
In this work we identify the dormant neuron phenomenon in deep reinforcement learning, where an agent's network suffers from an increasing number of inactive neurons, thereby affecting network express...

Is it all related to dormant neurons or is there other literature on why RL struggles with plasticity?
arxiv.org/abs/2302.12902

10.01.2025 13:19 — 👍 2 🔁 0 💬 1 📌 0

AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws
Neural scaling laws are observed in a range of domains, to date with no clear understanding of why they occur. Recent theories suggest that loss power laws arise from Zipf's law, a power law observed ...

Read the full paper for more details and results: 'AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws'. ⚔️
arxiv.org/abs/2412.11979
Big thanks to @ericjmichaud.bsky.social for sharing his wisdom! This all started with our hallway chat at ICLR 😄
X: x.com/neumann_oren...

19.12.2024 14:16 — 👍 1 🔁 0 💬 0 📌 0

There is: in those games, larger models improve overall accuracy by focusing on late-game positions, forgetting what they learned on opening positions. This directly harms performance, since mastering openings is crucial, while wrapping up a game can be done with blind MCTS.

19.12.2024 14:16 — 👍 0 🔁 0 💬 1 📌 0

AlphaZero doesn't always scale nicely. In some games, Elo goes up, then sharply degrades w/ model size. We noticed this happens in games where the rules bend the Zipf curve, since end-game board positions have a high frequency. Is there a connection?

19.12.2024 14:16 — 👍 0 🔁 0 💬 1 📌 0

In line with the quantization model, we see that AlphaZero agents fit board states in decreasing order of frequency. This is very surprising: high-frequency opening moves are exponentially harder to model, since they depend on downstream positions.

19.12.2024 14:16 — 👍 0 🔁 0 💬 1 📌 0

There is! Chess/Go tournament games famously follow Zipf's law: the frequency of each board position scales as a power of its rank.
We find that Zipf's law also emerges in RL self-play games. It's a direct result of universal board-game rules.

19.12.2024 14:16 — 👍 0 🔁 0 💬 1 📌 0
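
For readers who want to try the Zipf relation from the post above on their own data: a minimal sketch, assuming you already have a count of how often each board position was visited. It estimates the exponent alpha in frequency ∝ rank^(-alpha) with a plain least-squares fit in log-log space. The function name `zipf_exponent` and the synthetic counts are hypothetical illustrations, not taken from the paper, and a log-log fit is only a rough estimator, not necessarily the method the authors used.

```python
import math


def zipf_exponent(counts):
    """Estimate alpha in frequency proportional to rank**(-alpha).

    Uses an ordinary least-squares line fit of log(frequency) against
    log(rank). This is an illustrative estimator only.
    """
    # Rank positions from most to least frequent, dropping zero counts.
    freqs = sorted((c for c in counts if c > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two non-zero counts to fit a line")

    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)

    # The fitted slope is -alpha, so flip the sign.
    return -cov / var


# Synthetic counts that follow an exact power law with alpha = 1
# (hypothetical data, standing in for real position-visit counts).
synthetic_counts = [1e6 / rank for rank in range(1, 1001)]
alpha = zipf_exponent(synthetic_counts)
print(f"estimated exponent: {alpha:.3f}")
```

On real self-play data the counts would come from tallying visited board states, and the fit is usually restricted to the range of ranks where the curve is actually straight on a log-log plot.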

The Quantization Model of Neural Scaling
We propose the Quantization Model of neural scaling laws, explaining both the observed power law dropoff of loss with model and data size, and also the sudden emergence of new capabilities with scale....

The quantization model suggests that LLM power-law scaling results from Zipf's law in natural language:
arxiv.org/abs/2303.13506
In RL, AlphaZero has one of the few examples of power-law scaling:
arxiv.org/abs/2210.00849
But is there a Zipf's law in board games?

19.12.2024 14:16 — 👍 1 🔁 0 💬 1 📌 0

🚨Do RL scaling laws share the same origin as LLM scaling laws?
We show that AlphaZero scaling might be the result of Zipf's law, and that inverse scaling can result from unusual frequency curves.
arxiv.org/abs/2412.11979
A 🧵 on scaling laws and board games! ♟️🎲

19.12.2024 14:16 — 👍 4 🔁 0 💬 1 📌 1

I'm excited to share a new paper: "Mastering Board Games by External and Internal Planning with Language Models"

storage.googleapis.com/deepmind-med...

(also soon to be up on arXiv, once it's been processed there)

05.12.2024 07:49 — 👍 76 🔁 13 💬 4 📌 7
