
Axel Brunnbauer

@axelbrunnbauer.bsky.social

Multi-Agent RL, PhD Student @ TUWien

32 Followers  |  152 Following  |  7 Posts  |  Joined: 14.11.2024

Latest posts by axelbrunnbauer.bsky.social on Bluesky

This blog post is a nice complementary, behind-the-scenes extra on our recent work on on-policy pathwise gradient algorithms. @cvoelcker.bsky.social went the extra mile and wrote this piece to provide some more context on the design decisions behind REPPO!

03.10.2025 22:52 — 👍 2    🔁 0    💬 0    📌 0
Preview
a close up of a sad cat with the words pleeeaasse written below it

cvoelcker.de/blog/2025/re...

I finally gave in and made a nice blog post about my most recent paper. This was a surprising amount of work, so please be nice and go read it!

02.10.2025 21:34 — 👍 29    🔁 7    💬 0    📌 3

Congrats Igor! Well deserved!! 🤗

01.10.2025 22:37 — 👍 1    🔁 0    💬 0    📌 0

Big if true 🤫: #REPPO works on Atari as well 😱 👾 🚀

Some tuning is still needed, but we are seeing results roughly on par with #PQN.

If you want to test out #REPPO (Atari is not integrated due to compatibility issues between envpool and the JAX version), check out github.com/cvoelcker/re...

#reinforcementlearning

16.09.2025 13:29 — 👍 7    🔁 1    💬 1    📌 0

My wedding gift for you 😅

16.09.2025 20:51 — 👍 0    🔁 0    💬 0    📌 0

Super stoked for the New York RL workshop tomorrow. Will be presenting 2 orals:
* Replicable Reinforcement Learning with Linear Function Approximation
* Relative Entropy Pathwise Policy Optimization

We already posted about the 2nd one (below); I'll get to the first one here in a bit.

11.09.2025 14:28 — 👍 5    🔁 2    💬 0    📌 0

I've been hearing about this paper from Claas for a while now; the fact that they aren't tuning per benchmark is a killer sign. Also, check out the wall-clock plots!

18.07.2025 20:15 — 👍 20    🔁 1    💬 1    📌 0

My PhD journey started with me fine-tuning PPO hyperparameters, which ultimately led to my research on stability. With REPPO, we've made a huge step in the right direction: stable learning, no tuning on a new benchmark, amazing performance. REPPO has the potential to be the PPO killer we've all been waiting for.

17.07.2025 19:41 — 👍 7    🔁 2    💬 0    📌 0
GIF showing two plots that symbolize the REPPO algorithm. On the left side, four curves track the return of an optimization function, and on the right side, the optimization paths over the objective function are visualized. The GIF shows that Monte Carlo gradient estimators have high variance and fail to converge, while surrogate function estimators converge smoothly, but might find suboptimal solutions if the surrogate function is imprecise.

🔥 Presenting Relative Entropy Pathwise Policy Optimization #REPPO 🔥
Off-policy #RL (e.g. #TD3) trains by differentiating a critic, while on-policy #RL (e.g. #PPO) uses Monte Carlo gradients. But is that necessary? Turns out: no! We show how to get critic gradients on-policy. arxiv.org/abs/2507.11019

17.07.2025 19:11 — 👍 26    🔁 7    💬 2    📌 6
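To make the distinction in the post above concrete, here is a minimal toy sketch (not code from the paper or the REPPO repository; the critic and all names are illustrative stand-ins) contrasting the two estimators for a one-dimensional Gaussian policy with unit variance: the score-function ("Monte Carlo") gradient used by on-policy methods like PPO, and the pathwise gradient that differentiates through a critic, as in TD3.

```python
import jax
import jax.numpy as jnp

def critic(s, a):
    # Illustrative stand-in for a learned, differentiable critic Q(s, a).
    return -(a - jnp.tanh(s)) ** 2

def score_function_grad(mean, s, key, n=4096):
    # Likelihood-ratio ("Monte Carlo") estimator, PPO-style: sample actions
    # and weight grad-log-prob by the critic value. The critic is only
    # evaluated, never differentiated.
    actions = mean + jax.random.normal(key, (n,))      # a ~ N(mean, 1)
    q = jax.lax.stop_gradient(critic(s, actions))
    def surrogate(m):
        logp = -0.5 * (actions - m) ** 2               # log N(a | m, 1) + const
        return jnp.mean(logp * q)
    return jax.grad(surrogate)(mean)

def pathwise_grad(mean, s, key, n=4096):
    # Pathwise estimator, TD3-style: reparameterize the action and
    # differentiate through the critic itself.
    eps = jax.random.normal(key, (n,))
    def objective(m):
        return jnp.mean(critic(s, m + eps))            # a = m + 1.0 * eps
    return jax.grad(objective)(mean)

key = jax.random.PRNGKey(0)
print(score_function_grad(0.0, 0.5, key))  # noisy estimate of d E[Q] / d mean
print(pathwise_grad(0.0, 0.5, key))        # low-variance estimate of the same
```

Both estimate the same gradient; the pathwise version typically has far lower variance but inherits any bias in the critic, which is exactly the trade-off visualized in the GIF above.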
Preview
Scenario-Based Curriculum Generation for Multi-Agent Autonomous Driving
The automated generation of diverse and complex training scenarios has been an important ingredient in many complex learning tasks. Especially in real-world application domains, such as autonomous dri...

Our paper on unsupervised environment design for autonomous-driving scenarios was accepted at ICRA! We built a curriculum generator for CARLA which adapts the scenario distribution to the current capabilities of the agent.
arxiv.org/abs/2403.17805

11.02.2025 10:18 — 👍 0    🔁 0    💬 0    📌 0
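As a loose illustration of the adapt-to-capability loop described in the post above (a hypothetical sketch, not the paper's method or code; the class, the success-rate signal, and all parameters are invented here, and unsupervised environment design methods typically use richer signals such as regret), a curriculum can track per-scenario performance and sample scenarios the agent has not yet mastered more often:

```python
import numpy as np

class ScenarioCurriculum:
    """Hypothetical sketch: bias scenario sampling toward unmastered scenarios."""

    def __init__(self, n_scenarios, temperature=0.2):
        self.success_rate = np.zeros(n_scenarios)  # running per-scenario success
        self.counts = np.zeros(n_scenarios)
        self.temperature = temperature

    def sample(self, rng):
        # Softmax over negated success rates: low success => high probability.
        logits = -self.success_rate / self.temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

    def update(self, scenario_id, succeeded):
        # Incremental average of the binary success signal for this scenario.
        self.counts[scenario_id] += 1
        step = 1.0 / self.counts[scenario_id]
        self.success_rate[scenario_id] += step * (float(succeeded) - self.success_rate[scenario_id])

# Usage: sample a scenario, run an episode in the simulator, report the outcome.
rng = np.random.default_rng(0)
curriculum = ScenarioCurriculum(n_scenarios=10)
scenario_id = curriculum.sample(rng)
curriculum.update(scenario_id, succeeded=False)
```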

Awesome work!

06.02.2025 18:49 — 👍 2    🔁 0    💬 0    📌 0
Preview
Scalable Offline Reinforcement Learning for Mean Field Games
Reinforcement learning algorithms for mean-field games offer a scalable framework for optimizing policies in large populations of interacting agents. Existing methods often depend on online interactio...

Excited to announce that our paper "Scalable Offline Reinforcement Learning for Mean Field Games" has been accepted at #AAMAS2025! 🚀 We propose Off-MMD, an offline RL algorithm for learning equilibrium policies in MFGs from static datasets. arxiv.org/abs/2410.17898

20.12.2024 10:03 — 👍 1    🔁 0    💬 0    📌 0

My bet is on legacy.

05.12.2024 22:56 — 👍 0    🔁 0    💬 0    📌 0
