Anastasiia Pedan

@pedanana.bsky.social

60 Followers  |  9 Following  |  7 Posts  |  Joined: 12.02.2025

Latest posts by pedanana.bsky.social on Bluesky

my main takeaway from a talk on reward design in rl: ai only beat humans when they were asked not to collaborate 👀👀

25.08.2025 19:07 | 👍 2    🔁 0    💬 1    📌 0

thank you, Claas, you're the best mentor I could've asked for!!!!

20.06.2025 17:05 | 👍 1    🔁 0    💬 0    📌 0

This was an amazing collaboration with a cracked team consisting of @cvoelcker.bsky.social, me, Arash Ahmadian, Romina Abachi, @igilitschenski.bsky.social, and @sologen.bsky.social

#ReinforcementLearning #ModelBasedRL #RLTheory #ICML2025

19.06.2025 02:39 | 👍 3    🔁 0    💬 0    📌 0
Calibrated Value-Aware Model Learning with Probabilistic Environment Models
The idea of value-aware model learning, that models should produce accurate value estimates, has gained prominence in model-based reinforcement learning. The MuZero loss, which penalizes a model's val...

For more details, feel free to come chat with us in Vancouver ⛰️🌲🌊 and check out our paper 🤖! www.arxiv.org/abs/2505.22772

19.06.2025 02:39 | 👍 4    🔁 1    💬 1    📌 0

We can correct the MuZero loss and other losses from the same family by pushing the value estimates computed from different sampled model rollouts to have the correct variance and mean. We prove the soundness of this change and show that it is beneficial for agent performance 📈📈📈!

19.06.2025 02:39 | 👍 3    🔁 0    💬 1    📌 0

Getting a correct value estimate is instrumental in model-based RL, so if your algorithm fails to provide correct targets for model learning, your agent is in trouble because these errors will accumulate fast 📉📉📉!

19.06.2025 02:39 | 👍 4    🔁 1    💬 1    📌 0

Would you be surprised to learn that many empirical implementations of value-aware model learning (VAML) algos, including MuZero, lead to incorrect model & value functions when training stochastic models 🤕? In our new @icmlconf.bsky.social 2025 paper, we show why this happens and how to fix it 🦾!

19.06.2025 02:39 | 👍 8    🔁 3    💬 1    📌 1

@pedanana is following 9 prominent accounts