
Aly Lidayan

@aliday.bsky.social

AI PhD student at Berkeley | alyd.github.io

29 Followers  |  18 Following  |  8 Posts  |  Joined: 02.12.2024

Latest posts by aliday.bsky.social on Bluesky

I'm presenting this 3-5:30pm on Saturday, Hall 3 #396 🌞
Come chat about designing rewards and intrinsic motivation for RL + meta-RL!

25.04.2025 02:58 · 👍 0  🔁 0  💬 0  📌 0
Post image

1/3 Out now: new paper on people's perception of AI (robot) creativity! Core finding: people attribute more creativity to a creative act when they see not only the final artwork, but also its creation process and the robot making it. Video: vimeo.com/1073134853 Open-access paper: doi.org/10.1145/3711...

08.04.2025 13:27 · 👍 10  🔁 5  💬 1  📌 0
Preview
BAMDP Shaping: a Unified Framework for Intrinsic Motivation and Reward Shaping Intrinsic motivation and reward shaping guide reinforcement learning (RL) agents by adding pseudo-rewards, which can lead to useful emergent behaviors. However, they can also encourage counterproducti...

Get all the details in our paper: arxiv.org/abs/2409.05358 🚀

This work was a joint effort with Michael Dennis and Stuart Russell at UC Berkeley!

26.03.2025 00:05 · 👍 0  🔁 0  💬 0  📌 0
Post image

5๏ธโƒฃWe demonstrate our framework in Mountain Car. We set the potential to the maximum displacement the agent learnt to reach so far, signaling the value of its training. Rewarding displacement directly (pink) led to reward-hacking but the BAMPF (green) preserved optimalityโœ…

26.03.2025 00:05 · 👍 0  🔁 0  💬 1  📌 0
Post image
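As a rough illustration of the kind of shaping described in the post above, here is a Gymnasium-style sketch (names, defaults, and the use of raw car position as a proxy for displacement are my assumptions, not taken from the paper): the potential is the best position reached so far across all of training, and the shaped reward only pays out when the agent beats that record.

```python
import gymnasium as gym


class MaxDisplacementBAMPF(gym.Wrapper):
    """Hypothetical sketch of a BAMPF-style potential for Mountain Car.

    The potential Phi is a statistic of the agent's whole training history
    (the best car position reached so far), not of the current MDP state
    alone. The shaping term gamma * Phi' - Phi is added to the true reward,
    so a bonus is only paid when the agent surpasses its previous best.
    """

    def __init__(self, env, gamma=0.99):
        super().__init__(env)
        self.gamma = gamma
        self.best_pos = None  # potential: best position reached so far in training

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        # The potential persists across episodes: it tracks training progress.
        pos = float(obs[0])  # Mountain Car observation is (position, velocity)
        self.best_pos = pos if self.best_pos is None else max(self.best_pos, pos)
        return obs, info

    def step(self, action):
        phi = self.best_pos
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.best_pos = max(self.best_pos, float(obs[0]))
        shaped_reward = reward + self.gamma * self.best_pos - phi
        return obs, shaped_reward, terminated, truncated, info


# Usage sketch:
# env = MaxDisplacementBAMPF(gym.make("MountainCar-v0"))
```

Adding the displacement itself to the reward, rather than its potential difference, would correspond to the reward-hacked (pink) variant the post contrasts against.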

4๏ธโƒฃWe get a new typology for intrinsic motivation & reward shaping terms based on which BAMDP value component they signal! They hinder exploration if they align poorly with actual value, e.g., prediction error is high for watching a noisy TV but no valuable information is gained.

26.03.2025 00:05 · 👍 0  🔁 0  💬 1  📌 0
Post image
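A compact way to state the noisy-TV point above (notation is mine, not necessarily the paper's): for observations that are pure noise, a prediction-error bonus stays large forever, while the agent's beliefs stop changing, so the information actually gained goes to zero.

```latex
% Noisy-TV sketch (notation mine): prediction error vs. information gain.
\underbrace{\big\| o_{t+1} - \hat{o}_{t+1}(h_t) \big\|}_{\text{prediction error: stays high for pure noise}}
\quad \text{vs.} \quad
\underbrace{\mathrm{KL}\!\big( p(\theta \mid h_{t+1}) \,\|\, p(\theta \mid h_t) \big)}_{\text{information gain: } \to\, 0 \text{ as beliefs stop updating}}
```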

3๏ธโƒฃTo guide more efficient exploration, BAMPF potentials should encode BAMDP state value. To gain further insights, we decompose BAMDP value into the value of the information gathered๐Ÿง  and the value of the MDP state given prior knowledge only๐ŸŒŽ.

26.03.2025 00:05 · 👍 0  🔁 0  💬 1  📌 0
Post image
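One plausible way to write the split described above (this is a paraphrase in my own notation; the paper's exact definitions may differ): the BAMDP state value is what the current physical state is worth under the prior alone, plus whatever extra value the information gathered so far contributes.

```latex
% Sketch (notation mine): decomposing BAMDP state value.
V(h_t) \;=\;
\underbrace{V_{\text{prior}}(s_t)}_{\text{value of the MDP state under prior knowledge only}}
\;+\;
\underbrace{\big( V(h_t) - V_{\text{prior}}(s_t) \big)}_{\text{value of the information gathered in } h_t}
```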

2๏ธโƒฃHarmful reward-hacking policies maximize modified rewards to the detriment of true rewards. We prove that converting IM and reward shaping terms to BAMDP potential-based shaping functions (BAMPFs) prevents hacking, and empirically validate this in both RL and meta-RL.

26.03.2025 00:05 · 👍 0  🔁 0  💬 1  📌 0
Post image
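For reference, the form the post refers to looks roughly like classic potential-based reward shaping, but with the potential defined over BAMDP states, i.e. over interaction histories, rather than over MDP states alone (the notation below is mine):

```latex
% Sketch (notation mine): a BAMDP potential-based shaping function (BAMPF).
F(h_t, a_t, h_{t+1}) \;=\; \gamma\, \Phi(h_{t+1}) - \Phi(h_t),
\qquad \Phi : \text{BAMDP states (histories)} \to \mathbb{R}
```

Intuitively, such a term telescopes along a trajectory of histories, which is the standard reason potential-based shaping leaves optimal behavior unchanged, and hence the sense in which it prevents hacking.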

1๏ธโƒฃWe cast RL agents as policies in Bayes-Adaptive MDPs, which augment the MDP state with the history of all environment interactions. Optimal exploration maximizes BAMDP state value, and pseudo-rewards guide RL agents by rewarding them for going to more valuable BAMDP states.

26.03.2025 00:05 · 👍 0  🔁 0  💬 1  📌 0
Post image
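Spelled out a little (again in my own notation rather than necessarily the paper's): the BAMDP state is the full interaction history, and its value is the best expected return achievable given the agent's posterior beliefs over possible MDPs after that history.

```latex
% Sketch (notation mine): BAMDP state = interaction history, and its value.
h_t \;=\; (s_0, a_0, r_1, s_1, \ldots, s_t),
\qquad
V(h_t) \;=\; \max_{\pi}\; \mathbb{E}_{M \,\sim\, p(M \mid h_t)}
\!\left[ \sum_{k \ge 0} \gamma^{k}\, r_{t+k+1} \;\middle|\; h_t,\, \pi \right]
```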

🚨 Our new #ICLR2025 paper presents a unified framework for intrinsic motivation and reward shaping: they signal the value of the RL agent's state 🤖 = external state 🌎 + past experience 🧠. Rewards based on potentials over the learning agent's state provably avoid reward hacking! 🧵

26.03.2025 00:05 · 👍 10  🔁 3  💬 1  📌 1
