
Karim Abdel Sadek

@karimabdel.bsky.social

Incoming PhD, UC Berkeley. Interested in RL, AI Safety, Cooperative AI, TCS. https://karim-abdel.github.io

181 Followers  |  94 Following  |  16 Posts  |  Joined: 13.11.2024

Latest posts by karimabdel.bsky.social on Bluesky

The paper "Mitigating goal misgeneralization via minimax regret" will appear at @rl-conference.bsky.social!

Joint work with the great Matthew Farrugia-Roberts, Usman Anwar, Hannah Erlebach, Christian Schroeder de Witt, David Krueger and @michaelddennis.bsky.social

www.arxiv.org/pdf/2507.03068

08.07.2025 17:16 — 👍 2    🔁 0    💬 0    📌 0

Future work we are excited about:

• Improving UED algorithms to be closer to the results predicted by our theory

• Mitigating the fully ambiguous case by focusing on the inductive biases of the agent.

08.07.2025 17:16 — 👍 1    🔁 0    💬 1    📌 0

We also visualize the performance of our agents in a maze for each possible location of the goal in the environment.

The results show that agents trained with the regret objective achieve near-maximum return for almost all goal locations.

08.07.2025 17:16 — 👍 1    🔁 0    💬 1    📌 0

We complement our theoretical findings with empirical results, which support our theory by showing better generalization for agents trained via minimax regret.

Left: performance at test time
Right: % of distinguishing levels played by the respective level designer

08.07.2025 17:16 — 👍 1    🔁 0    💬 1    📌 0

When the deployment environments lie in the support of the training level distribution, we also show that a policy that is optimal with respect to the minimax regret objective is provably robust against goal misgeneralization!
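Roughly why the support condition matters (my paraphrase in illustrative notation, not the paper's exact statement): the minimax regret objective takes a maximum over every level θ in the training support,

\[
\pi^{*} \in \arg\min_{\pi} \; \max_{\theta \in \mathrm{supp}(P_{\text{train}})} \big( V^{*}_{\theta} - V^{\pi}_{\theta} \big),
\]

so even distinguishing levels that are vanishingly rare under the training distribution enter at full weight, and a small minimax regret bounds the regret on each such level individually.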

08.07.2025 17:16 — 👍 2    🔁 0    💬 1    📌 0

We first formally show that a policy maximizing expected value may suffer from goal misgeneralization if distinguishing levels are rare.
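To see the intuition with toy numbers (illustrative assumptions, not figures from the paper): suppose a fraction ε of levels is distinguishing, returns lie in [0, 1], a proxy-following policy earns return 1 on non-distinguishing levels and 0 on distinguishing ones, and the intended policy earns 1 - δ everywhere. Then

\[
\mathbb{E}[\text{return}_{\text{proxy}}] = 1 - \varepsilon,
\qquad
\mathbb{E}[\text{return}_{\text{intended}}] = 1 - \delta,
\]

so whenever ε < δ the expected-value objective prefers the proxy policy, even though its regret on distinguishing levels is 1 while the intended policy's regret is at most δ on every level.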

08.07.2025 17:16 — 👍 1    🔁 0    💬 1    📌 0

Goal misgeneralization can occur when training only on non-distinguishing levels, as shown in Langosco et al., 2022.

Adding a few distinguishing levels does not alter this outcome. However, we propose a mitigation for this scenario!

08.07.2025 17:16 — 👍 1    🔁 0    💬 1    📌 0

Goal misgeneralization arises due to the presence of 'proxy goals'. We formalize this and characterize environments as either:

• Non-distinguishing: the true and proxy rewards may induce the same behaviour

• Distinguishing: the true and proxy rewards induce different behaviour (a rough formal sketch follows below)
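One rough way to make this precise (illustrative notation of mine; the paper's formal definitions may differ): writing Π*_R(θ) for the set of policies that behave optimally under reward R in level θ,

\[
\theta \ \text{is non-distinguishing} \iff \Pi^{*}_{R_{\text{true}}}(\theta) \cap \Pi^{*}_{R_{\text{proxy}}}(\theta) \neq \emptyset,
\]

and distinguishing otherwise, i.e. when no single policy is optimal for both the true and the proxy reward.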

08.07.2025 17:16 — 👍 1    🔁 0    💬 1    📌 0

We propose using regret, the difference between the optimal agent's return and our current policy's return, as a training objective.

Minimizing it will encourage the agent to solve rare out-of-distribution levels during training, helping it learn the correct reward function.
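A minimal sketch of how the two objectives can disagree (the numbers and names below are my own toy illustration, not the paper's implementation):

```python
# Two candidate policies evaluated on two kinds of levels (toy numbers).
returns = {
    "proxy_policy":    {"non_distinguishing": 1.0, "distinguishing": 0.0},
    "intended_policy": {"non_distinguishing": 0.9, "distinguishing": 0.9},
}
optimal_return = {"non_distinguishing": 1.0, "distinguishing": 1.0}
level_probs = {"non_distinguishing": 0.99, "distinguishing": 0.01}  # distinguishing levels are rare

def expected_return(policy):
    """Average return under the training level distribution."""
    return sum(level_probs[l] * returns[policy][l] for l in level_probs)

def max_regret(policy):
    """Worst-case regret: optimal return minus achieved return, over all level types."""
    return max(optimal_return[l] - returns[policy][l] for l in level_probs)

for p in returns:
    print(p, "expected return =", round(expected_return(p), 3),
          "| max regret =", round(max_regret(p), 3))

# proxy_policy:    expected return = 0.99, max regret = 1.0
# intended_policy: expected return = 0.9,  max regret = 0.1
# Expected value prefers the proxy policy; minimax regret prefers the intended one.
```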

08.07.2025 17:16 — 👍 1    🔁 0    💬 1    📌 0

*New Paper*

🚨 Goal misgeneralization occurs when AI agents learn the wrong reward function instead of the human's intended goal.

😇 We show that training with a minimax regret objective provably mitigates it, promoting safer and better-aligned RL policies!

08.07.2025 17:16 — 👍 9    🔁 2    💬 1    📌 0
Cooperative AI

CAIF's new and massive report on multi-agent AI risks will be a really useful resource for the field
www.cooperativeai.com/post/new-rep...

21.02.2025 14:24 — 👍 3    🔁 1    💬 0    📌 0

what if…

21.02.2025 04:31 — 👍 4    🔁 0    💬 2    📌 0

A large group of us (spearheaded by Denizalp Goktas) have put out a position paper on paths towards foundation models for strategic decision-making. Language models still lack these capabilities, so we'll need to build them: hal.science/hal-04925309...

18.02.2025 18:33 — 👍 33    🔁 7    💬 2    📌 0

You take the lavalamp output, and Alice and Bob do the dot product of it with their respective number and then apply mod 2 to the result. They then communicate the bit they obtained (1=wave, 0=wink), and this operation always returns the same number to both if a=b, or otherwise fails with p=1/2?
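A small code sketch of the protocol being described (setup and names are my own illustration): Alice and Bob hold bit-strings a and b and share a public random string r; each announces the inner product mod 2 of their string with r.

```python
import random

def parity_dot(x_bits, r_bits):
    """Inner product mod 2 of two equal-length bit lists."""
    return sum(xb & rb for xb, rb in zip(x_bits, r_bits)) % 2

def equality_round(a_bits, b_bits):
    """One round of the randomized equality test with shared randomness:
    True when Alice's and Bob's announced bits agree."""
    r = [random.randint(0, 1) for _ in a_bits]  # shared random string (the 'lavalamp output')
    return parity_dot(a_bits, r) == parity_dot(b_bits, r)

a = [1, 0, 1, 1]
b = [1, 0, 0, 1]
print(all(equality_round(a, a) for _ in range(20)))        # True: equal inputs never disagree
print(sum(not equality_round(a, b) for _ in range(1000)))  # about 500: unequal inputs disagree half the time
```

If a = b the announced parities always match; if a ≠ b they differ with probability 1/2 per round, so repeating k independent rounds drives the failure probability down to 2^-k.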

17.02.2025 06:30 — 👍 1    🔁 0    💬 1    📌 0

Model-free deep RL algorithms like NFSP, PSRO, ESCHER, & R-NaD are tailor-made for games with hidden information (e.g. poker).
We performed the largest-ever comparison of these algorithms.
We find that they do not outperform generic policy gradient methods, such as PPO.
arxiv.org/abs/2502.08938
1/N

14.02.2025 18:41 — 👍 93    🔁 20    💬 3    📌 4
Cooperative AI

The 2025 Cooperative AI summer school (9-13 July 2025 near London) is now accepting applications, due March 7th!
www.cooperativeai.com/summer-schoo...

09.01.2025 19:25 — 👍 14    🔁 5    💬 1    📌 0

The magic thing humans do is solve tasks pretty well under high uncertainty about the problem specification. We are also frequently capable of doing this collaboratively. I still do not see evidence that models can do any part of this.

21.12.2024 01:08 — 👍 82    🔁 12    💬 6    📌 1

I will be at @neuripsconf.bsky.social this week!

Would love to chat about Multi-agent systems, RL, Human-AI Alignment, or anything interesting :)

I'm also applying for PhD programs this cycle, feel free to reach out for any advice!

More about me: karim-abdel.github.io

08.12.2024 23:59 — 👍 9    🔁 2    💬 0    📌 0

I give you a loaded coin, with some (unknown) probability 0<p<1 of landing Heads, and I ask you to generate a fair coin toss.

Great! We know how to do this! This is the Von Neumann trick: toss twice. If HH or TT, repeat; if HT or TH, return the first.

Problem solved? Not quite... This can be bad!
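A minimal sketch of the trick in code (helper names are my own; assuming the loaded coin is available as a black box):

```python
import random

def biased_flip(p):
    """One toss of the loaded coin: True = Heads, with probability p."""
    return random.random() < p

def von_neumann_fair_flip(p):
    """Von Neumann's trick: toss the loaded coin in pairs until the two
    tosses differ, then return the first of that pair. HT and TH are
    equally likely (both have probability p*(1-p)), so the output bit is unbiased."""
    tosses = 0
    while True:
        first, second = biased_flip(p), biased_flip(p)
        tosses += 2
        if first != second:
            return first, tosses

# The catch hinted at above: a pair disagrees with probability 2*p*(1-p),
# so the expected number of tosses is 1/(p*(1-p)), which blows up as p
# approaches 0 or 1.
fair_bit, used = von_neumann_fair_flip(0.99)
print(fair_bit, used)
```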

18.11.2024 20:50 — 👍 41    🔁 7    💬 3    📌 0

Here is some cool work taking a first step towards that in Minecraft using MCTS: Scalably Solving Assistance Games - openreview.net/pdf/080f0c69...

19.11.2024 15:26 — 👍 0    🔁 0    💬 1    📌 0

Very cool work! I think an important challenge is to scale assistance games to scenarios where the goal/action/communication space can be 'large', so as to capture real-world scenarios where we will actually want to apply CIRL.

19.11.2024 15:26 — 👍 2    🔁 0    💬 1    📌 0

Here is some cool work taking a first step towards that in Minecraft using MCTS: Scalably Solving Assistance Games - openreview.net/pdf/080f0c69...

19.11.2024 15:22 — 👍 0    🔁 0    💬 0    📌 0
