
Marcel Hussing

@marcelhussing.bsky.social

PhD student at the University of Pennsylvania. Currently an intern at MSR. Interested in reliable and replicable reinforcement learning and using it for knowledge discovery: https://marcelhussing.github.io/ All posts are my own.

2,850 Followers  |  339 Following  |  123 Posts  |  Joined: 09.11.2024

Latest posts by marcelhussing.bsky.social on Bluesky

Post image

New ChatGPT data just dropped

07.08.2025 23:29 | 👍 38   🔁 7   💬 0   📌 1

My PhD journey started with me fine-tuning the hyperparameters of PPO, which ultimately led to my research on stability. With REPPO, we've made a huge step in the right direction. Stable learning, no tuning on a new benchmark, amazing performance. REPPO has the potential to be the PPO killer we've all been waiting for.

17.07.2025 19:41 | 👍 7   🔁 2   💬 0   📌 0
GIF showing two plots that symbolize the REPPO algorithm. On the left side, four curves track the return of an optimization function, and on the right side, the optimization paths over the objective function are visualized. The GIF shows that Monte Carlo gradient estimators have high variance and fail to converge, while surrogate-function estimators converge smoothly but might find suboptimal solutions if the surrogate function is imprecise.

🔥 Presenting Relative Entropy Pathwise Policy Optimization #REPPO 🔥
Off-policy #RL (e.g., #TD3) trains by differentiating a critic, while on-policy #RL (e.g., #PPO) uses Monte Carlo gradients. But is that necessary? Turns out: no! We show how to get critic gradients on-policy. arxiv.org/abs/2507.11019
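
For intuition, here is a tiny toy sketch of the two estimators being contrasted: a score-function (Monte Carlo) gradient versus a pathwise gradient taken through a differentiable surrogate. Plain numpy on a made-up 1-D quadratic objective with a Gaussian policy; this illustrates the estimators, not REPPO itself.

    import numpy as np

    rng = np.random.default_rng(0)

    def reward(a):
        # Toy objective: maximize -(a - 2)^2, optimum at a = 2.
        return -(a - 2.0) ** 2

    mu, sigma, batch = 0.0, 1.0, 64   # Gaussian "policy" over a 1-D action

    # Score-function / Monte Carlo estimator (what PPO-style methods build on).
    a = rng.normal(mu, sigma, batch)
    score_grad = np.mean(reward(a) * (a - mu) / sigma**2)   # grad_mu log N(a; mu, sigma)

    # Pathwise estimator: reparameterize a = mu + sigma * eps and differentiate
    # a surrogate of the objective; here the true reward is differentiable,
    # so it stands in for the critic: d reward / d a = -2 * (a - 2).
    eps = rng.normal(0.0, 1.0, batch)
    a_rep = mu + sigma * eps
    pathwise_grad = np.mean(-2.0 * (a_rep - 2.0))            # da/dmu = 1

    print("score-function grad:", score_grad)
    print("pathwise grad      :", pathwise_grad)
    # Both estimate dE[reward]/dmu = -2*(mu - 2) = 4 here, but on the same
    # batch the pathwise estimate typically has far lower variance.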

17.07.2025 19:11 | 👍 26   🔁 7   💬 2   📌 4

Works that use #VAML / #MuZero losses often rely on deterministic models. But if we want to use stochastic models, whether to measure uncertainty or to leverage current SOTA architectures such as #transformers and #diffusion, we need to take care! Naively translating the loss functions leads to mistakes!
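
One way the naive translation can bite (my guess at the simplest illustration, not necessarily the exact failure mode meant here): plugging single model samples into a squared value-aware loss instead of expected values adds a variance term, which can make a collapsed deterministic model look better than the true stochastic one. A toy numpy sketch:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000

    def V(s):
        # Toy value function of the next state.
        return s ** 2

    s_env = rng.normal(0.0, 1.0, n)        # true env: s' ~ N(0, 1), so E[V(s')] = 1

    s_true_model = rng.normal(0.0, 1.0, n) # model A: the correct stochastic model
    s_collapsed = np.full(n, 1.0)          # model B: deterministic, tuned so V(s) = E[V(s')]

    def naive_loss(s_model):
        # "Naive" translation: squared error between single samples.
        return np.mean((V(s_model) - V(s_env)) ** 2)

    def value_aware_loss(s_model):
        # Loss on expected values: (E[V(model)] - E[V(env)])^2.
        return (np.mean(V(s_model)) - np.mean(V(s_env))) ** 2

    print("naive loss, true model      :", naive_loss(s_true_model))        # ~4
    print("naive loss, collapsed model :", naive_loss(s_collapsed))         # ~2, "wins"
    print("value-aware, true model     :", value_aware_loss(s_true_model))  # ~0
    print("value-aware, collapsed model:", value_aware_loss(s_collapsed))   # ~0
    # The sample-based loss prefers the collapsed model even though the
    # stochastic one matches the environment exactly.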

19.06.2025 15:20 | 👍 7   🔁 4   💬 1   📌 0

Dhruv Rohatgi will be giving a lecture on our recent work on comp-stat tradeoffs in next-token prediction at the RL Theory virtual seminar series (rl-theory.bsky.social) tomorrow at 2pm EST! Should be a fun talk, so come check it out!

26.05.2025 19:19 | 👍 11   🔁 5   💬 1   📌 0
Post image Post image

Just arrived in Montreal for my internship at FAIR. So far Montreal has been amazing: great walkable areas, good food, and nice people! Although I must say I have to get used to being addressed in French 😅

26.05.2025 16:23 | 👍 6   🔁 0   💬 0   📌 0

We'll be presenting our work on Oracle-Efficient Reinforcement Learning for Max Value Ensembles at the RL theory seminar! I've been following this series for a while, so I'm super excited we get to present some of our work. 🥳

25.04.2025 14:22 | 👍 7   🔁 1   💬 0   📌 0

Many great papers from Mila!
Two by my team at the Adaptive Agents Lab (Adage) together with collaborators:

A Truncated Newton Method for Optimal Transport
openreview.net/forum?id=gWr...

MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
openreview.net/forum?id=6Rt...

#ICLR2025

24.04.2025 02:19 | 👍 7   🔁 3   💬 1   📌 0
CoLLAs

📢 Deadline Extension Alert! 📢

Good news! We're extending the #CoLLAs2025 submission deadlines:

📝 Abstracts: Feb 26, 2025, 23:59 AoE
📄 Papers: Mar 3, 2025, 23:59 AoE

More time to refine your work, so don't miss this chance to contribute to #lifelong-learning research! 🚀

🔗 lifelong-ml.cc

20.02.2025 20:18 | 👍 3   🔁 4   💬 0   📌 1

I was very hyped about this place initially. Now I come here, see 5 posts about politics, unfollow 5 people, and close the website. Where are the interesting AI posts?

22.02.2025 23:05 | 👍 0   🔁 0   💬 1   📌 0
Post image

Can you solve group-conditional online conformal prediction with a no-regret learning algorithm? Not with vanilla regret, but yes with swap regret. And algorithms from the follow-the-regularized-leader family (notably online gradient descent) work really well for other reasons.
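
For anyone curious what the online-gradient-descent piece looks like in practice, here is a minimal per-group quantile-tracking sketch. The groups, scores, and step size are made up for illustration, and this is not the actual algorithm from the work described above:

    import numpy as np

    rng = np.random.default_rng(0)

    alpha = 0.1          # target miscoverage rate
    lr = 0.05            # online gradient descent step size
    n_groups = 3
    tau = np.zeros(n_groups)     # one conformal threshold per group
    miss = np.zeros(n_groups)    # miscoverage counts per group
    count = np.zeros(n_groups)

    for t in range(20_000):
        g = rng.integers(n_groups)               # group of this example
        score = rng.normal(loc=g, scale=1.0)     # nonconformity score (group-dependent)

        covered = score <= tau[g]                # prediction set = {y : score <= tau[g]}
        err = 0.0 if covered else 1.0

        # OGD on the pinball loss: raise the threshold after a miss,
        # lower it slowly otherwise, so long-run miscoverage -> alpha per group.
        tau[g] += lr * (err - alpha)

        miss[g] += err
        count[g] += 1

    print("per-group miscoverage:", miss / count)   # each should be close to alpha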

18.02.2025 13:19 | 👍 21   🔁 2   💬 1   📌 0
Post image

Bummed out about recent politics & news drowning out AI and science you want to see on Bluesky?

Well, here is a small "sky thread" (written on a ✈️) about something I recently discovered: e-values!

They are an alternative to the standard p-values as a measure of statistical significance. 1/N
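
Before the details, the one-line version: an e-value is a nonnegative statistic whose expectation under the null is at most 1, so 1/e is a valid (if conservative) p-value by Markov's inequality. A quick simulation with a made-up coin-flip example (my own toy, not from the thread):

    import numpy as np

    rng = np.random.default_rng(0)

    n_flips, n_sims = 50, 100_000
    p_null, p_alt = 0.5, 0.7    # H0: fair coin; alternative used to build the e-value

    # Data generated under the *null*.
    heads = rng.binomial(n_flips, p_null, size=n_sims)

    # Likelihood-ratio e-value: alternative likelihood / null likelihood.
    log_e = (heads * np.log(p_alt / p_null)
             + (n_flips - heads) * np.log((1 - p_alt) / (1 - p_null)))
    e = np.exp(log_e)

    print("mean e-value under H0:", e.mean())                   # ~1, since E[e] <= 1 under H0
    print("P(1/e <= 0.05) under H0:", (1 / e <= 0.05).mean())   # <= 0.05 by Markov's inequality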

17.02.2025 17:53 | 👍 41   🔁 12   💬 2   📌 3

Throwing compute at things has proven quite powerful in other domains, but until recently not as much in #ReinforcementLearning.

Excited to share that our MAD-TD paper got a spotlight at #ICLR25! Check out Claas' thread on how to get the most out of your compute/data buck when training from scratch.

11.02.2025 22:57 | 👍 5   🔁 1   💬 0   📌 0

I agree with the notion, but I don't think "things being outdated" is always bad. I'm of the opinion that we should still teach SVMs/kernels, as they teach us a different way to think about ML. PCA is still a core tool for teaching low-dimensional embeddings to students. We need as many tools as possible.

09.02.2025 18:23 | 👍 0   🔁 0   💬 0   📌 0

Are there no spotlights this year? Do we know?

09.02.2025 15:59 | 👍 0   🔁 0   💬 0   📌 0

EC 2025 (S)PC: let's get ready for the Super Bowl! Every time there is a first down, bid on a paper. Field goal? Bid on two. Touchdown? Bid on 5 papers (10 if it's the Eagles!). At the halftime show, enter your topic preferences and conflicts. Let's go, Birds!

08.02.2025 19:33 | 👍 17   🔁 1   💬 3   📌 3
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
Reinforcement learning from human feedback (RLHF) has emerged as a central tool for language model alignment. We consider online exploration in RLHF, which exploits interactive access to human or AI f...

It's called Exploratory Preference Optimization (arxiv.org/abs/2405.21046), by @djfoster.bsky.social and others :)

08.02.2025 18:58 | 👍 4   🔁 0   💬 1   📌 0

🚨🚨 RLC deadline has been extended by a week! Abstract deadline is Feb. 21 with a paper deadline of Feb. 28 🚨🚨. Please spread the word!

08.02.2025 18:05 | 👍 25   🔁 13   💬 1   📌 2

This is huge! I might be able to make it now, woohoo!

08.02.2025 18:56 | 👍 1   🔁 0   💬 0   📌 0
Post image

31.01.2025 17:56 | 👍 6   🔁 0   💬 0   📌 0

What a future work section should be:
Oh, and here is this interesting and hard open problem that someone should solve.

Future work sections in empirical ML papers:
We leave hyperparameter optimization for future work.

29.01.2025 15:59 | 👍 7   🔁 0   💬 1   📌 0

My new year's resolution is to spend more time thinking. Last year I found myself deep in the nitty-gritty of creating solutions. While that is important, it is also necessary to reflect and look at the bigger picture. Entering my 5th year, I will try to focus more on defining the next problems.

01.01.2025 11:51 | 👍 6   🔁 0   💬 0   📌 0

Apparently I'm in the top 1% of wandb users. Good or bad sign?

27.12.2024 03:06 | 👍 1   🔁 0   💬 1   📌 0

Wish I could recommend the Kingkiller Chronicle, but we may never see an ending. If you are fine with that, the first two have been my favorite books for years, and I still go back regularly.

26.12.2024 07:06 | 👍 2   🔁 0   💬 0   📌 0

One key is to follow people and simply engage. For example, I spent years on Twitter and ended up with 300 followers. Here, I feel much more appreciated; I want to post things because people may read them.

25.12.2024 23:33 | 👍 9   🔁 0   💬 1   📌 0

Reposting a postdoc opportunity in field robotics at Penn, in case anyone missed it the first time around.

23.12.2024 05:15 | 👍 16   🔁 4   💬 0   📌 0

With great power comes great responsibility 🧐

22.12.2024 03:42 | 👍 1   🔁 0   💬 0   📌 0

Done!

21.12.2024 20:22 | 👍 2   🔁 0   💬 1   📌 0

Microsoft's Computational Social Science group may have the opportunity to hire one researcher

Senior: 0-3 yrs post PhD
jobs.careers.microsoft.com/global/en/jo...

Principal: 3+ yrs post PhD
jobs.careers.microsoft.com/global/en/sh...

Please note: our ability to hire this season is not certain

19.12.2024 18:54 | 👍 130   🔁 54   💬 5   📌 1

Anyone who shills Hydra gets a retweet. Not using it borders on malpractice.

19.12.2024 00:35 | 👍 10   🔁 1   💬 1   📌 0
