
Marcel Hussing

@marcelhussing.bsky.social

PhD student at the University of Pennsylvania. Previously an intern at MSR; currently at Meta FAIR. Interested in reliable and replicable reinforcement learning, robotics, and knowledge discovery: https://marcelhussing.github.io/ All posts are my own.

2,908 Followers  |  337 Following  |  151 Posts  |  Joined: 09.11.2024

Latest posts by marcelhussing.bsky.social on Bluesky

Agreed, and that is basically what Section 4.1 in the paper I linked says. There is nuance to it of course, but that part really needs to be step 1. However, it is unclear how to start this discussion, and our proposal is to turn the discussion into research.

10.12.2025 05:38 · 👍 1    🔁 0    💬 0    📌 0
Can we hop in general? A discussion of benchmark selection and design using the Hopper environment Empirical, benchmark-driven testing is a fundamental paradigm in the current RL community. While using off-the-shelf benchmarks in reinforcement learning (RL) research is a common practice, this choic...

What is a better benchmark? @cvoelcker.bsky.social and I wrote the "finding the frame" paper about this in the hopes of starting a discussion around benchmark selection. Benchmarks are often chosen via "we did what others did", where the starting point was picked arbitrarily. arxiv.org/abs/2410.08870

10.12.2025 01:32 · 👍 2    🔁 0    💬 1    📌 0

RL event at Tacos El Gordo?

03.12.2025 02:52 · 👍 5    🔁 0    💬 2    📌 0

Yea that flavor is so good. I think my favorite has to be cabeza though. We should go sometime

03.12.2025 02:51 · 👍 2    🔁 0    💬 1    📌 0

Holy shit it's so good, I said the same thing. Which one's your favorite?

03.12.2025 02:34 · 👍 1    🔁 0    💬 1    📌 0

Getting ready to leave for NeurIPS tomorrow morning. ✈️ Let me know if you're around and wanna hang out!

01.12.2025 13:46 · 👍 0    🔁 0    💬 0    📌 0

I've always been confused by this term. The model isn't obtaining new knowledge. You are just exposing a different part of an already learned distribution. Maybe something like contextual generalization would make more sense to me.

16.11.2025 14:32 · 👍 2    🔁 0    💬 0    📌 0

You know what would be funny? If it comes back and the reviews aren't out yet.

11.11.2025 21:09 · 👍 1    🔁 0    💬 0    📌 0
ALT: a man wearing a suit and tie is standing in a field of yellow flowers

Me all day today

11.11.2025 20:52 · 👍 3    🔁 0    💬 0    📌 0

It's mind-boggling to me how many of the papers I reviewed don't cite a single paper that is older than like 2020. It's like we try to collectively forget what people did in the past so we can publish more.

31.10.2025 19:19 · 👍 4    🔁 0    💬 1    📌 0

Am at Columbia today giving a talk about (in part) this work. Have a few hours to kill afterwards. If anyone is around there and wants to chat, DM me.

27.10.2025 15:13 · 👍 3    🔁 0    💬 0    📌 0

Not sure there is a single good source. Maybe we should write one @cvoelcker.bsky.social

27.10.2025 15:10 · 👍 1    🔁 0    💬 0    📌 0

I don't necessarily think it's dull but one would need a conference where work like that can be published. Only TMLR comes to mind to some extent.

27.10.2025 14:57 · 👍 0    🔁 0    💬 1    📌 0

The cynic in me wants to say "because the paper needs to confuse the reviewer to get accepted" but I would of course never say that.

27.10.2025 14:51 · 👍 1    🔁 0    💬 1    📌 0

I also think it's not that they don't work, but that there were a lot of entangled problems that have been addressed over the years. I'm convinced that many of these things need to be restudied with our new algorithmic/architectural insights that simply make learning stable.

27.10.2025 14:45 · 👍 4    🔁 0    💬 4    📌 0

Yea 😂 we spent a lot of time on getting the exponents on the ridge regression small to avoid an explosion down the line, but that worked out only semi-well. 😅 I do think it's probably possible to get much smaller exponents, but I suspect that will require a fundamentally different approach.

26.10.2025 15:26 · 👍 1    🔁 0    💬 0    📌 0

This should of course say quantizing Q-values 🤦

26.10.2025 14:34 · 👍 1    🔁 0    💬 0    📌 0

This was a fun collaboration between theory and practice with the theory group at Penn.

👩‍🎓👨‍🎓
@ericeaton.bsky.social
@mkearnsphilly.bsky.social
@aaroth.bsky.social
@sikatasengupta.bsky.social
@optimistsinc.bsky.social

(6/6)

26.10.2025 14:16 · 👍 3    🔁 0    💬 0    📌 0
Post image

We also empirically evaluate the algorithms. We first demonstrate that the sample complexity bounds are not representative of average-case performance. Then, we derive insights for deep RL with discrete action spaces.

💡 Quantizing actions leads to agreement across policies! (5/6)

26.10.2025 14:16 · 👍 2    🔁 0    💬 2    📌 0
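
To make the quantization point concrete, here is a minimal sketch (not from the paper; the helper name, bin count, and action range are assumptions) of how snapping continuous actions onto a fixed grid can make two slightly different policies agree on the executed action:

```python
import numpy as np

def quantize_action(action, low=-1.0, high=1.0, n_bins=11):
    """Hypothetical helper: snap a continuous action onto a fixed uniform grid.
    Policies whose outputs differ only slightly often land in the same bin,
    so the *executed* actions agree even if the raw outputs do not."""
    bins = np.linspace(low, high, n_bins)          # grid of allowed actions
    return bins[np.argmin(np.abs(bins - action))]  # nearest grid point

# Two training runs of the same deep RL algorithm yield slightly different policies:
a_run1, a_run2 = 0.231, 0.247
print(quantize_action(a_run1), quantize_action(a_run2))  # both map to 0.2
```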

We build two objects near a reference point and apply randomized rounding. In ridge regression, the reference is the convex minimizer. With a Rademacher argument and uniform gradient convergence, this yields replicability. Also, the algorithm remains replicable even if it is not fully accurate. (4/6)

26.10.2025 14:16 · 👍 1    🔁 0    💬 1    📌 0
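
A minimal sketch of the shared-grid randomized-rounding idea for ridge regression (illustration only, not the paper's actual procedure; the grid width and the data generation below are assumptions). The grid offset is the algorithm's internal randomness, so it is reused across runs, while the datasets are drawn independently:

```python
import numpy as np

def replicable_ridge(X, y, lam, grid_width, rng):
    """Toy replicable ridge regression: compute the empirical ridge minimizer,
    then round it onto a random grid whose offset comes from `rng`, the shared
    internal randomness. Solutions from independent datasets that concentrate
    around the population minimizer snap to the same grid point w.h.p."""
    d = X.shape[1]
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)  # reference: empirical minimizer
    offset = rng.uniform(0.0, grid_width, size=d)                # shared random grid offset
    return np.round((w_hat - offset) / grid_width) * grid_width + offset

def make_data(seed, n=20_000, d=3):
    g = np.random.default_rng(seed)
    X = g.normal(size=(n, d))
    return X, X @ np.array([0.5, -1.0, 2.0]) + 0.1 * g.normal(size=n)

X1, y1 = make_data(1)   # run 1: independent sample
X2, y2 = make_data(2)   # run 2: independent sample
w1 = replicable_ridge(X1, y1, lam=1.0, grid_width=0.1, rng=np.random.default_rng(0))
w2 = replicable_ridge(X2, y2, lam=1.0, grid_width=0.1, rng=np.random.default_rng(0))
print(np.array_equal(w1, w2))  # typically True: identical output across the two runs
```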

In this work, we show that we can get replicability guarantees even in function approximation settings with RL. The idea is to ensure replicability of ridge regression and uncentered covariance estimation first. Then, use these tools in common approaches that solve linear MDPs. (3/6)

26.10.2025 14:16 · 👍 1    🔁 0    💬 1    📌 0
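
For the covariance building block, a minimal sketch of what a replicable uncentered covariance estimate could look like under the same shared-grid rounding idea (illustrative only; the paper's estimator and its parameters differ):

```python
import numpy as np

def replicable_uncentered_cov(X, rng, grid_width=0.05):
    """Toy sketch: round each entry of the empirical uncentered covariance
    X^T X / n onto a grid with a shared random offset drawn from `rng`
    (internal randomness reused across runs), so independent datasets of
    sufficient size yield the exact same matrix w.h.p."""
    n, d = X.shape
    cov = X.T @ X / n
    offset = rng.uniform(0.0, grid_width, size=(d, d))
    offset = (offset + offset.T) / 2                 # keep the rounded matrix symmetric
    return np.round((cov - offset) / grid_width) * grid_width + offset
```

The idea, as described above, is that rounded estimates like these would replace their non-replicable counterparts inside standard linear-MDP solvers.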

is motivated by the fact that in deep RL, variation from randomness can lead to drastically different solutions when executing the same algorithm twice. An algorithm is formally replicable if (w.h.p.) it produces identical outcomes. I.e., run your algorithm twice and get the same policy twice. (2/6)

26.10.2025 14:16 · 👍 1    🔁 0    💬 1    📌 0
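
Operationally, the definition can be read as the following toy check (all names here are hypothetical, not from the paper): run the same algorithm with the same internal randomness on two independently drawn datasets and ask whether the outputs are literally identical:

```python
import numpy as np

def is_replicable_run(algorithm, sample_data, shared_seed):
    """Toy operational check of the definition: same algorithm, same internal
    randomness (shared_seed), two independent datasets; a replicable algorithm
    returns literally identical outputs w.h.p. `algorithm(data, rng)` and
    `sample_data(seed)` are hypothetical callables."""
    out1 = algorithm(sample_data(seed=1), np.random.default_rng(shared_seed))
    out2 = algorithm(sample_data(seed=2), np.random.default_rng(shared_seed))
    return np.array_equal(out1, out2)

# Example with a trivially replicable 'algorithm': round a sample mean onto a
# coarse grid whose offset is drawn from the shared internal randomness.
def rounded_mean(data, rng, width=0.1):
    offset = rng.uniform(0.0, width)
    return np.round((np.mean(data) - offset) / width) * width + offset

sample_normal = lambda seed: np.random.default_rng(seed).normal(0.3, 1.0, size=100_000)
print(is_replicable_run(rounded_mean, sample_normal, shared_seed=0))  # True w.h.p.
```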
Replicable Reinforcement Learning with Linear Function Approximation Replication of experimental results has been a challenge faced by many scientific disciplines, including the field of machine learning. Recent work on the theory of machine learning has formalized rep...

I think I posted about it before, but never with a thread. We recently put a new preprint on arXiv.

📖 Replicable Reinforcement Learning with Linear Function Approximation

🔗 arxiv.org/abs/2509.08660

In this paper, we study formal replicability in RL with linear function approximation. The... (1/6)

26.10.2025 14:16 · 👍 20    🔁 6    💬 2    📌 1

A large chunk of CS theory orders authors alphabetically; some even order randomly. Without any common standard, ideas like these are just gonna disadvantage people.

26.10.2025 13:58 · 👍 1    🔁 0    💬 0    📌 0

This one is so accurate it hurts my soul

23.10.2025 02:31 · 👍 7    🔁 0    💬 0    📌 0
Happy guy sad guy meme with sad text: USE PPO AND TUNE HYPERPARAMETER FOR WEEKS and happy text: USE REPPO AND GET A POLICY

I have been told I need to get more modern in my paper promotion! github.com/cvoelcker/reppo / arxiv.org/abs/2507.11019 @marcelhussing.bsky.social

26.09.2025 14:51 · 👍 12    🔁 2    💬 1    📌 0

Super stoked for the New York RL workshop tomorrow. Will be presenting 2 orals:
* Replicable Reinforcement Learning with Linear Function Approximation
* Relative Entropy Pathwise Policy Optimization

We already posted about the 2nd one (below), I'll get to talking about the first one in a bit here.

11.09.2025 14:28 · 👍 5    🔁 2    💬 0    📌 0

arxiv.org/abs/2207.04136 we always wondered how to discover the factored structure if not given. It's an intriguing question for which I have a few ideas but so far too little time.

23.08.2025 02:20 · 👍 1    🔁 0    💬 0    📌 0

(Maybe) unpopular opinion: There should not be *any* new experiments in a rebuttal. A rebuttal is for clarifications and incorrect statements in a review. You should not be allowed to add new content at that point. Either your paper is done or it isn't. It should not be written during rebuttals.

16.08.2025 16:12 · 👍 4    🔁 0    💬 0    📌 0
Post image

New ChatGPT data just dropped

07.08.2025 23:29 · 👍 39    🔁 7    💬 0    📌 1
