
Amir Mesbah

@amirmesbah.bsky.social

Graduate Student - Interested in RL and its mathematics 👾 > https://amirhosein-mesbah.github.io/

94 Followers  |  519 Following  |  17 Posts  |  Joined: 18.11.2024

Latest posts by amirmesbah.bsky.social on Bluesky


🚨The Formalism-Implementation Gap in RL research🚨

Lots of progress in RL research over the last 10 years, but it has been too performance-driven => overfitting to benchmarks (like the ALE).

1️⃣ Let's advance the science of RL
2️⃣ Let's be explicit about how benchmarks map to formalism

1/X

28.10.2025 13:55 | 👍 41  🔁 4  💬 1  📌 2
Put CASH on Bandits: A Max K-Armed Problem for Automated Machine Learning
The Combined Algorithm Selection and Hyperparameter optimization (CASH) is a challenging resource allocation problem in the field of AutoML. We propose MaxUCB, a max $k$-armed bandit method to trade o...

I am happy to share that our paper "Put CASH on Bandits: A Max K-Armed Problem for Automated Machine Learning" has been accepted at NeurIPS 2025!

Endless thanks to my amazing co-authors @claireve.bsky.social and @keggensperger.bsky.social

📄 Read it on arXiv: arxiv.org/abs/2505.05226

(1/3)

06.10.2025 16:53 | 👍 7  🔁 1  💬 1  📌 1
ALT: a close up of a sad cat with the words pleeeaasse written below it

cvoelcker.de/blog/2025/re...

I finally gave in and made a nice blog post about my most recent paper. This was a surprising amount of work, so please be nice and go read it!

02.10.2025 21:34 | 👍 29  🔁 7  💬 0  📌 3

Thanks a lot! That was lightning fast 🚀

02.10.2025 22:26 | 👍 0  🔁 0  💬 0  📌 0
Relative Entropy Pathwise Policy Optimization - Technical Overview | Claas A. Voelcker
A lightweight overview of the new REPPO algorithm

cvoelcker.de/blog/2025/re...

Here ya go!

02.10.2025 21:31 | 👍 1  🔁 1  💬 1  📌 0

Maybe a blog post would also help =)

26.09.2025 14:59 | 👍 0  🔁 0  💬 2  📌 0

Could you add me please?

09.09.2025 21:59 | 👍 1  🔁 0  💬 1  📌 0
Definition of dynamic programming in RL, from Csaba Szepesvári's RL theory lecture notes (Lecture 2, "Planning in MDPs")

Definition of dynamic programming, from Puterman's Markov Decision Processes, Chapter 1.

I came across a couple of other definitions that might be helpful to mention (apologies if you're already considering these).
The first one is from Csaba Szepesvári's RL theory lecture notes (Lecture 2, "Planning in MDPs"), and the second one is from Puterman's MDP book (Chapter 1).

04.08.2025 09:45 | 👍 2  🔁 0  💬 1  📌 0

What are we talking about when we talk about Dynamic Programming?

#ReinforcementLearning

03.08.2025 20:14 | 👍 8  🔁 1  💬 2  📌 0

What if all mathematicians had great visualization skills, tools, and public notes!

31.07.2025 16:22 | 👍 0  🔁 0  💬 0  📌 0

Onno and I will be presenting our poster at # W1005 tomorrow (Wed) morning.
He made a great thread about it, come chat with us about POMDP theory :)

16.07.2025 03:45 | 👍 19  🔁 5  💬 0  📌 0

I will not be at #ICML2025 this year, but 3 of my PhD students at 🤖 Adage (Adaptive Agents Lab) 🤖 are, presenting 3 papers.
⭐ Avery Ma
⭐ Claas Voelcker (cvoelcker.bsky.social)
⭐ Tyler Kastner

Meet them to talk about Model-based RL, Distributional RL, and Jailbreaking LLMs.

14.07.2025 18:54 | 👍 5  🔁 2  💬 1  📌 0

Levine's take on the success of LLMs compared to video models is interesting, but I'll expand on how efforts toward AI could take two different paths, and why I think AI and NeuroAI could take different approaches moving forward. 🧵

🧠🤖 #MLSky

12.06.2025 14:30 | 👍 7  🔁 2  💬 1  📌 0

Preprint Alert 🚀

Can we simultaneously learn transformation-invariant and transformation-equivariant representations with self-supervised learning?

TL;DR Yes! This is possible via simple predictive learning & architectural inductive biases, without extra loss terms and predictors!

🧵 (1/10)

14.05.2025 12:52 | 👍 51  🔁 16  💬 1  📌 5
GitHub - vwxyzjn/cleanrl: High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)

cleanrl is amazing (github.com/vwxyzjn/clea...) and its structure makes sense for teaching but an actual research codebase should not inherit this style! you do not want this amount of code duplication

11.05.2025 20:01 | 👍 32  🔁 2  💬 4  📌 0
Reinforcement Learning from Human Feedback
Reinforcement learning from human feedback (RLHF) has become an important technical and storytelling tool to deploy the latest machine learning systems. In this book, we hope to give a gentle…

rlhfbook also available on arxiv for SEO 😀 happy friday
arxiv.org/abs/2504.12501

18.04.2025 16:07 | 👍 69  🔁 13  💬 3  📌 4
Reinforcement Learning (RL) for LLMs
YouTube video by Natasha Jaques

Recorded a recent "talk" / rant about RL fine-tuning of LLMs for a guest lecture in Stanford CSE234: youtube.com/watch?v=NTSY.... Covers some of my lab's recent work on personalized RLHF, as well as some mild Schmidhubering about my own early contributions to this space

27.03.2025 21:31 | 👍 51  🔁 10  💬 5  📌 1

PQN puts Q-learning back on the map and now comes with a blog post + Colab demo! Also, congrats to the team for the spotlight at #ICLR2025

20.03.2025 11:51 | 👍 16  🔁 4  💬 0  📌 0

Happy #Nowruz and the beginning of the spring!

20.03.2025 17:37 | 👍 0  🔁 0  💬 0  📌 0

I wanted to send you the link just now but hopefully you have found it =)

18.03.2025 21:08 | 👍 1  🔁 0  💬 0  📌 0

Sure *_*
Looking forward to it :)

17.03.2025 20:55 | 👍 0  🔁 0  💬 0  📌 0

Not yet. Just the classical claim that they're trying to learn the distribution of the return =))
Do you have any insights?

17.03.2025 18:37 | 👍 0  🔁 0  💬 1  📌 0

I was reading about ways to enhance the performance of DQN on a real-world problem. One of the candidates was C51, but I haven't implemented it yet because of computational costs. It was interesting for me because I hadn't read the papers before.

17.03.2025 14:23 | 👍 1  🔁 0  💬 1  📌 0

I didn't know until last week that using it with DQN can give a huge performance boost.

17.03.2025 14:06 | 👍 1  🔁 0  💬 1  📌 0
Claire Vernade - European career opportunities
European Academic Career Opportunities in 2025

I've put together a short list of opportunities for early career academics willing to come to Europe: www.cvernade.com/miscellaneou...

This mostly covers France and Germany for now but I'm willing to extend it. I build on @ellis.eu resources and my own knowledge of these systems.

11.03.2025 09:19 | 👍 75  🔁 26  💬 3  📌 0
Andrew Barto and Richard Sutton are the recipients of the 2024 ACM A.M. Turing Award for developing the conceptual and algorithmic foundations of reinforcement learning.
In a series of papers beginning...

RL is so back!

(well, for some of us, it never really left)

awards.acm.org/about/2024-t...

05.03.2025 10:41 | 👍 72  🔁 12  💬 1  📌 1

First 11 chapters of RLHF Book have v0 draft done. Should be useful now.

Next:
* Crafting more blog content into future topics,
* DPO+ chapter,
* Meeting with publishers to get wheels turning on physical copies,
* Cleaning & cohesiveness
rlhfbook.com

26.02.2025 16:35 | 👍 48  🔁 9  💬 0  📌 0
Applications are now open! 3-week courses: Comp Neuro and Deep Learning. 2-week courses: NeuroAI and Comp Tools for Climate Science.

🚨 Neuromatch Academy Course Applications are OPEN for 2025!! 🚨

Get your application in early to be a student or teaching assistant for this year's courses!

Applications are due Sunday, March 23.

Apply & learn more: neuromatch.io/courses/

#mlsky #compneurosky #ai #climatesolutions #ScienceEdu 🧪

24.02.2025 17:58 | 👍 86  🔁 75  💬 0  📌 12

2014 GoogLeNet: The best image classifier was only trainable using weeks of Google's custom infrastructure.

2018 ResNet: A more accurate model is trainable in a 1/2 hour on a single GPU.

What stops this from happening for LLMs?

27.01.2025 15:16 | 👍 52  🔁 9  💬 3  📌 3

I am teaching a class on #FoundationalModels for #robotics and Scaling #DeepRL algorithms. This class expands on last year's class and my generalist robotics policies tutorial and code. I plan to share the lectures and code assignments. Starting with the first lectures below.

19.01.2025 19:14 | 👍 21  🔁 6  💬 1  📌 0
