
Antonin Raffin

@araffin.bsky.social

Researcher in robotics and machine learning (Reinforcement Learning). Maintainer of Stable-Baselines (SB3). https://araffin.github.io/

3,363 Followers  |  303 Following  |  111 Posts  |  Joined: 08.02.2024

Posts by Antonin Raffin (@araffin.bsky.social)

RLSS'26 β€” Milan

If you are in Europe, applications for the 2026 Reinforcement Learning Summer School are now open!

Location: Milan
Date: June 3–12, 2026
Website: rlsummerschool.com

02.03.2026 09:11 β€” πŸ‘ 3    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Folks, please don't submit LLM-generated PRs to open source projects. It makes no sense.

If the maintainers want to use an LLM to fix an issue, they can use Claude or the like directly. They don't need you as an intermediary; that's just silly.

If they don't want to use LLMs, they have reasons.

28.02.2026 03:10 β€” πŸ‘ 68    πŸ” 13    πŸ’¬ 0    πŸ“Œ 0
Switch to Markdown documentation (MyST parser) by araffin Β· Pull Request #2219 Β· DLR-RM/stable-baselines3 Description You can see the doc here: https://stable-baselines3.readthedocs.io/en/md-doc/ Should be identical to the rst one (I use the auto migrate tool and then fixed errors manually). For examp...

Thanks to the MyST parser and rst-to-myst, you can easily convert your documentation to Markdown while still keeping all the features of Sphinx =)

github.com/DLR-RM/stabl...
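As a sketch of what such a migration involves (assuming the rst-to-myst package and a standard Sphinx setup; this is not the exact config from the PR):

```python
# conf.py -- hypothetical minimal Sphinx config after an rst-to-myst migration
# (convert the sources first with: rst2myst convert docs/**/*.rst)

extensions = [
    "myst_parser",  # lets Sphinx parse Markdown while keeping roles/directives
]

# accept both formats during a gradual migration
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}
```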

21.02.2026 14:23 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

bsky.app/profile/kahn...

20.02.2026 21:39 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Recent Advances in RL for Continuous Control (SOTA 2025) | CERN ML Workshop

The talk I gave about "Recent Advances in RL for Continuous Control" at CERN last year is now online =)

www.youtube.com/watch?v=Sb0d...

11.02.2026 07:05 β€” πŸ‘ 16    πŸ” 4    πŸ’¬ 1    πŸ“Œ 1
Implement MPO Β· Issue #9 Β· Stable-Baselines-Team/stable-baselines3-contrib Maximum a Posteriori Policy Optimisation (MPO) Reference implementation: https://github.com/deepmind/acme PyTorch implementation: https://github.com/fabiopardo/tonic

github.com/Stable-Basel...
Contributions are welcome =) (the issue is from 2020...)

Mainly lack of time and the need for a clean, readable implementation (plus a benchmark/comparison to other algos).

15.02.2026 19:51 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Reppo is in the updated version (in the references); PQL might be what you are looking for (it should be in the references too).
For MPO, I need to re-read the paper and try to implement and benchmark it.

15.02.2026 17:21 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Six Things I Learned Watching a Robotics Startup Die from the Inside | Rui Xu I spent a year as COO of a YC-backed robotics startup. The company didn't make it. Here's what I actually learned.

nice blog post about a humanoid robotics startup failure: ruixu.us/posts/six-th...

11.02.2026 14:04 β€” πŸ‘ 10    πŸ” 3    πŸ’¬ 0    πŸ“Œ 0
Recent Advances in Reinforcement Learning for Continuous Control | SOTA Early 2026

I also updated the slides recently for the RL Mannheim Workshop to include new SOTA algorithms from early 2026.

araffin.github.io/slides/advan...

11.02.2026 07:05 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
To understand how our radio buttons work I need to understand two separate component libraries and hundreds of lines of React.

If you missed this post last week, it explains pretty well how modern frontend works these days. :/

https://paulmakeswebsite...

02.02.2026 11:45 β€” πŸ‘ 19    πŸ” 2    πŸ’¬ 0    πŸ“Œ 0

Q-value overestimation animation for my upcoming talk about "Recent Advances in RL for Continuous Control" at the Mannheim RL Workshop
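The bias such an animation illustrates can be reproduced in a few lines: take several actions whose true Q-value is zero, add estimation noise, and the max over the noisy estimates is systematically positive (my own sketch, not code from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_actions = 10_000, 5

# true Q-value of every action is 0, so the true max_a Q(s, a) is also 0
noise = rng.normal(0.0, 1.0, size=(n_samples, n_actions))
noisy_q = 0.0 + noise  # Q estimate = true value + zero-mean noise

# taking the max over noisy estimates overestimates the true max (0)
overestimation = noisy_q.max(axis=1).mean()
print(f"mean of max Q estimate: {overestimation:.2f}")  # clearly > 0
```

This max-operator bias is the motivation for tricks like taking the minimum over two critics (as in TD3/SAC).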

31.01.2026 13:41 β€” πŸ‘ 3    πŸ” 1    πŸ’¬ 1    πŸ“Œ 0
The Formalism-Implementation Gap in Reinforcement Learning Research The last decade has seen an upswing in interest and adoption of reinforcement learning (RL) techniques, in large part due to its demonstrated capabilities at performing certain tasks at "super-human l...

This is something I talk about in my paper, where I suggest being explicit about γ_train (some methods use multiple gammas during training) and γ_eval.
One of my students is empirically investigating this and, as one would expect, it can have a huge impact.

arxiv.org/abs/2510.16175
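As a toy illustration of why the two gammas should be reported separately (my own sketch, not an example from the paper): the same rollout can score very differently under γ_train and γ_eval.

```python
import numpy as np

def discounted_return(rewards, gamma):
    """G = sum_t gamma^t * r_t"""
    discounts = gamma ** np.arange(len(rewards))
    return float(np.sum(discounts * np.asarray(rewards)))

rewards = [1.0] * 100  # reward of 1 at every step for 100 steps

g_train = discounted_return(rewards, gamma=0.99)  # what the agent optimizes
g_eval = discounted_return(rewards, gamma=1.0)    # what is often reported
print(g_train, g_eval)  # ~63.4 vs 100.0
```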

29.01.2026 10:08 β€” πŸ‘ 12    πŸ” 1    πŸ’¬ 1    πŸ“Œ 1
Servo 0.0.4 showing new support for multiple windows

December in Servo…

πŸŽ€πŸ§‘β€πŸ« FOSDEM talks next week!
🀹πŸͺŸ multiple windows
πŸͺ†πŸŒ HTTP proxy support
πŸ”πŸ•΅οΈ more SubtleCrypto algorithms
πŸ’½πŸ—ƒοΈ new site data & network API

servo.org/blog/2026/01...

23.01.2026 06:39 β€” πŸ‘ 45    πŸ” 8    πŸ’¬ 2    πŸ“Œ 0
Docker Cheat Sheet β€” The Ultimate CLI Reference Comprehensive Docker CLI reference with commands for containers, images, volumes, networks, Compose, and Dockerfile.

Dr. Who plays with Docker How:
docker.how

18.01.2026 08:42 β€” πŸ‘ 44    πŸ” 9    πŸ’¬ 2    πŸ“Œ 0
The export and preview menu, with the "PDF" section unfolded.

HTML preview & export now available in the web app! With HTML export, you can create a website from the same Typst file as your PDFs. This makes it easy to create documents that feel just as at home on the web as they do in print.

13.01.2026 18:21 β€” πŸ‘ 50    πŸ” 6    πŸ’¬ 1    πŸ“Œ 1

This network analyzer is very efficient and lets you find interesting accounts, e.g. people followed by many of the people you follow (but not by you).

bsky-follow-finder.theo.io

(Reposting this for folks who have joined Bsky more recently)

12.01.2026 18:17 β€” πŸ‘ 17    πŸ” 9    πŸ’¬ 2    πŸ“Œ 0
Open Source projects Join the conversation

People wanted our Open Source Organizations starter pack to include many projects, so we decided to give them their own starter pack.
go.bsky.app/HvKFRKa

09.01.2026 17:00 β€” πŸ‘ 29    πŸ” 6    πŸ’¬ 2    πŸ“Œ 0

"uv is fast because of what it doesn’t do, not because of what language it’s written in"

31.12.2025 16:36 β€” πŸ‘ 5    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Using AI coding for data analysis without personal programming skill fills me with dread.

Small errors in the code poison results in ways that may not be visibly obvious.

LLMs are great when people verify outputs; the path to hell is when they don't.

26.12.2025 17:07 β€” πŸ‘ 24    πŸ” 4    πŸ’¬ 1    πŸ“Œ 2
RLJ | RLC Call for Papers

Hi RL Enthusiasts!

RLC is coming to Montreal, Quebec, in the summer: Aug 16–19, 2026!

Call for Papers is up now:
Abstract: Mar 1 (AOE)
Submission: Mar 5 (AOE)

Excited to see what you’ve been up to - Submit your best work!
rl-conference.cc/callforpaper...

Please share widely!

23.12.2025 22:16 β€” πŸ‘ 61    πŸ” 28    πŸ’¬ 0    πŸ“Œ 8

Almost 5 years in the making... "Hyperparameter Optimization in Machine Learning" is finally out! πŸ“˜

We designed this monograph to be self-contained, covering: Grid, Random & Quasi-random search, Bayesian & Multi-fidelity optimization, Gradient-based methods, Meta-learning.

arxiv.org/abs/2410.22854

17.12.2025 09:54 β€” πŸ‘ 13    πŸ” 8    πŸ’¬ 0    πŸ“Œ 0
RL103: From Deep Q-Learning (DQN) to Soft Actor-Critic (SAC) and Beyond | Antonin Raffin | Homepage This second blog post continues my practical introduction to (deep) reinforcement learning, presenting the main concepts and providing intuitions to understand the more recent Deep RL algorithms. In a...

A practical introduction to (deep) RL, providing intuitions to understand the more recent algorithms (continued).

In this second post, I continue from DQN on to the Soft Actor-Critic (SAC) algorithm and its extensions.

araffin.github.io/post/rl103/

12.12.2025 17:47 β€” πŸ‘ 16    πŸ” 4    πŸ’¬ 1    πŸ“Œ 1

What a phenomenal talk by @jenson.org. He works in a very different slice of tech than I do, but his ethos toward developing tech deeply matches my own, and he articulates it so well.

I highly recommend watching it, regardless of whether you're interested in UX.

13.12.2025 19:58 β€” πŸ‘ 5    πŸ” 1    πŸ’¬ 0    πŸ“Œ 0

antonin has been cooking olala

12.12.2025 18:40 β€” πŸ‘ 2    πŸ” 1    πŸ’¬ 0    πŸ“Œ 0

bsky.app/profile/araf...

12.12.2025 17:47 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
RL102: From Tabular Q-Learning to Deep Q-Learning (DQN) | Antonin Raffin | Homepage This blog post is meant to be a practical introduction to (deep) reinforcement learning1, presenting the main concepts and providing intuitions to understand the more recent Deep RL algorithms. For a ...

Make sure to read part I =) (aka RL102: from tabular RL to DQN)

araffin.github.io/post/rl102/

12.12.2025 17:47 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

πŸš€ We just shipped v0.216.0!

Word-level diffing just landed. πŸŽ‰
It's been a night-and-day difference for usβ€”seeing exactly what changed within each line.
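The idea behind word-level diffing can be sketched with Python's difflib by tokenizing lines into words before matching (an illustration of the technique, not the tool's actual implementation):

```python
import difflib

def word_diff(old_line: str, new_line: str) -> str:
    """Mark word-level changes as [-removed-] and {+added+}."""
    old, new = old_line.split(), new_line.split()
    matcher = difflib.SequenceMatcher(a=old, b=new)
    parts = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            parts.append(" ".join(old[i1:i2]))
        else:
            if i1 < i2:  # words only in the old line
                parts.append("[-" + " ".join(old[i1:i2]) + "-]")
            if j1 < j2:  # words only in the new line
                parts.append("{+" + " ".join(new[j1:j2]) + "+}")
    return " ".join(parts)

print(word_diff("the quick brown fox", "the quick red fox"))
# the quick [-brown-] {+red+} fox
```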

10.12.2025 17:16 β€” πŸ‘ 128    πŸ” 9    πŸ’¬ 3    πŸ“Œ 1

Are there any plans to release the code, and if so, in what timeframe? (Same question for XQC: code coming soonβ„’?)

10.12.2025 10:00 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0