
Quentin Gallouédec

@qgallouedec.hf.co

PhD - Research @hf.co 🤗 TRL maintainer

903 Followers  |  170 Following  |  17 Posts  |  Joined: 18.11.2024  |  1.8679

Latest posts by qgallouedec.hf.co on Bluesky


It started as a modest project to offer a free, open-source alternative to MuJoCo environments. Today, panda-gym has been downloaded over 100k times and cited in over 100 papers. 🦾

02.05.2025 23:14 | 👍 7    🔁 1    💬 0    📌 0

just pip install trl

26.04.2025 22:57 | 👍 4    🔁 0    💬 0    📌 0
Gotchas in Tokenizer Behavior Every Developer Should Know: a blog post by Quentin Gallouédec on Hugging Face

How many of these 8 things did you know?

huggingface.co/blog/qgallou...

20.04.2025 18:21 | 👍 4    🔁 0    💬 1    📌 0

🚀 TRL 0.14 – Featuring GRPO! 🚀

TRL 0.14 brings *GRPO*, the RL algorithm behind 🐳 DeepSeek-R1.

⚡ Blazing fast generation with vLLM integration.
📉 Optimized training with DeepSpeed ZeRO 1/2/3.

30.01.2025 14:54 | 👍 4    🔁 0    💬 0    📌 0
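GRPO's central idea can be sketched in a few lines: rather than learning a value function, it scores each sampled completion against the other completions generated for the same prompt. The following is a minimal sketch assuming a plain mean/std normalization with a small epsilon; it is not TRL's implementation, and `group_relative_advantages` is a hypothetical helper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Score each completion relative to the other completions sampled for
    the same prompt: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; real implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: 4 completions for one prompt, reward 1.0 if the answer is correct.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)  # correct completions get positive advantages, wrong ones negative
```

In this sketch, if every completion in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal.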
Open-R1: a fully open reproduction of DeepSeek-R1
We're on a journey to advance and democratize artificial intelligence through open source and open science.

The most impactful open-source project today (according to Vercel's VP of AI)
=> huggingface.co/blog/open-r1

28.01.2025 12:17 | 👍 81    🔁 17    💬 0    📌 1
GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1

Last moments of closed-source AI 🪦:
Hugging Face is openly reproducing the pipeline of 🐳 DeepSeek-R1. Open data, open training, open models, open collaboration.

🫵 Let's go!
github.com/huggingface/...

25.01.2025 14:36 | 👍 32    🔁 7    💬 0    📌 1

The algorithm behind DeepSeek's R1 model (aka GRPO) now lives in TRL's main branch! Go and test it!

22.01.2025 15:07 | 👍 4    🔁 0    💬 0    📌 0

[Stonks] TRL is a Python library for training language models.

It has grown impressively this year: lots of new features and an improved codebase, which have translated into increased usage. You can count on us to do even more in 2025.

06.01.2025 17:26 | 👍 1    🔁 0    💬 0    📌 0
Visualize and understand GPU memory in PyTorch

huggingface.co/blog/train_m...

24.12.2024 11:04 | 👍 5    🔁 2    💬 0    📌 0

🎅 Santa Claus has delivered the ultimate guide to understanding OOM errors (link in comment)

24.12.2024 11:04 | 👍 16    🔁 5    💬 2    📌 0
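For a rough feel of why full fine-tuning hits OOM, here is our own back-of-envelope (not taken from the linked guide): with fp32 weights and PyTorch's Adam, each parameter costs about 16 bytes (4 for the weight, 4 for its gradient, 8 for Adam's two moment buffers) before counting activations. `training_memory_gib` is a hypothetical helper.

```python
def training_memory_gib(
    n_params: float, weight_bytes: int = 4, grad_bytes: int = 4
) -> dict[str, float]:
    """Back-of-envelope GPU memory (GiB) for training with Adam.

    Counts weights, gradients and Adam's two fp32 moment buffers only;
    activations, CUDA context and allocator fragmentation come on top.
    """
    gib = 1024**3
    parts = {
        "weights": n_params * weight_bytes / gib,
        "gradients": n_params * grad_bytes / gib,
        "optimizer": n_params * 2 * 4 / gib,  # exp_avg + exp_avg_sq, both fp32
    }
    parts["total"] = sum(parts.values())
    return parts


# Hypothetical 1B-parameter model trained in full fp32: roughly 15 GiB before activations.
print(training_memory_gib(1e9))
```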

#1 Python dev today. Third time since September 🫨

17.12.2024 18:32 | 👍 4    🔁 0    💬 0    📌 0

🚨 TRL 0.13 is out! 🤗

Featuring a Process-supervised Reward Model (PRM) Trainer 🏋️

PRMs empower LLMs to "think before answering", a key feature behind OpenAI's o1 launch just two weeks ago. 🚀

17.12.2024 16:07 | 👍 1    🔁 0    💬 0    📌 0

We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥

How? By combining step-wise reward models with tree search algorithms :)

We're open sourcing the full recipe and sharing a detailed blog post 👇

16.12.2024 17:08 | 👍 109    🔁 21    💬 4    📌 1
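The post does not spell out the recipe, so the following only illustrates the general pattern of "step-wise reward model + tree search": a beam search in which a scoring function, standing in for a PRM, ranks partial solutions after every reasoning step. `propose` and `score` are hypothetical stand-ins for an LLM sampler and a PRM; the toy versions below make the example deterministic.

```python
import heapq
from typing import Callable

Path = list[str]  # a partial solution: the reasoning steps so far


def prm_beam_search(
    propose: Callable[[Path], list[str]],
    score: Callable[[Path], float],
    beam_width: int,
    max_steps: int,
) -> Path:
    """Beam search over reasoning steps: after each step, keep only the
    `beam_width` partial solutions that the step-wise scorer rates highest."""
    beams: list[Path] = [[]]
    for _ in range(max_steps):
        # Extend every surviving partial solution with each candidate next step.
        candidates = [path + [step] for path in beams for step in propose(path)]
        if not candidates:
            break
        beams = heapq.nlargest(beam_width, candidates, key=score)
    return max(beams, key=score)


# Toy stand-ins: the "LLM" proposes the same three steps every time, and the
# "PRM" returns 1.0 only while every step so far matches a reference solution.
REFERENCE = ["x", "y", "z"]
best = prm_beam_search(
    propose=lambda path: ["x", "y", "z"],
    score=lambda path: 1.0 if path == REFERENCE[: len(path)] else 0.0,
    beam_width=2,
    max_steps=3,
)
print(best)  # ['x', 'y', 'z']
```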

The number of TRL models on the 🤗 Hub has risen 60x this year! 📈
How about doing the same next year?

03.12.2024 12:55 | 👍 2    🔁 0    💬 0    📌 0

We took those TRL notebooks from last week and turned them into a page. So if you're upskilling on fine-tuning or aligning LLMs and want examples from the community (like Maxime Labonne, Philipp Schmid, and Sergio Paniego Blanco), check it out!

bsky.app/profile/benb...

>> huggingface.co/docs/trl/mai...

02.12.2024 09:18 | 👍 21    🔁 4    💬 1    📌 0
Machine Learning Engineer Internship, TRL - EMEA Remote - Hugging Face
Here at Hugging Face, we're on a journey to advance good Machine Learning and make it more accessible. Along the way, we contribute to the development of technology for the better. We have built the fa...

(don't mind the location) apply.workable.com/huggingface/...

27.11.2024 15:51 | 👍 1    🔁 0    💬 0    📌 0

Join us at Hugging Face as an intern if you want to contribute to amazing open-source projects and help develop the best LLM fine-tuning library, aka TRL.

🧑‍💻 Full remote
🤯 Exciting subjects
🌍 Anywhere in the world
🤸🏻 Flexible working hours

Link to apply in comment 👇

27.11.2024 15:49 | 👍 7    🔁 0    💬 1    📌 0
ML Research Engineer Internship, SmolLMs pretraining and datasets - EMEA Remote - Hugging Face

We're looking for an intern to join our SmolLM team! If you're excited about training LLMs and building high-quality datasets, we'd love to hear from you. 🤗

US: apply.workable.com/huggingface/...
EMEA: apply.workable.com/huggingface/...

27.11.2024 10:20 | 👍 64    🔁 12    💬 7    📌 2

I'd love to! We have a lot of room for improvement here!

25.11.2024 10:43 | 👍 3    🔁 0    💬 0    📌 0

These tutorials provide a comprehensive but concise roadmap through TRL across the main fine-tuning and alignment classes.

🤔 Let me know if you would like a dedicated course on TRL basics.

25.11.2024 10:16 | 👍 5    🔁 1    💬 1    📌 1

It's Sunday morning, so I'm taking a minute for a nerdy thread (on math, tokenizers, and LLMs) about the work of our intern Garreth.

By adding a few lines of code to the base Llama 3 tokenizer, he got a free boost in arithmetic performance 😮

[thread]

24.11.2024 11:05 | 👍 273    🔁 35    💬 5    📌 5
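The thread's details are not reproduced here. As far as we know, Llama 3's pre-tokenizer cuts runs of digits into groups of up to three starting from the left; assuming the change described in the thread is to group from the right instead, the difference can be illustrated with plain string functions. The helper names are hypothetical, and this is not the actual tokenizer code.

```python
import re


def split_left_to_right(digits: str, size: int = 3) -> list[str]:
    """Greedy groups of up to `size` digits, starting from the left."""
    return re.findall(r"\d{1,%d}" % size, digits)


def split_right_to_left(digits: str, size: int = 3) -> list[str]:
    """Groups of `size` digits aligned on the right, like thousands separators;
    only the leftmost group may be shorter."""
    head = len(digits) % size
    groups = [digits[:head]] if head else []
    groups += [digits[i : i + size] for i in range(head, len(digits), size)]
    return groups


print(split_left_to_right("1234567"))  # ['123', '456', '7']
print(split_right_to_left("1234567"))  # ['1', '234', '567']
```

With left-to-right grouping, token boundaries depend on how many digits the number has, so the same three-digit token can stand for different place values; right-to-left grouping keeps the last token aligned with units, tens, and hundreds.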
Zach Mueller - Calling argparse without subprocess: How to use argparse without the CLI

How can you avoid the temptation to use a subprocess for sub-commands?

This blog post from @muellerzr.bsky.social saved my day.

muellerzr.github.io/til/argparse...

22.11.2024 19:02 | 👍 4    🔁 1    💬 1    📌 0

Finetune SmolLM2 with TRL!

21.11.2024 11:32 | 👍 12    🔁 0    💬 0    📌 1
From Files to Chunks: Improving HF Storage Efficiency

When XetHub joined Hugging Face, we brainstormed how to share our tech with the community.

The magic? Versioning chunks, not files, giving rise to:

🧠 Smarter storage
⏩ Faster uploads
🚀 Efficient downloads

Curious? Read the blog and let us know how it could help your workflows!

20.11.2024 18:51 | 👍 34    🔁 15    💬 1    📌 2
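The post does not describe Xet's actual algorithm. As a toy illustration of "versioning chunks, not files", here is content-defined chunking with a gear-style rolling hash plus deduplication by chunk hash; the mask, the minimum and maximum chunk sizes, and all helper names are our own assumptions. Because boundaries depend on nearby bytes rather than on file offsets, a small insertion only changes the chunks around it.

```python
import hashlib
import random

# Toy parameters (assumptions, not Xet's): ~4 KiB expected gap after a 1 KiB minimum.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # one random 64-bit value per byte
MASK = (1 << 12) - 1
MIN_CHUNK, MAX_CHUNK = 1024, 16 * 1024


def chunk(data: bytes) -> list[bytes]:
    """Content-defined chunking: cut where a rolling hash of the most recent
    bytes matches a pattern, so boundaries follow content, not offsets."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF  # gear-style rolling hash
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
            chunks.append(data[start : i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def store(data: bytes, seen: dict[str, bytes]) -> list[str]:
    """Keep each distinct chunk once; a file version is just a list of chunk ids."""
    ids = []
    for c in chunk(data):
        key = hashlib.sha256(c).hexdigest()
        seen.setdefault(key, c)
        ids.append(key)
    return ids


# Two versions of a 200 kB file that differ by a 10-byte insertion in the middle.
data_rng = random.Random(1)
v1 = data_rng.randbytes(200_000)
v2 = v1[:100_000] + b"small edit" + v1[100_000:]
seen: dict[str, bytes] = {}
ids_v1 = store(v1, seen)
before = len(seen)
ids_v2 = store(v2, seen)
print(len(ids_v2), "chunks in v2,", len(seen) - before, "of them new")
```

In this sketch, saving v2 only requires the few chunks not already in `seen`, which is the kind of saving behind the smarter storage and faster uploads listed above.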
