It started as a modest project to offer a free, open-source alternative to MuJoCo environments, and today, panda-gym is downloaded over 100k times and cited in over 100 papers.
02.05.2025 23:14
@qgallouedec.hf.co
PhD - Research @hf.co | TRL maintainer
just pip install trl
26.04.2025 22:57
How many of these 8 things did you know?
huggingface.co/blog/qgallou...
TRL 0.14: Featuring GRPO!
TRL 0.14 brings *GRPO*, the RL algorithm behind DeepSeek-R1.
- Blazing fast generation with vLLM integration.
- Optimized training with DeepSpeed ZeRO 1/2/3.
The most impactful open-source project of today (per Vercel's VP of AI)
=> huggingface.co/blog/open-r1
Last moments of closed-source AI:
Hugging Face is openly reproducing the pipeline of DeepSeek-R1. Open data, open training, open models, open collaboration.
Let's go!
github.com/huggingface/...
The algorithm behind DeepSeek's R1 model (aka GRPO) now lives in TRL main branch! Go and test it!
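For readers curious what GRPO actually computes: its core idea is that instead of a learned value baseline (as in PPO), each completion's advantage is its reward normalized against the other completions sampled for the same prompt. A toy sketch of that group-relative normalization in plain Python; the function name and reward values are illustrative, not TRL's API:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """GRPO's core idea: score each sampled completion relative to the
    other completions generated for the SAME prompt, instead of using
    a learned value function as the baseline (as PPO does)."""
    mu = mean(rewards)
    sigma = stdev(rewards)
    # Small epsilon avoids division by zero when all rewards are equal.
    return [(r - mu) / (sigma + 1e-8) for r in rewards]

# Four completions sampled for one prompt, scored by some reward function:
advantages = group_relative_advantages([0.0, 1.0, 0.5, 0.5])
```

The advantages sum to zero within the group, so above-average completions are reinforced and below-average ones are penalized, with no critic network needed.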
22.01.2025 15:07
TRL is a Python library for training language models.
It has seen impressive growth this year. Lots of new features, an improved codebase, and this has translated into increased usage. You can count on us to do even more in 2025.
Santa Claus has delivered the ultimate guide to understanding OOM errors (link in comment)
24.12.2024 11:04
Top 1 Python dev today. Third time since September.
17.12.2024 18:32
TRL 0.13 is out!
Featuring a Process-supervised Reward Model (PRM) trainer
PRMs empower LLMs to "think before answering", a key feature behind OpenAI's o1 launch just two weeks ago.
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute.
How? By combining step-wise reward models with tree search algorithms :)
We're open-sourcing the full recipe and sharing a detailed blog post.
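To illustrate the idea behind combining step-wise rewards with search (a toy reconstruction, not the released recipe): sample several candidate solutions, score each reasoning step with the PRM, and keep the candidate whose weakest step scores highest ("min" aggregation). The scorer below is a stand-in, not a real reward model:

```python
def select_candidate(candidates, step_reward):
    """Pick the candidate solution whose WEAKEST reasoning step scores
    highest under a process-supervised reward model (PRM).
    `step_reward` stands in for the PRM: it maps one step to a score."""
    def worst_step_score(steps):
        return min(step_reward(s) for s in steps)
    return max(candidates, key=worst_step_score)

# Hypothetical stand-in PRM: pretend longer steps are better justified.
toy_prm = len

best = select_candidate(
    [["guess", "x=5"],
     ["2+2=4", "therefore x=4"]],
    toy_prm,
)
```

With a real PRM, this selection runs over many sampled solutions (or over a search tree, expanding only high-scoring branches), which is how extra test-time compute buys accuracy.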
The number of TRL models on the Hub has risen 60x this year!
How about doing the same next year?
We took those TRL notebooks from last week and made a page from them. So if you're upskilling on fine-tuning or aligning LLMs, and want examples from the community (like Maxime Labonne, Philipp Schmid, Sergio Paniego Blanco), check it out!
bsky.app/profile/benb...
>> huggingface.co/docs/trl/mai...
(don't mind the location) apply.workable.com/huggingface/...
27.11.2024 15:51
Join us at Hugging Face as an intern if you want to contribute to amazing open-source projects and develop the best LLM fine-tuning library, aka TRL.
- Full remote
- Exciting subjects
- Anywhere in the world
- Flexible working hours
Link to apply in comment
We're looking for an intern to join our SmolLM team! If you're excited about training LLMs and building high-quality datasets, we'd love to hear from you.
US: apply.workable.com/huggingface/...
EMEA: apply.workable.com/huggingface/...
I'd love to! We have a lot of room for improvement here!
25.11.2024 10:43
These tutorials provide a comprehensive but concise roadmap through TRL across the main fine-tuning and alignment classes.
Let me know if you would like a dedicated course on TRL basics.
It's Sunday morning, so I'm taking a minute for a nerdy thread (on math, tokenizers and LLMs) about the work of our intern Garreth.
By adding a few lines of code to the base Llama 3 tokenizer, he got a free boost in arithmetic performance.
[thread]
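The thread concerns how numbers are tokenized. One known trick in this area (a hypothetical sketch of the kind of change described, not the actual patch) is to split long digit runs into three-digit groups from the right, so place value aligns consistently across numbers:

```python
import re

def chunk_digits_right_to_left(text, group=3):
    """Split each run of digits into fixed-size groups from the RIGHT,
    so '1234567' becomes '1 234 567' (and '1000' becomes '1 000').
    Grouping left-to-right would misalign place value instead."""
    def split_number(m):
        digits = m.group(0)
        chunks = []
        # Walk from the right so only the LEFTMOST chunk can be short.
        for i in range(len(digits), 0, -group):
            chunks.append(digits[max(0, i - group):i])
        return " ".join(reversed(chunks))
    return re.sub(r"\d+", split_number, text)
```

Here a space marks the group boundary purely for illustration; a real tokenizer change would instead constrain how digit tokens are merged during pre-tokenization.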
How can you avoid the temptation to use a subprocess for sub-commands?
This blog post from @muellerzr.bsky.social saved my day.
muellerzr.github.io/til/argparse...
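The gist, reconstructed here as an assumption (see the linked TIL for the original): instead of shelling out to a subprocess for each sub-command, register them as argparse subparsers and dispatch in-process. All names below are illustrative:

```python
import argparse

def build_parser():
    """One parser, multiple sub-commands, no subprocess needed."""
    parser = argparse.ArgumentParser(prog="mytool")
    sub = parser.add_subparsers(dest="command", required=True)

    train = sub.add_parser("train", help="run training")
    train.add_argument("--epochs", type=int, default=1)

    evaluate = sub.add_parser("eval", help="run evaluation")
    evaluate.add_argument("--split", default="test")
    return parser

# Dispatch in-process instead of spawning `mytool train` as a subprocess:
args = build_parser().parse_args(["train", "--epochs", "3"])
```

Each sub-parser can also carry its own defaults and help text, so `mytool train --help` and `mytool eval --help` come for free.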
Finetune SmolLM2 with TRL!
21.11.2024 11:32
When XetHub joined Hugging Face, we brainstormed how to share our tech with the community.
The magic? Versioning chunks, not files, giving rise to:
- Smarter storage
- Faster uploads
- Efficient downloads
Curious? Read the blog and let us know how it could help your workflows!
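A toy sketch of why versioning chunks instead of files helps: in a content-addressed store, two file versions that share most of their bytes share most of their chunks, so only the changed chunks cost storage and transfer. Fixed-size chunks are used here for brevity; the real system uses content-defined chunking, and all names are illustrative:

```python
import hashlib

class ChunkStore:
    """Content-addressed store: each unique chunk is kept exactly once."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}  # sha256 hex digest -> chunk bytes

    def put(self, data: bytes):
        """Store a file as a list of chunk references; dedup is free
        because identical chunks hash to the same key."""
        refs = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)
            refs.append(digest)
        return refs

    def get(self, refs):
        """Reassemble a file from its chunk references."""
        return b"".join(self.chunks[d] for d in refs)

store = ChunkStore()
v1 = store.put(b"AAAABBBBCCCC")
v2 = store.put(b"AAAAXXXXCCCC")  # only the middle chunk changed
```

Uploading v2 after v1 only needs to send the one new chunk, which is the "faster uploads, efficient downloads" win in miniature.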