Leandro von Werra

@lvwerra.bsky.social

Research @ Hugging Face

1,147 Followers  |  51 Following  |  11 Posts  |  Joined: 22.06.2023

Latest posts by lvwerra.bsky.social on Bluesky

GitHub - huggingface/picotron: Minimalistic 4D-parallelism distributed training framework for education purpose

Check out the code and super detailed video walkthrough!

Code: github.com/huggingface/...

Video: youtube.com/playlist?lis...

Work led by Haojun Zhao and Ferdinand Mom!

06.01.2025 16:51 — 👍 1    🔁 1    💬 0    📌 0

Distributed training is notoriously hard to learn - knowledge is scattered across papers and complex codebases.

Enter picotron: implementing all 4D parallelism concepts in separate, readable files totaling just 1988 LoC!
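As a taste of what one of those files covers, here is a minimal sketch of the data-parallel dimension (my own illustration, not picotron's actual code): each rank computes gradients on its shard of the batch, then all-reduces them to keep the weights in sync.

```python
# Minimal data-parallelism sketch (illustrative, not picotron code).
# Assumes torch.distributed is already initialized (e.g. via torchrun).
import torch
import torch.distributed as dist

def all_reduce_gradients(model: torch.nn.Module) -> None:
    """Average gradients across all data-parallel ranks."""
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size

# Per-rank training step:
#   loss = model(local_batch).loss
#   loss.backward()
#   all_reduce_gradients(model)  # sync before optimizer.step()
#   optimizer.step()
```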

06.01.2025 16:51 — 👍 3    🔁 0    💬 2    📌 0
Thumbnail that says "introducing smolagents"

supercharge your LLM apps with smolagents 🔥

however cool your LLM is, without being agentic it can only go so far

enter smolagents: a new agent library by @hf.co to make the LLM write code, do analysis and automate boring stuff! huggingface.co/blog/smolage...
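For a flavor of the API, here is a minimal usage sketch assuming the launch-era entry points (`CodeAgent`, `HfApiModel`, `DuckDuckGoSearchTool`); later releases renamed some of these, so check the docs:

```python
# Minimal smolagents sketch: a CodeAgent makes the LLM write and run
# Python to answer the query. Class names follow the launch-era API
# (e.g. HfApiModel was later renamed), so treat this as illustrative.
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],  # optional web-search tool
    model=HfApiModel(),              # hosted model via the HF API
)
agent.run("What is the 118th Fibonacci number modulo 1000?")
```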

31.12.2024 15:32 — 👍 88    🔁 17    💬 2    📌 3
A plot showing increased performance of Llama-3.2-3B when pretrained on FineMath

Introducing 📐 FineMath: the best open math pre-training dataset with 50B+ tokens!

Math remains challenging for LLMs, and by training on FineMath we see considerable gains over other math datasets, especially on GSM8K and MATH.

🤗 huggingface.co/datasets/Hug...

Here's a breakdown 🧵
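To poke at the data, a sketch along these lines should work; the repo id and config name (`HuggingFaceTB/finemath`, `finemath-4plus`) are my reading of the announcement, so confirm them on the dataset card:

```python
# Stream a FineMath subset with 🤗 datasets (repo/config names are
# assumptions; verify against the dataset card).
from datasets import load_dataset

ds = load_dataset(
    "HuggingFaceTB/finemath", "finemath-4plus",
    split="train", streaming=True,
)
print(next(iter(ds))["text"][:500])  # peek at one document
```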

19.12.2024 15:55 — 👍 45    🔁 15    💬 2    📌 1

Or watch how the model solves the Lotka-Volterra equations, plots the results, and refines them.

Try it out: huggingface.co/spaces/data-...

19.12.2024 18:56 — 👍 3    🔁 0    💬 0    📌 0

Releasing Jupyter Agents - LLMs running data analysis directly in a notebook!

The agent can load data, execute code, plot results, and follow your guidance and ideas!

A very natural way to collaborate with an LLM over data, and it's just scratching the surface of what will soon be possible!
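Conceptually, the loop behind such an agent looks something like the toy sketch below (all names are placeholders, not the actual Jupyter Agents code): the LLM proposes a code cell, the kernel executes it, and the output is fed back so the model can refine its analysis.

```python
# Toy agent loop (placeholders only, not the Jupyter Agents code):
# LLM writes a cell -> we execute it -> output goes back to the LLM.
import contextlib
import io

def llm_generate(history: str) -> str:
    """Placeholder for a real LLM call (e.g. via huggingface_hub)."""
    return "print('hello from the agent')"

def run_cell(code: str) -> str:
    """Execute one code cell and capture what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, globals())  # only safe inside a sandboxed kernel!
    return buffer.getvalue()

history = "Task: load data.csv and report the mean of column 'x'."
for _ in range(3):  # a few propose-execute-refine rounds
    code = llm_generate(history)
    output = run_cell(code)
    history += f"\nCode:\n{code}\nOutput:\n{output}"
```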

19.12.2024 18:56 — 👍 13    🔁 4    💬 1    📌 0

We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥

How? By combining step-wise reward models with tree search algorithms :)

We're open sourcing the full recipe and sharing a detailed blog post 👇
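The recipe in the blog post covers best-of-N, beam search and more; as a schematic, here is a toy best-of-N skeleton where a process reward model (PRM) scores each reasoning step (both models are stand-ins, not the released code):

```python
# Toy best-of-N with a step-wise reward model (all stand-ins).
import random

def generate_solution(problem: str) -> list[str]:
    """Stand-in policy model: returns a solution as a list of steps."""
    return [f"step {i}: ..." for i in range(3)]

def prm_score(steps: list[str]) -> float:
    """Stand-in PRM: aggregate per-step correctness scores."""
    total = 1.0
    for _ in steps:
        total *= random.random()  # a real PRM scores each step
    return total

def best_of_n(problem: str, n: int = 16) -> list[str]:
    """Sample n candidate solutions, keep the highest-scoring one."""
    candidates = [generate_solution(problem) for _ in range(n)]
    return max(candidates, key=prm_score)

print(best_of_n("Solve: 2x + 3 = 11"))
```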

16.12.2024 17:08 — 👍 109    🔁 21    💬 4    📌 1

💔

14.12.2024 20:12 — 👍 0    🔁 0    💬 0    📌 0

Big News in AI4Science! ✨
We are thrilled to launch LeMaterial, an open-source project in collaboration with @hf.co to accelerate materials discovery ⚛️🤗

Discover LeMat-Bulk: a 6.7M-entry dataset standardizing and unifying Materials Project, Alexandria and OQMD

11.12.2024 18:34 — 👍 11    🔁 7    💬 2    📌 0

Announcing 🥂 FineWeb2: A sparkling update with 1000s of 🗣️ languages.

We applied the same data-driven approach that led to SOTA English performance in 🍷 FineWeb to thousands of languages.

🥂 FineWeb2 has 8TB of compressed text data and outperforms other datasets.

08.12.2024 09:19 — 👍 75    🔁 19    💬 1    📌 0

The FineWeb team is happy to finally release "FineWeb2" 🥂🥳

FineWeb 2 extends the data-driven approach to pre-training dataset design introduced in FineWeb 1, now covering 1,893 languages/scripts.

Details: huggingface.co/datasets/Hug...

A detailed open-science tech report is coming soon
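To try a language subset, something like the sketch below should work; the repo id (`HuggingFaceFW/fineweb-2`) and the `language_script`-style config names are my reading of the dataset card, so double-check them:

```python
# Stream one FineWeb2 language subset (repo/config names assumed).
from datasets import load_dataset

fw = load_dataset(
    "HuggingFaceFW/fineweb-2", name="deu_Latn",  # German, Latin script
    split="train", streaming=True,
)
for doc in fw.take(2):
    print(doc["text"][:200])
```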

08.12.2024 09:08 — 👍 105    🔁 13    💬 3    📌 2

There are not many opportunities out there to build open LLMs and make them state-of-the-art, too! This is one of them.

28.11.2024 09:42 — 👍 16    🔁 1    💬 0    📌 0

WOW! 🤯 Language models are becoming smaller and more capable than ever! Here's SmolLM2 running 100% locally in-browser w/ WebGPU on a 6-year-old GPU. Just look at that speed! ⚡️😁

Powered by 🤗 Transformers.js and ONNX Runtime Web!

How many tokens/second do you get? Let me know! 👇

27.11.2024 13:51 — 👍 46    🔁 10    💬 2    📌 3

Some people are pushing models to the top right of the plot following the scaling laws; others push them to the top left, making them faster and cheaper!

We need both!

26.11.2024 16:33 — 👍 11    🔁 1    💬 1    📌 0
A screenshot of LightEval benchmarking results in a terminal

Check out how easy it is to do LLM evals with LightEval! A minimal custom-task sketch follows the list.

* any dataset on the 🤗 Hub can become an eval task in a few lines of code: customize the prompt, metrics, parsing, few-shots, everything!
* model- and data-parallel inference
* auto batching with the new vLLM backend
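A hedged sketch of the "dataset to eval task" part, following the custom-task pattern in the docs of the time; class and field names may have drifted since, so treat every identifier here as an assumption:

```python
# Hedged sketch of a custom LightEval task: map each row of a Hub
# dataset to a Doc. All identifiers follow my reading of the docs
# and may differ across LightEval versions.
from lighteval.metrics.metrics import Metrics
from lighteval.tasks.lighteval_task import LightevalTaskConfig
from lighteval.tasks.requests import Doc

def prompt_fn(line, task_name: str = None):
    # `line` is one dataset row; adapt the keys to your schema.
    return Doc(
        task_name=task_name,
        query=line["question"],
        choices=line["choices"],
        gold_index=line["answer"],
    )

TASKS_TABLE = [
    LightevalTaskConfig(
        name="my_task",
        prompt_function=prompt_fn,
        suite=["community"],
        hf_repo="my-org/my-dataset",  # any dataset on the Hub
        hf_subset="default",
        evaluation_splits=["test"],
        metric=[Metrics.loglikelihood_acc],
    )
]
```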

25.11.2024 17:24 — 👍 78    🔁 10    💬 2    📌 1

It's Sunday morning, so taking a minute for a nerdy thread (on math, tokenizers and LLMs) about the work of our intern Garreth.

By adding a few lines of code to the base Llama 3 tokenizer, he got a free boost in arithmetic performance 😮

[thread]
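My reading of the trick (the thread has the real details): Llama 3 chunks long digit runs into 3-digit tokens from the left, so 1234567 becomes [123][456][7]; grouping from the right instead yields [1][234][567], aligning tokens with ones/thousands/millions. Here is the idea sketched as a pre-processing step (the actual fix patches the tokenizer's pre-tokenization, not the input text):

```python
# Right-to-left digit grouping, sketched as text pre-processing.
# (The actual change patches the tokenizer regex, not the text.)
import re

def group_digits_r2l(text: str, sep: str = " ") -> str:
    """Split runs of 4+ digits into groups of 3, from the right."""
    def regroup(match: re.Match) -> str:
        digits = match.group(0)
        groups = []
        while len(digits) > 3:          # peel 3 digits off the right
            groups.append(digits[-3:])
            digits = digits[:-3]
        groups.append(digits)
        return sep.join(reversed(groups))
    return re.sub(r"\d{4,}", regroup, text)

print(group_digits_r2l("1234567 + 89"))  # -> "1 234 567 + 89"
```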

24.11.2024 11:05 — 👍 273    🔁 35    💬 5    📌 5

Looks more like a rave!

22.11.2024 09:20 — 👍 1    🔁 0    💬 1    📌 0

What's the secret sauce of SmolLM2 that lets it beat LLM titans like Llama 3.2 and Qwen2.5?

Unsurprisingly: data, data, data!

The SmolTalk dataset is open and available here: huggingface.co/datasets/Hug...
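Loading it is one call; the repo id (`HuggingFaceTB/smoltalk`) and the `all` config are my reading of the announcement, and the rows should be chat-format message lists:

```python
# Load the SmolTalk SFT mix (repo/config names assumed; see the card).
from datasets import load_dataset

smoltalk = load_dataset("HuggingFaceTB/smoltalk", "all", split="train")
print(smoltalk[0]["messages"][:2])  # chat format: role/content dicts
```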

21.11.2024 14:17 — 👍 62    🔁 7    💬 2    📌 1
A little guide to building Large Language Models in 2024 (YouTube video by ThomWolf)

Slides here: docs.google.com/presentation...

Inspired by the nice talk from @thomwolf.bsky.social earlier this year and updated with some material we are working on right now:

www.youtube.com/watch?v=2-SP...

19.11.2024 20:35 — 👍 11    🔁 1    💬 1    📌 0

All the things you need to know to pretrain an LLM at home*!

Gave a workshop at Uni Bern: it starts with scaling laws, moves on to web-scale data processing, and finishes with training via 4D parallelism and ZeRO.

*assuming your home includes an H100 cluster
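For the scaling-laws part, the usual starting point is a Chinchilla-style parametric loss; the fitted constants below are the published Hoffmann et al. (2022) values, quoted approximately:

```latex
% Chinchilla-style parametric loss in parameters N and tokens D:
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\qquad E \approx 1.69,\ \alpha \approx 0.34,\ \beta \approx 0.28
```

Minimizing this under a fixed compute budget C ≈ 6ND gives N and D both growing roughly as C^0.5, i.e. the familiar ~20 tokens-per-parameter rule of thumb.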

19.11.2024 20:35 — 👍 77    🔁 9    💬 5    📌 0
