
Elie

@eliebak.hf.co

Training LLMs at huggingface | hf.co/science

2,380 Followers  |  261 Following  |  20 Posts  |  Joined: 27.10.2024

Latest posts by eliebak.hf.co on Bluesky


LLM Reasoning labs will be eating good today 🍔

We commandeered the HF cluster for a few days and generated 1.2M reasoning-filled solutions to 500k NuminaMath problems with DeepSeek-R1 🐳
Have fun!

12.02.2025 14:36 — 👍 22    🔁 3    💬 2    📌 2
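For anyone who wants to poke at the release, here is a minimal sketch of streaming the generated traces with the datasets library; the repo id below is hypothetical, since the post does not name the dataset:

```python
from datasets import load_dataset

# Hypothetical repo id -- the post does not name the final dataset repo.
ds = load_dataset("open-r1/r1-numinamath-solutions", split="train", streaming=True)

# Each record is assumed to pair a NuminaMath problem with R1-generated solutions.
for row in ds.take(1):
    print(row)
```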
Preview: GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1

Last moments of closed-source AI 🪦:
Hugging Face is openly reproducing the pipeline of 🐳 DeepSeek-R1. Open data, open training, open models, open collaboration.

🫵 Let's go!
github.com/huggingface/...

25.01.2025 14:36 — 👍 32    🔁 7    💬 0    📌 1
Preview: GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1

We are reproducing the full DeepSeek R1 data and training pipeline so everybody can use their recipe. Instead of doing it in secret, we can do it together in the open!

Follow along: github.com/huggingface/...

25.01.2025 13:29 — 👍 199    🔁 36    💬 6    📌 6
A plot showing increased performance of Llama-3.2-3B when pretrained on FineMath

Introducing 📐 FineMath: the best open math pre-training dataset with 50B+ tokens!

Math remains challenging for LLMs, and by training on FineMath we see considerable gains over other math datasets, especially on GSM8K and MATH.

🤗 huggingface.co/datasets/Hug...

Here's a breakdown 🧵

19.12.2024 15:55 — 👍 45    🔁 15    💬 2    📌 1
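A minimal sketch for sampling FineMath with the datasets library; the repo id HuggingFaceTB/finemath and the finemath-4plus config are assumptions, since the link in the post is truncated:

```python
from datasets import load_dataset

# Assumed ids (the link in the post is truncated): repo HuggingFaceTB/finemath,
# config finemath-4plus.
ds = load_dataset("HuggingFaceTB/finemath", "finemath-4plus",
                  split="train", streaming=True)

for sample in ds.take(2):
    print(sample["text"][:300])  # assumes a "text" column
```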

WOW, Gemini Flash 2.0 is really impressive. Wondering about the size of this supposedly smol model.

One odd thing is that the model seems to lose some ability with long contexts compared to Flash 1.5. If any google friends could share insights, I'd love to hear them!

11.12.2024 16:19 — 👍 4    🔁 0    💬 0    📌 0

👋👋

10.12.2024 00:23 — 👍 1    🔁 0    💬 1    📌 0

Curious about this: what % of "new ideas" are you not allowed to publish? (if you can answer ofc)

05.12.2024 22:57 — 👍 0    🔁 0    💬 0    📌 0

should be good now

05.12.2024 19:59 — 👍 0    🔁 0    💬 0    📌 0

Hey, I'll be at NeurIPS next week! My DMs are open if you want to meet and talk about pre-training/data/whatever you want 🫡

04.12.2024 08:06 — 👍 4    🔁 0    💬 1    📌 0

Link: www.freepatentsonline.com/y2024/037844...
I've probably missed a lot, feel free to add more ⬇️

03.12.2024 11:12 — 👍 0    🔁 0    💬 0    📌 0

- They use some kind of metadata token to give information about toxicity and data leakage, but also a "quality" token?
- [0118] talks about using some kind of LoRAs during the finetuning/alignment phase to adapt to multiple downstream tasks
- ~[0154] some memory evaluation technique?

03.12.2024 11:12 — 👍 0    🔁 0    💬 1    📌 0
Post image

Google patent on "Training of large neural network". 😮

I don't know if this gives much information, but going through it quickly, it seems that:
- They are not only using "causal language modeling" as a pre-training task but also "span corruption" and "prefix modeling". (ref [0805]-[0091])

03.12.2024 11:11 — 👍 4    🔁 0    💬 1    📌 0

So many open-source and open releases last week!
Here's a recap; find the text-readable version here: huggingface.co/posts/merve/...

02.12.2024 09:53 — 👍 79    🔁 4    💬 3    📌 0
Preview: smollm/smol_tools at main · huggingface/smollm | Everything about the SmolLM & SmolLM2 family of models

📬 Summarize and rewrite your text/emails faster, and offline!

Check @andimara.bsky.social's Smol Tools for summarization and rewriting. It uses SmolLM2 to summarize text and make it more friendly or professional, all running locally thanks to llama.cpp github.com/huggingface/...

30.11.2024 15:58 — 👍 11    🔁 2    💬 1    📌 0
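A rough sketch of the same idea in Python via llama-cpp-python, assuming a locally downloaded SmolLM2 GGUF build (the model path below is hypothetical):

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical local path to a SmolLM2 GGUF file from the Hub.
llm = Llama(model_path="SmolLM2-1.7B-Instruct-Q4_K_M.gguf", n_ctx=2048)

out = llm.create_chat_completion(
    messages=[
        {"role": "system",
         "content": "Rewrite the user's text to sound more professional."},
        {"role": "user", "content": "hey, meeting moved to 3pm, dont be late"},
    ],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```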

What else should we log during LLM training? Right now, it's just loss, grad_norm, and evals, but I want to log more to have a better understanding of pre-training. Thinking about adding stuff like entropix metrics (agreement, varentropy?)

Any thoughts or cool ideas?

30.11.2024 15:19 — 👍 7    🔁 1    💬 0    📌 1
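One way to compute the entropy/varentropy metrics the post mentions, as a minimal PyTorch sketch over the model's logits (the logger call at the end is illustrative, not tied to any particular tracker):

```python
import torch
import torch.nn.functional as F

def entropy_stats(logits: torch.Tensor) -> tuple[float, float]:
    """Mean per-token entropy and varentropy for logits of shape (batch, seq, vocab)."""
    logp = F.log_softmax(logits.float(), dim=-1)
    p = logp.exp()
    entropy = -(p * logp).sum(dim=-1)                                   # H = -sum_v p log p
    varentropy = (p * (logp + entropy.unsqueeze(-1)) ** 2).sum(dim=-1)  # E_p[(-log p - H)^2]
    return entropy.mean().item(), varentropy.mean().item()

# In the training loop, next to loss and grad_norm:
# ent, varent = entropy_stats(outputs.logits)
# logger.log({"train/entropy": ent, "train/varentropy": varent})
```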

Glad to have you back!

28.11.2024 21:52 — 👍 3    🔁 0    💬 0    📌 0

I find it sad, but imo it's good news that those people block 'us'. I'm tired of seeing hateful comments on my colleagues' (and other ML engineers/researchers') posts.

28.11.2024 14:37 — 👍 8    🔁 0    💬 0    📌 0

why not flex attention?

28.11.2024 08:35 — 👍 2    🔁 0    💬 0    📌 0
Preview: ML Research Engineer Internship, SmolLMs pretraining and datasets - US Remote - Hugging Face | At Hugging Face, we're on a journey to democratize good AI. We are building the fastest growing platform for AI builders with over 5 million users & 100k organizations who collectively shared over...

my bad, here you go for US! apply.workable.com/huggingface/...

28.11.2024 06:20 — 👍 1    🔁 0    💬 0    📌 0

should be okay!

28.11.2024 06:20 — 👍 2    🔁 0    💬 0    📌 0

WOW! 🤯 Language models are becoming smaller and more capable than ever! Here's SmolLM2 running 100% locally in-browser w/ WebGPU on a 6-year-old GPU. Just look at that speed! ⚡️😍

Powered by 🤗 Transformers.js and ONNX Runtime Web!

How many tokens/second do you get? Let me know! 👇

27.11.2024 13:51 — 👍 46    🔁 10    💬 2    📌 3
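The demo itself runs in-browser with Transformers.js; here is a rough Python analogue using the transformers pipeline, assuming the HuggingFaceTB/SmolLM2-1.7B-Instruct checkpoint id:

```python
from transformers import pipeline

# Assumed checkpoint id for the model in the demo.
generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-1.7B-Instruct")

messages = [{"role": "user", "content": "Explain WebGPU in one sentence."}]
out = generator(messages, max_new_tokens=64)
# Recent transformers versions return the full chat; the last message is the reply.
print(out[0]["generated_text"][-1]["content"])
```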
A job description stating:
About this Role

This internship works at the intersections of software engineering, machine learning engineering, and education. With a strong focus on distributed training through the accelerate library (https://huggingface.co/docs/accelerate/index), we'll focus on bringing state-of-the-art training techniques into the library while also documenting and helping teach others how they work. By the end of this internship, the candidate will have touched on all aspects of distributed training and core library contributions, including large-scale distributed training, API design, writing educational material aimed at a semi-technical audience, and understanding the nuances of writing software that scales.


I'm looking for an intern!

If you:
* Are driven
* Love OSS
* Are interested in distributed PyTorch training/FSDPv2/DeepSpeed

Come work with me!

Fully remote; more details on how to apply in the comments

26.11.2024 16:01 — 👍 52    🔁 10    💬 4    📌 1

10000% agree with Omar, this is totally disproportionate

27.11.2024 13:09 — 👍 7    🔁 0    💬 1    📌 0
Preview: ML Research Engineer Internship, SmolLMs pretraining and datasets - EMEA Remote - Hugging Face | Here at Hugging Face, we're on a journey to advance good Machine Learning and make it more accessible. Along the way, we contribute to the development of technology for the better. We have built the fa...

We're looking for an intern to join our SmolLM team! If you're excited about training LLMs and building high-quality datasets, we'd love to hear from you. 🤗

US: apply.workable.com/huggingface/...
EMEA: apply.workable.com/huggingface/...

27.11.2024 10:20 — 👍 64    🔁 12    💬 7    📌 2

super nice! 🤗

26.11.2024 18:08 — 👍 2    🔁 0    💬 0    📌 0
Preview: Rearchitecting Hugging Face Uploads and Downloads | We're on a journey to advance and democratize artificial intelligence through open source and open science.

On the Xet team at @huggingface.bsky.social we're always looking for ways to move bytes to a computer near you as fast as possible.

To do this, we're redesigning the upload and download infrastructure on the Hub. This post describes how; check the thread for details 🧵

huggingface.co/blog/rearchi...

26.11.2024 17:39 — 👍 67    🔁 11    💬 9    📌 1

yayyyyyy! 🔥

26.11.2024 16:48 — 👍 1    🔁 0    💬 0    📌 0

The SmolLM series has a new member: say hi to SmolVLM! 🤏

It uses a preliminary 16k context version of SmolLM2 to tackle long-context vision documents and higher-res images.

And yes, we're cooking up versions with bigger context lengths. 👨‍🍳

Try it yourself here: huggingface.co/spaces/Huggi...

26.11.2024 16:47 — 👍 6    🔁 0    💬 0    📌 0

Small yet mighty! 💫

We are releasing SmolVLM: a new 2B small vision language model made for on-device use, fine-tunable on a consumer GPU, and immensely memory efficient 🤠

We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic, and SmolVLM-Base. huggingface.co/collections/...

26.11.2024 16:04 — 👍 159    🔁 27    💬 11    📌 4

Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput.

SmolVLM can be fine-tuned in a Google Colab and run on a laptop! Or process millions of documents with a consumer GPU!

26.11.2024 15:57 — 👍 104    🔁 22    💬 4    📌 4
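A minimal sketch of trying SmolVLM locally with transformers, assuming the HuggingFaceTB/SmolVLM-Instruct checkpoint id and its chat-template processor API (the input image path is hypothetical):

```python
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image

# Assumed checkpoint id from the announcement thread.
model_id = "HuggingFaceTB/SmolVLM-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

image = Image.open("document.png")  # hypothetical input file
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Summarize this page."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```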
