
Pradeep Dasigi

@pdasigi.bsky.social

#NLP research @ai2.bsky.social; OLMo post-training https://pdasigi.github.io/

364 Followers  |  82 Following  |  13 Posts  |  Joined: 21.11.2024

Latest posts by pdasigi.bsky.social on Bluesky

Introducing OLMo-2-0325-32B-Instruct! It's spring RL curve season. This time we used GRPO for RLVR and trained a pretty nice, fully open-source model!
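
Context for the acronym soup: GRPO replaces PPO's learned value baseline with a group-relative one, normalizing each sampled response's reward against the other responses drawn for the same prompt. Below is a minimal sketch of that advantage computation, assuming binary verifiable rewards as in RLVR; the function is illustrative, not the Open-Instruct implementation.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: normalize each response's reward against
    the mean/std of its own group (one group of samples per prompt).

    rewards: (num_prompts, group_size) scalar rewards, e.g. 1.0 if a
    verifier accepted the final answer, else 0.0 (RLVR-style).
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# 2 prompts, 4 sampled completions each, binary verifiable rewards.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
print(grpo_advantages(rewards))
```

These advantages then drive a PPO-style clipped policy-gradient update, with no value network to train.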

13.03.2025 19:19 — 👍 12    🔁 1    💬 1    📌 0

Announcing OLMo 2 32B: the first fully open model to beat GPT-3.5 & GPT-4o mini on a suite of popular, multi-skill benchmarks.

Comparable to the best open-weight models, but at a fraction of the training compute. When you have a good recipe, ✨ magical things happen when you scale it up!

13.03.2025 18:36 — 👍 58    🔁 15    💬 3    📌 3

How do you curate instruction-tuning datasets while targeting specific skills? This is a common question developers face when post-training LMs.

In this work, led by @hamishivi.bsky.social, we found that simple embedding-based methods scale much better than fancier, computationally intensive ones.
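
The post doesn't spell out the method, but as a generic illustration of the embedding-based idea (not necessarily the paper's exact procedure): rank candidate training examples by their cosine similarity to a small pool of target-skill examples and keep the top k.

```python
import numpy as np

def select_by_embedding(cand_emb: np.ndarray, target_emb: np.ndarray, k: int) -> np.ndarray:
    """Pick the k candidate examples whose embeddings are most similar
    (max cosine similarity) to any example from the target-skill set."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = normalize(cand_emb) @ normalize(target_emb).T  # (n_cand, n_target)
    scores = sims.max(axis=1)                             # best match per candidate
    return np.argsort(-scores)[:k]                        # indices of top-k candidates

# Placeholder embeddings; in practice these come from any off-the-shelf encoder.
rng = np.random.default_rng(0)
cand, target = rng.normal(size=(1000, 64)), rng.normal(size=(50, 64))
print(select_by_embedding(cand, target, k=10))
```

The appeal is cost: one encoder pass per example plus a matrix multiply, versus fancier approaches that require model gradients or repeated training runs.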

04.03.2025 19:20 — 👍 1    🔁 0    💬 0    📌 0

also some other tülu contributors are on the market:
@ljvmiranda.bsky.social (ljvmiranda921.github.io) and Xinxi Lyu (alrope123.github.io) are also applying to PhD programs, and @valentinapy.bsky.social (valentinapy.github.io) is on the faculty market, hire them all!!

30.01.2025 19:25 — 👍 1    🔁 1    💬 0    📌 0

Here's a significant update to Tülu 3: we scaled up the post-training recipe to Llama 3.1 405B. Tülu 3 405B beats Llama's 405B instruct model and also DeepSeek V3.

Huge shoutout to @hamishivi.bsky.social and @vwxyzjn.bsky.social who led the scale up, and to the rest of the team!

30.01.2025 19:21 — 👍 7    🔁 0    💬 0    📌 0

Very pleased to see Tulu 3 70B more or less tied with Llama 3.1 70B Instruct on style-controlled ChatBotArena. It's the only model anywhere close with open code and data for post-training! Lots of stuff people can build on.

Next up: looking for OLMo 2 numbers.

08.01.2025 17:13 — 👍 24    🔁 3    💬 0    📌 0

Excited to see Tulu 3 sitting between Llama 3.1 and 3.3 Instruct on the Chatbot Arena leaderboard right now!

Particularly happy it is top 20 for Math and Multi-turn prompts :)

All the details and data on how to train a model this good are right here: arxiv.org/abs/2411.15124

08.01.2025 17:47 — 👍 15    🔁 3    💬 0    📌 0
Test accuracy, train rewards, KL divergence, and response length training curves when training Tulu 3 SFT and Tulu 3 DPO on the MATH or GSM8k train sets, and evaluating on MATH/GSM8k using RLVR. Performance significantly improves in both cases.

New OpenAI RL finetuning API reminds me a lot of RLVR, which we used for Tülu 3 (arxiv.org/abs/2411.15124).

Using RL to train against ground-truth labels is a simple idea, but very effective (>10-point gains using just the GSM8k train set).

It's implemented for you to use in Open-Instruct 😉: github.com/allenai/open...
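
The heart of RLVR is that the reward is a deterministic check against ground-truth labels rather than a learned reward model. A minimal sketch for GSM8k-style problems; the answer-extraction pattern here is illustrative, and the actual Open-Instruct code differs in its details.

```python
import re

def verifiable_reward(completion: str, gold_answer: str) -> float:
    """RLVR-style reward: 1.0 iff the completion's final answer matches
    the ground-truth label, 0.0 otherwise. No learned reward model.

    Assumes GSM8k-style prompts where the model is asked to end with
    'The answer is <number>'; the extraction regex is illustrative.
    """
    match = re.search(r"answer is\s*(-?[\d,]+(?:\.\d+)?)", completion, re.IGNORECASE)
    if match is None:
        return 0.0
    predicted = match.group(1).replace(",", "")
    return 1.0 if predicted == gold_answer.replace(",", "") else 0.0

print(verifiable_reward("... so 12 + 30 = 42. The answer is 42", "42"))  # 1.0
print(verifiable_reward("I'm not sure.", "42"))                          # 0.0
```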

06.12.2024 20:24 — 👍 7    🔁 1    💬 1    📌 0
Job Application for Predoctoral Young Investigator, OLMo at The Allen Institute for AI

Our team at Ai2 (OLMo) is looking for a predoctoral researcher. You get to work on exciting research in building open LMs while preparing for a PhD.

Apply here: job-boards.greenhouse.io/thealleninst...

04.12.2024 17:04 — 👍 2    🔁 0    💬 0    📌 0
GitHub - allenai/OLMo: Modeling, training, eval, and inference code for OLMo

We just updated the OLMo repo at github.com/allenai/OLMo!
There are now several training configs that together reproduce the training runs that led to the final OLMo 2 models.
In particular, all the training data is available, tokenized and shuffled exactly as we trained on it!

02.12.2024 20:13 — 👍 54    🔁 11    💬 0    📌 0

How does your cat feel about wearing a Garmin device and adding you as a connection?

27.11.2024 18:15 — 👍 1    🔁 0    💬 1    📌 0

OLMo 2 is out! We released 7B and 13B models that are *fully open*, and compete with the best open-weight models out there.

Importantly, we use the same post-training recipe as our recently released Tülu 3, and it works remarkably well, more so at the 13B size.

26.11.2024 21:28 — 👍 2    🔁 0    💬 0    📌 0

As a reviewer, I did not find it particularly useful. The recommendations I received were already addressed by my initial review. As an author, the mix of reviews we got was roughly of the same quality as usual.
This is an interesting idea though, and I hope there's a way to make it work.

25.11.2024 14:52 — 👍 3    🔁 0    💬 0    📌 0
allura-org/Teleut-7b · Hugging Face

open-source Tulu 3 model recreation! rivals the original SFT and other models in its size range
huggingface.co/allura-org/T...

24.11.2024 20:09 — 👍 34    🔁 6    💬 1    📌 3

We even developed two *new* instruction following evals for this setup:
1) IFEval-OOD, a variant of IFEval (Zhou et al., 2023) but with a disjoint set of constraints.
2) HREF, a more general IF eval targeting a diverse set of IF tasks.
Detailed analyses on these evals are coming out soon.
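
What makes IFEval-style evals tractable is that every constraint is programmatically verifiable, so scoring is a deterministic check rather than an LLM-judge call. A toy illustration of the pattern; these constraint names are invented, not the actual IFEval-OOD taxonomy.

```python
def check_constraints(response: str, constraints: dict) -> bool:
    """Programmatically verify IFEval-style instruction-following
    constraints. Constraint names here are illustrative only."""
    checks = {
        "min_words": lambda v: len(response.split()) >= v,
        "must_include": lambda v: v in response,
        "all_lowercase": lambda v: (response == response.lower()) == v,
        "num_bullets": lambda v: response.count("\n- ") + response.startswith("- ") == v,
    }
    return all(checks[name](value) for name, value in constraints.items())

print(check_constraints("- first\n- second", {"num_bullets": 2, "all_lowercase": True}))
```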

23.11.2024 23:53 — 👍 1    🔁 0    💬 0    📌 0

We presented some preliminary findings on generalization and overfitting in the report based on this setup, and will put out more analysis soon.

23.11.2024 23:53 — 👍 1    🔁 0    💬 1    📌 0

From a traditional ML perspective, this setup may seem obvious, i.e., a dev-test split of an eval. But none of our modern evals are in-distribution anymore, and we expect our models to generalize across distributions. So we adapted the traditional setup to use newer and harder evals as unseen ones.

23.11.2024 23:53 — 👍 1    🔁 0    💬 1    📌 0

For each "core skill" we care about, we chose a separate set of "development" and "unseen" evaluations. We tracked the performance of models only on the former during development and evaluated only the final checkpoints on the unseen ones.

23.11.2024 23:53 — 👍 1    🔁 0    💬 1    📌 0

A common approach for improving LM performance at specific skills is to *synthesize* training data that is similar to corresponding evals. But how do we ensure that we are not simply overfitting to those benchmarks? It is worth highlighting our approach to evaluation for Tülu 3 in this regard.

23.11.2024 23:53 — 👍 8    🔁 0    💬 1    📌 0
GitHub - allenai/open-instruct

Want to post-train on your own data? Here's our training code: github.com/allenai/open....

Reproducing LM evaluations can be notoriously difficult. So we released our evaluation framework where you can specify and tweak every last detail and reproduce what we did: github.com/allenai/olmes.

23.11.2024 04:20 — 👍 0    🔁 0    💬 0    📌 0

Super excited to release Tülu 3, a suite of open SoTA post-trained models, data, code, evaluation framework, and most importantly post-training recipes.

I learned A LOT about LM post-training working on this project. We wrote it all up so now you can too.
Paper: allenai.org/papers/tulu-...

23.11.2024 04:20 — 👍 8    🔁 0    💬 1    📌 1

Excited to release Tulu 3! We worked hard to try and make the best open post-training recipe we could, and the results are good!
I was lucky enough to work on almost every stage of the pipeline in one way or another. Some comments + highlights ⬇️

21.11.2024 17:45 — 👍 9    🔁 5    💬 1    📌 0

Meet Tülu 3, a set of state-of-the-art instruct models with fully open data, eval code, and training algorithms.
We invented new methods for fine-tuning language models with RL and built upon best practices to scale synthetic instruction and preference data.
Demo, GitHub, paper, and models 👇

21.11.2024 17:15 — 👍 111    🔁 31    💬 3    📌 7

Open Post-Training recipes!

Some of my personal highlights:
💡 We significantly scaled up our preference data!
💡 RL with Verifiable Rewards to improve targeted skills like math and precise instruction following
💡 evaluation toolkit for post-training (including new unseen evals!)
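
For reference on the preference-data highlight: the training-curve caption earlier in this feed mentions Tulu 3 DPO, and the standard DPO objective (Rafailov et al., 2023) that such a stage optimizes looks like the sketch below. This is a minimal illustration, not the Open-Instruct code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """Standard DPO objective: push the policy to prefer the chosen
    response over the rejected one, relative to a frozen reference model.
    All inputs are per-sequence summed log-probs, shape (batch,)."""
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_logratios - rejected_logratios)).mean()

# Toy check: a policy that favors the chosen response more than the
# reference does gets a loss below -logsigmoid(0) ~= 0.693.
loss = dpo_loss(torch.tensor([-5.0]), torch.tensor([-9.0]),
                torch.tensor([-6.0]), torch.tensor([-6.0]))
print(loss.item())
```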

21.11.2024 18:40 — 👍 22    🔁 1    💬 0    📌 0
