
Michael Hu

@michahu.bsky.social

PhD student at NYU. NLP & training data. michahu.github.io

62 Followers  |  128 Following  |  6 Posts  |  Joined: 03.10.2023  |  1.5667


Nothing Like You Stephan Bodzin, Luna Semara · Boavista · Song · 2021

Boavista album by Stephan Bodzin:
open.spotify.com/track/7ujvbI...

26.11.2024 01:29 | 👍 2    🔁 0    💬 0    📌 0

Is this #1 in your Spotify Wrapped? 😆

26.11.2024 01:14 | 👍 1    🔁 0    💬 0    📌 0

thanks for featuring this work!

19.11.2024 02:04 | 👍 1    🔁 0    💬 0    📌 0

Aioli: A Unified Optimization Framework for Language Model Data Mixing
Language model performance depends on identifying the optimal mixture of data groups to train on (e.g., law, code, math). Prior work has proposed a diverse set of methods to efficiently learn mixture ...

In joint work with @MayeeChen @NickLourie @kchonyc @HazyResearch, we use our optimization framework to analyze failures of existing methods. We then turn these insights into:

Aioli 🧄, a fully-online data mixing algorithm!

paper: arxiv.org/abs/2411.05735
code: github.com/HazyResearch...

12.11.2024 17:04 | 👍 0    🔁 0    💬 0    📌 0

So you want a good pretraining data mix 🧑‍🍳, but which data mixing algorithm do you pick? DoGE, DoReMi, Skill-it, grid searching proportions… 😵‍💫

It turns out that these algorithms are all special cases of Linear Mixing Optimization, our new data mixing framework! 🧡

12.11.2024 17:04 | 👍 0    🔁 0    💬 1    📌 0

metropolis-hastings:
1️⃣ sample from your proposal function
2️⃣ run the sample through your filter, proportional to the desired pdf
3️⃣ use the kept samples to initialize the next round

i wonder if we can connect iterative approaches to synthetic data as making specific choices in an MCMC framework...

10.11.2024 02:24 | 👍 0    🔁 0    💬 0    📌 0
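
The three Metropolis-Hastings steps in the post above can be sketched in a few lines of Python. This is only an illustrative sketch, not anything from the post or the paper: the function names (`metropolis_hastings`, `log_std_normal`), the Gaussian random-walk proposal, the step size, and the standard-normal target are all assumptions chosen for the example. Because the assumed proposal is symmetric, the Hastings correction term cancels, so this is the plain Metropolis special case.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis sampler (hypothetical helper for illustration).

    log_target: log of the desired density, up to an additive constant.
    """
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)
    samples = []
    for _ in range(n_steps):
        # Step 1: sample from the proposal, centered on the current state.
        proposal = x + rng.gauss(0.0, step_size)
        log_p_new = log_target(proposal)
        # Step 2: the "filter". Accept with probability min(1, p_new / p).
        # 1 - random() lies in (0, 1], so the log is always defined.
        log_u = math.log(1.0 - rng.random())
        if log_u < log_p_new - log_p:
            # Step 3: the kept sample becomes the start of the next round.
            x, log_p = proposal, log_p_new
        # A rejected proposal repeats the current state in the chain.
        samples.append(x)
    return samples


def log_std_normal(x):
    """Log density of a standard normal, up to a constant (assumed target)."""
    return -0.5 * x * x


samples = metropolis_hastings(log_std_normal, x0=0.0, n_steps=20000)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
```

With the assumed standard-normal target, `mean` should land near 0 and `variance` near 1. Note that a rejected proposal still contributes a copy of the current state, which is what makes the chain's long-run distribution match the target.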
