Muru Zhang

@muruzhang.bsky.social

First-year NLP PhD @ USC | Intern @ TogetherAI | Prev. UW, AWS https://nanami18.github.io/

638 Followers  |  64 Following  |  2 Posts  |  Joined: 12.11.2024  |  1.3579

Latest posts by muruzhang.bsky.social on Bluesky

Great to be part of this project led by the amazing @hamishivi.bsky.social. The most fun (in retrospect) thing is to observe how the results start to shift as we scale up the candidate pool, evaluation suite, and selection size :) And eventually we find a simple method does the best!

04.03.2025 21:14 | 👍 2    🔁 1    💬 0    📌 0

How well do data-selection methods work for instruction-tuning at scale?

Turns out, when you look at large, varied data pools, lots of recent methods lag behind simple baselines, and a simple embedding-based method (RDS) does best!

More below ⬇️ (1/8)

04.03.2025 17:10 | 👍 13    🔁 4    💬 1    📌 2

This is a great effort for the migration, thanks for putting it together! Can I be added to the list?

12.11.2024 22:23 | 👍 1    🔁 0    💬 1    📌 0

@muruzhang is following 20 prominent accounts