Great to be part of this project led by the amazing @hamishivi.bsky.social. The most fun part (in retrospect) was watching the results shift as we scaled up the candidate pool, evaluation suite, and selection size :) And in the end, a simple method does best!
04.03.2025 21:14
How well do data-selection methods work for instruction-tuning at scale?
Turns out, when you look at large, varied data pools, lots of recent methods lag behind simple baselines, and a simple embedding-based method (RDS) does best!
More below ⬇️ (1/8)
04.03.2025 17:10
This is a great effort for the migration, thanks for putting it together! Can I be added to the list?
12.11.2024 22:23
I (try to) do NLP research. Antipodean abroad.
currently doing PhD @uwcse,
prev @usyd @ai2
🇦🇺🇨🇦🇬🇧
ivison.id.au
Training big models at @ai2.bsky.social.
NLP PhD student at UPenn | Prev USC
cylumn.com
Assistant Professor @ UChicago CS/DSI (NLP & HCI) | Writing with AI ✍️
https://minalee-research.github.io/
Professor at UW; Researcher at Meta. LMs, NLP, ML. PNW life.
USC CS Ph.D. student
Prev Tsinghua Uni
NLP, Multimodal Learning, AI for Science
https://saccharomycetes.github.io/
1st year CS PhD student @UCSD
Visiting PhD at Stanford, CS PhD student at NUS 🇸🇬, PhD Fellow @ Google, NLP researcher
https://yocodeyo.github.io
Working on Social Intelligence and Evaluation
3rd year PhD student @uscnlp. Interested in NLP x CSS | she/her
Incoming assistant professor at UCSD CSE in MLSys. Currently recruiting students! Also academic partner at Together AI. https://danfu.org/
Chief AI Scientist at Databricks. Founding team at MosaicML. MIT/Princeton alum. Lottery ticket enthusiast. Working on data intelligence.
AI @ OpenAI, Tesla, Stanford
2nd year PhD at UCSD w/ @rajammanabrolu.bsky.social
Prev: @ltiatcmu.bsky.social @umich.edu
Research: Agents🤖, Reasoning🧠, Games👾
AI/ML/NLP researcher and Senior Lecturer at Imperial College London. Working on language models for planning, reasoning and interpretable decision making
PhD student @MIT EECS & CSAIL. Working on principled and scalable methods in ML & LLM.
she/her/hers
sustcsonglin.github.io
AI professor at Caltech. General Chair ICLR 2025.
http://www.yisongyue.com
PhD Student at UC San Diego | LLM Agents, Reinforcement Learning, Human-AI Collaboration, Multi-Agent Systems
CS Prof at UC Irvine, CTO/Cofounder at Envive AI
Work on evaluation and robustness of LLMs
AI, RL, NLP, Games Asst Prof at UCSD
Research Scientist at Nvidia
Lab: http://pearls.ucsd.edu
Personal: prithvirajva.com