
@changho.bsky.social

Ph.D student at @WisconsinCS @UWMadison

18 Followers  |  20 Following  |  12 Posts  |  Joined: 07.12.2024

Posts by (@changho.bsky.social)

Tons of model weights are available, but what else can we do with them besides prediction? πŸ€” Introducing Grad-Mimic! A new data selection framework that uses a well-trained model's weights to find high-value samples for foundation models. Boost data curation & data efficiency!

09.02.2025 21:07 β€” πŸ‘ 3    πŸ” 3    πŸ’¬ 1    πŸ“Œ 0
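The core idea, scoring samples by how well their gradients "mimic" a step toward a well-trained model's weights, can be sketched in a toy linear setting. Everything below is a hypothetical illustration, not Grad-Mimic's actual implementation: the squared-error loss, the reference weights `w_ref`, and the alignment score are my assumptions.

```python
import numpy as np

# Toy sketch of gradient-based data selection (my assumptions, not the
# framework's exact method): score each sample by how well a step along its
# negative gradient moves the current weights toward a well-trained
# reference model's weights.
rng = np.random.default_rng(0)
d, n = 5, 200
w_ref = rng.normal(size=d)          # weights of a well-trained reference model
w = np.zeros(d)                     # current weights, early in training

X = rng.normal(size=(n, d))
y = X @ w_ref                       # clean labels consistent with w_ref
y[n // 2:] = -y[n // 2:]            # second half: corrupted (flipped) labels

def per_sample_grad(w, x, yi):
    """Gradient of the squared error (x @ w - yi)**2 for one sample."""
    return 2.0 * (x @ w - yi) * x

target_dir = w_ref - w              # direction toward the reference weights
scores = np.array([
    -per_sample_grad(w, X[i], y[i]) @ target_dir  # alignment with that direction
    for i in range(n)
])
keep = np.argsort(scores)[::-1][: n // 4]   # keep the best-aligned quarter
# In this construction, corrupted samples get non-positive scores, so the
# selected quarter lands entirely in the clean half.
```

Here "high-value" samples are those whose gradients point the training trajectory toward the reference weights; label-flipped samples point away and are filtered out.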
Weak-to-Strong Generalization Through the Data-Centric Lens The weak-to-strong generalization phenomenon is the driver for important machine learning applications including highly data-efficient learning and, most recently, performing superalignment. While dec...

11/ We'd love to hear your thoughts and dive deeper into the discussion! πŸš€
See you in Singapore at #ICLR2025!

Big thanks to my advisor
@fredsala.bsky.social for his guidance and to John for his contributions!

Paper: arxiv.org/abs/2412.03881
GitHub: github.com/SprocketLab/...

05.02.2025 18:41 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

10/ By leveraging the strengths of simpler models and their understanding of easy patterns, stronger models can iteratively build upon this foundation to tackle increasingly complex challenges. We see this process as a systematic and scalable path toward achieving superintelligence.

05.02.2025 18:40 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

9/ Looking ahead, we're excited to explore data-centric mechanisms for weak-to-strong generalization. Just as scholars refine theoriesβ€”building on past insights to deepen understanding and create new conceptsβ€”we believe weak-to-strong generalization follows a similar trajectory.

05.02.2025 18:40 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

8/ Key takeaways for practitioners:
- Instead of just improving algorithms, focus on selecting the right data!
- Prioritizing high-overlap data sources yields better generalization.

05.02.2025 18:39 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Post image

7/ How can we maximize overlap density when choosing data sources?

We frame data selection as a bandit problem, using UCB to balance exploration and exploitation across datasets. This strategically identifies and prioritizes sources with high overlap density, maximizing generalization.

05.02.2025 18:39 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
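For intuition, here is a minimal UCB1 sketch over made-up data sources, where pulling an arm means sampling a batch from that source and observing a noisy estimate of its overlap density. The three densities and the Gaussian noise model are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical setup: three candidate data sources with different (hidden)
# overlap densities; pulling an arm = sampling a batch from that source and
# estimating its overlap density, observed with noise.
true_density = np.array([0.2, 0.5, 0.8])

def sample_batch_density(source):
    return np.clip(true_density[source] + rng.normal(0.0, 0.1), 0.0, 1.0)

n_arms, rounds = len(true_density), 500
counts = np.zeros(n_arms)
means = np.zeros(n_arms)

for t in range(rounds):
    if t < n_arms:                   # play each source once to initialize
        arm = t
    else:                            # UCB1: empirical mean + exploration bonus
        ucb = means + np.sqrt(2.0 * np.log(t + 1) / counts)
        arm = int(np.argmax(ucb))
    r = sample_batch_density(arm)
    counts[arm] += 1
    means[arm] += (r - means[arm]) / counts[arm]   # running-mean update

best = int(np.argmax(counts))        # the high-overlap source dominates the pulls
```

The exploration bonus shrinks as a source is sampled more, so low-overlap sources are checked occasionally but the budget concentrates on the highest-density source.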
Post image

6/ To tackle this, we propose an overlap detection algorithm that uncovers these points in real-world datasets and helps explain both the presence and absence of weak-to-strong generalization.

05.02.2025 18:38 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

5/ However, identifying overlap density in real-world datasets is challenging. Overlapping points are latent!

05.02.2025 18:38 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

4/ πŸ”‘ Key insights:
- Weak models can produce accurate pseudolabels based on easy patterns.
- Strong models leverage these labels to generalize on hard patterns.
- More overlap β†’ better generalization.

05.02.2025 18:38 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
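These insights can be seen in a toy numpy simulation (my construction, not the paper's experiments): the label is determined by a "hard" feature, an "easy" feature agrees with the label only on overlap points, the weak teacher reads only the easy feature, and the strong student fits the hard feature to the teacher's pseudolabels.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
overlap = 0.8                      # fraction of points carrying BOTH patterns

h = rng.normal(size=n)             # "hard" feature: truly determines the label
y = (h > 0).astype(int)            # ground-truth labels
has_easy = rng.random(n) < overlap
# Easy feature agrees with the label only on overlap points; random elsewhere.
e = np.where(has_easy, y, rng.integers(0, 2, size=n))

# Weak teacher: can only read the easy pattern, so its pseudolabels
# are correct exactly on overlap points (plus lucky coin flips).
pseudo = e

# Strong student: fits a threshold on h against the noisy pseudolabels.
# With majority-correct pseudolabels on each side of 0, it recovers h > 0.
thresholds = np.linspace(-2.0, 2.0, 201)
accs = [((h > t).astype(int) == pseudo).mean() for t in thresholds]
t_star = thresholds[int(np.argmax(accs))]
student_pred = (h > t_star).astype(int)

weak_acc = (pseudo == y).mean()            # ~ overlap + (1 - overlap) / 2
strong_acc = (student_pred == y).mean()    # student surpasses its teacher
```

With 80% overlap the teacher is right about 90% of the time, yet the student, which can express the hard pattern, fits through the pseudolabel noise and ends up nearly perfect; shrinking `overlap` toward 0.5 makes the generalization gain disappear.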

3/ The amount of overlap, i.e., the proportion of points containing both the easy and hard patterns, is the core quantity determining how much weak-to-strong generalization we get.

05.02.2025 18:34 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

2/ The intuition is simple: generalization tracks the data points containing both β€œeasy” patterns (learnable by a weak model) and β€œchallenging” patterns (learnable only by a stronger model). On such points, the weak model's predictions create the signal a stronger model needs to learn the challenging patterns.

05.02.2025 18:25 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

1/ Weak-to-strong generalization, where a strong student model surpasses its weaker teacher model, is crucial for achieving 'superintelligence'. We propose a mechanism explaining when and why this happens.

05.02.2025 18:22 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Post image

What enables a strong model to surpass its weaker teacher?

πŸš€ Excited to share our ICLR 2025 paper: "Weak-to-Strong Generalization Through the Data-Centric Lens"! 🧡

05.02.2025 18:22 β€” πŸ‘ 4    πŸ” 2    πŸ’¬ 1    πŸ“Œ 0
Post image

First up at #NeurIPS2024 from our group, our work on labeling via programmatic distillation (a spotlight!). Label your data orders of magnitude faster and cheaper β€” come join us today at Poster Session 2 East for a demo!

11.12.2024 23:15 β€” πŸ‘ 15    πŸ” 8    πŸ’¬ 0    πŸ“Œ 0