Tons of model weights are available, but what can we do with them besides prediction? Introducing Grad-Mimic! A new data selection framework that uses a well-trained model's weights to find high-value samples for foundation models. Boost data curation & data efficiency!
09.02.2025 21:07
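A minimal sketch of the gradient-based selection idea (the scoring rule, names, and toy data here are illustrative assumptions, not necessarily Grad-Mimic's exact method): score each sample by how well a step along its negative gradient would move the current weights toward the well-trained reference weights, then keep the top-scoring samples.

```python
import numpy as np

def score_samples(per_sample_grads, w_current, w_reference):
    """Score each sample by the cosine similarity between its negative
    gradient (the direction an SGD step would move the weights) and the
    direction from the current weights toward the reference weights."""
    target = w_reference - w_current
    target = target / np.linalg.norm(target)
    steps = -np.asarray(per_sample_grads, dtype=float)  # SGD moves against the gradient
    norms = np.linalg.norm(steps, axis=1, keepdims=True)
    return (steps / np.clip(norms, 1e-12, None)) @ target

# Toy example: 2-dim weight space, reference model sits at (1, 0).
w, w_ref = np.zeros(2), np.array([1.0, 0.0])
grads = [[-1.0, 0.0],   # step moves toward the reference -> high score
         [1.0, 0.0],    # step moves away from it -> low score
         [0.0, 1.0]]    # orthogonal step -> zero score
scores = score_samples(grads, w, w_ref)
selected = np.argsort(scores)[::-1][:2]  # keep the top-2 samples
```

The per-sample gradients and the reference weights are the only ingredients, so the same scoring loop can run on any pretrained checkpoint.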
Weak-to-Strong Generalization Through the Data-Centric Lens
The weak-to-strong generalization phenomenon is the driver for important machine learning applications including highly data-efficient learning and, most recently, performing superalignment. While dec...
11/ We'd love to hear your thoughts and dive deeper into the discussion!
See you in Singapore at #ICLR2025!
Big thanks to my advisor
@fredsala.bsky.social for his guidance and to John for his contributions!
Paper: arxiv.org/abs/2412.03881
Github: github.com/SprocketLab/...
05.02.2025 18:41
10/ By leveraging the strengths of simpler models and their understanding of easy patterns, stronger models can iteratively build upon this foundation to tackle increasingly complex challenges. We see this process as a systematic and scalable path toward achieving superintelligence.
05.02.2025 18:40
9/ Looking ahead, we're excited to explore data-centric mechanisms for weak-to-strong generalization. Just as scholars refine theories, building on past insights to deepen understanding and create new concepts, we believe weak-to-strong generalization follows a similar trajectory.
05.02.2025 18:40
8/ Key takeaways for practitioners:
- Instead of just improving algorithms, focus on selecting the right data!
- Prioritizing high-overlap data sources gives us better generalization.
05.02.2025 18:39
7/ How can we maximize overlap density when choosing data sources?
We frame data selection as a bandit problem, using UCB to balance exploration and exploitation across datasets. This strategically identifies and prioritizes sources with high overlap density, maximizing generalization.
05.02.2025 18:39
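A small sketch of that bandit view (the source names, reward proxy, and constants are hypothetical, not the paper's setup): each arm is a data source, the reward stands in for a measured overlap-density proxy on a sampled batch, and UCB1 balances trying under-explored sources against exploiting the best one seen so far.

```python
import math
import random

def ucb_select(pulls, rewards, t, c=2.0):
    """Return the source maximizing mean reward + exploration bonus (UCB1)."""
    for s in pulls:                      # try every source at least once
        if pulls[s] == 0:
            return s
    return max(pulls, key=lambda s: rewards[s] / pulls[s]
               + math.sqrt(c * math.log(t) / pulls[s]))

random.seed(0)
# Hypothetical sources; each value is that source's (unknown) overlap density.
true_density = {"web": 0.2, "books": 0.6, "code": 0.4}
pulls = {s: 0 for s in true_density}
rewards = {s: 0.0 for s in true_density}
for t in range(1, 201):
    s = ucb_select(pulls, rewards, t)
    pulls[s] += 1
    rewards[s] += true_density[s] + random.gauss(0, 0.05)  # noisy estimate
```

Over the 200 rounds the loop concentrates its pulls on the highest-density source while still occasionally sampling the others.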
6/ To tackle this, we propose an overlap detection algorithm that uncovers these points in real-world datasets and helps explain both the presence and absence of weak-to-strong generalization.
05.02.2025 18:38
5/ However, identifying overlap density in real-world datasets is challenging. Overlapping points are latent!
05.02.2025 18:38
4/ Key insights:
- Weak models can make accurate pseudolabels based on easy patterns.
- Strong models leverage these labels to generalize on hard patterns.
- More overlap → better generalization.
05.02.2025 18:38
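These insights can be wired into a toy simulation (everything here, including modeling the weak teacher's errors as independent noise, is an illustrative assumption, not the paper's construction): a weak teacher is only moderately accurate where the easy pattern is present, and a linear student trained purely on its pseudolabels picks up the hard feature from the overlap points, ending up more accurate than its teacher.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, rng):
    """Toy data: each point carries the easy pattern, the hard pattern, or
    both (the overlap). Informative features equal the label plus noise."""
    y = rng.choice([-1.0, 1.0], n)
    kind = rng.choice(["easy", "hard", "overlap"], n)
    has_easy = kind != "hard"
    has_hard = kind != "easy"
    x_easy = np.where(has_easy, y, 0.0) + rng.normal(0, 1.0, n)   # weak signal
    x_hard = np.where(has_hard, y, 0.0) + rng.normal(0, 0.1, n)   # strong signal
    return np.c_[x_easy, x_hard], y, has_easy

def weak_teacher(y, has_easy, rng, acc=0.84):
    """Simulated weak teacher: roughly `acc` accurate wherever the easy
    pattern is present, a coin flip elsewhere."""
    flip = np.where(rng.random(y.size) < acc, 1.0, -1.0)
    coin = rng.choice([-1.0, 1.0], y.size)
    return np.where(has_easy, y * flip, coin)

X_pool, y_pool, easy_pool = make_data(3000, rng)   # unlabeled pool
X_test, y_test, easy_test = make_data(3000, rng)

pseudo = weak_teacher(y_pool, easy_pool, rng)        # teacher labels the pool
w = np.linalg.lstsq(X_pool, pseudo, rcond=None)[0]   # strong student (least squares)

teacher_acc = (weak_teacher(y_test, easy_test, rng) == y_test).mean()
student_acc = (np.sign(X_test @ w) == y_test).mean()
```

Because the overlap points pair correct pseudolabels with an informative hard feature, the student's fitted weights put mass on that feature and it beats the teacher on hard-only points.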
3/ The amount of overlap, i.e., the proportion of points containing both the easy and hard patterns (the overlap density), is the core quantity determining how much weak-to-strong generalization we get.
05.02.2025 18:34
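In other words, given indicator flags for which patterns each point carries, the overlap density is a simple proportion (the points below are a hypothetical toy example):

```python
# Each point flags whether it carries the easy and/or hard pattern (toy data).
points = [
    {"easy": True,  "hard": True},    # overlap point
    {"easy": True,  "hard": False},
    {"easy": False, "hard": True},
    {"easy": True,  "hard": True},    # overlap point
]
overlap_density = sum(p["easy"] and p["hard"] for p in points) / len(points)
print(overlap_density)  # 0.5
```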
2/ The intuition is simple: generalization tracks the data points containing both "easy" patterns (learnable by a weak model) and "challenging" patterns (learnable only by a stronger model): on such points, weak predictions supply the signal a stronger model needs to learn the challenging patterns.
05.02.2025 18:25
1/ Weak-to-strong generalization, where a strong student model surpasses its weaker teacher model, is crucial for achieving 'superintelligence'. We propose a mechanism explaining when and why this happens.
05.02.2025 18:22
What enables a strong model to surpass its weaker teacher?
Excited to share our ICLR 2025 paper: "Weak-to-Strong Generalization Through the Data-Centric Lens"!
05.02.2025 18:22
First up at #NeurIPS2024 from our group, our work on labeling via programmatic distillation (a spotlight!). Label your data orders of magnitude faster and cheaper. Come join us today at Poster Session 2 East for a demo!
11.12.2024 23:15