
@changho.bsky.social

Ph.D student at @WisconsinCS @UWMadison

18 Followers  |  20 Following  |  12 Posts  |  Joined: 07.12.2024

Posts by (@changho.bsky.social)

Tons of model weights are available, but what else can we do with them besides prediction? πŸ€” Introducing Grad-Mimic! A new data selection framework that uses a well-trained model's weights to find high-value samples for foundation models. Boost data curation & data efficiency!

09.02.2025 21:07 β€” πŸ‘ 3    πŸ” 3    πŸ’¬ 1    πŸ“Œ 0
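The core idea, scoring samples by how well their gradients "mimic" a step toward a well-trained model's weights, can be sketched in a toy linear setting. Everything below is a hypothetical illustration, not Grad-Mimic's actual implementation: the squared-error loss, the reference weights `w_ref`, and the alignment score are my assumptions.

```python
import numpy as np

# Toy sketch of gradient-based data selection (my assumptions, not the
# framework's exact method): score each sample by how well a step along its
# negative gradient moves the current weights toward a well-trained
# reference model's weights.
rng = np.random.default_rng(0)
d, n = 5, 200
w_ref = rng.normal(size=d)          # weights of a well-trained reference model
w = np.zeros(d)                     # current weights, early in training

X = rng.normal(size=(n, d))
y = X @ w_ref                       # clean labels consistent with w_ref
y[n // 2:] = -y[n // 2:]            # second half: corrupted (flipped) labels

def per_sample_grad(w, x, yi):
    """Gradient of the squared error (x @ w - yi)**2 for one sample."""
    return 2.0 * (x @ w - yi) * x

target_dir = w_ref - w              # direction toward the reference weights
scores = np.array([
    -per_sample_grad(w, X[i], y[i]) @ target_dir  # alignment with that direction
    for i in range(n)
])
keep = np.argsort(scores)[::-1][: n // 4]   # keep the best-aligned quarter
# In this construction, corrupted samples get non-positive scores, so the
# selected quarter lands entirely in the clean half.
```

Here "high-value" samples are those whose gradients point the training trajectory toward the reference weights; label-flipped samples point away and are filtered out.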
Weak-to-Strong Generalization Through the Data-Centric Lens The weak-to-strong generalization phenomenon is the driver for important machine learning applications including highly data-efficient learning and, most recently, performing superalignment. While dec...

11/ We'd love to hear your thoughts and dive deeper into the discussion! πŸš€
See you in Singapore at #ICLR2025!

Big thanks to my advisor
@fredsala.bsky.social for his guidance and to John for his contributions!

Paper: arxiv.org/abs/2412.03881
GitHub: github.com/SprocketLab/...

05.02.2025 18:41 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

10/ By leveraging the strengths of simpler models and their understanding of easy patterns, stronger models can iteratively build upon this foundation to tackle increasingly complex challenges. We see this process as a systematic and scalable path toward achieving superintelligence.

05.02.2025 18:40 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

9/ Looking ahead, we're excited to explore data-centric mechanisms for weak-to-strong generalization. Just as scholars refine theoriesβ€”building on past insights to deepen understanding and create new conceptsβ€”we believe weak-to-strong generalization follows a similar trajectory.

05.02.2025 18:40 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

8/ Key takeaways for practitioners:
- Instead of just improving algorithms, focus on selecting the right data!
- Prioritizing high-overlap data sources yields better generalization.

05.02.2025 18:39 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Post image

7/ How can we maximize overlap density when choosing data sources?

We frame data selection as a bandit problem, using UCB to balance exploration and exploitation across datasets. This strategically identifies and prioritizes sources with high overlap density, maximizing generalization.

05.02.2025 18:39 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
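For intuition, here is a minimal UCB1 sketch over made-up data sources, where pulling an arm means sampling a batch from that source and observing a noisy estimate of its overlap density. The three densities and the Gaussian noise model are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical setup: three candidate data sources with different (hidden)
# overlap densities; pulling an arm = sampling a batch from that source and
# estimating its overlap density, observed with noise.
true_density = np.array([0.2, 0.5, 0.8])

def sample_batch_density(source):
    return np.clip(true_density[source] + rng.normal(0.0, 0.1), 0.0, 1.0)

n_arms, rounds = len(true_density), 500
counts = np.zeros(n_arms)
means = np.zeros(n_arms)

for t in range(rounds):
    if t < n_arms:                   # play each source once to initialize
        arm = t
    else:                            # UCB1: empirical mean + exploration bonus
        ucb = means + np.sqrt(2.0 * np.log(t + 1) / counts)
        arm = int(np.argmax(ucb))
    r = sample_batch_density(arm)
    counts[arm] += 1
    means[arm] += (r - means[arm]) / counts[arm]   # running-mean update

best = int(np.argmax(counts))        # the high-overlap source dominates the pulls
```

The exploration bonus shrinks as a source is sampled more, so low-overlap sources are checked occasionally but the budget concentrates on the highest-density source.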
Post image

6/ To tackle this, we propose an overlap detection algorithm that uncovers these points in real-world datasets and helps explain both the presence and absence of weak-to-strong generalization.

05.02.2025 18:38 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

5/ However, identifying overlap density in real-world datasets is challenging. Overlapping points are latent!

05.02.2025 18:38 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

4/ πŸ”‘ Key insights:
- Weak models can produce accurate pseudolabels based on easy patterns.
- Strong models leverage these labels to generalize on hard patterns.
- More overlap β†’ better generalization.

05.02.2025 18:38 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
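These insights can be seen in a toy numpy simulation (my construction, not the paper's experiments): the label is determined by a "hard" feature, an "easy" feature agrees with the label only on overlap points, the weak teacher reads only the easy feature, and the strong student fits the hard feature to the teacher's pseudolabels.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
overlap = 0.8                      # fraction of points carrying BOTH patterns

h = rng.normal(size=n)             # "hard" feature: truly determines the label
y = (h > 0).astype(int)            # ground-truth labels
has_easy = rng.random(n) < overlap
# Easy feature agrees with the label only on overlap points; random elsewhere.
e = np.where(has_easy, y, rng.integers(0, 2, size=n))

# Weak teacher: can only read the easy pattern, so its pseudolabels
# are correct exactly on overlap points (plus lucky coin flips).
pseudo = e

# Strong student: fits a threshold on h against the noisy pseudolabels.
# With majority-correct pseudolabels on each side of 0, it recovers h > 0.
thresholds = np.linspace(-2.0, 2.0, 201)
accs = [((h > t).astype(int) == pseudo).mean() for t in thresholds]
t_star = thresholds[int(np.argmax(accs))]
student_pred = (h > t_star).astype(int)

weak_acc = (pseudo == y).mean()            # ~ overlap + (1 - overlap) / 2
strong_acc = (student_pred == y).mean()    # student surpasses its teacher
```

With 80% overlap the teacher is right about 90% of the time, yet the student, which can express the hard pattern, fits through the pseudolabel noise and ends up nearly perfect; shrinking `overlap` toward 0.5 makes the generalization gain disappear.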

3/ The amount of overlap, i.e., the proportion of points containing both the easy and hard patterns, is the core quantity determining how much weak-to-strong generalization we get.

05.02.2025 18:34 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

2/ The intuition is simple: generalization tracks the data points containing both β€œeasy” patterns (learnable by a weak model) and β€œchallenging” patterns (learnable only by a stronger model). On such points, the weak model's predictions create the signal a stronger model needs to learn the challenging patterns.

05.02.2025 18:25 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

1/ Weak-to-strong generalization, where a strong student model surpasses its weaker teacher model, is crucial for achieving 'superintelligence'. We propose a mechanism explaining when and why this happens.

05.02.2025 18:22 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Post image

What enables a strong model to surpass its weaker teacher?

πŸš€ Excited to share our ICLR 2025 paper: "Weak-to-Strong Generalization Through the Data-Centric Lens"! 🧡

05.02.2025 18:22 β€” πŸ‘ 4    πŸ” 2    πŸ’¬ 1    πŸ“Œ 0
Post image

First up at #NeurIPS2024 from our group, our work on labeling via programmatic distillation (a spotlight!). Label your data orders of magnitude faster and cheaper β€” come join us today at Poster Session 2 East for a demo!

11.12.2024 23:15 β€” πŸ‘ 15    πŸ” 8    πŸ’¬ 0    πŸ“Œ 0