
Somin W

@sominw.bsky.social

cs phd @ northeastern. opinions on new england & beyond..

32 Followers  |  71 Following  |  10 Posts  |  Joined: 20.11.2024

Latest posts by sominw.bsky.social on Bluesky

Preview
Circuit Distillation Model distillation typically focuses on behavioral mimicry, where a student model is trained to replicate a teacher's output while treating its internal computations as a black box. In this work we pr...

πŸ“‹ Why it matters: interpretable, controllable compression students that learn how the teacher thinks. We also see faster, cleaner training dynamics compared to baselines. Preprint + details: arxiv.org/abs/2509.25002 (4/4)

30.09.2025 23:32 — 👍 1    🔁 0    💬 0    📌 0
Post image

πŸ“˜ How it works (high level): identify the teacher’s task circuit --> find functionally analogous student components via ablation --> align their internals during training. Outcome: the student learns the same computation, not just the outputs. (3/4)

30.09.2025 23:32 — 👍 0    🔁 0    💬 1    📌 0
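
Not the paper's implementation, just a minimal sketch of what "align their internals during training" could look like: given a teacher-to-student head mapping found via ablation, add an activation-matching term on top of the usual distillation loss. The mapping, the learned projection, and the MSE objective below are illustrative assumptions.

```python
# Sketch only: align activations of mapped teacher/student attention heads.
# The head mapping, projection, and MSE objective are assumptions, not the
# paper's exact recipe.
import torch.nn.functional as F

def circuit_alignment_loss(teacher_head_acts, student_head_acts, head_map, proj):
    """teacher_head_acts / student_head_acts: dicts of (layer, head) -> [batch, seq, d] tensors.
    head_map: pairs ((t_layer, t_head), (s_layer, s_head)) found via ablation.
    proj: e.g. an nn.Linear mapping student head dim to teacher head dim (hypothetical)."""
    loss = 0.0
    for t_key, s_key in head_map:
        t_act = teacher_head_acts[t_key].detach()   # teacher stays frozen
        s_act = proj(student_head_acts[s_key])      # match dimensionality
        loss = loss + F.mse_loss(s_act, t_act)
    return loss / max(len(head_map), 1)

# Training objective (sketch): total = kd_loss + lambda_align * circuit_alignment_loss(...)
```
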
Post image

🍎 On Entity Tracking and Theory of Mind, a student that updates only ~11–15% of attention heads inherits the teacher's capability and closes much of the gap -- targeted transfer beats brute-force fine-tuning. (2/4)

30.09.2025 23:32 — 👍 0    🔁 0    💬 1    📌 0
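
A rough illustration of restricting updates to a small subset of attention heads (the post's ~11–15%): mask gradients so only the chosen heads' slice of a layer's output projection is updated. This assumes concatenated head outputs feed that projection, as in common transformer implementations; the layer/head indices and attribute names in the usage comment are hypothetical.

```python
# Sketch: train only a chosen subset of attention heads via gradient masking.
# Assumes each head owns a contiguous block of the output projection's columns.
import torch

def restrict_to_heads(out_proj_weight, selected_heads, head_dim):
    """Zero gradients for every column block except the selected heads'."""
    mask = torch.zeros_like(out_proj_weight)
    for h in selected_heads:
        mask[:, h * head_dim:(h + 1) * head_dim] = 1.0
    out_proj_weight.register_hook(lambda grad: grad * mask)

# Usage (names hypothetical): freeze everything, then re-enable a few heads, e.g.
#   for p in student.parameters():
#       p.requires_grad_(False)
#   w = student.layers[5].attn.out_proj.weight
#   w.requires_grad_(True)
#   restrict_to_heads(w, selected_heads={2, 7}, head_dim=64)
```
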
Post image

πŸ”Š New work w/ @silvioamir.bsky.social & @byron.bsky.social! We show you can distill a model’s mechanism, not just its answers -- teaching a small LM to run it's circuit same as a larger teacher model. We call it Circuit Distillation. (1/4)

30.09.2025 23:32 — 👍 3    🔁 0    💬 1    📌 1
Preview
Who Taught You That? Tracing Teachers in Model Distillation Model distillation -- using outputs from a large teacher model to teach a small student model -- is a practical means of creating efficient models for a particular task. We ask: Can we identify a stud...

5️⃣ Attribution techniques could help trace distillation practices, ensuring compliance with model usage policies & improving transparency in AI systems. [6/6]

πŸ”— Dive into the full details: arxiv.org/abs/2502.06659

11.02.2025 17:16 — 👍 0    🔁 0    💬 0    📌 0

4️⃣ Our analysis spans summarization, question answering, and instruction following, using models like Llama, Mistral, and Gemma as teachers. Across tasks, PoS templates consistently outperformed n-grams in distinguishing teachers πŸ“Š [5/6]

11.02.2025 17:16 — 👍 0    🔁 0    💬 1    📌 0
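
To make the PoS-template fingerprinting concrete, here is an illustrative sketch: tag each output, count PoS-tag n-grams ("templates"), and attribute a student to the candidate teacher with the closest template profile. The template length and cosine similarity below are assumptions, not necessarily the paper's exact setup.

```python
# Sketch: attribute a student to a teacher by comparing part-of-speech template
# (PoS n-gram) profiles. Needs NLTK's "punkt" and "averaged_perceptron_tagger" data.
from collections import Counter
import math
import nltk

def pos_templates(texts, n=4):
    """Count PoS-tag n-grams over a collection of generated texts."""
    counts = Counter()
    for text in texts:
        tags = [tag for _, tag in nltk.pos_tag(nltk.word_tokenize(text))]
        counts.update(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))
    return counts

def cosine(a, b):
    keys = set(a) | set(b)
    dot = sum(a[k] * b[k] for k in keys)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm + 1e-9)

def attribute(student_outputs, teacher_outputs_by_name):
    """Pick the candidate teacher whose PoS-template profile is closest to the student's."""
    student_profile = pos_templates(student_outputs)
    scores = {name: cosine(student_profile, pos_templates(outs))
              for name, outs in teacher_outputs_by_name.items()}
    return max(scores, key=scores.get)
```
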
Post image

3️⃣ But here’s the twist: Syntactic patterns (like Part-of-Speech templates) do retain strong teacher signals! Students unconsciously mimic structural patterns from their teacher, leaving behind an identifiable trace 🧩 [4/6]

11.02.2025 17:16 — 👍 0    🔁 0    💬 1    📌 0
Post image Post image

2️⃣ Simple similarity metrics like BERTScore fail to attribute a student to its teacher. Even perplexity under the teacher model isn’t enough to reliably identify the original teacher. Shallow lexical overlap is just not a strong fingerprint πŸ” [3/6]

11.02.2025 17:16 — 👍 0    🔁 0    💬 1    📌 0
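
For contrast, a sketch of the perplexity baseline the thread reports as unreliable: score the student's output under each candidate teacher and pick the lowest-perplexity one. The candidate model names would be placeholders, and the Hugging Face loading code is just one way to run it.

```python
# Sketch of the (weak) perplexity baseline: lower perplexity under a candidate
# teacher is taken as evidence of attribution. Candidate names are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@torch.no_grad()
def perplexity(model, tokenizer, text):
    ids = tokenizer(text, return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss        # mean per-token cross-entropy
    return torch.exp(loss).item()

def attribute_by_perplexity(student_output, candidate_names):
    scores = {}
    for name in candidate_names:
        tok = AutoTokenizer.from_pretrained(name)
        model = AutoModelForCausalLM.from_pretrained(name).eval()
        scores[name] = perplexity(model, tok, student_output)
    return min(scores, key=scores.get)        # lowest perplexity "wins" (unreliably)
```
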
Post image

1️⃣ Model distillation transfers knowledge from a large teacher model to a smaller student model. But does the fine-tuned student reveal clues in its outputs about its origins? [2/6]

11.02.2025 17:16 — 👍 0    🔁 0    💬 1    📌 0
Preview
Who Taught You That? Tracing Teachers in Model Distillation Model distillation -- using outputs from a large teacher model to teach a small student model -- is a practical means of creating efficient models for a particular task. We ask: Can we identify a stud...

πŸ“’ Can we trace a small distilled model back to its teacher? πŸ€”New work (w/ @chantalsh.bsky.social, @silvioamir.bsky.social & @byron.bsky.social) finds some footprints left by LLMs in distillation! [1/6]

πŸ”— Full paper: arxiv.org/abs/2502.06659

11.02.2025 17:16 — 👍 7    🔁 2    💬 1    📌 0
