23.02.2026 23:32 · In this amazing multidisciplinary collaboration, we report our early experience with the @openclaw-x.bsky.social ->
30.09.2025 23:32 · New work w/ @silvioamir.bsky.social & @byron.bsky.social! We show you can distill a model's mechanism, not just its answers -- teaching a small LM to run its circuit the same way a larger teacher model does. We call it Circuit Distillation. (1/4)
On Entity Tracking and Theory of Mind, a student that updates only ~11-15% of attention heads inherits the teacher's capability and closes much of the gap: targeted transfer beats brute-force fine-tuning. (2/4)
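A minimal PyTorch sketch of what "updating only a subset of attention heads" can look like; the module layout (model.layers[i].attn.q_proj etc.) and the head indices are hypothetical placeholders, and in the paper the heads come from ablation rather than being hand-picked:

import torch

def restrict_to_heads(model, layer_heads, n_heads, head_dim):
    # Freeze every parameter, then re-enable gradients only on the q/k/v
    # projection rows that belong to the selected heads.
    for p in model.parameters():
        p.requires_grad = False
    for layer_idx, heads in layer_heads.items():
        attn = model.layers[layer_idx].attn      # assumed module layout
        mask = torch.zeros(n_heads * head_dim)   # 1.0 on rows of chosen heads
        for h in heads:
            mask[h * head_dim:(h + 1) * head_dim] = 1.0
        for proj in (attn.q_proj, attn.k_proj, attn.v_proj):
            proj.weight.requires_grad = True
            # Zero gradients for rows of non-selected heads, so the
            # optimizer only ever updates the chosen heads.
            proj.weight.register_hook(lambda g, m=mask: g * m[:, None])

# e.g. restrict_to_heads(model, {3: [0, 5], 7: [2]}, n_heads=12, head_dim=64)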
How it works (high level): identify the teacher's task circuit --> find functionally analogous student components via ablation --> align their internals during training. Outcome: the student learns the same computation, not just the outputs. (3/4)
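A hedged sketch of the third step (aligning internals), assuming a teacher-to-student component mapping is already in hand from the ablation step; the module names, the learned projections in proj, and the MSE objective are illustrative choices, not the paper's exact recipe:

import torch
import torch.nn.functional as F

def capture(model, names):
    # Record forward outputs of the named modules (assumed to return tensors).
    acts, handles = {}, []
    for name, mod in model.named_modules():
        if name in names:
            handles.append(mod.register_forward_hook(
                lambda m, inp, out, n=name: acts.__setitem__(n, out)))
    return acts, handles

def circuit_distill_loss(teacher, student, batch, pairs, proj, alpha=1.0):
    # pairs: [(teacher_module_name, student_module_name)], from ablation.
    # proj: per-pair learned linear maps from student width to teacher width.
    t_acts, t_handles = capture(teacher, {t for t, _ in pairs})
    s_acts, s_handles = capture(student, {s for _, s in pairs})
    with torch.no_grad():
        teacher(batch["input_ids"])              # fills t_acts
    logits = student(batch["input_ids"]).logits  # fills s_acts
    task = F.cross_entropy(logits.transpose(1, 2), batch["labels"])
    align = sum(F.mse_loss(proj[s](s_acts[s]), t_acts[t]) for t, s in pairs)
    for h in t_handles + s_handles:
        h.remove()
    return task + alpha * align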
Why it matters: interpretable, controllable compression -- students that learn how the teacher thinks. We also see faster, cleaner training dynamics compared to baselines. Preprint + details: arxiv.org/abs/2509.25002 (4/4)
11.02.2025 17:16 · Can we trace a small distilled model back to its teacher? New work (w/ @chantalsh.bsky.social, @silvioamir.bsky.social & @byron.bsky.social) finds some footprints left by LLMs in distillation! [1/6]
Full paper: arxiv.org/abs/2502.06659
Model distillation transfers knowledge from a large teacher model to a smaller student model. But does the fine-tuned student reveal clues in its outputs about its origins? [2/6]
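For concreteness, a minimal sketch of the distillation setting studied here, where the student only ever sees teacher-generated text; the model name and generation settings are placeholders:

from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder teacher
tok = AutoTokenizer.from_pretrained(name)
teacher = AutoModelForCausalLM.from_pretrained(name)

def teacher_label(prompt: str) -> str:
    # Generate the teacher's answer; the student is later fine-tuned on
    # these (prompt, answer) pairs and never sees the teacher's weights.
    ids = tok(prompt, return_tensors="pt").input_ids
    out = teacher.generate(ids, max_new_tokens=128)
    return tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True)

# distill_data = [(p, teacher_label(p)) for p in prompts]
# ...then fine-tune the small student on distill_data as usual.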
Simple similarity metrics like BERTScore fail to attribute a student to its teacher. Even perplexity under the teacher model isn't enough to reliably identify the original teacher. Shallow lexical overlap is just not a strong fingerprint. [3/6]
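A sketch of that perplexity baseline: score the student's outputs under each candidate teacher and attribute to the lowest-scoring one (the candidate model names here are placeholders). The finding above is that this simple recipe is often wrong:

import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tok, text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean next-token NLL
    return math.exp(loss.item())

def attribute_by_ppl(student_outputs, candidate_teachers):
    # Score every student output under each candidate; attribute the
    # student to the teacher with the lowest total perplexity.
    scores = {}
    for name in candidate_teachers:
        tok = AutoTokenizer.from_pretrained(name)
        model = AutoModelForCausalLM.from_pretrained(name)
        scores[name] = sum(perplexity(model, tok, t) for t in student_outputs)
    return min(scores, key=scores.get)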
But here's the twist: syntactic patterns (like part-of-speech templates) do retain strong teacher signals! Students unconsciously mimic structural patterns from their teacher, leaving behind an identifiable trace. [4/6]
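A rough sketch of a PoS-template fingerprint in this spirit: represent each model by its distribution over part-of-speech n-grams and attribute the student to the closest candidate; the n-gram length and cosine comparison are illustrative choices, not necessarily the paper's exact setup:

from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

def pos_templates(texts, n: int = 4) -> Counter:
    # Map each text to coarse part-of-speech tags and count tag n-grams.
    counts = Counter()
    for doc in nlp.pipe(texts):
        tags = [t.pos_ for t in doc]
        counts.update(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))
    return counts

def cosine(a: Counter, b: Counter) -> float:
    keys = set(a) | set(b)
    dot = sum(a[k] * b[k] for k in keys)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb + 1e-9)

# Attribute the student to the teacher whose template distribution is closest:
# best = max(teachers, key=lambda t: cosine(pos_templates(student_texts),
#                                           pos_templates(teacher_texts[t])))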
Our analysis spans summarization, question answering, and instruction following, using models like Llama, Mistral, and Gemma as teachers. Across tasks, PoS templates consistently outperformed n-grams in distinguishing teachers. [5/6]
Attribution techniques could help trace distillation practices, ensuring compliance with model usage policies & improving transparency in AI systems. [6/6]
Dive into the full details: arxiv.org/abs/2502.06659