🙏 Many thanks to my supervisor @djsutherland.ml and the reviewers for their thoughtful suggestions and feedback.
📍 Poster: Hall 3+2B #376 on Fri, Apr 25 at 15:00
🎤 Oral in Session 6A on Sat, Apr 26 at 16:30
📰 https://arxiv.org/pdf/2407.10490
(12/12)
This wraps up the main story of our paper.
But there's more coming…
🧠 Many RL + LLM methods (like GRPO) also involve negative gradients.
🎯 And a token-level AKG decomposition is even more suitable for real-world LLMs.
Please stay tuned.
(11/12)
With this setup, we can now explain some strange behaviors in DPO, like why the model's confidence in both the chosen and rejected answers drops after long training. 📉📉
Just apply force analysis and remember: the smaller p(y-), the stronger the squeezing effect.
(10/12)
Just like this!!!
(9/12)
We formally show that, as long as you're using a softmax to produce probabilistic predictions, the squeezing effect is inevitable. And it gets stronger when p(y-) is smaller: the less likely an answer is (especially in off-policy settings), the harder all dimensions get squeezed.
(8/12)
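A tiny numerical sketch of that direction (just the mechanics of one negative-gradient step on a softmax, not the paper's formal statement; the 5-way toy and the learning rate below are made up for illustration):

```python
# Push down an already-unlikely y_minus on a softmax over 5 "answers"
# and watch where the probability mass goes.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([3.0, 1.0, 0.0, -1.0, -4.0])   # index 0 = current favourite
y_plus, y_minus, lr = 1, 4, 5.0                  # y_minus is already unlikely

p = softmax(logits)
# Gradient of log p(y_minus) wrt the logits is onehot(y_minus) - p; a descent
# step on it lowers logit[y_minus] and adds lr * p_i to every other logit.
grad = -p.copy()
grad[y_minus] += 1.0
new_p = softmax(logits - lr * grad)

print("p before:", np.round(p, 4))       # roughly [0.83, 0.11, 0.04, 0.02, 0.00]
print("p after: ", np.round(new_p, 4))   # almost all mass piles onto index 0
print("p(y_plus):", round(p[y_plus], 4), "->", round(new_p[y_plus], 4))
# p(y_minus) drops as intended, but p(y_plus) drops too: everything that is not
# the current argmax gets squeezed toward it.
```

DPO's positive gradient on y+ pushes back against this, but since the squeezing strengthens as p(y-) shrinks, the confidence in both the chosen and the rejected answers can end up falling, which is the behavior described in post 10/12.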
Now let's switch gears to DPO, a more complex algorithm than SFT (as its AKG decomposition shows). But from a force-analysis perspective, the story is surprisingly similar.
The key difference? DPO introduces a negative gradient term, and that's where the twist comes in.
(7/12)
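For reference, the DPO gradient (Rafailov et al., 2023) makes the two forces explicit: a pull up on the chosen y+ and a push down, the negative gradient, on the rejected y-, weighted by how badly the implicit reward currently ranks the pair:

```latex
\nabla_\theta \mathcal{L}_{\mathrm{DPO}}
  = -\beta\,\mathbb{E}_{(x,\,y^{+},\,y^{-})}\Big[
      \sigma\!\big(\hat r_\theta(x, y^{-}) - \hat r_\theta(x, y^{+})\big)
      \big(\underbrace{\nabla_\theta \log \pi_\theta(y^{+}\mid x)}_{\text{pull up } y^{+}}
         - \underbrace{\nabla_\theta \log \pi_\theta(y^{-}\mid x)}_{\text{push down } y^{-}}\big)
    \Big],
\qquad
\hat r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
```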
It also offers a possible explanation for a specific hallucination pattern in SFT:
After fine-tuning on (Q2, A2), the model reuses facts or phrases from A2 when answering an unrelated question Q1.
Why does this happen?
Just do a force analysis and the answer emerges naturally. 💡
(6/12)
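A hedged sketch of that force analysis (same illustrative A/K/G notation as in 4/12): an SFT update on (Q2, A2) also pulls up A2 under a different question Q1 whenever the two inputs look similar to the model,

```latex
\Delta \log \pi^t\!\left(A_2 \mid Q_1\right)
  \;\approx\; -\eta\, A^t\,
     \mathcal{K}^t\!\big([Q_1; A_2],\,[Q_2; A_2]\big)\, G^t(Q_2, A_2),
```

and when that similarity term is (loosely speaking) positive, the pull is upward: A2's facts and phrases start leaking into answers for Q1.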
Time to see how learning dynamics explains those weird behaviors. We observe a consistent trend: similar responses often rise in confidence, then fall.
📈📉 This aligns well with the force-analysis perspective. (More supporting experiments in the paper.)
(5/12)
Now let's analyze SFT!
The change in the model's prediction can be decomposed (AKG-style). The input is a concatenation: [x; y]. This lets us ask questions like: "How does the model's confidence in 'y-' change if we fine-tune on 'y+'?"
(4/12)
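Schematically (illustrative notation; the paper defines the A, K, and G terms precisely), one SFT update on a training pair (x_u, y_u+) moves the model's log-probability of any other pair (x_o, y_o) as:

```latex
\Delta \log \pi^t(y_o \mid x_o)
  \;\approx\; -\eta\;
  \underbrace{A^t(x_o, y_o)}_{\text{softmax geometry at } [x_o;\, y_o]}\;
  \underbrace{\mathcal{K}^t\big([x_o; y_o],\,[x_u; y_u^{+}]\big)}_{\text{eNTK-style similarity}}\;
  \underbrace{G^t(x_u, y_u^{+})}_{\text{``force'' from the loss}}
  \;+\;\mathcal{O}(\eta^2)
```

Plugging in (x_o, y_o) = (x, y-) while updating on (x, y+) is exactly the question above.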
This toy example on MNIST helps you understand how it works: since 4 and 9 look similar from the model's perspective, learning a 4 also increases p(y=4 | 9). (More detailed discussion of simple classification tasks can be found here: arxiv.org/pdf/2203.02485)
(3/12)
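A minimal runnable sketch of that mechanism (not the experiment from either paper; synthetic vectors stand in for a '4' and a '9', and the model is a plain linear softmax classifier): one gradient step on the 4 also raises p(y=4 | the 9-like input), simply because the two inputs look alike.

```python
# Linear softmax classifier; x4 and x9 share most of their features, so an
# update on (x4, label 4) also drags the prediction at x9 toward class 4.
import numpy as np

rng = np.random.default_rng(0)
num_classes, dim, lr = 10, 64, 0.01

shared = rng.normal(size=dim)                  # features a 4 and a 9 share
x4 = shared + 0.3 * rng.normal(size=dim)       # "a 4"
x9 = shared + 0.3 * rng.normal(size=dim)       # "a 9", similar to the 4
W = 0.01 * rng.normal(size=(num_classes, dim))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

print("p(y=4 | the 9) before:", softmax(W @ x9)[4])

# One SGD step of cross-entropy on (x4, label 4):
# dL/dW = (softmax(W x4) - onehot(4)) outer x4
p = softmax(W @ x4)
p[4] -= 1.0
W -= lr * np.outer(p, x4)

print("p(y=4 | the 9) after: ", softmax(W @ x9)[4])   # goes up
```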
Instead of focusing on the global optimum, learning dynamics analyzes how the model behaves during training, one update at a time.
🧠 Think of the model's prediction as an object and each gradient update as a force acting on it.
(2/12)
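For the mathematically inclined, the force picture comes from a first-order expansion of a single SGD step. A rough sketch (illustrative notation, not the paper's exact statement):

```latex
% One SGD step: \theta^{t+1} = \theta^t - \eta \nabla_\theta \mathcal{L}(x_u).
% Its first-order effect on the model's output (logits) at another input x_o:
\Delta f^t(x_o)
  \;\approx\; -\eta\;
  \underbrace{\nabla_\theta f_{\theta^t}(x_o)\,\nabla_\theta f_{\theta^t}(x_u)^{\top}}_{\text{eNTK-style similarity between } x_o \text{ and } x_u}\;
  \underbrace{\nabla_{f}\,\mathcal{L}\big(f_{\theta^t}(x_u)\big)}_{\text{``force'' exerted by the loss at } x_u}
  \;+\;\mathcal{O}(\eta^2)
```

The similarity term decides which predictions feel the force; the loss term sets its direction and strength.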
You might have seen some strange behaviors when fine-tuning LLMs.
🧩 Prior work offers great insights, but we take a different angle: we dive into the dynamics behind these changes, step by step, like force analysis in physics.
(1/12)
📢 Curious why your LLM behaves strangely after long SFT or DPO?
We offer a fresh perspective: consider doing a "force analysis" on your model's behavior.
Check out our #ICLR2025 Oral paper:
Learning Dynamics of LLM Finetuning!
(0/12)