We push further with reinforcement learning 🚀
When fine-tuned with GRPO, the backtracking model shines: it discovers new, efficient strategies. 🌟
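For readers unfamiliar with GRPO: its core trick is scoring each sampled completion relative to its own group of samples. A minimal sketch of the group-relative advantage (simplified; real GRPO also involves clipping and a KL penalty, and this helper is hypothetical, not the paper's code):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages, GRPO-style: normalize each sampled
    completion's reward by the mean and std of its sampling group.
    Simplified sketch; omits clipping, KL penalty, token-level credit."""
    mu = statistics.fmean(rewards)
    sd = statistics.pstdev(rewards) or 1.0  # avoid divide-by-zero for uniform groups
    return [(r - mu) / sd for r in rewards]
```

Completions with above-average reward in their group get positive advantage and are reinforced; the rest are pushed down.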
The no-backtracking model?
✅ Great at low compute (pass@1)
❌ But it loses the ability to generate diverse solutions, hurting pass@k performance.
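Quick refresher on the metrics here: pass@1 measures one-shot accuracy, while pass@k credits any correct answer among k samples, so it rewards diversity. A minimal sketch of the standard unbiased estimator, computed from n generations of which c are correct:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples, drawn without replacement from n generations with c
    correct, is correct. Returns 1.0 when failures can't fill k slots."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

A model with low diversity can have strong pass@1 but a flat pass@k curve, which is exactly the trade-off described above.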
11.04.2025 16:29 — 👍 1 🔁 0 💬 1 📌 0
Can we fix backtracking on CountDown by tackling these 2 issues? 🔧 We try two variations:
🔀 Mix-backtracking: trained on more diverse search traces
🧠 Think-backtracking: skips steps to encourage implicit reasoning
Both help! But with enough compute, the direct solution model still wins.
11.04.2025 16:29 — 👍 2 🔁 0 💬 1 📌 0
2️⃣ Backtracking makes models verbose—often at the expense of “actual” reasoning 💬
Instead of thinking internally without outputting CoT, they learn to spell out every step, even when it’s unnecessary.
It talks more… 🤯📝 but thinks less, and this hurts test-time efficiency!
11.04.2025 16:29 — 👍 1 🔁 0 💬 1 📌 0
But what goes wrong when backtracking fails (e.g., in CountDown)? 🤔 We find 2 pitfalls:
1️⃣ Teaching models to search via CoT can backfire: they learn to make mistakes. On many problems, our backtracking model makes more mistakes before finding the right answer (vs. the direct solution model)!
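For context, CountDown asks the model to combine given numbers with +, −, ×, ÷ to hit a target. A toy brute-force solver (a hypothetical sketch, not the paper's code) shows the kind of backtracking search a CoT trace would spell out step by step:

```python
def countdown(nums, target):
    """Toy CountDown solver: depth-first search with backtracking.
    Repeatedly combines two numbers with an arithmetic op until one
    value equals the target, undoing choices that dead-end."""
    def search(vals):
        if len(vals) == 1:
            return abs(vals[0] - target) < 1e-9
        for i in range(len(vals)):
            for j in range(len(vals)):
                if i == j:
                    continue
                a, b = vals[i], vals[j]
                rest = [vals[k] for k in range(len(vals)) if k not in (i, j)]
                # both orders of (i, j) are tried, so a-b and a/b suffice
                for r in (a + b, a - b, a * b) + ((a / b,) if b else ()):
                    if search(rest + [r]):  # recurse; backtrack on failure
                        return True
        return False
    return search(list(nums))
```

Every dead end this search visits corresponds to a "mistake" a backtracking CoT model must write out before correcting itself.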
11.04.2025 16:29 — 👍 1 🔁 0 💬 1 📌 0
Here’s what we found:
🔢 On CountDown, the direct solution model—no self-reflection, just raw diversity—outperforms backtracking
🧮 But on Sudoku, the result flips: backtracking wins.
So backtracking isn’t universally beneficial: it depends on the nature of the reasoning required.
11.04.2025 16:29 — 👍 1 🔁 0 💬 1 📌 0
We compare backtracking (BT) to an alternative way to scale test-time compute: parallel sampling + best-of-N.
We train:
1️⃣ A backtracking model using CoT to perform search
2️⃣ A direct solution model that learns from the optimal solution
Equating test-time compute, who will win? 🤔
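The parallel-sampling baseline is simple: draw N candidates independently and let a verifier keep the best one. A minimal sketch (the `sample` and `score` callables here are hypothetical stand-ins, not the paper's code):

```python
import random

def best_of_n(sample, score, n, seed=0):
    """Best-of-N baseline for scaling test-time compute: draw n
    independent candidates and return the one the verifier scores
    highest. `sample(rng)` generates a candidate; `score(c)` ranks it."""
    rng = random.Random(seed)
    candidates = [sample(rng) for _ in range(n)]
    return max(candidates, key=score)
```

Under a matched compute budget, N parallel direct-solution samples compete against one long backtracking trace of similar token cost.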
11.04.2025 16:29 — 👍 3 🔁 0 💬 1 📌 0
In our newest work (led by the amazing
@sunnytqin.bsky.social , w/ @emalach.bsky.social, Samy Jelassi), we investigate a core question for LLMs: "𝑡𝑜 𝑏𝑎𝑐𝑘𝑡𝑟𝑎𝑐𝑘 𝑜𝑟 𝑛𝑜𝑡 𝑡𝑜 𝑏𝑎𝑐𝑘𝑡𝑟𝑎𝑐𝑘" in two prototypical logic-heavy puzzles: CountDown and Sudoku.
11.04.2025 16:29 — 👍 3 🔁 2 💬 1 📌 0
🚨 New preprint! TL;DR: Backtracking is not the "holy grail" for smarter LLMs.
It’s praised for helping models “fix mistakes” and improve reasoning—but is it really the best use of test-time compute? 🤔
11.04.2025 16:29 — 👍 8 🔁 2 💬 1 📌 0
Sometimes I am a Tree: Data Drives Unstable Hierarchical Generalization
Language models (LMs), like other neural networks, often favor shortcut heuristics based on surface-level patterns. Although LMs behave like n-gram models early in training, they must eventually learn...
Transformer LMs get pretty far by acting like ngram models, so why do they learn syntax? A new paper by sunnytqin.bsky.social, me, and @dmelis.bsky.social illuminates grammar learning in a whirlwind tour of generalization, grokking, training dynamics, memorization, and random variation. #mlsky #nlp
20.12.2024 17:55 — 👍 136 🔁 31 💬 5 📌 1