We push further with reinforcement learning 🚀
When fine-tuned with GRPO, the backtracking model shines: it discovers new, efficient strategies. 🌟
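For readers unfamiliar with GRPO: its core trick is scoring each sampled completion relative to its own group of samples. A minimal sketch of the group-relative advantage (simplified; real GRPO also involves clipping and a KL penalty, and this helper is hypothetical, not the paper's code):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages, GRPO-style: normalize each sampled
    completion's reward by the mean and std of its sampling group.
    Simplified sketch; omits clipping, KL penalty, token-level credit."""
    mu = statistics.fmean(rewards)
    sd = statistics.pstdev(rewards) or 1.0  # avoid divide-by-zero for uniform groups
    return [(r - mu) / sd for r in rewards]
```

Completions with above-average reward in their group get positive advantage and are reinforced; the rest are pushed down.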
The no-backtracking model?
✅ Great at low compute (pass@1)
❌ But it loses the ability to generate diverse solutions, hurting pass@k performance.
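Quick refresher on the metrics here: pass@1 measures one-shot accuracy, while pass@k credits any correct answer among k samples, so it rewards diversity. A minimal sketch of the standard unbiased estimator, computed from n generations of which c are correct:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples, drawn without replacement from n generations with c
    correct, is correct. Returns 1.0 when failures can't fill k slots."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

A model with low diversity can have strong pass@1 but a flat pass@k curve, which is exactly the trade-off described above.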
11.04.2025 16:29 — 👍 1 🔁 0 💬 1 📌 0
Can we fix backtracking on CountDown by tackling these 2 issues? 🔧 We try two variations:
🔀 Mix-backtracking: trained on more diverse search traces
🧠 Think-backtracking: skips steps to encourage implicit reasoning
Both help! But with enough compute, the direct solution model still wins.
11.04.2025 16:29 — 👍 2 🔁 0 💬 1 📌 0
2️⃣ Backtracking makes models verbose—often at the expense of “actual” reasoning 💬
Instead of thinking internally without outputting CoT, they learn to spell out every step, even when it’s unnecessary.
It talks more… 🤯📝 but thinks less, and this hurts test-time efficiency!
11.04.2025 16:29 — 👍 1 🔁 0 💬 1 📌 0
But what goes wrong when backtracking fails (e.g., in CountDown)? 🤔 We find 2 pitfalls:
1️⃣ Teaching models to search via CoT can backfire: they learn to make mistakes. On many problems, our backtracking model makes more mistakes before finding the right answer (vs. the direct solution model)!
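For context, CountDown asks the model to combine given numbers with +, −, ×, ÷ to hit a target. A toy brute-force solver (a hypothetical sketch, not the paper's code) shows the kind of backtracking search a CoT trace would spell out step by step:

```python
def countdown(nums, target):
    """Toy CountDown solver: depth-first search with backtracking.
    Repeatedly combines two numbers with an arithmetic op until one
    value equals the target, undoing choices that dead-end."""
    def search(vals):
        if len(vals) == 1:
            return abs(vals[0] - target) < 1e-9
        for i in range(len(vals)):
            for j in range(len(vals)):
                if i == j:
                    continue
                a, b = vals[i], vals[j]
                rest = [vals[k] for k in range(len(vals)) if k not in (i, j)]
                # both orders of (i, j) are tried, so a-b and a/b suffice
                for r in (a + b, a - b, a * b) + ((a / b,) if b else ()):
                    if search(rest + [r]):  # recurse; backtrack on failure
                        return True
        return False
    return search(list(nums))
```

Every dead end this search visits corresponds to a "mistake" a backtracking CoT model must write out before correcting itself.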
11.04.2025 16:29 — 👍 1 🔁 0 💬 1 📌 0
Here’s what we found:
🔢 On CountDown, the direct solution model—no self-reflection, just raw diversity—outperforms backtracking
🧮 But on Sudoku, the result flips: backtracking wins.
So backtracking isn’t universally beneficial: it depends on the nature of the reasoning required.
11.04.2025 16:29 — 👍 1 🔁 0 💬 1 📌 0
We compare backtracking (BT) to an alternative way to scale test-time compute: parallel sampling + best-of-N.
We train:
1️⃣ A backtracking model using CoT to perform search
2️⃣ A direct solution model that learns from the optimal solution
Equating test-time compute, who will win? 🤔
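The parallel-sampling baseline is simple: draw N candidates independently and let a verifier keep the best one. A minimal sketch (the `sample` and `score` callables here are hypothetical stand-ins, not the paper's code):

```python
import random

def best_of_n(sample, score, n, seed=0):
    """Best-of-N baseline for scaling test-time compute: draw n
    independent candidates and return the one the verifier scores
    highest. `sample(rng)` generates a candidate; `score(c)` ranks it."""
    rng = random.Random(seed)
    candidates = [sample(rng) for _ in range(n)]
    return max(candidates, key=score)
```

Under a matched compute budget, N parallel direct-solution samples compete against one long backtracking trace of similar token cost.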
11.04.2025 16:29 — 👍 3 🔁 0 💬 1 📌 0
In our newest work (led by the amazing
@sunnytqin.bsky.social , w/ @emalach.bsky.social, Samy Jelassi), we investigate a core question for LLMs: "𝑡𝑜 𝑏𝑎𝑐𝑘𝑡𝑟𝑎𝑐𝑘 𝑜𝑟 𝑛𝑜𝑡 𝑡𝑜 𝑏𝑎𝑐𝑘𝑡𝑟𝑎𝑐𝑘" in two prototypical logic-heavy puzzles: CountDown and Sudoku.
11.04.2025 16:29 — 👍 3 🔁 2 💬 1 📌 0
🚨 New preprint! TL;DR: Backtracking is not the "holy grail" for smarter LLMs.
It’s praised for helping models “fix mistakes” and improve reasoning—but is it really the best use of test-time compute? 🤔
11.04.2025 16:29 — 👍 8 🔁 2 💬 1 📌 0
Sometimes I am a Tree: Data Drives Unstable Hierarchical Generalization
Language models (LMs), like other neural networks, often favor shortcut heuristics based on surface-level patterns. Although LMs behave like n-gram models early in training, they must eventually learn...
Transformer LMs get pretty far by acting like ngram models, so why do they learn syntax? A new paper by sunnytqin.bsky.social, me, and @dmelis.bsky.social illuminates grammar learning in a whirlwind tour of generalization, grokking, training dynamics, memorization, and random variation. #mlsky #nlp
20.12.2024 17:55 — 👍 136 🔁 31 💬 5 📌 1