🚨 One model, high correctness:
With low-threshold tuning, we take Llama3-70B from:
➡️ 51% → 87% correctness
➡️ Retaining 53% of the original completeness
06.06.2025 08:21
⚖️ HALT allows you to trade off completeness and correctness
We introduce a threshold that tunes how eagerly the model should respond:
Low threshold = more reliable answers (Left box)
High threshold = more detailed answers (Right box)
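One way to picture the threshold (a rough sketch under assumed details, not the paper's exact mechanism): treat it as an eagerness level and keep a fragment only if its estimated correctness clears the corresponding bar.

```python
# Rough sketch of the tunable trade-off described above. The per-fragment
# correctness estimate and the exact form of the rule are assumptions for
# illustration, not HALT's actual mechanism.

def keep_fragment(correctness_estimate: float, eagerness: float) -> bool:
    """Keep a fragment only if its estimated correctness is at least 1 - eagerness.

    Low eagerness (low threshold)   -> only near-certain fragments survive:
                                       more reliable, less complete answers.
    High eagerness (high threshold) -> lower-confidence fragments also survive:
                                       more detailed, less reliable answers.
    """
    return correctness_estimate >= 1.0 - eagerness

# Example: a fragment with estimated correctness 0.9 is dropped at
# eagerness 0.05 (0.9 < 0.95) but kept at eagerness 0.2 (0.9 >= 0.8).
```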
06.06.2025 08:21
🛠️ Our approach: Adjust finetuning responses to match the capabilities of the LLM
1️⃣ Break pretrained LLM responses into factual fragments
2️⃣ Use ground truth to flag incorrect fragments
3️⃣ Modify finetuning responses by removing incorrect fragments or replacing them with "Unsure from here"
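A minimal sketch of that recipe, with naive stand-ins for fragmenting and ground-truth checking (sentence splitting and substring matching); this illustrates the three steps, not the paper's actual tooling.

```python
# Minimal sketch of HALT-style target construction. The helpers are naive
# stand-ins, not the paper's actual fragmenting or ground-truth checking.

def split_into_fragments(response: str) -> list[str]:
    # Naive stand-in: treat each sentence as one factual fragment.
    return [s.strip() + "." for s in response.split(".") if s.strip()]

def is_supported_by(fragment: str, ground_truth: str) -> bool:
    # Naive stand-in: a fragment counts as correct if it appears
    # verbatim in the ground-truth reference.
    return fragment.lower() in ground_truth.lower()

def build_halt_target(pretrained_response: str, ground_truth: str) -> str:
    target = []
    for fragment in split_into_fragments(pretrained_response):   # step 1
        if is_supported_by(fragment, ground_truth):               # step 2
            target.append(fragment)
        else:
            # Step 3: cut at the first unsupported fragment and have the
            # model say so instead of continuing to guess.
            target.append("Unsure from here")
            break
    return " ".join(target)

# build_halt_target("Paris is in France. It has 3 rivers.",
#                   "Paris is in France. Its major river is the Seine.")
# -> "Paris is in France. Unsure from here"
```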
06.06.2025 08:21
Standard LLMs always respond, even when unsure.
This leads to partially incorrect outputs in critical domains like Coding, Math, Medicine, and QA.
Why? Standard finetuning ignores what the pretrained model actually knows and pushes it to always complete every prompt.
06.06.2025 08:21
What if LLMs knew when to stop?
HALT finetuning teaches LLMs to only generate content they're confident is correct.
Insight: Post-training must be adjusted to the model's capabilities.
⚖️ Tunable trade-off: Higher correctness vs. more completeness
🧵
06.06.2025 08:21