Huge thanks to all my collaborators @antonxue.bsky.social, @shreyahavaldar.bsky.social, @deliprao.bsky.social, @helenjin.bsky.social, Chris Callison-Burch, @riceric22.bsky.social
LLMs often make reasoning errors, and current LLM error detection methods often fail when earlier errors corrupt downstream judgments. We introduce Autoregressive Reasoning Entailment Stability (ARES), a framework for measuring reasoning soundness with stability guarantees.
I'll be presenting our work "Probabilistic Soundness Guarantees in LLM Reasoning Chains" at EMNLP 2025!
Today (Nov 5), 14:30-16:00, Hall C, 802-Main
Blog: debugml.github.io/ares
Paper: arxiv.org/abs/2507.12948
Code: github.com/fallcat/ares
Joint work with @profericwong.bsky.social, @antonxue.bsky.social, @shreyahavaldar.bsky.social, @helenjin.bsky.social, @deliprao.bsky.social, Chris Callison-Burch, Helen Qu, Marco Gatti, Bhuvnesh Jain
See our posters at the Actionable Interpretability workshop
East Ballroom A
Probabilistic Soundness Guarantees in LLM Reasoning Chains
arxiv.org/abs/2507.12948
10:40-11:40am
Sum-of-Parts: Self-Attributing Neural Networks with End-to-End Learning of Feature Groups
debugml.github.io/sum-of-parts
1-2pm