Andreas Opedal, Yanick Zengaffinen, Haruki Shirakami, Clemente Pasti, Mrinmaya Sachan, Abulhair Saparov, Ryan Cotterell, Bernhard Schölkopf
Are Language Models Efficient Reasoners? A Perspective from Logic Programming
https://arxiv.org/abs/2510.25626
               
            
            
30.10.2025 05:25

            Francesco Ignazio Re, Andreas Opedal, Glib Manaiev, Mario Giulianelli, Ryan Cotterell: A Spatio-Temporal Point Process for Fine-Grained Modeling of Reading Behavior https://arxiv.org/abs/2506.19999 https://arxiv.org/pdf/2506.19999 https://arxiv.org/html/2506.19999
               
            
            
26.06.2025 06:32

            See the paper for more details and experiments: arxiv.org/pdf/2410.13502 
Or check out the codebase to generate your own problems: github.com/eth-lre/math...
               
            
            
14.03.2025 16:14

            All models are sensitive to a simple change in sentence ordering, where we take one sentence and move it to the beginning. We also find that the problem is easiest for LLMs if the sentence is moved from near the beginning or end, rather than from the middle!
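
A minimal sketch of this perturbation, assuming the problem is given as a list of sentences (`move_sentence_to_front` is a hypothetical helper for illustration, not the actual MathGAP code):

```python
def move_sentence_to_front(sentences: list[str], i: int) -> list[str]:
    """Return a copy of `sentences` with the i-th sentence moved to index 0."""
    return [sentences[i]] + [s for j, s in enumerate(sentences) if j != i]

problem = [
    "Alice had 3 apples.",
    "Bob gave Alice 2 apples.",
    "Then Alice gave 4 apples to Carol.",
    "How many apples does Alice have now?",
]
print(move_sentence_to_front(problem, 2))
# ['Then Alice gave 4 apples to Carol.', 'Alice had 3 apples.', ...]
```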
               
            
            
14.03.2025 16:14

OpenAI's o1 and DeepSeek-R1 are certainly impressive. However, when we permuted the ordering of the sentences, their performance dropped to 5% and 11%, respectively (with the token limit set to 25,000, as recommended by OpenAI).
               
            
            
14.03.2025 16:14

Here are the results for what we call "nonlinear" problems. Solving them requires keeping intermediate results in memory before they can be used in subsequent deduction steps. The most complex problems are pretty hard for all models, but they are still able to solve some of them!
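
To make the distinction concrete, here is an illustrative sketch (our reading of the post, not code from the paper): a proof is linear if every step uses at most one derived premise, whereas a nonlinear proof must hold several derived intermediate results and combine them later.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    conclusion: str
    premises: list["Step"] = field(default_factory=list)

def is_linear(step: Step) -> bool:
    # A step is "linear" if at most one premise is itself derived (non-leaf).
    derived = [p for p in step.premises if p.premises]
    return len(derived) <= 1 and all(is_linear(p) for p in step.premises)

def leaf(s: str) -> Step:
    return Step(s)

left = Step("Alice has 5 apples", [leaf("Alice had 3"), leaf("Bob gave her 2")])
right = Step("Carol has 4 apples", [leaf("Carol had 1"), leaf("Dan gave her 3")])
combined = Step("Together they have 9 apples", [left, right])
print(is_linear(combined))  # False: two derived results must be kept in memory
```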
               
            
            
14.03.2025 16:14

We apply MathGAP to perform a systematic analysis of whether LLMs can use simple examples in context to solve more complex ones at inference time. Generalization to proof width turns out to be harder than to proof depth, and we see a steady decrease in performance as proofs get both deeper and wider.
               
            
            
14.03.2025 16:14

            With our proof system we can generate new MWPs that adhere to the structure of proof trees, as well as ground-truth CoT traces! From the proof trees we then characterize the complexity of reasoning in several ways, e.g., depth, width, shape, and ordering of nodes (i.e., sentences).
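
As a toy version of two of these measures (a sketch under our assumptions, not the MathGAP implementation), depth can be read off as the longest axiom-to-conclusion path and width as the number of leaf premises:

```python
from dataclasses import dataclass, field

@dataclass
class ProofNode:
    statement: str
    premises: list["ProofNode"] = field(default_factory=list)

def depth(node: ProofNode) -> int:
    # Longest path from an axiom (leaf) to the final conclusion (root).
    return 1 + max((depth(p) for p in node.premises), default=0)

def width(node: ProofNode) -> int:
    # Number of leaf premises, i.e., sentences stating given facts.
    return 1 if not node.premises else sum(width(p) for p in node.premises)

step = ProofNode("Alice has 5 apples",
                 [ProofNode("Alice had 3 apples"), ProofNode("Bob gave Alice 2")])
root = ProofNode("Alice has 1 apple",
                 [step, ProofNode("Alice gave 4 to Carol")])
print(depth(root), width(root))  # 3 3
```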
               
            
            
14.03.2025 16:14

Our work builds on a simple observation: math word problems (MWPs) are deductive reasoning problems, so solving them can be thought of as applying inference rules. We can thus view solution/reasoning traces as proof trees, whose structure tells us how complex the problem is to solve.
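
A toy instance of this view (a hypothetical "transfer" rule, for illustration only): one MWP sentence type acts as an inference rule over quantity facts, so a solution trace is a chain of rule applications.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Has:
    agent: str
    quantity: int

def transfer(fact: Has, amount: int) -> Has:
    # Rule: if an agent has q items and gives `amount` away, they have q - amount.
    return Has(fact.agent, fact.quantity - amount)

state = Has("Alice", 5)     # "Alice has 5 apples."
state = transfer(state, 2)  # "Alice gives 2 apples to Bob."
print(state)                # Has(agent='Alice', quantity=3)
```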
               
            
            
14.03.2025 16:14

New #ICLR2025 paper 📣📣
We argue that to properly evaluate a model's reasoning ability, it must be tested on problems that are harder than the ones it has already seen. Enter MathGAP, an evaluation framework for math word problems with arbitrarily complex proofs 🧵
arxiv.org/abs/2410.13502
               
            
            
14.03.2025 16:14

            Andreas Opedal, Eleanor Chodroff, Ryan Cotterell, Ethan Gotlieb Wilcox
On the Role of Context in Reading Time Prediction
https://arxiv.org/abs/2409.08160
               
            
            
11.10.2024 03:01

            Mario Giulianelli, Andreas Opedal, Ryan Cotterell
Generalized Measures of Anticipation and Responsivity in Online Language Processing
https://arxiv.org/abs/2409.10728
               
            
            
16.10.2024 03:01