How can we use models of cognition to help LLMs interpret figurative language (irony, hyperbole) in a more human-like manner? Come to our #ACL2025NLP poster on Wednesday at 11AM (exhibit hall - exact location TBA) to find out! @mcgill-nlp.bsky.social @mila-quebec.bsky.social @aclmeeting.bsky.social
28.07.2025 09:16
What do systematic hallucinations in LLMs tell us about their generalization abilities?
Come to our poster at #ACL2025 on July 29th at 4 PM in Level 0, Halls X4/X5. Would love to chat about interpretability, hallucinations, and reasoning :)
@mcgill-nlp.bsky.social @mila-quebec.bsky.social
28.07.2025 09:18
A blizzard is raging through Montreal when your friend says "Looks like Florida out there!" Humans easily interpret irony, while LLMs struggle with it. We propose a rhetorical-strategy-aware probabilistic framework as a solution.
Paper: arxiv.org/abs/2506.09301 to appear @ #ACL2025 (Main)
26.06.2025 15:52
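For intuition only: below is a minimal sketch of how a strategy-aware probabilistic listener could recover the ironic reading, in the spirit of Rational Speech Acts-style models of cognition. The meaning set, strategy set, priors, and numbers are illustrative placeholders, not the paper's actual framework (see the arXiv link above for that).

```python
# Toy strategy-aware pragmatic listener (illustrative numbers, not the paper's model):
# P(meaning | utterance, context) ∝ P(meaning | context) * Σ_s P(utterance | meaning, s) * P(s)
utterance = "Looks like Florida out there!"
meanings = ["weather is pleasant", "weather is terrible"]
strategies = ["literal", "ironic"]

# Context prior: a blizzard makes "pleasant" unlikely before the utterance is heard.
prior_meaning = {"weather is pleasant": 0.1, "weather is terrible": 0.9}
prior_strategy = {"literal": 0.6, "ironic": 0.4}

# Hypothetical speaker likelihoods: literal speakers say what they mean,
# ironic speakers say roughly the opposite of what they mean.
likelihood = {
    ("weather is pleasant", "literal"): 0.9,
    ("weather is pleasant", "ironic"): 0.05,
    ("weather is terrible", "literal"): 0.05,
    ("weather is terrible", "ironic"): 0.9,
}

# Marginalize over rhetorical strategies, then normalize over meanings.
score = {
    m: prior_meaning[m] * sum(likelihood[(m, s)] * prior_strategy[s] for s in strategies)
    for m in meanings
}
z = sum(score.values())
for m in meanings:
    print(f"P({m!r} | {utterance!r}) = {score[m] / z:.2f}")
# With these toy numbers the ironic reading ("weather is terrible") wins,
# because the blizzard prior and the ironic strategy jointly explain the utterance.
```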
Huge thanks to my collaborators @mengcao.bsky.social, Marc-Antoine Rondeau, and my advisor Jackie Cheung for their invaluable guidance and support throughout this work, and to friends at @mila-quebec.bsky.social and @mcgill-nlp.bsky.social. 7/n
06.06.2025 18:12
TL;DR: These irrelevant-context hallucinations show that LLMs go beyond mere parroting: they do generalize, based on contextual cues and abstract classes. But not reliably. They're more like chameleons, blending with the context even when they shouldn't. 6/n
06.06.2025 18:11
What's going on inside?
With mechanistic interpretability, we found:
- LLMs first compute abstract classes (like "language") before narrowing to specific answers
- Competing circuits inside the model: one based on context, one based on query. Whichever is stronger wins. (Toy sketch below.) 5/n
06.06.2025 18:11
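A rough way to see the first finding for yourself is a generic logit-lens-style probe: project each layer's hidden state through the unembedding and check which token would be predicted at that depth. The sketch below uses GPT-2 via Hugging Face transformers with a made-up irrelevant-context prompt; it is not the paper's exact method, model, or data.

```python
# Logit-lens-style sketch (generic probe, not the paper's exact analysis):
# decode each layer's residual stream to see when the answer (or its class) emerges.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

# Hypothetical prompt: irrelevant context (Honda) plus a query whose class is "language".
prompt = ("Context: Honda is a company founded in Japan. "
          "Question: What language is spoken in Brazil? Answer:")
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs)

# out.hidden_states holds the embedding output plus one tensor per layer,
# each of shape [batch, seq_len, hidden_dim]; we decode the last position.
for layer, h in enumerate(out.hidden_states):
    resid = model.transformer.ln_f(h[0, -1])   # apply the final layer norm
    logits = model.lm_head(resid)              # project into vocabulary space
    top_id = int(logits.argmax())
    print(f"layer {layer:2d}: top token = {tok.decode([top_id])!r}")
```

If the class-first story holds, intermediate layers should drift through class-consistent tokens (language names) before the final layers settle on an answer; with a model as small as GPT-2 the pattern may be noisy, so treat this purely as an illustration of the probing idea.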
Sometimes this yields the right answer for the wrong reasons ("Portuguese" from "Brazil"); other times, it produces confident errors ("Japanese" from "Honda"). 4/n
06.06.2025 18:11
Turns out, we can. They follow a systematic failure mode we call class-based (mis)generalization: the model abstracts the class from the query (e.g., languages) and generalizes based on features from the irrelevant context (e.g., Honda → Japan). 3/n
06.06.2025 18:10
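A quick way to poke at this behavior: hold the query fixed, swap in different irrelevant contexts, and watch the greedy answer move. The sketch below uses GPT-2 through Hugging Face transformers with made-up contexts; the paper's actual models, prompts, and evaluation are described in the paper itself.

```python
# Toy probe for class-based (mis)generalization (illustrative, not the paper's benchmark):
# keep the query fixed, vary the irrelevant context, and compare greedy answers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

query = "Question: What language is spoken in Brazil? Answer:"
irrelevant_contexts = [
    "",                                                            # no context (baseline)
    "Honda is a carmaker headquartered in Japan.",                 # class cue pointing at Japanese
    "The Eiffel Tower attracts millions of visitors to France.",   # class cue pointing at French
]

for ctx in irrelevant_contexts:
    prompt = f"{ctx} {query}".strip()
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**ids, max_new_tokens=3, do_sample=False,
                             pad_token_id=tok.eos_token_id)
    answer = tok.decode(out[0, ids["input_ids"].shape[1]:]).strip()
    print(f"context = {ctx!r}\n  answer = {answer!r}")
```

An answer drifting toward "Japanese" or "French" under the matching context is the misgeneralization pattern described above; a small model like GPT-2 may also fail at the baseline, so this is only a sketch of the setup, not a replication.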
These examples show that answers, even to the same query, can shift under different irrelevant contexts. Can we predict these shifts? 2/n
06.06.2025 18:10
Do LLMs hallucinate randomly? Not quite.
Our #ACL2025 (Main) paper shows that hallucinations under irrelevant contexts follow a systematic failure mode, revealing how LLMs generalize using abstract classes + context cues, albeit unreliably.
Paper: arxiv.org/abs/2505.22630 1/n
06.06.2025 18:09
Master's student at Saarland University
PhD-ing at McGill Linguistics + Mila, working under Prof. Siva Reddy. Mostly computational linguistics, with some NLP; habitually disappointed Arsenal fan
CS PhD student at UT Austin in #NLP
Interested in language, reasoning, semantics and cognitive science. One day we'll have more efficient, interpretable and robust models!
Other interests: math, philosophy, cinema
https://www.juandiego-rodriguez.com/
Senior applied research scientist at Mila. NLP. ML for healthcare/bio. Football. Pink Floyd. Post-rock. Montreal bagel ambassador 🇨🇦.
https://zhi-wen.net/
AI/ML Applied Research Intern at Adobe | NLP-ing (Research Masters) at MILA/McGill
#NLP Postdoc at Mila - Quebec AI Institute & McGill University
mariusmosbach.com
The world's largest academic research center in deep learning.
phding@mcgill, words@reboot, translator@limited connection, forecast.weather.gov stan
shiraab.github.io
Books:
https://asterismbooks.com/product/the-hand-of-the-hand-laura-vazquez
https://www.arche-editeur.com/livre/le-reve-dun-langage-commun-747
PhD fellow in XAI, IR & NLP
Mila - Quebec AI Institute | University of Copenhagen 🇩🇰
#NLProc #ML #XAI
Recreational sufferer
Interp & analysis in NLP
Mostly 🇦🇷, slightly 🇨🇱
👨‍🍳 Web Agents @mila-quebec.bsky.social
@mcgill-nlp.bsky.social
Assistant Professor @Mila-Quebec.bsky.social
Co-Director @McGill-NLP.bsky.social
Researcher @ServiceNow.bsky.social
Alumni: @StanfordNLP.bsky.social, EdinburghNLP
Natural Language Processor #NLProc
#NLP Postdoc at Mila - Quebec AI Institute and McGill University | Former PhD @ University of Copenhagen (CopeNLU)
karstanczak.github.io
PhD Student at MILA/McGill University with Prof. Siva Reddy and Prof. Vered Shwartz. Previously UBC-CS.
Studying societal impacts of AI, alignment and safety.
Based in Montreal 🇨🇦
PhD Student at Mila and McGill | Research in ML and NLP | Past: AI2, MSFTResearch
arkilpatel.github.io
AI PhDing at Mila/McGill (prev FAIR intern). Happily residing in Montreal 🥯
Academic: language grounding, vision+language, interp, rigorous & creative evals, cogsci
Other: many sports, urban explorations, puzzles/quizzes
bennokrojer.com
PhD student at Mila & McGill | Visiting student at KAIST #nlp https://ianporada.github.io/
Indigenous language technology. PhD candidate at McGill University in Montreal. Ngāpuhi Nui Tonu.
PhDing @ MILA/McGill in Computer Science | Multi-agent systems, emergent organization, and neurosymbolic methods
Previously @ UWaterloo
Also love guitar/bass, volleyball, history, cycling, and institutional design