
Ziling Cheng

@ziling-cheng.bsky.social

MSc student @mila-quebec.bsky.social @mcgill-nlp.bsky.social | Research Fellow @ RBC Borealis | Model analysis, interpretability, reasoning, and hallucination | Studying model behaviours to make them better :)) | Looking for Fall '26 PhD

40 Followers  |  26 Following  |  8 Posts  |  Joined: 01.06.2025

Latest posts by ziling-cheng.bsky.social on Bluesky


How can we use models of cognition to help LLMs interpret figurative language (irony, hyperbole) in a more human-like manner? Come to our #ACL2025NLP poster on Wednesday at 11AM (exhibit hall - exact location TBA) to find out! @mcgill-nlp.bsky.social @mila-quebec.bsky.social @aclmeeting.bsky.social

28.07.2025 09:16 · 👍 3  🔁 2  💬 0  📌 0

What do systematic hallucinations in LLMs tell us about their generalization abilities?

Come to our poster at #ACL2025 on July 29th at 4 PM in Level 0, Halls X4/X5. Would love to chat about interpretability, hallucinations, and reasoning :)

@mcgill-nlp.bsky.social @mila-quebec.bsky.social

28.07.2025 09:18 · 👍 2  🔁 2  💬 0  📌 0

A blizzard is raging through Montreal when your friend says "Looks like Florida out there!" Humans easily interpret irony, while LLMs struggle with it. We propose a rhetorical-strategy-aware probabilistic framework as a solution.
Paper: arxiv.org/abs/2506.09301, to appear @ #ACL2025 (Main)
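
The post doesn't spell out the framework, so here is a minimal, purely illustrative sketch of one way a strategy-aware probabilistic interpretation can work: marginalize over a latent rhetorical strategy (literal vs. ironic) whose prior depends on the shared context. The strategies, meanings, and probabilities below are invented for illustration; they are not taken from the paper.

```python
# Toy sketch of strategy-aware interpretation (illustrative numbers only):
# P(meaning | utterance, context) = sum_s P(meaning | utterance, s) * P(s | context)

def interpret(p_strategy, p_meaning_given_strategy):
    """Marginalize over latent rhetorical strategies."""
    posterior = {}
    for strategy, p_s in p_strategy.items():
        for meaning, p_m in p_meaning_given_strategy[strategy].items():
            posterior[meaning] = posterior.get(meaning, 0.0) + p_s * p_m
    return posterior

# A blizzard makes the literal reading of "Looks like Florida out there!"
# implausible, so most prior mass goes to the ironic strategy.
p_strategy = {"literal": 0.1, "ironic": 0.9}

p_meaning_given_strategy = {
    "literal": {"the weather is warm": 0.95, "the weather is awful": 0.05},
    "ironic":  {"the weather is warm": 0.05, "the weather is awful": 0.95},
}

print(interpret(p_strategy, p_meaning_given_strategy))
# -> roughly {'the weather is warm': 0.14, 'the weather is awful': 0.86}
```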

26.06.2025 15:52 · 👍 15  🔁 7  💬 1  📌 4

πŸ™ Huge thanks to my collaborators @mengcao.bsky.social, Marc-Antoine Rondeau, and my advisor Jackie Cheung for their invaluable guidance and support throughout this work, and to friends at @mila-quebec.bsky.social and @mcgill-nlp.bsky.social πŸ’™ 7/n

06.06.2025 18:12 · 👍 3  🔁 0  💬 0  📌 0

🧠 TL;DR: These irrelevant-context hallucinations show that LLMs go beyond mere parroting 🦜: they do generalize, based on contextual cues and abstract classes. But not reliably. They're more like chameleons 🦎, blending with the context even when they shouldn't. 6/n

06.06.2025 18:11 · 👍 8  🔁 0  💬 1  📌 1

πŸ” What’s going on inside?
With mechanistic interpretability, we found:
- LLMs first compute abstract classes (like β€œlanguage”) before narrowing to specific answers
- Competing circuits inside the model: one based on context, one based on query. Whichever is stronger wins. 5/n
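
A hedged toy sketch of the kind of logit-lens probe that can surface this layer-by-layer competition, using TransformerLens. The model (gpt2), prompt, and candidate answers are stand-ins chosen for illustration, not the paper's actual setup, so the pattern may not reproduce exactly.

```python
# Toy logit-lens probe: decode the residual stream at each layer and see which
# candidate answer is winning, the query-based one (" Portuguese") or the
# context-based one (" Japanese"). Illustrative setup, not the paper's method.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # small stand-in model

prompt = "Honda makes popular cars. The official language of Brazil is"
tokens = model.to_tokens(prompt)
with torch.no_grad():
    logits, cache = model.run_with_cache(tokens)

candidates = [" Portuguese", " Japanese"]
# Take the first BPE piece of each candidate so this works with any tokenizer.
cand_ids = [model.to_tokens(c, prepend_bos=False)[0, 0].item() for c in candidates]

for layer in range(model.cfg.n_layers):
    resid = cache["resid_post", layer][:, -1:, :]        # residual stream at the final position
    layer_logits = model.unembed(model.ln_final(resid))  # logit lens: project onto the vocabulary
    scores = layer_logits[0, -1, cand_ids]
    winner = candidates[int(scores.argmax())].strip()
    print(f"layer {layer:2d}: leaning {winner} "
          f"(Portuguese={scores[0].item():.2f}, Japanese={scores[1].item():.2f})")
```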

06.06.2025 18:11 · 👍 3  🔁 0  💬 1  📌 0

Sometimes this yields the right answer for the wrong reason ("Portuguese" from "Brazil"); other times, it produces confident errors ("Japanese" from "Honda"). 4/n

06.06.2025 18:11 · 👍 1  🔁 0  💬 1  📌 0

Turns out, we can. They follow a systematic failure mode we call class-based (mis)generalization: the model abstracts the class from the query (e.g., languages) and generalizes based on features from the irrelevant context (e.g., Honda → Japan). 3/n
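
One concrete (and purely illustrative) way to poke at this behaviour is to compare a model's next-token distribution for the same query with and without an irrelevant context. The model and prompts below are toy stand-ins, not the paper's experimental setup.

```python
# Toy probe: does an irrelevant context about Honda pull the answer for a
# Brazil question toward a context-related member of the same class?
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in; the paper studies larger LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

query = "The official language of Brazil is"
contexts = [
    "",                                              # no distractor
    "Honda is a well-known Japanese car company. ",  # irrelevant context
]

for ctx in contexts:
    inputs = tok(ctx + query, return_tensors="pt")
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]  # distribution over the next token
    top_ids = next_token_logits.topk(5).indices
    print(repr(ctx), "->", [tok.decode(i) for i in top_ids])
```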

06.06.2025 18:10 · 👍 6  🔁 0  💬 1  📌 0

These examples show that answers, even to the same query, can shift under different irrelevant contexts. Can we predict these shifts? 2/n

06.06.2025 18:10 · 👍 9  🔁 0  💬 1  📌 0

Do LLMs hallucinate randomly? Not quite.

Our #ACL2025 (Main) paper shows that hallucinations under irrelevant contexts follow a systematic failure mode, revealing how LLMs generalize using abstract classes + context cues, albeit unreliably.

📎 Paper: arxiv.org/abs/2505.22630 1/n

06.06.2025 18:09 · 👍 46  🔁 18  💬 1  📌 3
