Takeaway: reasoning LLMs are getting better and better on math and code (deterministic reasoning tasks). But we should also evaluate them on open-ended, inherently uncertain everyday reasoning! (9/10)
04.03.2026 16:13
🚨New Paper!🚨 How do reasoning LLMs handle inferences that have no deterministic answer? We find that they diverge from humans in some significant ways, and fail to reflect human uncertainty… 🧵(1/10)
04.03.2026 16:13
Beyond our controlled setup, we also show how LatentLens works much better than baselines on off-the-shelf Qwen2-VL-7B-Instruct
11.02.2026 14:12
Building a VLM can be surprisingly simple: you keep both the LLM and vision encoder frozen and just train a small MLP that projects into the LLM embedding space as prefixes. That's it.
But how and why does that work? How do visual tokens relate to language, i.e. do they have interpretable NNs?
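A minimal sketch of the frozen-encoder setup described above, in plain Python with toy sizes. The dimensions, the two-layer ReLU MLP, and the names `Projector` and `build_llm_input` are illustrative assumptions, not taken from the paper or its code. Random vectors stand in for the frozen vision encoder outputs and the frozen LLM token embeddings, and training is omitted.

```python
import random

random.seed(0)

# Hypothetical toy sizes; real models use far larger dimensions.
D_VISION, D_HIDDEN, D_LLM = 4, 8, 6


def init_linear(d_in, d_out):
    """Return (weights, bias) for a dense layer with small random weights."""
    w = [[random.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]
    b = [0.0] * d_out
    return w, b


def linear(x, layer):
    """Apply a dense layer to a single vector."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x):
    return [max(0.0, v) for v in x]


class Projector:
    """Small MLP: the only component that would be trained in this sketch."""

    def __init__(self):
        self.fc1 = init_linear(D_VISION, D_HIDDEN)
        self.fc2 = init_linear(D_HIDDEN, D_LLM)

    def __call__(self, patch_feature):
        return linear(relu(linear(patch_feature, self.fc1)), self.fc2)


def build_llm_input(patch_features, text_embeddings, projector):
    """Project each vision feature into LLM space and prepend it as a prefix."""
    visual_prefix = [projector(p) for p in patch_features]
    return visual_prefix + text_embeddings


# Stand-ins for the outputs of the frozen vision encoder and frozen LLM embedder.
patch_features = [[random.random() for _ in range(D_VISION)] for _ in range(3)]
text_embeddings = [[random.random() for _ in range(D_LLM)] for _ in range(5)]

projector = Projector()
llm_input = build_llm_input(patch_features, text_embeddings, projector)
print(len(llm_input), "vectors of size", len(llm_input[0]))
```

In this sketch the text embeddings pass through untouched, so only the projector's weights would receive gradient updates during training.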
Paper: arxiv.org/abs/2602.00462
Code: github.com/McGill-NLP/...
Demo: tinyurl.com/ce57mn4v
Couldn't have imagined better collaborators to wrap up the PhD: Shravan Nayak @oscmansan.bsky.social @vaibhavadlakha.bsky.social
@delliott.bsky.social @sivareddyg.bsky.social @mariusmosbach.bsky.social
🚨New paper
Are visual tokens going into an LLM interpretable?
Existing methods (e.g. logit lens) and assumptions would lead you to think "not much"...
We propose LatentLens and show that most visual tokens are interpretable across *all* layers
Details 🧵
"Not only is the ratio of AI's resource rapacity to its productive utility indefensibly and irremediably skewed, AI-made material is itself a waste product: flimsy, shoddy, disposable, a single-use plastic of the mind."
>>
enshittification | noun | when a digital platform is made worse for users, in order to increase profits
03.09.2025 20:22
Windows Notepad, the native simple text editor, now has formatting options and a Copilot button.
Look what they did to Notepad. Shut the fuck up. This is Notepad. You are not welcome here. Oh yeah "Let me use Copilot for Notepad". "I'm going to sign into my account for Notepad". What the fuck are you talking about. It's Notepad.
27.08.2025 01:41
Our new paper in #PNAS (bit.ly/4fcWfma) presents a surprising finding: when words change meaning, older speakers rapidly adopt the new usage; inter-generational differences are often minor.
w/ Michelle Yang, @sivareddyg.bsky.social, @msonderegger.bsky.social and @dallascard.bsky.social (1/12)
Thrilled to announce our new survey that explores the exciting possibilities and troubling risks of computational persuasion in the era of LLMs
Arxiv: arxiv.org/pdf/2505.07775
GitHub: github.com/beyzabozdag/...
Started a new podcast with @tomvergara.bsky.social!
Behind the Research of AI:
We look behind the scenes, beyond the polished papers
If this sounds fun, check out our first "official" episode with the awesome Gauthier Gidel
from @mila-quebec.bsky.social :
open.spotify.com/episode/7oTc...
Zohran Mamdani, a 33-year-old state assemblyman, declared victory in New York City's Democratic mayoral primary after Andrew Cuomo conceded the race.
"Tonight we made history," Mamdani said, addressing his supporters. wapo.st/44yMVoI
Mahmoud Khalil is finally home with his beautiful wife and newborn son.
Each one of the 104 days he spent detained was a grave injustice.
From the moment of his detention, @ccrjustice.org + @aclu.org engaged my office as we worked closely to help secure his release. They did remarkable work here.
The facts:
We release (MVPBench) with around 55K videos (grouped as *minimal video pairs*) from diverse physical understanding sources
Arxiv: arxiv.org/abs/2506.09987
Huggingface: huggingface.co/datasets/fac...
GitHub: github.com/facebookrese...
Leaderboard: huggingface.co/spaces/faceb...
Excited to share the results of my recent internship!
We ask:
What subtle shortcuts are VideoLLMs taking on spatio-temporal questions?
And how can we instead curate shortcut-robust examples at a large-scale?
We release: MVPBench
Details
Congrats!
30.05.2025 18:20
Today, I was denied access to seeing my constituent, Mr. Kilmar Abrego Garcia. If there is nothing to hide, cut the crap. Let his lawyer and I check on him.
26.05.2025 19:32
Breaking news: The Trump administration revoked Harvard's ability to enroll foreign students, saying it allowed anti-American agitators.
Existing foreign students must transfer or risk losing their legal status, DHS said.
when in albuquerque…
07.05.2025 06:00
We won a Senior Area Chair Award at NAACL!! Many thanks again to my amazing coauthors Gaurav Kamath and @sivareddyg.bsky.social :-)
03.05.2025 15:50
Check out Gaurav's video on their #NAACL paper and find @adadtur.bsky.social at the conference
02.05.2025 01:41
Great work from labmates on LLMs vs humans regarding linguistic preferences: You know when a sentence kind of feels off, e.g. "I met at the park the man"? So in what ways do LLMs follow these human intuitions?
01.05.2025 15:04
Ada is an undergrad and will soon be looking for PhDs. Gaurav is a PhD student looking for intellectually stimulating internships/visiting positions. They did most of the work without much of my help. Highly recommend them. Please reach out to them if you have any positions.
01.05.2025 15:14
Incredibly proud of my students @adadtur.bsky.social and Gaurav Kamath for winning a SAC award at #NAACL2025 for their work on assessing how LLMs model constituent shifts.
01.05.2025 15:11
Congratulations to Mila members @adadtur.bsky.social, Gaurav Kamath and @sivareddyg.bsky.social for their SAC award at NAACL! Check out Ada's talk in Session I: Oral/Poster 6. Paper: arxiv.org/abs/2502.05670
01.05.2025 14:30
I filmed this yesterday on my way to Louisiana where my constituent Rümeysa Öztürk is being wrongfully held by ICE. I'm there now demanding her release. More to come.
22.04.2025 21:36
A circular diagram with a blue whale icon at the center. The diagram shows 8 interconnected research areas around LLM reasoning represented as colored rectangular boxes arranged in a circular pattern. The areas include: §3 Analysis of Reasoning Chains (central cloud), §4 Scaling of Thoughts (discussing thought length and performance metrics), §5 Long Context Evaluation (focusing on information recall), §6 Faithfulness to Context (examining question answering accuracy), §7 Safety Evaluation (assessing harmful content generation and jailbreak resistance), §8 Language & Culture (exploring moral reasoning and language effects), §9 Relation to Human Processing (comparing cognitive processes), §10 Visual Reasoning (covering ASCII generation capabilities), and §11 Following Token Budget (investigating direct prompting techniques). Arrows connect the sections in a clockwise flow, suggesting an iterative research methodology.
Models like DeepSeek-R1 mark a fundamental shift in how LLMs approach complex problems. In our preprint on R1 Thoughtology, we study R1's reasoning chains across a variety of tasks, investigating its capabilities, limitations, and behaviour.
mcgill-nlp.github.io/thoughtology/
Not sure if this has been shared here yet, but this is video of Rumeysa Ozturk's arrest posted by WCVB. It's terrifying.
26.03.2025 16:43