Super cool work on quantifying with NLP how language evolves through generations
In linguistics, the "apparent time hypothesis" famously discusses this but never empirically tests it
@bennokrojer.bsky.social
AI PhDing at Mila/McGill (prev FAIR intern). Happily residing in Montreal. Academic: language grounding, vision+language, interp, rigorous & creative evals, cogsci. Other: many sports, urban explorations, puzzles/quizzes. bennokrojer.com
Maybe he'd otherwise miss the Alps and skiing too much
28.07.2025 15:57
Done the same in the past when I felt little motivation for the PhD! It's been a while since I've read one... Maybe I'll pick it up again
04.07.2025 22:17
I miss Edinburgh and its wonderful people already!! Thanks to @tallinzen.bsky.social and @edoardo-ponti.bsky.social for inspiring discussions during the viva! I'm now exchanging Arthur's Seat for Mont Royal to join @sivareddyg.bsky.social's wonderful lab @mila-quebec.bsky.social
01.07.2025 21:33
Also check out our previous two episodes! They didn't have a single guest, instead:
1) we introduce the podcast and how Tom and I got into research in Ep 00
2) we interview several people at Mila just before the Neurips deadline about their submissions in Ep 01
Started a new podcast with @tomvergara.bsky.social!
Behind the Research of AI:
We look behind the scenes, beyond the polished papers
If this sounds fun, check out our first "official" episode with the awesome Gauthier Gidel from @mila-quebec.bsky.social:
open.spotify.com/episode/7oTc...
Turns out condensing your research into 3min is very hard but also teaches you a lot
Finally the video from Mila's speed science competition is on YouTube!
From a soup of raw pixels to abstract meaning
t.co/RDpu1kR7jM
This is part of a larger effort at Meta to significantly improve physical world modeling, so check out the other works in this blog post!
ai.meta.com/blog/v-jepa-...
Some reflections at the end:
There's a lot of talk about math reasoning these days, but this project made me appreciate the simple reasoning we humans take for granted, which arises in our first months and years of life
As usual, I also included a "Behind The Scenes" in the Appendix:
I am super grateful to my smart+kind collaborators at Meta who made this a very enjoyable project :)
(Mido Assran Nicolas Ballas @koustuvsinha.com @candaceross.bsky.social @quentin-garrido.bsky.social Mojtaba Komeili)
The Montreal office in general is a very fun place
The hardest tasks for current models are still intuitive physics tasks, where performance is often below random (in line with the previous literature)
We encourage the community to use MVPBench to check if the latest VideoLLMs possess a *real* understanding of the physical world!
On the other hand, even the strongest SOTA models perform around random chance, with only 2-3 models significantly above random
13.06.2025 14:47
The questions in MVPBench are conceptually simple: relatively short videos with little linguistic or cultural knowledge needed. As a result, humans have no problem with these questions; e.g., it is known that even babies do well on various intuitive physics tasks
13.06.2025 14:47
By automating the pairing of highly similar videos and unifying different datasets, as well as filtering out examples that models can solve with a single frame, we end up with (probably) the largest and most diverse dataset of its kind:
13.06.2025 14:47
As a solution, we propose a 3-step curation framework that results in the Minimal Video Pairs benchmark (MVPBench)
13.06.2025 14:47
We show that seemingly "high-performing" VideoLLMs take various shortcuts on video tasks meant to test physical understanding, such as falling back to single-frame biases.
In total we analyze 4 such shortcuts and find that model scores often don't change much:
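One of these shortcut checks, the single-frame bias, can be sketched as a simple ablation: compare a model's accuracy on the full clip against its accuracy when shown only one frame. This is a minimal illustration, not the paper's evaluation code; the `Example` fields and the `answer_fn` interface are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Example:
    frames: List[str]   # stand-in for video frames
    question: str
    answer: str

def single_frame_gap(answer_fn: Callable, dataset: List[Example]) -> float:
    """Accuracy on the full clip minus accuracy on the middle frame alone.
    A near-zero gap suggests the model answers from a single frame
    instead of using temporal information."""
    full = sum(answer_fn(ex.frames, ex.question) == ex.answer for ex in dataset)
    single = sum(
        answer_fn([ex.frames[len(ex.frames) // 2]], ex.question) == ex.answer
        for ex in dataset
    )
    return (full - single) / len(dataset)
```

For a toy model that needs the final frame to answer, the gap is large; a model that ignores the video entirely would show a gap near zero.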
What was the motivation behind MVPBench?
Our starting point is a skepticism about recent "successes" of multi-modal LLMs on video understanding benchmarks
Minimal pair setups are ideal for robust evals and diagnosing exact failure modes
So each example in MVPBench has a minimal-change pair: a visually similar video together with the same question but with an opposing answer
To get credit, a model must answer both correctly
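The paired scoring rule above is easy to state; here is a minimal sketch in Python, assuming a hypothetical results format of `(pair_id, is_correct)` tuples (illustrative only, not the MVPBench release code).

```python
from collections import defaultdict

def pair_accuracy(results):
    """Minimal-pair accuracy: a pair counts as solved only if BOTH
    of its examples are answered correctly.

    `results` is a list of (pair_id, is_correct) tuples, two entries
    per pair (a hypothetical format for illustration)."""
    pairs = defaultdict(list)
    for pair_id, is_correct in results:
        pairs[pair_id].append(is_correct)
    solved = sum(1 for outcomes in pairs.values() if all(outcomes))
    return solved / len(pairs)

# A model that gets exactly one video of each pair right scores 0.0,
# even though its per-example accuracy is 50%:
pair_accuracy([("p1", True), ("p1", False),
               ("p2", False), ("p2", True)])  # → 0.0
```

This is what makes the minimal-pair setup robust: a model exploiting a shortcut that gives the same answer to both videos of a pair can never score above zero on that pair.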
The facts:
We release MVPBench with around 55K videos (grouped as *minimal video pairs*) from diverse physical understanding sources
Arxiv: arxiv.org/abs/2506.09987
Huggingface: huggingface.co/datasets/fac...
GitHub: github.com/facebookrese...
Leaderboard: huggingface.co/spaces/faceb...
Excited to share the results of my recent internship!
We ask:
What subtle shortcuts are VideoLLMs taking on spatio-temporal questions?
And how can we instead curate shortcut-robust examples at large scale?
We release: MVPBench
Details below
Once I clicked Post I also realized that lmao. But I'm surprised that 27 stations would be that low??
31.05.2025 05:06
Top 99% in Boston
Love these interactive maps
When you forget your leftover rice in the fridge for 3 weeks
29.05.2025 01:09
I'm definitely refreshing. I'm pretty sure there are simply not enough paper posts in my relatively large network to make e.g. this feed exciting:
bsky.app/profile/did:...
Maybe it's just that I'm now paying more attention to the good parts again, but since this post Bluesky seems more fun again. Still not many paper discussions going on, but I saw some fun general posts
27.05.2025 23:11
It was tough just logging back into my retired Twitter account and seeing a timeline that is so much fuller with interesting research discourse...
And yes, I've tried customizing my feeds and whatnot, but no feed can fix a lack of posts
I finish my work day with the conclusion that code assistants are maybe net negative for my short-term progress and most likely negative for my long-term progress and learning
Also sycophancy is annoying af
Congrats Jaemin!
20.05.2025 19:02
Chapter 2
19.05.2025 21:53
Attend my AI 2025 bootcamp
19.05.2025 20:16