
Benno Krojer

@bennokrojer.bsky.social

AI PhDing at Mila/McGill (prev FAIR intern). Happily residing in Montreal 🥯❄️. Academic: language grounding, vision+language, interp, rigorous & creative evals, cogsci. Other: many sports, urban explorations, puzzles/quizzes. bennokrojer.com

2,612 Followers  |  962 Following  |  1,755 Posts  |  Joined: 24.04.2023  |  2.3508

Latest posts by bennokrojer.bsky.social on Bluesky

Super cool work on quantifying with NLP how language evolves through generations

In linguistics, the "apparent time hypothesis" famously discusses this but never empirically tests it

29.07.2025 16:02 | 👍 3    🔁 1    💬 0    📌 0

Maybe he'd otherwise miss the Alps and skiing too much

28.07.2025 15:57 | 👍 0    🔁 0    💬 2    📌 0

Done the same in the past when I felt little motivation for the PhD! It's been a while since I've read one... Maybe I'll pick it up again

04.07.2025 22:17 | 👍 2    🔁 0    💬 0    📌 0

I miss Edinburgh and its wonderful people already!! Thanks to @tallinzen.bsky.social and @edoardo-ponti.bsky.social for inspiring discussions during the viva! I'm now exchanging Arthur's Seat for Mont Royal to join @sivareddyg.bsky.social's wonderful lab @mila-quebec.bsky.social 🤩

01.07.2025 21:33 | 👍 15    🔁 1    💬 3    📌 0

Also check out our previous two episodes! They didn't have a single guest, instead:

1) we introduce the podcast and how Tom and I got into research in Ep 00
2) we interview several people at Mila just before the NeurIPS deadline about their submissions in Ep 01

25.06.2025 15:54 | 👍 3    🔁 0    💬 0    📌 0
02 | Gauthier Gidel: Bridging Theory and Deep Learning, Vibes at Mila, and the Effects of AI on Art · Behind the Research of AI · Episode

Started a new podcast with @tomvergara.bsky.social !

Behind the Research of AI:
We look behind the scenes, beyond the polished papers 🧐🧪

If this sounds fun, check out our first "official" episode with the awesome Gauthier Gidel
from @mila-quebec.bsky.social :

open.spotify.com/episode/7oTc...

25.06.2025 15:54 | 👍 17    🔁 6    💬 1    📌 0

Turns out condensing your research into 3min is very hard but also teaches you a lot

Finally the video from Mila's speed science competition is on YouTube!

From a soup of raw pixels to abstract meaning

t.co/RDpu1kR7jM

20.06.2025 15:54 | 👍 8    🔁 0    💬 0    📌 0

This is part of a larger effort at Meta to significantly improve physical world modeling, so check out the other works in this blog post!

ai.meta.com/blog/v-jepa-...

13.06.2025 14:47 | 👍 0    🔁 0    💬 0    📌 0

Some reflections at the end:
There's a lot of talk about math reasoning these days, but this project made me appreciate what simple reasoning we humans take for granted, arising in our first months and years of living

As usual I also included "Behind The Scenes" in the Appendix:

13.06.2025 14:47 | 👍 2    🔁 0    💬 1    📌 0

I am super grateful to my smart+kind collaborators at Meta who made this a very enjoyable project :)

(Mido Assran Nicolas Ballas @koustuvsinha.com @candaceross.bsky.social @quentin-garrido.bsky.social Mojtaba Komeili)

The Montreal office in general is a very fun place 👇

13.06.2025 14:47 | 👍 1    🔁 0    💬 1    📌 0

The hardest tasks for current models are still intuitive physics tasks, where performance is often below random (in line with the previous literature)

We encourage the community to use MVPBench to check if the latest VideoLLMs possess a *real* understanding of the physical world!

13.06.2025 14:47 | 👍 1    🔁 0    💬 1    📌 0

On the other hand, even the strongest SOTA models perform around random chance, with only 2-3 models significantly above random

13.06.2025 14:47 | 👍 1    🔁 0    💬 1    📌 0

The questions in MVPBench are conceptually simple: relatively short videos with little linguistic or cultural knowledge needed. As a result humans have no problem with these questions, e.g. it is known that even babies do well on various intuitive physics tasks

13.06.2025 14:47 | 👍 1    🔁 0    💬 1    📌 0

By automating the pairing of highly similar videos and unifying different datasets, as well as filtering out examples that models can solve with a single frame, we end up with (probably) the largest and most diverse dataset of its kind:

13.06.2025 14:47 | 👍 1    🔁 0    💬 1    📌 0

So as a solution we propose a 3-step curation framework that results in the Minimal Video Pairs benchmark (MVPBench)

13.06.2025 14:47 | 👍 1    🔁 0    💬 1    📌 0

We show that seemingly “high-performing” VideoLLMs take various shortcuts on video tasks meant to test physical understanding, such as models falling back to single-frame biases.

In total we analyze 4 such shortcuts and find that model scores often don't change much:

13.06.2025 14:47 | 👍 1    🔁 0    💬 1    📌 0

What was the motivation behind MVPBench?

Our starting point is a skepticism about recent “successes” of multi-modal LLMs on video understanding benchmarks

13.06.2025 14:47 | 👍 1    🔁 0    💬 1    📌 0

Minimal pair setups are ideal for robust evals and diagnosing exact failure modes

So each example in MVPBench has a minimal-change pair: a visually similar video together with the same question but with an opposing answer

→ To get credit, a model must answer both correctly
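
The pair-level scoring rule above can be sketched in a few lines. This is only an illustration, not MVPBench's actual evaluation code: the function name `paired_accuracy`, the `(pair_id, is_correct)` record layout, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def paired_accuracy(records):
    """Fraction of pairs in which both videos were answered correctly.

    `records` is an iterable of (pair_id, is_correct) tuples, two per pair.
    This layout is hypothetical and chosen only for the sketch.
    """
    by_pair = defaultdict(list)
    for pair_id, is_correct in records:
        by_pair[pair_id].append(bool(is_correct))
    if not by_pair:
        return 0.0
    # A pair earns credit only if it has two results and both are correct.
    credited = sum(
        1 for results in by_pair.values() if len(results) == 2 and all(results)
    )
    return credited / len(by_pair)


# Toy data (made up): p1 is fully correct, p2 is half correct, p3 is fully wrong.
records = [
    ("p1", True), ("p1", True),
    ("p2", True), ("p2", False),
    ("p3", False), ("p3", False),
]
print(paired_accuracy(records))
```

In the toy data only one of the three pairs earns credit, even though half of the individual answers are correct. Under the extra assumption of two answer options and independent guessing, chance on this paired metric would be 0.25 rather than 0.5, which is one reason pair-level scoring is harder to inflate.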

13.06.2025 14:47 | 👍 1    🔁 0    💬 1    📌 0
A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs. Existing benchmarks for assessing the spatio-temporal understanding and reasoning abilities of video language models are susceptible to score inflation due to the presence of shortcut solutions based ...

The facts:

We release MVPBench with around 55K videos (grouped as *minimal video pairs*) from diverse physical understanding sources

Arxiv: arxiv.org/abs/2506.09987

Huggingface: huggingface.co/datasets/fac...

GitHub: github.com/facebookrese...

Leaderboard: huggingface.co/spaces/faceb...

13.06.2025 14:47 | 👍 3    🔁 1    💬 1    📌 0

Excited to share the results of my recent internship!

We ask 🤔
What subtle shortcuts are VideoLLMs taking on spatio-temporal questions?

And how can we instead curate shortcut-robust examples at large scale?

We release: MVPBench

Details 👇🔬

13.06.2025 14:47 | 👍 16    🔁 5    💬 1    📌 0

Once I clicked Post I also realized that lmao. But I'm surprised that 27 stations would be that low??

31.05.2025 05:06 | 👍 1    🔁 0    💬 1    📌 0

Top 99% in Boston 💪

Love these interactive maps

30.05.2025 15:32 | 👍 3    🔁 0    💬 1    📌 0

When you forget your leftover rice in the fridge for 3 weeks

29.05.2025 01:09 | 👍 4    🔁 0    💬 0    📌 0

I'm definitely refreshing. I'm pretty sure there are simply not enough paper posts in my relatively large network to make e.g. this feed exciting:
bsky.app/profile/did:...

28.05.2025 00:21 | 👍 0    🔁 0    💬 1    📌 0

Maybe it's just that I'm now paying more attention to the good parts again, but since this post Bluesky seems more fun again. Still not many paper discussions going on, but I saw some fun general posts

27.05.2025 23:11 | 👍 5    🔁 0    💬 1    📌 0

It was tough just logging back into my retired twitter account and seeing a timeline that is so much fuller with interesting research discourse...

And yes I've tried customizing my feeds and whatnot but no feed can fix a lack of posts

27.05.2025 17:16 | 👍 4    🔁 0    💬 0    📌 1

I finish my work day with the conclusion that code assistants are maybe net negative for my short-term progress and most likely negative for my long-term progress and learning

Also sycophancy is annoying af

22.05.2025 21:57 | 👍 3    🔁 0    💬 0    📌 0

Congrats Jaemin!

20.05.2025 19:02 | 👍 2    🔁 0    💬 1    📌 0

Chapter 2

19.05.2025 21:53 | 👍 4    🔁 0    💬 0    📌 0

Attend my AI 2025 bootcamp

19.05.2025 20:16 | 👍 11    🔁 2    💬 2    📌 0
