Benno Krojer @bennokrojer - Bluesky Profile

Couldn't have wished for a better place to do my PhD, come apply!

15.10.2025 13:27 — 👍 5 🔁 0 💬 0 📌 0

I'll be at COLM!

Excited to chat about anything vision+language, interpretability, cogsci/psych, embedding spaces, visual reasoning, video/world models

07.10.2025 02:59 — 👍 5 🔁 0 💬 0 📌 0

I used it recently for the first time and was blown away by the speed. Should switch!

22.09.2025 02:35 — 👍 2 🔁 0 💬 0 📌 0

Devoured this book in 18 hours, usually not a big fan of audio books!

It covered a lot, from crowdworker rights, the ideologies (doomers, EA, ...) and the Silicon Valley startup world to the many big egos and company-internal battles

Great work by @karenhao.bsky.social

22.09.2025 02:34 — 👍 4 🔁 0 💬 0 📌 0

Lmao

12.09.2025 01:56 — 👍 1 🔁 0 💬 0 📌 0

Congratulations @bennokrojer.bsky.social on passing your PhD proposal exam! A great presentation and exciting work!

21.08.2025 02:17 — 👍 5 🔁 1 💬 0 📌 0

for inspiration, here are some from my past papers:

arxiv.org/abs/2407.03471
arxiv.org/abs/2506.09987
arxiv.org/abs/2508.01119

13.08.2025 12:19 — 👍 0 🔁 0 💬 0 📌 0

very happy to see the trend of a Behind the Scenes section catching on! transparent & honest science 👌

love the detailed montreal spots mentioned

consider including such a section in your next appendix!

(paper by @a-krishnan.bsky.social arxiv.org/pdf/2504.050...)

13.08.2025 12:19 — 👍 5 🔁 1 💬 1 📌 0

The Platonic Representation Hypothesis
We argue that representations in AI models, particularly deep networks, are converging. First, we survey many examples of convergence in the literature: over time and across multiple domains, the ways...

Do you have any thoughts on how this relates to e.g. the Platonic Representation Hypothesis paper? arxiv.org/abs/2405.07987

13.08.2025 12:15 — 👍 2 🔁 0 💬 1 📌 0

Super cool work on quantifying with NLP how language evolves through generations

In linguistics, the "apparent time hypothesis" famously discusses this but never empirically tests it

29.07.2025 16:02 — 👍 3 🔁 1 💬 0 📌 0

Maybe he'd otherwise miss the Alps and skiing too much

28.07.2025 15:57 — 👍 0 🔁 0 💬 2 📌 0

Done the same in the past when I felt little motivation for the PhD! It's been a while since I've read one... Maybe I'll pick it up again

04.07.2025 22:17 — 👍 2 🔁 0 💬 0 📌 0

I miss Edinburgh and its wonderful people already!! Thanks to @tallinzen.bsky.social and @edoardo-ponti.bsky.social for inspiring discussions during the viva! I'm now exchanging Arthur's Seat for Mont Royal to join @sivareddyg.bsky.social's wonderful lab @mila-quebec.bsky.social 🤩

01.07.2025 21:33 — 👍 15 🔁 1 💬 3 📌 0

Also check out our previous two episodes! They didn't have a single guest, instead:

1) we introduce the podcast and how Tom and I got into research in Ep 00
2) we interview several people at Mila just before the NeurIPS deadline about their submissions in Ep 01

25.06.2025 15:54 — 👍 3 🔁 0 💬 0 📌 0

02 | Gauthier Gidel: Bridging Theory and Deep Learning, Vibes at Mila, and the Effects of AI on Art Behind the Research of AI · Episode

Started a new podcast with @tomvergara.bsky.social !

Behind the Research of AI:
We look behind the scenes, beyond the polished papers 🧐🧪

If this sounds fun, check out our first "official" episode with the awesome Gauthier Gidel
from @mila-quebec.bsky.social :

open.spotify.com/episode/7oTc...

25.06.2025 15:54 — 👍 17 🔁 6 💬 1 📌 0

Turns out condensing your research into 3min is very hard but also teaches you a lot

Finally the video from Mila's speed science competition is on YouTube!

From a soup of raw pixels to abstract meaning

t.co/RDpu1kR7jM

20.06.2025 15:54 — 👍 9 🔁 0 💬 0 📌 0

This is part of a larger effort at Meta to significantly improve physical world modeling, so check out the other works in this blog post!

ai.meta.com/blog/v-jepa-...

13.06.2025 14:47 — 👍 0 🔁 0 💬 0 📌 0

Some reflections at the end:
There's a lot of talk about math reasoning these days, but this project made me appreciate what simple reasoning we humans take for granted, arising in our first months and years of living

As usual I also included "Behind The Scenes" in the Appendix:

13.06.2025 14:47 — 👍 2 🔁 0 💬 1 📌 0

I am super grateful to my smart+kind collaborators at Meta who made this a very enjoyable project :)

(Mido Assran Nicolas Ballas @koustuvsinha.com @candaceross.bsky.social @quentin-garrido.bsky.social Mojtaba Komeili)

The Montreal office in general is a very fun place 👇

13.06.2025 14:47 — 👍 1 🔁 0 💬 1 📌 0

The hardest tasks for current models are still intuitive physics tasks, where performance is often below random (in line with previous literature)

We encourage the community to use MVPBench to check if the latest VideoLLMs possess a *real* understanding of the physical world!

13.06.2025 14:47 — 👍 1 🔁 0 💬 1 📌 0

On the other hand, even the strongest SOTA models perform around random chance, with only 2-3 models significantly above random

13.06.2025 14:47 — 👍 1 🔁 0 💬 1 📌 0

The questions in MVPBench are conceptually simple: relatively short videos with little linguistic or cultural knowledge needed. As a result, humans have no problem with these questions; e.g., it is known that even babies do well on various intuitive physics tasks

13.06.2025 14:47 — 👍 1 🔁 0 💬 1 📌 0

By automating the pairing of highly similar video pairs and unifying different datasets, as well as filtering out examples that models can solve with a single frame, we end up with (probably) the largest and most diverse dataset of its kind:

13.06.2025 14:47 — 👍 1 🔁 0 💬 1 📌 0

So as a solution, we propose a 3-step curation framework that results in the Minimal Video Pairs benchmark (MVPBench)

13.06.2025 14:47 — 👍 1 🔁 0 💬 1 📌 0

We show that seemingly “high-performing” VideoLLMs take various shortcuts on video tasks meant to test physical understanding, such as models falling back to single-frame biases.

In total we analyze 4 such shortcuts and find that model scores often don't change much:

13.06.2025 14:47 — 👍 1 🔁 0 💬 1 📌 0

What was the motivation behind MVPBench?

Our starting point is a skepticism about recent “successes” of multi-modal LLMs on video understanding benchmarks

13.06.2025 14:47 — 👍 1 🔁 0 💬 1 📌 0

Minimal pair setups are ideal for robust evals and diagnosing exact failure modes

So each example in MVPBench has a minimal-change pair: a visually similar video together with the same question but with an opposing answer

→ To get credit, a model must get both correct
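
A minimal sketch of that paired scoring rule, assuming each example is a pair of (predicted, gold) answers for the two videos. The function name `paired_accuracy` and the data layout are hypothetical illustrations, not taken from the MVPBench codebase:

```python
from typing import Sequence, Tuple

Pair = Tuple[str, str]  # answers for (video A, video B) of one minimal pair


def paired_accuracy(predictions: Sequence[Pair], gold: Sequence[Pair]) -> float:
    """Fraction of minimal pairs where BOTH videos are answered correctly.

    Hypothetical helper: a pair only earns credit if the prediction for
    video A and the prediction for video B each match their gold answer.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not predictions:
        return 0.0
    both_correct = sum(
        1
        for (pred_a, pred_b), (gold_a, gold_b) in zip(predictions, gold)
        if pred_a == gold_a and pred_b == gold_b
    )
    return both_correct / len(predictions)


# Two pairs with opposing gold answers; the model always answers "A" on the second pair.
preds = [("A", "B"), ("A", "A")]
golds = [("A", "B"), ("A", "B")]
print(paired_accuracy(preds, golds))  # 0.5
```

Under this rule a model that always gives the same answer to both videos in a pair gets no credit for that pair, since the two gold answers oppose each other.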

13.06.2025 14:47 — 👍 1 🔁 0 💬 1 📌 0

A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs
Existing benchmarks for assessing the spatio-temporal understanding and reasoning abilities of video language models are susceptible to score inflation due to the presence of shortcut solutions based ...

The facts:

We release MVPBench, with around 55K videos (grouped as *minimal video pairs*) from diverse physical understanding sources

Arxiv: arxiv.org/abs/2506.09987

Huggingface: huggingface.co/datasets/fac...

GitHub: github.com/facebookrese...

Leaderboard: huggingface.co/spaces/faceb...

13.06.2025 14:47 — 👍 3 🔁 1 💬 1 📌 0

Excited to share the results of my recent internship!

We ask 🤔
What subtle shortcuts are VideoLLMs taking on spatio-temporal questions?

And how can we instead curate shortcut-robust examples at large scale?

We release: MVPBench

Details 👇🔬

13.06.2025 14:47 — 👍 16 🔁 5 💬 1 📌 0

Once I clicked Post I also realized that lmao. But I'm surprised that 27 stations would be that low??

31.05.2025 05:06 — 👍 1 🔁 0 💬 1 📌 0
