Multimodal Large Language Models as Image Classifiers
Nikita Kisel, Illia Volkov, @klara-cz.bsky.social, Jiri Matas
tl;dr: if you evaluate a good model (ChatGPT) on a dirty test set (ImageNet), it looks bad. Yes, the ImageNet test set is noisy nowadays. Plus insights from the labeling process.
arxiv.org/abs/2603.065...
I am glad somebody has appreciated it!
I am not gonna lie, I tried to have my dog there at first, but even though over 100 of ImageNet's classes are dog breeds, they still somehow managed not to squeeze the Australian Shepherd in.
To study this, we introduce ReGT, a new multilabel reannotation of 625 ImageNet classes that corrects many of these issues. When evaluated on the cleaned labels, multimodal LLMs improve by up to +10.8% accuracy, substantially narrowing the gap with supervised vision models.
Work with Nikita Kisel, Illia Volkov and Jiri Matas, to be presented at #CVPR26 (Findings)!
Finally, we show that these models aren't just affected by annotation quality; they can help fix it. In a controlled verification study, annotators integrated model predictions in roughly half of the difficult cases, suggesting MLLMs can be useful tools for large-scale dataset curation.
We show that small changes in the evaluation protocol, such as the choice of distractors, output mapping, or even image order, significantly impact accuracy.
But there's a deeper issue: the data. ImageNet contains a lot of label noise, so even a perfect evaluation protocol may not give a meaningful result.
Let me introduce our new paper: Multimodal Large Language Models as Image Classifiers
Multimodal LLMs are increasingly used for visual tasks, but evaluating their image classification ability has produced conflicting conclusions.
Link: arxiv.org/html/2603.06...
He totally does, he is getting more snuggly every day
Morning walks
It also really does feel like reviewer psychology, since they did not explicitly point it out as the issue. Not being able to rerun the experiment with different framing but the same reviewers is tough :D
When you re-read the introduction of your freshly rejected paper, which was somewhat rushed before the deadline, and you go: OK, this is why.
Team 2/2 rejected, with one suggested for the findings workshop.
I am a bit sad because I feel they were rejected for the wrong reasons, and I am tired of getting BR ratings with no suggestions for rebuttal, but I am much more into ECCV than CVPR this year anyway.
Good luck with resubmission!
I feel like for the first time in my (short) reviewing career, I may have helped a (IMO of course) nice paper get accepted despite other reviewer(s).
1/n Attention, Please!
Our work "Revisiting Attentive Probing Through the Lens of Efficiency" has been accepted at #ICLR2026.
We introduce Efficient Probing (EP), a lightweight, multi-query attentive probing method for frozen encoders.
Paper + code at the end
I was starting to wonder what to do with my time now
Oh ok, that is a different level of wrong than I thought
I think most benchmarks are pretty noisy; it is just that for some (say ImageNet :)), enough people actually looked at the images and noticed.
To be fair, data annotation is HARD. I do agree people should at least try to do a better job and be responsive, of course :)
bsky.app/profile/klar...
What a beautiful day to be done with all deadlines!
This was my WFH lunch break today, if it is not clear why I do not live in Prague
25% left, a few more nice bedtime readings for me. :)
JAZZ HANDS!
I am currently at
R: Be very ready.
G: I am very ready. Be calm.
R: Am calm. You be calm.
G: NO YOU BE-
Stopping about midway through Project Hail Mary and forbidding myself to resume until I finished my CVPR reviews turned out to be pretty good motivation.
Also, if you have not read it yet but think you might enjoy it, go for it; you are in for a treat (and the movie is coming)!
I should have added it looks like this (a few lucky days a year), like today
True, not many positions come with free canistherapy (let's ignore that he's a teenager now). I hope my profile pic makes up for the regrettable omission and is self-explanatory!
Imagine this: Prague, a top CV lab, learning all the things we work on at VRG, regular cake at coffee breaks (hope you are not on a diet, but we also have a free gym on site), excellent filter coffee, and, last but not least, working with Giorgos.
There's a postdoc opening. Don't miss out!
Recently, Illia received an award for the research he has been doing with us.
Most people would think about something to buy for themselves. He donated it all to support his home.
#DoNotForget
Before, I was getting pretty good and diverse assignments (my work is a bit "all over the place"), at most one paper on topic X per conference :) But now it is three papers on my MSc topic X, and before that, I got my BSc stuff for WACV.
It accumulated and I had to vent a bit, but it is not enough to make me grumpy (yet)
I am fine reviewing a paper on it here and there, just not the whole batch like this CVPR one. But I had not thought of this, might use it next time, thanks!
The curse of doing research as an undergrad: publish one paper on topic X, then spend your entire PhD reviewing papers on X. There's a reason I changed topics.
At least one paper in my batch actually looks very interesting though :)
Tomorrow, 11 am, Hadfield Hall - come say hi to Nikita presenting!