
Stefan Baumann ✈️ ICCV 2025

@stefanabaumann.bsky.social

PhD Student at @compvis.bsky.social & @ellis.eu working on generative computer vision. Interested in extracting world understanding from models and more controlled generation. 🌐 https://stefan-baumann.eu/

1,264 Followers  |  650 Following  |  79 Posts  |  Joined: 17.11.2024

Latest posts by stefanabaumann.bsky.social on Bluesky

Oh yeah, sorry, I should've made it clearer that I was talking about the more general case

03.10.2025 18:48 — 👍 1    🔁 0    💬 0    📌 0

Take, for example, (zero-shot) semantic correspondence working quite well based on the activations of image diffusion models.

The model has never been trained for it, and, while it's obvious that related capabilities might be useful for denoising, I'd still consider this an emergent capability

03.10.2025 18:45 — 👍 0    🔁 0    💬 0    📌 0
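To make this concrete, here is a minimal sketch of the matching step. It assumes feature maps have already been extracted from an intermediate block of an image diffusion model (e.g. via a forward hook on a UNet up-block); the feature dimensions and the cosine-similarity nearest-neighbour matching are illustrative, not any specific published method.

```python
# Minimal sketch of zero-shot semantic correspondence from diffusion features.
# Assumes `feats_a` and `feats_b` are activations [C, H, W] taken from the same
# intermediate block of an image diffusion model for two different images; the
# extraction itself (forward hook, timestep, layer choice) is model-specific
# and omitted here.
import torch
import torch.nn.functional as F

def match_points(feats_a: torch.Tensor, feats_b: torch.Tensor, points_a: torch.Tensor) -> torch.Tensor:
    """For query points (y, x) on image A's feature grid, return the best-matching
    (y, x) locations on image B's feature grid via cosine similarity."""
    C, Ha, Wa = feats_a.shape
    _, Hb, Wb = feats_b.shape
    # Normalize channel vectors so dot products become cosine similarities.
    fa = F.normalize(feats_a.reshape(C, -1), dim=0)   # [C, Ha*Wa]
    fb = F.normalize(feats_b.reshape(C, -1), dim=0)   # [C, Hb*Wb]
    # Gather the descriptors at the query locations on image A.
    idx = points_a[:, 0] * Wa + points_a[:, 1]        # [N]
    queries = fa[:, idx]                               # [C, N]
    sim = queries.T @ fb                               # [N, Hb*Wb]
    best = sim.argmax(dim=1)
    return torch.stack([best // Wb, best % Wb], dim=1)  # [N, 2] as (y, x)

# Random features standing in for real diffusion activations:
feats_a, feats_b = torch.randn(1280, 32, 32), torch.randn(1280, 32, 32)
points = torch.tensor([[10, 12], [5, 20]])
print(match_points(feats_a, feats_b, points))
```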

Not in the sense of, e.g., generating new kinds of videos when the model was trained for video generation, but capabilities w.r.t. other tasks could still be considered emergent, right?

03.10.2025 18:43 — 👍 0    🔁 0    💬 2    📌 0

Fair :D

18.09.2025 15:21 — 👍 0    🔁 0    💬 0    📌 0

First time I ever hear someone from the 3D CV community actually say this out loud! This has been bugging me for a long time

18.09.2025 14:48 — 👍 3    🔁 0    💬 1    📌 0

Ah, makes sense :)

11.09.2025 13:18 — 👍 0    🔁 0    💬 0    📌 0

Why are you not on a current stable version?

11.09.2025 11:59 — 👍 0    🔁 0    💬 1    📌 0

The bugs I ran into reproduce across 2.7, 2.8 and current nightlies

11.09.2025 11:35 — 👍 1    🔁 0    💬 2    📌 0

Welcome to the club! I've somehow managed to find two bugs with torch.compile() in the last few days 🥲

10.09.2025 23:26 — 👍 1    🔁 0    💬 1    📌 0
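For context, this is the kind of minimal check such bugs usually get reduced to before reporting; the function below is a placeholder, not one of the actual bugs from this thread.

```python
# Generic pattern for sanity-checking torch.compile() against eager execution.
import torch

def fn(x: torch.Tensor) -> torch.Tensor:
    # Stand-in for whatever computation misbehaves under compilation.
    return torch.sin(x) * x.cumsum(dim=-1)

compiled_fn = torch.compile(fn)

x = torch.randn(4, 16)
eager_out = fn(x)
compiled_out = compiled_fn(x)

# If this fails (or compilation itself raises), you have a minimal repro to report.
torch.testing.assert_close(eager_out, compiled_out)
print("eager and compiled outputs match")
```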
Post image

"Everyone knows" what an autoencoder is… but there's an important complementary picture missing from most introductory material.

In short: we emphasize how autoencoders are implemented, but not always what they represent (and some of the implications of that representation). 🧵

06.09.2025 21:20 — 👍 68    🔁 10    💬 2    📌 1
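The "how it's implemented" picture the post refers to is roughly the sketch below (layer sizes and architecture are arbitrary placeholders); the thread's point is that this code by itself says little about what the learned latent actually represents.

```python
# The standard implementation-centric picture of an autoencoder: an encoder
# mapping data to a lower-dimensional latent and a decoder mapping back,
# trained with a reconstruction loss. Dimensions here are placeholders.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, data_dim: int = 784, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(data_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.randn(8, 784)
loss = nn.functional.mse_loss(model(x), x)  # reconstruction objective
loss.backward()
```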

That process really sounds like a labor of love! Penrose looks really interesting, I'll play around with it! Thanks!

31.08.2025 16:47 — 👍 2    🔁 0    💬 0    📌 0

Any tips on how to create such nice figures?

Specifically, I can judge whether a figure is nice, but I struggle to see how to get there. That, combined with using tools like TikZ, where I have to change most settings to get something decent-looking, makes it quite hard for me to get great results.

31.08.2025 12:17 — 👍 1    🔁 0    💬 1    📌 0

Especially in fields with alphabetical ordering 😇

20.08.2025 11:29 — 👍 0    🔁 0    💬 1    📌 0

I've never gotten that far, but I've had genuinely good experiment suggestions in at least 5 reviews

17.08.2025 10:37 — 👍 2    🔁 0    💬 0    📌 0

Even as an author, I have actually experienced this from multiple reviewers by now. In the moment it's annoying, of course, but in the long term it can make the paper better. I'd say around 10-15% of the reviews I've gotten were actually really good and constructive in a useful way

16.08.2025 17:23 — 👍 2    🔁 0    💬 1    📌 0

One thing I forgot: sometimes reviewers think of great experiments that significantly strengthen the paper. I had one amazing reviewer at CVPR who suggested multiple, and I'd like to think that some of the additional evals I suggest also fall into that category. We should make sure we retain that

16.08.2025 13:16 — 👍 3    🔁 0    💬 1    📌 0

Worst of both worlds: requesting multiple substantial additional experiments and then rejecting the paper based on "insufficient experimental evaluation" after the authors somehow managed to perform all the requested experiments

16.08.2025 10:14 — 👍 3    🔁 0    💬 1    📌 0

The question is which is better: rejecting a paper outright because of insufficient experiments, or hoping that the authors somehow manage to put together the experiments in the short timespan available?

16.08.2025 10:12 — 👍 3    🔁 0    💬 2    📌 0

Happy Birthday Kosta!

15.08.2025 15:01 — 👍 1    🔁 0    💬 0    📌 0

Conference hotels just seem to be significantly more expensive than other reasonable alternatives, even at the discounted rates

10.08.2025 12:16 — 👍 4    🔁 0    💬 1    📌 0

Same for us in my lab (and likely the rest of German universities): we can book mostly whatever, we just have to show it's the cheapest reasonable option available (unless it's below a very low location-specific threshold for things like hotel rooms).

10.08.2025 12:15 — 👍 3    🔁 0    💬 1    📌 0

Congrats!

06.08.2025 11:28 — 👍 1    🔁 0    💬 0    📌 0

Did you also happen to participate in creating LLM preference annotations?

05.08.2025 06:39 — 👍 2    🔁 0    💬 1    📌 0

As an author, I honestly prefer forum-style comments over one-page rebuttals (as long as we get some way to include figures). As a reviewer, I prefer a single page

01.08.2025 11:08 — 👍 3    🔁 0    💬 0    📌 0
Post image

tl;dr: do importance weighting/sampling on a sequence level, not a token level.
Makes everything behave much better (see below) and makes more sense from a theoretical perspective, too.

Paper: www.arxiv.org/abs/2507.18071

26.07.2025 19:43 — 👍 3    🔁 0    💬 0    📌 0
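A rough illustration of the difference (not a faithful reimplementation of GSPO): token-level importance ratios versus a single, length-normalized sequence-level ratio, both computed from per-token log-probs. Tensor shapes and the padding mask are assumptions made for the sketch.

```python
# Token-level vs. sequence-level importance ratios from per-token log-probs of
# the current and old policies.
import torch

def token_level_ratios(logp_new, logp_old, mask):
    # One importance ratio per token, as in PPO/GRPO-style objectives.
    return torch.exp((logp_new - logp_old) * mask)

def sequence_level_ratio(logp_new, logp_old, mask):
    # A single ratio per sequence: per-token log-ratios are averaged over the
    # response length before exponentiating (length-normalized ratio).
    log_ratio = ((logp_new - logp_old) * mask).sum(dim=-1) / mask.sum(dim=-1)
    return torch.exp(log_ratio)

# Toy example: batch of 2 responses, up to 5 tokens, with padding masked out.
logp_new = torch.randn(2, 5)
logp_old = torch.randn(2, 5)
mask = torch.tensor([[1, 1, 1, 1, 0], [1, 1, 1, 0, 0]], dtype=torch.float)
print(token_level_ratios(logp_new, logp_old, mask).shape)    # [2, 5]
print(sequence_level_ratio(logp_new, logp_old, mask).shape)  # [2]
```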
Post image

I'm calling it now, GSPO will be the next big hype in LLM RL algos after GRPO.

It makes so much more sense intuitively to work on a sequence rather than on a token level when our rewards are on a sequence level.

26.07.2025 19:40 — 👍 4    🔁 0    💬 1    📌 0
Post image

Absolutely 100% this. Who would want to read papers like VGG-T

25.07.2025 09:41 — 👍 3    🔁 0    💬 1    📌 0
Preview
Genie: Generative Interactive Environments We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-c...

Genie did this in a really cool manner: arxiv.org/abs/2402.15391

03.07.2025 16:11 — 👍 2    🔁 0    💬 0    📌 0

I don't think the implicit assumptions are likely to be problematic, as long as the frequency range is reasonable. Keep in mind that we add an MLP afterwards that can freely learn to modulate the model with different frequencies

25.06.2025 12:16 — 👍 0    🔁 0    💬 1    📌 0
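For reference, the pattern being discussed is roughly a fixed sinusoidal embedding over some frequency range followed by a learned MLP; the dimensions and maximum period below are illustrative placeholders, not the specific setup from the conversation.

```python
# Fixed sinusoidal embedding over a frequency range, followed by a learned MLP
# that can re-weight and remix those frequencies before conditioning the model.
import math
import torch
import torch.nn as nn

def sinusoidal_embedding(t: torch.Tensor, dim: int = 256, max_period: float = 10_000.0) -> torch.Tensor:
    half = dim // 2
    # Geometrically spaced frequencies between 1 and 1/max_period.
    freqs = torch.exp(-math.log(max_period) * torch.arange(half, dtype=torch.float32) / half)
    args = t[:, None].float() * freqs[None, :]
    return torch.cat([torch.cos(args), torch.sin(args)], dim=-1)  # [B, dim]

embed_mlp = nn.Sequential(nn.Linear(256, 1024), nn.SiLU(), nn.Linear(1024, 1024))

t = torch.randint(0, 1000, (8,))
cond = embed_mlp(sinusoidal_embedding(t))  # conditioning vector fed to the model
print(cond.shape)  # torch.Size([8, 1024])
```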

approaches, but I don't think I've seen this in public yet. I considered doing it a while ago, but I never found a good justification to spend the time carefully ablating something like this. It might lead to some cool interpretable insights into the model's behavior across time though (3/3)

25.06.2025 07:02 — 👍 0    🔁 0    💬 1    📌 0
