Congrats!
06.08.2025 11:28

@stefanabaumann.bsky.social
PhD Student at @compvis.bsky.social & @ellis.eu working on generative computer vision. Interested in extracting world understanding from models and more controlled generation. https://stefan-baumann.eu/
Did you also happen to participate in creating LLM preference annotations?
05.08.2025 06:39

As an author, I honestly prefer forum-style comments over one-page rebuttals (as long as we get some way to include figures). As a reviewer, I prefer a single page.
01.08.2025 11:08

tl;dr: do importance weighting/sampling on a sequence level, not a token level.
Makes everything behave much better (see below) and makes more sense from a theoretical perspective, too.
Paper: www.arxiv.org/abs/2507.18071
I'm calling it now, GSPO will be the next big hype in LLM RL algos after GRPO.
It makes so much more sense intuitively to work on a sequence rather than on a token level when our rewards are on a sequence level.
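A minimal numpy sketch of the distinction (the function names and the two-token example are mine, not from the paper): token-level weighting gives each token its own importance ratio, while the GSPO-style sequence-level ratio is the length-normalized product of those ratios, computed in log space for stability.

```python
import numpy as np

def token_level_ratios(logp_new, logp_old):
    """Per-token importance ratios, GRPO/PPO-style: each token gets
    its own ratio pi_new(y_t | ...) / pi_old(y_t | ...)."""
    return np.exp(logp_new - logp_old)

def sequence_level_ratio(logp_new, logp_old):
    """Length-normalized sequence-level ratio, GSPO-style:
    s = (pi_new(y|x) / pi_old(y|x)) ** (1 / |y|),
    i.e. the geometric mean of the per-token ratios."""
    return np.exp(np.mean(logp_new - logp_old))
```

With per-token log-probs `log([0.5, 0.5])` vs `log([0.25, 1.0])`, the token-level ratios are `[2.0, 0.5]` while the sequence-level ratio is exactly 1.0: the individually noisy per-token ratios cancel at the sequence level, which matches the intuition that the reward only sees whole sequences.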
Absolutely 100% this. Who would want to read papers like VGGT?
25.07.2025 09:41

Genie did this in a really cool manner: arxiv.org/abs/2402.15391
03.07.2025 16:11

I don't think the implicit assumptions are likely to be problematic, as long as the frequency range is reasonable. Keep in mind that we add an MLP afterwards that can freely learn to modulate the model with different frequencies.
25.06.2025 12:16

approaches, but I don't think I've seen this in public yet. I considered doing it a while ago, but I never found a good justification to spend the time carefully ablating something like this. It might lead to some cool interpretable insights into the model's behavior across time, though. (3/3)
25.06.2025 07:02

So, just interpolating between two vectors wouldn't represent that too well, likely. Similarly, interpolating between N vectors for a small N might not be nicely aligned with the learned behavior. For a somewhat large N, this should work quite well and might be more efficient than the current (2/n)
25.06.2025 07:01

If I understand your suggestion correctly, you're proposing to forgo Fourier embeddings completely though, just replacing them with an interpolation between two vectors. That should also work, but, imho, you can identify at least three somewhat distinct phases in diffusion sampling (1/n)
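For illustration, interpolating between N learned vectors instead of using Fourier features could look like this (a hypothetical sketch of the idea discussed above, not an implementation from any model; the function name and the fixed table are mine):

```python
import numpy as np

def interpolated_embedding(t, table):
    """Timestep embedding via piecewise-linear interpolation between
    N learned vectors (the rows of `table`), for t in [0, 1].

    With N rows there are N-1 evenly spaced segments; for small N this
    can only represent coarse phase changes over the sampling trajectory,
    which is why a somewhat large N would be needed in practice.
    """
    n = table.shape[0]
    x = t * (n - 1)                 # position in [0, n-1]
    i = min(int(np.floor(x)), n - 2)  # index of the left anchor vector
    w = x - i                       # blend weight toward the right anchor
    return (1 - w) * table[i] + w * table[i + 1]
```

In a real model the rows of `table` would be learnable parameters, trained jointly with the rest of the network.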
25.06.2025 06:59

Random Fourier projections are totally fine for timestep embeddings -- HDiT and some others use them -- but you still have to get the frequency range right. If your variance is too high, you're gonna end up with the same problem; if it's too small, you'll have actual problems.
25.06.2025 06:58

For the Fourier embedding of the timestep, most models range from slightly to severely suboptimal when considering the utilization of the embedding's range. Simo Ryu posted an analysis of this a while ago: x.com/cloneofsimo/...
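A minimal sketch of a random Fourier timestep embedding, to make the frequency-range point concrete (names and default values are mine, not from HDiT or any specific codebase):

```python
import numpy as np

def random_fourier_embedding(t, dim=256, scale=16.0, seed=0):
    """Random Fourier features for a diffusion timestep t in [0, 1].

    The frequencies are drawn once and kept fixed (not learned); `scale`
    is their standard deviation. Too large, and most features oscillate
    faster than neighboring timesteps can resolve; too small, and nearby
    timesteps become nearly indistinguishable in the embedding.
    """
    rng = np.random.default_rng(seed)
    freqs = rng.normal(0.0, scale, size=dim // 2)
    angles = 2.0 * np.pi * freqs * t
    # An MLP is typically applied on top of this embedding to modulate
    # the network, so it can reweight the frequencies it actually needs.
    return np.concatenate([np.sin(angles), np.cos(angles)])
```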
25.06.2025 06:55

I guess the question is how much time is worth investing for all these sessions and when to make the harsh decision that a presentation isn't good enough. I feel the current system likely leads to improvements across the board, but definitely didn't make every presentation perfect.
19.06.2025 22:57

I should note that for us, the general feedback was that the presentation was good enough in general, but could be improved. We still needed the full 30min to get good use out of the coaching. For presentations that start out somewhat problematic, I imagine 30min might be too tight.
19.06.2025 22:51

For our presentation, we presented our talk to the coach once and then got a bunch of incredibly useful and, most importantly, constructive feedback. We then used that to create an improved second version.
19.06.2025 22:48

Excited to share that our lab has three papers accepted at CVPR 2025!
Come say hi in Nashville!
Johannes, Ming, Kolja, Stefan, and Björn will be attending.
If you are interested, feel free to check the paper (arxiv.org/abs/2506.02221) or come by at CVPR:
Poster Session 6, Sunday 4:00 to 6:00 PM, Poster #208
Here's the third and final part of Slater Stich's "History of diffusion" interview series!
The other two interviewees' research played a pivotal role in the rise of diffusion models, whereas I just like to yap about them. This was a wonderful opportunity to do exactly that!
#KostasThoughts: Another major conference review drop is around the corner. In baseball, a .300 average is elite. In research, it's a familiar reality: submitting to top conferences means rejections happen. Keep swinging!
07.05.2025 18:16

Awesome, I'll look forward to it!
29.04.2025 15:05

Yeah, your EDM2 results look great, both qualitatively and especially quantitatively!
I'd be really interested to see whether it scales to such big models with all of their complications like CFG, as it really messes up the relation with the diffusion loss.
I am very happy to share our latest work on the information theory of generative diffusion:
"Entropic Time Schedulers for Generative Diffusion Models"
We find that the conditional entropy offers a natural data-dependent notion of time during generation
Link: arxiv.org/abs/2504.13612
I always wanted to do this, thank you for making it a reality! Will this also work with sampling tricks such as CFG? Also, have you tried this on larger-scale models such as T2I ones (e.g., FLUX)?
29.04.2025 14:43

New blog post: let's talk about latents!
sander.ai/2025/04/15/l...
And the CVPR oral decisions are out! (on OpenReview)
04.04.2025 15:25

Ah sorry, it seems I didn't read the thread closely enough.
20.03.2025 13:01

Or we just make the model classify points as sky or not and, in the case of them being sky, ignore the predicted depth - like some papers in depth estimation have been doing. Then everything is still neatly in one model, but you don't have the problem of having to threshold confidences etc.
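The sky-head idea above can be sketched in a few lines (a hypothetical post-processing step; the function name and the logit convention are mine, not from any particular depth paper):

```python
import numpy as np

def masked_depth(depth, sky_logits, ignore_value=np.inf):
    """Gate a predicted depth map with a sky/non-sky classification head.

    Pixels the head classifies as sky get a sentinel 'ignore' depth,
    so no confidence thresholding is needed downstream: the single
    model outputs both depth and the sky mask.
    """
    sky = sky_logits > 0.0  # equivalent to sigmoid(logit) > 0.5
    out = depth.copy()
    out[sky] = ignore_value
    return out
```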
20.03.2025 11:48

Introducing VGGT (CVPR'25), a feedforward Transformer that directly infers all key 3D attributes from one, a few, or hundreds of images, in seconds!
Project Page: vgg-t.github.io
Code & Weights: github.com/facebookrese...
Oh, interesting. I wasn't aware of this. Thank you, I'm gonna take a look at this!
11.03.2025 17:33