Two great works on how we can manipulate style for generative modeling by PiMa!
18.10.2025 08:37
@rmsnorm.bsky.social: PhD Student at Ommer Lab (Stable Diffusion). Trying to understand motion... https://nickstracke.dev
🤔 What happens when you poke a scene — and your model has to predict how the world moves in response?
We built the Flow Poke Transformer (FPT) to model multi-modal scene dynamics from sparse interactions.
It learns to predict the 𝘥𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯 of motion itself 🧵👇
Our method pipeline
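The core idea, predicting a distribution over possible motions instead of one deterministic flow, can be sketched with a toy Gaussian-mixture head. Everything below (the two modes, their weights, means, and variances) is an illustrative assumption, not FPT's actual architecture:

```python
import random

# Toy stand-in for a multi-modal motion predictor: given a sparse "poke"
# (a drag vector at some point), return a 2-component Gaussian mixture
# over the displacement of a query point. Weights, means and stds are
# hand-picked for illustration; a real model would regress them.
def predict_motion_mixture(poke_dx, poke_dy):
    return [
        # (weight, mean displacement, std): the scene follows the poke...
        (0.7, (poke_dx, poke_dy), 0.05),
        # ...or barely moves (e.g. the poked object is held in place).
        (0.3, (0.0, 0.0), 0.01),
    ]

def sample_displacement(mixture, rng):
    # Pick a mode by its weight, then add isotropic Gaussian noise around it.
    r, acc = rng.random(), 0.0
    for weight, (mx, my), std in mixture:
        acc += weight
        if r <= acc:
            return (rng.gauss(mx, std), rng.gauss(my, std))
    return mixture[-1][1]

rng = random.Random(0)
mixture = predict_motion_mixture(0.5, 0.0)
samples = [sample_displacement(mixture, rng) for _ in range(1000)]
# Multi-modality shows up as two clusters: near (0.5, 0) and near (0, 0).
moved = sum(1 for dx, _ in samples if dx > 0.25)
print(f"{moved} of 1000 samples follow the poke")
```

Predicting a full distribution like this is what lets a single poke yield several plausible outcomes (the object moves, or it stays put) rather than an averaged, blurry flow.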
🤔 When combining Vision-language models (VLMs) with Large language models (LLMs), do VLMs benefit from additional genuine semantics or artificial augmentations of the text for downstream tasks?
🤨 Interested? Check out our latest work at #AAAI25:
💻 Code and Paper at: github.com/CompVis/DisCLIP
🧵👇
And thanks for the kind words! :)
09.12.2024 11:29
It was due to a compute constraint at that time. We will update it with numbers run on the complete test set once we release a new version of the paper.
09.12.2024 11:29
We make code and cleaned 🧹 weights available for SD 1.5 and SD 2.1.
Have a look now!
Paper: compvis.github.io/cleandift/st...
💻 Code: github.com/CompVis/clea...
🤗 Hugging Face: huggingface.co/CompVis/clea...
We show you can, with just 30 minutes of task-agnostic finetuning on a single GPU. 🤯
No noise. Better features. Better performance. Across many tasks.
And no timestep-searching headaches!
They need noisy images as input, and the right noise level for each task.
So we have to find the right timestep for every downstream task? 🤯
What if you could ditch all of that?
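For context, the timestep dependence comes from the standard DDPM forward process, x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps: small t keeps most of the clean signal, large t is almost pure noise. A minimal sketch with an assumed linear beta schedule (the schedule constants here are illustrative, not taken from any particular model):

```python
import math
import random

# DDPM forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps,
# with abar_t the cumulative product of (1 - beta_t). The timestep t sets
# how much clean signal survives, which is why different downstream tasks
# prefer features extracted at different t.
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]  # linear schedule

alpha_bars, abar = [], 1.0
for beta in betas:
    abar *= 1.0 - beta
    alpha_bars.append(abar)

def noise_pixel(x0, t, rng):
    """Corrupt a single scalar 'pixel' to noise level t."""
    eps = rng.gauss(0.0, 1.0)
    return math.sqrt(alpha_bars[t]) * x0 + math.sqrt(1.0 - alpha_bars[t]) * eps

rng = random.Random(0)
print(round(math.sqrt(alpha_bars[50]), 3))   # signal scale early: close to 1
print(round(math.sqrt(alpha_bars[950]), 3))  # signal scale late: close to 0
print(noise_pixel(1.0, 950, rng))            # essentially a Gaussian sample
```

Extracting features from x_t therefore trades information destroyed by noise against matching the inputs the model was trained on; the point of the posts above is that a short finetune removes the need for that trade-off.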
This work was co-led by @stefanabaumann.bsky.social and @koljabauer.bsky.social.
✨ Diffusion models are amazing at learning world representations. Their features power many tasks:
• Semantic correspondence
• Depth estimation
• Semantic segmentation
… and more!
But here's the catch ⚡️
🤔 Why do we extract diffusion features from noisy images? Isn't that destroying information?
Yes, it is - but we found a way to do better.
Here's how we unlock better features, no noise, no hassle.
Project Page: compvis.github.io/cleandift
💻 Code: github.com/CompVis/clea...
🧵👇
me right now..
20.11.2024 14:22
Hi, just sharing an updated version of the PyTorch 2 Internals slides: drive.google.com/file/d/18YZV.... Content: basics, JIT, Dynamo, Inductor, the export path, and ExecuTorch. This is focused on internals, so you will need a bit of C/C++. I show how you can export and run a model on a Pixel Watch too.
19.11.2024 11:05
While we're starting up over here, I suppose it's okay to reshare some old content, right?
Here's my lecture from the EEML 2024 summer school in Novi Sad 🇷🇸, where I tried to give an intuitive introduction to diffusion models: youtu.be/9BHQvQlsVdE
Check out other lectures on their channel as well!