I'm thrilled to share that I'll present two first-authored papers at #ICCV2025 🌺 in Honolulu together with @mgui7.bsky.social!
(Thread 🧵👇)
🤔 What happens when you poke a scene, and your model has to predict how the world moves in response?
We built the Flow Poke Transformer (FPT) to model multi-modal scene dynamics from sparse interactions.
It learns to predict the 𝘥𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯 of motion itself 🧵👇
Our method pipeline
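To make the idea above concrete, here is a minimal illustrative sketch of the *interface* such a model exposes: given sparse pokes (location + displacement), return a per-query distribution over 2D motion rather than a single flow vector. The nearest-poke weighting and Gaussian parameterization below are placeholder assumptions for illustration, not the actual FPT architecture:

```python
import numpy as np

def predict_motion_distribution(pokes_xy, pokes_flow, query_xy, sigma0=0.05):
    """Toy stand-in for a poke-conditioned motion model.

    pokes_xy:   (P, 2) poke locations in [0, 1]^2
    pokes_flow: (P, 2) observed displacement at each poke
    query_xy:   (Q, 2) points where we want a motion distribution
    Returns per-query Gaussian parameters (mean, std) over 2D motion;
    queries far from every poke get a broad (uncertain) distribution.
    """
    # Pairwise distances between queries and pokes: (Q, P)
    d = np.linalg.norm(query_xy[:, None, :] - pokes_xy[None, :, :], axis=-1)
    w = np.exp(-d / 0.1)                     # soft nearest-poke weights
    w = w / w.sum(axis=1, keepdims=True)
    mean = w @ pokes_flow                    # (Q, 2) expected motion
    # Uncertainty grows with distance to the closest poke
    std = sigma0 * (1.0 + d.min(axis=1, keepdims=True))  # (Q, 1)
    return mean, std

pokes_xy = np.array([[0.5, 0.5]])
pokes_flow = np.array([[0.2, 0.0]])          # one poke, pushing right
queries = np.array([[0.5, 0.5], [0.9, 0.9]])
mean, std = predict_motion_distribution(pokes_xy, pokes_flow, queries)
```

The point of the sketch: at the poked point the predicted motion is confident, while far away the distribution widens — the model outputs uncertainty over motion, not just a deterministic flow field.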
🤔 When combining vision-language models (VLMs) with large language models (LLMs), do VLMs benefit more from genuine additional semantics or from artificial augmentations of the text on downstream tasks?
🤨 Interested? Check out our latest work at #AAAI25:
💻 Code and 📄 Paper at: github.com/CompVis/DisCLIP
🧵👇
In order to extract features from diffusion models, you have to noise your input and tune the noise level for each downstream task. But isn't there a better way? 🤔
Turns out there is, using our newly proposed feature extraction method CleanDIFT 🧹
Check it out ⬇️
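For context on the "noise your input" step the post refers to: conventional diffusion feature extraction pushes the clean input through the DDPM forward process at some hand-tuned timestep t before feeding it to the U-Net and reading off intermediate activations. A minimal sketch of that noising step, assuming a standard linear beta schedule (illustrative defaults, not CleanDIFT itself, which removes the need for this tuning):

```python
import numpy as np

def noise_input(x0, t, num_steps=1000, beta_start=1e-4, beta_end=0.02, rng=None):
    """DDPM forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.

    x0: clean input array (e.g. an image), t: integer timestep in [0, num_steps).
    The noisy x_t is what a standard diffusion feature extractor would
    feed to the denoising network before grabbing intermediate features.
    """
    rng = np.random.default_rng(rng)
    betas = np.linspace(beta_start, beta_end, num_steps)
    abar = np.cumprod(1.0 - betas)           # cumulative alpha-bar schedule
    eps = rng.standard_normal(x0.shape)      # Gaussian noise
    x_t = np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps
    return x_t

x0 = np.zeros((8, 8))
x_small = noise_input(x0, t=10, rng=0)       # early timestep: little noise
x_large = noise_input(x0, t=900, rng=0)      # late timestep: mostly noise
```

The task-dependent part is choosing t: too little noise and the input is off-distribution for the model, too much and the content being featurized is destroyed.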
Hi, I recently started as an ELLIS PhD student at Björn Ommer's lab. I would be happy to be on the list as well :)
27.11.2024 14:27
After many years, our lab finally has a social media presence at @compvis.bsky.social! 🥳
Give it a follow — we have some amazing research on generative computer vision coming soon!