CleanDIFT: Diffusion Features without Noise
CleanDIFT enables extracting noise-free, timestep-independent diffusion features.
🧹 CleanDIFT: Diffusion Features without Noise
@rmsnorm.bsky.social*, @stefanabaumann.bsky.social*, @koljabauer.bsky.social*, @frankfundel.bsky.social, Björn Ommer
Oral Session 1C (Davidson Ballroom): Friday 9:00
Poster Session 1 (ExHall D): Friday 10:30-12:30, #218
compvis.github.io/cleandift/
09.06.2025 07:58
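For context on what the post above announces: conventional DIFT-style diffusion features require noising the image at a hand-picked timestep t before the U-Net pass, which is what CleanDIFT removes. Below is a minimal sketch of that conventional readout; the model id, the hooked up-block, and t=261 (the value from the DIFT paper) are illustrative assumptions, not CleanDIFT's released code.

```python
# Minimal DIFT-style diffusion feature readout, i.e. the baseline that
# CleanDIFT improves on. Assumptions: Stable Diffusion 2.1 via diffusers,
# features hooked from one U-Net up-block, and the DIFT paper's t=261.
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

features = {}

def save_hook(module, inputs, output):
    features["feat"] = output  # spatial feature map from this U-Net block

# Which block yields the best features is task-dependent; up_blocks[1] is a common pick.
handle = pipe.unet.up_blocks[1].register_forward_hook(save_hook)

@torch.no_grad()
def dift_features(image, t=261, prompt=""):
    """image: (1, 3, H, W) tensor in [-1, 1]."""
    latents = pipe.vae.encode(image.to(device, pipe.vae.dtype)).latent_dist.mode()
    latents = latents * pipe.vae.config.scaling_factor
    timesteps = torch.tensor([t], device=device)
    # The DIFT recipe must noise the input at timestep t before the U-Net pass;
    # this noising (and the sweep over t) is exactly what CleanDIFT eliminates.
    noisy = pipe.scheduler.add_noise(latents, torch.randn_like(latents), timesteps)
    prompt_embeds, _ = pipe.encode_prompt(prompt, device, 1, False)
    pipe.unet(noisy, timesteps, encoder_hidden_states=prompt_embeds)
    return features["feat"]
```

CleanDIFT's point is that this noising degrades the features and forces a per-task timestep sweep; its adapted backbone instead consumes the clean image directly, with no timestep to tune.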
Our paper is accepted at WACV 2025! 🤗
Check out DistillDIFT. Code & weights are now public:
🔗 github.com/compvis/dist...
06.12.2024 14:35
🔥 We achieve SOTA in unsupervised & weakly-supervised semantic correspondence at just a fraction of the computational cost.
06.12.2024 14:35
✨ By training just a tiny LoRA adapter, we transfer the power of a large diffusion model (SDXL Turbo) into a small ViT (DINOv2).
🔍 All done unsupervised by retrieving pairs of similar images.
06.12.2024 14:35
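A minimal sketch of the recipe the post above describes: DINOv2 stays frozen, small LoRA factors are added to each attention's qkv projection, and the student's patch tokens are regressed toward diffusion-teacher features. The teacher readout, the 1280-dim projection, and all hyperparameters are illustrative assumptions (the pair-retrieval step is omitted); this is not the released DistillDIFT training code.

```python
# Sketch: LoRA-adapter distillation of diffusion features into a frozen DINOv2.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """Frozen base linear plus a trainable low-rank update: W x + (alpha/r) B A x."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero-init: update starts at 0
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * F.linear(F.linear(x, self.A), self.B)

student = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
for p in student.parameters():
    p.requires_grad_(False)
for block in student.blocks:  # patch every attention qkv with a LoRA adapter
    block.attn.qkv = LoRALinear(block.attn.qkv)

def teacher_features(images):
    """Hypothetical dense readout from SDXL Turbo, returning (B, 1280, h, w);
    see the diffusion-feature sketch earlier on this page."""
    raise NotImplementedError

proj = nn.Linear(384, 1280)  # map ViT-S/14 tokens to the assumed teacher width
trainable = [p for p in student.parameters() if p.requires_grad] + list(proj.parameters())
opt = torch.optim.AdamW(trainable, lr=1e-4)

def distill_step(images):
    """images: (B, 3, H, W), H and W divisible by 14, ImageNet-normalized."""
    tokens = student.forward_features(images)["x_norm_patchtokens"]  # (B, N, 384)
    g = int(tokens.shape[1] ** 0.5)  # assume a square patch grid
    s = proj(tokens)
    with torch.no_grad():
        t = teacher_features(images)  # frozen teacher
        t = F.interpolate(t, size=(g, g), mode="bilinear", align_corners=False)
        t = t.flatten(2).transpose(1, 2)  # (B, N, 1280)
    loss = 1.0 - F.cosine_similarity(s, t, dim=-1).mean()  # feature alignment
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

Because only the LoRA factors and the projection head receive gradients, the trainable parameter count stays tiny compared to either foundation model.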
🚀 Meet DistillDIFT:
It distills the power of two vision foundation models into one streamlined model, achieving SOTA performance at a fraction of the computational cost.
No need for bulky generative combos; just pure efficiency. 💡
06.12.2024 14:35
This work was co-led by: @joh-schb.bsky.social @vtaohu.bsky.social
📷 Project Page: compvis.github.io/distilldift
💻 Code: github.com/compvis/dist...
📄 Paper: arxiv.org/abs/2412.03512
06.12.2024 14:35
Did you know you can distill the capabilities of a large diffusion model into a small ViT? ⚗️
We showed exactly that for a fundamental task:
semantic correspondence
A thread 🧵👇
06.12.2024 14:35
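For readers new to the task: semantic correspondence asks, given a point in a source image, where the semantically matching point lies in a target image. With dense features (DIFT-, DistillDIFT-, or DINOv2-style), a standard readout is nearest-neighbor matching under cosine similarity. A minimal sketch, with deliberately simplified resolution handling:

```python
# Nearest-neighbor semantic correspondence on top of dense features: the match
# for a source query point is the target location whose descriptor has the
# highest cosine similarity with the query descriptor.
import torch
import torch.nn.functional as F

def match_point(feat_src, feat_tgt, xy, image_size):
    """feat_src, feat_tgt: (C, h, w) feature maps from the same extractor.
    xy: (x, y) pixel coordinates of the query in the source image.
    image_size: (H, W) pixel size of both images (assumed equal here)."""
    H, W = image_size
    C, h, w = feat_src.shape
    # Look up the source descriptor at the feature cell covering the query pixel.
    fx = min(int(xy[0] * w / W), w - 1)
    fy = min(int(xy[1] * h / H), h - 1)
    q = F.normalize(feat_src[:, fy, fx], dim=0)        # (C,)
    tgt = F.normalize(feat_tgt.flatten(1), dim=0)      # (C, h*w), unit columns
    idx = (q @ tgt).argmax().item()                    # best cosine similarity
    ty, tx = divmod(idx, w)
    # Map the winning feature cell back to pixel coordinates (cell centers).
    return ((tx + 0.5) * W / w, (ty + 0.5) * H / h)
```

For example, match_point(feat_a, feat_b, (120, 88), (480, 480)) returns the predicted (x, y) in the target image; PCK-style benchmarks then check whether that prediction falls within a threshold of the ground-truth keypoint.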