History-Guided Video Diffusion
Classifier-free guidance (CFG) is a key technique for improving conditional generation in diffusion models, enabling more accurate control while enhancing sample quality. It is natural to extend this ...
For more information, please visit our paper arxiv.org/abs/2502.06764 and project website boyuan.space/history-guidance. All credit goes to my students Kiwhan Song (still in his undergrad!) and Boyuan Chen, as well as awesome collaborators Yilun Du, Max Simchowitz, and Russ Tedrake. (7/7)
11.02.2025 20:37
We show that DFoT alone is already a competitive model, matching or beating industry SOTA models trained with far more compute than ours. Together with HG, it can stably roll out very long videos, stay robust to out-of-distribution context, and stitch sub-trajectories (6/7)
11.02.2025 20:37
DFoT enables History Guidance (HG), a family of history-conditioned guidance methods that composes diffusion scores from different histories. From its simplest form to its most advanced variant, HG significantly enhances video diffusion and unlocks new abilities. (5/7)
11.02.2025 20:37
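To make "composes diffusion scores from different histories" concrete, here is a minimal sketch in the style of classifier-free guidance. The function name `compose_history_scores`, the weighted-sum form, and the list-of-floats score representation are assumptions for illustration; the exact formulation is in the paper.

```python
def compose_history_scores(uncond, conditional, weights):
    """Combine scores computed under different histories (assumed CFG-style form).

    uncond:      score with the history fully masked out (list of floats)
    conditional: one score per history variant (list of lists of floats)
    weights:     one guidance weight per history variant
    """
    if len(conditional) != len(weights):
        raise ValueError("need exactly one weight per conditional score")
    combined = list(uncond)
    for score, weight in zip(conditional, weights):
        for j, (s, u) in enumerate(zip(score, uncond)):
            # Push the result away from the unconditional score,
            # in the direction suggested by this particular history.
            combined[j] += weight * (s - u)
    return combined
```

With a single history and weight 1.0 this reduces to plain conditional sampling; weights above 1.0 strengthen the influence of that history.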
Unlike previous methods, DFoT views history or target alike as tokens of different noise levels. DFoT trains diffusion with varying noise levels per frame. To conditionally sample, one simply masks out a portion of history with noise before computing the diffusion score. (4/7)
11.02.2025 20:37
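A rough sketch of the per-frame noise-level idea, using only the Python standard library. All function names are hypothetical, noise levels are assumed to lie in [0, 1] with a simple variance-preserving schedule, and "masking with noise" is read here as assigning the maximum noise level; none of this is taken from the released code.

```python
import math
import random

def sample_training_noise_levels(num_frames, rng):
    """Training: draw an independent noise level in [0, 1) for every frame."""
    return [rng.random() for _ in range(num_frames)]

def sampling_noise_levels(num_frames, num_history, keep_history, target_level):
    """Sampling: choose per-frame noise levels for conditional generation.

    Kept history frames stay clean (0.0), masked-out history frames are
    treated as pure noise (1.0), and target frames sit at target_level.
    """
    levels = []
    for i in range(num_frames):
        if i < num_history:
            levels.append(0.0 if i in keep_history else 1.0)
        else:
            levels.append(target_level)
    return levels

def add_noise(frame, level, rng):
    """Noise one frame (a list of floats) under the assumed schedule."""
    signal = math.sqrt(1.0 - level)
    sigma = math.sqrt(level)
    return [signal * x + sigma * rng.gauss(0.0, 1.0) for x in frame]
```

For example, `sampling_noise_levels(6, 4, {2, 3}, 0.5)` conditions on only the last two of four history frames while denoising two target frames at level 0.5.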
Can we train a single model to perform conditional diffusion with different portions of history - variable lengths, subsets of frames, and even different image-domain frequencies? Introducing DFoT, a simple yet flexible add-on that requires no architectural changes. (3/7)
11.02.2025 20:37
Classifier-free Guidance (CFG) has been widely used by video diffusion models to boost sample quality. However, researchers rarely perform CFG beyond the first frame. Our paper finds that an equally important conditioning variable, the history, is the long-ignored key. (2/7)
11.02.2025 20:37
Announcing Diffusion Forcing Transformer (DFoT), our new video diffusion algorithm that generates ultra-long videos of 800+ frames. DFoT enables History Guidance, a simple add-on to any existing video diffusion model for a quality boost. Website: boyuan.space/history-guidance (1/7)
11.02.2025 20:37
Cool!
11.01.2025 11:31
hey everyone - I am now also active here and excited about computer vision and machine learning stuff.
08.01.2025 14:07
Friends in industry: As 2024 comes to a close, if your budget has room, consider joining the sponsors of the Summer Geometry Initiative (SGI)! SGI is entering year 5 with a proven track record bringing a diverse set of brilliant students into graphics/vision/ML/math research.
18.12.2024 15:59
Was great chatting with your students, cool work!!
12.12.2024 16:45
Wow, indeed!!
12.12.2024 16:44
Hi NeurIPS crowd! Meet Boyuan Chen and me at the Diffusion Forcing poster today at 11 am, East Exhibit Hall A-C #2701! Concurrently, @justinmsolomon.bsky.social is jumping in for our student Artem to present "Score Distillation via Reparametrized DDIM" at #2402! - Artem had visa issues :(
12.12.2024 16:42
If you are looking to do a PhD on inverse graphics, 3D computer vision, differentiable rendering, etc, please apply to Ayush's lab at the University of Cambridge! He is brilliant, very patient, and a kind human :)
02.12.2024 15:31
Wow, what a warm welcome! Thanks, Kosta
23.11.2024 00:58
Assistant Professor at UC Berkeley
I am a Research Scientist at Google Zurich working on 3d vision (https://m-niemeyer.github.io/)
Professor, University of Tübingen @unituebingen.bsky.social.
Head of Department of Computer Science.
Faculty, Tübingen AI Center @tuebingen-ai.bsky.social.
ELLIS Fellow, Founding Board Member @ellis.eu.
CV, ML, Self-Driving, NLP
Assistant Professor @uchicago @uchicagocs. PhD from @TelAvivUni. Interested in computer graphics, machine learning, & computer vision
AI professor at Caltech. General Chair ICLR 2025.
http://www.yisongyue.com
Incoming Assistant Professor at the University of Cambridge
https://ayushtewari.com/
Artist, Prof. of Engineering @UCBerkeley, Chief Scientist, @AmbiRobotics & @JacobiRobotics. Interested in robots, rockets, redwoods, rebels.
Research on AI and biodiversity
Asst Prof at MIT CSAIL,
AI for Conservation slack and CV4Ecology founder
#QueerInAI
Google DeepMind. PhD from ETH Zurich & MPI-IS.
https://pengsongyou.github.io/
Probabilistic ML researcher at Google Deepmind
Assistant Professor at the University of Cambridge @eng.cam.ac.uk, working on 3D computer vision and inverse graphics, previously postdoc at Stanford and PhD at Oxford @oxford-vgg.bsky.social
https://elliottwu.com/
Assistant Professor of Computer Graphics and Geometry Processing at Columbia University www.silviasellan.com
Bot. I daily tweet progress towards machine learning and computer vision conference deadlines. Maintained by @chriswolfvision.bsky.social
Blog: https://argmin.substack.com/
Webpage: https://people.eecs.berkeley.edu/~brecht/
Working towards the safe development of AI for the benefit of all at Université de Montréal, LawZero and Mila.
A.M. Turing Award Recipient and most-cited AI researcher.
https://lawzero.org/en
https://yoshuabengio.org/profile/
Associate Professor of Machine Learning, University of Oxford;
OATML Group Leader;
Director of Research at the UK government's AI Safety Institute (formerly UK Taskforce on Frontier AI)
Director Data Science Institute @UWMadison, Professor of Physics,
EiC @MLSTjournal. Physics, stats/ML/AI, open science.