Can Generative Video Models Help Pose Estimation?
Yes! We find that off-the-shelf generative video models can hallucinate plausible intermediate frames that provide useful context for pose estimators (e.g. DUSt3R), especially for images with little t...
Project page: inter-pose.github.io
Paper: arxiv.org/abs/2412.16155
Many thanks to the amazing team: Jason Y. Zhang (@jasonyzhang.bsky.social), Philipp Henzler, Zhengqi Li (@zhengqili.bsky.social), Noah Snavely (@snavely.bsky.social), and Ricardo Martin-Brualla.
23.12.2024 17:44
This also applies to MASt3R. While MASt3R excels with overlapping pairs via feature matching, it struggles with non-overlapping ones due to unreliable correspondences. InterPose maintains robustness, outperforming MASt3R on outward-facing and matching it on center-facing datasets.
23.12.2024 17:44
We show that InterPose generalizes across 3 SOTA video models (DynamiCrafter, Runway Gen-3, Luma Dream Machine) and consistently outperforms DUSt3R on 4 diverse datasets (indoor, outdoor, object) using our new benchmark, which selects challenging pairs with little to no overlap.
23.12.2024 17:44
Challenge: Generated videos may contain visual artifacts or implausible motion.
Solution: We generate multiple videos and use a self-consistency metric to select the most visually consistent sample.
23.12.2024 17:44
Motivation: Powerful Visual Priors. Video models are pre-trained on vast web-scale video data, so they learn significantly stronger priors of the visual world than 3D models like DUSt3R, which require 3D datasets.
23.12.2024 17:44
Can Generative Video Models Help Pose Estimation?
Yes!
We find that generative video models can hallucinate plausible intermediate frames that provide useful context for pose estimators (e.g. DUSt3R), especially for images with little to no overlap.
inter-pose.github.io
23.12.2024 17:44
Introducing Doppelgangers++! An enhanced pairwise image classifier that tackles visual aliasing (doppelgangers) to improve 3D reconstruction accuracy across diverse, real-world scenes.
Project page: bit.ly/3VAPMJc. Code is also available.
11.12.2024 02:40
PhD student @ Stanford University. MS @ ETH Zurich.
3D Vision and Generation.
https://www.zhuliyuan.net/
PhD Student at Westlake University. 3D/4D Reconstruction, Virtual Humans.
fanegg.github.io
Assistant Professor @ Westlake University. PhD @ MPI-IS. Focusing on democratizing human-centric digitization.
CS PhD student at @Cornell
Computer vision, graphics and Machine Learning
gemmechu.github.io
Professor, University of Tübingen @unituebingen.bsky.social.
Head of Department of Computer Science.
Faculty, Tübingen AI Center @tuebingen-ai.bsky.social.
ELLIS Fellow, Founding Board Member @ellis.eu.
CV, ML, Self-Driving, NLP
www.dgp.toronto.edu/~hertzman
Yang Research Scholar | Kanwisher Lab | McGovern Institute for Brain Research | MIT BCS
Researcher in generative 3D AI @ Google. PhD from UCL.
PhD student at UC Berkeley. MIT EECS BS '20 & MEng '21. I research 3D computer vision for AR, robotics, and social good. See http://ethanweber.me for more info.
PhD student at MIT. Previously at Cornell.
Originally from Queens.
Computer vision, graphics, and machine learning
echen01.github.io
Researcher in machine learning and computer vision for science. Senior Group Leader at HHMI Janelia Research Campus. Supporter of DEIB in science and tech. CV: https://bit.ly/BransonCV
Geometric deep learning + Computer vision
PhD student at Dyson Robotics Lab, Imperial College London
http://edexheim.github.io
Research Scientist Intern @ NVIDIA
PhD Student @ CVG, ETH Zurich
Prev. @ Meta Reality Labs, UC Berkeley, RWTH
Dynamic 3D Vision, NeRFs, 3DGS
tobiasfshr.github.io
LM/NLP/ML researcher ¯\_(ツ)_/¯
yoavartzi.com / associate professor @ Cornell CS + Cornell Tech campus @ NYC / nlp.cornell.edu / associate faculty director @ arXiv.org / researcher @ ASAPP / starting @colmweb.org / building RecNet.io
World Labs. Former research scientist at Google. Ph.D UWCSE.
San Francisco. keunhong.com