
Ruojin Cai

@ruojin.bsky.social

PhD Student at Cornell CS https://www.cs.cornell.edu/~ruojin/

53 Followers  |  61 Following  |  6 Posts  |  Joined: 10.12.2024  |  1.6346

Latest posts by ruojin.bsky.social on Bluesky


Project page: inter-pose.github.io
Paper: arxiv.org/abs/2412.16155

Many thanks to the amazing team: Jason Y. Zhang (@jasonyzhang.bsky.social), Philipp Henzler, Zhengqi Li (@zhengqili.bsky.social), Noah Snavely (@snavely.bsky.social), and Ricardo Martin-Brualla.

23.12.2024 17:44 | 👍 0    🔁 0    💬 0    📌 0

This also applies to MASt3R. While MASt3R excels with overlapping pairs via feature matching, it struggles with non-overlapping ones due to unreliable correspondences. InterPose maintains robustness, outperforming MASt3R on outward-facing and matching it on center-facing datasets.

23.12.2024 17:44 | 👍 1    🔁 0    💬 1    📌 0

We show that InterPose generalizes across 3 SOTA video models (DynamiCrafter, Runway Gen-3, Luma Dream Machine) and consistently outperforms DUSt3R on 4 diverse datasets (indoor, outdoor, object) using our new benchmark, which selects challenging pairs with little to no overlap.

23.12.2024 17:44 | 👍 1    🔁 0    💬 1    📌 0

⚠️ Challenge: Generated videos may contain visual artifacts or implausible motion.
🔑 Solution: We generate multiple videos and use a self-consistency metric to select the most visually consistent sample.

23.12.2024 17:44 | 👍 1    🔁 0    💬 1    📌 0
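
The selection step in the post above can be illustrated with a small sketch. This is not the authors' implementation: the post does not define the self-consistency metric, so the sketch assumes that each generated video yields several relative-rotation estimates (for example, from different subsets of its frames) and scores a video by the mean pairwise geodesic distance between those estimates. The names `rotation_angle_between`, `inconsistency`, `select_most_consistent`, and `rot_z` are hypothetical.

```python
import math
from itertools import combinations

Matrix = list[list[float]]  # a 3x3 rotation matrix as nested lists


def rotation_angle_between(a: Matrix, b: Matrix) -> float:
    """Geodesic distance in radians between two 3x3 rotation matrices."""
    # trace(a^T b) equals the sum of element-wise products of a and b.
    trace = sum(a[i][j] * b[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_angle)


def inconsistency(estimates: list[Matrix]) -> float:
    """Mean pairwise distance among one video's estimates (lower is better).

    Assumption: agreement between pose estimates stands in for the
    post's "self-consistency metric", which the post does not specify.
    """
    pairs = list(combinations(estimates, 2))
    if not pairs:
        return float("inf")  # one estimate gives no evidence of consistency
    return sum(rotation_angle_between(a, b) for a, b in pairs) / len(pairs)


def select_most_consistent(candidates: list[list[Matrix]]) -> int:
    """Return the index of the video whose estimates agree best."""
    if not candidates:
        raise ValueError("need at least one candidate video")
    scores = [inconsistency(c) for c in candidates]
    return min(range(len(scores)), key=scores.__getitem__)


def rot_z(degrees: float) -> Matrix:
    """Helper for the example: rotation about the z-axis."""
    t = math.radians(degrees)
    return [[math.cos(t), -math.sin(t), 0.0],
            [math.sin(t), math.cos(t), 0.0],
            [0.0, 0.0, 1.0]]


# Toy data: video 0 gives scattered estimates, video 1 gives tight ones.
candidates = [
    [rot_z(10), rot_z(40), rot_z(80)],
    [rot_z(29), rot_z(30), rot_z(31)],
]
best = select_most_consistent(candidates)
```

In this toy example the second video is selected because its three estimates differ by at most two degrees. A real pipeline would obtain the estimates from a pose estimator such as DUSt3R and would likely compare translations as well as rotations.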

💡 Motivation – Powerful Visual Priors: Video models are pre-trained on vast web-scale video data, enabling them to learn significantly more powerful priors of the visual world than 3D models like DUSt3R, which require 3D datasets.

23.12.2024 17:44 | 👍 1    🔁 0    💬 1    📌 0

🤔 Can Generative Video Models Help Pose Estimation?
✅ Yes!
We find that generative video models can hallucinate plausible intermediate frames that provide useful context for pose estimators (e.g. DUSt3R), especially for images with little to no overlap.
🔗 inter-pose.github.io

23.12.2024 17:44 | 👍 15    🔁 4    💬 1    📌 1

Introducing Doppelgangers++! 🚀 An enhanced pairwise image classifier that tackles visual aliasing (doppelgangers) to improve 3D reconstruction accuracy across diverse, real-world scenes. 🌍✨
🔗 Project page: bit.ly/3VAPMJc. Code is also available.

11.12.2024 02:40 | 👍 21    🔁 4    💬 1    📌 0


Doppelgangers++: Improved Visual Disambiguation with Geometric 3D Features

Yuanbo Xiangli, Ruojin Cai, Hanyu Chen, Jeffrey Byrne, @snavely.bsky.social

tl;dr: new dataset (55K pairs) + MASt3R == PROFIT
arxiv.org/abs/2412.05826

10.12.2024 10:19 | 👍 17    🔁 5    💬 1    📌 0
