Ruojin Cai @ruojin - Bluesky Profile

Latest posts by ruojin.bsky.social on Bluesky

Can Generative Video Models Help Pose Estimation? Yes! We find that off-the-shelf generative video models can hallucinate plausible intermediate frames that provide useful context for pose estimators (e.g. DUSt3R), especially for images with little to no overlap.

Project page: inter-pose.github.io
Paper: arxiv.org/abs/2412.16155

Huge thanks to the amazing team: Jason Y. Zhang (@jasonyzhang.bsky.social), Philipp Henzler, Zhengqi Li (@zhengqili.bsky.social), Noah Snavely (@snavely.bsky.social), and Ricardo Martin-Brualla.

23.12.2024 17:44

This also applies to MASt3R. While MASt3R excels on overlapping pairs thanks to feature matching, it struggles on non-overlapping ones because the correspondences become unreliable. InterPose stays robust, outperforming MASt3R on outward-facing datasets and matching it on center-facing ones.

23.12.2024 17:44

We show that InterPose generalizes across 3 SOTA video models (DynamiCrafter, Runway Gen-3, Luma Dream Machine) and consistently outperforms DUSt3R on 4 diverse datasets (indoor, outdoor, object) on our new benchmark, which selects challenging pairs with little to no overlap.

23.12.2024 17:44

⚠️ Challenge: Generated videos may contain visual artifacts or implausible motion.
🔑 Solution: We generate multiple videos and use a self-consistency metric to select the most visually consistent sample.

23.12.2024 17:44

💡 Motivation – Powerful Visual Priors: Video models are pre-trained on vast, web-scale video data, which lets them learn far more powerful priors of the visual world than 3D models like DUSt3R, which depend on (much smaller) 3D datasets.

23.12.2024 17:44

🤔Can Generative Video Models Help Pose Estimation?
✅Yes!
We find that generative video models can hallucinate plausible intermediate frames that provide useful context for pose estimators (e.g. DUSt3R), especially for images with little to no overlap.
🔗 inter-pose.github.io

23.12.2024 17:44

Introducing Doppelgangers++! 🚀 An enhanced pairwise image classifier that tackles visual aliasing (doppelgangers) to improve 3D reconstruction accuracy across diverse, real-world scenes. 🌍✨
🔗Project page: bit.ly/3VAPMJc. Code is also available.

11.12.2024 02:40

Doppelgangers++: Improved Visual Disambiguation with Geometric 3D Features

Yuanbo Xiangli, Ruojin Cai, Hanyu Chen, Jeffrey Byrne, @snavely.bsky.social

tl;dr: new dataset (55K pairs) + MASt3R == PROFIT
arxiv.org/abs/2412.05826

10.12.2024 10:19

@ruojin is following 20 prominent accounts

Boyang Deng
@boyangdeng

Liyuan Zhu
@liyuanzzz

PhD student @ Stanford University. MS @ ETH Zurich. 3D Vision and Generation. https://www.zhuliyuan.net/

Yue Chen
@fanegg

PhD Student at Westlake University. 3D/4D Reconstruction, Virtual Humans. fanegg.github.io

Yuliang Xiu
@xiuyuliang

Assistant Professor @ Westlake University. PhD @ MPI-IS. Focusing on democratizing human-centric digitization.

@momergul

CS PhD Student @Cornell

Qianqian Wang
@qianqianwang

@gemmechu

CS PhD student at @Cornell. Computer vision, graphics and Machine Learning. gemmechu.github.io

Andreas Geiger
@andreasgeiger

Professor, University of Tübingen @unituebingen.bsky.social. Head of Department of Computer Science 🎓. Faculty, Tübingen AI Center 🇩🇪 @tuebingen-ai.bsky.social. ELLIS Fellow, Founding Board Member 🇪🇺 @ellis.eu. CV 📷, ML 🧠, Self-Driving 🚗, NLP 🖺

Aaron Hertzmann
@aaronhertzmann.com

www.dgp.toronto.edu/~hertzman

Georgina Woo
@georginawooxy

Yang Research Scholar | Kanwisher Lab | McGovern Institute for Brain Research | MIT BCS

Philipp
@phenzler

Researcher in generative 3D AI @ Google. PhD from UCL.

Ethan Weber
@ethanjohnweber

PhD student at UC Berkeley. MIT EECS BS '20 & MEng '21. I research 3D computer vision for AR, robotics, and social good. See http://ethanweber.me for more info.

Eric Chen
@ericmchen

PhD student at MIT. Previously at Cornell. Originally from Queens. Computer vision, graphics, and machine learning. echen01.github.io

Kristin Branson
@kristinmbranson

Researcher in machine learning and computer vision for science. Senior Group Leader at HHMI Janelia Research Campus. Supporter of DEIB in science and tech. CV: https://bit.ly/BransonCV

Georg Bökman
@bokmangeorg

Geometric deep learning + Computer vision

Eric Dexheimer
@ericdexheimer

PhD student at Dyson Robotics Lab, Imperial College London http://edexheim.github.io

Tobias Fischer
@tobiasfshr

Research Scientist Intern @ NVIDIA. PhD Student @ CVG, ETH Zurich. Prev. @ Meta Reality Labs, UC Berkeley, RWTH. Dynamic 3D Vision, NeRFs, 3DGS. tobiasfshr.github.io

@amberxlyb

Yoav Artzi
@yoavartzi.com

LM/NLP/ML researcher ¯\_(ツ)_/¯ yoavartzi.com / associate professor @ Cornell CS + Cornell Tech campus @ NYC / nlp.cornell.edu / associate faculty director @ arXiv.org / researcher @ ASAPP / starting @colmweb.org / building RecNet.io

Keunhong Park
@keunhong

World Labs. Former research scientist at Google. Ph.D UWCSE. 📍 San Francisco 🔗 keunhong.com