
Yixin Chen

@yixinchen.bsky.social

Research Scientist at BIGAI, 3D Vision, prev @UCLA, @MPI_IS, @Amazon, https://yixchen.github.io

45 Followers  |  157 Following  |  14 Posts  |  Joined: 13.01.2025  |  1.4918

Latest posts by yixinchen.bsky.social on Bluesky

We hope this can provide some insights on how to design diffusion-based NVS methods to improve their consistency and plausibility!

🧩💻🗂️ All code, data, & checkpoints are released!
🔗 Learn more: jason-aplp.github.io/MOVIS/ (6/6)

01.04.2025 01:46

📊 We also visualize the sampling process of:

🔹 Ours (with the biased timestep scheduler) ✅

🔹 Zero123 (without it) ❌

Our approach predicts object locations more precisely in the early stages and refines details more finely in the later stages! 🎯✨ (5/6)

01.04.2025 01:45

💡 Key insight in MOVIS: a biased noise timestep scheduler for diffusion-based novel view synthesizers that prioritizes larger timesteps early in training and gradually decreases them over time. This improves novel view synthesis in multi-object scenes! 🎯🔥 (4/6)

01.04.2025 01:45
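The biased timestep scheduler in (4/6) could be sketched roughly as follows. This is a minimal, hypothetical illustration, not the MOVIS implementation. The Gaussian sampling, the linear decay of its center, and the parameters `start_frac`, `end_frac`, and `std_frac` are all assumptions, chosen only to show large timesteps being favored early in training and smaller ones later.

```python
import random
from typing import Optional


def biased_timestep_center(step: int, total_steps: int, num_timesteps: int = 1000,
                           start_frac: float = 0.9, end_frac: float = 0.5) -> float:
    """Center of the timestep distribution at a given training step.

    Hypothetical schedule (not from MOVIS): the center decays linearly from
    start_frac to end_frac of the timestep range as training progresses.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    frac = start_frac + (end_frac - start_frac) * progress
    return frac * (num_timesteps - 1)


def sample_biased_timestep(step: int, total_steps: int, num_timesteps: int = 1000,
                           std_frac: float = 0.15,
                           rng: Optional[random.Random] = None) -> int:
    """Draw a training timestep biased toward large values early in training."""
    if rng is None:
        rng = random.Random()
    center = biased_timestep_center(step, total_steps, num_timesteps)
    t = rng.gauss(center, std_frac * num_timesteps)
    # Round and clamp to the valid discrete range [0, num_timesteps - 1].
    return int(min(max(round(t), 0), num_timesteps - 1))


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 50_000, 100_000):
        draws = [sample_biased_timestep(step, 100_000, rng=rng) for _ in range(1000)]
        print(step, sum(draws) / len(draws))
```

The printed mean timestep should shrink as training progresses. In a diffusion training loop, a sampler like this would stand in for the usual uniform draw of `t` when noising the target view.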

🔍 We analyze the sampling process of diffusion-based novel view synthesizers and find that:
📌 Larger timesteps → focus on recovering position & orientation
📌 Smaller timesteps → refine geometry & appearance

👇 We visualize the sampling process below! (3/6)

01.04.2025 01:44

In MOVIS, we enhance diffusion-based novel view synthesis with:
🔍 Additional structural inputs (depth & mask)
🖌️ Novel-view mask prediction as an auxiliary task
🎯 A biased noise scheduler to facilitate training
We identify the following key insight: (2/6)

01.04.2025 01:43

🚀 How can we preserve object consistency in NVS, ensuring correct position and orientation as well as plausible geometry and appearance? This is especially critical for image/video generative models and world models.

🎉 Check out our #CVPR2025 paper: MOVIS (jason-aplp.github.io/MOVIS) 👇 (1/6)

01.04.2025 01:42

This work is part of our broader research line on reconstruction and scene understanding, which also includes SSR (dali-jack.github.io/SSR/), PhyScene (physcene.github.io), PhyRecon (phyrecon.github.io), and ArtGS (articulate-gs.github.io). More to come soon! 🙌🙌 (n/n)

21.03.2025 09:52

Even more!

Our model generalizes to in-the-wild scenes like YouTube videos 🎥🌍! Using just *15 input views*, we achieve high-quality reconstructions with detailed geometry & appearance. 🌟 Watch the demo to see it in action! 👇 (5/n)

21.03.2025 09:52

🏆 On datasets like Replica and ScanNet++, our model produces higher-quality reconstructions than the baselines, with better accuracy in less-captured areas, more precise object structures, smoother backgrounds, and fewer floating artifacts. 👀 (4/n)

21.03.2025 09:51

🎥✨ Our method excels in large, heavily occluded scenes: using just 10 views, it outperforms baselines that use 100. The reconstructed scene supports interactive text-based editing, and its decomposed object meshes enable photorealistic VFX edits. 👇 (3/n)

21.03.2025 09:50

🛠️ Our method combines decompositional neural reconstruction with a diffusion prior, filling in missing information in less-observed and occluded regions. The reconstruction guidance (rendering loss) and the generative guidance (SDS loss) are balanced by our visibility-guided modeling. (2/n)

21.03.2025 09:48
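One way to picture the balancing in (2/n) is a per-sample blend. Well-observed regions lean on the rendering loss, while poorly observed or occluded ones lean on the SDS loss. The sketch below is purely hypothetical and is not DPRecon's formulation. The function name, the visibility scores in [0, 1], and the linear `v * render + (1 - v) * sds` weighting are all assumptions made for illustration.

```python
from typing import Sequence


def visibility_weighted_loss(visibility: Sequence[float],
                             render_loss: Sequence[float],
                             sds_loss: Sequence[float]) -> float:
    """Blend per-sample reconstruction and generative losses by visibility.

    Hypothetical weighting (not from DPRecon): a visibility of 1 (well
    observed) uses only the rendering loss; 0 (unobserved or occluded) uses
    only the SDS loss; values in between mix the two linearly.
    """
    if not (len(visibility) == len(render_loss) == len(sds_loss)):
        raise ValueError("all inputs must have the same length")
    if not visibility:
        raise ValueError("inputs must be non-empty")
    total = 0.0
    for v, l_render, l_sds in zip(visibility, render_loss, sds_loss):
        v = min(max(v, 0.0), 1.0)  # clamp visibility to [0, 1]
        total += v * l_render + (1.0 - v) * l_sds
    return total / len(visibility)


if __name__ == "__main__":
    # Two samples: one fully visible, one fully occluded.
    print(visibility_weighted_loss([1.0, 0.0], [2.0, 4.0], [10.0, 6.0]))  # prints 4.0
```

In a real system, the losses would be tensors and the SDS term would act through its gradient rather than as a plain scalar. This scalar version only illustrates the weighting idea.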

🚀 How can we reconstruct 3D scenes with decomposed objects from sparse inputs?

Check out DPRecon (dp-recon.github.io) at #CVPR2025: it recovers all objects, achieves photorealistic mesh rendering, and supports text-based geometry & appearance editing. More details 👇 (1/n)

21.03.2025 09:48

📢📢📢 Excited to announce the 5th Workshop on 3D Scene Understanding for Vision, Graphics, and Robotics at #CVPR2025! Expect awesome speakers and challenges on multi-modal 3D scene understanding and reasoning. 🎉🎉🎉

Learn more at scene-understanding.com.

14.03.2025 09:20

Checking the digest from scholar-inbox has become my daily routine. A real game-changer! 👏👏👏

16.01.2025 02:33
