Cheng and Shaikh et al., "MERG3R: A Divide-and-Conquer Approach to Large-Scale Neural Visual Geometry"
Simple, practical idea that works. Sort->split->merge for faster, scalable reconstruction.
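The paper's actual pipeline isn't in these notes, but the merge step of a sort→split→merge scheme can be sketched with a toy: chunks "reconstructed" in arbitrary rigid frames, stitched back together by Procrustes alignment on their overlapping points (2D, rotation+translation only, no noise; all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def procrustes_2d(src, dst):
    """Rigid transform (R, t) minimizing ||R @ src + t - dst|| (Kabsch/SVD)."""
    mu_s, mu_d = src.mean(1, keepdims=True), dst.mean(1, keepdims=True)
    U, _, Vt = np.linalg.svd((dst - mu_d) @ (src - mu_s).T)
    R = U @ np.diag([1, np.linalg.det(U @ Vt)]) @ Vt  # guard against reflection
    return R, mu_d - R @ mu_s

# Ground-truth 2D "scene": a smooth trajectory of points (2 x N).
N = 100
t = np.linspace(0, 4 * np.pi, N)
world = np.stack([t, np.sin(t)])

# Split into overlapping chunks; each chunk is "reconstructed" in its own
# arbitrary rigid frame (simulated by a random rotation + translation).
size, overlap = 40, 10
chunks = []
for s in range(0, N - size + 1, size - overlap):
    a = rng.uniform(0, 2 * np.pi)
    R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    chunks.append((s, R @ world[:, s:s + size] + rng.normal(size=(2, 1))))

# Merge: align each chunk to the running reconstruction via the overlap.
s0, merged = chunks[0]
for s, pts in chunks[1:]:
    k = s0 + merged.shape[1] - s          # number of shared points
    R, tr = procrustes_2d(pts[:, :k], merged[:, -k:])
    merged = np.hstack([merged, (R @ pts + tr)[:, k:]])

# The merged map matches the ground truth up to one global rigid transform.
R, tr = procrustes_2d(merged, world)
print(np.abs(R @ merged + tr - world).max())  # near machine precision
```

A real system additionally has to handle scale ambiguity, noisy overlaps, and loop closure; this only shows the stitching idea.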
Jain et al., "NERFIFY: A Multi-Agent Framework for Turning NeRF Papers into Code"
Given current capabilities, it makes sense to stop spending human time reproducing work. Let agents read the paper, write the code, and verify with nerfstudio.
Zhu et al., "Marginalized Bundle Adjustment: Multi-View Camera Pose from Monocular Depth Estimates"
With monocular depth estimators becoming ever more accurate, they are now often the better choice for SfM. Good, but not the best on IMC phototourism yet!
Cong et al., "Flow3r: Factored Flow Prediction for Scalable Visual Geometry Learning"
More data is better, and you can get it from unlabeled videos with off-the-shelf dense flow predictors: 800k unlabeled videos to further enhance geometry estimation.
Adamkiewicz et al., "When Pretty Isn't Useful: Investigating Why Modern Text-to-Image Models Fail as Reliable Training Data Generators"
Interesting that while we have "better" image generators, their usefulness as synthetic data generators is declining. Do we need a pivot?
Xie, Sun, Neall et al., "Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control"
Camera & 3D hand pose conditioned video generation with DiTs.
Ye et al., "World Action Models are Zero-shot Policies"
Video + action training. A LOT of engineering gems that make it work on an actual robot.
Liu et al., "Unpaired Image-to-Image Translation via a Self-Supervised Semantic Bridge"
Train a diffusion bridge that maps images to self-supervised foundation features and back, enabling image-to-image translation.
Ke et al., "CAPA: Depth Completion as Parameter-Efficient Test-Time Adaptation"
Fine-tune your foundation model at test time on sparse measurements. Makes a lot of sense if you have, e.g., LiDAR measurements at hand.
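CAPA's actual parameter-efficient machinery isn't in these notes; the spirit can be shown with the simplest possible test-time adaptation, fitting a global scale-and-shift of a relative depth prediction to a handful of sparse (hypothetical LiDAR) points. Everything below is a made-up toy setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth metric depth and a "foundation model" prediction
# that is only correct up to an unknown affine transform (scale + shift).
H, W = 64, 64
metric = 2.0 + np.abs(rng.standard_normal((H, W))).cumsum(1) * 0.05
pred_rel = 0.3 * metric + 1.7 + rng.normal(0, 0.01, (H, W))

# Sparse measurements, e.g. a few hundred LiDAR returns.
idx = rng.choice(H * W, size=200, replace=False)
z = metric.ravel()[idx]

# "Adapt" at test time: least-squares fit of scale s and shift t on the
# sparse points only, then apply to the dense prediction.
A = np.stack([pred_rel.ravel()[idx], np.ones_like(z)], axis=1)
(s, t), *_ = np.linalg.lstsq(A, z, rcond=None)
adapted = s * pred_rel + t

print(np.abs(pred_rel - metric).mean())  # large raw error
print(np.abs(adapted - metric).mean())   # small after adaptation
```

The paper fine-tunes actual model parameters rather than a two-parameter affine map, but the test-time signal is the same: sparse measurements anchoring a dense prediction.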
Yue et al., "Image Generation with a Sphere Encoder"
Sampling Gaussian noise independently for each pixel means you actually sample more-or-less on a sphere, so let the latent space live on a sphere in the first place.
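The "more-or-less on a sphere" part is just Gaussian concentration of measure: in d dimensions, the norm of i.i.d. N(0,1) noise concentrates tightly around sqrt(d). A quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64 * 64  # e.g. a 64x64 single-channel latent
x = rng.standard_normal((10_000, d))
r = np.linalg.norm(x, axis=1)

# Norms cluster around sqrt(d) = 64 with std near 1/sqrt(2),
# i.e. about a 1% spread: effectively a thin spherical shell.
print(r.mean(), r.std())
```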
Qin and Sun et al., "Variation-aware Flexible 3D Gaussian Editing"
Distill 2D edits into a feed-forward model that predicts edit fields. Quick 3D editing of 3D Gaussians.
Bajpai et al., "FastFlow: Accelerating The Generative Flow Matching Models with Bandit Inference"
You can use multi-armed bandits with flow matching models to accelerate inference by 2.6x, with no training and minimal overhead.
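The paper's actual bandit formulation isn't detailed in these notes; as a toy stand-in, here is a UCB1 bandit choosing the number of Euler integration steps to trade accuracy against compute, with the flow model replaced by the ODE dx/dt = -x (the reward function and constants are invented for illustration):

```python
import math
import numpy as np

# Arms: candidate step counts for an Euler sampler.
arms = [2, 4, 8, 16, 32]

def reward(n_steps):
    """Quality-minus-cost for integrating dx/dt = -x from x=1 over [0, 1].

    Stand-in for "sample quality at n_steps"; a real system would score
    generated samples. Euler yields (1 - 1/n)^n, the exact answer is e^-1.
    """
    x = 1.0
    for _ in range(n_steps):
        x += (-x) / n_steps
    error = abs(x - math.exp(-1))
    return -(error + 0.002 * n_steps)  # penalize extra function evaluations

# UCB1: pull each arm once, then pick argmax of mean + exploration bonus.
pulls = np.zeros(len(arms))
means = np.zeros(len(arms))
for t in range(1, 401):
    if t <= len(arms):
        a = t - 1
    else:
        a = int(np.argmax(means + np.sqrt(2 * math.log(t) / pulls)))
    r = reward(arms[a])
    pulls[a] += 1
    means[a] += (r - means[a]) / pulls[a]

best = arms[int(np.argmax(means))]
print(best)  # settles on a mid-range step count (8 here)
```

With deterministic rewards the bandit is overkill, of course; the interesting case is noisy per-sample quality estimates, which is exactly what bandits handle.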
Luo et al., "4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere"
Encode videos into 4D latents, then query motion and geometry between any two frames.
Mauel and Hübers et al., "Foundation Inference Models for Ordinary Differential Equations"
Feed-forward solver for generic ODEs, also easily fine-tunable for a given task. These things always fascinate me: sort of learning to learn (solve?).
Shavin and Benaim, "Splat and Distill: Augmenting Teachers with Feed-Forward 3D Reconstruction For 3D-Aware Distillation"
When distilling vision foundation models with a focus on geometric consistency, insert a feed-forward Gaussian Splatting in the middle.