This is how 3DGS is getting mainstream
Ranran Huang, Krystian Mikolajczyk from Imperial College London
Project page: ranrhuang.github.io/spfsplat/
Paper: arxiv.org/abs/2508.01171
Source code (coming soon): github.com/ranrhuang/SP...
"No Pose at All Self-Supervised Pose-Free 3DGS from Sparse Views"
TL;DR: 3DGS with no poses at training or inference time; a shared feature-extraction backbone simultaneously predicts 3D Gaussian primitives and camera poses in a canonical space from unposed sparse views, in a single feed-forward step.
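A minimal sketch of the interface this describes; the module names, patch size, and 14-channel Gaussian layout below are my placeholders, not the authors' architecture (code is not yet released):

```python
# Hypothetical pose-free feed-forward splatter: a shared backbone encodes
# unposed views; one head emits Gaussian parameters in a canonical frame,
# another emits per-view camera poses. Dims/layout are stand-ins.
import torch
import torch.nn as nn

class PoseFreeSplatter(nn.Module):
    def __init__(self, dim=256, patch=16):
        super().__init__()
        self.backbone = nn.Conv2d(3, dim, patch, patch)   # stand-in shared encoder
        self.gaussian_head = nn.Conv2d(dim, 14, 1)        # xyz+scale+quat+opacity+rgb = 14
        self.pose_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(dim, 7))  # quaternion + translation

    def forward(self, views):                             # views: (B, V, 3, H, W), no poses given
        B, V = views.shape[:2]
        feats = self.backbone(views.flatten(0, 1))        # shared feature extraction
        gaussians = self.gaussian_head(feats)             # canonical-space primitives
        poses = self.pose_head(feats).view(B, V, 7)       # per-view camera pose
        return gaussians.unflatten(0, (B, V)), poses

g, p = PoseFreeSplatter()(torch.randn(1, 2, 3, 64, 64))   # one feed-forward step
```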
Yang Yang 1,2, Siming Zheng 2, Jinwei Chen 2, Boxi Wu 1, Xiaofei He 1, Deng Cai 1, Bo Li 2, Peng-Tao Jiang 2
1 Zhejiang University
2 VIVO MOBILE Communication Co., Ltd
Project page: vivocameraresearch.github.io/any2bokeh/
Paper: arxiv.org/abs/2505.21593
Source code: github.com/vivoCameraRe...
"Any-to-Bokeh: One-Step Video Bokeh via Multi-Plane Image Guided Diffusion"
TL;DR: Any-to-Bokeh is a one-step video bokeh framework that converts arbitrary input videos into temporally coherent, depth-aware bokeh effects.
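The multi-plane-image (MPI) guidance idea can be illustrated in isolation with a classical compositor; this is a toy, not the paper's one-step diffusion model, and the plane count, blur model, and depth convention are my assumptions:

```python
# Classical MPI bokeh: slice a frame into depth planes, blur each plane with a
# circle of confusion that grows with distance from the focal plane, then
# composite far-to-near. Illustrative only, NOT the paper's pipeline.
import torch
import torch.nn.functional as F

def mpi_bokeh(frame, depth, focus=0.5, planes=8, max_blur=15):
    # frame: (3, H, W) in [0, 1]; depth: (H, W) in [0, 1], larger = farther
    out = torch.zeros_like(frame)
    edges = torch.linspace(0.0, 1.0 + 1e-6, planes + 1)
    for i in reversed(range(planes)):                 # far-to-near "over" compositing
        lo, hi = edges[i].item(), edges[i + 1].item()
        mask = ((depth >= lo) & (depth < hi)).float()
        coc = abs((lo + hi) / 2 - focus)              # defocus grows away from focal plane
        k = 2 * int(coc * max_blur) + 1               # odd box-blur kernel per plane
        blurred = F.avg_pool2d(frame[None] * mask[None, None], k, 1, k // 2)[0]
        alpha = F.avg_pool2d(mask[None, None], k, 1, k // 2)[0, 0]
        out = blurred + out * (1 - alpha)             # nearer planes occlude farther ones
    return out

img = mpi_bokeh(torch.rand(3, 64, 64), torch.rand(64, 64), focus=0.3)
```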
@sharathgirish97 1,2 @_TianyeLi 2* Amrita Mazumdar 2* @abhi2610 2 @davedotluebke 2 @shalinidemello 2
1 @umdglobalcampus
2 @nvidia
*Equal contributions
Project page: research.nvidia.com/labs/amri/pr...
Paper: openreview.net/pdf?id=7xhwE...
Source code (released! NVIDIA license, non-commercial): github.com/nvlabs/queen
"QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos"
TL;DR: Efficient, streamable free-viewpoint video representations with dynamic Gaussians. Reduces model size to just 0.7 MB per frame while training in under 5 s per frame and rendering at 350 FPS.
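For intuition on how per-frame updates can get that small, here is a generic residual-quantization sketch (my simplification, not QUEEN's actual codec): stream quantized deltas of Gaussian attributes instead of full snapshots.

```python
# Generic residual quantization for streamed Gaussian attributes.
import torch

def quantize_residual(prev, curr, bits=8):
    res = curr - prev                                    # per-frame attribute delta
    scale = res.abs().max() / (2 ** (bits - 1) - 1) + 1e-12
    q = torch.round(res / scale).to(torch.int8)          # one byte per value at 8 bits
    return q, scale

def dequantize(prev, q, scale):
    return prev + q.float() * scale

prev = torch.randn(100_000, 14)                          # ~100k Gaussians x 14 attributes
curr = prev + 0.01 * torch.randn_like(prev)              # small frame-to-frame motion
q, s = quantize_residual(prev, curr)
print(q.numel() / 1e6, "MB payload;", (dequantize(prev, q, s) - curr).abs().max().item())
```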
Jiawei Yang *,¶, Jiahui Huang ¶, Yuxiao Chen ¶, Yan Wang ¶, Boyi Li ¶, Yurong You ¶, Maximilian Igl ¶, Apoorva Sharma ¶, Peter Karkus ¶, Danfei Xu $,¶, Boris Ivanovic ¶, Yue Wang †,*,¶, Marco Pavone †,§,¶
* University of Southern California
$ Georgia Institute of Technology
§ Stanford University
¶ NVIDIA
† Equal advising
Project page: jiawei-yang.github.io/STORM/
Paper: arxiv.org/abs/2501.00602
Source code: github.com/NVlabs/Gauss...
"STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes"
TL;DR: Data-driven transformer operating in a feed-forward manner; dense reconstruction of dynamic environments with 3D Gaussians and velocities; self-supervised scene flow.
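A sketch of what "Gaussians with velocities" buys you, assuming a constant-velocity motion model (illustrative, not STORM's code):

```python
# With a velocity per Gaussian, any query time can be rendered by moving
# Gaussians linearly from their reference timestamp.
import torch

def advect_gaussians(xyz, velocity, t_ref, t_query):
    # xyz, velocity: (N, 3); linear motion from the reference timestamp
    return xyz + velocity * (t_query - t_ref)

xyz, vel = torch.randn(4096, 3), 0.1 * torch.randn(4096, 3)
xyz_at_half = advect_gaussians(xyz, vel, t_ref=0.0, t_query=0.5)
# Rendering advected Gaussians against real frames is what lets scene flow be
# learned self-supervised, without flow labels.
```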
@ucberkeleyofficial.bsky.social, Max Planck Institute for Intelligent Systems, Stanford University
Project page: st4rtrack.github.io
Paper: st4rtrack.github.io/files/St4RTr...
Results: st4rtrack.github.io/page1.html
St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World
TL;DR: feed-forward; reconstructs and tracks dynamic video content; DUSt3R-like pointmaps for a pair of frames captured at different moments (1/2)
(2/2) captures static and dynamic scene geometry while maintaining 3D correspondences; long-range correspondences effectively combine 3D reconstruction with 3D tracking; re-projection loss.
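A hedged sketch of the re-projection loss mentioned above; the shapes and pinhole model are my assumptions, not the paper's exact formulation:

```python
# Tracked 3D points for frame-1 pixels, projected at time t, should land on
# their observed 2D positions.
import torch

def reprojection_loss(track_pts, K, pix):
    # track_pts: (N, 3) 3D tracks in the time-t camera frame
    # K: (3, 3) intrinsics; pix: (N, 2) observed 2D correspondences at time t
    proj = (K @ track_pts.T).T                       # pinhole projection
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)  # perspective divide
    return (uv - pix).abs().mean()

K = torch.tensor([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
pts = torch.rand(100, 3) + torch.tensor([0., 0., 2.])  # points in front of the camera
loss = reprojection_loss(pts, K, torch.rand(100, 2) * 480)
```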
Shangzhan Zhang 1,2*, @jianyuan_wang 3*, @YinghaoXu1 4*†, Nan Xue 2, Christian Rupprecht 3, @XiaoweiZhou5 1†, Yujun Shen 2, @GordonWetzstein 4
1 Zhejiang University
2 AntGroup
3 University of Oxford
4 Stanford University
*, † equal contributions (?)
Project page: zhanghe3z.github.io/FLARE/
Paper: arxiv.org/pdf/2502.12138
Source code: github.com/ant-research...
Demo: huggingface.co/spaces/zhang...
FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views
TL;DR: feed-forward model; cascaded learning paradigm with camera pose serving as the critical bridge, recognizing its essential role in mapping 3D structures onto 2D image planes.
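A sketch of the cascade this describes; `pose_net`, `geometry_net`, and `appearance_net` are placeholder callables, not FLARE's real modules:

```python
# Pose-as-bridge cascade: estimate coarse camera poses first, then condition
# geometry and appearance on them.
import torch.nn as nn

class FlareStyleCascade(nn.Module):
    def __init__(self, pose_net, geometry_net, appearance_net):
        super().__init__()
        self.pose_net = pose_net
        self.geometry_net = geometry_net
        self.appearance_net = appearance_net

    def forward(self, images):                         # uncalibrated sparse views
        poses = self.pose_net(images)                  # stage 1: coarse camera poses
        geometry = self.geometry_net(images, poses)    # stage 2: pose-conditioned geometry
        appearance = self.appearance_net(images, poses, geometry)  # stage 3: appearance
        return poses, geometry, appearance
```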
Project page: fast3r-3d.github.io
Demo: fast3r.ngrok.app
Source code: github.com/facebookrese...
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
TL;DR: multi-view generalization of DUSt3R; processes many views in parallel: a Transformer-based architecture forwards N images in a single pass, bypassing the need for iterative pairwise alignment.
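The single-forward-pass idea can be sketched by fusing all view tokens into one attention sequence; dimensions and the toy decoder are assumptions, not Fast3R's architecture:

```python
# Patch-embed all N views, run attention globally across every view at once,
# and decode per-token 3D points, with no pairwise alignment loop.
import torch
import torch.nn as nn

class OnePassMultiView(nn.Module):
    def __init__(self, dim=256, patch=16, layers=2):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, patch, patch)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.head = nn.Linear(dim, 3)                       # per-patch 3D point

    def forward(self, views):                               # (N, 3, H, W): all views at once
        tok = self.embed(views).flatten(2).transpose(1, 2)  # (N, patches, dim)
        tok = tok.reshape(1, -1, tok.shape[-1])             # one joint cross-view sequence
        return self.head(self.encoder(tok))                 # global attention, single pass

pts = OnePassMultiView()(torch.randn(8, 3, 64, 64))         # 8 views, one forward pass
```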
Project page: ali-vilab.github.io/VACE-Page/
Paper: arxiv.org/pdf/2503.07598
VACE: All-in-One Video Creation and Editing
from @alibabagroup.bsky.social's Tongyi Lab with:
Zeyinzi Jiang*, Zhen Han*, Chaojie Mao*†, Jingfeng Zhang, Yulin Pan, Yu Liu
*Equal contribution, † Project lead
From @nvidia (1), @NUSingapore (2), @UofT (3) and @VectorInst (4)
@jayzhangjiewu 1,2*, Yuxuan Zhang 1*, Haithem Turki 1, Xuanchi Ren 1,3,4, @JunGao33210520 1,3,4, Mike Zheng Shou 2, @FidlerSanja 1,3,4, @ZGojcic 1†, @HuanLing6 1,3,4†
*, † equal contribution
Project page: research.nvidia.com/labs/toronto...
Paper: arxiv.org/abs/2503.01774
Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models
TL;DR: a single-step image diffusion model trained to enhance rendered novel views and remove artifacts caused by underconstrained regions of the 3D representation.
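A pseudocode-level sketch of the loop the TL;DR implies; `scene`, `fixer`, and their methods are placeholders, not the released API:

```python
# Render under-constrained novel views, clean them with a single-step
# diffusion model, and distill the cleaned views back into the 3D scene.
def refine_reconstruction(scene, fixer, novel_cameras, rounds=3):
    for _ in range(rounds):
        renders = [scene.render(cam) for cam in novel_cameras]  # artifact-prone views
        cleaned = [fixer(img) for img in renders]               # one diffusion step per image
        scene.optimize(novel_cameras, cleaned)                  # supervise 3D with fixed views
    return scene
```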
Authors: Jovana Videnović, Alan Lukežič, Matej Kristan
from Faculty of Computer and Information Science, University of Ljubljana
Project page: jovanavidenovic.github.io/dam-4-sam/
Paper: arxiv.org/abs/2411.17576
Source code: github.com/jovanavideno...
A Distractor-Aware Memory (DAM) for Visual Object Tracking with SAM2
TL;DR: built on SAM2.1; introduces a distractor-aware memory plus a distractor-distilled (DiDi) dataset to better study the distractor problem
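A toy policy showing what a distractor-aware memory could look like; the thresholds and slot split are my simplification, not DAM's actual rules:

```python
# Keep recent high-confidence frames, and reserve slots for frames where a
# distractor was near the target, so the memory retains disambiguating evidence.
from collections import deque

class DistractorAwareMemory:
    def __init__(self, recent=3, distractor=3):
        self.recent = deque(maxlen=recent)          # standard recent-appearance slots
        self.distractor = deque(maxlen=distractor)  # anchor frames with nearby distractors

    def update(self, frame_feat, confidence, distractor_present):
        if confidence > 0.8:
            self.recent.append(frame_feat)
            if distractor_present:
                self.distractor.append(frame_feat)  # keep hard disambiguation examples

    def read(self):
        return list(self.recent) + list(self.distractor)
```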
Project page: sites.google.com/view/cast4
Paper: arxiv.org/pdf/2502.12894
Youtube video: www.youtube.com/watch?v=cloV...
Planned to be open-sourced
CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image
TL;DR: object-level 2D segmentation+relative depth; GPT-based model to analyze inter-object spatial relationships; occlusion-aware large-scale 3D generation model
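The pipeline reads as roughly this composition; every callable below is a placeholder for one of the paper's components, not released code:

```python
# Segment objects, estimate relative depth, query an LLM for the inter-object
# spatial graph, generate each object in 3D despite occlusion, then assemble.
def cast_reconstruct(rgb, segmenter, depth_net, llm_relations, gen3d, assemble):
    masks = segmenter(rgb)                           # object-level 2D segmentation
    depth = depth_net(rgb)                           # relative depth estimation
    relations = llm_relations(rgb, masks)            # GPT-based spatial relationships
    objects = [gen3d(rgb, m, depth) for m in masks]  # occlusion-aware per-object 3D
    return assemble(objects, relations)              # pose components to fit the graph
```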
Project page: alviur.github.io/color-illusi...
Paper: arxiv.org/abs/2412.10122
Planned to be released on GitHub!
Are diffusion models falling for optical illusions?
"The Art of Deception: Color Visual Illusions and Diffusion Models"
TL;DR: Diffusion models exhibit human-like perceptual shifts in brightness and color within their latent space.
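One way to probe such a shift (my construction, not the paper's protocol) is to compare latent statistics of pixel-identical patches placed in different contexts; `encoder` stands for any image-to-latent map, e.g. a diffusion VAE encoder:

```python
# Encode an illusion image and compare latent means over two patches whose
# pixels are identical but whose surrounding context differs.
import torch

def latent_shift(encoder, image, patch_a, patch_b, r=1):
    z = encoder(image)                            # latent tensor, (C, h, w)
    (ya, xa), (yb, xb) = patch_a, patch_b         # patch centers in latent coordinates
    mean_a = z[:, ya - r:ya + r + 1, xa - r:xa + r + 1].mean()
    mean_b = z[:, yb - r:yb + r + 1, xb - r:xb + r + 1].mean()
    return (mean_a - mean_b).item()               # nonzero: context-induced shift
```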