LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models. We achieve SotA upsampling results for DINOv2. Paper and code:
andrehuang.github.io/loftup-site/
I really like this code review article from Jack Kennedy that @nrennie.bsky.social shared in her talk yesterday.
jcken95.github.io/projects/cod...
Roboverse: unified simulation + dataset + benchmarking that supports many different robotics simulators including nvidia omniverse. One step closer to robotics getting its mmlu, maybe.
08.04.2025 02:05
We have made some post-conference improvements to our CVPR'25 paper on end-to-end trained navigation. The agent has a similar success rate but is more efficient, faster, and less hesitant. It will be presented at CVPR in June.
arxiv.org/abs/2503.08306
Work by @steevenj7.bsky.social et al.
Looking for a multi-view depth method that just works?
We're excited to share MVSAnywhere, which we will present at #CVPR2025. MVSAnywhere produces sharp depths, generalizes well, is robust to all kinds of scenes, and is scale-agnostic.
More info:
nianticlabs.github.io/mvsanywhere/
MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion
Zador Pataki, @pesarlin.bsky.social, Johannes L. Schönberger, @marcpollefeys.bsky.social
tl;dr: using monodepth to reconstruct w/o co-visible triplets. Many ablations and details. M3Dv2 FTW
demuc.de/papers/patak...
Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image
Jerred Chen, Ronald Clark
tl;dr: predict flow from blurred image -> solve for velocity, use as IMU information.
arxiv.org/abs/2503.17358
OpenCity3D: What do Vision-Language Models know about Urban Environments?
Valentin Bieri, Marco Zamboni, Nicolas S. Blumer, Qingxuan Chen, Francis Engelmann
tl;dr: if you have aerial 3D reconstruction, use SigLIP to be happy.
arxiv.org/abs/2503.16776
'We Are Eating the Earth' author Michael Grunwald explores an important question: "How should writers like me approach four years of drill-baby-drill hostility to climate progress, and how should readers like you think about it?" @mikegrunwald.bsky.social
www.canarymedia.com/articles/foo...
Multi-view Reconstruction via SfM-guided Monocular Depth Estimation
Haoyu Guo, He Zhu, Sida Peng, Haotong Lin, Yunzhi Yan, Tao Xie, Wenguan Wang, Xiaowei Zhou, Hujun Bao
arxiv.org/abs/2503.14483
Bolt3D: Generating 3D Scenes in Seconds
Stanislaw Szymanowicz, Jason Y. Zhang, Pratul Srinivasan, Ruiqi Gao, Arthur Brussee, @holynski.bsky.social, Ricardo Martin-Brualla, @jonbarron.bsky.social, Philipp Henzler
arxiv.org/abs/2503.14445
Transformers, but without normalization layers
A simple alternative to normalization layers: the scaled tanh function, which they call Dynamic Tanh, or DyT.
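The idea fits in a few lines. Here is a minimal NumPy sketch of DyT as described (elementwise tanh with a learnable scalar alpha, followed by a per-channel affine, used in place of LayerNorm); the function name and default values are my own assumptions, not the paper's code:

```python
import numpy as np

def dyt(x, alpha=0.5, gamma=None, beta=None):
    """Dynamic Tanh (DyT) sketch: gamma * tanh(alpha * x) + beta.

    alpha is a learnable scalar; gamma and beta are learnable
    per-channel parameters, as in the affine part of LayerNorm.
    Defaults here (alpha=0.5, gamma=1, beta=0) are illustrative only.
    """
    d = x.shape[-1]
    gamma = np.ones(d) if gamma is None else gamma
    beta = np.zeros(d) if beta is None else beta
    return gamma * np.tanh(alpha * x) + beta

# Example: tanh squashes large activations without computing any
# per-token statistics, unlike LayerNorm.
x = np.array([[-4.0, 0.0, 4.0]])
y = dyt(x, alpha=0.5)
```

Unlike LayerNorm, no mean/variance is computed over the token, which is what makes it attractive for cheap inference.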
Fixing the RANSAC Stopping Criterion
Johannes Schönberger, Viktor Larsson, @marcpollefeys.bsky.social
tl;dr: the original RANSAC formula for the number of iterations underestimates hard cases and overestimates easy ones. Here is a corrected one -> better results
arxiv.org/abs/2503.07829
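For reference, the classical stopping criterion the paper revisits (this is the textbook formula, not the paper's corrected version):

```python
import math

def ransac_iterations(p, w, n):
    """Classical RANSAC stopping criterion.

    Returns the number of iterations k such that, with confidence p,
    at least one sampled minimal set of size n is all inliers, given
    inlier ratio w:  k = log(1 - p) / log(1 - w**n).
    """
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - w ** n))

# Classic table entry: 50% inliers, minimal set of 2, 99% confidence.
k = ransac_iterations(p=0.99, w=0.5, n=2)
```

The paper's point is that this k is systematically off at both ends of the difficulty range, because the formula ignores that the inlier ratio is estimated from the best model so far.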
Another grad school RL friend on the pod! Lots of non-reasoning RL talk here!
Interviewing Eugene Vinitsky (@eugenevinitsky.bsky.social) on self-play for self-driving and what else people do with RL
#13. Reinforcement learning fundamentals and scaling.
Post: buff.ly/8fLBJA6
YouTube: buff.ly/eJ6heSI
Amazing threads!
I wish to read more papers like this! Envying the reviewers
Data Augmentation for NeRFs in the Low Data Limit
Ayush Gaggar, Todd D. Murphey
tl;dr: any uncertainty-based view sampling is better than next-best-view sampling.
I didn't get where the "augmentation" comes from though
arxiv.org/abs/2503.02092
Introducing DaD (arxiv.org/abs/2503.07347), a pretty cool keypoint detector.
As this will get pretty long, this will be two threads.
The first will go into the RL part, and the second on the emergence and distillation.
Image-Based Relocalization and Alignment for Long-Term Monitoring of Dynamic Underwater Environments
Beverley Gorry, Tobias Fischer, Michael Milford, Alejandro Fontan
tl;dr: SuperPoint + LightGlue can breathe underwater.
arxiv.org/abs/2503.04096
We made a new keypoint detector named DaD, paper isn't up yet, but code and weights are:
github.com/Parskatt/dad
This is fantastic. My human sensorimotor skills barely let me do something like that.
10.03.2025 16:38
JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba
Xiaoyong Lu, Songlin Du
tl;dr: replace Transformer in LoFTR with Mamba
Mamba takes the torch in local feature matching
no eval on IMC
github.com/leoluxxx/JamMa
arxiv.org/abs/2503.03437
Integral Forms in Matrix Lie Groups
Timothy D Barfoot
tl;dr: minimal polynomial->Lie algebra->compact analytic results
transfer back and forth between series form and integral form
arxiv.org/abs/2503.02820
Dataset Distillation (2018/2020)
They show that it is possible to compress 60,000 MNIST training images into just 10 synthetic distilled images (one per class) and achieve close to original performance with only a few gradient descent steps, given a fixed network initialization.
smalldiffusion
A lightweight diffusion library for training and sampling from diffusion models. The core of this library for diffusion training and sampling is implemented in less than 100 lines of very readable PyTorch code.
github.com/yuanchenyang...
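To give a flavor of what such a compact core looks like, here is a generic DDPM ancestral sampling loop in NumPy (a sketch of the standard algorithm, not smalldiffusion's actual API; the schedule and function names are assumptions):

```python
import numpy as np

def ddpm_sample(eps_model, shape, T=50, seed=0):
    """Minimal DDPM ancestral sampling loop.

    eps_model(x, t) predicts the noise added at step t; betas follow
    a simple linear schedule. Starts from Gaussian noise and denoises
    step by step back to t = 0.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    abar = np.cumprod(alphas)          # cumulative products alpha-bar_t
    x = rng.standard_normal(shape)     # x_T ~ N(0, I)
    for t in range(T - 1, -1, -1):
        eps = eps_model(x, t)
        # posterior mean of x_{t-1} given x_t and the predicted noise
        x = (x - betas[t] / np.sqrt(1.0 - abar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:  # no noise is added at the final step
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

# Smoke test with a dummy predictor that always predicts zero noise.
samples = ddpm_sample(lambda x, t: np.zeros_like(x), shape=(2, 3))
```

Swapping the dummy predictor for a trained network is all that separates this toy loop from real sampling, which is why a sub-100-line library core is plausible.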
This week's #PaperILike is "Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming" (Bertsekas 2024).
If you know 1 of {RL, controls} and want to understand the other, this is a good starting point.
PDF: arxiv.org/abs/2406.00592
Agentic Retrieval-Augmented Generation : A Survey On Agentic RAG
This repository complements the survey paper "Agentic Retrieval-Augmented Generation (Agentic RAG): A Survey On Agentic RAG".
github.com/asinghcsu/Ag...
RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges
@thibautloiseau.bsky.social , Guillaume Bourmaud
tl;dr: GT from nuScenes (autonomous driving images), with difficulty bins: scale, perspective, etc.
ALIKED+LightGlue is best among detector-based methods
1/2
arxiv.org/abs/2502.19955
The Blue Report is amazing. The most-clicked articles are right here
theblue.report