
Mustafa Fanaswala

@mfanaswala.bsky.social

Computer vision | SLAM | AR/VR | Robotics | Self-driving cars | CrossFit | Salsero/Bachatero. MSFT, Ex-Nvidia

133 Followers  |  681 Following  |  3 Posts  |  Joined: 17.11.2024

Latest posts by mfanaswala.bsky.social on Bluesky

Video thumbnail

Loft🆙 Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models. We achieve SotA upsampling results for DINOv2. Paper and code:
andrehuang.github.io/loftup-site/
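For context, the baseline such a learned upsampler improves on is simply interpolating the low-resolution ViT patch features at query coordinates. A minimal sketch of that coordinate-based lookup in generic PyTorch, not the LoftUp code; the 384-dim, 16x16 feature map is a made-up stand-in for DINOv2 patch tokens:

```python
import torch
import torch.nn.functional as F

# Toy stand-in for a DINOv2 patch-feature map: (B, C, H/14, W/14).
feats = torch.randn(1, 384, 16, 16)

# Query coordinates in [-1, 1] x [-1, 1], one per output pixel.
H_out, W_out = 224, 224
ys, xs = torch.meshgrid(
    torch.linspace(-1, 1, H_out), torch.linspace(-1, 1, W_out), indexing="ij"
)
coords = torch.stack([xs, ys], dim=-1).unsqueeze(0)  # (1, H_out, W_out, 2)

# Bilinear lookup at continuous coordinates: the naive baseline that a learned
# coordinate-based upsampler (coords + image cues -> features) replaces.
upsampled = F.grid_sample(feats, coords, mode="bilinear", align_corners=False)
print(upsampled.shape)  # torch.Size([1, 384, 224, 224])
```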

26.04.2025 14:47 — 👍 28    🔁 3    💬 2    📌 0
Code review for statisticians, data scientists & modellers – Jack Kennedy
Software developers have some really good approaches to code review. Here's a data scientist's plea to listen to the software developers!

I really like this code review article from Jack Kennedy that @nrennie.bsky.social shared in her talk yesterday.
jcken95.github.io/projects/cod...

24.04.2025 13:08 — 👍 18    🔁 5    💬 0    📌 0
Post image

RoboVerse: unified simulation + dataset + benchmarking that supports many different robotics simulators, including NVIDIA Omniverse. One step closer to robotics getting its MMLU, maybe.

08.04.2025 02:05 — 👍 24    🔁 5    💬 0    📌 0
Video thumbnail

We have made some post-conference improvements to our CVPR'25 paper on end-to-end trained navigation. The agent has a similar success rate but is more efficient, faster, and less hesitant. It will be presented at CVPR in June.

arxiv.org/abs/2503.08306

Work by @steevenj7.bsky.social et al.

05.04.2025 14:38 — 👍 27    🔁 3    💬 2    📌 1
Video thumbnail

🔍 Looking for a multi-view depth method that just works?

We're excited to share MVSAnywhere, which we will present at #CVPR2025. MVSAnywhere produces sharp depths, generalizes and is robust to all kinds of scenes, and is scale-agnostic.

More info:
nianticlabs.github.io/mvsanywhere/

31.03.2025 12:52 — 👍 40    🔁 10    💬 2    📌 4
Post image Post image Post image

MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion

Zador Pataki, @pesarlin.bsky.social, Johannes L. Schönberger, @marcpollefeys.bsky.social
tl;dr: using monodepth to reconstruct w/o co-visible triplets. Many ablations and details. M3Dv2 FTW
demuc.de/papers/patak...

31.03.2025 06:38 — 👍 24    🔁 4    💬 1    📌 1
Post image Post image Post image Post image

Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image

Jerred Chen, Ronald Clark
tl;dr: predict flow from a blurred image -> solve for camera velocity, and use it as IMU information.
arxiv.org/abs/2503.17358
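To make the "flow -> solve for velocity" step concrete, here is a generic least-squares sketch using the classic motion-field (interaction) matrix from visual servoing. It is an illustration under known depth and normalized coordinates, not the paper's actual pipeline:

```python
import numpy as np

def interaction_matrix(xy, depth):
    """Stack the 2x6 motion-field (visual-servoing) matrix for each point."""
    rows = []
    for (x, y), Z in zip(xy, depth):
        rows.append([-1 / Z, 0, x / Z, x * y, -(1 + x**2), y])
        rows.append([0, -1 / Z, y / Z, 1 + y**2, -x * y, -x])
    return np.asarray(rows)  # (2N, 6)

def velocity_from_flow(xy, flow, depth):
    """Least-squares camera velocity [vx, vy, vz, wx, wy, wz] from per-point flow."""
    A = interaction_matrix(xy, depth)
    vel, *_ = np.linalg.lstsq(A, flow.reshape(-1), rcond=None)
    return vel  # usable as an IMU-like motion cue

# Synthetic sanity check: flow generated from a known velocity is recovered.
rng = np.random.default_rng(0)
xy = rng.uniform(-0.5, 0.5, (100, 2))
depth = rng.uniform(2.0, 5.0, 100)
true_vel = np.array([0.1, 0.0, 0.2, 0.01, -0.02, 0.03])
flow = (interaction_matrix(xy, depth) @ true_vel).reshape(-1, 2)
print(np.allclose(velocity_from_flow(xy, flow, depth), true_vel))  # True
```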

24.03.2025 09:38 — 👍 14    🔁 3    💬 0    📌 0
Post image Post image Post image

OpenCity3D: What do Vision-Language Models know about Urban Environments?

Valentin Bieri, Marco Zamboni, Nicolas S. Blumer, Qingxuan Chen, Francis Engelmann
tl;dr: if you have an aerial 3D reconstruction, use SigLIP to be happy.
arxiv.org/abs/2503.16776
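A hedged sketch of the SigLIP scoring step via Hugging Face transformers; the checkpoint name, prompts, and image crop are assumptions, and OpenCity3D additionally projects such scores onto the aerial 3D reconstruction:

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

# Assumed public SigLIP checkpoint; swap for whatever variant you prefer.
ckpt = "google/siglip-base-patch16-224"
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

image = Image.open("rooftop_crop.jpg")  # hypothetical crop rendered from the 3D model
texts = ["a run-down building", "a newly renovated building", "a park"]

inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# SigLIP scores image-text pairs with a sigmoid rather than a softmax.
probs = torch.sigmoid(outputs.logits_per_image)
print(dict(zip(texts, probs[0].tolist())))
```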

24.03.2025 10:21 — 👍 18    🔁 3    💬 0    📌 0
Post image

'We Are Eating the Earth' author Michael Grunwald explores an important question: "How should writers like me approach four years of drill-baby-drill hostility to climate progress, and how should readers like you think about it?" @mikegrunwald.bsky.social
www.canarymedia.com/articles/foo...

20.03.2025 19:26 — 👍 6    🔁 2    💬 0    📌 0
Post image Post image Post image Post image

Multi-view Reconstruction via SfM-guided Monocular Depth Estimation

Haoyu Guo, He Zhu, Sida Peng, Haotong Lin, Yunzhi Yan, Tao Xie, Wenguan Wang, Xiaowei Zhou, Hujun Bao

arxiv.org/abs/2503.14483

20.03.2025 09:28 — 👍 7    🔁 2    💬 1    📌 0
Post image Post image Post image Post image

Bolt3D: Generating 3D Scenes in Seconds

Stanislaw Szymanowicz, Jason Y. Zhang, Pratul Srinivasan, Ruiqi Gao, Arthur Brussee, @holynski.bsky.social, Ricardo Martin-Brualla, @jonbarron.bsky.social, Philipp Henzler

arxiv.org/abs/2503.14445

20.03.2025 09:30 — 👍 12    🔁 1    💬 1    📌 0
Post image

Transformers, but without normalization layers

A simple alternative to normalization layers: the scaled tanh function, which they call Dynamic Tanh, or DyT.
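Concretely, DyT replaces LayerNorm with an element-wise tanh scaled by a learnable α, followed by the usual affine parameters. A minimal PyTorch sketch; the init value of α is an assumption here:

```python
import torch
import torch.nn as nn

class DynamicTanh(nn.Module):
    """Drop-in LayerNorm replacement: y = weight * tanh(alpha * x) + bias."""

    def __init__(self, dim: int, init_alpha: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), init_alpha))  # learnable scalar
        self.weight = nn.Parameter(torch.ones(dim))               # per-channel affine
        self.bias = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * torch.tanh(self.alpha * x) + self.bias

# Usage: swap nn.LayerNorm(d_model) for DynamicTanh(d_model) in a Transformer block.
x = torch.randn(2, 16, 512)
print(DynamicTanh(512)(x).shape)  # torch.Size([2, 16, 512])
```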

14.03.2025 05:41 — 👍 15    🔁 2    💬 1    📌 0
Post image Post image Post image Post image

Fixing the RANSAC Stopping Criterion

Johannes Schönberger, Viktor Larsson, @marcpollefeys.bsky.social

tl;dr: the original RANSAC formula for the number of iterations underestimates it for hard cases and overestimates it for easy ones. Here is a corrected one -> better results

arxiv.org/abs/2503.07829
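For reference, the classic stopping criterion the paper revisits picks N so that, with confidence p, at least one all-inlier minimal sample of size s is drawn given inlier ratio w. A quick sketch of that standard formula (the corrected criterion itself is in the paper):

```python
import math

def classic_ransac_iterations(p: float, w: float, s: int) -> int:
    """Standard RANSAC iteration count: N = log(1 - p) / log(1 - w**s)."""
    return math.ceil(math.log(1 - p) / math.log(1 - w**s))

# Hard case (low inlier ratio, large minimal sample) vs easy case.
print(classic_ransac_iterations(p=0.99, w=0.2, s=7))  # ~3.6e5 iterations
print(classic_ransac_iterations(p=0.99, w=0.8, s=3))  # 7 iterations
```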

12.03.2025 07:37 — 👍 32    🔁 7    💬 4    📌 0
Preview
Interviewing Eugene Vinitsky on self-play for self-driving and what else people do with RL
#13. Reinforcement learning fundamentals and scaling.

Another grad school RL friend on the pod! Lots of non-reasoning RL talk here!
Interviewing Eugene Vinitsky (@eugenevinitsky.bsky.social) on self-play for self-driving and what else people do with RL
#13. Reinforcement learning fundamentals and scaling.
Post: buff.ly/8fLBJA6
YouTube: buff.ly/eJ6heSI

12.03.2025 14:09 — 👍 24    🔁 3    💬 0    📌 1

Amazing threads!
I wish I could read more papers like this! Envying the reviewers.

11.03.2025 06:39 — 👍 15    🔁 1    💬 1    📌 0
Post image Post image Post image Post image

Data Augmentation for NeRFs in the Low Data Limit

Ayush Gaggar, Todd D. Murphey

tl;dr: any uncertainty-based view sampling is better than next-best-view sampling.
I didn't get where the "augmentation" comes from, though
arxiv.org/abs/2503.02092
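A generic illustration of the tl;dr, not the paper's algorithm: score candidate views by model uncertainty, here the variance across an ensemble's renders, and sample in proportion to that score instead of greedily taking the argmax "next best view":

```python
import numpy as np

def sample_next_views(candidate_views, ensemble_renders, k=5, rng=None):
    """Pick k candidate views with probability proportional to render variance.

    candidate_views:  list of N view descriptors (poses, ids, ...)
    ensemble_renders: (M, N, H, W, 3) renders of the N candidates by M ensemble members
    """
    rng = rng or np.random.default_rng()
    # Per-view uncertainty: mean pixel variance across the ensemble.
    uncertainty = ensemble_renders.var(axis=0).mean(axis=(1, 2, 3))  # (N,)
    probs = uncertainty / uncertainty.sum()
    idx = rng.choice(len(candidate_views), size=k, replace=False, p=probs)
    return [candidate_views[i] for i in idx]

# Greedy next-best-view would instead take np.argmax(uncertainty) every round.
```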

10.03.2025 11:21 — 👍 6    🔁 1    💬 0    📌 0
Post image

Introducing DaD (arxiv.org/abs/2503.07347), a pretty cool keypoint detector.
As this will get pretty long, it will be split into two threads.
The first will go into the RL part, and the second into the emergence and distillation.

11.03.2025 03:04 — 👍 62    🔁 11    💬 4    📌 3
Post image Post image Post image Post image

Image-Based Relocalization and Alignment for Long-Term Monitoring of Dynamic Underwater Environments

Beverley Gorry, Tobias Fischer, Michael Milford, Alejandro Fontan
tl;dr: SuperPoint + LightGlue can breathe underwater.
arxiv.org/abs/2503.04096
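A sketch of the SuperPoint + LightGlue matching step, assuming the cvg/LightGlue package API (pip install lightglue); the file names are hypothetical and this is not the paper's own code:

```python
import torch
from lightglue import LightGlue, SuperPoint
from lightglue.utils import load_image, rbd

device = "cuda" if torch.cuda.is_available() else "cpu"
extractor = SuperPoint(max_num_keypoints=2048).eval().to(device)
matcher = LightGlue(features="superpoint").eval().to(device)

# Hypothetical images from two visits to the same underwater site.
image0 = load_image("site_A_2021.jpg").to(device)
image1 = load_image("site_A_2024.jpg").to(device)

feats0 = extractor.extract(image0)
feats1 = extractor.extract(image1)
matches01 = matcher({"image0": feats0, "image1": feats1})
feats0, feats1, matches01 = [rbd(x) for x in (feats0, feats1, matches01)]

matches = matches01["matches"]             # (K, 2) indices into each keypoint set
pts0 = feats0["keypoints"][matches[:, 0]]  # matched keypoints in image0
pts1 = feats1["keypoints"][matches[:, 1]]  # matched keypoints in image1
print(f"{len(matches)} matches")           # feed these to a robust pose/alignment step
```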

10.03.2025 11:12 — 👍 4    🔁 3    💬 0    📌 0
Post image

We made a new keypoint detector named DaD. The paper isn't up yet, but the code and weights are:
github.com/Parskatt/dad

10.03.2025 07:53 — 👍 44    🔁 8    💬 7    📌 0

This is fantastic. My human sensorimotor skills barely let me do something like that.

10.03.2025 16:38 — 👍 1    🔁 0    💬 0    📌 0
Post image Post image Post image Post image

JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba

Xiaoyong Lu, Songlin Du

tl;dr: replace the Transformer in LoFTR with Mamba

Mamba takes the torch in local feature matching

No eval on IMC, though

github.com/leoluxxx/JamMa
arxiv.org/abs/2503.03437

06.03.2025 04:57 — 👍 7    🔁 1    💬 1    📌 1
Post image Post image Post image

Integral Forms in Matrix Lie Groups

Timothy D Barfoot

tl;dr: minimal polynomial -> Lie algebra -> compact analytic results;
transfer back and forth between the series form and the integral form

arxiv.org/abs/2503.02820
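A concrete instance of the series-to-closed-form idea: on SO(3) the skew matrix obeys its minimal polynomial W^3 = -theta^2 W, which collapses the matrix exponential series into the Rodrigues formula. A quick numpy check, my own illustration rather than anything from the paper:

```python
import numpy as np
from scipy.linalg import expm

def skew(w):
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

def exp_so3(w):
    """Rodrigues: the exp series collapses via the minimal polynomial W^3 = -theta^2 W."""
    theta = np.linalg.norm(w)
    W = skew(w)
    if theta < 1e-12:
        return np.eye(3) + W  # first-order fallback near the identity
    return (np.eye(3)
            + (np.sin(theta) / theta) * W
            + ((1 - np.cos(theta)) / theta**2) * (W @ W))

w = np.array([0.3, -0.5, 0.7])
print(np.allclose(exp_so3(w), expm(skew(w))))  # True: closed form matches the series
```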

05.03.2025 04:23 — 👍 3    🔁 1    💬 0    📌 0
Post image

Dataset Distillation (2018/2020)

They show that it is possible to compress 60,000 MNIST training images into just 10 synthetic distilled images (one per class) and achieve close to original performance with only a few gradient descent steps, given a fixed network initialization.
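The core of the method is a bilevel loop: take a gradient step on the learnable synthetic images, then backpropagate a real-data loss through that step into the images and a learned step size. A compressed sketch of that idea, with a fixed init, a single inner step, and an arbitrary small MLP as assumptions:

```python
import torch, torch.nn as nn, torch.nn.functional as F
from torch.func import functional_call
from torchvision import datasets, transforms

real = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
loader = torch.utils.data.DataLoader(real, batch_size=256, shuffle=True)

syn_x = torch.randn(10, 1, 28, 28, requires_grad=True)  # 10 learnable images, one per class
syn_y = torch.arange(10)
syn_lr = torch.tensor(0.02, requires_grad=True)          # learnable inner step size
outer_opt = torch.optim.Adam([syn_x, syn_lr], lr=1e-2)

def make_net():
    torch.manual_seed(0)  # fixed network initialization, as in the paper's setting
    return nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

for real_x, real_y in loader:
    net = make_net()
    names, params = zip(*net.named_parameters())
    # Inner step: one SGD step on the synthetic set, keeping the graph for higher-order grads.
    inner_loss = F.cross_entropy(functional_call(net, dict(zip(names, params)), (syn_x,)), syn_y)
    grads = torch.autograd.grad(inner_loss, params, create_graph=True)
    updated = {n: p - syn_lr * g for n, p, g in zip(names, params, grads)}
    # Outer step: real-data loss of the one-step-trained net, backprop into syn_x and syn_lr.
    outer_loss = F.cross_entropy(functional_call(net, updated, (real_x,)), real_y)
    outer_opt.zero_grad()
    outer_loss.backward()
    outer_opt.step()
```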

05.03.2025 00:23 — 👍 26    🔁 5    💬 2    📌 1
Preview
GitHub - yuanchenyang/smalldiffusion: Simple and readable code for training and sampling from diffusion models

smalldiffusion

A lightweight diffusion library for training and sampling from diffusion models. The core of the library is implemented in less than 100 lines of very readable PyTorch code.

github.com/yuanchenyang...
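For flavor, the kind of thing that fits in so few lines: DDPM-style training is "add noise at a random timestep, regress the noise". A generic sketch of that loss, my own rather than smalldiffusion's actual API, with typical default schedule values assumed:

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule (assumed default)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative signal-retention factor

def diffusion_loss(model, x0):
    """Noise-prediction loss: model(x_t, t) should recover the injected noise."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)
    noise = torch.randn_like(x0)
    a = alphas_bar.to(x0.device)[t].view(b, *([1] * (x0.dim() - 1)))
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * noise
    return F.mse_loss(model(x_t, t), noise)

# Training then reduces to: loss = diffusion_loss(model, batch); loss.backward(); opt.step()
```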

05.03.2025 05:34 — 👍 20    🔁 4    💬 3    📌 0
Preview
Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming
In this paper we describe a new conceptual framework that connects approximate Dynamic Programming (DP), Model Predictive Control (MPC), and Reinforcement Learning (RL). This framework centers around ...

This week's #PaperILike is "Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming" (Bertsekas 2024).

If you know 1 of {RL, controls} and want to understand the other, this is a good starting point.

PDF: arxiv.org/abs/2406.00592

02.03.2025 16:19 — 👍 43    🔁 8    💬 0    📌 0
Post image

Agentic Retrieval-Augmented Generation: A Survey On Agentic RAG

This repository complements the survey paper "Agentic Retrieval-Augmented Generation (Agentic RAG): A Survey On Agentic RAG".

github.com/asinghcsu/Ag...

03.03.2025 03:26 — 👍 13    🔁 1    💬 0    📌 0
Post image Post image Post image Post image

RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges

@thibautloiseau.bsky.social, Guillaume Bourmaud

tl;dr: GT from nuScenes (autonomous driving images), with difficulty bins: scale, perspective, etc.
ALIKED+LG is best among detectors
1/2
arxiv.org/abs/2502.19955

28.02.2025 09:49 — 👍 11    🔁 3    💬 2    📌 0
Preview
The Blue Report
The top links on Bluesky, updated hourly

The Blue Report is amazing. The most clicked-on articles are right here

theblue.report

01.03.2025 21:21 — 👍 20000    🔁 4657    💬 583    📌 265
