
Zubair Irshad

@zubair-irshad.bsky.social

Research Scientist @Toyota Research Institute | Prev. PhD in AI, ML and CV @GeorgiaTech | Researching 3D Perception, Generative AI for Robotics and Multimodal AI | W: https://zubairirshad.com

1,916 Followers  |  700 Following  |  81 Posts  |  Joined: 20.11.2024

Latest posts by zubair-irshad.bsky.social on Bluesky


FastMap: Revisiting Dense and Scalable Structure from Motion

Jiahao Li, Haochen Wang, @zubair-irshad.bsky.social, @ivasl.bsky.social, Matthew R. Walter, Vitor Campagnolo Guizilini, Greg Shakhnarovich

tl;dr: replaces bundle adjustment (BA) with an epipolar-error objective solved via IRLS; fully PyTorch implementation

arxiv.org/abs/2505.04612
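Not the paper's code, just a minimal PyTorch sketch of the idea named in the tl;dr (a Sampson-style epipolar error refined with IRLS reweighting); the free 3x3 parameterization and all names here are illustrative assumptions, not FastMap's actual formulation:

```python
import torch

def epipolar_residuals(E, x1, x2):
    # Sampson-style squared epipolar error for homogeneous, normalized points.
    # E: (3, 3) essential matrix; x1, x2: (N, 3) matched points in the two views.
    Ex1 = x1 @ E.T                    # epipolar lines in view 2
    Etx2 = x2 @ E                     # epipolar lines in view 1
    num = (x2 * Ex1).sum(dim=1) ** 2  # (x2^T E x1)^2
    den = (Ex1[:, :2].pow(2).sum(dim=1) + Etx2[:, :2].pow(2).sum(dim=1)).clamp_min(1e-12)
    return num / den

def irls_refine(E_init, x1, x2, outer=20, inner=50, lr=1e-2, delta=1e-3):
    # IRLS: alternate (a) recomputing robust weights from the current residuals and
    # (b) gradient steps on the weighted least-squares objective.
    # Note: no essential-matrix constraints are enforced here; a real solver
    # (and FastMap itself) optimizes camera poses rather than a free 3x3 matrix.
    E = E_init.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([E], lr=lr)
    for _ in range(outer):
        with torch.no_grad():
            w = 1.0 / (epipolar_residuals(E, x1, x2) + delta)  # robust reweighting
        for _ in range(inner):
            opt.zero_grad()
            loss = (w * epipolar_residuals(E, x1, x2)).mean()
            loss.backward()
            opt.step()
    return E.detach()
```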

08.05.2025 12:51 β€” πŸ‘ 10    πŸ” 2    πŸ’¬ 0    πŸ“Œ 0

Shoutout to the authors of the wonderful works CtRNet-X, DUSt3R, Segment Anything, CLIP, and PyTorch3D for open-sourcing their codebases to advance science and make this effort possible!

Please check these works out if you haven’t already!

24.04.2025 00:33 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

We have released our improved extrinsics. Try it out now at droid-dataset.github.io and read more details about it in the updated DROID paper at arxiv.org/abs/2403.12945

This was a fun collaboration with
@vitorguizilini, @SashaKhazatsky and @KarlPertsch!

23.04.2025 23:50 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

There’s still room for improvement. Future work could explore:

β€’ Extending to in-the-wild scenes via foundation models for robot segmentation & keypoints.
β€’ Ensembling predictions over time for better temporal consistency.
β€’ Fine-tuning pointmap models on real robot data to handle cluttered tabletops.

8/n

23.04.2025 23:50 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Large-scale automatic calibration in robotics is challenging, and our pipeline has some limitations:

β€’ CtRNet-X is trained on Panda; generalization to other robots is untested.
β€’ DUSt3R struggles with clutter or minimal view overlap.
β€’ Steps 2️⃣ & 3️⃣ may yield false positives in tough lighting or geometry.

7/n

23.04.2025 23:50 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Similarly, we plot the distribution of the number of matched points and the cumulative curve after 3️⃣, which helps identify the top quantile of well-calibrated camera pairs within each lab.

6/n

23.04.2025 23:50 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Automatically calibrating a large-scale dataset is challenging. We provide quality assessment metrics across all three stages, with flexibility to narrow bounds for downstream tasks as needed.

Quality metrics for 1️⃣ and 2️⃣ show the IoU and reprojection-error distributions post-calibration.
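For intuition, a minimal sketch of how such a reprojection-error metric can be computed (hypothetical names; e.g. robot keypoints from forward kinematics projected through the estimated extrinsics and compared against detected 2D keypoints):

```python
import numpy as np

def reprojection_error(K, T_cam_base, pts_base, pts_2d):
    """Mean pixel error of 3D keypoints projected with the estimated calibration.

    K          : (3, 3) camera intrinsics
    T_cam_base : (4, 4) estimated extrinsics mapping base-frame points to camera frame
    pts_base   : (N, 3) 3D keypoints in the robot base frame (e.g. from FK)
    pts_2d     : (N, 2) corresponding detected 2D keypoints, in pixels
    """
    pts_h = np.hstack([pts_base, np.ones((len(pts_base), 1))])  # homogeneous (N, 4)
    pts_cam = (T_cam_base @ pts_h.T).T[:, :3]                   # points in camera frame
    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]                               # perspective divide
    return float(np.linalg.norm(uv - pts_2d, axis=1).mean())
```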

5/n

23.04.2025 23:50 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Below we show the camera-to-camera transformations; post-calibration improves the alignment of the obtained point clouds!

4/n

23.04.2025 23:50 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

We provide:
πŸ€– ~36k calibrated episodes with good-quality extrinsic calibration
🦾 ~24k calibrated multi-view episodes with good-quality multi-view camera calibration
βœ… Quality assessment metrics for all provided camera poses

3/n

23.04.2025 23:50 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

To achieve this, we utilize:
1️⃣ Automatic Segment Anything (SAM)-based filtering (Camera-to-Base Calibration)
2️⃣ Tuned CtRNet-X to bring in additional cameras (Camera-to-Base Calibration)
3️⃣ Pretrained DUSt3R with depth-based pose optimization (Camera-to-Camera Calibration), sketched below
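Roughly, the kind of depth-based pose solve step 3️⃣ relies on (a sketch under assumptions, not the pipeline's actual optimizer): lift matched pixels to 3D with each view's depth, then recover the camera-to-camera transform as a least-squares rigid alignment.

```python
import numpy as np

def rigid_align(P, Q):
    """Least-squares rigid transform (R, t) mapping points P onto Q (Kabsch/Umeyama).

    P, Q: (N, 3) matched 3D points, e.g. matched pixels lifted with metric depth
    and intrinsics in camera A and camera B respectively.
    """
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP                            # then R @ p + t ≈ q
    return R, t
```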

2/n

23.04.2025 23:50 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Introducing ✨Posed DROID✨, the result of our efforts at automatic post-hoc calibration of a large-scale robot manipulation dataset.

Try it out at: droid-dataset.github.io

Learn more at:
🌐 arXiv: arxiv.org/pdf/2403.12945
πŸ“„ Blog: medium.com/p/4ddfc45361d3

🧡 1/n

23.04.2025 23:50 β€” πŸ‘ 5    πŸ” 1    πŸ’¬ 1    πŸ“Œ 0
3D Vision Language Models (VLMs) for Robotic Manipulation: Opportunities and Challenges

πŸ”— Learn more & submit your work: robo-3dvlms.github.io

Join us in shaping the future of robotics, 3D vision, and language models! πŸ€–πŸ“š #CVPR2025

10.02.2025 17:00 β€” πŸ‘ 3    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

🎀 We’re honored to host top experts in the field:
⭐ Angel Chang (Simon Fraser University)
⭐ Chelsea Finn (Stanford University)
⭐ Hao Su (UC San Diego)
⭐ Katerina Fragkiadaki (CMU)
⭐ Yunzhu Li (Columbia University)
⭐ Ranjay Krishna (University of Washington)

5/N

10.02.2025 17:00 β€” πŸ‘ 3    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

🎯 Key Topics:
βœ… 3D Vision-Language Policy Learning
βœ… Pretraining for 3D VLMs
βœ… 3D Representations for Policy Learning
βœ… 3D Benchmarks & Simulation Frameworks
βœ… 3D Vision-Language Action Models
βœ… 3D Instruction-Tuning & Pretraining Datasets for Robotics

4/N

10.02.2025 17:00 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

πŸ“’ Call for Papers: Submission opens today!
πŸ“… Deadline: April 15, 2024 (11:59 PM PST)
πŸ“œ Format: Up to 4 pages (excluding references/appendices), CVPR template, anonymized submissions
πŸ† Accepted papers: Poster presentations, with selected papers receiving spotlight talks!

3/N

10.02.2025 17:00 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

πŸ” Explore how 3D perception and language models can enhance robotic manipulation in the era of foundation models. Engage with leading experts and be part of this new frontier in 3D-based VLMs/VLAs for robotics.

2/N

10.02.2025 17:00 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

πŸš€Exciting News! Join us at the inaugural #CVPR2025 Workshop on 3D Vision Language Models (VLMs) for Robotics Manipulation on June 11, 2025, in Nashville, TN! 🦾

robo-3dvlms.github.io

1/N

@cvprconference.bsky.social

10.02.2025 17:00 β€” πŸ‘ 13    πŸ” 0    πŸ’¬ 1    πŸ“Œ 1

Welcome onboard!

18.01.2025 02:08 β€” πŸ‘ 3    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Done, welcome aboard!

17.01.2025 18:55 β€” πŸ‘ 4    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Welcome on board!

20.12.2024 09:16 β€” πŸ‘ 5    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Congrats on the release, demos look cool and it's open source πŸ‘

04.12.2024 08:51 β€” πŸ‘ 9    πŸ” 2    πŸ’¬ 1    πŸ“Œ 0

In a world full of AI, authenticity will be the most valuable thing in the universe.

30.11.2024 02:39 β€” πŸ‘ 75    πŸ” 6    πŸ’¬ 7    πŸ“Œ 1

Hello πŸ‘‹

28.11.2024 14:53 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
1 Andrew Davison, Imperial College London - BMVA Symposium: Robotics Foundation & World Models (YouTube video by BMVA: British Machine Vision Association)

For my first post on Bluesky, this recent talk I gave at the BMVA one-day meeting on World Models is a good summary of my work on Computer Vision, Robotics and SLAM, and my thoughts on the bigger picture of #SpatialAI.
youtu.be/NLnPG95vNhQ?...

28.11.2024 14:22 β€” πŸ‘ 90    πŸ” 23    πŸ’¬ 5    πŸ“Œ 2

Just included :) Welcome @ajdavison.bsky.social!

go.bsky.app/HcQYMj

28.11.2024 14:51 β€” πŸ‘ 5    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Check out this BEAUTIFUL interactive blog about cameras and lenses

ciechanow.ski/cameras-and-...

27.11.2024 02:54 β€” πŸ‘ 75    πŸ” 16    πŸ’¬ 6    πŸ“Œ 1

Hello πŸ‘‹ Would love to join!

26.11.2024 02:39 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
GitHub - SLAM-Handbook-contributors/slam-handbook-public-release: Release repo for our SLAM Handbook

We are in the process of editing a SLAM handbook, to be published by Cambridge University Press, with many *stellar* contributors. Part 1 is available as an online draft for public comments. Help us find bugs/problems!
Link to release repo is here: lnkd.in/gZhTkaxb

16.11.2024 15:45 β€” πŸ‘ 84    πŸ” 26    πŸ’¬ 5    πŸ“Œ 2

Added you!

25.11.2024 10:12 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Welcome onboard!

25.11.2024 09:59 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
