Jianyuan Wang jianyuanwang - Bluesky Statics

I am trying to. We will probably hear about this around the next submission deadline 😂

18.03.2025 04:51 — 👍 1 🔁 0 💬 1 📌 0

It seems so (from a quick glance only). The techniques used by Fast3R can also be applied to VGGT.

18.03.2025 04:49 — 👍 1 🔁 0 💬 0 📌 0

Haha, this probably serves as an indirect validation of NVIDIA’s stock value.

18.03.2025 04:48 — 👍 2 🔁 0 💬 0 📌 0

Currently, this training approach is not very stable, but I believe that’s likely because I haven’t yet found the correct training method. I hope it can achieve better results in the future, which would remove the need to model point maps explicitly.

17.03.2025 12:40 — 👍 2 🔁 0 💬 1 📌 0

Finally, great work together with Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny!

@oxford-vgg.bsky.social

17.03.2025 02:14 — 👍 3 🔁 0 💬 0 📌 0

Interesting observation: VGGT’s camera & depth predictions are highly accurate and consistent. Unprojecting our predicted depth with predicted camera parameters yields even more precise point clouds than directly predicted point maps! Try this yourself using the Hugging Face demo 🤗
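For readers unfamiliar with the term, the unprojection described above can be sketched in a few lines of NumPy. This is a minimal illustration, not VGGT's own code or API: the function name `unproject_depth`, the pinhole intrinsics matrix `K`, and the world-to-camera convention `x_cam = R @ x_world + t` are all assumptions made for the example, and the actual conventions used by the model may differ.

```python
import numpy as np

def unproject_depth(depth, K, R, t):
    """Lift a depth map to world-space 3D points (hypothetical helper).

    Assumptions (not taken from the VGGT code base):
      depth: (H, W) array of z-depths along the camera's optical axis
      K:     (3, 3) pinhole intrinsics
      R, t:  world-to-camera extrinsics, x_cam = R @ x_world + t
    Returns an (H, W, 3) array of world coordinates.
    """
    H, W = depth.shape
    # u indexes columns (x), v indexes rows (y); both have shape (H, W).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    # Back-project pixels to camera-frame rays with z = 1.
    rays = pix @ np.linalg.inv(K).T
    # Scale each ray by its depth to get camera-frame points.
    cam_pts = rays * depth[..., None]
    # Invert the extrinsics: x_world = R^T (x_cam - t), in row-vector form.
    return (cam_pts - t) @ R

# Toy usage with made-up values: a 3x3 depth map at constant depth 2.
K = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
points = unproject_depth(np.full((3, 3), 2.0), K, R, t)
print(points.shape)
```

Stacking the outputs of such a function over all input views would give a single point cloud in a shared world frame, which is the quantity being compared against the directly predicted point maps.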

17.03.2025 02:12 — 👍 5 🔁 1 💬 3 📌 1

Compared to concurrent CVPR'25 Transformer-based 3D reconstruction methods, VGGT achieves significantly higher accuracy, with speed similar to the fastest variant, Fast3R.

17.03.2025 02:12 — 👍 1 🔁 0 💬 1 📌 0

Bonus insight: Using pretrained VGGT significantly enhances downstream tasks like:
🚀 Non-rigid point tracking
🚀 Feed-forward novel view synthesis

17.03.2025 02:12 — 👍 1 🔁 0 💬 1 📌 0

A strong advantage of our method is the ability to predict 3D attributes without any expensive optimization. For example:
🔸 VGGT can easily process ~200 images in ~10s on a single 40GB A100 GPU
🔸 50x faster than optimization-based methods, using far less memory

17.03.2025 02:11 — 👍 1 🔁 0 💬 1 📌 0

Try our demo live on Hugging Face Spaces!

🤗: huggingface.co/spaces/faceb...

(See demo illustration below) 👇

17.03.2025 02:10 — 👍 1 🔁 0 💬 1 📌 0

No expensive optimization needed, yet delivers SOTA results for:

✅ Camera Pose Estimation
✅ Multi-view Depth Estimation
✅ Dense Point Cloud Reconstruction
✅ Point Tracking

17.03.2025 02:08 — 👍 1 🔁 0 💬 1 📌 0

Introducing VGGT (CVPR'25), a feed-forward Transformer that directly infers all key 3D attributes from one, a few, or hundreds of images, in seconds!

Project Page: vgg-t.github.io
Code & Weights: github.com/facebookrese...

17.03.2025 02:08 — 👍 44 🔁 14 💬 3 📌 1

Posts by Jianyuan Wang (@jianyuanwang.bsky.social)