MCML Blog: Images and text are usually aligned using millions of image-caption pairs. But could they still be matched if they were never seen together?
In "It's a (Blind) Match!", MCML Members explore this question.
mcml.ai/news/2026-01...
16.01.2026 09:24 · 2 likes · 1 repost · 0 replies · 0 quotes
🦕 We present "Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion". #ICCV2025
Project page: visinf.github.io/scenedino/
Paper: arxiv.org/abs/2507.06230
🤗 Demo: huggingface.co/spaces/jev-a...
@jev-aleks.bsky.social @fwimbauer.bsky.social @olvrhhn.bsky.social @stefanroth.bsky.social @dcremers.bsky.social
09.07.2025 13:17 · 24 likes · 10 reposts · 1 reply · 1 quote
The code for our #CVPR2025 paper, PRaDA: Projective Radial Distortion Averaging, is now out!
Turns out distortion calibration from multi-view 2D correspondences can be fully decoupled from 3D reconstruction, greatly simplifying the problem.
arxiv.org/abs/2504.16499
github.com/DaniilSinits...
09.07.2025 13:54 · 12 likes · 5 reposts · 1 reply · 0 quotes
4/4
It's a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data
@schnaus.bsky.social @neekans.bsky.social @dcremers.bsky.social
Paper: arxiv.org/pdf/2503.241...
Project page: dominik-schnaus.github.io/itsamatch/
Code: github.com/dominik-schn...
03.06.2025 09:27 · 0 likes · 0 reposts · 0 replies · 0 quotes
3/4
✅ This enables unsupervised matching: finding vision-language correspondences without any paired data.
🤯 As a proof of concept, we build an unsupervised image classifier that assigns labels without seeing a single image-text pair.
03.06.2025 09:27 · 0 likes · 0 reposts · 1 reply · 0 quotes
2/4
As models and datasets scale, distances in vision and language embeddings become similar (the Platonic Representation Hypothesis).
💡 We cast the matching task as a Quadratic Assignment Problem (QAP) and propose a new heuristic solver.
03.06.2025 09:27 · 0 likes · 0 reposts · 1 reply · 0 quotes
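The core idea of this thread, finding correspondences from pairwise distances alone, can be sketched in a few lines. Below is a toy brute-force QAP matcher, not the paper's heuristic solver; the data, the `qap_match` name, and the Gaussian toy embeddings are all illustrative assumptions:

```python
# Sketch: match two embedding spaces using ONLY their pairwise distance
# matrices, by brute-forcing the QAP objective on a tiny toy example.
import itertools
import numpy as np

def qap_match(D_a, D_b):
    """Return the permutation p minimizing sum_ij (D_a[i,j] - D_b[p(i),p(j)])^2."""
    n = D_a.shape[0]
    best_perm, best_cost = None, np.inf
    for candidate in itertools.permutations(range(n)):
        p = list(candidate)
        cost = ((D_a - D_b[np.ix_(p, p)]) ** 2).sum()
        if cost < best_cost:
            best_perm, best_cost = p, cost
    return best_perm

# Toy data: the "language" embeddings are a shuffled copy of the "vision"
# embeddings, so both spaces share the same pairwise geometry.
rng = np.random.default_rng(0)
vision = rng.normal(size=(5, 3))
shuffle = [2, 0, 4, 1, 3]
language = vision[shuffle]  # same points, unknown correspondence

D_v = np.linalg.norm(vision[:, None] - vision[None], axis=-1)
D_l = np.linalg.norm(language[:, None] - language[None], axis=-1)

perm = qap_match(D_v, D_l)  # perm[i]: index in `language` matched to vision[i]
```

Brute force is factorial in the number of items, which is exactly why a heuristic solver is needed at real vocabulary sizes; the sketch only illustrates the objective.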
Can we match vision and language representations without any supervision or paired data?
Surprisingly, yes!
Our #CVPR2025 paper with @neekans.bsky.social and @dcremers.bsky.social shows that the pairwise distances in both modalities are often enough to find correspondences.
⬇️ 1/4
03.06.2025 09:27 · 27 likes · 12 reposts · 1 reply · 0 quotes
Can you train a model for pose estimation directly on casual videos without supervision?
Turns out you can!
In our #CVPR2025 paper AnyCam, we directly train on YouTube videos and achieve SOTA results by using an uncertainty-based flow loss and monocular priors!
⬇️
13.05.2025 08:11 · 25 likes · 10 reposts · 1 reply · 1 quote
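The "uncertainty-based flow loss" mentioned above can be sketched as a heteroscedastic negative log-likelihood. This is a generic formulation under my own assumptions (per-pixel Laplacian noise, a `log_sigma` map predicted alongside the flow), not necessarily AnyCam's exact loss:

```python
import numpy as np

def uncertainty_flow_loss(flow_pred, flow_obs, log_sigma):
    """Per-pixel Laplacian NLL: |residual| / sigma + log(sigma).
    Pixels with high predicted uncertainty (e.g. moving objects that break
    the static-scene assumption) are downweighted, while the log(sigma)
    term penalizes being uncertain everywhere."""
    sigma = np.exp(log_sigma)                             # (H, W)
    residual = np.abs(flow_pred - flow_obs).sum(axis=-1)  # L1 over (u, v)
    return (residual / sigma + log_sigma).mean()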
Check out our recent #CVPR2025 paper AnyCam, a fast method for pose estimation in casual videos!
1️⃣ It can be trained directly on casual videos, without any 3D annotation.
2️⃣ It is built around a feed-forward transformer and lightweight refinement.
Code and more info: ⏩ fwmb.github.io/anycam/
23.04.2025 15:52 · 23 likes · 6 reposts · 1 reply · 0 quotes
We are thrilled to have 12 papers accepted to #CVPR2025. Thanks to all our students and collaborators for this great achievement!
For more details, check out cvg.cit.tum.de
13.03.2025 13:11 · 36 likes · 12 reposts · 1 reply · 2 quotes
Indeed, everyone had a blast. Thank you all for the great talks, discussions, and skiing/snowboarding!
16.01.2025 17:56 · 45 likes · 4 reposts · 1 reply · 3 quotes