MCML Blog: Images and text are usually aligned using millions of image-caption pairs. But could they still be matched if they were never seen together?
In "It's a (Blind) Match!", MCML Members explore this question.
mcml.ai/news/2026-01...
16.01.2026 09:24 · 2 likes · 1 repost · 0 replies · 0 quotes
🦕 We present "Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion". #ICCV2025
Project page: visinf.github.io/scenedino/
Paper: arxiv.org/abs/2507.06230
🤗 Demo: huggingface.co/spaces/jev-a...
@jev-aleks.bsky.social @fwimbauer.bsky.social @olvrhhn.bsky.social @stefanroth.bsky.social @dcremers.bsky.social
09.07.2025 13:17 · 24 likes · 10 reposts · 1 reply · 1 quote
The code for our #CVPR2025 paper, PRaDA: Projective Radial Distortion Averaging, is now out!
Turns out distortion calibration from multi-view 2D correspondences can be fully decoupled from 3D reconstruction, greatly simplifying the problem.
arxiv.org/abs/2504.16499
github.com/DaniilSinits...
09.07.2025 13:54 · 12 likes · 5 reposts · 1 reply · 0 quotes
4/4
It's a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data
@schnaus.bsky.social @neekans.bsky.social @dcremers.bsky.social
Paper: arxiv.org/pdf/2503.241...
Project page: dominik-schnaus.github.io/itsamatch/
Code: github.com/dominik-schn...
03.06.2025 09:27 · 0 likes · 0 reposts · 0 replies · 0 quotes
3/4
✅ This enables unsupervised matching: finding vision-language correspondences without any paired data.
🤯 As a proof of concept, we build an unsupervised image classifier that assigns labels without seeing a single image-text pair.
03.06.2025 09:27 · 0 likes · 0 reposts · 1 reply · 0 quotes
2/4
As models and datasets scale, distances in vision and language embeddings become similar (the Platonic Representation Hypothesis).
💡 We cast the matching task as a Quadratic Assignment Problem (QAP) and propose a new heuristic solver.
03.06.2025 09:27 · 0 likes · 0 reposts · 1 reply · 0 quotes
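The core idea of this thread, finding correspondences from pairwise distances alone, can be sketched in a few lines. Below is a toy brute-force QAP matcher, not the paper's heuristic solver; the data, the `qap_match` name, and the Gaussian toy embeddings are all illustrative assumptions:

```python
# Sketch: match two embedding spaces using ONLY their pairwise distance
# matrices, by brute-forcing the QAP objective on a tiny toy example.
import itertools
import numpy as np

def qap_match(D_a, D_b):
    """Return the permutation p minimizing sum_ij (D_a[i,j] - D_b[p(i),p(j)])^2."""
    n = D_a.shape[0]
    best_perm, best_cost = None, np.inf
    for candidate in itertools.permutations(range(n)):
        p = list(candidate)
        cost = ((D_a - D_b[np.ix_(p, p)]) ** 2).sum()
        if cost < best_cost:
            best_perm, best_cost = p, cost
    return best_perm

# Toy data: the "language" embeddings are a shuffled copy of the "vision"
# embeddings, so both spaces share the same pairwise geometry.
rng = np.random.default_rng(0)
vision = rng.normal(size=(5, 3))
shuffle = [2, 0, 4, 1, 3]
language = vision[shuffle]  # same points, unknown correspondence

D_v = np.linalg.norm(vision[:, None] - vision[None], axis=-1)
D_l = np.linalg.norm(language[:, None] - language[None], axis=-1)

perm = qap_match(D_v, D_l)  # perm[i]: index in `language` matched to vision[i]
```

Brute force is factorial in the number of items, which is exactly why a heuristic solver is needed at real vocabulary sizes; the sketch only illustrates the objective.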
Can we match vision and language representations without any supervision or paired data?
Surprisingly, yes!
Our #CVPR2025 paper with @neekans.bsky.social and @dcremers.bsky.social shows that the pairwise distances in both modalities are often enough to find correspondences.
⬇️ 1/4
03.06.2025 09:27 · 27 likes · 12 reposts · 1 reply · 0 quotes
Can you train a model for pose estimation directly on casual videos without supervision?
Turns out you can!
In our #CVPR2025 paper AnyCam, we directly train on YouTube videos and achieve SOTA results by using an uncertainty-based flow loss and monocular priors!
⬇️
13.05.2025 08:11 · 25 likes · 10 reposts · 1 reply · 1 quote
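The "uncertainty-based flow loss" mentioned above can be sketched as a heteroscedastic negative log-likelihood. This is a generic formulation under my own assumptions (per-pixel Laplacian noise, a `log_sigma` map predicted alongside the flow), not necessarily AnyCam's exact loss:

```python
import numpy as np

def uncertainty_flow_loss(flow_pred, flow_obs, log_sigma):
    """Per-pixel Laplacian NLL: |residual| / sigma + log(sigma).
    Pixels with high predicted uncertainty (e.g. moving objects that break
    the static-scene assumption) are downweighted, while the log(sigma)
    term penalizes being uncertain everywhere."""
    sigma = np.exp(log_sigma)                             # (H, W)
    residual = np.abs(flow_pred - flow_obs).sum(axis=-1)  # L1 over (u, v)
    return (residual / sigma + log_sigma).mean()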
Check out our recent #CVPR2025 paper AnyCam, a fast method for pose estimation in casual videos!
1️⃣ It can be trained directly on casual videos, without any 3D annotation.
2️⃣ It is built around a feed-forward transformer and lightweight refinement.
Code and more info: ⏩ fwmb.github.io/anycam/
23.04.2025 15:52 · 23 likes · 6 reposts · 1 reply · 0 quotes
We are thrilled to have 12 papers accepted to #CVPR2025. Thanks to all our students and collaborators for this great achievement!
For more details, check out cvg.cit.tum.de
13.03.2025 13:11 · 36 likes · 12 reposts · 1 reply · 2 quotes
Indeed, everyone had a blast. Thank you all for the great talks, discussions, and skiing/snowboarding!
16.01.2025 17:56 · 45 likes · 4 reposts · 1 reply · 3 quotes