Dominik Schnaus

@schnaus.bsky.social

PhD student @ TUM with Daniel Cremers

115 Followers  |  411 Following  |  4 Posts  |  Joined: 16.01.2025

Posts by Dominik Schnaus (@schnaus.bsky.social)

MCML Blog: Images and text are usually aligned using millions of image–caption pairs. But could they still be matched if they were never seen together?

In “It’s a (Blind) Match!”, MCML Members explore this question.
mcml.ai/news/2026-01...

16.01.2026 09:24 | 👍 2    🔁 1    💬 0    📌 0

🦖 We present “Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion”. #ICCV2025
🌐: visinf.github.io/scenedino/
📃: arxiv.org/abs/2507.06230
🤗: huggingface.co/spaces/jev-a...
@jev-aleks.bsky.social @fwimbauer.bsky.social @olvrhhn.bsky.social @stefanroth.bsky.social @dcremers.bsky.social

09.07.2025 13:17 | 👍 24    🔁 10    💬 1    📌 1

The code for our #CVPR2025 paper, PRaDA: Projective Radial Distortion Averaging, is now out!

Turns out distortion calibration from multiview 2D correspondences can be fully decoupled from 3D reconstruction, greatly simplifying the problem.

arxiv.org/abs/2504.16499
github.com/DaniilSinits...

09.07.2025 13:54 | 👍 12    🔁 5    💬 1    📌 0

4/4

๐ˆ๐ญโ€™๐ฌ ๐š (๐๐ฅ๐ข๐ง๐) ๐Œ๐š๐ญ๐œ๐ก! ๐“๐จ๐ฐ๐š๐ซ๐๐ฌ ๐•๐ข๐ฌ๐ข๐จ๐งโ€“๐‹๐š๐ง๐ ๐ฎ๐š๐ ๐ž ๐‚๐จ๐ซ๐ซ๐ž๐ฌ๐ฉ๐จ๐ง๐๐ž๐ง๐œ๐ž ๐ฐ๐ข๐ญ๐ก๐จ๐ฎ๐ญ ๐๐š๐ซ๐š๐ฅ๐ฅ๐ž๐ฅ ๐ƒ๐š๐ญ๐š

@schnaus.bsky.social @neekans.bsky.social @dcremers.bsky.social

๐Ÿ“ย Paper: arxiv.org/pdf/2503.241...
๐ŸŒย Project page: dominik-schnaus.github.io/itsamatch/
๐Ÿ’ปย Code: github.com/dominik-schn...

03.06.2025 09:27 โ€” ๐Ÿ‘ 0    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

3/4

✅ This enables unsupervised matching: finding vision-language correspondences without any paired data.

🤯 As a proof of concept, we build an unsupervised image classifier that assigns labels without seeing a single image-text pair.

03.06.2025 09:27 | 👍 0    🔁 0    💬 1    📌 0

2/4

๐Ÿ”ย As models and datasets scale, distances in vision and language embeddings become similar (Platonic Representation Hypothesis).

๐Ÿ’กย We cast the matching task as a Quadratic Assignment Problem (QAP) and propose a new heuristic solver.

03.06.2025 09:27 โ€” ๐Ÿ‘ 0    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0
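
To make the QAP formulation concrete, here is a minimal toy sketch. It is not the paper's heuristic solver: it uses exhaustive search, which is only feasible for a handful of items. All names (`pairwise_distances`, `qap_cost`, `brute_force_match`) and the synthetic data are assumptions made for illustration. Two "modalities" hold the same items in different, unknown orders, and the matching is recovered from pairwise distances alone.

```python
import itertools
import math
import random


def pairwise_distances(points):
    """Return the full matrix of Euclidean distances between points."""
    return [[math.dist(p, q) for q in points] for p in points]


def qap_cost(dist_a, dist_b, perm):
    """Sum of squared differences between dist_a and the permuted dist_b.

    perm[i] is the index in modality B assigned to item i of modality A.
    """
    n = len(perm)
    return sum(
        (dist_a[i][j] - dist_b[perm[i]][perm[j]]) ** 2
        for i in range(n)
        for j in range(n)
    )


def brute_force_match(dist_a, dist_b):
    """Try every permutation and keep the cheapest one (toy sizes only)."""
    n = len(dist_a)
    return min(
        itertools.permutations(range(n)),
        key=lambda perm: qap_cost(dist_a, dist_b, perm),
    )


# Synthetic stand-in for embeddings: modality B is modality A in shuffled order.
rng = random.Random(0)
n = 6
points_a = [(rng.random(), rng.random(), rng.random()) for _ in range(n)]
shuffle = list(range(n))
rng.shuffle(shuffle)
points_b = [points_a[k] for k in shuffle]

dist_a = pairwise_distances(points_a)
dist_b = pairwise_distances(points_b)

recovered = brute_force_match(dist_a, dist_b)
# Ground truth: item i of A sits at the position k of B where shuffle[k] == i.
expected = tuple(shuffle.index(i) for i in range(n))
print(recovered == expected)
```

In this toy the two distance matrices agree exactly, so the correct matching has zero cost. With real vision and language embeddings the distances only agree approximately and the number of permutations grows factorially, which is why a heuristic solver is needed.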

Can we match vision and language representations without any supervision or paired data?

Surprisingly, yes!

Our #CVPR2025 paper with @neekans.bsky.social and @dcremers.bsky.social shows that the pairwise distances in both modalities are often enough to find correspondences.

⬇️ 1/4

03.06.2025 09:27 | 👍 27    🔁 12    💬 1    📌 0

Can you train a model for pose estimation directly on casual videos without supervision?

Turns out you can!

In our #CVPR2025 paper AnyCam, we directly train on YouTube videos and achieve SOTA results by using an uncertainty-based flow loss and monocular priors!

⬇️

13.05.2025 08:11 | 👍 25    🔁 10    💬 1    📌 1

Check out our recent #CVPR2025 paper AnyCam, a fast method for pose estimation in casual videos!

1️⃣ Can be directly trained on casual videos without the need for 3D annotation.
2️⃣ Based around a feed-forward transformer and lightweight refinement.

Code and more info: ⏩ fwmb.github.io/anycam/

23.04.2025 15:52 | 👍 23    🔁 6    💬 1    📌 0

We are thrilled to have 12 papers accepted to #CVPR2025. Thanks to all our students and collaborators for this great achievement!
For more details check out cvg.cit.tum.de

13.03.2025 13:11 | 👍 36    🔁 12    💬 1    📌 2

Indeed, everyone had a blast. Thank you all for the great talks, discussions, and skiing/snowboarding!

16.01.2025 17:56 | 👍 45    🔁 4    💬 1    📌 3