Dávid Komorowicz

@dawars.me.bsky.social

PhD candidate @Jena_DH & @TU_Muenchen working on 3D Reconstruction from Historic Imagery. @TU_Muenchen graduate. 📍 Munich

1,050 Followers  |  685 Following  |  20 Posts  |  Joined: 14.11.2024  |  1.9589

Latest posts by dawars.me on Bluesky

How can one reconstruct the complete 3D interior of a wood block using only photos of its surfaces? 🪵
At SIGGRAPH'25 (Thursday!), Maria Larsson will present *Mokume*: a dataset of 190 diverse wood samples and a pipeline that solves this inverse texturing challenge. 🧵👇

08.08.2025 11:53 | 👍 66    🔁 14    💬 2    📌 1

VLM-Guided Visual Place Recognition for Planet-Scale Geo-Localization

Sania Waheed, Na Min An, Michael Milford, Sarvapali D. Ramchurn, Shoaib Ehsan

tl;dr: in title
arxiv.org/abs/2507.17455

05.08.2025 08:11 | 👍 5    🔁 1    💬 0    📌 0

Unposed 3DGS Reconstruction with Probabilistic Procrustes Mapping

Chong Cheng, Zijian Wang, Sicheng Yu, Yu Hu, Nanjie Yao, Hao Wang

tl;dr: submap alignment->point cloud registration->robust Umeyama algorithm->global point cloud and camera trajectory

arxiv.org/abs/2507.18541

25.07.2025 12:43 | 👍 5    🔁 3    💬 0    📌 0

New 3D foundation model dropped.

Note: They seem to have mixed up their image-matching metrics (the numbers look like accuracy rather than AUC), but it should be at least as good as MASt3R.

24.07.2025 22:50 | 👍 11    🔁 2    💬 2    📌 0

Turns out that by default huggingface models run on the CPU...

20.07.2025 12:10 | 👍 1    🔁 0    💬 0    📌 0

Awesome initiative 🎉
This leaves me wondering though: how come authors attending #EurIPS still have to register for the main #NeurIPS (in the Americas) for their paper to be considered accepted?
You stopped so short of actually allowing ML researchers to fly less!

17.07.2025 14:12 | 👍 32    🔁 5    💬 5    📌 2
A meme where Anakin and Padme discuss the logic of allowing a NeurIPS event in Europe while forcing authors to also present in the US for publication

So far it doesn’t look good: neurips.cc/FAQ/AuthorRe...

“At least one author of each accepted paper must register for the main conference. A ‘Virtual Only Pass’ is not sufficient.”

17.07.2025 07:32 | 👍 7    🔁 2    💬 1    📌 0

WeTransfer just changed their TOS, giving themselves permission to train AI on any content you transfer and to produce derivative works from it, which they are allowed to monetize and for which you are not allowed payment.

Stop using WeTransfer.

14.07.2025 23:05 | 👍 7659    🔁 5358    💬 132    📌 475

The code for our #CVPR2025 paper, PRaDA: Projective Radial Distortion Averaging, is now out!

Turns out distortion calibration from multiview 2D correspondences can be fully decoupled from 3D reconstruction, greatly simplifying the problem

arxiv.org/abs/2504.16499
github.com/DaniilSinits...

09.07.2025 13:54 | 👍 12    🔁 5    💬 1    📌 0

🦖 We present “Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion”. #ICCV2025
🌍: visinf.github.io/scenedino/
📃: arxiv.org/abs/2507.06230
🤗: huggingface.co/spaces/jev-a...
@jev-aleks.bsky.social @fwimbauer.bsky.social @olvrhhn.bsky.social @stefanroth.bsky.social @dcremers.bsky.social

09.07.2025 13:17 | 👍 24    🔁 10    💬 1    📌 1

We just released COLMAP v3.12, which adds long-awaited, end-to-end support for multi-camera rigs and 360° panoramas 👀 COLMAP just got better at handling your robotics, AR/VR, or 360 data - try it yourself and let us know! github.com/colmap/colma... Kudos to Johannes & team for this great work 🚀

01.07.2025 16:33 | 👍 22    🔁 6    💬 1    📌 0

Dense Match Summarization for Faster Two-view Estimation

Jonathan Astermark, Anders Heyden, Viktor Larsson
tl;dr: use clustering to reduce RANSAC time when using dense methods like RoMa.
Kudos for eval on WxBS.
P.S. now the same, but for BA?

arxiv.org/abs/2506.028...

24.06.2025 12:22 | 👍 12    🔁 2    💬 2    📌 1

🤗 I’m excited to share our recent work: TwoSquared: 4D Reconstruction from 2D Image Pairs.
🔥 Our method produces geometry-consistent, texture-consistent, and physically plausible 4D reconstructions.
📰 Check our project page: sangluisme.github.io/TwoSquared/
❤️ @ricmarin.bsky.social @dcremers.bsky.social

23.04.2025 16:48 | 👍 9    🔁 3    💬 0    📌 1

Can we match vision and language representations without any supervision or paired data?

Surprisingly, yes!

Our #CVPR2025 paper with @neekans.bsky.social and @dcremers.bsky.social shows that the pairwise distances in both modalities are often enough to find correspondences.

⬇️ 1/4

03.06.2025 09:27 | 👍 27    🔁 12    💬 1    📌 0
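
The idea that pairwise distances alone can reveal correspondences can be illustrated with a toy sketch. This is not the paper's algorithm: it is a brute-force search over permutations that only works for a handful of points, and the two point sets (`POINTS_A`, `POINTS_B`) are made-up stand-ins for the two modalities. `POINTS_B` is a rotated, shuffled copy of `POINTS_A` embedded in 3D, so the two sets share no coordinates, only their distance structure.

```python
from itertools import permutations
from math import dist

# Hypothetical embeddings: B is A rotated by 90 degrees, lifted to 3D, and shuffled.
POINTS_A = [(0, 0), (1, 0), (0, 3), (5, 5)]
POINTS_B = [(-3, 0, 7), (0, 0, 7), (-5, 5, 7), (0, 1, 7)]


def distance_matrix(points):
    """All pairwise Euclidean distances within one point set."""
    return [[dist(p, q) for q in points] for p in points]


def match_by_distances(points_a, points_b):
    """Return perm such that points_a[i] corresponds to points_b[perm[i]].

    The permutation is chosen to minimise the squared mismatch between
    the two distance matrices; no paired data is used.
    """
    da, db = distance_matrix(points_a), distance_matrix(points_b)
    n = len(points_a)

    def cost(perm):
        return sum(
            (da[i][j] - db[perm[i]][perm[j]]) ** 2
            for i in range(n)
            for j in range(n)
        )

    # Exhaustive search: n! candidates, so only feasible for tiny n.
    return min(permutations(range(n)), key=cost)


print(match_by_distances(POINTS_A, POINTS_B))
```

The search space grows factorially, so anything beyond a toy example needs a proper solver for this kind of assignment problem.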

Can you train a model for pose estimation directly on casual videos without supervision?

Turns out you can!

In our #CVPR2025 paper AnyCam, we directly train on YouTube videos and achieve SOTA results by using an uncertainty-based flow loss and monocular priors!

⬇️

13.05.2025 08:11 | 👍 25    🔁 10    💬 1    📌 1

We also found that this allows the CTM to decide to spend less time thinking on simpler images, thus saving energy. When identifying a gorilla, for example, the CTM’s attention moves from eyes to nose to mouth in a pattern remarkably similar to human visual attention.

12.05.2025 02:42 | 👍 18    🔁 2    💬 1    📌 0

High Dynamic Range Novel View Synthesis with Single Exposure

Kaixuan Zhang, Hu Wang, Minxian Li, Mingwu Ren, Mao Ye, Xiatian Zhu

tl;dr: single exposure LDR images in training; LDR image->model+lift->HDR colors; HDR image->LDR image->additional supervision

arxiv.org/abs/2505.01212

05.05.2025 20:52 | 👍 1    🔁 1    💬 0    📌 0

📢 New paper at CVPR 25!
Can meshes capture fuzzy geometry? Volumetric Surfaces uses adaptive textured shells to model hair and fur without the splatting / volume overhead. It’s fast, looks great, and runs in real time even on budget phones.
🔗 autonomousvision.github.io/volsurfs/
📄 arxiv.org/pdf/2409.02482

05.05.2025 13:00 | 👍 28    🔁 20    💬 1    📌 1
Link preview: ZurichCV #9 | ZurichAI. Linus Scheibenreif (ETH Zurich) will talk about self-supervised learning for satellite imagery, and Pascal Chang (ETH Zurich/Disney Research) will present his recent work (topic to be announced).

8th ZurichCV is on the 29th of April. We have two fantastic speakers: Linus Scheibenreif (ETH Zurich) will talk about self-supervised learning for satellite imagery, and Pascal Chang (Disney Research) will give us a preview of his soon-to-be-published work.

RSVP: www.zurichai.ch/events/zuric...

20.04.2025 06:31 | 👍 17    🔁 3    💬 0    📌 0

No meal has ever sustained me for more than a few hours, a mere blip on the timeline of my life, 0.001% of my expected lifespan. So therefore I'll no longer be paying at restaurants

17.04.2025 11:53 | 👍 74    🔁 24    💬 2    📌 0

The Visual Recognition Group at CTU in Prague organizes the 49th Pattern Recognition and Computer Vision Colloquium with D. Karatzas, M. Masana, T. Tommasi, P. Mettes @pascalmettes.bsky.social, E. Brachmann @ericbrachmann.bsky.social and V. Stojnic @stojnicv.xyz

cmp.felk.cvut.cz/colloquium/#...

07.04.2025 13:57 | 👍 34    🔁 10    💬 2    📌 2

3D Gaussian splatting relies on depth-sorting of splats, which is costly and prone to artifacts (e.g., "popping"). In our latest work, "StochasticSplats", we replace sorted alpha blending by stochastic transparency, an unbiased Monte Carlo estimator from the real-time rendering literature.

07.04.2025 07:56 | 👍 52    🔁 13    💬 2    📌 2
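
To make the estimator concrete, here is a minimal single-pixel sketch in Python. It is not the StochasticSplats implementation: the splat depths, opacities, and grayscale colors in `SPLATS` are made-up values, and a real renderer would do this per pixel on the GPU. Each sample keeps every splat with probability equal to its opacity and shows the nearest kept one, so a depth test replaces the sort; the average of many samples approaches the sorted alpha-blended color.

```python
import random

# Hypothetical splats covering one pixel: (depth, opacity, grayscale color).
SPLATS = [(2.0, 0.5, 1.0), (1.0, 0.25, 0.2), (3.0, 0.8, 0.6)]
BACKGROUND = 0.0


def sorted_alpha_blend(splats, background=BACKGROUND):
    """Reference: front-to-back alpha blending, which needs a depth sort."""
    color, transmittance = 0.0, 1.0
    for _, alpha, c in sorted(splats):  # tuples sort by depth first
        color += transmittance * alpha * c
        transmittance *= 1.0 - alpha
    return color + transmittance * background


def stochastic_sample(splats, rng, background=BACKGROUND):
    """One sample: keep each splat with probability alpha, show the nearest kept."""
    nearest_depth, nearest_color = float("inf"), background
    for depth, alpha, c in splats:  # arbitrary order, no sorting
        if rng.random() < alpha and depth < nearest_depth:
            nearest_depth, nearest_color = depth, c
    return nearest_color


def stochastic_blend(splats, num_samples, seed=0):
    """Monte Carlo estimate: the mean of independent stochastic samples."""
    rng = random.Random(seed)
    total = sum(stochastic_sample(splats, rng) for _ in range(num_samples))
    return total / num_samples


print(sorted_alpha_blend(SPLATS), stochastic_blend(SPLATS, 20000))
```

A splat is the nearest survivor with probability equal to its opacity times the product of (1 - opacity) over all closer splats, which is exactly its weight in sorted alpha blending. That is why the estimator is unbiased; the price is sampling noise that shrinks as more samples are averaged.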

π— π—–π— π—Ÿ 𝗕𝗹𝗼𝗴: Robots & self-driving cars rely on scene understanding, but AI models for understanding these scenes need costly human annotations. Daniel Cremers & his team introduce πŸ₯€πŸ₯€ CUPS: a scene-centric unsupervised panoptic segmentation approach to reduce this dependency. πŸ”— mcml.ai/news/2025-04...

03.04.2025 09:45 | 👍 6    🔁 1    💬 0    📌 1

I've been wondering the same

02.04.2025 05:31 | 👍 0    🔁 0    💬 0    📌 0
Link preview: NeurIPS participation in Europe. We seek to understand if there is interest in being able to attend NeurIPS in Europe, i.e. without travelling to San Diego, US. In the following, assume that it is possible to present accepted papers ...

Would you present your next NeurIPS paper in Europe instead of traveling to San Diego (US) if this was an option? Søren Hauberg (DTU) and I would love to hear the answer through this poll: (1/6)

30.03.2025 18:04 | 👍 280    🔁 159    💬 6    📌 14

Can Video Diffusion Model Reconstruct 4D Geometry?

Jinjie Mai, Wenxuan Zhu, Haozhe Liu, Bing Li, Cheng Zheng, Jürgen Schmidhuber, Bernard Ghanem

tl;dr: pretrained video VAE->finetune pointmap VAE; finetune Open-Sora within latent space of video&pointmap

arxiv.org/abs/2503.21082

28.03.2025 07:30 | 👍 7    🔁 3    💬 0    📌 0

FG2: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching

Zimin Xia, Alexandre Alahi

tl;dr: DINOv2-based transformer matcher for ortho-ground photo matching.
arxiv.org/abs/2503.18725

25.03.2025 11:34 | 👍 7    🔁 1    💬 0    📌 0

A Recipe for Generating 3D Worlds From a Single Image

Katja Schwarz, Denys Rozumnyi, Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder

tl;dr: progressive panorama generation for the win.
arxiv.org/abs/2503.16611

24.03.2025 10:18 | 👍 9    🔁 1    💬 0    📌 0

OpenCity3D: What do Vision-Language Models know about Urban Environments?

Valentin Bieri, Marco Zamboni, Nicolas S. Blumer, Qingxuan Chen, Francis Engelmann
tl;dr: if you have aerial 3D reconstruction, use SigLIP to be happy.
arxiv.org/abs/2503.16776

24.03.2025 10:21 | 👍 18    🔁 3    💬 0    📌 0

*Please repost* @sjgreenwood.bsky.social and I just launched a new personalized feed (*please pin*) that we hope will become a "must use" for #academicsky. The feed shows posts about papers filtered by *your* follower network. It's become my default Bluesky experience bsky.app/profile/pape...

10.03.2025 18:14 | 👍 506    🔁 290    💬 23    📌 76

@dawars.me is following 20 prominent accounts