To bridge this 2D-to-3D gap, we propose "Render-Localize-Lift":
- Render: 3D human/object meshes into multiview 2D images.
- Localize: A Multiview Localization (MV-Loc) model, guided by VLM tokens, predicts 2D contact masks.
- Lift: 2D contact masks to 3D.
(5/10)
15.06.2025 12:23 — 👍 1 🔁 1 💬 1 📌 0
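The "Lift" step above can be sketched roughly as follows. This is a minimal illustration, not InteractVLM's actual implementation: it assumes orthographic cameras rotated around the vertical axis, ignores occlusion, and averages mask values across views before thresholding. The function names (`rotation_y`, `project_orthographic`, `lift_masks_to_vertices`) and the `threshold` parameter are hypothetical.

```python
import numpy as np


def rotation_y(angle_rad):
    """Rotation matrix about the vertical (y) axis; one per rendered view."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def project_orthographic(vertices, rotation, image_size):
    """Rotate vertices into the view frame and map x, y in [-1, 1] to pixels.

    Assumption: an orthographic camera and a mesh normalized to the unit cube.
    """
    cam = vertices @ rotation.T
    px = np.round((cam[:, 0] + 1.0) * 0.5 * (image_size - 1)).astype(int)
    py = np.round((1.0 - cam[:, 1]) * 0.5 * (image_size - 1)).astype(int)
    px = np.clip(px, 0, image_size - 1)
    py = np.clip(py, 0, image_size - 1)
    return px, py


def lift_masks_to_vertices(vertices, rotations, masks, threshold=0.5):
    """Average each vertex's 2D mask value over all views, then threshold.

    Assumption: every vertex is treated as visible in every view
    (no occlusion test), which a real pipeline would need to handle.
    """
    image_size = masks[0].shape[0]
    scores = np.zeros(len(vertices))
    for rotation, mask in zip(rotations, masks):
        px, py = project_orthographic(vertices, rotation, image_size)
        scores += mask[py, px]
    scores /= len(masks)
    return scores >= threshold


# Toy example: two vertices, two views (front and back), 5x5 masks.
vertices = np.array([[0.5, 0.5, 0.0], [-0.5, -0.5, 0.0]])
rotations = [rotation_y(0.0), rotation_y(np.pi)]
front = np.zeros((5, 5))
front[1, 3] = 1.0  # pixel where vertex 0 lands in the front view
back = np.zeros((5, 5))
back[1, 1] = 1.0  # pixel where vertex 0 lands in the back view
contact = lift_masks_to_vertices(vertices, rotations, [front, back])
```

Here only the first vertex falls inside the predicted contact region in both views, so only it is marked as in contact.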
How can we infer 3D contact with limited 3D data? InteractVLM exploits foundational models: a VLM & a localization model, fine-tuned to reason about contact. Given an image & prompt, the VLM outputs tokens for localization. But these models work in 2D, while contact is 3D. (4/10)
15.06.2025 12:23 — 👍 1 🔁 1 💬 1 📌 0
Why does 3D human-object reconstruction fail in the wild or get limited to a few object classes? A key missing piece is accurate 3D contact. InteractVLM (#CVPR2025) uses foundational models to infer contact on humans & objects, improving reconstruction from a single image. (1/10)
15.06.2025 12:23 — 👍 5 🔁 2 💬 1 📌 0
📢 Short deadline extension (24/2) -- One more week left to submit your application!
16.02.2025 22:42 — 👍 5 🔁 2 💬 0 📌 0
Passionate about Human-centric Computer Vision? 📸🤖
We’re looking for motivated PhD candidates to join our dynamic team! 🚀
26.01.2025 17:54 — 👍 2 🔁 0 💬 0 📌 0
Humanoids Daily brings you the latest developments in robotics, with a special focus on humanoid robots and intelligent machines.
https://humanoidsdaily.com
Newsletter: https://newsletter.humanoidsdaily.com/subscribe
PhD student @ Uni Tübingen and IMPRS-IS, working on 3D vision
patriciagschossmann.github.io
Trending papers in Vision and Graphics on www.scholar-inbox.com.
Scholar Inbox is a personal paper recommender which keeps you up-to-date with the most relevant progress in your field. Follow us and never miss a beat again!
🇪🇺 ELLIS PhD @ University of Amsterdam
🤖 Vision-Language, Video Learning, SSL
🏡 www.mdorkenwald.com
Professor at University of Technology Nuremberg
Head of Fundamental AI Lab
Professor of Computer Vision and AI at TU Munich, Director of the Munich Center for Machine Learning mcml.ai and of ELLIS Munich ellismunich.ai
cvg.cit.tum.de
Research Scientist @Toyota Research Institute | Prev. PhD in AI, ML and CV @GeorgiaTech | Researching 3D Perception, Generative AI for Robotics and Multimodal AI
W: https://zubairirshad.com
PhD student @ Linköping University
I like 3D vision and training neural networks.
Code: https://github.com/parskatt
Weights: https://github.com/Parskatt/storage/releases/tag/roma
PhD @ ETH Zurich | Computer Vision | Monocular Depth | 3D Stuff | Ex intern at Meta
nandometzger.github.io
PhD student at MPI Informatics, ex @bayesgroup. 3D Scene Representations, Computer Vision, Graphics, VR/AR, Bayesian ML, Music | 🇺🇦
https://vrudnev.me/
PhD student. Working on Computer Vision for 3D geometry and semantics.
https://tberriel.github.io
PhD candidate @Jena_DH & @TU_Muenchen working on 3D Reconstruction from Historic Imagery. @TU_Muenchen graduate.
📍 Munich
PhD @RWTH.bsky.social | 3D Computer Vision
🔗 https://ka.codes
Chief Scientist @ EveryPoint.io
3D Computer Vision Researcher (PhD) and Engineer
Research Scientist at Valeo.ai.
https://anhquancao.github.io/
3D Vision | Visiting PhD @ Stanford | ETH AI Center Fellow
elisabettafedele.github.io
Helping robots see @ http://intrinsic.ai (Alphabet company). Here talking about 3D Computer Vision and everything around it. Views are my own.
📖 PhD Student in Robotics at ETH Zürich
🔍 Computer Vision, 3D Reconstruction, Structure-from-Motion, SLAM, Geometry
3D Machine Learning @ NavVis.
Prev. PhD Candidate @ Visual Computing Group, TUM (Prof. Nießner).
https://manuel-dahnert.com | Scene Understanding & Reconstruction - Generative Models