🚀 New paper: ConeGS: Error-Guided Densification Using Pixel Cones. We improve 3D Gaussian Splatting by placing Gaussians where they matter most: ConeGS adds primitives along pixel-view cones guided by image error, boosting quality with fewer Gaussians. baranowskibrt.github.io/conegs/
ConeGS: Error-Guided Densification Using Pixel Cones for Improved Reconstruction with Fewer Primitives
Bartłomiej Baranowski, @s-esposito.bsky.social, @pgschossmann.bsky.social, @apchen.bsky.social, @andreasgeiger.bsky.social
arxiv.org/abs/2511.06810
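For readers who want a feel for what "error-guided" placement along pixel cones could look like, here is a rough sketch. It is not the authors' implementation: the function names, the proportional sampling rule, and the half-pixel footprint formula for a pinhole camera are all assumptions made for illustration, and how the depth along each cone is chosen is left out entirely.

```python
import random

def sample_pixels_by_error(error_map, n, seed=0):
    """Sample n (row, col) pixels with probability proportional to their error.

    Hypothetical helper: error_map is a list of rows of non-negative floats.
    """
    rng = random.Random(seed)
    coords = [(r, c) for r, row in enumerate(error_map) for c in range(len(row))]
    weights = [error_map[r][c] for r, c in coords]
    return rng.choices(coords, weights=weights, k=n)

def cone_radius(depth, focal_px):
    """Approximate radius of one pixel's footprint at a given depth.

    Assumes a pinhole camera with focal length in pixels, so a pixel spans
    roughly depth / focal_px in world units; the radius is half of that.
    """
    return 0.5 * depth / focal_px

# Toy 2x2 error map: all of the error sits in pixel (1, 0).
error_map = [[0.0, 0.0],
             [1.0, 0.0]]
picked = sample_pixels_by_error(error_map, n=5)
radius = cone_radius(depth=2.0, focal_px=1000.0)
```

In this toy setup every sampled pixel is the high-error one, and a new primitive placed 2 units away would get a radius of about 0.001 units, so its size follows the pixel cone rather than a fixed constant.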
Here's a recording of my talk on how perspective works! If you're interested in learning about how picture perspective works in human vision, this is the video to watch. #visionscience
www.youtube.com/watch?v=eamc...
𝟯𝗗-𝗟𝗔𝗧𝗧𝗘: 𝗟𝗮𝘁𝗲𝗻𝘁 𝗦𝗽𝗮𝗰𝗲 𝟯𝗗 𝗘𝗱𝗶𝘁𝗶𝗻𝗴 𝗳𝗿𝗼𝗺 𝗧𝗲𝘅𝘁𝘂𝗮𝗹 𝗜𝗻𝘀𝘁𝗿𝘂𝗰𝘁𝗶𝗼𝗻𝘀
Maria Parelli, Michael Oechsle, Michael Niemeyer ... Andreas Geiger
arxiv.org/abs/2509.00269
Trending on www.scholar-inbox.com
🚀 Introducing our new paper, MDPO: Overcoming the Training-Inference Divide of Masked Diffusion Language Models.
📄 Paper: www.scholar-inbox.com/papers/He202...
arxiv.org/pdf/2508.13148
💻 Code: github.com/autonomousvi...
🌐 Project Page: cli212.github.io/MDPO/
Today, we moved into our new building on the CyberValley campus. Everyone is super excited. PhD students went right back to work. But wait, is there something missing? ;)
Today we had our AVG Deep Cave Expedition Day! Exploring the challenges of the (unlit, narrow, crawling-only) Hofener Höhle near Grabenstetten...
SpatialTrackerV2: 3D Point Tracking Made Easy
Yuxi Xiao, @jianyuanwang.bsky.social, Nan Xue, @nikkar.bsky.social, Yuri Makarov, Bingyi Kang, Xing Zhu, Hujun Bao, Yujun Shen, Xiaowei Zhou
tl;dr: DAv2+VGGT->depths & poses->iterative cross-attention-based optimizer
arxiv.org/abs/2507.12462
In case you find it as relaxing as we do: Here is a 2h+ video of our autonomous RL driving agent CaRL in action! @danieldauner.bsky.social @bernhard-jaeger.bsky.social @kashyap7x.bsky.social
youtube.com/watch?v=_god...
At #ICML, you can just use scholar inbox to help you find your way through the poster sessions. It just sorts the papers according to your preferences and it really works.
www.scholar-inbox.com/conference/i... ICML 2025 Planner
I am in Vancouver at ICML, and tomorrow I will present our newest paper "Partially Observable Reinforcement Learning with Memory Traces". We argue that eligibility traces are more effective than sliding windows as a memory mechanism for RL in POMDPs. 🧵
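To make the contrast concrete, here is a minimal sketch of the two memory mechanisms. The update rule z ← λz + (1−λ)o is an assumed form of an exponential trace over one-hot observations; the paper's exact definition may differ, and the function names are made up for this example.

```python
from collections import deque

def sliding_window(observations, k):
    """Yield the last k observations at each step (hard cutoff memory)."""
    window = deque(maxlen=k)
    for obs in observations:
        window.append(obs)
        yield list(window)

def memory_trace(observations, lam):
    """Yield an exponentially decayed average of the observations so far.

    Assumed update: z <- lam * z + (1 - lam) * obs, starting from zeros.
    """
    trace = None
    for obs in observations:
        if trace is None:
            trace = [0.0] * len(obs)
        trace = [lam * z + (1.0 - lam) * o for z, o in zip(trace, obs)]
        yield trace

# Three one-hot observations over a 3-symbol alphabet.
obs_seq = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
windows = list(sliding_window(obs_seq, k=2))
traces = list(memory_trace(obs_seq, lam=0.5))
```

After three steps the window of size 2 has forgotten the first observation completely, while the trace still carries it with weight 0.125. The trace also stays a fixed-size vector no matter how far back it reaches.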
We have released the code for our work, CaRL: Learning Scalable Planning Policies with Simple Rewards.
The repository contains the first public code base for training RL agents with the CARLA leaderboard 2.0 and nuPlan.
github.com/autonomousvi...
Scaling 4D Representations
Self-supervised learning from video does scale! In our latest work, we scaled masked auto-encoding models to 22B params, boosting performance on pose estimation, tracking & more.
Paper: arxiv.org/abs/2412.15212
Code & models: github.com/google-deepmind/representations4d
𝗚𝗲𝗼𝗺𝗲𝘁𝗿𝘆-𝗮𝘄𝗮𝗿𝗲 𝟰𝗗 𝗩𝗶𝗱𝗲𝗼 𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻 𝗳𝗼𝗿 𝗥𝗼𝗯𝗼𝘁 𝗠𝗮𝗻𝗶𝗽𝘂𝗹𝗮𝘁𝗶𝗼𝗻
Zeyi Liu, Shuang Li, Eric Cousineau ... Shuran Song
arxiv.org/abs/2507.01099
Trending on www.scholar-inbox.com
MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details
Ruicheng Wang, Sicheng Xu, Yue Dong, Yu Deng, Jianfeng Xiang, Zelong Lv, Guangzhong Sun, Xin Tong, Jiaolong Yang
arxiv.org/abs/2507.02546
I am very proud of my group! These are the nationalities of my current and past team members. Diversity is key.
🇩🇪 🇬🇷 🇮🇹 🇮🇳 🇷🇺 🇺🇦 🇨🇳 🇷🇸 🇯🇵 🇧🇪 🇺🇸 🇰🇷 🇹🇷
That’s a wrap on #CVPR2025 in Nashville! From online convos to in-person vibes, one thing’s clear: this community is STRONG 💪 Thanks for following along!
Until next time. @deblinaml.bsky.social, @jbhaurum.bsky.social, @csprofkgd.bsky.social signing off.
LLM product placement and search optimization is here and it's as dystopian as you expected.
Hey #CVPR2025! Curious about this work? I'll be presenting it this morning! Poster 31, from 10:30 to 12:30 🤠
@cvprconference.bsky.social
Check out the ScanNet++ workshop @CVPR on June 12 in 211 from 8:50am!
Exciting keynotes on state-of-the-art NVS & 3D understanding from Andrea Vedaldi, Cordelia Schmid, Gordon Wetzstein, Katja Schwarz, Qianqian Wang, and leading methods on the benchmark!
kaldir.vc.in.tum.de/scannetpp/cv...
Join us for the 4D Vision Workshop #CVPR on June 11 starting at 9:20am!
We'll have an incredible lineup of speakers discussing the frontier of 3D computer vision techniques for dynamic world modeling across spatial AI, robotics, astrophysics, and more.
4dvisionworkshop.github.io
This Wednesday (1-6PM, Room 106A) at CVPR @cvprconference.bsky.social we have a great lineup of keynote speakers, posters, and spotlights on neural fields and beyond: neural-bcc.github.io
Have a question you want answered by a panel of experts in the field? Send it to us via: tinyurl.com/bdddf36f
Excited to present our #CVPR2025 paper DepthSplat next week!
DepthSplat is a feed-forward model that achieves high-quality Gaussian reconstruction and view synthesis in just 0.6 seconds.
Looking forward to great conversations at the conference!
🚗 Pseudo-simulation combines the efficiency of open-loop and robustness of closed-loop evaluation. It uses real data + 3D Gaussian Splatting synthetic views to assess error recovery, achieving strong correlation with closed-loop simulations while requiring 6x less compute. arxiv.org/abs/2506.04218
🚀🚀🚀Announcing our $13M funding round to build the next generation of AI: 𝐒𝐩𝐚𝐭𝐢𝐚𝐥 𝐅𝐨𝐮𝐧𝐝𝐚𝐭𝐢𝐨𝐧 𝐌𝐨𝐝𝐞𝐥𝐬 that can generate entire 3D environments anchored in space & time. 🚀🚀🚀
Interested? Join our world-class team:
🌍 spaitial.ai
youtu.be/FiGX82RUz8U
"ILM "artists" are now being paid to make shimpanzini bananini and bombardiro crocodilo"
Can you train a model for pose estimation directly on casual videos without supervision?
Turns out you can!
In our #CVPR2025 paper AnyCam, we directly train on YouTube videos and achieve SOTA results by using an uncertainty-based flow loss and monocular priors!
⬇️
New Paper: Continuous Thought Machines
pub.sakana.ai/ctm/
Neurons in brains use timing and synchronization in the way that they compute, but this is largely ignored in modern neural nets. We believe neural timing is key for the flexibility and adaptability of biological intelligence.
Thread ↓
📣 Excited to share our #CVPR2025 Spotlight paper and my internship project @wayve: SimLingo.
A Vision-Language-Action (VLA) model that achieves state-of-the-art driving performance with language capabilities.
Code: github.com/RenzKa/simli...
Paper: arxiv.org/abs/2503.09594