Stefano Esposito (@s-esposito) — Bluesky Profile

4 months ago

🚀 New paper: ConeGS Error-Guided Densification Using Pixel Cones. We improve 3D Gaussian Splatting by placing Gaussians where they matter most: ConeGS adds primitives along pixel-view cones guided by image error, boosting quality with fewer Gaussians. baranowskibrt.github.io/conegs/

14 3 0 1

4 months ago

ConeGS: Error-Guided Densification Using Pixel Cones for Improved Reconstruction with Fewer Primitives

Bartłomiej Baranowski, @s-esposito.bsky.social, @pgschossmann.bsky.social, @apchen.bsky.social, @andreasgeiger.bsky.social

arxiv.org/abs/2511.06810

5 1 1 0

5 months ago

Picture Perspective and Our Eyes YouTube video by Aaron Hertzmann

Here's a recording of my talk on how perspective works! If you're interested in learning about how picture perspective works in human vision, this is the video to watch. #visionscience
www.youtube.com/watch?v=eamc...

19 6 2 1

6 months ago

𝟯𝗗-𝗟𝗔𝗧𝗧𝗘: 𝗟𝗮𝘁𝗲𝗻𝘁 𝗦𝗽𝗮𝗰𝗲 𝟯𝗗 𝗘𝗱𝗶𝘁𝗶𝗻𝗴 𝗳𝗿𝗼𝗺 𝗧𝗲𝘅𝘁𝘂𝗮𝗹 𝗜𝗻𝘀𝘁𝗿𝘂𝗰𝘁𝗶𝗼𝗻𝘀
Maria Parelli, Michael Oechsle, Michael Niemeyer ... Andreas Geiger
arxiv.org/abs/2509.00269
Trending on www.scholar-inbox.com

3 3 0 0

6 months ago

ELLIS PhD Program: Call for Applications 2025 The ELLIS mission is to create a diverse European network that promotes research excellence and advances breakthroughs in AI, as well as a pan-European PhD program to educate the next generation of AI...

ellis.eu/news/ellis-p...

17 9 0 0

6 months ago

🚀 Introducing our new paper, MDPO: Overcoming the Training-Inference Divide of Masked Diffusion Language Models.

📄 Paper: www.scholar-inbox.com/papers/He202...
arxiv.org/pdf/2508.13148
💻 Code: github.com/autonomousvi...
🌐 Project Page: cli212.github.io/MDPO/

12 8 1 2

7 months ago

Today, we moved into our new building on the CyberValley campus. Everyone is super excited. PhD students went right back to work. But wait, is there something missing? ;)

35 1 1 0

7 months ago

Today we had our AVG Deep Cave Expedition Day! Exploring the challenges of the (unlit, narrow, crawling-only) Hofener Höhle near Grabenstetten ..

19 1 0 0

7 months ago

SpatialTrackerV2: 3D Point Tracking Made Easy

Yuxi Xiao, @jianyuanwang.bsky.social, Nan Xue, @nikkar.bsky.social, Yuri Makarov, Bingyi Kang, Xing Zhu, Hujun Bao, Yujun Shen, Xiaowei Zhou

tl;dr: DAv2+VGGT->depths & poses->iterative cross-attention-based optimizer

arxiv.org/abs/2507.12462

2 2 0 0

7 months ago

CaRL: Learning Scalable Planning Policies with Simple Rewards YouTube video by Daniel Dauner

In case you find it as relaxing as we do: Here is a 2h+ video of our autonomous RL driving agent CaRL in action! @danieldauner.bsky.social @bernhard-jaeger.bsky.social @kashyap7x.bsky.social
youtube.com/watch?v=_god...

21 5 0 1

7 months ago

At #ICML, you can just use scholar inbox to help you find your way through the poster sessions. It just sorts the papers according to your preferences and it really works.

www.scholar-inbox.com/conference/i... ICML 2025 Planner

50 5 3 3

7 months ago

I am in Vancouver at ICML, and tomorrow I will present our newest paper "Partially Observable Reinforcement Learning with Memory Traces". We argue that eligibility traces are more effective than sliding windows as a memory mechanism for RL in POMDPs. 🧵

59 12 3 3

7 months ago

GitHub - autonomousvision/CaRL: [ArXiv 2025] CaRL: Learning Scalable Planning Policies with Simple Rewards [ArXiv 2025] CaRL: Learning Scalable Planning Policies with Simple Rewards - autonomousvision/CaRL

We have released the code for our work, CaRL: Learning Scalable Planning Policies with Simple Rewards.

The repository contains the first public code base for training RL agents with the CARLA leaderboard 2.0 and nuPlan.

github.com/autonomousvi...

20 7 0 2

8 months ago

Scaling 4D Representations

Self-supervised learning from video does scale! In our latest work, we scaled masked auto-encoding models to 22B params, boosting performance on pose estimation, tracking & more.

Paper: arxiv.org/abs/2412.15212
Code & models: github.com/google-deepmind/representations4d

20 8 0 0

8 months ago

𝗚𝗲𝗼𝗺𝗲𝘁𝗿𝘆-𝗮𝘄𝗮𝗿𝗲 𝟰𝗗 𝗩𝗶𝗱𝗲𝗼 𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻 𝗳𝗼𝗿 𝗥𝗼𝗯𝗼𝘁 𝗠𝗮𝗻𝗶𝗽𝘂𝗹𝗮𝘁𝗶𝗼𝗻
Zeyi Liu, Shuang Li, Eric Cousineau ... Shuran Song
arxiv.org/abs/2507.01099
Trending on www.scholar-inbox.com

4 4 0 0

8 months ago

MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details

Ruicheng Wang, Sicheng Xu, Yue Dong, Yu Deng, Jianfeng Xiang, Zelong Lv, Guangzhong Sun, Xin Tong, Jiaolong Yang

arxiv.org/abs/2507.02546

5 4 1 0

8 months ago

I am very proud of my group! These are the nationalities of my current and past team members. Diversity is key.
🇩🇪 🇬🇷 🇮🇹 🇮🇳 🇷🇺 🇺🇦 🇨🇳 🇷🇸 🇯🇵 🇧🇪 🇺🇸 🇰🇷 🇹🇷

49 1 1 0

8 months ago

That’s a wrap on #CVPR2025 in Nashville! From online convos to in-person vibes, one thing’s clear: this community is STRONG 💪 Thanks for following along!

Until next time. @deblinaml.bsky.social, @jbhaurum.bsky.social, @csprofkgd.bsky.social signing off.

12 3 0 0

8 months ago

LLM product placement and search optimization is here and it's as dystopian as you expected.

37 10 3 1

8 months ago

Hey #CVPR2025! Curious about this work? I'll be presenting it this morning! Poster 31, from 10:30 to 12:30 🤠

@cvprconference.bsky.social

8 1 0 0

9 months ago

Check out the ScanNet++ workshop @CVPR on June 12 in 211 from 8:50am!

Exciting keynotes on state-of-the-art NVS & 3D understanding from Andrea Vedaldi, Cordelia Schmid, Gordon Wetzstein, Katja Schwarz, Qianqian Wang, and leading methods on the benchmark!

kaldir.vc.in.tum.de/scannetpp/cv...

13 6 0 0

9 months ago

Join us for the 4D Vision Workshop #CVPR on June 11 starting at 9:20am!

We'll have an incredible lineup of speakers discussing the frontier of 3D computer vision techniques for dynamic world modeling across spatial AI, robotics, astrophysics, and more.

4dvisionworkshop.github.io

9 2 0 2

9 months ago

This Wednesday (1-6PM, Room 106A) at CVPR @cvprconference.bsky.social we have a great lineup of keynote speakers, posters, and spotlights on neural fields and beyond: neural-bcc.github.io

Have a question you want answered by a panel of experts in the field? Send it to us via: tinyurl.com/bdddf36f

11 2 0 1

9 months ago

Excited to present our #CVPR2025 paper DepthSplat next week!
DepthSplat is a feed-forward model that achieves high-quality Gaussian reconstruction and view synthesis in just 0.6 seconds.
Looking forward to great conversations at the conference!

27 7 3 0

9 months ago

🚗 Pseudo-simulation combines the efficiency of open-loop and robustness of closed-loop evaluation. It uses real data + 3D Gaussian Splatting synthetic views to assess error recovery, achieving strong correlation with closed-loop simulations while requiring 6x less compute. arxiv.org/abs/2506.04218

22 10 0 1

9 months ago

SpAItial AI: Building Spatial Foundation Models YouTube video by SpAItial AI

🚀🚀🚀Announcing our $13M funding round to build the next generation of AI: 𝐒𝐩𝐚𝐭𝐢𝐚𝐥 𝐅𝐨𝐮𝐧𝐝𝐚𝐭𝐢𝐨𝐧 𝐌𝐨𝐝𝐞𝐥𝐬 that can generate entire 3D environments anchored in space & time. 🚀🚀🚀

Interested? Join our world-class team:
🌍 spaitial.ai

youtu.be/FiGX82RUz8U

52 9 4 0

9 months ago

"ILM "artists" are now being paid to make shimpanzini bananini and bombardiro crocodilo"

1 0 0 0

10 months ago

Can you train a model for pose estimation directly on casual videos without supervision?

Turns out you can!

In our #CVPR2025 paper AnyCam, we directly train on YouTube videos and achieve SOTA results by using an uncertainty-based flow loss and monocular priors!

⬇️

25 10 1 1

10 months ago

New Paper: Continuous Thought Machines

pub.sakana.ai/ctm/

Neurons in brains use timing and synchronization in the way that they compute, but this is largely ignored in modern neural nets. We believe neural timing is key for the flexibility and adaptability of biological intelligence.

Thread ↓

129 29 5 1

10 months ago

📣 Excited to share our #CVPR2025 Spotlight paper and my internship project @wayve: SimLingo.
A Vision-Language-Action (VLA) model that achieves state-of-the-art driving performance with language capabilities.

Code: github.com/RenzKa/simli...
Paper: arxiv.org/abs/2503.09594

25 9 1 0