#TTT3R: 3D Reconstruction as Test-Time Training
TTT3R offers a simple state update rule to enhance length generalization for #CUT3R: no fine-tuning required!
Page: rover-xingyu.github.io/TTT3R
We rebuilt @taylorswift13's "22" live at the 2013 Billboard Music Awards - in 3D!
01.10.2025 06:35
A futuristic corridor inside a data center with rows of tall, blue-lit server racks on both sides. Text overlaid at the bottom reads "JUPITER Supercomputer: Europe enters the exascale supercomputing league." In the lower right corner, there is a logo of the European Commission.
Europe's first exascale supercomputer is here!
JUPITER, launched in Germany, is the EU's most powerful system and fourth fastest worldwide.
100% powered by renewables, it has also ranked first in energy efficiency. It will boost AI, science, and climate research.
Read more - europa.eu/!vcWBqW
06.09.2025 07:17
There is a lot to hate about the politics of the Silicon Valley right, but they do actually want to build stuff, and I would prefer if the left didn't cede "we should be able to build stuff" to the right.
05.09.2025 15:19
People often use "smart" when they mean "wise" and I don't think it's too controversial to doubt the wisdom of some tech elites. Other than that I certainly agree with you.
05.09.2025 10:39
I can't* fathom why the top picture, and not the bottom picture, is the standard diagram for an autoencoder.
The whole idea of an autoencoder is that you complete a round trip and seek cycle consistency, so why lay out the network linearly?
29.08.2025 22:46
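To make the round-trip view concrete, here is a minimal sketch assuming a fixed, untrained linear encoder/decoder pair; the direction `U` and all function names are hypothetical. The loss compares the input with `decode(encode(x))`, so input and output live in the same space and the network reads naturally as a loop rather than a line.

```python
import math

# Hypothetical toy autoencoder: a fixed, untrained linear encoder that maps
# 2-D inputs to a 1-D code along the unit direction U, and a decoder that maps
# the code back into the input space. Real autoencoders learn these weights.


def _unit(v: tuple[float, float]) -> tuple[float, float]:
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)


U = _unit((1.0, 1.0))  # hypothetical bottleneck direction


def encode(x: tuple[float, float]) -> float:
    """Input space -> latent space: project x onto U."""
    return x[0] * U[0] + x[1] * U[1]


def decode(z: float) -> tuple[float, float]:
    """Latent space -> back into the input space."""
    return (z * U[0], z * U[1])


def round_trip_loss(x: tuple[float, float]) -> float:
    """Squared distance between x and decode(encode(x)).

    Training an autoencoder means making this round trip land back on the
    starting point; the loss only makes sense because input and output share
    the same space.
    """
    x_hat = decode(encode(x))
    return (x[0] - x_hat[0]) ** 2 + (x[1] - x_hat[1]) ** 2


if __name__ == "__main__":
    print(round_trip_loss((2.0, 2.0)))   # lies along U: loss is (numerically) ~0
    print(round_trip_loss((1.0, -1.0)))  # orthogonal to U: prints 2.0
```

Points along the bottleneck direction survive the trip; everything orthogonal to it is lost, which is exactly what the reconstruction loss measures.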
I love both.
23.08.2025 13:55
Great video on the convergent evolution from hierarchical military command structures to cybernetics to centralized AI coordination across political ideologies:
www.youtube.com/watch?v=mayo...
23.08.2025 11:07
Gaussian Belief Propagation
I'd also welcome a Bayesian framing. I know Andrew Davison's group has done work on Gaussian belief propagation for SLAM factor graphs (gaussianbp.github.io) but other than that and arxiv.org/abs/1703.04977, I'm not aware of much Bayesian (deep) learning in (3D) vision right now.
22.08.2025 16:45
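For readers who haven't seen Gaussian belief propagation, here is a minimal sketch of the scalar version on a pairwise Gaussian model in information form, assuming a hypothetical toy "pose chain" with made-up prior and odometry factors. This is the textbook pairwise-MRF form of GBP, not the factor-graph formulation from gaussianbp.github.io. On a tree such as this chain, message passing recovers the exact marginal means, the same ones a direct solve of L·μ = η would give.

```python
# Scalar Gaussian belief propagation (GBP) on a pairwise Gaussian model in
# information form: p(x) ∝ exp(-0.5 * x^T L x + eta^T x).
# Hypothetical toy problem: three 1-D "poses" on a chain, with unary priors and
# relative-motion factors. On a tree like this chain, GBP is exact.


def build_chain(priors, odometry):
    """priors:   list of (mean, precision) unary factors, one per variable.
    odometry: list of (i, j, d, precision) factors encoding x_j - x_i ≈ d.
    Returns the information matrix L and information vector eta."""
    n = len(priors)
    L = [[0.0] * n for _ in range(n)]
    eta = [0.0] * n
    for i, (m, a) in enumerate(priors):
        L[i][i] += a
        eta[i] += a * m
    for i, j, d, b in odometry:
        L[i][i] += b
        L[j][j] += b
        L[i][j] -= b
        L[j][i] -= b
        eta[i] -= b * d
        eta[j] += b * d
    return L, eta


def gbp(L, eta, iters=10):
    """Synchronous GBP; returns (marginal means, marginal precisions)."""
    n = len(L)
    nbrs = {i: [j for j in range(n) if j != i and L[i][j] != 0.0] for i in range(n)}
    # Message i -> j is a Gaussian in x_j: (precision, information), start at 0.
    P = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}
    h = dict(P)
    for _ in range(iters):
        new_P, new_h = {}, {}
        for i, j in P:
            # Node i's own factor times all incoming messages except j's.
            prec = L[i][i] + sum(P[(k, i)] for k in nbrs[i] if k != j)
            info = eta[i] + sum(h[(k, i)] for k in nbrs[i] if k != j)
            # Integrate x_i out through the coupling term L[i][j].
            new_P[(i, j)] = -L[i][j] ** 2 / prec
            new_h[(i, j)] = -L[i][j] * info / prec
        P, h = new_P, new_h
    means, precs = [], []
    for i in range(n):
        prec = L[i][i] + sum(P[(k, i)] for k in nbrs[i])
        info = eta[i] + sum(h[(k, i)] for k in nbrs[i])
        means.append(info / prec)
        precs.append(prec)
    return means, precs


if __name__ == "__main__":
    # Priors say x0 ≈ 0 and x2 ≈ 3; odometry says each step is +1.
    L, eta = build_chain(
        priors=[(0.0, 1.0), (0.0, 0.0), (3.0, 1.0)],
        odometry=[(0, 1, 1.0, 1.0), (1, 2, 1.0, 1.0)],
    )
    means, _ = gbp(L, eta)
    print(means)  # the conflict of 1.0 is spread evenly: ≈ [0.25, 1.5, 2.75]
```

The appeal from a Bayesian point of view is that each node only ever exchanges a mean and a precision with its neighbours, yet the result carries uncertainty as well as a point estimate.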
In general I think 3D vision would do well to take some inspiration from Bayesians. I guess these days they've lost their glamour, but imo it's a very nice way of thinking that feels somewhat lost currently.
22.08.2025 15:18
"It is beautiful. It is elegant. Does it work well in practice? Not really. This is often the caveat we face in research: the things that are beautiful don't work and the things that work are not beautiful." (Daniel Cremers)
22.08.2025 11:55
You follow him. Andrew Davison from Imperial College London.
22.08.2025 11:21
"As roboticists and computer vision people [outside of big tech], do we have to just wait for the next foundation model?"
I share the frustration. It's disempowering when most major progress recently is downstream of "foundation models" that you don't have the compute or data to train yourself.
21.08.2025 17:37
Sort of, but DINOv3 also seems to (inadvertently?) point towards the limits of pure scaling.
x.com/chrisoffner3...
19.08.2025 19:34
The US calculus seems to be:
- The main 21st century story is US v. China.
- The US thus needs to focus on the Pacific.
- They need to peel Russia off of China and make it an ally.
- If this happens at the cost of the Europeans, so be it.
- Europe is useless as an ally and harmless as an adversary.
04.03.2025 11:10
If you maximize cosine similarity, aren't you left with only a single dimension (i.e. scaling the vector norm) as CosSim-invariant "wiggle room" to encode geometric information that isn't also captured by the language?
15.08.2025 08:54
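The norm argument can be checked numerically. A minimal sketch, with made-up 3-D vectors standing in for a pixel feature and a text embedding (real encoders use hundreds of dimensions, and training only approaches maximal similarity rather than reaching it exactly):

```python
import math


def cos_sim(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def split_along(p: list[float], t: list[float]) -> tuple[float, list[float]]:
    """Write p = s * t + r with r orthogonal to t; return (s, r).

    At maximal cosine similarity, r is zero and the scale s is the only
    remaining degree of freedom in p.
    """
    s = sum(x * y for x, y in zip(p, t)) / sum(y * y for y in t)
    r = [x - s * y for x, y in zip(p, t)]
    return s, r


if __name__ == "__main__":
    text = [1.0, 2.0, 2.0]    # made-up "language" embedding
    pixel = [3.0, 6.0, 6.0]   # made-up "pixel" feature, same direction
    print(cos_sim(pixel, text))                     # 1.0
    print(cos_sim([0.5 * x for x in pixel], text))  # still 1.0: the norm is free
    print(split_along(pixel, text))                 # (3.0, [0.0, 0.0, 0.0])
```

Any nonzero orthogonal residual `r` would lower the similarity, so in the idealized limit all D-1 directional degrees of freedom are pinned to the text embedding and only the scale remains.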
Yes but that's an additional training objective beyond merely maximizing cosine similarity. You'd need to introduce something that ensures that pixel features don't just collapse to language semantics, via some auxiliary task, no?
15.08.2025 08:43
It just seems to me that mapping pixels and language to highly similar internal representations means that you'll drop a lot of information that is not (or cannot be) accurately described by language.
15.08.2025 08:04
If we try to perfectly reconstruct, e.g., a complex 3D mesh from a natural language description, we'll find that the two modalities operate on very different levels of precision and abstraction.
15.08.2025 07:55
My concern is that language as a modality inherently biases the data towards coarser labels/concepts. You won't perfectly describe per-pixel normals and depth in natural language. Geometry is continuous and "raw", language is discrete and abstract.
15.08.2025 07:55
Oh, interesting. I'll check that out!
15.08.2025 07:48
Yay, DINOv3 is out!
SigLIP (VLMs) and DINO are two competing paradigms for image encoders.
My intuition is that joint vision-language modeling works great for semantic problems but may be too coarse for geometry problems like SfM or SLAM.
Most animals navigate 3D space perfectly without language.
14.08.2025 17:59
What are the best resources to learn about VLMs? Papers, tutorials, courses, blog posts, whatever is good. I can read the Kimi-VL or GLM tech reports and follow the breadcrumbs but I'd appreciate any and all recommendations towards a useful VLM curriculum!
12.08.2025 17:58
The tiny hand of the market.
07.08.2025 19:48
Ensuring the robots can't take our jobs by teaching the robots functional programming
28.06.2025 01:21
Yes to both.
08.07.2025 04:41
ALT: a man in a white shirt is saying "I deny this reality!"
As someone who loves Swift's type system, I'm not listening!
24.06.2025 19:57
I was not aware that the #ECCV2024 oral recordings are publicly available... So here is the #ACEZero talk: eccv.ecva.net/virtual/2024...
23.06.2025 14:37
I don't think having basic type checks is equivalent to doing proofs in Lean. I just don't want to chase through fifty layers of inheritance to figure out what type of input the blessed authors of some godforsaken research codebase expect to be given to their function.
23.06.2025 19:06
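A minimal sketch of what "basic type checks" can buy in Python, assuming standard type hints checked by a tool such as mypy; the `backproject` function, the `Intrinsics` dataclass, and the depth-map layout are all hypothetical. The signature alone tells a caller what to pass, without chasing any class hierarchy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical example type)."""
    fx: float
    fy: float
    cx: float
    cy: float


Point3 = tuple[float, float, float]


def backproject(depth: list[list[float]], K: Intrinsics) -> list[Point3]:
    """Lift a row-major depth map (metres) to camera-frame 3-D points.

    Pixels with depth <= 0 are treated as invalid and skipped.
    """
    points: list[Point3] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:
                continue
            x = (u - K.cx) * z / K.fx
            y = (v - K.cy) * z / K.fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    K = Intrinsics(fx=1.0, fy=1.0, cx=0.0, cy=0.0)
    print(backproject([[1.0, 1.0]], K))  # [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    # backproject([[1.0]], (1.0, 1.0, 0.0, 0.0))  # a checker such as mypy rejects this
```

None of this proves anything about the function's correctness; it only documents and checks the shape of its inputs and outputs, which is the modest guarantee the post asks for.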
The books you love, shared with your network! The online, decentralised community to build the best reading lists.
Alpha is now live at https://bibliome.club!
Research Scientist in the DINO team at Meta FAIR. Previously: PhD at Max-Planck Institute for Intelligent Systems, Tübingen. Representation learning, agents, structure.
Professor of Computer Vision/Machine Learning at Imagine/LIGM, École nationale des Ponts et Chaussées @ecoledesponts.bsky.social · Music & overall happiness · Born well below 350ppm · mostly silly personal views
Paris · https://davidpicard.github.io/
Large Models, Multimodality, Continual Learning | ELLIS ML PhD with Oriol Vinyals & Zeynep Akata | Previously Google DeepMind, Meta AI, AWS, Vector, MILA
karroth.com
PhD student @ Uni Tübingen and IMPRS-IS, working on 3D vision
patriciagschossmann.github.io
Postdoc, Real Virtual Humans group, University of TΓΌbingen, Germany
Postdoc@VGG, University of Oxford
A global commitment is characteristic of ETH Zurich in education, research & collaboration with partners worldwide.
doctoral researcher in soft robotics, artificial muscles, materials science | MPI-IS & ETH Zürich | previously TUM, TU Berlin & CentraleSupélec
PhD student | Aston University 🇬🇧 | Computer Vision | Self-supervision | Monocular Depth | Robotic Grasping | RobustDepth | BaseBoostDepth | more soon…
Research Scientist at Google; 3D face&body modeling (FLAME/SMPL-X), single-view (DECA/ExPose) & multi-view 3D face reconstruction (ToFU/TEMPEH), speech-driven 3D animation (VOCA/EMOTE), neural avatars (INSTA/GEM).
https://sites.google.com/site/bolkartt/
PhD student in ML @uni_tue & @hdm_stg Interested in robust vision and object-centric learning
Postdoc Fellow @ ETH AI Center.
https://eleanor-h.github.io/
Applied Scientist at Microsoft.
Computer Vision | Deep Learning | Machine Learning
Character Technical Artist at Google / Disney / Genies.
I'm a researcher at AMD working on improving computer graphics with the help of deep learning.
Previously: Intel Labs. PhD from UCSD with Prof. Ravi Ramamoorthi
https://alexku.me/
PhD student @ UT Austin; Intern @ NVIDIA
cairuisi.gitHub.io
Senior Devtech @Nvidia. Former graphics engineer at Ready At Dawn, Naughty Dog, Ubisoft. Views are my own.