Over the past year, my lab has been working on fleshing out theory + applications of the Platonic Representation Hypothesis.
Today I want to share two new works on this topic:
Eliciting higher alignment: arxiv.org/abs/2510.02425
Unpaired learning of unified reps: arxiv.org/abs/2510.08492
1/9
10.10.2025 22:13 · 129 likes · 32 reposts · 1 reply · 5 quotes
#TTT3R: 3D Reconstruction as Test-Time Training
TTT3R offers a simple state update rule to enhance length generalization for #CUT3R – no fine-tuning required!
Page: rover-xingyu.github.io/TTT3R
We rebuilt @taylorswift13's "22" live at the 2013 Billboard Music Awards - in 3D!
01.10.2025 06:35 · 36 likes · 4 reposts · 0 replies · 4 quotes
KiVA Challenge @ ICCV 2025
🧠 How “old” is your model?
Put it to the test with the KiVA Challenge: a new benchmark for abstract visual reasoning, grounded in real developmental data from children and adults.
Prizes:
🥇 $1K to the top model
🥈🥉 $500
Deadline: 10/7/25
kiva-challenge.github.io
@iccv.bsky.social
15.07.2025 19:19 · 22 likes · 12 reposts · 1 reply · 0 quotes
(ChatGPT claims that this piece is Twinkle Twinkle Little Star, while Gemini says it is Do-Re-Mi.)
11.07.2025 22:46 · 4 likes · 0 reposts · 0 replies · 0 quotes
ChatGPT and Gemini both seem to struggle with sheet music. They both insist that this excerpt is in D major (2 sharps), and resist any attempt to tell them that there are 3 sharps in the key signature. I think this is really cool and interesting!
11.07.2025 22:44 · 12 likes · 0 reposts · 2 replies · 0 quotes
Think LMMs can reason like a 3-year-old?
Think again!
Our Kid-Inspired Visual Analogies benchmark reveals where young children still win: ey242.github.io/kiva.github....
Catch our #ICLR2025 poster today to see where models still fall short!
Thurs. April 24
3-5:30 pm
Halls 3 + 2B #312
23.04.2025 22:58 · 24 likes · 7 reposts · 2 replies · 0 quotes
1/6 How to transform standard videos into immersive 360° panoramas? We've designed a new AI system for video-to-360° panorama generation!
Our key insight: large-scale data is crucial for robust panoramic synthesis across diverse scenes.
23.04.2025 15:49 · 3 likes · 1 repost · 5 replies · 0 quotes
We have released the Stereo4D dataset! Explore the real-world dynamic 3D tracks: github.com/Stereo4d/ste...
15.04.2025 19:59 · 13 likes · 3 reposts · 0 replies · 0 quotes
This is really nice work on visual discovery from @boyangdeng.bsky.social!
14.04.2025 13:40 · 6 likes · 0 reposts · 0 replies · 0 quotes
We're very excited to introduce TAPNext: a model that sets a new state of the art for Tracking Any Point in videos, by formulating the task as Next Token Prediction. For more, see: tap-next.github.io
09.04.2025 14:04 · 23 likes · 9 reposts · 1 reply · 0 quotes
A thread of thoughts on radiance fields, from my keynote at 3DV:
Radiance fields have had 3 distinct generations. First was NeRF: just posenc and a tiny MLP. This was slow to train but worked really well, and it was unusually compressed --- the NeRF was smaller than the images.
08.04.2025 17:25 · 93 likes · 21 reposts · 2 replies · 1 quote
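The "posenc + tiny MLP" recipe mentioned in the thread above can be sketched in a few lines. This is a minimal illustration of NeRF-style positional encoding only, assuming the octave-spaced frequencies (2^k · π) from the original NeRF paper; the function name and frequency count here are illustrative, not taken from any particular codebase:

```python
import numpy as np

def posenc(x, num_freqs=10):
    """NeRF-style positional encoding: lift each coordinate to
    [sin(2^k * pi * x), cos(2^k * pi * x)] for k = 0..num_freqs-1,
    so a tiny MLP can represent high-frequency scene detail."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi  # octave-spaced frequencies
    angles = x[..., None] * freqs                  # shape (..., dim, num_freqs)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)          # flatten per-point features

p = np.array([0.5, -0.2, 0.3])  # one 3D sample point
print(posenc(p).shape)          # 3 coords x 10 freqs x {sin, cos} = (60,)
```

The encoded vector, rather than the raw coordinates, is what gets fed to the small MLP that predicts density and color; that lifting is what lets so compact a network fit fine detail.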
Fifth Ave jammed #handsoff
05.04.2025 17:56 · 4062 likes · 544 reposts · 29 replies · 23 quotes
We've just released the code and checkpoints for our #ICLR2025 Oral paper: "LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias".
Check it out below:
Code: github.com/haian-jin/LVSM
Paper: arxiv.org/abs/2410.17242
Project Page: haian-jin.github.io/projects/LVSM/
05.04.2025 18:25 · 18 likes · 2 reposts · 0 replies · 0 quotes
This is really cool work!
30.03.2025 00:14 · 7 likes · 1 repost · 1 reply · 0 quotes
[1/10] Is scene understanding solved?
Models today can label pixels and detect objects with high accuracy. But does that mean they truly understand scenes?
Super excited to share our new paper and a new task in computer vision: Visual Jenga!
arxiv.org/abs/2503.21770
visualjenga.github.io
29.03.2025 19:36 · 58 likes · 14 reposts · 7 replies · 1 quote
#Backslash at #CornellTech, dedicated to advancing new works of art and technology that escape convention, has announced Mimi Ọnụọha as its first Backslash Fellow: tech.cornell.edu/news/mimi-on...
“This work feels like a marked evolution for me personally,” said Ọnụọha.
@snavely.bsky.social
12.03.2025 16:44 · 3 likes · 1 repost · 0 replies · 0 quotes
Very nice! Is this a thing that happens each night at the hotel?
02.03.2025 14:46 · 1 like · 0 reposts · 0 replies · 0 quotes
This is really bad!
26.02.2025 04:25 · 0 likes · 0 reposts · 0 replies · 0 quotes
Exciting news! MegaSAM code is out 🔥 & the updated Shape of Motion results with MegaSAM are really impressive! A year ago I didn't think we could make any progress on these videos: shape-of-motion.github.io/results.html
Huge congrats to everyone involved and the community!
24.02.2025 18:52 · 75 likes · 17 reposts · 3 replies · 0 quotes
Very interesting! The guy who loves singing through a megaphone comes to mind, but I think he came later.
24.02.2025 14:09 · 1 like · 0 reposts · 0 replies · 0 quotes
The Dispossessed is an interesting choice! I didn't know it had a big influence.
19.02.2025 00:54 · 0 likes · 0 reposts · 0 replies · 0 quotes
Very interesting -- thank you!
18.02.2025 19:31 · 0 likes · 0 reposts · 0 replies · 0 quotes
I think Qianqian et al's work is really cool! The problem of modeling state within a 3D reasoning system is quite interesting.
(And I believe it's pronounced "cuter".)
18.02.2025 17:09 · 8 likes · 0 reposts · 0 replies · 0 quotes
Late to post, but excited to introduce CUT3R!
An online 3D reasoning framework for many 3D tasks directly from just RGB. For static or dynamic scenes. Video or image collections, all in one!
Project Page: cut3r.github.io
Code and Model: github.com/CUT3R/CUT3R
18.02.2025 17:03 · 34 likes · 6 reposts · 2 replies · 1 quote
The thought also occurred to me that LLMs might intentionally be designed to produce slightly off-kilter text, to make it easier for whoever cares to distinguish human writing from LLM writing.
18.02.2025 15:56 · 2 likes · 0 reposts · 0 replies · 0 quotes
That's a great point! This case seems like such a step function to me -- I never noticed it before, now all of a sudden I see it everywhere. I asked one author, and they said that they did an LLM polishing step. But maybe I'm leaping to conclusions. Or, maybe LLMs are accelerating an existing trend.
18.02.2025 15:56 · 2 likes · 0 reposts · 1 reply · 0 quotes
This is really unimportant, but I keep seeing the word "advancements" in writing where I would have seen the word "advances" before.
I'm taking this to mean that LLMs are at play and therefore, they will influence the language such that the two words will eventually come to mean the same thing!
18.02.2025 15:49 · 7 likes · 1 repost · 3 replies · 0 quotes
I made So to Speak, a puzzle game for learning Japanese.
https://sotospeakgame.com/