There are similarities between JEPAs and PFNs. In JEPAs, synthetic data is generated through learning. Notably, random weights can already perform well on downstream tasks, suggesting that the learning process induces useful operations on which you can do predictive coding.
17.10.2025 07:38
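The random-weights claim above is easy to probe empirically: freeze an untrained encoder and fit only a linear head on its features. A minimal sketch, assuming PyTorch/torchvision and using an untrained ResNet-18 as the random encoder (all choices here are illustrative, not from the post):

```python
# Minimal sketch: linear probe on a randomly initialized encoder.
# Illustrates the claim that random weights already give usable features.
import torch
import torch.nn as nn
from torchvision.models import resnet18

encoder = resnet18(weights=None)   # random weights, never trained
encoder.fc = nn.Identity()         # expose the 512-d penultimate features
encoder.eval()
for p in encoder.parameters():
    p.requires_grad_(False)        # freeze: only the probe learns

probe = nn.Linear(512, 10)         # linear probe for a 10-class task
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

def probe_step(images, labels):
    """One training step of the probe on frozen random features."""
    with torch.no_grad():
        feats = encoder(images)
    loss = nn.functional.cross_entropy(probe(feats), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

If the probe reaches well-above-chance accuracy, the random network already implements useful operations before any learning happens.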
Idk, but maybe not necessarily: we observe discrete tokens, but the underlying language states themselves can live in a continuous world.
14.10.2025 12:43
Generative models that assume the underlying distribution is continuous: flow matching and common diffusion models, for example.
13.10.2025 14:20
I really hope someone can revive continuous models for language. They've taken over the visual domain by far, but getting them to work in language still feels like pure alchemy.
12.10.2025 19:31
Using Knowledge Graphs to harvest datasets for efficient CLIP model training
Training high-quality CLIP models typically requires enormous datasets, which limits the development of domain-specific models -- especially in areas that even the largest CLIP models do not cover well...
Excited to release our models and preprint: "Using Knowledge Graphs to harvest datasets for efficient CLIP model training"
We propose a dataset collection method using knowledge graphs and web image search, and create EntityNet-33M: a dataset of 33M images paired with 46M texts.
08.05.2025 12:58
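The collection recipe described above (entities from a knowledge graph, then web image search per entity) could look roughly like this. A hedged sketch: `query_knowledge_graph` and `image_search` are hypothetical stand-ins for whatever KG endpoint and search API are used, not the paper's actual pipeline:

```python
# Sketch of KG-driven dataset harvesting: enumerate entities from a
# knowledge graph, then collect web images + surrounding texts per entity.
# `query_knowledge_graph` and `image_search` are hypothetical helpers.

def harvest(domain_root: str, per_entity: int = 100) -> list[dict]:
    dataset = []
    # 1. Walk the KG below a domain-specific root concept.
    for entity in query_knowledge_graph(root=domain_root):
        # 2. Use the entity name and its aliases as image-search queries.
        for query in [entity.name, *entity.aliases]:
            for hit in image_search(query, limit=per_entity):
                dataset.append({
                    "image_url": hit.url,
                    "texts": [hit.alt_text, hit.page_title],
                    "entity": entity.id,
                })
    return dataset
```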
Over the past year, my lab has been working on fleshing out theory + applications of the Platonic Representation Hypothesis.
Today I want to share two new works on this topic:
Eliciting higher alignment: arxiv.org/abs/2510.02425
Unpaired learning of unified reps: arxiv.org/abs/2510.08492
1/9
10.10.2025 22:13
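For readers new to the topic: "alignment" between two models' representations is often measured by mutual nearest-neighbor overlap on the same inputs. A minimal sketch of that metric (my illustration of the general idea, not necessarily the exact measure in the two papers):

```python
# Sketch: mutual k-NN alignment between two embedding matrices A, B of
# shape (n_samples, d_a) and (n_samples, d_b), computed on the same
# inputs. Higher neighbor overlap = more aligned representations.
import numpy as np

def knn_ids(X: np.ndarray, k: int) -> np.ndarray:
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = X @ X.T
    np.fill_diagonal(sim, -np.inf)           # exclude self-matches
    return np.argsort(-sim, axis=1)[:, :k]   # indices of k nearest neighbors

def mutual_knn_alignment(A: np.ndarray, B: np.ndarray, k: int = 10) -> float:
    na, nb = knn_ids(A, k), knn_ids(B, k)
    overlap = [len(set(na[i]) & set(nb[i])) / k for i in range(len(A))]
    return float(np.mean(overlap))
```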
Orbis shows that the objective matters.
Continuous modeling yields more stable and generalizable world models, yet true probabilistic coverage remains a challenge.
Immensely grateful to my co-authors @arianmousakhan.bsky.social, Sudhanshu Mittal, and Silvio Galesso, and to @thomasbrox.bsky.social
12.10.2025 15:51
Under the hood
Orbis uses a hybrid tokenizer with semantic + detail tokens that work in both continuous and discrete spaces.
The world model then predicts the next frame by gradually denoising or unmasking it, using past frames as context.
12.10.2025 15:31
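In pseudocode, that predict-by-denoising loop might look like the sketch below. This is a rough illustration of the described mechanism; all names (`tokenizer`, `denoiser`, `rollout`) are invented for clarity rather than taken from the Orbis code:

```python
# Sketch: autoregressive world model that predicts the next frame by
# iterative denoising, conditioned on tokens of past frames.
# All names are illustrative; see the actual repo for the real API.
import torch

def rollout(frames, tokenizer, denoiser, horizon=20, steps=50):
    context = [tokenizer.encode(f) for f in frames]  # semantic + detail tokens
    for _ in range(horizon):
        x = torch.randn_like(context[-1])            # start from pure noise
        for t in reversed(range(steps)):             # gradual denoising
            x = denoiser(x, t, context=context)      # past frames as context
        context.append(x)                            # feed back autoregressively
        frames.append(tokenizer.decode(x))
    return frames
```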
Realistic and Diverse Rollouts 4/4
12.10.2025 15:26
Realistic and Diverse Rollouts 3/4
12.10.2025 15:25
Realistic and Diverse Rollouts 2/4
12.10.2025 15:25
Realistic and Diverse Rollouts 1/4
12.10.2025 15:24
We ask how continuous vs. discrete models and their tokenizers shape long-horizon behavior.
Findings: continuous models (flow matching; see the loss sketch below)
• are far less brittle to design choices,
• produce realistic, stable rollouts up to 20 s,
• and generalize better to unseen driving conditions.
Continuous > Discrete
12.10.2025 15:01
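For reference, the continuous objective behind these findings is standard conditional flow matching, which is short enough to write out. A minimal training-loss sketch with a linear noise-to-data path (the generic recipe, not Orbis-specific code):

```python
# Sketch: conditional flow matching loss with a linear noise->data path.
# model(x_t, t, cond) predicts the velocity field; this is the standard
# recipe, independent of any particular world-model architecture.
import torch

def flow_matching_loss(model, x1, cond):
    """x1: clean target frame latents, cond: past-frame context."""
    x0 = torch.randn_like(x1)                    # noise sample
    t = torch.rand(x1.shape[0], *[1] * (x1.dim() - 1), device=x1.device)
    xt = (1 - t) * x0 + t * x1                   # linear interpolation path
    v_target = x1 - x0                           # constant target velocity
    v_pred = model(xt, t.flatten(), cond)
    return torch.mean((v_pred - v_target) ** 2)
```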
Driving world models look good for a few frames, then they drift, blur, or freeze, especially when a turn or complex scene appears. These failures reveal a deeper issue: models aren't capturing real dynamics. We introduce new metrics to measure such breakdowns.
12.10.2025 14:53
Our work Orbis goes to #NeurIPS2025!
A continuous autoregressive driving world model that outperforms Cosmos, Vista, and GEM with far less compute.
469M parameters
Trained on ~280h of driving videos
Paper: arxiv.org/pdf/2507.13162
Project page: lmb-freiburg.github.io/orbis.github...
Code: github.com/lmb-freiburg...
12.10.2025 14:39
The question raised here is whether this approach is a generalist or a specialist that cannot rise to the level of a general foundation model.
12.10.2025 13:51
I think HRM is quite great too. I would say they contributed the main idea (deep supervision) behind TRM.
12.10.2025 13:51
Transformers do not need to develop something like "gradient descent" as an emergent property when it is, in a sense, baked into the architecture.
12.10.2025 13:50
TRM works because it has an optimization algorithm as an inductive bias for finding the answer. I can't call this work anything but brilliant.
12.10.2025 13:50
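To make the "optimization algorithm as inductive bias" point concrete: TRM/HRM-style models apply a small shared network repeatedly to refine a latent answer, with a loss attached at every step (deep supervision). A schematic sketch, not the authors' code:

```python
# Sketch: recurrent answer refinement with deep supervision.
# A tiny shared network `step` is applied repeatedly; every intermediate
# answer is supervised, which bakes an iterative-optimization structure
# into the model. Schematic only, not the TRM/HRM implementation.
import torch
import torch.nn as nn

class RecurrentRefiner(nn.Module):
    def __init__(self, dim, n_steps=8):
        super().__init__()
        self.step = nn.GRUCell(dim, dim)    # shared refinement step
        self.readout = nn.Linear(dim, dim)  # answer head
        self.n_steps = n_steps

    def forward(self, x, target=None):
        z = torch.zeros_like(x)             # initial answer state
        loss = 0.0
        for _ in range(self.n_steps):
            z = self.step(x, z)             # one refinement iteration
            y = self.readout(z)
            if target is not None:          # deep supervision at every step
                loss = loss + nn.functional.mse_loss(y, target)
        return y, loss / self.n_steps
```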
We should normalize having the "Ideas That Failed" section. It would save enormous amounts of compute and time otherwise spent rediscovering stuff that doesn't work.
12.10.2025 13:49
Eugene Vinitsky
I stumbled on @eugenevinitsky.bsky.social 's blog and his "Personal Rules of Productive Research" is very good. I now do a lot of the things in the post, & wish I had done them when I was younger.
I share my "mini-paper" w ppl I hope will be co-authors.
www.eugenevinitsky.com/posts/person...
16.12.2024 15:14
Just had an idea
10.12.2024 09:44
My major realization of the past year of teaching is that a lot is forgiven if students believe you genuinely care about them and the topic
05.12.2024 20:50
Possible challenge: getting a model of {X,Y,Z,...} that is much better than independent models of each individual modality {X}, {Y}, {Z}, ... i.e. where the whole is greater than the sum of the parts.
04.12.2024 20:24
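One way to make "the whole is greater than the sum of the parts" measurable: compare the joint model's negative log-likelihood against the sum over independent per-modality models. A sketch of that check, assuming each model exposes an `nll` method (my convention, not from the post):

```python
# Sketch: test whether a joint multimodal model beats the product of
# independent per-modality models. Under independence,
#   log p(x, y, z) = log p(x) + log p(y) + log p(z),
# so any joint NLL below that sum means "whole > sum of parts".
def synergy_gap(joint_model, unimodal_models, batch):
    # batch: dict of modalities, e.g. {"X": ..., "Y": ..., "Z": ...}
    independent_nll = sum(
        m.nll(batch[mod]) for mod, m in unimodal_models.items()
    )
    joint_nll = joint_model.nll(batch)
    return independent_nll - joint_nll  # > 0 means the joint model wins
```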
I also really hope that the LAM from V1 is still there!
05.12.2024 11:11
Inspiring! Genie incentivizes generative models to learn actionable latent states by enforcing a latent action model. Action spaces and actionable states are entangled, so this makes for more causal WMs. However, I was wondering why you would call the "counterfactuals" counterfactual? Sounds more like interventional
05.12.2024 11:09
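For context, a Genie-style latent action model (LAM) infers a discrete latent action from consecutive frames and forces the dynamics model to rely on it. A rough sketch of that training loop, with all names invented for illustration:

```python
# Sketch: Genie-style latent action model (LAM).
# An inverse model infers a latent action from (frame_t, frame_t1);
# the dynamics model must predict frame_t1 from (frame_t, action) alone,
# which pushes the latent state to be actionable. Names are illustrative.
def lam_loss(inverse_model, quantize, dynamics, frame_t, frame_t1):
    a_continuous = inverse_model(frame_t, frame_t1)  # infer what "happened"
    a = quantize(a_continuous)                       # small discrete action space
    pred = dynamics(frame_t, a)                      # predict the next frame
    return ((pred - frame_t1) ** 2).mean()           # reconstruction loss
```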
SODA: Bottleneck Diffusion Models for Representation Learning
We introduce SODA, a self-supervised diffusion model, designed for representation learning. The model incorporates an image encoder, which distills a source view into a compact representation, that, i...
Nice! There was some skepticism around diffusion models' representation-learning capacity, as they do not optimize for an explicit abstraction loss the way other SSL models do.
I guess the work would benefit a lot from a comparison with SODA, what do you think?
arxiv.org/abs/2311.17901
05.12.2024 10:12
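Schematically, SODA's bottleneck setup pairs an encoder with a denoiser: the encoder compresses one view into a compact code, and the denoiser must reconstruct a related view conditioned only on that code, so the representation is forced to carry the abstraction. A minimal sketch of such an objective (illustrative names and a generic noise schedule, not the paper's code):

```python
# Sketch: SODA-style bottleneck diffusion objective.
# encoder(view_a) -> compact code z; the denoiser must predict the noise
# added to a *related* view_b conditioned only on z, so z is pushed to
# carry abstract, transferable content. Illustrative only.
import torch

def soda_loss(encoder, denoiser, view_a, view_b, alphas):
    z = encoder(view_a)                               # bottleneck representation
    t = torch.randint(0, len(alphas), (view_b.shape[0],))
    a = alphas[t].view(-1, 1, 1, 1)                   # noise schedule coefficients
    eps = torch.randn_like(view_b)
    x_t = a.sqrt() * view_b + (1 - a).sqrt() * eps    # noised related view
    return ((denoiser(x_t, t, z) - eps) ** 2).mean()  # predict the noise
```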
I'm excited about scaling up robot learning! We've been scaling up data gen with RL in realistic sims generated from crowdsourced videos. Enables data collection far more cheaply than real-world teleop. Importantly, data becomes *cheaper* with more environments and transfers to real robots! (1/N)
05.12.2024 02:12
AI, RL, and ML researcher. Interested in hard problems which have the potential to improve the world. Currently Postdocing at University of Calgary looking for hard problems in power systems research. Previously @modl.ai @ualberta @borealis.ai @huawei.
Researching public interest AI, NLP, tech policy, interfacing bits to meaning, and more. Based in Berlin.
AI undergrad @ Northeastern University, China | RA @ UNC-Chapel Hill & SJTU
Language Grounding, Multimodal Reasoning & Planning, Human-Robot Interaction
Seeking PhD Fall 2026 |
https://10-oasis-01.github.io
Computer Vision & Machine Learning researcher at NAVER LABS europe
she/her - https://dlarlus.github.io/
Program Manager ML & AI @ Google Research | Ex-Google Brain. Speaker (FR/EN)
Abdoulaye.ai
Opinions are my own. He/His
A lot of my retweets and likes are for bookmarking purposes.
Accra, Ghana
(cover photo: Dar es Salaam, circa 2015)
Mayor-Elect of New York City
Functional roboticist. Robots, Haskell, Rust, nix, emacs, FPV⦠and the rest of life, too.
building something new
reposting art, research
prev: ed tech startup (10M users, acquired), yc, that forbes list, mit
https://www.leandra.dev/
A curious explorer of human and machine learning
M.Sc. Computer Science | Computer Vision @ University of Freiburg
Author, Speaker, Peace Researcher: josef.muehlbauer@uni-graz.at
Researcher on MDPs and RL. Retired prof. #orms #rl
Psychology with teeth: unpacking everything from gaming quirks and cultural absurdities to authoritarian power plays and the psychology of resistance.
Empowering students with psychology tools.
Empowering people with psychological insights.
PhD@UniKonstanz@SwarmIntelligence
PhD in Computer Vision
Supervised and Inspired by Prof. Dr.-Ing. Margret Keuper
Member of the Data & Web Science Group @ University of Mannheim.
ML, λ • language and the machines that understand it • https://ocramz.github.io
English professor, book historian. 19th C, mainly. Interested in histories of medicine, sexuality, and print culture, text reuse, IP, letterpress, DH/computational approaches
Book: Selling Sexual Knowledge (CUP, 2025). Working on MANUFACTURING LITERATURE