There are similarities between JEPAs and PFNs. In JEPAs, synthetic data is generated through learning. Notably, random weights can already perform well on downstream tasks, suggesting that the learning process induces useful operations on which you can do predictive coding.
17.10.2025 07:38
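The random-weights claim above is easy to probe empirically: freeze an untrained encoder and fit only a linear head on its features. A minimal sketch, assuming PyTorch/torchvision and using an untrained ResNet-18 as the random encoder (all choices here are illustrative, not from the post):

```python
# Minimal sketch: linear probe on a randomly initialized encoder.
# Illustrates the claim that random weights already give usable features.
import torch
import torch.nn as nn
from torchvision.models import resnet18

encoder = resnet18(weights=None)   # random weights, never trained
encoder.fc = nn.Identity()         # expose the 512-d penultimate features
encoder.eval()
for p in encoder.parameters():
    p.requires_grad_(False)        # freeze: only the probe learns

probe = nn.Linear(512, 10)         # linear probe for a 10-class task
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

def probe_step(images, labels):
    """One training step of the probe on frozen random features."""
    with torch.no_grad():
        feats = encoder(images)
    loss = nn.functional.cross_entropy(probe(feats), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

If the probe reaches well-above-chance accuracy, the random network already implements useful operations before any learning happens.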
Idk, but maybe not necessarily: we observe discrete tokens, but the underlying language states themselves can live in a continuous world.
14.10.2025 12:43
Generative models that assume the underlying distribution is continuous: flow matching and common diffusion models, for example.
13.10.2025 14:20
I really hope someone can revive continuous models for language. They've taken over the visual domain by far, but getting them to work in language still feels like pure alchemy.
12.10.2025 19:31
Using Knowledge Graphs to harvest datasets for efficient CLIP model training
Training high-quality CLIP models typically requires enormous datasets, which limits the development of domain-specific models -- especially in areas that even the largest CLIP models do not cover well...
Excited to release our models and preprint: "Using Knowledge Graphs to harvest datasets for efficient CLIP model training"
We propose a dataset collection method using knowledge graphs and web image search, and create EntityNet-33M: a dataset of 33M images paired with 46M texts.
08.05.2025 12:58
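The collection recipe described above (entities from a knowledge graph, then web image search per entity) could look roughly like this. A hedged sketch: `query_knowledge_graph` and `image_search` are hypothetical stand-ins for whatever KG endpoint and search API are used, not the paper's actual pipeline:

```python
# Sketch of KG-driven dataset harvesting: enumerate entities from a
# knowledge graph, then collect web images + surrounding texts per entity.
# `query_knowledge_graph` and `image_search` are hypothetical helpers.

def harvest(domain_root: str, per_entity: int = 100) -> list[dict]:
    dataset = []
    # 1. Walk the KG below a domain-specific root concept.
    for entity in query_knowledge_graph(root=domain_root):
        # 2. Use the entity name and its aliases as image-search queries.
        for query in [entity.name, *entity.aliases]:
            for hit in image_search(query, limit=per_entity):
                dataset.append({
                    "image_url": hit.url,
                    "texts": [hit.alt_text, hit.page_title],
                    "entity": entity.id,
                })
    return dataset
```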
Over the past year, my lab has been working on fleshing out theory + applications of the Platonic Representation Hypothesis.
Today I want to share two new works on this topic:
Eliciting higher alignment: arxiv.org/abs/2510.02425
Unpaired learning of unified reps: arxiv.org/abs/2510.08492
1/9
10.10.2025 22:13
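For readers new to the topic: "alignment" between two models' representations is often measured by mutual nearest-neighbor overlap on the same inputs. A minimal sketch of that metric (my illustration of the general idea, not necessarily the exact measure in the two papers):

```python
# Sketch: mutual k-NN alignment between two embedding matrices A, B of
# shape (n_samples, d_a) and (n_samples, d_b), computed on the same
# inputs. Higher neighbor overlap = more aligned representations.
import numpy as np

def knn_ids(X: np.ndarray, k: int) -> np.ndarray:
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = X @ X.T
    np.fill_diagonal(sim, -np.inf)           # exclude self-matches
    return np.argsort(-sim, axis=1)[:, :k]   # indices of k nearest neighbors

def mutual_knn_alignment(A: np.ndarray, B: np.ndarray, k: int = 10) -> float:
    na, nb = knn_ids(A, k), knn_ids(B, k)
    overlap = [len(set(na[i]) & set(nb[i])) / k for i in range(len(A))]
    return float(np.mean(overlap))
```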
Orbis shows that the objective matters.
Continuous modeling yields more stable and generalizable world models, yet true probabilistic coverage remains a challenge.
Immensely grateful to my co-authors @arianmousakhan.bsky.social, Sudhanshu Mittal, and Silvio Galesso, and to @thomasbrox.bsky.social
12.10.2025 15:51
Under the hood
Orbis uses a hybrid tokenizer with semantic + detail tokens that work in both continuous and discrete spaces.
The world model then predicts the next frame by gradually denoising or unmasking it, using past frames as context.
12.10.2025 15:31
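In pseudocode, that predict-by-denoising loop might look like the sketch below. This is a rough illustration of the described mechanism; all names (`tokenizer`, `denoiser`, `rollout`) are invented for clarity rather than taken from the Orbis code:

```python
# Sketch: autoregressive world model that predicts the next frame by
# iterative denoising, conditioned on tokens of past frames.
# All names are illustrative; see the actual repo for the real API.
import torch

def rollout(frames, tokenizer, denoiser, horizon=20, steps=50):
    context = [tokenizer.encode(f) for f in frames]  # semantic + detail tokens
    for _ in range(horizon):
        x = torch.randn_like(context[-1])            # start from pure noise
        for t in reversed(range(steps)):             # gradual denoising
            x = denoiser(x, t, context=context)      # past frames as context
        context.append(x)                            # feed back autoregressively
        frames.append(tokenizer.decode(x))
    return frames
```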
Realistic and Diverse Rollouts 4/4
12.10.2025 15:26
Realistic and Diverse Rollouts 3/4
12.10.2025 15:25
Realistic and Diverse Rollouts 2/4
12.10.2025 15:25
Realistic and Diverse Rollouts 1/4
12.10.2025 15:24
We ask how continuous vs. discrete models and their tokenizers shape long-horizon behavior.
Findings: continuous models (flow matching; see the loss sketch below)
• are far less brittle to design choices,
• produce realistic, stable rollouts up to 20 s,
• and generalize better to unseen driving conditions.
Continuous > Discrete
12.10.2025 15:01
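For reference, the continuous objective behind these findings is standard conditional flow matching, which is short enough to write out. A minimal training-loss sketch with a linear noise-to-data path (the generic recipe, not Orbis-specific code):

```python
# Sketch: conditional flow matching loss with a linear noise->data path.
# model(x_t, t, cond) predicts the velocity field; this is the standard
# recipe, independent of any particular world-model architecture.
import torch

def flow_matching_loss(model, x1, cond):
    """x1: clean target frame latents, cond: past-frame context."""
    x0 = torch.randn_like(x1)                    # noise sample
    t = torch.rand(x1.shape[0], *[1] * (x1.dim() - 1), device=x1.device)
    xt = (1 - t) * x0 + t * x1                   # linear interpolation path
    v_target = x1 - x0                           # constant target velocity
    v_pred = model(xt, t.flatten(), cond)
    return torch.mean((v_pred - v_target) ** 2)
```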
Driving world models look good for a few frames, then they drift, blur, or freeze, especially when a turn or complex scene appears. These failures reveal a deeper issue: models aren't capturing real dynamics. We introduce new metrics to measure such breakdowns.
12.10.2025 14:53
Our work Orbis goes to #NeurIPS2025!
A continuous autoregressive driving world model that outperforms Cosmos, Vista, and GEM with far less compute.
469M parameters
Trained on ~280h of driving videos
Paper: arxiv.org/pdf/2507.13162
Project page: lmb-freiburg.github.io/orbis.github...
Code: github.com/lmb-freiburg...
12.10.2025 14:39
The question raised here is whether this approach is a generalist or a specialist that cannot rise to the level of a general foundation model.
12.10.2025 13:51
I think HRM is quite great too. I would say they contributed the main idea (deep supervision) behind TRM.
12.10.2025 13:51
Transformers do not need to develop something like "gradient descent" as an emergent property when it is, in a sense, baked into the architecture.
12.10.2025 13:50
TRM works because it has an optimization algorithm as an inductive bias for finding the answer. I can't call this work anything but brilliant.
12.10.2025 13:50
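To make the "optimization algorithm as inductive bias" point concrete: TRM/HRM-style models apply a small shared network repeatedly to refine a latent answer, with a loss attached at every step (deep supervision). A schematic sketch, not the authors' code:

```python
# Sketch: recurrent answer refinement with deep supervision.
# A tiny shared network `step` is applied repeatedly; every intermediate
# answer is supervised, which bakes an iterative-optimization structure
# into the model. Schematic only, not the TRM/HRM implementation.
import torch
import torch.nn as nn

class RecurrentRefiner(nn.Module):
    def __init__(self, dim, n_steps=8):
        super().__init__()
        self.step = nn.GRUCell(dim, dim)    # shared refinement step
        self.readout = nn.Linear(dim, dim)  # answer head
        self.n_steps = n_steps

    def forward(self, x, target=None):
        z = torch.zeros_like(x)             # initial answer state
        loss = 0.0
        for _ in range(self.n_steps):
            z = self.step(x, z)             # one refinement iteration
            y = self.readout(z)
            if target is not None:          # deep supervision at every step
                loss = loss + nn.functional.mse_loss(y, target)
        return y, loss / self.n_steps
```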
We should normalize having the "Ideas That Failed" section. It would save enormous amounts of compute and time otherwise spent rediscovering stuff that doesn't work.
12.10.2025 13:49
Eugene Vinitsky
I stumbled on @eugenevinitsky.bsky.social 's blog and his "Personal Rules of Productive Research" is very good. I now do a lot of the things in the post, & wish I had done them when I was younger.
I share my "mini-paper" w ppl I hope will be co-authors.
www.eugenevinitsky.com/posts/person...
16.12.2024 15:14
Just had an idea
10.12.2024 09:44
My major realization of the past year of teaching is that a lot is forgiven if students believe you genuinely care about them and the topic
05.12.2024 20:50
Possible challenge: getting a model of {X,Y,Z,...} that is much better than independent models of each individual modality {X}, {Y}, {Z}, ... i.e. where the whole is greater than the sum of the parts.
04.12.2024 20:24
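One way to make "the whole is greater than the sum of the parts" measurable: compare the joint model's negative log-likelihood against the sum over independent per-modality models. A sketch of that check, assuming each model exposes an `nll` method (my convention, not from the post):

```python
# Sketch: test whether a joint multimodal model beats the product of
# independent per-modality models. Under independence,
#   log p(x, y, z) = log p(x) + log p(y) + log p(z),
# so any joint NLL below that sum means "whole > sum of parts".
def synergy_gap(joint_model, unimodal_models, batch):
    # batch: dict of modalities, e.g. {"X": ..., "Y": ..., "Z": ...}
    independent_nll = sum(
        m.nll(batch[mod]) for mod, m in unimodal_models.items()
    )
    joint_nll = joint_model.nll(batch)
    return independent_nll - joint_nll  # > 0 means the joint model wins
```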
I also really hope that the LAM from V1 is still there!
05.12.2024 11:11
Inspiring! Genie incentivizes generative models to learn actionable latent states by enforcing a latent action model. Action spaces and actionable states are entangled, so this makes for more causal WMs. However, I was wondering why you would call the "counterfactuals" counterfactual? Sounds more like interventional
05.12.2024 11:09
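For context, a Genie-style latent action model (LAM) infers a discrete latent action from consecutive frames and forces the dynamics model to rely on it. A rough sketch of that training loop, with all names invented for illustration:

```python
# Sketch: Genie-style latent action model (LAM).
# An inverse model infers a latent action from (frame_t, frame_t1);
# the dynamics model must predict frame_t1 from (frame_t, action) alone,
# which pushes the latent state to be actionable. Names are illustrative.
def lam_loss(inverse_model, quantize, dynamics, frame_t, frame_t1):
    a_continuous = inverse_model(frame_t, frame_t1)  # infer what "happened"
    a = quantize(a_continuous)                       # small discrete action space
    pred = dynamics(frame_t, a)                      # predict the next frame
    return ((pred - frame_t1) ** 2).mean()           # reconstruction loss
```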
SODA: Bottleneck Diffusion Models for Representation Learning
We introduce SODA, a self-supervised diffusion model, designed for representation learning. The model incorporates an image encoder, which distills a source view into a compact representation, that, i...
Nice! There was some skepticism around diffusion models' representation-learning capacity, as they do not optimize for an explicit abstraction loss the way other SSL models do.
I guess the work would benefit a lot from a comparison with SODA, what do you think?
arxiv.org/abs/2311.17901
05.12.2024 10:12
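Schematically, SODA's bottleneck setup pairs an encoder with a denoiser: the encoder compresses one view into a compact code, and the denoiser must reconstruct a related view conditioned only on that code, so the representation is forced to carry the abstraction. A minimal sketch of such an objective (illustrative names and a generic noise schedule, not the paper's code):

```python
# Sketch: SODA-style bottleneck diffusion objective.
# encoder(view_a) -> compact code z; the denoiser must predict the noise
# added to a *related* view_b conditioned only on z, so z is pushed to
# carry abstract, transferable content. Illustrative only.
import torch

def soda_loss(encoder, denoiser, view_a, view_b, alphas):
    z = encoder(view_a)                               # bottleneck representation
    t = torch.randint(0, len(alphas), (view_b.shape[0],))
    a = alphas[t].view(-1, 1, 1, 1)                   # noise schedule coefficients
    eps = torch.randn_like(view_b)
    x_t = a.sqrt() * view_b + (1 - a).sqrt() * eps    # noised related view
    return ((denoiser(x_t, t, z) - eps) ** 2).mean()  # predict the noise
```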
I'm excited about scaling up robot learning! We've been scaling up data gen with RL in realistic sims generated from crowdsourced videos. Enables data collection far more cheaply than real-world teleop. Importantly, data becomes *cheaper* with more environments and transfers to real robots! (1/N)
05.12.2024 02:12
AI, RL, and ML researcher. Interested in hard problems which have the potential to improve the world. Currently Postdocing at University of Calgary looking for hard problems in power systems research. Previously @modl.ai @ualberta @borealis.ai @huawei.
Researching public interest AI, NLP, tech policy, interfacing bits to meaning, and more. Based in Berlin.
AI undergrad @ Northeastern University, China | RA @ UNC-Chapel Hill & SJTU
Language Grounding, Multimodal Reasoning & Planning, Human-Robot Interaction
Seeking PhD Fall 2026 |
https://10-oasis-01.github.io
Computer Vision & Machine Learning researcher at NAVER LABS europe
she/her - https://dlarlus.github.io/
Program Manager ML & AI @ Google Research | Ex-Google Brain. Speaker (FR/EN)
Abdoulaye.ai
Opinions are my own. He/His
A lot of my retweets and likes are for bookmarking purposes.
Accra, Ghana
(cover photo: Dar es Salaam, circa 2015)
Mayor-Elect of New York City
Functional roboticist. Robots, Haskell, Rust, nix, emacs, FPV⦠and the rest of life, too.
building something new
reposting art, research
prev: ed tech startup (10M users, acquired), yc, that forbes list, mit
https://www.leandra.dev/
A curious explorer of human and machine learning
M.Sc. Computer Science | Computer Vision @ University of Freiburg
Author, Speaker, Peace Researcher: josef.muehlbauer@uni-graz.at
Researcher on MDPs and RL. Retired prof. #orms #rl
Psychology with teeth: unpacking everything from gaming quirks and cultural absurdities to authoritarian power plays and the psychology of resistance.
Empowering students with psychology tools.
Empowering people with psychological insights.
PhD@UniKonstanz@SwarmIntelligence
PhD in Computer Vision
Supervised and Inspired by Prof. Dr.-Ing. Margret Keuper
Member of the Data & Web Science Group @ University of Mannheim.
ML, λ • language and the machines that understand it • https://ocramz.github.io
English professor, book historian. 19th C, mainly. Interested in histories of medicine, sexuality, and print culture, text reuse, IP, letterpress, DH/computational approaches
Book: Selling Sexual Knowledge (CUP, 2025). Working on MANUFACTURING LITERATURE