Compositionality is a central desideratum for intelligent systems... but it's a fuzzy concept that is difficult to quantify. In this blog post, lab member @ericelmoznino.bsky.social outlines ideas toward formalizing it and surveys recent work. A must-read for researchers interested in AI and neuroscience.
19.08.2025 13:50
This work wouldn't exist without my amazing co-authors:
@mnoukhov.bsky.social & @AaronCourville
22.07.2025 14:41
Example: There are no “teapots on mountains” in ImageNet.
We verify this via nearest-neighbor search in DINOv2 space.
But our model can still create them, by composing concepts it learned separately.
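The nearest-neighbor check can be sketched as follows. This is a minimal illustration that assumes the DINOv2 embeddings are already computed; random vectors stand in for real features here.

```python
# Sketch of the nearest-neighbor verification described above, assuming we
# already have DINOv2 embeddings for the training set and for a generated
# image. All data below is a random stand-in; in practice the embeddings
# would come from a DINOv2 forward pass.
import numpy as np

def nearest_neighbors(query: np.ndarray, bank: np.ndarray, k: int = 5):
    """Return indices and cosine similarities of the k closest bank rows."""
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    sims = b @ q                  # cosine similarity to every bank vector
    idx = np.argsort(-sims)[:k]   # top-k most similar, descending
    return idx, sims[idx]

rng = np.random.default_rng(0)
bank = rng.normal(size=(1000, 768))   # stand-in for dataset embeddings
query = rng.normal(size=768)          # stand-in for a generated sample
idx, sims = nearest_neighbors(query, bank, k=5)
print(idx.shape, sims.shape)  # (5,) (5,)
```

If the top similarities are low, no close match exists in the dataset, supporting the claim that the concept combination is novel.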
22.07.2025 14:41
LLMs can speak in DLC!
We fine-tune a language model to sample DLC tokens from text, giving us a pipeline:
Text → DLC → Image
This also enables generation beyond ImageNet.
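A hedged sketch of that pipeline: `text_to_dlc` and `dlc_to_image` are hypothetical stand-ins for the fine-tuned LLM and the DLC-conditioned diffusion decoder, and the codebook size and sequence length are illustrative, not the paper's values.

```python
# Toy sketch of the Text -> DLC -> Image pipeline. Both functions are
# stand-ins: a real system would run a fine-tuned LLM and a diffusion
# decoder conditioned on the DLC tokens.
from typing import List

VOCAB_SIZE = 4096   # assumed DLC codebook size (illustrative)
SEQ_LEN = 32        # assumed DLC tokens per image (illustrative)

def text_to_dlc(prompt: str) -> List[int]:
    """Stand-in for the fine-tuned LLM: map a caption to DLC tokens."""
    # Deterministic toy mapping so the sketch runs end to end.
    return [hash((prompt, i)) % VOCAB_SIZE for i in range(SEQ_LEN)]

def dlc_to_image(dlc: List[int]) -> dict:
    """Stand-in for the diffusion decoder p(x | c)."""
    return {"conditioning": dlc, "pixels": None}

dlc = text_to_dlc("a teapot on a mountain")
image = dlc_to_image(dlc)
print(len(dlc))  # 32
```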
22.07.2025 14:41
DLCs are compositional.
Swap tokens between two images (Komodor + Carbonara) → the model produces coherent hybrids never seen during training.
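The swap itself can be sketched like this. The two sequences below are random stand-ins for the codes of two real images; in the paper the hybrid sequence would then be decoded by the diffusion model.

```python
# Minimal sketch of DLC token swapping between two images. Sequence length
# and codebook size are illustrative stand-ins.
import random

SEQ_LEN, VOCAB = 32, 4096
random.seed(0)
dlc_a = [random.randrange(VOCAB) for _ in range(SEQ_LEN)]  # image A's code
dlc_b = [random.randrange(VOCAB) for _ in range(SEQ_LEN)]  # image B's code

def swap_tokens(a, b, positions):
    """Copy b's tokens into a copy of a at the given positions."""
    hybrid = list(a)
    for p in positions:
        hybrid[p] = b[p]
    return hybrid

# Take every other token from B, the rest from A, then decode the hybrid.
hybrid = swap_tokens(dlc_a, dlc_b, positions=range(0, SEQ_LEN, 2))
```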
22.07.2025 14:41
Results:
DiT-XL/2 + DLC → FID 1.59 on unconditional ImageNet
Works well with and without classifier-free guidance
Learns faster and better than prior work using pre-trained encoders
🤯
22.07.2025 14:41
Unconditional generation pipeline:
Sample a DLC (e.g., with SEDD)
Decode it into an image (e.g., with DiT)
This ancestral sampling approach is simple but powerful.
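The steps above can be sketched as plain ancestral sampling; toy stand-ins replace SEDD (the discrete prior over codes) and DiT (the decoder) here.

```python
# Two-stage ancestral sampling sketch: draw a DLC sequence from the prior
# p(c), then decode it with p(x | c). Both models are toy stand-ins; the
# sequence length and vocabulary size are illustrative.
import random

def sample_dlc(seq_len: int = 32, vocab: int = 4096):
    """Stand-in for the discrete prior p(c) (SEDD in the thread)."""
    return [random.randrange(vocab) for _ in range(seq_len)]

def decode(dlc):
    """Stand-in for the conditional image decoder p(x | c) (DiT)."""
    return {"dlc": dlc, "pixels": None}

random.seed(0)
c = sample_dlc()   # step 1: sample a code
x = decode(c)      # step 2: decode it into an image
```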
22.07.2025 14:41
DLCs enable exactly this.
Images → sequences of discrete tokens via a Simplicial Embedding (SEM) encoder
We take the argmax over token distributions → get the DLC sequence
Think of it as “tokenizing” images, like words for LLMs.
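A minimal sketch of that tokenization step, assuming the encoder emits one vector of logits per token slot (shapes are illustrative): softmax gives the simplicial embedding, and the per-slot argmax gives the discrete code. Since softmax is monotone, the argmax of the probabilities equals the argmax of the raw logits.

```python
# Sketch of image -> DLC tokenization in the style described above. The
# logits are random stand-ins for real encoder outputs; a real encoder
# would produce one categorical distribution per token slot.
import numpy as np

SEQ_LEN, VOCAB = 32, 4096
rng = np.random.default_rng(0)
logits = rng.normal(size=(SEQ_LEN, VOCAB))  # one logit vector per slot

# Softmax per slot (the simplicial embedding) ...
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)

# ... then argmax per slot gives the discrete code sequence.
dlc = probs.argmax(axis=1)  # shape (SEQ_LEN,), ints in [0, VOCAB)
print(dlc.shape)  # (32,)
```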
22.07.2025 14:41
Text models don't have this problem! LLMs can model internet-scale corpora.
So… can we improve image generation for highly multimodal distributions by decomposing it into:
1. Generating discrete tokens - p(c)
2. Decoding tokens into images - p(x|c)
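This two-step decomposition is the standard latent-variable factorization of the image distribution:

```latex
p(x) \;=\; \sum_{c} p(c)\, p(x \mid c)
```

Sampling is ancestral: first draw $c \sim p(c)$ with a discrete sequence model, then $x \sim p(x \mid c)$ with the image decoder.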
22.07.2025 14:41
Modeling highly multimodal distributions in continuous space is hard.
Even a simple 2D Gaussian mixture with a large number of modes may be tricky to model directly. Good conditioning solves this!
Could this be why large image generative models are almost always conditional? ๐ค
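A toy illustration of this point: for a 2D Gaussian mixture, the conditional p(x | c = k) is a single easy Gaussian, and the unconditional distribution becomes easy too once sampling is routed through the discrete mode variable. The mixture parameters below are arbitrary.

```python
# Toy Gaussian-mixture demo of the conditioning argument above: modeling
# many modes at once in continuous space is hard, but each mode is trivial
# once you condition on the (discrete) component.
import numpy as np

rng = np.random.default_rng(0)
K = 64                                    # number of modes (arbitrary)
centers = rng.uniform(-10, 10, size=(K, 2))

def sample_conditional(k: int, n: int = 1, sigma: float = 0.1):
    """Easy: p(x | c = k) is a single Gaussian around center k."""
    return centers[k] + sigma * rng.normal(size=(n, 2))

def sample_unconditional(n: int = 1, sigma: float = 0.1):
    """Ancestral sampling: first the discrete mode, then the Gaussian."""
    ks = rng.integers(0, K, size=n)
    return centers[ks] + sigma * rng.normal(size=(n, 2))

x = sample_unconditional(n=5)
print(x.shape)  # (5, 2)
```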
22.07.2025 14:41
🧵 Everyone is chasing new diffusion models, but what about the representations they model?
We introduce Discrete Latent Codes (DLCs):
- Discrete representations for diffusion models
- SOTA unconditional-generation FID (1.59 on ImageNet)
- Compositional generation
- Integrates with LLMs
🧱
22.07.2025 14:41
Congrats Lucas! Looking forward to seeing what comes out of your lab in Zurich!
05.12.2024 12:55