
Sander Dieleman

@sedielem.bsky.social

Blog: https://sander.ai/ 🐦: https://x.com/sedielem Research Scientist at Google DeepMind (WaveNet, Imagen 3, Veo, ...). I tweet about deep learning (research + software), music, generative models (personal account).

4,369 Followers  |  621 Following  |  89 Posts  |  Joined: 04.07.2023

Latest posts by sedielem.bsky.social on Bluesky

On N-dimensional Rotary Positional Embeddings: An exploration of N-dimensional rotary positional embeddings (RoPE) for vision transformers.

Great blog post on rotary position embeddings (RoPE) in more than one dimension, with interactive visualisations, a bunch of experimental results, and code!

28.07.2025 14:51 — 👍 18    🔁 2    💬 0    📌 0

... also very honoured and grateful to see my blog linked in the video description! 🥹🙏🙇

26.07.2025 21:59 — 👍 9    🔁 0    💬 0    📌 0

I blog and give talks to help build people's intuition for diffusion models. YouTubers like @3blue1brown.com and Welch Labs have been a huge inspiration: their ability to make complex ideas in maths and physics approachable is unmatched. Really great to see them tackle this topic!

26.07.2025 21:59 — 👍 30    🔁 0    💬 1    📌 0

Everyone is welcome!

15.07.2025 21:39 — 👍 3    🔁 0    💬 0    📌 0
Post image

Hello #ICML2025 👋, anyone up for a diffusion circle? We'll just sit down somewhere and talk shop.

🕒 Join us at 3PM on Thursday July 17. We'll meet here (see photo, near the west building's west entrance), and venture out from there to find a good spot to sit. Tell your friends!

15.07.2025 21:33 — 👍 13    🔁 1    💬 0    📌 1
Post image

Diffusion models have analytical solutions, but these involve sums over the entire training set and don't generalise at all. They are mainly useful for helping us understand how practical diffusion models do generalise.

Nice blog + code by Raymond Fan: rfangit.github.io/blog/2025/op...

05.07.2025 16:01 — 👍 34    🔁 3    💬 2    📌 1
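For intuition, here is a minimal sketch of that analytical solution (my own illustration, not the linked blog's code): when the "data distribution" is just the empirical training set, the ideal denoiser has a closed form as a softmax-weighted average of the training examples, so it can only ever reproduce blends of them.

    # Minimal sketch: the closed-form optimal denoiser for x_t = x_0 + sigma*eps
    # when the data distribution is the empirical training set. E[x0 | xt] is a
    # softmax-weighted average of the training examples.
    import numpy as np

    def analytical_denoiser(x_t, train_set, sigma):
        # Squared distance from the noisy input to every training example.
        d2 = np.sum((train_set - x_t) ** 2, axis=-1)
        # Posterior weight of each example having generated x_t (stable softmax).
        w = np.exp(-(d2 - d2.min()) / (2 * sigma ** 2))
        w /= w.sum()
        return w @ train_set  # convex combination of training points

    rng = np.random.default_rng(0)
    data = rng.normal(size=(100, 2))              # toy "training set"
    x_noisy = data[0] + 0.5 * rng.normal(size=2)  # noisy version of one point
    print(analytical_denoiser(x_noisy, data, sigma=0.5))

Note how the sum runs over the whole training set: the cost of each denoising step grows with dataset size, and the output is always a blend of training points, which is why this is a conceptual tool rather than a practical model.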

Note also that getting this number slightly wrong isn't that big a deal. Even if you make it 100k instead of 10k, it's not going to change the granularity of the high frequencies that much because of the logarithmic frequency spacing.

24.06.2025 23:39 — 👍 0    🔁 0    💬 1    📌 0

The frequencies are log-spaced, so historically, 10k was plenty to ensure that all positions can be uniquely distinguished. Nowadays of course sequences can be quite a bit longer.

24.06.2025 23:39 — 👍 1    🔁 0    💬 1    📌 0
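To make the log-spacing concrete, here's a minimal sketch of the standard RoPE/sinusoidal frequency schedule (my own illustration, with an assumed head dimension of 64):

    # Standard (RoPE / sinusoidal) frequencies: omega_k = base ** (-2k / d),
    # i.e. geometrically (log-)spaced between 1 and ~1/base. Bumping the base
    # from 10k to 100k barely moves the high (fast) frequencies; it mostly
    # stretches the low-frequency end, where long-range positions live.
    import numpy as np

    def rope_freqs(dim, base=10_000.0):
        return base ** (-np.arange(0, dim, 2) / dim)  # one freq per channel pair

    f10k, f100k = rope_freqs(64, 10_000.0), rope_freqs(64, 100_000.0)
    print(f10k[:3])    # ~[1.00, 0.75, 0.56]: the fastest frequencies...
    print(f100k[:3])   # ~[1.00, 0.70, 0.49]: ...are nearly unchanged,
    print(f10k[-1], f100k[-1])  # while the slowest differ by ~10x.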
History of Diffusion - Sander Dieleman (YouTube video by Bain Capital Ventures)

Here's the third and final part of Slater Stich's "History of diffusion" interview series!

The other two interviewees' research played a pivotal role in the rise of diffusion models, whereas I just like to yap about them 😬 This was a wonderful opportunity to do exactly that!

14.05.2025 16:11 — 👍 21    🔁 7    💬 0    📌 0
["Machine Learning for Audio Workshop"] ["Discover the harmony of AI and sound."]

The ML for audio 🗣️🎵🔊 workshop is back at ICML 2025 in Vancouver! It will take place on Saturday, July 19. Featuring invited talks from Dan Ellis, Albert Gu, James Betker, Laura Laurenti and Pratyusha Sharma.

Submission deadline: May 23 (Friday next week)
mlforaudioworkshop.github.io

14.05.2025 12:16 — 👍 13    🔁 1    💬 0    📌 0
Post image

I am very happy to share our latest work on the information theory of generative diffusion:

"Entropic Time Schedulers for Generative Diffusion Models"

We find that the conditional entropy offers a natural data-dependent notion of time during generation.

Link: arxiv.org/abs/2504.13612

29.04.2025 13:17 — 👍 25    🔁 5    💬 2    📌 0

One weird trick for better diffusion models: concatenate some DINOv2 features to your latent channels!

Combining latents with PCA components extracted from DINOv2 features yields faster training and better samples. Also enables a new guidance strategy. Simple and effective!

25.04.2025 13:03 — 👍 28    🔁 4    💬 0    📌 0
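A rough, hypothetical sketch of how I read the idea (illustrative names and shapes, not the paper's actual pipeline): project DINOv2 patch features onto a handful of PCA components, resize them to the latent resolution, and stack them onto the VAE latent channels. The torch.hub DINOv2 model is real; the helper augment_latents and its shapes are assumptions.

    # Hypothetical sketch, not the paper's code: concatenate PCA-projected
    # DINOv2 patch features to VAE latent channels along the channel axis.
    import torch
    import torch.nn.functional as F

    dino = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14').eval()

    @torch.no_grad()
    def augment_latents(latents, images, n_pca=8):
        # latents: (B, C, h, w) VAE latents; images: (B, 3, 224, 224), normalised.
        feats = dino.forward_features(images)['x_norm_patchtokens']  # (B, 256, 384)
        B, T, D = feats.shape
        flat = feats.reshape(B * T, D)
        # PCA via low-rank SVD; a real setup would fit this once on a large
        # sample of features and reuse the components.
        _, _, V = torch.pca_lowrank(flat, q=n_pca)  # V: (D, n_pca)
        pcs = ((flat - flat.mean(0)) @ V).reshape(B, 16, 16, n_pca)
        pcs = pcs.permute(0, 3, 1, 2)  # -> (B, n_pca, 16, 16)
        # Resize the PCA maps to the latent grid and stack on the channels.
        pcs = F.interpolate(pcs, size=latents.shape[-2:], mode='bilinear')
        return torch.cat([latents, pcs], dim=1)  # (B, C + n_pca, h, w)

The diffusion model would then be trained on these augmented latents; since the extra channels are semantic, guidance can target them specifically (my reading of the "new guidance strategy").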
Generative modelling in latent space: Latent representations for generative models.

New blog post: let's talk about latents!
sander.ai/2025/04/15/l...

15.04.2025 09:43 — 👍 74    🔁 18    💬 3    📌 5
History of Diffusion - Yang Song (YouTube video by Bain Capital Ventures)

Amazing interview with Yang Song, one of the key researchers we have to thank for diffusion models.

The most important lesson: be fearless! The community's view on score matching was quite pessimistic at the time; he went against the grain and made it work at scale!

www.youtube.com/watch?v=ud6z...

14.04.2025 16:47 — 👍 26    🔁 4    💬 0    📌 0
Post image Post image

๐ŸฅIntroducing Gemini 2.5, our most intelligent model with impressive capabilities in advanced reasoning and coding.

Now integrating thinking capabilities, 2.5 Pro Experimental is our most performant Gemini model yet. Itโ€™s #1 on the LM Arena leaderboard. ๐Ÿฅ‡

25.03.2025 17:25 โ€” ๐Ÿ‘ 216    ๐Ÿ” 65    ๐Ÿ’ฌ 34    ๐Ÿ“Œ 11
Research Scientist, Generative Media (London, UK)

We are hiring on the Generative Media team in London: boards.greenhouse.io/deepmind/job...

We work on Imagen, Veo, Lyria and all that good stuff. Come work with us! If you're interested, apply before Feb 28.

21.02.2025 19:00 — 👍 36    🔁 12    💬 4    📌 0
History of Diffusion - Jascha Sohl-Dickstein (YouTube video by Bain Capital Ventures)

Great interview with @jascha.sohldickstein.com about diffusion models! This is the first in a series: similar interviews with Yang Song and yours truly will follow soon.

(One of these is not like the others -- both of them basically invented the field, and I occasionally write a blog post 🥲)

10.02.2025 22:30 — 👍 43    🔁 11    💬 0    📌 1
Cosmogenesis, by grumusic (8-track album)

Yes! Also listen to this and contemplate the universe: grumusic.bandcamp.com/album/cosmog...

28.01.2025 23:55 — 👍 4    🔁 0    💬 1    📌 0
NeurIPS 2024 Schedule

This is just a tiny fraction of what's available, check out the schedule for more: neurips.cc/virtual/2024...

22.01.2025 21:06 — 👍 6    🔁 0    💬 0    📌 0
Multimodal Iterative Refinement (NeurIPS 2024)

10. Last but not least (😎), here's my own workshop talk about multimodal iterative refinement: the methodological tension between language and perceptual modalities, autoregression and diffusion, and how to bring these together: neurips.cc/virtual/2024...

22.01.2025 21:06 — 👍 7    🔁 0    💬 1    📌 0
Colin Raffel (NeurIPS 2024)

9. A great overview of various strategies for merging multiple models together by Colin Raffel 🪿 neurips.cc/virtual/2024...

22.01.2025 21:06 — 👍 4    🔁 0    💬 1    📌 0
Invited Talk 4, Speaker: Ishan Misra (NeurIPS 2024)

8. Ishan Misra gives a nice overview of Meta's Movie Gen model 📽️ (I have some questions about the diffusion vs. flow matching comparison though) neurips.cc/virtual/2024...

22.01.2025 21:06 — 👍 3    🔁 0    💬 1    📌 0
Tom Goldstein: Can transformers solve harder problems than they were trained on? Scaling up test-time computation via recurrence (NeurIPS 2024)

7. More on test-time scaling from @tomgoldstein.bsky.social, using a different approach based on recurrence 🐚 neurips.cc/virtual/2024... (some interesting comments on the link with diffusion models in the questions at the end!)

22.01.2025 21:06 — 👍 5    🔁 0    💬 2    📌 0
Invited Speaker: Noam Brown, OpenAI (NeurIPS 2024)

6. @polynoamial.bsky.social talks about scaling compute at inference time, and the trade-offs involved -- in language models, but also in other settings 🧮 neurips.cc/virtual/2024...

22.01.2025 21:06 — 👍 5    🔁 0    💬 1    📌 0
Neel Nanda: Sparse Autoencoders - Assessing the evidence (NeurIPS 2024)

5. Sparse autoencoders were in vogue well over a decade ago, back when I was doing my PhD. They've recently been revived in the context of mechanistic interpretability of LLMs 🔍 @neelnanda.bsky.social gives a nice overview: neurips.cc/virtual/2024...

22.01.2025 21:06 — 👍 7    🔁 0    💬 1    📌 0
Surya Ganguli: An analytic theory of creativity in convolutional diffusion models (NeurIPS 2024)

4. Insights from @suryaganguli.bsky.social on creativity, generalisation and overfitting in diffusion models 🎨 neurips.cc/virtual/2024...

22.01.2025 21:06 — 👍 5    🔁 0    💬 1    📌 0
Geometry of the Distribution of Natural Images (NeurIPS 2024)

3. @eerosim.bsky.social provides an in-depth look at the geometry of the distribution of natural images 🖼️ Extremely relevant to anyone trying to understand what diffusion models are really doing. neurips.cc/virtual/2024...

22.01.2025 21:06 — 👍 9    🔁 0    💬 1    📌 0
Alexis Conneau (NeurIPS 2024)

2. A great talk from Alexis Conneau demonstrating the various challenges involved in giving LLMs a voice: neurips.cc/virtual/2024...

22.01.2025 21:06 — 👍 3    🔁 0    💬 1    📌 0
Keynote: LLM Posteriors over Functions as a New Output Modality (NeurIPS 2024)

1. @davidduvenaud.bsky.social gave an inspiring talk about using language models to learn to represent functions -- the kind of thing people like to use e.g. Gaussian processes for 📈 neurips.cc/virtual/2024...

22.01.2025 21:06 — 👍 5    🔁 0    💬 1    📌 0

📢 PSA: #NeurIPS2024 recordings are now publicly available!

The workshops always have tons of interesting things on at once, so the FOMO is real 😵‍💫 Luckily it's all recorded, so I've been catching up on what I missed.

Thread below with some personal highlights 🧵

22.01.2025 21:06 — 👍 129    🔁 33    💬 1    📌 1
