Luca Ambrogioni

@lucamb.bsky.social

Assistant professor in Machine Learning and Theoretical Neuroscience. Generative modeling and memory. Opinionated, often wrong.

1,832 Followers  |  117 Following  |  39 Posts  |  Joined: 05.12.2023

Latest posts by lucamb.bsky.social on Bluesky

Many would fail when the number of steps in the puzzle is in the thousands and any error leads to a wrong solution

12.06.2025 15:46 — 👍 0    🔁 0    💬 0    📌 0

Have you ever asked your child to solve a simple puzzle in 60,000 easy steps?

11.06.2025 18:19 — 👍 0    🔁 0    💬 1    📌 0

Students using AI to write their reports is like me going to the gym and getting a robot to lift my weights

11.06.2025 17:09 — 👍 58    🔁 16    💬 2    📌 3
Post image

Generative decisions in diffusion models can be detected locally as symmetry breaking in the energy and globally as peaks in the conditional entropy rate.

Both correspond to a (local or global) suppression of the quadratic potential (the trace of the Hessian).
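For context, here is how a Hessian trace enters an entropy rate in the simplest case (a sketch of the standard de Bruijn identity; the post's conditional entropy rate is a related but distinct quantity). For a pure diffusion $dx_t = g(t)\,dw_t$,

\[
\frac{d}{dt} H(p_t) = \frac{g(t)^2}{2}\,\mathbb{E}\big[\|\nabla \log p_t(x_t)\|^2\big] = -\frac{g(t)^2}{2}\,\mathbb{E}\big[\operatorname{tr}\nabla^2 \log p_t(x_t)\big],
\]

so entropy production is governed by the expected Hessian trace of the energy $-\log p_t$.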

16.05.2025 09:12 — 👍 7    🔁 0    💬 0    📌 0
Post image

🧠✨ How do we rebuild our memories? In our new study, we show that hippocampal ripples kickstart a coordinated expansion of cortical activity that helps reconstruct past experiences.

We recorded iEEG from patients during memory retrieval... and found something really cool 👇 (thread)

29.04.2025 05:59 — 👍 169    🔁 65    💬 5    📌 5

Why? You can just mute out the politics and the owner's antics and it becomes perfectly fine again

03.05.2025 09:24 — 👍 3    🔁 0    💬 4    📌 0
Post image

In continuous generative diffusion, the conditional entropy rate is the constant term that separates the score matching loss from the denoising score matching loss

This can be directly interpreted as the information transfer (bit rate) from the state x_t to the final generation x_0.
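As a sketch of the separation being referenced (the classical denoising score matching identity of Vincent, 2011, written in generic notation):

\[
\mathbb{E}_{x_0,x_t}\big[\|s_\theta(x_t,t)-\nabla\log p_t(x_t\mid x_0)\|^2\big] = \mathbb{E}_{x_t}\big[\|s_\theta(x_t,t)-\nabla\log p_t(x_t)\|^2\big] + c_t,
\]

where $c_t = \mathbb{E}\|\nabla\log p_t(x_t\mid x_0)\|^2 - \mathbb{E}\|\nabla\log p_t(x_t)\|^2$ does not depend on $\theta$; the post's reading is that this gap carries the conditional entropy rate.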

02.05.2025 13:32 — 👍 21    🔁 5    💬 0    📌 0
Post image

Decisions during generative diffusion are analogous to phase transitions in physics. They can be identified as peaks in the conditional entropy rate curve!

30.04.2025 13:37 — 👍 10    🔁 3    💬 0    📌 0

I'd put these on the NeuroAI vision board:

@tyrellturing.bsky.social's Deep learning framework
www.nature.com/articles/s41...

@tonyzador.bsky.social's Next-gen AI through neuroAI
www.nature.com/articles/s41...

@adriendoerig.bsky.social's Neuroconnectionist framework
www.nature.com/articles/s41...

28.04.2025 23:15 — 👍 34    🔁 10    💬 2    📌 1

Very excited that our work (together with my PhD student @gbarto.bsky.social and our collaborator Dmitry Vetrov) was recognized with a Best Paper Award at #AABI2025!

#ML #SDE #Diffusion #GenAI 🤖🧠

30.04.2025 00:02 — 👍 19    🔁 2    💬 1    📌 0

Indeed. We are currently doing a lot of work on guidance, so we will likely try to use entropic time there as well soon

29.04.2025 15:03 — 👍 2    🔁 0    💬 1    📌 0

The largest we have tried so far is EDM2-XL on ImageNet 512. It works very well there!

We have not tried it with guidance so far

29.04.2025 14:55 — 👍 2    🔁 0    💬 1    📌 0
Post image

I am very happy to share our latest work on the information theory of generative diffusion:

"Entropic Time Schedulers for Generative Diffusion Models"

We find that the conditional entropy offers a natural data-dependent notion of time during generation

Link: arxiv.org/abs/2504.13612
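A minimal sketch of how such a schedule could be used, assuming per-time estimates of the conditional entropy are already available (the helper below and its inputs are hypothetical; the estimation procedure itself is in the paper):

import numpy as np

def entropic_schedule(ts, H, n_steps):
    # Pick n_steps sampling times so that equal steps correspond to equal
    # changes in conditional entropy rather than equal changes in t.
    tau = (H - H[0]) / (H[-1] - H[0])   # normalized entropy as a clock in [0, 1]
    levels = np.linspace(0.0, 1.0, n_steps)
    return np.interp(levels, tau, ts)   # invert tau(t) by interpolation

# Hypothetical usage with a placeholder (monotone) entropy curve:
ts = np.linspace(0.0, 1.0, 1000)
H = np.sqrt(ts)
print(entropic_schedule(ts, H, n_steps=10))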

29.04.2025 13:17 — 👍 25    🔁 5    💬 2    📌 0
Post image

Flow Matching in a nutshell.
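For readers without the image, a one-formula version of the standard construction (my paraphrase, not necessarily the exact figure): with $x_1 \sim$ data, $x_0 \sim$ noise, and $x_t = (1-t)\,x_0 + t\,x_1$, regress

\[
\mathcal{L}(\theta)=\mathbb{E}_{t,x_0,x_1}\big[\|v_\theta(x_t,t)-(x_1-x_0)\|^2\big],
\]

then generate by integrating $\dot{x} = v_\theta(x,t)$ from $t=0$ to $t=1$.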

27.11.2024 14:07 — 👍 52    🔁 7    💬 1    📌 1

I will be at #NeurIPS2024 in Vancouver. I'm looking for post-docs, and if you want to talk about post-doc opportunities, get in touch. 🤗

Here's my current team at Aalto University: users.aalto.fi/~asolin/group/

08.12.2024 10:56 — 👍 15    🔁 5    💬 0    📌 0
NeurIPS Poster: Rule Extrapolation in Language Modeling: A Study of Compositional Generalization on OOD Prompts (NeurIPS 2024)

Can language models transcend the limitations of training data?

We train LMs on a formal grammar, then prompt them OUTSIDE of this grammar. We find that LMs often extrapolate logical rules and apply them OOD, too. Proof of a useful inductive bias.

Check it out at NeurIPS:

nips.cc/virtual/2024...
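A toy version of the setup, as I read it (illustrative only; the paper's grammars and tokenization will differ):

def anbn(n):
    # A string from the classic context-free language {a^n b^n}:
    # n a's followed by n b's.
    return "a" * n + "b" * n

train_set = [anbn(n) for n in range(1, 10)]

# An OOD prompt: no string of the grammar starts with 'b', yet a model can
# still extrapolate the rule "as many b's as a's" when completing it.
ood_prompt = "bba"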

06.12.2024 13:31 — 👍 114    🔁 8    💬 7    📌 1
Photograph of Johannes Margraph and Günter Klambauer introducing the ELLIS ML4Molecules Workshop 2024 in Berlin at the Fritz-Haber Institute in Dahlem.

Excited to speak at the ELLIS ML4Molecules Workshop 2024 in Berlin!

moleculediscovery.github.io/workshop2024/

06.12.2024 08:08 — 👍 46    🔁 4    💬 3    📌 0

Can we please stop sharing posts that legitimize murder? Please.

06.12.2024 11:14 — 👍 1    🔁 0    💬 0    📌 0

Our team at Google DeepMind is hiring Student Researchers for 2025!

🧑‍🔬 Interested in understanding reasoning capabilities of neural networks from first principles?
🧑‍🎓 Currently studying for a BS/MS/PhD?
🧑‍💻 Have solid engineering and research skills?

🌟 We want to hear from you! Details in thread.

05.12.2024 23:08 — 👍 59    🔁 5    💬 2    📌 0
The left figure showcases the behavior of Hopfield models: given a query (the initial point of energy descent), a Hopfield model retrieves the memory (local minimum) closest to that query by minimizing the energy function. A perfect Hopfield model stores patterns in distinct minima (or basins). In contrast, the right figure illustrates a bad associative memory system, where stored patterns share a single basin. This enables the creation of spurious patterns, which look like mixtures of stored patterns. Spurious patterns have lower energy than the memories because of this overlap.

Diffusion models create beautiful novel images, but they can also memorize samples from the training set. How does this blending of features allow the creation of novel patterns? Our new work in the Sci4DL workshop at #neurips2024 shows that diffusion models behave like Dense Associative Memory networks.
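A minimal dense associative memory sketch to make the analogy concrete (my illustration, not the paper's code): memories are stored patterns, and retrieval is descent on an energy whose minima sit at or near them.

import numpy as np

def dam_update(x, M, beta=4.0):
    # One modern-Hopfield-style update: softmax attention over the stored
    # patterns, with sharper (more separated) basins as beta grows.
    w = np.exp(beta * (M @ x))
    w /= w.sum()
    return M.T @ w  # convex combination of memories

M = np.random.randn(5, 16)                      # 5 stored patterns
M /= np.linalg.norm(M, axis=1, keepdims=True)
query = M[0] + 0.3 * np.random.randn(16)        # corrupted memory 0
for _ in range(10):
    query = dam_update(query, M)                # converges toward memory 0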

05.12.2024 17:29 — 👍 40    🔁 5    💬 1    📌 1

The naivete of these takes is always amusing

They could equally be applied to human beings, and they would work just as well

04.12.2024 14:12 — 👍 1    🔁 0    💬 0    📌 0

There are indeed cases in which obtaining an SDE equivalence isn't straightforward

04.12.2024 11:10 — 👍 1    🔁 0    💬 0    📌 0

I have always been saying that diffusion = flow matching.

Is it supposed to be some sort of news now??

04.12.2024 10:36 — 👍 4    🔁 0    💬 1    📌 0

However, flow matching theory doesn't provide much guidance on how to do stochastic sampling

Stochastic sampling relies on the extra structure of diffusion
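Concretely (a standard identity, sketched in generic notation): if the probability flow ODE $\dot{x} = v_t(x)$ has marginals $p_t$, then for any noise level $g(t)$ the SDE

\[
dx = \Big[v_t(x) + \frac{g(t)^2}{2}\,\nabla\log p_t(x)\Big]\,dt + g(t)\,dw
\]

has the same marginals, but it needs the score $\nabla\log p_t$, which diffusion training provides and plain flow matching does not.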

03.12.2024 06:33 — 👍 1    🔁 0    💬 1    📌 0

Disagree, religious literacy is important

03.12.2024 06:29 — 👍 0    🔁 0    💬 0    📌 0
Samples y | x from Treeffuser vs. true densities, for multiple values of x under three different scenarios. Treeffuser captures arbitrarily complex conditional distributions that vary with x.

I am very excited to share our new NeurIPS 2024 paper + package, Treeffuser! 🌳 We combine gradient-boosted trees with diffusion models for fast, flexible probabilistic predictions and well-calibrated uncertainty.

paper: arxiv.org/abs/2406.07658
repo: github.com/blei-lab/tre...

🧵 (1/8)
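Usage looks roughly like the sketch below; treat the exact API (class name and fit/sample signatures) as an assumption to check against github.com/blei-lab/treeffuser.

import numpy as np
from treeffuser import Treeffuser  # assumed import path

# Toy regression data with input-dependent noise.
X = np.random.rand(500, 1)
y = np.sin(6 * X[:, 0]) + 0.1 * X[:, 0] * np.random.randn(500)

model = Treeffuser()
model.fit(X, y)
# Sample from y | x to get a full predictive distribution rather than
# a single point estimate.
samples = model.sample(X[:10], n_samples=100)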

02.12.2024 21:48 — 👍 156    🔁 23    💬 4    📌 4
Post image

A common question nowadays: Which is better, diffusion or flow matching? 🤔

Our answer: They're two sides of the same coin. We wrote a blog post to show how diffusion models and Gaussian flow matching are equivalent. That's great: It means you can use them interchangeably.
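One way to state the equivalence (a sketch in the common Gaussian-path notation, not a quote from the post): with data $x_0$ and noise $\varepsilon$, both methods fit the path $x_t = \alpha_t x_0 + \sigma_t \varepsilon$, and the flow-matching velocity and the diffusion score are affinely related,

\[
v_t(x) = \frac{\dot{\alpha}_t}{\alpha_t}\,x + \sigma_t\Big(\frac{\dot{\alpha}_t}{\alpha_t}\,\sigma_t - \dot{\sigma}_t\Big)\nabla\log p_t(x),
\]

so a model trained in either parameterization converts to the other.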

02.12.2024 18:45 — 👍 255    🔁 58    💬 7    📌 7

I'm still cautiously optimistic that we'll find a way to leverage Bayesian ideas in "Modern" AI without retrofitting. However, I'm very much an agnostic when it comes to the philosophy of uncertainty (Bayes vs frequentist vs imprecise etc.)

30.11.2024 08:04 — 👍 13    🔁 1    💬 0    📌 0
