
Carl Allen

@carl-allen.bsky.social

Laplace Junior Chair, Machine Learning ENS Paris. (prev ETH Zurich, Edinburgh, Oxford..) Working on mathematical foundations/probabilistic interpretability of ML (what NNs learn๐Ÿคทโ€โ™‚๏ธ, disentanglement๐Ÿค”, king-man+woman=queen?๐Ÿ‘Œโ€ฆ)

2,149 Followers  |  439 Following  |  44 Posts  |  Joined: 16.11.2024

Latest posts by carl-allen.bsky.social on Bluesky


How do tokens evolve as they are processed by a deep Transformer?

With Josรฉ A. Carrillo, @gabrielpeyre.bsky.social and @pierreablin.bsky.social, we tackle this in our new preprint: A Unified Perspective on the Dynamics of Deep Transformers arxiv.org/abs/2501.18322

ML and PDE lovers, check it out!

31.01.2025 16:56 โ€” ๐Ÿ‘ 96    ๐Ÿ” 16    ๐Ÿ’ฌ 2    ๐Ÿ“Œ 0

Softmax is also the exact formula for a label distribution p(y|x) under Bayes rule if class distributions p(x|y) have exponential family form (equivariant if Gaussian), so it can have a deeper rationale in a probabilistic model of the data (than a one-hot relaxation).

17.01.2025 09:57 โ€” ๐Ÿ‘ 5    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0
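The claim above can be checked numerically. A minimal sketch, assuming Gaussian class-conditionals p(x|y) with shared isotropic covariance (means, prior, and noise scale below are arbitrary illustrative values): the Bayes posterior computed directly from the class densities coincides with the softmax of linear logits.

```python
import numpy as np

def gaussian_logpdf(x, mu, sigma2):
    # log density of N(mu, sigma2 * I) at x
    d = len(x)
    return -0.5 * np.sum((x - mu) ** 2) / sigma2 - 0.5 * d * np.log(2 * np.pi * sigma2)

rng = np.random.default_rng(0)
mus = rng.normal(size=(3, 2))          # illustrative class means
priors = np.array([0.5, 0.3, 0.2])     # illustrative class priors p(y)
sigma2 = 0.7
x = rng.normal(size=2)

# Bayes rule directly: p(y|x) proportional to p(x|y) p(y)
log_joint = np.array([gaussian_logpdf(x, mu, sigma2) for mu in mus]) + np.log(priors)
posterior = np.exp(log_joint - log_joint.max())
posterior /= posterior.sum()

# Softmax of linear logits: w_y = mu_y / sigma2, b_y = -||mu_y||^2 / (2 sigma2) + log p(y)
# (the class-independent -||x||^2 / (2 sigma2) term cancels inside the softmax)
logits = mus @ x / sigma2 - np.sum(mus ** 2, axis=1) / (2 * sigma2) + np.log(priors)
softmax = np.exp(logits - logits.max())
softmax /= softmax.sum()

assert np.allclose(posterior, softmax)
```

The same cancellation argument goes through for any exponential-family p(x|y) with shared dispersion; the Gaussian case just makes the logits linear in x.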

Sorry, more a question re the OP. Just looking to understand the context.

29.12.2024 04:38 โ€” ๐Ÿ‘ 2    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

Can you give some examples of the kind of papers youโ€™re referring to?

29.12.2024 00:44 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0

And of course this all builds on the seminal work of @wellingmax.bsky.social, @dpkingma.bsky.social, Irina Higgins, Chris Burgess et al.

19.12.2024 15:03 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

sorry, @benmpoole.bsky.social (fat fingers..)

18.12.2024 17:07 โ€” ๐Ÿ‘ 0    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

Any constructive feedback, discussion or future collaboration more than welcome!

Full paper: arxiv.org/pdf/2410.22559

18.12.2024 16:57 โ€” ๐Ÿ‘ 2    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0

Building on this, we clarify the connection between diagonal covariance and Jacobian orthogonality and explain how disentanglement follows, ultimately defining disentanglement as factorising the data distribution into statistically independent components.

18.12.2024 16:57 โ€” ๐Ÿ‘ 0    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0

We focus on VAEs, used as building blocks of SOTA diffusion models. Recent works by Rolinek et al. and Kumar & @benmpoole.bsky.social suggest that disentanglement arises because diagonal posterior covariance matrices promote column-orthogonality in the decoderโ€™s Jacobian matrix.

18.12.2024 16:57 โ€” ๐Ÿ‘ 0    ๐Ÿ” 0    ๐Ÿ’ฌ 2    ๐Ÿ“Œ 0
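The diagonal-covariance / column-orthogonality link is easiest to see in the linear Gaussian case, where the decoder Jacobian is constant. A minimal sketch (the decoder W and noise scale are hypothetical illustrations, not from the paper): for x = Wz + noise with z ~ N(0, I), the exact Gaussian posterior covariance is (I + WแตW/ฯƒยฒ)โปยน, which is diagonal exactly when the columns of W are orthogonal.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 0.1

# Hypothetical linear "decoder": the Jacobian is W everywhere.
# QR gives orthonormal columns; rescaling keeps them orthogonal.
Q, _ = np.linalg.qr(rng.normal(size=(5, 2)))
W = Q * np.array([2.0, 0.5])

# Exact posterior covariance for x = Wz + eps, z ~ N(0, I), eps ~ N(0, sigma2 * I):
#   Sigma = (I + W^T W / sigma2)^{-1}
Sigma = np.linalg.inv(np.eye(2) + W.T @ W / sigma2)

# Column-orthogonality of W makes W^T W, and hence Sigma, diagonal
off_diag = Sigma - np.diag(np.diag(Sigma))
assert np.allclose(off_diag, 0)
```

With nonlinear decoders the Jacobian varies with z, but the same local algebra is what the works cited above exploit.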

While disentanglement is often linked to different models whose popularity may ebb & flow, we show that the phenomenon itself relates to the dataโ€™s latent structure and is more fundamental than any model that may expose it.

18.12.2024 16:57 โ€” ๐Ÿ‘ 0    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0

Machine learning has made incredible breakthroughs, but our theoretical understanding lags behind.

We take a step towards unravelling its mystery by explaining why the phenomenon of disentanglement arises in generative latent variable models.

Blog post: carl-allen.github.io/theory/2024/...

18.12.2024 16:57 โ€” ๐Ÿ‘ 18    ๐Ÿ” 4    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 1

Maybe give it time. Rome, a day, etc..

18.12.2024 10:33 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

Yup sure, the curve has to kick in at some point. I guess โ€œlawโ€ sounds cooler than linear-ish graph. Maybe it started out as an acronym โ€œLinear for A Whileโ€.. ๐Ÿคทโ€โ™‚๏ธ

15.12.2024 13:57 โ€” ๐Ÿ‘ 5    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 1

I guess as complexity increases math->phys->chem->bio->โ€ฆ Itโ€™s inevitable that โ€œtheory-drivenโ€ tends to โ€œtheory-inspiredโ€. ML seems a bit tangential tho since experimenting is relatively consequence free and you donโ€™t need to deeply theorise, more iterate. So theory is deprioritised and lags for now

15.12.2024 08:16 โ€” ๐Ÿ‘ 3    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0

But doesnโ€™t theory follow empirics in all of science.. until it doesnโ€™t? Except that in most sciences you canโ€™t endlessly experiment for cost/risk/melting your face off reasons. But ML keeps going, making it a tricky moving/expanding target to try to explain/get ahead of.. I think itโ€™ll happen tho.

14.12.2024 18:47 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 2    ๐Ÿ“Œ 0

The last KL is nice as itโ€™s clear that the objective is optimised when the model and posteriors match as well as possible. The earlier KL is nice as it contains the data distribution and all explicitly modelled distributions, so maximising ELBO can be seen intuitively as bringing them all โ€œin lineโ€.

05.12.2024 15:41 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

I think an intuitive view is that:
- max likelihood minimises
KL[p(x)||pโ€™(x)] (pโ€™(x)=model)

- max ELBO minimises
KL[p(x)q(z|x) || pโ€™(x|z)pโ€™(z)]
So brings together 2 models of the joint. (where pโ€™(x) = \int pโ€™(x|z)pโ€™(z) dz)

Can rearrange in diff ways, eg as
KL[p(x)q(z|x) || pโ€™(x)pโ€™(z|x)]
(or as in VAE)

05.12.2024 15:36 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0
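These KL identities can be verified exactly on a small discrete model. A minimal sketch, with arbitrary randomly-generated distributions (all names below are illustrative): the joint KL[p(x)q(z|x) || pโ€™(x|z)pโ€™(z)] equals the expected negative ELBO up to the constant entropy of p(x), and rearranging the same joint as pโ€™(x)pโ€™(z|x) splits it into KL[p(x)||pโ€™(x)] plus an expected posterior KL.

```python
import numpy as np

rng = np.random.default_rng(2)

def norm(a, axis=None):
    # normalise along an axis so rows (or the whole array) sum to 1
    return a / a.sum(axis=axis, keepdims=axis is not None)

def kl(a, b):
    return np.sum(a * np.log(a / b))

# Illustrative discrete model: z in {0, 1}, x in {0, 1, 2}
pz   = norm(rng.random(2))           # model prior p'(z)
px_z = norm(rng.random((2, 3)), 1)   # model likelihood p'(x|z)
p    = norm(rng.random(3))           # data distribution p(x)
q    = norm(rng.random((3, 2)), 1)   # variational posterior q(z|x)

joint_model = pz[:, None] * px_z     # p'(z, x), shape (2, 3)
px = joint_model.sum(0)              # model marginal p'(x)
pz_x = (joint_model / px).T          # model posterior p'(z|x), shape (3, 2)

# Joint KL between p(x)q(z|x) and p'(x|z)p'(z) (same joint as p'(x)p'(z|x))
qa = p[:, None] * q                  # p(x)q(z|x), shape (3, 2)
kl_joint = kl(qa, joint_model.T)

# Per-x ELBO: E_q[log p'(x, z) - log q(z|x)]
elbo = np.sum(q * (np.log(joint_model.T) - np.log(q)), axis=1)

# Maximising E_p[ELBO] minimises the joint KL (their sum is -H(p), a constant)
assert np.isclose(kl_joint, np.sum(p * np.log(p)) - p @ elbo)

# Rearranged: KL[p||p'] + E_p KL[q(z|x) || p'(z|x)]
kl_rearranged = kl(p, px) + sum(p[i] * kl(q[i], pz_x[i]) for i in range(3))
assert np.isclose(kl_joint, kl_rearranged)
```

The two assertions are exactly the "bring everything in line" view from the post: one joint KL, factorised two ways.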

Ha me too, exactly that..

03.12.2024 22:36 โ€” ๐Ÿ‘ 0    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0
Variational classification: A probabilistic generalization of the softmax classifier. SZ Dhuliawala, M Sachan, C Allen. Transactions on Machine Learning Research, 2024. Cited by 10.

(and here it comes.. ;) ). The latter view of classification is the motivation behind this work: scholar.google.co.uk/citations?vi...

02.12.2024 08:29 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

In the binary case, both look the same: sigmoid might be a good model of how y becomes more likely (in future) as x increases. But sigmoid is also 2-case softmax so models Bayes rule for 2 classes of (exp-fam) x|y. The causality between x and y is very different, which "p(y|x)" doesn't capture.

02.12.2024 08:26 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0
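The "sigmoid is 2-case softmax" point is a one-line identity: the first softmax probability over two logits depends only on their difference, and equals the sigmoid of that difference. A quick check with arbitrary illustrative logits:

```python
import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

a, b = 1.3, -0.4   # arbitrary logits for the two classes

# softmax([a, b])[0] = e^a / (e^a + e^b) = 1 / (1 + e^{-(a-b)}) = sigmoid(a - b)
softmax0 = np.exp(a) / (np.exp(a) + np.exp(b))
assert np.isclose(softmax0, sigmoid(a - b))
```

So a binary classifier with sigmoid output is the K = 2 case of the Bayes-rule reading of softmax, with the logit playing the role of the log-odds a - b.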

I think this comes down to the model behind p(x,y). If features of x cause y, e.g. aspects of a website (x) -> clicks (y); age/health -> disease, then p(y|x) is a (regression) fn of x. But if x|y is a distrib'n of different y's (e.g. cats) then p(y|x) is given by Bayes rule (squint at softmax).

02.12.2024 08:20 โ€” ๐Ÿ‘ 7    ๐Ÿ” 1    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0

Pls add me thanks!

29.11.2024 15:53 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

If few-shot transfer is ur thing!

28.11.2024 17:07 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

Could you pls add me? Thanks!

26.11.2024 07:13 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0

Yep, could maybe work. The accepted-to-RR bar would need to be high to maintain value, but โ€œshininessโ€ test cld be deferred. Think thereโ€™s still a separate issue of โ€œhighly irresponsibleโ€ reviews that needs addressing either way (as at #CVPR2025). We canโ€™t just whinge & do absolutely nothing!

24.11.2024 23:00 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

Definitely something to be said for RR, as main confs are effectively a lumpy version. But if acceptance to main confs is still the metric for recruiters etc, RR acceptance may not mean so much and the issue of the subjective criteria for what gets accepted to confs remainsโ€ฆ?

24.11.2024 20:40 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0

Youโ€™d think that just the threat of it (& the occasional AC pointed reminder) would be enough to rarely, if ever, need to enforce it. If so, how much itโ€™s used wouldnโ€™t reflect success.
Youโ€™d maybe need a survey of review quality from authors/ACsโ€ฆ or an analysis of the anger on here! ๐Ÿคฌ๐Ÿคฃ

24.11.2024 20:11 โ€” ๐Ÿ‘ 2    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

go.bsky.app/PFpnqeM

23.11.2024 11:08 โ€” ๐Ÿ‘ 34    ๐Ÿ” 17    ๐Ÿ’ฌ 7    ๐Ÿ“Œ 0

Some conferences already give free entry to top reviewers. But this prob just rewards those who would anyway give good reviews.

23.11.2024 21:01 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0
CVPR 2025 Changes

This is happening at #CVPR2025:
โ€œIf a reviewer is flagged by an AC as โ€œhighly irresponsibleโ€, their paper submissions will be desk rejected per discretion of the PCsโ€.

Canโ€™t see why all confs donโ€™t do this, especially if making all authors review.

(pt 2) cvpr.thecvf.com/Conferences/...

23.11.2024 20:56 โ€” ๐Ÿ‘ 15    ๐Ÿ” 1    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 1
