Quentin Berthet

@qberthet.bsky.social

Machine learning, Google DeepMind, Paris

904 Followers  |  128 Following  |  31 Posts  |  Joined: 22.11.2024

Latest posts by qberthet.bsky.social on Bluesky

Being a rejected paper seems like a good criterion (possible issues, but very easy to implement).

20.09.2025 10:26 — 👍 0    🔁 0    💬 0    📌 0

I'm surprised, there must have been an additional round of changes.

All the emails I saw as an SAC were "help your ACs write new meta-reviews".

20.09.2025 10:25 — 👍 1    🔁 0    💬 0    📌 0

I think that they are hard to bootstrap.

While I agree that general chairs, program chairs, local chairs, etc. do a lot of work (I sometimes see a glimpse of that myself with ALT/AISTATS), once you start having a bit of money, it can get easier by using conference organising services.

19.09.2025 13:34 — 👍 1    🔁 0    💬 0    📌 0

How doable would it be to organize a conference with its deadline a week after NeurIPS's, held a week before it, that would always take place physically in Europe (but welcome contributions from anywhere)?

19.09.2025 12:52 — 👍 0    🔁 0    💬 1    📌 0

I'm very happy that EurIPS is happening, and planning to attend.

But as a NeurIPS SAC I must say that the last definition should not exist: ACs are supposed to change the decisions and update their meta-reviews themselves.

19.09.2025 12:50 — 👍 1    🔁 0    💬 2    📌 0
Call for Workshops

🚨 AISTATS workshop proposals open!

Big News: For the first time, there will be a day of workshops at #AISTATS 2026, in Tangier, Morocco 🌴🇲🇦

Quentin Berthet @qberthet.bsky.social and I are workshop chairs.

virtual.aistats.org/Conferences/...
Deadline: Oct 17, AOE

29.08.2025 13:59 — 👍 28    🔁 12    💬 0    📌 0

At #AISTATS2025, I will be giving an "Oral" presentation of our work on "Implicit Diffusion"

arxiv.org/abs/2402.05468

17.04.2025 12:50 — 👍 8    🔁 0    💬 1    📌 0

I will be attending #ICLR2025 in Singapore and #AISTATS2025 in Mai Khao over the next two weeks.

Looking forward to meeting new people and learning about new things. Feel free to reach out if you want to talk about Google DeepMind.

17.04.2025 12:50 — 👍 18    🔁 4    💬 3    📌 0

ICML reviews are out

25.03.2025 09:09 — 👍 3    🔁 0    💬 0    📌 0

Very proud to have contributed to this work, and very happy to have learnt a lot about votes, choices, and agents!

24.02.2025 22:37 — 👍 7    🔁 0    💬 0    📌 0

Are there more than 200 notes?

22.02.2025 20:37 — 👍 0    🔁 0    💬 1    📌 0

🇨🇦

21.02.2025 10:08 — 👍 3    🔁 0    💬 0    📌 0

📣 New preprint 📣

Learning Theory for Kernel Bilevel Optimization

w/ @fareselkhoury.bsky.social E. Pauwels @michael-arbel.bsky.social

We provide generalization error bounds for bilevel optimization problems where the inner objective is minimized over an RKHS.

arxiv.org/abs/2502.08457

20.02.2025 13:55 — 👍 20    🔁 6    💬 1    📌 2
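
To make this setting concrete, here is a hedged numpy sketch of one instance of such a problem: kernel ridge regression as the closed-form inner problem over an RKHS, with a validation loss as the outer objective. The RBF kernel and the choice of the regularization strength as the outer variable are illustrative assumptions, not the paper's setting.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian RBF kernel matrix between rows of A and rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def inner_solution(X_tr, y_tr, lam):
    # Inner problem: kernel ridge regression over the RKHS, in closed form:
    #   f = argmin_f  sum_i (f(x_i) - y_i)^2 + lam * ||f||^2_RKHS
    # with representer coefficients alpha = (K + lam * I)^{-1} y.
    K = rbf_kernel(X_tr, X_tr)
    return np.linalg.solve(K + lam * np.eye(len(X_tr)), y_tr)

def outer_objective(lam, X_tr, y_tr, X_val, y_val):
    # Outer objective: validation loss of the inner minimizer, as a function
    # of the outer variable (here lam, an illustrative choice).
    alpha = inner_solution(X_tr, y_tr, lam)
    preds = rbf_kernel(X_val, X_tr) @ alpha
    return np.mean((preds - y_val) ** 2)

rng = np.random.default_rng(0)
X_tr, X_val = rng.normal(size=(50, 2)), rng.normal(size=(20, 2))
y_tr = np.sin(X_tr[:, 0]) + 0.1 * rng.normal(size=50)
y_val = np.sin(X_val[:, 0])
print(outer_objective(0.1, X_tr, y_tr, X_val, y_val))
```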

These findings are quite consistent

Our end-to-end method captures a regression and a classification objective, as well as the autoencoder loss.

We see it as "building a bridge" between these different problems.

8/8

10.02.2025 12:00 — 👍 3    🔁 0    💬 0    📌 0

This is a simple generalization of previous binning approaches, the main difference being that we learn the encoding.

We compare different training methods, showing up to a 25% improvement over the least-squares baseline error for our full end-to-end method, across 8 datasets.

7/8

10.02.2025 12:00 — 👍 2    🔁 0    💬 1    📌 0

You can then train a classification model π_θ (green, top) on the encoded targets, with a KL/cross-entropy objective!

At inference time, you use the same decoder μ to perform your prediction!

6/8

10.02.2025 12:00 — 👍 1    🔁 0    💬 1    📌 0
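
A short sketch (Python/numpy) of this training signal and of the inference step; `q` stands for the encoded targets Ψ_w(y), `p` for the classifier outputs π_θ(x), and the expectation-style decoder is an assumption carried over for illustration.

```python
import numpy as np

def cross_entropy(q, p, eps=1e-12):
    # q: (n, k) encoded (soft) targets Psi_w(y); p: (n, k) outputs pi_theta(x).
    # Minimizing this in p is the same as minimizing KL(q || p), up to the
    # entropy of q, which does not depend on the model.
    return -np.mean(np.sum(q * np.log(p + eps), axis=1))

def predict(p, centers):
    # Inference: decode the predicted class distribution with the same
    # decoder mu, here assumed to be an expectation over k centers.
    return p @ centers
```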

First, the architecture: we use a general target encoder Ψ_w (red, bottom) that transforms the target y into a distribution over k classes,

e.g., using a softmax of distances to k different centers.

The encoder and the associated decoder μ (in blue) can be trained on an autoencoder loss.

5/8

10.02.2025 12:00 — 👍 2    🔁 0    💬 1    📌 0
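
A hedged numpy sketch of a target encoder of this softmax-of-distances form, with a matching decoder μ and the autoencoder loss; the temperature, the expectation-style decoder, and all names are assumptions for illustration, not the paper's code.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def encode(y, w, tau=0.1):
    # Target encoder Psi_w: softmax of negative squared distances from the
    # scalar targets y (shape (n,)) to k learnable centers w (shape (k,)).
    d2 = (y[:, None] - w[None, :]) ** 2
    return softmax(-d2 / tau)              # (n, k) distributions over classes

def decode(p, w):
    # Decoder mu: expectation of the centers under the class distribution
    # (an illustrative choice of decoder).
    return p @ w

def autoencoder_loss(y, w, tau=0.1):
    # Reconstruction loss used to fit the encoder/decoder on targets alone.
    y_rec = decode(encode(y, w, tau), w)
    return np.mean((y - y_rec) ** 2)

y = np.random.rand(256)
w = np.linspace(0.0, 1.0, 8)               # k = 8 centers on an initial grid
print(autoencoder_loss(y, w))              # gradient steps on w would go here
```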

In this work, we propose improvements for "regression as classification"

- Soft-binning: encode the target as a probability, not just a one-hot.
- Learnt target encoders: Instead of designing this transformation by hand, learn it from data.
- Train everything jointly!

4/8

10.02.2025 12:00 — 👍 2    🔁 0    💬 1    📌 0

The idea is to transform each value of the target into a class one-hot, and to train a classification model to predict that class (here with y in 2D, binned on a grid).

It seems strange, but it's been shown to work well in many settings, even for RL applications.

3/8

10.02.2025 12:00 — 👍 1    🔁 0    💬 1    📌 0
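
A minimal sketch of this classic hard-binning recipe in Python/numpy; the uniform grid, the names, and the decode-by-bin-center step are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

# Hard "regression as classification": bin the scalar target into k classes,
# train any k-class classifier on the one-hot labels, and decode a predicted
# class distribution back to a scalar.
k = 10
y = np.random.rand(1000)                   # continuous targets in [0, 1]
edges = np.linspace(0.0, 1.0, k + 1)       # uniform grid over the target range
centers = 0.5 * (edges[:-1] + edges[1:])   # one representative value per bin

labels = np.clip(np.digitize(y, edges) - 1, 0, k - 1)  # bin index per target
one_hot = np.eye(k)[labels]                # one-hot encoding of each target

# ... train a k-class classifier on (x, one_hot) with cross-entropy ...

# At inference, a predicted class distribution p is decoded to a scalar,
# e.g. as the expectation of the bin centers under p:
p = np.full(k, 1.0 / k)                    # placeholder model output
y_hat = float(p @ centers)
```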

An intriguing observation:

In many tasks with a continuous target (price, rating, pitch...), instead of training on a regression objective with least-squares [which seems super natural!], people have instead been training their models using classification!

2/8

10.02.2025 12:00 — 👍 1    🔁 0    💬 1    📌 0
Preview: Building Bridges between Regression, Clustering, and Classification
Regression, the task of predicting a continuous scalar target y based on some features x, is one of the most fundamental tasks in machine learning and statistics. It has been observed and...

Check out our paper, with Lawrence Stewart and @bachfrancis.bsky.social

Link: arxiv.org/abs/2502.02996

1/8

10.02.2025 12:00 — 👍 9    🔁 2    💬 1    📌 0

🚨 New paper on regression and classification!

Adding to the discussion on using least-squares or cross-entropy, regression or classification formulations of supervised problems!

A thread on how to bridge these problems:

10.02.2025 12:00 — 👍 49    🔁 8    💬 4    📌 0
Brian Driscoll, the acting director of the FBI, sporting curly hair, a kinda rakish jazz dirtbag moustache and beard combo, a faintly amused look in his eyes and a wry "whaddaya gonna do" smirk

I've got to say, the guy who accidentally became the Director of the FBI does 100% look like the guy who accidentally becomes the Director of the FBI in a mid-2000s comedy about a guy who accidentally becomes the Director of the FBI

05.02.2025 22:12 — 👍 20928    🔁 2723    💬 570    📌 398

🚀 Policy gradient methods like DeepSeek's GRPO are great for finetuning LLMs via RLHF.

But what happens when we swap autoregressive generation for discrete diffusion, a rising architecture promising faster & more controllable LLMs?

Introducing SEPO!

📑 arxiv.org/pdf/2502.01384

🧵👇

04.02.2025 15:42 — 👍 5    🔁 2    💬 1    📌 0
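
For context, the group-relative advantage at the core of GRPO-style policy gradients can be sketched in a few lines (no clipping or KL penalty shown); this is the generic ingredient, not SEPO's estimator for discrete diffusion, which is in the paper linked above.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    # GRPO-style baseline: sample G completions for the same prompt, score
    # them, and normalize rewards within the group (zero mean, unit scale).
    # rewards: shape (G,) -> advantages: shape (G,)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

rewards = np.array([1.0, 0.2, 0.5, 0.9])   # illustrative scalar rewards
adv = group_relative_advantages(rewards)
# A policy-gradient step then weights each sample's log-likelihood:
#   loss = -np.mean(adv * logprobs)   # logprobs from the policy being tuned
```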

Congratulations!

04.02.2025 13:39 — 👍 2    🔁 0    💬 0    📌 0

The only "nice" thing is that if you have a lot of centered equations that are not very wide, it saves space. This is very visible when you switch templates, e.g. from ICML to NeurIPS.

For everything else, it's just horrible. No surprise that "unsubmitted preprints" in ML are single column.

01.02.2025 11:03 — 👍 3    🔁 0    💬 1    📌 0
Theorem. If $n$ is an integer and $n^2$ is even, then $n$ is itself even.

Proof. Contrapositives are for cowards, so assume $n$ is an integer and $n^2$ is even. Then $n^2=2k$ for some integer $k$, and thus $n^2-2k=0$. Behold:
\[ n = n + (n^2-2k) = n(n+1)-2k. \]
Both $n(n+1)$ and $2k$ are even, so $n$ is even.
QED.


Contrapositives are for cowards. Behold.

22.01.2025 17:58 — 👍 134    🔁 38    💬 6    📌 9

#AISTATS results are out

22.01.2025 10:13 — 👍 9    🔁 2    💬 0    📌 1

Also, researchers who don't have a Fields medal might feel more pressure to travel to advertise their work and network with others.

02.01.2025 11:54 — 👍 1    🔁 0    💬 1    📌 0
