Nicolas Dufour

@nicolasdufour.bsky.social

PhD student at IMAGINE (ENPC) and GeoVic (Ecole Polytechnique). Working on image generation. http://nicolas-dufour.github.io

427 Followers  |  426 Following  |  59 Posts  |  Joined: 19.11.2024

Latest posts by nicolasdufour.bsky.social on Bluesky

Yes, it's in latent space just because I had my setup that way. Might try pixel space in the future.

18.11.2025 14:20 — 👍 0    🔁 0    💬 0    📌 0

Yes, it's the raw prediction; we predict the velocity directly.

18.11.2025 14:06 — 👍 1    🔁 0    💬 1    📌 0

It's also very domain-dependent. For example, I know that x-prediction works better than epsilon prediction for human motion generation.

18.11.2025 13:49 — 👍 2    🔁 0    💬 1    📌 0

The epsilon loss was used for image generation for a long time, starting with DDPM.
More recently, flow matching (or the v-loss) has become the standard, basically since SD3.
In my experience, flow matching doesn't really improve quality, but sampling in fewer steps works better than with epsilon prediction.

18.11.2025 13:47 — 👍 4    🔁 0    💬 1    📌 0
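
A minimal sketch of the two objectives discussed in this thread, assuming the linear interpolation path of rectified flow / flow matching (DDPM itself uses a variance-preserving schedule); the names are illustrative, not from any specific codebase:

```python
import torch

def diffusion_targets(x0):
    """Epsilon-prediction vs. velocity-prediction targets, assuming the
    linear interpolation path x_t = (1 - t) * x0 + t * eps used in
    rectified flow / flow matching."""
    t = torch.rand(x0.shape[0], device=x0.device)   # one timestep per sample
    eps = torch.randn_like(x0)                      # Gaussian noise
    t_ = t.view(-1, *([1] * (x0.dim() - 1)))        # broadcast t over image dims
    x_t = (1 - t_) * x0 + t_ * eps                  # noisy interpolant
    target_eps = eps                                # epsilon-pred: regress the noise
    target_v = eps - x0                             # v / flow target: d x_t / d t
    return x_t, t, target_eps, target_v

# Either way the loss is a plain MSE against the network's raw output,
# e.g. torch.nn.functional.mse_loss(model(x_t, t), target_v).
```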
Preview
Don't drop your samples! Coherence-aware training benefits conditional diffusion. Conditional diffusion models are powerful generative models that can leverage various types of conditional information, such as class labels, segmentation masks, or text captions. However, in many rea...

Thanks for the pointer! We were doing something similar in "Don't drop your samples" (arxiv.org/abs/2405.20324)

MIRO is quite different in the sense that we focus on improving pretraining (not finetuning). Also, we explore the advantages of having multiple rewards to push the Pareto frontier.

03.11.2025 13:20 — 👍 3    🔁 0    💬 1    📌 0

Yes, thanks for pointing it out, will try to clarify

03.11.2025 13:15 — 👍 2    🔁 0    💬 0    📌 0

Check out our new work: MIRO

No more post-training alignment!
We integrate human alignment right from the start, during pretraining!

Results:
✨ 19x faster convergence ⚡
✨ 370x less compute 💻

🔗 Explore the project: nicolas-dufour.github.io/miro/

31.10.2025 21:10 — 👍 9    🔁 3    💬 0    📌 0

Image generation becomes much more energy efficient. 👍

31.10.2025 20:28 — 👍 5    🔁 2    💬 0    📌 0

I'm super happy about Nicolas' latest work, probably the magnum opus of his PhD.

Read the thread for all the great details.
The main conclusion I draw from this work is that better pretraining, in particular by conditioning on better data, allows us to train SOTA models at a fraction of the cost.

31.10.2025 11:39 — 👍 29    🔁 4    💬 0    📌 0

Work with @lucasdegeorge.bsky.social @arrijitghosh.bsky.social @vickykalogeiton.bsky.social and @davidpicard.bsky.social.

This will be the last work of my PhD, as I will be defending on the 26th of November!

31.10.2025 11:24 — 👍 13    🔁 0    💬 0    📌 0
Preview
MIRO: Multi-Reward Conditioning for Efficient Text-to-Image Generation. Train once, align many rewards. MIRO achieves 19× faster convergence and 370× less compute than FLUX while reaching a GenEval score of 75. Controllable trade-offs at inference time.

MIRO demonstrates that aligning T2I models during pretraining is not only viable but superior: it's faster, more compute-efficient, and provides fine-grained, interpretable control.

Project page for all the details: nicolas-dufour.github.io/miro

31.10.2025 11:24 — 👍 8    🔁 0    💬 1    📌 0
Post image Post image

The explicit reward conditioning allows for flexible trade-offs, like optimizing for GenEval by reducing the aesthetic weight in the prompt. We can also isolate the look of a specific reward, or interpolate between rewards via multi-reward classifier-free guidance.

31.10.2025 11:24 — 👍 7    🔁 0    💬 1    📌 0
Post image

MIRO excels on challenging compositional tasks (GenEval here).

The multi-reward conditioning fosters better understanding of complex spatial relationships and object interactions.

31.10.2025 11:24 — 👍 7    🔁 0    💬 1    📌 0
Post image

Despite being a compact model (0.36B parameters), MIRO achieves state-of-the-art results:

A GenEval score of 75, outperforming the 12B FLUX-dev (67) at 370x less inference cost.
Conditioning on rich reward signals is a highly effective way to get large-model capabilities in a compact form!

31.10.2025 11:24 — 👍 9    🔁 0    💬 1    📌 0
Post image

MIRO dramatically improves sample efficiency for test-time scaling.

On PickScore, MIRO needs just 4 samples to match the baseline's 128 samples (a 32x efficiency gain).
For ImageReward, it's a 16x efficiency gain.

This demonstrates superior inference-time efficiency for high-quality generation.

31.10.2025 11:24 — 👍 8    🔁 0    💬 1    📌 0
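
A rough sketch of the best-of-N test-time scaling protocol described above; `generate` and `score` are hypothetical stand-ins for a diffusion sampler and a reward model such as PickScore, not MIRO's API:

```python
import torch

def best_of_n(generate, score, prompt, n=4):
    """Best-of-N test-time scaling: sample n images and keep the one the
    reward model rates highest. `generate(prompt)` and
    `score(prompt, image)` are assumed interfaces for illustration."""
    images = [generate(prompt) for _ in range(n)]
    scores = torch.tensor([score(prompt, img) for img in images])
    return images[scores.argmax().item()]

# The post's claim: the reward-conditioned model at n=4 matches the
# baseline at n=128 on PickScore, a 32x sampling-efficiency gain.
```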
Post image

Traditional single-objective optimization often leads to reward hacking. MIRO's multi-dimensional conditioning naturally prevents this by requiring the model to balance multiple objectives simultaneously. This produces balanced, robust performance across all metrics, in contrast to single-reward optimization.

31.10.2025 11:24 — 👍 10    🔁 0    💬 1    📌 0
Post image Post image

The multi-reward conditioning provides a dense supervisory signal, accelerating convergence dramatically. A snapshot of the speed-up:

AestheticScore: 19.1x faster to reach baseline quality.
HPSv2: 6.2x faster.

You can clearly see the improvements visually.

31.10.2025 11:24 — 👍 10    🔁 0    💬 1    📌 1
Video thumbnail

This reward vector s becomes an explicit, interpretable control input at inference time. We extend classifier-free guidance to the multi-reward setting, allowing users to steer generation toward jointly high-reward regions by defining positive (s^+) and negative (s^-) targets.

31.10.2025 11:24 — 👍 9    🔁 0    💬 1    📌 0
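
A minimal sketch of what this multi-reward guidance could look like, assuming a velocity-predicting network that accepts the reward vector as an extra input; the interface is an assumption, not MIRO's actual implementation:

```python
def multi_reward_cfg(model, x_t, t, cond, s_pos, s_neg, w=5.0):
    """Classifier-free guidance extended to reward conditioning: steer
    the prediction toward the high-reward target s_pos and away from
    the low-reward target s_neg. `model(...)` is a hypothetical
    reward-conditioned velocity predictor."""
    v_pos = model(x_t, t, cond, rewards=s_pos)  # e.g. all rewards set high
    v_neg = model(x_t, t, cond, rewards=s_neg)  # e.g. all rewards set low
    return v_neg + w * (v_pos - v_neg)          # guided velocity

# Setting individual entries of s_pos high or low is what enables the
# per-reward trade-offs (e.g. GenEval vs. aesthetics) shown earlier.
```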
Video thumbnail

MIRO trains p(x∣c,s) by conditioning the generative model on a vector s of reward scores for each image-text pair. Instead of correcting a pre-trained model, we teach it how to trade off multiple rewards from the start.

31.10.2025 11:24 — 👍 10    🔁 0    💬 1    📌 0
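
A sketch of how reward-conditioned pretraining of p(x∣c,s) could be wired up, combining the frozen reward scoring described here with the flow-matching objective from earlier in the feed; all names and interfaces are assumptions, not MIRO's actual code:

```python
import torch
import torch.nn.functional as F

def training_step(model, reward_models, opt, x0, caption_emb):
    """One reward-conditioned pretraining step: score each image-text
    pair with several frozen reward models, then condition the
    generator on the resulting score vector s."""
    with torch.no_grad():
        # s holds one scalar per reward (aesthetics, HPSv2, PickScore, ...)
        s = torch.stack([r(x0, caption_emb) for r in reward_models], dim=-1)
    t = torch.rand(x0.shape[0], device=x0.device)
    eps = torch.randn_like(x0)
    t_ = t.view(-1, 1, 1, 1)                    # assumes 4D image batches
    x_t = (1 - t_) * x0 + t_ * eps              # flow-matching interpolant
    v = model(x_t, t, caption_emb, rewards=s)   # velocity prediction
    loss = F.mse_loss(v, eps - x0)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```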
Post image

We introduce MIRO: a new paradigm for T2I model alignment that integrates reward conditioning into pretraining, eliminating the need for separate fine-tuning/RL stages. This single-stage approach offers unprecedented efficiency and control.

- 19x faster convergence ⚡
- 370x fewer FLOPs than FLUX-dev 📉

31.10.2025 11:24 — 👍 60    🔁 14    💬 3    📌 5
Post image Post image

Kickstarting our workshop on Flow Matching and Diffusion with a talk by Eric Vanden-Eijnden on how to optimize learning and sampling in Stochastic Interpolants!

Broadcast available at gdr-iasis.cnrs.fr/reunions/mod...

24.10.2025 08:30 — 👍 15    🔁 5    💬 1    📌 0

Final note: I'm (we're) tempted to organize a challenge on that topic as a workshop at a CV conference. ImageNet would be the only source of images allowed, and then you compete to get the bold numbers.

Do you think there would be people up for that? Do you think it would make for a nice competition?

08.10.2025 20:40 — 👍 8    🔁 4    💬 2    📌 0

Very proud of our recent work, kudos to the team! Read @davidpicard.bsky.social’s excellent post for more details or the paper arxiv.org/pdf/2502.21318

08.10.2025 21:19 — 👍 17    🔁 6    💬 0    📌 0
Post image Post image Post image Post image

Today is Antoine Guedon's PhD defense! Already pretty cool visuals right at the start.

25.09.2025 15:16 — 👍 24    🔁 4    💬 1    📌 0

Annnnnd it's a reject!

Scale is a religion and if you go against it, you're a heretic and you should burn, "despite [the reviewers] final ratings".

But scale is still not necessary!

Side note: first time that swinging the reviews up (from 2,2,4,4 to 2,4,4,5) hasn't gotten the paper accepted. Strange days.

18.09.2025 17:04 — 👍 18    🔁 3    💬 4    📌 0
Post image

Does a smaller latent space lead to worse generation in latent diffusion models? Not necessarily! We show that LDMs are extremely robust to a wide range of compression rates (10-1000x) in the context of physics emulation.

We got lost in latent space. Join us 👇

03.09.2025 13:40 — 👍 27    🔁 8    💬 1    📌 1
Post image

Next week, I'll be in Strasbourg for GRETSI (@gretsi-info.bsky.social) to present a small discovery on transformer generalization that we made with Simon and Jérémie while working on generative recommender systems. I love these "phase transition" plots.

📜: arxiv.org/abs/2508.03934

Short summary 👇

23.08.2025 10:12 — 👍 14    🔁 5    💬 2    📌 2
Post image

Makes me think of StyleGAN3 visualizations

18.08.2025 22:44 — 👍 2    🔁 0    💬 1    📌 0

Plonk project page: nicolas-dufour.github.io/plonk

@vickykalogeiton.bsky.social, @davidpicard.bsky.social and @loicland.bsky.social

18.08.2025 15:46 — 👍 4    🔁 0    💬 0    📌 0

Congrats to the DINO team for the DINOv3 release!

Seeing it outperform CLIP on "cultural knowledge"-based tasks like geoloc makes me very hopeful that it will work really well in VLMs!

18.08.2025 15:45 — 👍 3    🔁 0    💬 1    📌 0
