Nicolas Dufour

@nicolasdufour.bsky.social

PhD student at IMAGINE (ENPC) and GeoVic (Ecole Polytechnique). Working on image generation. http://nicolas-dufour.github.io

403 Followers  |  426 Following  |  42 Posts  |  Joined: 19.11.2024  |  2.1787

Latest posts by nicolasdufour.bsky.social on Bluesky

Today is Antoine Guedon's PhD! Already pretty cool visuals right at the start.

25.09.2025 15:16 | 👍 24    🔁 4    💬 1    📌 0

Annnnnd it's a reject!

Scale is a religion and if you go against it, you're a heretic and you should burn, "despite [the reviewers] final ratings".

But scale is still not necessary!

Side note: This is the first time that swinging reviews up (from 2,2,4,4 to 2,4,4,5) did not get the paper accepted. Strange days.

18.09.2025 17:04 | 👍 18    🔁 3    💬 4    📌 0

Does a smaller latent space lead to worse generation in latent diffusion models? Not necessarily! We show that LDMs are extremely robust to a wide range of compression rates (10-1000x) in the context of physics emulation.

We got lost in latent space. Join us 👇

03.09.2025 13:40 | 👍 27    🔁 8    💬 1    📌 1

Next week, I'll be in Strasbourg for GRETSI (@gretsi-info.bsky.social) to present a small discovery on transformer generalization we made with Simon and Jérémie while working on generative recommender systems. I love these "phase transition" plots.

📜: arxiv.org/abs/2508.03934

Short summary 👇

23.08.2025 10:12 | 👍 14    🔁 5    💬 2    📌 2

Makes me think of StyleGAN3 visualizations

18.08.2025 22:44 | 👍 2    🔁 0    💬 1    📌 0

Plonk project page: nicolas-dufour.github.io/plonk

@vickykalogeiton.bsky.social, @davidpicard.bsky.social and @loicland.bsky.social

18.08.2025 15:46 | 👍 4    🔁 0    💬 0    📌 0

Congrats to the Dino team for the DinoV3 release!

Seeing it outperform CLIP on "cultural knowledge"-based tasks like geoloc makes me very hopeful that it will work really well in VLMs!

18.08.2025 15:45 | 👍 3    🔁 0    💬 1    📌 0

🌍 Geoloc is a fantastic downstream benchmark:

- Requires fine-grained visual understanding (textures, vegetation, road signs, architecture)

- Tests global generalization

- Forces models to pick up real-world cues

That's why DinoV3 shining here is such a big deal 🚀

18.08.2025 15:14 | 👍 4    🔁 0    💬 1    📌 0

Even crazier 🤯 DinoV3 works in some out-of-distribution setups too, as long as there are geographical cues 🌄🗺️

(Remember: the network is trained only on road images!)

Where DinoV2 totally failed, DinoV3 is holding up 👊

18.08.2025 15:14 | 👍 4    🔁 0    💬 1    📌 0

The setup 👉 We use our Riemannian flow matching model PLONK (CVPR25: Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation) 🌍

We simply swap StreetCLIP with DinoV3 as a drop-in backbone, and train on OpenStreetView-5M.

And boom 💥 DinoV3 wins.

18.08.2025 15:14 | 👍 5    🔁 0    💬 1    📌 0

🚀 DinoV3 just became the new go-to backbone for geoloc!
It outperforms CLIP-like models (SigLip2, finetuned StreetCLIP)… and that's shocking 🤯
Why? CLIP models have an innate advantage: they literally learn place names + images. DinoV3 doesn't.

18.08.2025 15:14 | 👍 46    🔁 14    💬 1    📌 1

Dear bsky friends, I have a question: Do you really think that the visual quality of these images is so bad that the research that produced them is deeply flawed?
And if I told you that the model was mostly trained on ImageNet with a bit of artistic fine-tuning at 1024 resolution, still really bad?

07.08.2025 06:42 | 👍 29    🔁 4    💬 6    📌 1
YouTube video by Underscore_: Il a conçu la première IA d'OSINT (terrifiant… et génial) [He built the first OSINT AI (terrifying… and brilliant)]

I had the privilege of being invited to speak about our work "Around the World in 80 Timesteps" on the French podcast Underscore! If you speak French, I highly recommend it; they did a great job with the montage!

If you want to learn more: nicolas-dufour.github.io/plonk

www.youtube.com/watch?v=s5oH...

31.07.2025 16:43 | 👍 15    🔁 5    💬 0    📌 0

1/ Can open-data models beat DINOv2? Today we release Franca, a fully open-sourced vision foundation model. Franca with a ViT-G backbone matches (and often beats) proprietary models like SigLIPv2, CLIP, and DINOv2 on various benchmarks, setting a new standard for open-source research.

21.07.2025 14:47 | 👍 84    🔁 21    💬 2    📌 3

Really cool work! I've seen that you haven't used registers but seem to have smooth latents anyway. Is this a consequence of having the matryoshka loss, which requires both global and local knowledge?
Have you tried using registers?

21.07.2025 15:40 | 👍 2    🔁 0    💬 1    📌 0

✨ Thrilled to see EurIPS launch, the first officially endorsed European NeurIPS presentation venue!

👀 But NeurIPS now requires at least one author to attend in San Diego or Mexico (and not just virtually as before). This is detrimental to many. Why not allow presenting at EurIPS or online?
1/4

17.07.2025 08:48 | 👍 25    🔁 11    💬 2    📌 2

Sadly, helium doesn't carry enough weight 😭

15.06.2025 21:10 | 👍 2    🔁 0    💬 1    📌 0

Some of our IMAGINE members at #CVPR2025

15.06.2025 19:14 | 👍 34    🔁 7    💬 0    📌 0

Come on! Who else has a hot air balloon on their poster?

(fun fact: there is no hot air balloon emoji, but @loicland.bsky.social made a TikZ macro for it! 😅)

15.06.2025 15:57 | 👍 15    🔁 2    💬 1    📌 0

Come see us at poster 186, where we present "Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation"!

Cc @loicland.bsky.social @davidpicard.bsky.social @vickykalogeiton.bsky.social

15.06.2025 15:30 | 👍 22    🔁 5    💬 0    📌 1

A bit disappointed by the PAMI TC meeting, mostly repetitions of what's been said at the opening, the "open discussion" slide was really just there to *exist* but no discussion/vote took place, no topic was debated. What space is left to reflect on our community and what we stand for as scientists?

14.06.2025 23:25 | 👍 3    🔁 2    💬 0    📌 0

I am heartbroken that I am not at the conference, but seeing what the government is doing to its people and the world, I simply couldn't go there.

14.06.2025 09:51 | 👍 21    🔁 6    💬 1    📌 0

I will also be presenting CoDeX at the same workshop between 1:15PM and 1:45PM.

Abhishek Kuriyal, Mathieu Aubry, @loicland.bsky.social and I improve the performance of deep learning models in challenging domain shift settings by learning how to combine spatial domain experts.

11.06.2025 04:03 | 👍 4    🔁 1    💬 1    📌 0

Discover DAFA-LS, a dataset of SITS centered on Afghan archeological sites and annotated with preservation classification labels.

🎤 1:45PM Oral (room 208 B)
📰 4:30PM Poster (poster boards #419 – #443)

11.06.2025 04:03 | 👍 4    🔁 1    💬 1    📌 0

I will be presenting our work on the detection of archaeological looting with satellite image time series at CVPR 2025 EarthVision workshop tomorrow!

Honored and grateful that this paper received the best student paper award!

11.06.2025 04:03 | 👍 15    🔁 6    💬 1    📌 0

We will present it at:

"3rd Workshop on Generative Models for Computer Vision": Wednesday, June 11, 1 pm.

Main Conference: Sunday, June 15, 10:30 am, Poster 186.

11.06.2025 00:52 | 👍 4    🔁 0    💬 0    📌 0

I will be at #CVPR2025 this week in Nashville.

I will be presenting our paper "Around the World in 80 Timesteps:
A Generative Approach to Global Visual Geolocation".

We tackle geolocalization as a generative task, allowing for SOTA performance and more interpretable predictions.

11.06.2025 00:52 | 👍 26    🔁 5    💬 1    📌 0
How far can we go with ImageNet for Text-to-Image generation? Recent text-to-image generation models have achieved remarkable results by training on billion-scale datasets, following a 'bigger is better' paradigm that prioritizes data quantity over availability ...

Btw, we know that diffusion models are highly capable of composition. It's obvious when you prompt for "a pink elephant on the beach" while training on ImageNet, as no such combination exists in the training set. That's what we had in mind for our "T2I from ImageNet" paper: arxiv.org/abs/2502.21318

04.06.2025 06:42 | 👍 10    🔁 2    💬 0    📌 0

Working on those NeurIPS submissions on a sunny Parisian day

06.05.2025 12:05 | 👍 37    🔁 2    💬 2    📌 0

Looking forward to #CVPR2025! We will present the following papers:

30.04.2025 13:04 | 👍 28    🔁 7    💬 1    📌 1
