Today is Antoine Guedon's PhD defense! Already pretty cool visuals right at the start.
25.09.2025 15:16
@nicolasdufour.bsky.social
PhD student at IMAGINE (ENPC) and GeoVic (Ecole Polytechnique). Working on image generation. http://nicolas-dufour.github.io
Annnnnd it's a reject!
Scale is a religion and if you go against it, you're a heretic and you should burn, "despite [the reviewers'] final ratings".
But scale is still not necessary!
Side note: first time that swinging the reviews up (from 2,2,4,4 to 2,4,4,5) does not get the paper accepted. Strange days.
Does a smaller latent space lead to worse generation in latent diffusion models? Not necessarily! We show that LDMs are extremely robust to a wide range of compression rates (10-1000x) in the context of physics emulation.
We got lost in latent space. Join us!
Next week, I'll be in Strasbourg for GRETSI (@gretsi-info.bsky.social) to present a small discovery on transformer generalization that we made with Simon and Jérémie while working on generative recommender systems. I love these "phase transition" plots.
arxiv.org/abs/2508.03934
Short summary:
Makes me think of StyleGAN3 visualizations
18.08.2025 22:44
Plonk project page: nicolas-dufour.github.io/plonk
@vickykalogeiton.bsky.social , @davidpicard.bsky.social and @loicland.bsky.social
Congrats to the Dino team for the DinoV3 release!
Seeing it outperform CLIP on "cultural knowledge"-based tasks like geoloc makes me very hopeful that it will work really well in VLMs!
Geoloc is a fantastic downstream benchmark:
- Requires fine-grained visual understanding (textures, vegetation, road signs, architecture)
- Tests global generalization
- Forces models to pick up real-world cues
That's why DinoV3 shining here is such a big deal!
Even crazier: DinoV3 works in some out-of-distribution setups too, as long as there are geographical cues.
(Remember: the network is trained only on road images!)
Where DinoV2 totally failed, DinoV3 is holding up.
The setup: we use our Riemannian flow matching model PLONK (CVPR25: Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation).
We simply swap StreetCLIP with DinoV3 as a drop-in backbone, and train on OpenStreetView-5M.
And boom: DinoV3 wins.
DinoV3 just became the new go-to backbone for geoloc!
It outperforms CLIP-like models (SigLip2, fine-tuned StreetCLIP)… and that's shocking!
Why? CLIP models have an innate advantage: they literally learn place names + images. DinoV3 doesn't.
Dear bsky friends, I have a question: Do you really think that the visual quality of these images is so bad that the research that produced them is deeply flawed?
And if I told you that the model was mostly trained on ImageNet with a bit of artistic fine-tuning at 1024 resolution, still really bad?
I had the privilege of being invited to talk about our work "Around the World in 80 Timesteps" on the French podcast Underscore! If you speak French, I highly recommend it: they did a great job with the editing!
If you want to learn more: nicolas-dufour.github.io/plonk
www.youtube.com/watch?v=s5oH...
1/ Can open-data models beat DINOv2? Today we release Franca, a fully open-sourced vision foundation model. Franca with ViT-G backbone matches (and often beats) proprietary models like SigLIPv2, CLIP, DINOv2 on various benchmarks setting a new standard for open-source research.
21.07.2025 14:47
Really cool work! I see that you haven't used registers but still seem to get smooth latents. Is this a consequence of the Matryoshka loss, which requires both global and local knowledge?
Have you tried using registers?
Thrilled to see EurIPS launch: the first officially endorsed European NeurIPS presentation venue!
But NeurIPS now requires at least one author to attend in San Diego or Mexico (rather than virtually, as before). This is detrimental to many. Why not allow presenting at EurIPS or online?
1/4
Sadly, helium doesn't carry enough weight!
15.06.2025 21:10
Some of our IMAGINE members at #CVPR2025
15.06.2025 19:14
Come on! Who else has a hot air balloon on their poster?
(Fun fact: there is no hot air balloon emoji, but @loicland.bsky.social made a TikZ macro for it!)
Come see us at poster 186 for "Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation"!
Cc @loicland.bsky.social @davidpicard.bsky.social @vickykalogeiton.bsky.social
A bit disappointed by the PAMI TC meeting: it was mostly a repetition of what had been said at the opening. The "open discussion" slide was really just there to *exist*; no discussion or vote took place, and no topic was debated. What space is left to reflect on our community and what we stand for as scientists?
14.06.2025 23:25
I am heartbroken that I am not at the conference, but seeing what the government is doing to its people and the world, I simply couldn't go there.
14.06.2025 09:51
I will also be presenting CoDeX at the same workshop between 1:15PM and 1:45PM.
Abhishek Kuriyal, Mathieu Aubry, @loicland.bsky.social and I improve the performance of deep learning models in challenging domain shift settings by learning how to combine spatial domain experts.
Discover DAFA-LS, a dataset of satellite image time series (SITS) centered on Afghan archaeological sites and annotated with preservation-class labels.
1:45PM Oral (room 208 B)
4:30PM Poster (poster boards #419–#443)
I will be presenting our work on the detection of archaeological looting with satellite image time series at CVPR 2025 EarthVision workshop tomorrow!
Honored and grateful that this paper received the best student paper award!
We will present it at:
"3rd Workshop on Generative Models for Computer Vision": Wednesday, June 11, 1pm.
Main Conference: Sunday, June 15, 10:30am, Poster 186.
I will be at #CVPR2025 this week in Nashville.
I will be presenting our paper "Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation".
We frame geolocalization as a generative task, which yields SOTA performance and more interpretable predictions.
Btw, we know that diffusion models are highly capable of composition. It's obvious when you prompt for "a pink elephant on the beach" after training on ImageNet, as no such combination exists in the training set. That's what we had in mind for our "T2I from ImageNet" paper arxiv.org/abs/2502.21318
04.06.2025 06:42
Working on those NeurIPS submissions on a sunny Parisian day
06.05.2025 12:05
Looking forward to #CVPR2025! We will present the following papers:
30.04.2025 13:04