Mickael Chen

@mickaelchen.bsky.social

Generating MNIST digits for a decade. Research in multimodal generative AI. Currently at H company.

93 Followers  |  92 Following  |  11 Posts  |  Joined: 16.11.2024

Latest posts by mickaelchen.bsky.social on Bluesky

Is it just me, or is fucking LinkedIn taking over some of the functions that Twitter used to fill?

08.05.2025 08:30 · 👍 0 · 🔁 0 · 💬 1 · 📌 0

🔥🔥🔥 CV folks, I have some news! We're organizing a 1-day meeting in central Paris on June 6th, before CVPR, called CVPR@Paris (similar to NeurIPS@Paris) 🥐🍾🥖🍷

Registration is open (it's free) with priority given to authors of accepted papers: cvprinparis.github.io/CVPR2025InPa...

Big 🧵👇 with details!

21.03.2025 06:43 · 👍 137 · 🔁 51 · 💬 8 · 📌 10

Wow, neat! Reannotation is key here.

Conjecture:
As we get more and more well-aligned text-image data, it will become easier and easier to train models.

This will allow us to explore both more streamlined and more exotic training recipes.
More signals that exciting times are coming!

03.03.2025 11:50 · 👍 2 · 🔁 2 · 💬 1 · 📌 0

arxiv.org/abs/2310.16834
More likely, they just use this very nice work of theirs.

28.02.2025 02:20 · 👍 0 · 🔁 0 · 💬 0 · 📌 0

Wild guess: a VAE-style bidirectional transformer as the text embedder, producing per-token low-dimensional embeddings suitable for diffusion.

That would be a cool thing to try anyway.

28.02.2025 01:31 · 👍 1 · 🔁 0 · 💬 1 · 📌 0
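
To make the wild guess above concrete, here is a minimal PyTorch-style sketch of a bidirectional transformer text encoder with a per-token VAE bottleneck, producing low-dimensional embeddings a diffusion model could condition on. Every name, dimension, and loss weight below is an assumption for illustration, not a description of any actual system.

```python
# Hypothetical sketch (all names, sizes, and loss weights are assumptions):
# a bidirectional transformer text encoder whose per-token outputs go through
# a small VAE bottleneck, yielding low-dimensional per-token embeddings that
# a diffusion model could cross-attend to.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextVAEEmbedder(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, latent_dim=16, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)  # no causal mask: bidirectional
        self.to_mu = nn.Linear(d_model, latent_dim)       # per-token latent mean
        self.to_logvar = nn.Linear(d_model, latent_dim)   # per-token latent log-variance
        self.decoder = nn.Linear(latent_dim, vocab_size)  # reconstruction head for the VAE objective

    def forward(self, tokens):                             # tokens: (B, T) int64
        h = self.encoder(self.embed(tokens))               # (B, T, d_model), full bidirectional attention
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()            # reparameterization trick
        rec = F.cross_entropy(self.decoder(z).transpose(1, 2), tokens)  # recover each token from its latent
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
        return z, rec + 0.1 * kl  # z: (B, T, latent_dim), the conditioning a diffusion model would use

# Assumed usage: z feeds the diffusion model's cross-attention, while rec + KL
# keeps the compact per-token latents informative and smooth.
z, aux_loss = TextVAEEmbedder()(torch.randint(0, 32000, (2, 77)))
```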

A game changer. A lot of people suspected it *should* work, but actually seeing it in action is something.

28.02.2025 00:25 · 👍 1 · 🔁 0 · 💬 1 · 📌 0

🚗 Ever wondered if an AI model could learn to drive just by watching YouTube? 🎥👀

We trained a 1.2B parameter model on 1,800+ hours of raw driving videos.

No labels. No maps. Just pure observation.

And it works! 🤯

🧵👇 [1/10]

24.02.2025 12:53 · 👍 24 · 🔁 7 · 💬 1 · 📌 2

Bluesky is less engaging because the algorithm is less predatory.

08.02.2025 13:14 · 👍 0 · 🔁 0 · 💬 0 · 📌 0
Preview: Microsoft probing if DeepSeek-linked group improperly obtained OpenAI data, Bloomberg News reports
Microsoft and OpenAI are investigating whether data output from OpenAI's technology was obtained in an unauthorized manner by a group linked to Chinese artificial intelligence (AI) startup DeepSeek, Bloomberg News reported on Tuesday.

I'm curious who at Microsoft or OpenAI thought it was a good idea to publicize this narrative.

If you are an organisation concerned about the ethics of training data, now is probably your best chance to act and be heard.

www.reuters.com/technology/m...

29.01.2025 19:51 · 👍 2 · 🔁 0 · 💬 0 · 📌 0

The plateau in training-time scaling and the shift to test-time scaling created favorable conditions for a competitor like DeepSeek to rise and catch up with OpenAI.

Nah, I just made that up. Need to put more thought into this. 🤔

29.01.2025 00:23 · 👍 2 · 🔁 0 · 💬 0 · 📌 0

Also, the whole system could already almost be seen as a form of self-improvement with some minimal human signals.

14.12.2024 11:26 · 👍 0 · 🔁 0 · 💬 0 · 📌 0

We've reached a point where synthetic data is just better and more convenient than messy noisy web-crawled data.

It's been true for multimodal data for a while, and semi-automated data as in the Florence-2 paper has been very successful. arxiv.org/abs/2311.06242

14.12.2024 11:23 · 👍 0 · 🔁 0 · 💬 1 · 📌 0

Better VQ-VAEs with this one weird rotation trick!

I missed this when it came out, but I love papers like this: a simple change to an already powerful technique, that significantly improves results without introducing complexity or hyperparameters.

02.12.2024 19:52 · 👍 87 · 🔁 13 · 💬 1 · 📌 0
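
For context, a hedged sketch of the trick as I read the rotation-trick paper (not the authors' code): the forward pass still snaps the encoder output e to its nearest codebook vector q, but the map is written as a detached rotation-and-rescale of e instead of the straight-through copy, so gradients get rotated toward the code direction rather than passed through unchanged.

```python
# Hedged sketch of the rotation trick for VQ-VAEs, as I read it (not the authors' code).
# e: encoder outputs, q: their nearest codebook vectors, both of shape (..., D).
import torch

def straight_through(e, q):
    # classic STE: forward pass outputs q, gradient is copied to e unchanged
    return e + (q - e).detach()

def rotation_trick(e, q, eps=1e-6):
    # Forward pass still outputs q, but it is written as (||q||/||e||) * R e,
    # where R rotates e/||e|| onto q/||q||. R and the scale are detached, so the
    # backward pass rotates and rescales the gradient instead of copying it.
    e_norm = e.norm(dim=-1, keepdim=True)
    q_norm = q.norm(dim=-1, keepdim=True)
    e_hat = e / (e_norm + eps)
    q_hat = q / (q_norm + eps)
    r = e_hat + q_hat
    r = r / (r.norm(dim=-1, keepdim=True) + eps)           # Householder-style direction
    r, q_hat, e_hat = r.detach(), q_hat.detach(), e_hat.detach()
    # R e = e - 2 r (r . e) + 2 q_hat (e_hat . e)
    rotated = (e - 2 * r * (r * e).sum(-1, keepdim=True)
                 + 2 * q_hat * (e_hat * e).sum(-1, keepdim=True))
    scale = (q_norm / (e_norm + eps)).detach()
    return scale * rotated

# sanity check: the forward value equals q up to numerical error
e = torch.randn(4, 8, requires_grad=True)
q = torch.randn(4, 8)
print(torch.allclose(rotation_trick(e, q), q, atol=1e-4))
```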

Have you ever wondered how to train an autoregressive generative transformer on text and raw pixels, without a pretrained visual tokenizer (e.g. VQ-VAE)?

We have been pondering this over the summer and developed a new model: JetFormer 🌊🤖

arxiv.org/abs/2411.19722

A thread 👇

1/

02.12.2024 16:41 · 👍 155 · 🔁 36 · 💬 4 · 📌 7

For AI to be fair and sustainable, we'd need to figure out attribution, i.e. "How much does training sample X contribute to model output Y?" Then the creator of sample X gets paid an amount proportional to what the user paid for the inference call that produced output Y.

21.11.2024 09:31 · 👍 5 · 🔁 1 · 💬 2 · 📌 2
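
Purely to illustrate the payout rule in the post above; the attribution scores themselves are the hard, unsolved part, and every number and name below is made up.

```python
# Toy illustration of the proportional-payout rule; how to compute the
# attribution scores themselves is the hard, open problem.
def split_inference_fee(fee_paid, attribution):
    """attribution: {creator: estimated contribution of their sample to this output}."""
    total = sum(attribution.values())
    if total <= 0:
        return {creator: 0.0 for creator in attribution}
    return {creator: fee_paid * score / total for creator, score in attribution.items()}

# Made-up numbers: the user paid $0.02 for the call, and attribution credited
# samples from alice/bob/carol in a 5:3:2 ratio.
print(split_inference_fee(0.02, {"alice": 0.5, "bob": 0.3, "carol": 0.2}))
# -> roughly {'alice': 0.01, 'bob': 0.006, 'carol': 0.004}
```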

A great place for students interested in an AI/CV research internship. It's a very strong team, invested in all of their students. Check it out.

23.11.2024 13:50 · 👍 1 · 🔁 0 · 💬 0 · 📌 0
Preview: Andrei Bursuc on LinkedIn: #cvpr2024 #cvpr In case you missed our PointBeV poster at #CVPR2024, here's a quick presentation by the lead author Loïck C. PointBEV brings a change of paradigm in…

ICYMI our PointBeV #CVPR2024 poster: here's a quick talk by lead author Loïck Chambon.
It brings a change of paradigm in multi-camera bird's-eye-view (BeV) segmentation via a flexible mechanism to produce sparse BeV points that can adapt to the situation, task, and compute budget.

www.linkedin.com/posts/andrei...

22.11.2024 11:18 · 👍 11 · 🔁 3 · 💬 1 · 📌 0
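
Not the authors' code, just a rough sketch of what "sparse BeV points pulling features from multiple cameras" could look like in practice; all shapes, names, and the single-ground-plane assumption below are mine.

```python
# Hedged sketch of sparse BeV feature pulling (my illustration of the general idea,
# not PointBeV itself). Assumed inputs: per-camera feature maps, 3x4 projection
# matrices, and a sparse set of ground-plane (x, y) query points.
import torch
import torch.nn.functional as F

def pull_features_at_bev_points(cam_feats, cam_proj, bev_xy, z=0.0):
    """
    cam_feats: (N_cam, C, H, W) image feature maps
    cam_proj:  (N_cam, 3, 4) world-to-pixel projection matrices
    bev_xy:    (P, 2) sparse BeV points we actually care about (no dense grid)
    Returns (P, C): features averaged over the cameras that see each point.
    """
    n_cam, C, H, W = cam_feats.shape
    P = bev_xy.shape[0]
    world = torch.cat([bev_xy, torch.full((P, 1), z), torch.ones(P, 1)], dim=-1)  # (P, 4) homogeneous
    pix = torch.einsum('nij,pj->npi', cam_proj, world)                            # (N_cam, P, 3)
    depth = pix[..., 2:].clamp(min=1e-6)
    uv = pix[..., :2] / depth                                                     # pixel coordinates
    grid = torch.stack([uv[..., 0] / (W - 1) * 2 - 1,                             # normalize for grid_sample
                        uv[..., 1] / (H - 1) * 2 - 1], dim=-1)                    # (N_cam, P, 2)
    sampled = F.grid_sample(cam_feats, grid.unsqueeze(2), align_corners=True)     # (N_cam, C, P, 1)
    sampled = sampled.squeeze(-1).permute(0, 2, 1)                                # (N_cam, P, C)
    seen = ((grid.abs() <= 1).all(-1) & (pix[..., 2] > 0)).float().unsqueeze(-1)  # in-frustum mask
    return (sampled * seen).sum(0) / seen.sum(0).clamp(min=1)                     # (P, C)

# A small head on these per-point features would then predict occupancy; the point
# is that you only pay for the BeV locations you query, not a full dense grid.
```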

The Cosmos suite of neural tokenizers for images & videos is impressive.
Cosmos is trained on diverse high-res images & long videos, scales well for both discrete & continuous tokens, generalizes to multiple domains (robotics, driving, egocentric ...) & has excellent runtime.
github.com/NVIDIA/Cosmo...

20.11.2024 22:58 · 👍 19 · 🔁 5 · 💬 2 · 📌 0

This is ridiculous. And then people will talk about inclusivity and mental health. Sorry to speak my mind so openly, but this has to be the most toxic idea in a very long time.

18.11.2024 19:23 · 👍 14 · 🔁 2 · 💬 1 · 📌 0
