Is it just me or is fucking LinkedIn taking over some of the functions that Twitter used to fill?
08.05.2025 08:30
@mickaelchen.bsky.social
Generating MNIST digits for a decade. Research Multimodal Generative AI. Currently at H company.
CV Folks, I have some news! We're organizing a 1-day meeting in central Paris on June 6th before CVPR called CVPR@Paris (similar to NeurIPS@Paris).
Registration is open (it's free) with priority given to authors of accepted papers: cvprinparis.github.io/CVPR2025InPa...
Big 🧵 with details!
Wow, neat! Reannotation is key here.
Conjecture:
As we get more and more well-aligned text-image data, it will become easier and easier to train models.
This will allow us to explore both more streamlined and more exotic training recipes.
More signals that exciting times are coming!
arxiv.org/abs/2310.16834
More likely, they just use this very nice work of theirs.
Wild guess: VAE-bidirectional transformers as a text embedder for per-token low-dimensional embeddings suitable for diffusion.
That would be a cool thing to try anyway.
A game changer. A lot of people suspected it *should* work, but actually seeing it in action is something.
28.02.2025 00:25
Ever wondered if an AI model could learn to drive just by watching YouTube?
We trained a 1.2B parameter model on 1,800+ hours of raw driving videos.
No labels. No maps. Just pure observation.
And it works! 🤯
🧵 [1/10]
Bluesky is less engaging because the algorithm is less predatory.
08.02.2025 13:14
I'm curious who at Microsoft or OpenAI thought it was a good idea to publicize this narrative.
If you are an organisation concerned about the ethics of training data, now is probably your best chance to act and be heard.
www.reuters.com/technology/m...
The plateau on training scaling and the shift to test-time scaling created favorable conditions for a competitor like DeepSeek to rise and catch up with OpenAI.
Nah, I just made that up. Need to put more thought into this.
Also, the whole system could already almost be seen as a form of self-improvement with some minimal human signals.
14.12.2024 11:26
We've reached a point where synthetic data is just better and more convenient than messy, noisy web-crawled data.
It's been true for multimodal data for a while, and semi-automated data as in the Florence-2 paper has been very successful. arxiv.org/abs/2311.06242
Better VQ-VAEs with this one weird rotation trick!
I missed this when it came out, but I love papers like this: a simple change to an already powerful technique, that significantly improves results without introducing complexity or hyperparameters.
Have you ever wondered how to train an autoregressive generative transformer on text and raw pixels, without a pretrained visual tokenizer (e.g. VQ-VAE)?
We have been pondering this over the summer and developed a new model: JetFormer
arxiv.org/abs/2411.19722
A thread
1/
For AI to be fair and sustainable, we'd need to figure out attribution, i.e. "How much does training sample X contribute to model output Y?" Then the creator of sample X gets paid an amount proportional to what the user paid for the inference call that produced output Y.
21.11.2024 09:31
A great place for students interested in an AI/CV research internship. It's a very strong team, invested in all of their students. Check it out.
23.11.2024 13:50
ICYMI our PointBeV #CVPR2024 poster, here's a quick talk by lead author Loïck Chambon.
It brings a change of paradigm in multi-camera bird's-eye-view (BeV) segmentation via a flexible mechanism to produce sparse BeV points that can adapt to the situation, task, and compute.
www.linkedin.com/posts/andrei...
The Cosmos suite of neural tokenizers for images & videos is impressive.
Cosmos is trained on diverse high-res imgs & long-vids, scales well for both discrete & continuous tokens, generalizes to multiple domains (robotics, driving, egocentric ...) & has excellent runtime
github.com/NVIDIA/Cosmo...
This is ridiculous. And then people will talk about inclusivity and mental health. Sorry to speak my mind so openly, but this has to be the most toxic idea in a very long time.
18.11.2024 19:23