🚀🚀 PaliGemma 2 is our updated and improved PaliGemma release using the Gemma 2 models and providing new pre-trained checkpoints for the full cross product of {224px, 448px, 896px} resolutions and {3B, 10B, 28B} model sizes.
1/7
05.12.2024 18:16
Our big_vision codebase is really good! And it's *the* reference for ViT, SigLIP, PaliGemma, JetFormer, ... including fine-tuning them.
However, it's criminally undocumented. I tried using it outside Google to fine-tune PaliGemma and SigLIP on GPUs, and wrote a tutorial: lb.eyer.be/a/bv_tuto.html
03.12.2024 00:18
Did you ever try to get an auto-regressive transformer to operate in a continuous latent space which is not fixed ahead of time but learned end to end from scratch?
Enter JetFormer: arxiv.org/abs/2411.19722 -- joint work in a dream team: @mtschannen.bsky.social and @kolesnikov.ch
02.12.2024 18:17
Research Scientist at Google DeepMind
https://e-bug.github.io
Google DeepMind, Zurich; before: RWTH, DFKI. Views my own.
Researching #ComputerVision at #GoogleDeepMind using JAX/Flax (http://github.com/google/flax). Views are my own.
Research Fellow @ox.ac.uk | Multimodal ML PhD University of Amsterdam | Previously @msftresearch.bsky.social, Google AI Gemini, Bloomberg AI, Amazon Science, ETH, KU Leuven | MS AI, BS Computational Linguistics
mariyahendriksen.github.io
San Diego Dec 2-7, 25 and Mexico City Nov 30-Dec 5, 25. Comments to this account are not monitored. Please send feedback to townhall@neurips.cc.
Professor at UW; Researcher at Meta. LMs, NLP, ML. PNW life.
Research scientist at Anthropic. Prev. Google Brain/DeepMind, founding team OpenAI. Computer scientist; inventor of the VAE, Adam optimizer, and other methods. ML PhD. Website: dpkingma.com
Blog: https://sander.ai/
🐦: https://x.com/sedielem
Research Scientist at Google DeepMind (WaveNet, Imagen 3, Veo, ...). I tweet about deep learning (research + software), music, generative models (personal account).
Research Scientist @GoogleDeepMind. Representation learning for multimodal understanding and generation.
mitscha.github.io
Researcher (OpenAI. Ex: DeepMind, Brain, RWTH Aachen), Gamer, Hacker, Belgian.
Anon feedback: https://admonymous.co/giffmana
📍 Zürich, Suisse 🔗 http://lucasb.eyer.be