Looking for a small or medium-sized VLM? The PaliGemma 2 checkpoints span a compute range of more than 150x!
Not sure yet if you want to invest the time 💪finetuning💪 on your data? Give it a try with our ready-to-use "mix" checkpoints – links and a short inference sketch below:
🤗 huggingface.co/blog/paligem...
🤖 developers.googleblog.com/en/introduci...
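A minimal inference sketch with a mix checkpoint via Hugging Face transformers – the checkpoint id google/paligemma2-3b-mix-448, the task-prefix prompt style, and the local image path are assumptions; check the posts above for the released names and recommended prompts:

```python
# Sketch: zero-shot VQA with an assumed PaliGemma 2 "mix" checkpoint.
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma2-3b-mix-448"  # assumed id of a released mix checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("example.jpg")                 # any RGB image you want to query
prompt = "<image>answer en What is in this image?"  # assumed task-prefix prompt style

inputs = processor(text=prompt, images=image, return_tensors="pt")
inputs = inputs.to(torch.bfloat16).to(model.device)  # cast image features to the model dtype

with torch.inference_mode():
    generated = model.generate(**inputs, max_new_tokens=32, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(generated[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```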
19.02.2025 17:47 · 19 likes · 7 reposts · 0 replies · 0 quotes
Attending #NeurIPS2024? If you're interested in multimodal systems, building inclusive & culturally aware models, and how fractals relate to LLMs, we have 3 posters for you. I look forward to presenting them on behalf of our GDM team @ Zurich & collaborators. Details below (1/4)
07.12.2024 18:50 · 12 likes · 5 reposts · 1 reply · 0 quotes
Want to get started using PaliGemma 2?
🤖 developers.googleblog.com/en/introduci...
🤗 huggingface.co/blog/paligem...
💾 kaggle.com/models/googl...
🔧 github.com/google-resea...
7/7
05.12.2024 18:19 · 7 likes · 1 repost · 0 replies · 0 quotes
If you want to know more, now is a good time to head over to the 31-page tech report.
Brought to you by an amazing team of collaborators from @GoogleDeepMind and @GoogleAI.
arxiv.org/abs/2412.03555
6/7
05.12.2024 18:18 · 2 likes · 2 reposts · 1 reply · 0 quotes
In addition to the pre-trained checkpoints, we also release two checkpoints fine-tuned on the DOCCI dataset, which generate fine-grained captions with a great quality/compute trade-off – and no yapping!
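A minimal captioning sketch, assuming a DOCCI-finetuned checkpoint id of google/paligemma2-3b-ft-docci-448 and a "caption en" prompt – see the model cards and the tech report for the released names and prompts:

```python
# Sketch: fine-grained captioning with an assumed DOCCI-finetuned checkpoint.
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma2-3b-ft-docci-448"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("example.jpg")
inputs = processor(text="<image>caption en", images=image, return_tensors="pt")
inputs = inputs.to(torch.bfloat16).to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the generated caption, not the prompt tokens.
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```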
5/7
05.12.2024 18:18 · 3 likes · 0 reposts · 1 reply · 0 quotes
After 💪finetuning💪 on your data, you can expect to see great results, like the SOTA we got on recognizing table structures, music scores, molecular structures, and text, and on radiography report generation.
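A minimal full-finetuning sketch with Hugging Face transformers, assuming a pre-trained checkpoint id of google/paligemma2-3b-pt-448, a toy (image, prefix, suffix) dataset, and the processor's suffix= argument for building labels – the big_vision configs and the official fine-tuning tutorials are the reference recipes:

```python
# Sketch: a tiny full-finetuning loop; replace the toy data with your own task.
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma2-3b-pt-448"  # assumed pre-trained checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Toy "dataset": (image, task prefix, target suffix) triples; placeholders only.
examples = [
    {"image": Image.open("page.png"), "prefix": "<image>transcribe", "suffix": "target text"},
]

def collate(batch):
    texts = [ex["prefix"] for ex in batch]
    targets = [ex["suffix"] for ex in batch]
    images = [ex["image"] for ex in batch]
    # suffix= (assumed) builds `labels` so the loss is computed only on the target tokens.
    enc = processor(text=texts, images=images, suffix=targets,
                    return_tensors="pt", padding="longest")
    return enc.to(torch.bfloat16).to(model.device)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for step in range(10):                      # tiny demo loop
    batch = collate(examples)
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"step {step}: loss {loss.item():.3f}")
```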
4/7
05.12.2024 18:17 · 4 likes · 0 reposts · 1 reply · 0 quotes
Like the original PaliGemma, the pre-trained PaliGemma 2 models have segmentation and detection capabilities, and excel at OCR – which makes them extremely versatile for 💪finetuning💪. The original demo hf.co/spaces/big-v... gives you an idea of the capabilities.
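A small sketch of the prompt conventions ("detect <thing>", "segment <thing>", "ocr") and of parsing the <locXXXX> tokens in detection outputs. The 4-digit, 1024-bin (y_min, x_min, y_max, x_max) box encoding follows the PaliGemma conventions; the sample output string is made up for illustration, and the exact normalization constant should be checked against the demo code:

```python
# Sketch: parse '<loc..><loc..><loc..><loc..> label ; ...' detection outputs into pixel boxes.
import re

LOC_BOX = re.compile(r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;]*)")

def parse_detections(text: str, width: int, height: int):
    """Convert location tokens (bins in [0, 1023]) into pixel-space xyxy boxes."""
    boxes = []
    for y0, x0, y1, x1, label in LOC_BOX.findall(text):
        boxes.append({
            "label": label.strip(),
            "xyxy": (int(x0) / 1024 * width, int(y0) / 1024 * height,
                     int(x1) / 1024 * width, int(y1) / 1024 * height),
        })
    return boxes

# Illustrative output for the prompt "detect cat ; dog" (made up, not a real model response):
sample = "<loc0102><loc0205><loc0810><loc0700> cat ; <loc0050><loc0712><loc0400><loc0980> dog"
print(parse_detections(sample, width=1024, height=768))
```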
3/7
05.12.2024 18:17 · 4 likes · 0 reposts · 1 reply · 0 quotes
Adding this new "model size" dimension unlocks substantial improvements for some tasks (blue, e.g. AI2D), and compounds with improvements from increased resolution for most tasks (green, e.g. InfoVQA).
2/7
05.12.2024 18:16 · 2 likes · 0 reposts · 1 reply · 0 quotes
🎉🎉 PaliGemma 2 is our updated and improved PaliGemma release, using the Gemma 2 models and providing new pre-trained checkpoints for the full cross product of {224px, 448px, 896px} resolutions and {3B, 10B, 28B} model sizes.
1/7
05.12.2024 18:16 · 68 likes · 21 reposts · 1 reply · 5 quotes