Andreas Steiner

@andreaspsteiner.bsky.social

Researching #ComputerVision at #GoogleDeepMind using JAX/Flax (http://github.com/google/flax). Views are my own.

191 Followers  |  5 Following  |  8 Posts  |  Joined: 28.11.2024

Latest posts by andreaspsteiner.bsky.social on Bluesky

Looking for a small or medium sized VLM? PaliGemma 2 spans more than 150x of compute!

Not sure yet if you want to invest the time 🪄finetuning🪄 on your data? Give it a try with our ready-to-use "mix" checkpoints:

🤗 huggingface.co/blog/paligem...
🎀 developers.googleblog.com/en/introduci...

19.02.2025 17:47  |  👍 19    🔁 7    💬 0    📌 0

Attending #NeurIPS2024? If you're interested in multimodal systems, building inclusive & culturally aware models, and how fractals relate to LLMs, we've 3 posters for you. I look forward to presenting them on behalf of our GDM team @ Zurich & collaborators. Details below (1/4)

07.12.2024 18:50  |  👍 12    🔁 5    💬 1    📌 0

Want to get started using PaliGemma 2?

🎀 developers.googleblog.com/en/introduci...
🤗 huggingface.co/blog/paligem...
💾 kaggle.com/models/googl...
🔧 github.com/google-resea...

7/7

05.12.2024 18:19  |  👍 7    🔁 1    💬 0    📌 0

If you want to know more, now is a good time to head over to the 31-page tech report.

Brought to you by an amazing team of collaborators from @GoogleDeepMind and @GoogleAI.

arxiv.org/abs/2412.03555

6/7

05.12.2024 18:18  |  👍 2    🔁 2    💬 1    📌 0

In addition to the pre-trained checkpoints, we also release two checkpoints fine-tuned on the DOCCI dataset, which generate fine-grained captions with a great quality/compute trade-off – and no yapping!

5/7

05.12.2024 18:18  |  👍 3    🔁 0    💬 1    📌 0

After 🪄finetuning🪄 on your data, you can expect to see great results, like the SOTA results we got on recognizing table structures, music scores, molecular structures, and text, and on radiography report generation.

4/7

05.12.2024 18:17  |  👍 4    🔁 0    💬 1    📌 0

Like the original PaliGemma, the pre-trained PaliGemma 2 models have segmentation and detection capabilities, and excel at OCR – which makes them extremely versatile for 🪄finetuning🪄. The original demo hf.co/spaces/big-v... gives you an idea of the capabilities.

3/7

05.12.2024 18:17  |  👍 4    🔁 0    💬 1    📌 0

Adding this new "model size" dimension unlocks substantial improvements for some tasks (blue, e.g. AI2D), and compounds with improvements from increased resolution for most tasks (green, e.g. InfoVQA).

2/7

05.12.2024 18:16  |  👍 2    🔁 0    💬 1    📌 0

🚀🚀 PaliGemma 2 is our updated and improved PaliGemma release using the Gemma 2 models and providing new pre-trained checkpoints for the full cross product of {224px,448px,896px} resolutions and {3B,10B,28B} model sizes.

1/7

05.12.2024 18:16  |  👍 68    🔁 21    💬 1    📌 5
