Check out the code, models, and demo iOS/macOS app using MLX for our fast vision-language models, FastVLM:
github.com/apple/ml-fas...
Paper: "FastVLM: Efficient Vision Encoding for Vision Language Models", Anasosalu et al., CVPR 2025
arxiv.org/abs/2412.13303
#CVPR2025 #Apple #research
07.05.2025 12:20
Today is a great day for optimal transport! Lots of gratitude to all the folks who contributed to ott-jax.readthedocs.io and pushed for the MOSCOT (now @ Nature!) paper, from visionaries @dominik1klein.bsky.social, G. Palla, Z. Piran to the magician, Michal Klein! ❤️
www.nature.com/articles/s41...
22.01.2025 22:17
FastVLM: Efficient Vision Encoding for Vision Language Models
Scaling the input image resolution is essential for enhancing the performance of Vision Language Models (VLMs), particularly in text-rich image understanding tasks. However, popular visual encoders su...
For more, check out our paper on arxiv: arxiv.org/abs/2412.13303
With the amazing people: @pavankumarvasu.bsky.social, Fartash Faghri, Chun-Liang Li, Hadi Pouransari, Nate True, Albert Antony, Gokul Santhanam, James Gabriel, Peter Grasch, and @onceltuzel.bsky.social
19.12.2024 19:22
WVD Pipeline
Image-to-3D, monocular depth estimation, camera pose estimation, …, can we achieve all of this with just ONE model easily?
Our answer is Yes -- Excited to introduce our latest work: World-consistent Video Diffusion (WVD) with Explicit 3D Modeling!
arxiv.org/abs/2412.01821
04.12.2024 13:41
AI / ML comms person (formerly Meta, Linden Lab). Guitar in Butterfly Knives. Vespa enthusiast.
Google Chief Scientist, Gemini Lead. Opinions stated here are my own, not those of Google. Gemini, TensorFlow, MapReduce, Bigtable, Spanner, ML things, ...
Research Scientist at @Apple. Previous: @Meta (FAIR), @Inria, @MSFTResearch, @VectorInst and @UofG. Egyptian 🇪🇬
machine learning researcher @ Apple machine learning research
Ramen whisperer, bad throat singer
ML Research @ Apple.
Understanding deep learning (generalization, calibration, diffusion, etc).
preetum.nakkiran.org
Internet pedestrian. Machine learning mercenary. (he/him/his)
https://laurent-dinh.github.io/
Research scientist at Apple | machine learning, optimization, language modeling
pierreablin.com
Coffee Lover • Husky Dad • ML Researcher @ Apple • Berkeley Grad
Professor of HCII and LTI at Carnegie Mellon School of Computer Science.
jeffreybigham.com
Differential Privacy. Machine Learning. Apple.
official Bluesky account (check username)
Bugs, feature requests, feedback: support@bsky.app