✨ Introducing a new #SOTA action recognition large multimodal language model: #LLaVAction!
By @shaokaiye.bsky.social, Haozhe Qi, @trackingskills.bsky.social, and me!
Paper: arxiv.org/abs/2503.18712
Project page: mmathislab.github.io/llavaction/
1/n