Round of applause for the fantastic collaborators in this project: Wenshan Wu, Huanyu Zhang, Yan Xia, Shaoguang Mao, Li Dong, Ivan Vulić and Furu Wei 🥳🥳
14.01.2025 14:50
Dive Deeper into MVoT
Discover how MVoT rewrites the rules with details like loss design, image tokenization and interleaved multimodal training.
Read our paper on arXiv: arxiv.org/abs/2501.07542
MVoT + CoT: New Ceiling for Reasoning
MVoT doesn't replace CoT; it elevates it. Combining MVoT and CoT pairs multimodal reasoning with verbal reasoning and raises the performance upper bound, suggesting that two reasoning paradigms can be better than one!
Revolutionizing Visual Reasoning with Token Discrepancy Loss
Messy visuals? Not anymore. Our token discrepancy loss ensures that MVoT generates accurate, meaningful visualizations with less redundancy.
Result? Better images, clearer reasoning, stronger performance.
Performance Boosts with MVoT
MVoT isn't just new, it's better.
Better and more stable performance than CoT, particularly in complex scenarios like FrozenLake.
Plug-and-play power: Supercharges models like GPT-4o for unprecedented versatility.
MVoT
MVoT moves beyond Chain-of-Thought (CoT) to enable AI to imagine what it thinks with generated visual images. By blending verbal and visual reasoning, MVoT makes tackling complex problems more intuitive, interpretable, and powerful.
Forget just thinking in words.
Our New Preprint:
New Era of Multimodal Reasoning
Imagine While Reasoning in Space with MVoT
Multimodal Visualization-of-Thought (MVoT) revolutionizes reasoning by generating visual "thoughts" that transform how AI thinks, reasons, and explains itself.
Hi, would love to be added to the list! Thanks!
05.12.2024 15:06
Working on VLMs and would love to be added! Thanks!
05.12.2024 15:02