
Chengzu

@chengzu-li.bsky.social

PhD student at Language Technology Lab, University of Cambridge

56 Followers  |  100 Following  |  9 Posts  |  Joined: 27.11.2024

Latest posts by chengzu-li.bsky.social on Bluesky

Round of applause for the fantastic collaborators in this project: Wenshan Wu, Huanyu Zhang, Yan Xia, Shaoguang Mao, Li Dong, Ivan Vulić and Furu Wei 🥳🥳

14.01.2025 14:50 · 👍 2    🔁 0    💬 0    📌 0
Link preview: "Imagine while Reasoning in Space: Multimodal Visualization-of-Thought"
Chain-of-Thought (CoT) prompting has proven highly effective for enhancing complex reasoning in Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs). Yet, it struggles in complex ...

📄 Dive Deeper into MVoT

Discover how MVoT rewrites the rules with details like loss design, image tokenization and interleaved multimodal training.
👉 Read our paper on arXiv: arxiv.org/abs/2501.07542

14.01.2025 14:50 · 👍 1    🔁 0    💬 1    📌 0

🔗 MVoT + CoT: New Ceiling for Reasoning

MVoT doesn't replace CoT; it elevates it. Combining MVoT with CoT brings multimodal and verbal reasoning together, pushing the performance upper bound and showing that two reasoning paradigms can be better than one!

14.01.2025 14:50 · 👍 0    🔁 0    💬 1    📌 0
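One concrete, simplified way to picture the MVoT + CoT combination, not necessarily the procedure used in the paper, is to run a text-only CoT pass and an MVoT pass on the same problem and then reconcile their answers. The sketch below assumes the two answers have already been produced elsewhere; combine_answers is an illustrative helper, not part of the paper's code.

    def combine_answers(cot_answer: str, mvot_answer: str) -> str:
        """Illustrative heuristic: reconcile a verbal CoT answer with an MVoT answer."""
        if cot_answer == mvot_answer:
            return cot_answer          # both reasoning paradigms agree
        return mvot_answer             # otherwise prefer the visually grounded MVoT trace

    # usage with answers produced by the two prompting strategies
    print(combine_answers("move down", "move down"))   # -> "move down"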

🎨 Revolutionizing Visual Reasoning with Token Discrepancy Loss

Messy visuals? Not anymore. Our token discrepancy loss ensures that MVoT generates accurate, meaningful visualizations with less redundancy.

Result? Better images, clearer reasoning, stronger performance.

14.01.2025 14:50 · 👍 0    🔁 0    💬 1    📌 0
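To make the intuition behind a token discrepancy loss concrete: the idea is to penalize probability mass that the model places on image-codebook tokens whose embeddings sit far from the ground-truth token's embedding, so near-misses are visually similar rather than arbitrary. The exact formulation is in the paper (arxiv.org/abs/2501.07542); the PyTorch sketch below is only an approximation of that intuition, with illustrative tensor names and a hypothetical codebook from a discrete image tokenizer.

    import torch
    import torch.nn.functional as F

    def token_discrepancy_loss(logits, target_ids, codebook):
        """Sketch of a discrepancy-style loss over generated image tokens.

        logits:     (seq, vocab) scores over the visual codebook at each position
        target_ids: (seq,)       ground-truth image-token ids
        codebook:   (vocab, dim) embedding table of the image tokenizer
        """
        probs = F.softmax(logits, dim=-1)                 # predicted distribution per position
        target_emb = codebook[target_ids]                 # embeddings of the correct tokens
        dist = torch.cdist(target_emb, codebook).pow(2)   # distance to every candidate token
        # probability mass on visually dissimilar tokens is penalized
        return (probs * dist).sum(dim=-1).mean()

    # illustrative shapes: 16 image-token positions, a 512-entry codebook of dim 32
    loss = token_discrepancy_loss(torch.randn(16, 512),
                                  torch.randint(0, 512, (16,)),
                                  torch.randn(512, 32))

In training, a term like this would typically sit alongside the usual cross-entropy over image tokens rather than replace it.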

🎯 Performance Boosts with MVoT

MVoT isn't just new; it's better.
🔥 Better and more stable performance than CoT, particularly in complex scenarios like FrozenLake.
🌟 Plug-and-play power: Supercharges models like GPT-4o for unprecedented versatility.

14.01.2025 14:50 · 👍 0    🔁 0    💬 1    📌 0

🧠 MVoT

MVoT moves beyond Chain-of-Thought (CoT), letting AI imagine what it is thinking through generated images. By blending verbal and visual reasoning, MVoT makes tackling complex problems more intuitive, interpretable, and powerful.

14.01.2025 14:50 · 👍 0    🔁 0    💬 1    📌 0
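To make "imagining while reasoning" concrete, here is a minimal, hypothetical sketch of interleaved decoding: the model alternates between emitting a verbal thought and a visual thought (a generated image fed back as context) until it commits to an answer. The model object and its generate_text / generate_image methods are placeholders for a multimodal model that can emit both word tokens and image tokens; they are not the paper's actual API.

    def mvot_reason(model, question, image, max_steps=8):
        """Hypothetical interleaved decoding loop for visualization-of-thought."""
        context = [image, question]
        for _ in range(max_steps):
            thought = model.generate_text(context)       # verbal reasoning step
            context.append(thought)
            if "final answer" in thought.lower():        # simple stop condition
                return thought
            sketch = model.generate_image(context)       # visual thought: generated image
            context.append(sketch)                       # fed back as multimodal context
        return model.generate_text(context + ["So, the final answer is:"])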

Forget just thinking in words.

🔔 Our New Preprint:
🚀 New Era of Multimodal Reasoning 🚨
🔍 Imagine While Reasoning in Space with MVoT

Multimodal Visualization-of-Thought (MVoT) revolutionizes reasoning by generating visual "thoughts" that transform how AI thinks, reasons, and explains itself.

14.01.2025 14:50 · 👍 6    🔁 1    💬 1    📌 0

Hi, I would love to be added to the list! Thanks!

05.12.2024 15:06 · 👍 1    🔁 0    💬 0    📌 0

🙋 Working on VLMs and would love to be added! Thanks!

05.12.2024 15:02 · 👍 1    🔁 0    💬 1    📌 0
