We're thrilled that SEA-VL has been accepted to ACL 2025 (Main)!
Thank you to everyone who contributed to this project 🥳
Paper: arxiv.org/abs/2503.07920
Project: seacrowd.github.io/seavl-launch/
#ACL2025NLP #SEACrowd #ForSEABySEA
@seacrowd.bsky.social
Advancing Southeast Asian (SEA) NLP Research https://seacrowd.github.io/
Let's build a VLM that sees and celebrates Southeast Asia, together. 💪
@josephimperial.bsky.social @samuel-cahyawijaya.bsky.social @jcblaise.bsky.social @ruochenzhang.bsky.social @rianadamr.bsky.social @antonrufino.bsky.social
Whether you're a researcher, developer, artist, linguist, photographer, student, or simply someone who loves Southeast Asia, your voice and skills matter. Join us!
Apply now: seacrowd.github.io//seavl-phase...
💬 Questions? Join the conversation on Discord: discord.gg/XXRHFuvkTA
Why contribute?
🤝 Work with an international team of passionate researchers
🏅 Earn points for every contribution, with opportunities for a certificate, exclusive merch (t-shirt & keychain), and even co-authorship on our final paper
We are looking for contributors who can:
🔹 Submit culturally relevant images from SEA
🔹 Annotate image submissions
🔹 Translate existing benchmarks to SEA languages
🔹 Create high-quality questions for multicultural images from SEA
🔹 Create high-quality prompts for image generation with our VLM
We want to build the first open-source vision-language model (VLM) that fully captures Southeast Asia's rich cultures, languages, and everyday life!
08.05.2025 09:41
📢 Calling all SEA-passionate individuals!
SEACrowd is excited to launch our contributor call for SEA-VL Phase 2: Building Visual Language Models for Southeast Asia!
After the success of Phase 1, we're now taking on a bigger mission (see thread).
Interested in pushing research for Southeast Asian languages? We're happy to welcome you in SEACrowd and SIGSEA! See links below:
SIGSEA: www.sigsea.org/home
Discord: discord.gg/XXRHFuvkTA
Introducing SEA-VL with 1.3M culturally relevant images, 50x larger than existing datasets!
Key insights:
✅ Crowdsourcing: good accuracy but slow & costly
✅ Image Crawling: ~85% cultural relevance
❌ Image Generation fails to capture SEA nuances & faces licensing issues
Why is this important?
✅ AI models trained on culturally relevant data can better understand local contexts, traditions, and languages.
✅ Community contributions ensure AI does not misrepresent local identities.
✅ We empower local communities in AI development.
💡 That's why we created SEA-VL, an open-source initiative designed to bridge the resource gap and provide AI models with more accurate, culturally relevant data from SEA. But we couldn't have done it alone!
#NoLanguageLeftBehind #SoutheastAsia
AI is shaping the future, but how often does it reflect the cultures, languages, and traditions of Southeast Asia? Not enough!
Most VL datasets used to train AI are dominated by Western-centric data, leaving Southeast Asian cultures largely underrepresented.
SEA-VL: Building AI for Southeast Asian Research
We release SEA-VL, the largest vision-language dataset tailored for SEA's diverse culture.
arXiv: arxiv.org/abs/2503.07920
🤗 Data: huggingface.co/collections/...
Check the thread 🧵