Adding a bunch of tags for discoverability: #machinetranslation #flores #seed #languages #multilinguality #ai #nlp #mt
05.07.2025 13:18 — 👍 1 🔁 0 💬 0 📌 0
The Seed training dataset also received a few submissions, including new translations into Spanish and Italian (from which it might be easier to translate into lower-resourced languages).
05.07.2025 13:17 — 👍 1 🔁 0 💬 1 📌 0
BTW, last year, as part of the previous shared task (aclanthology.org/2024.wmt-1.4), FLORES+ was extended with the languages Emakhuwa, Erzya, Tuvan, Karakalpak, Aragonese, Aranese, Asturian, Valencian, and Wu Chinese, and received a number of edits to other languages.
05.07.2025 13:16 — 👍 0 🔁 0 💬 1 📌 0
What to do now?
- Download the dataset and benchmark multilingual models: huggingface.co/datasets/ope...
- Subscribe to our newsletter: openlanguagedata.substack.com/about
- Participate in the WMT25 Open Data shared task to enrich open datasets with new languages: www2.statmt.org/wmt25/open-d...
05.07.2025 13:15 — 👍 1 🔁 0 💬 1 📌 0
We (oldi.org) recently released version 3.0 of the FLORES+ dataset: a benchmark for multilingual machine translation.
In this version, we added the Ladin language (there are now 222 language varieties in the dataset!), corrected the spelling for Chuvash and Dargwa, and fixed the sentence order in Aranese.
05.07.2025 13:14 — 👍 2 🔁 0 💬 1 📌 0
Associate Professor at GroNLP ( @gronlp.bsky.social ) #NLP | Multilingualism | Interpretability | Language Learning in Humans vs NeuralNets | Mum^2
Head of the InClow research group: https://inclow-lm.github.io/
Postdoc @rug.nl with Arianna Bisazza.
Interested in NLP, interpretability, syntax, language acquisition and typology.
Professor at NYU; Chief AI Scientist at Meta.
Researcher in AI, Machine Learning, Robotics, etc.
ACM Turing Award Laureate.
http://yann.lecun.com
I lead Cohere For AI. Formerly Research Google Brain. ML Efficiency, LLMs, @trustworthy_ml.
Researcher at Cohere | Multilingual LLM evaluation
Assistant Prof at GMU. NLP, CompLing, ML, and other things language+humans
EMNLP 2026 - The annual Conference on Empirical Methods in Natural Language Processing
Dates: October 2026 in Hungary
Hashtags: #EMNLP2026 #NLP
Submission Deadline: May 25, 2026 (TBC)
Research Scientist Meta/FAIR, Prof. University of Geneva, co-founder Neural Concept SA. I like reality.
https://fleuret.org
Host of scientific papers for @aclmeeting.bsky.social and other venues in the field of natural language processing. https://aclanthology.org/
#NLP #NLProc
The 26th Nordic Conference on Computational Linguistics will be held in Copenhagen, Denmark in 2027
🌐 TBD
NLP. NMT. Main author of Marian NMT. Research Scientist at Microsoft Translator.
https://marian-nmt.github.io
🗣️ Teaching you about the science and diversity of language
📰 Sign up for the newsletter at LinguisticDiscovery.Substack.com or LinguisticDiscovery.com!
⭐business: linguisticae@hildgard.fr | 🔍Linguistics, sociology, politics |
Associate Professor in Computer Science at the University of Maryland. Human-Centered Natural Language Processing & Machine Translation
The world's largest online linguistic resource (https://linguistlist.org).
Senior Research Engineer with the Common Crawl Foundation.
(languages ∪ tech) in Dùn Èideann
linguist bylinina.github.io
Senior Research Scientist at FAIR (Meta)
hadyelsahar.io
official Bluesky account (check username👆)
Bugs, feature requests, feedback: support@bsky.app