@cointegrated.bsky.social

9 Followers  |  20 Following  |  5 Posts  |  Joined: 24.01.2025

Latest posts by cointegrated.bsky.social on Bluesky

Adding a bunch of tags for discoverability: #machinetranslation #flores #seed #languages #multilinguality #ai #nlp #mt

05.07.2025 13:18

The Seed training dataset also received a few submissions, including new translations into Spanish and Italian (from which it might be easier to translate into lower-resourced languages).

05.07.2025 13:17

BTW, last year, as part of the previous shared task (aclanthology.org/2024.wmt-1.4), FLORES+ was extended with Emakhuwa, Erzya, Tuvan, Karakalpak, Aragonese, Aranese, Asturian, Valencian, and Wu Chinese, and several existing languages received corrections.

05.07.2025 13:16

What to do now?
- Download the dataset and benchmark multilingual models: huggingface.co/datasets/ope...
- Subscribe to our newsletter: openlanguagedata.substack.com/about
- Participate in the WMT25 Open Data shared task to enrich open datasets with new languages: www2.statmt.org/wmt25/open-d...

05.07.2025 13:15

We (oldi.org) recently released version 3.0 of the FLORES+ dataset: a benchmark for multilingual machine translation.

In this version, we added the Ladin language (the dataset now covers 222 language varieties!), corrected spelling in the Chuvash and Dargwa data, and fixed the sentence order in Aranese.

05.07.2025 13:14
