Ready for our poster today at #COLM2025!
This paper has had an interesting journey; come find out and discuss it with us! @swetaagrawal.bsky.social @kocmitom.bsky.social
Side note: being a parent in research does have its perks - poster transportation solved!
08.10.2025 12:16
This project wouldn't have been possible without the brilliant minds driving the work: Lorenzo Proietti, @sted19.bsky.social and @zouharvi.bsky.social
16.09.2025 09:51
One way to raise the bar is to rethink the source selection process: instead of random samples, we built a model that chooses the most difficult data for translation. And we've already put our work into practice: this year's WMT25 General MT test set uses our approach to make the evaluation more challenging.
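For the curious, here is a minimal sketch of what difficulty-based source selection can look like. The scorer below is a toy heuristic standing in for a learned difficulty model; all names are illustrative, not our actual implementation:

```python
# Toy sketch of difficulty-based source selection (illustrative, not the real model).
from typing import List

def difficulty(segment: str) -> float:
    """Hypothetical difficulty proxy: longer segments with longer tokens score higher.
    A real implementation would use a trained difficulty regressor."""
    tokens = segment.split()
    if not tokens:
        return 0.0
    avg_token_len = sum(len(t) for t in tokens) / len(tokens)
    return len(tokens) * avg_token_len

def select_hardest(pool: List[str], k: int) -> List[str]:
    """Keep the k candidate source segments the scorer rates as most difficult."""
    return sorted(pool, key=difficulty, reverse=True)[:k]

pool = [
    "The cat sat on the mat.",
    "Quarterly amortization schedules complicate cross-jurisdictional audits.",
    "Hello!",
]
print(select_hardest(pool, k=1))  # picks the dense, terminology-heavy segment
```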
16.09.2025 09:51
🚩 Machine Translation is far from "solved" - the test sets just got too easy. 🚩
Yes, the systems are much stronger. But the other half of the story is that test sets haven't kept up. It's no longer enough to just take a random news article and expect systems to stumble.
16.09.2025 09:51
Command A Translate: Secure translation for global enterprises
The new industry standard for secure, enterprise-ready machine translation.
Oh, and the best part: we're releasing the weights so researchers can run wild with it. Stay tuned for our upcoming technical report!
cohere.com/blog/command...
28.08.2025 19:55
Thrilled to share what I've been working on at Cohere!
What began in January as a scribble in my notebook ("how challenging would it be...") turned into a fully fledged translation model that outperforms both open and closed-source systems, including long-standing MT leaders.
28.08.2025 19:55
A correction: we received 22 multilingual systems and only 14 bilingual ones, highlighting the field's shift towards multilinguality.
26.08.2025 19:06
We received 14 specialized systems, while 10 are multilingual. Almost all participants fine-tuned LLMs.
In contrast to previous years, constrained systems are now reaching top-tier rankings, challenging the dominance of unconstrained ones.
Stay tuned for the 20th anniversary WMT conference.
23.08.2025 09:28
Participation grew again this year: 36 unique teams competed to improve MT performance. We also collected outputs from 24 popular LLMs and online systems, reaching 50 evaluated systems in our annual benchmark.
23.08.2025 09:28
Preliminary Ranking of WMT25 General Machine Translation Systems
We present the preliminary ranking of the WMT25 General Machine Translation Shared Task, in which MT systems have been evaluated using automatic metrics. As this ranking is based on automatic evaluati...
Preliminary ranking of the WMT 2025 General Machine Translation benchmark is here!
But don't draw conclusions just yet - automatic metrics are biased in favor of techniques like using a metric as a reward model or MBR decoding. The official human ranking will be part of the General MT findings at WMT.
arxiv.org/abs/2508.14909
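To make the bias concrete: MBR decoding picks the candidate that a metric-like utility scores highest against all other candidates, so evaluating with a similar metric naturally inflates that system's score. A minimal sketch, with a toy overlap utility standing in for a neural metric:

```python
# Minimal MBR decoding sketch; `utility` is a toy stand-in for a learned MT metric.
from typing import List

def utility(hypothesis: str, reference: str) -> float:
    """Toy utility: unigram F1 overlap (a real setup would use a neural metric)."""
    h, r = set(hypothesis.split()), set(reference.split())
    if not h or not r:
        return 0.0
    overlap = len(h & r)
    precision, recall = overlap / len(h), overlap / len(r)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def mbr_decode(candidates: List[str]) -> str:
    """Return the candidate with the highest average utility against the others,
    which act as pseudo-references."""
    return max(
        candidates,
        key=lambda c: sum(utility(c, r) for r in candidates if r is not c),
    )

samples = ["the cat sat", "a cat sat down", "the cat sat down"]
print(mbr_decode(samples))  # the consensus-like candidate: "the cat sat down"
```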
23.08.2025 09:28
WMT 2025
Hey, hey! We've released the blind test set for this year's WMT General MT and multilingual instruction tasks. Submit your systems to the special 20th anniversary edition of the conference and see how you compare to others!
The deadline is next week on 3rd July.
www2.statmt.org/wmt25/
26.06.2025 18:09
Tired of messy, non-replicable multilingual LLM evaluation? So were we.
In our new paper, we experimentally illustrate common evaluation issues and show how structured evaluation design, transparent reporting, and meta-evaluation can help us build stronger models.
17.04.2025 13:12
Summer internship at Cohere!
Are you excited about multilingual evaluation, human judgment, or meta-evaluation? Come help us explore what rigorous evaluation really looks like while questioning the status quo in LLM evaluation.
I'm looking for an intern (EU timezone preferred). Interested? Ping me!
28.03.2025 16:44
It's here! Our new model's technical report is out. I'm especially proud of the work we did on its multilingual capabilities - this was a massive, collective effort!
27.03.2025 16:42
Multilingual Instruction Shared Task
Big news from WMT! We are expanding beyond MT and launching a new multilingual instruction shared task. Our goal is to foster truly multilingual LLM evaluation and best practices in automatic and human evaluation. Join us and build the winning multilingual system!
www2.statmt.org/wmt25/multil...
11.03.2025 18:26
AI is evolving fast, and Aya Vision is proof of that. This open-weights model is designed to make LLMs more powerful across languages and modalities, especially vision! Can't wait to see the real-world applications, perhaps at WMT this year.
04.03.2025 14:40
WMT24++: Expanding the Language Coverage of WMT24 to 55 Languages & Dialects
As large language models (LLMs) become more and more capable in languages other than English, it is important to collect benchmark datasets in order to evaluate their multilingual performance, includin...
Huge shoutout to colleagues at Google & Unbabel for extending our WMT24 test set to 55 languages in four domains - this is a game changer!
I really hope it puts the final nail in the coffin of FLORES and WMT14. The field is evolving; legacy test sets can't show your progress.
arxiv.org/abs/2502.124...
01.03.2025 20:30
Shared Task: General Machine Translation
* Revamped constrained track: no restrictions on training data except licensing; all open models under 20B parameters are allowed.
* More challenging sources; long-context translation; prompt preambles; and much more.
All details are available at www2.statmt.org/wmt25/transl...
20.02.2025 21:31
* New human-evaluated language pairs: EN→Arabic, EN→Estonian, EN→Korean, EN→Serbian, Czech→German, Bhojpuri→EN, Maasai→EN
* New multilingual subtask: can you build a system that translates 30 languages?
* New modalities: additional context from video and image (text-to-text remains the core).
20.02.2025 21:31
Guess what? The jubilee 20th iteration of WMT General MT is here, and we want you to participate - the entry barrier to making an impact is low!
This isn't just any repeat. We've kept what worked, removed what was outdated, and introduced many exciting new twists! Among the key changes are:
20.02.2025 21:31
Yeah, I haven't written a paper since it's just a different prompt. It's published in the GitHub repository of GEMBA.
09.02.2025 10:14
That one is extremely large, but we haven't used it in the automatic ranking either. Unfortunately, I'm not aware of any API service for metrics.
08.02.2025 11:44
A huge thank you to all organizers, partners, and participants for making this year's WMT General MT Shared Task a success! Stay tuned for WMT25 - many exciting changes are coming!
20.11.2024 10:16
Highlights from top systems:
✅ IOL-Research: led the constrained/open track, winning 10/11 language pairs in its category.
✅ Unbabel-Tower70B: best participating system, winning 8/11 pairs.
✅ Claude-3.5-Sonnet: best overall, with 9/11 wins.
✅ Shoutout to Dubformer (speech) and CUNI-MH (a strong constrained system).
20.11.2024 10:16
We introduced a new robust and efficient human evaluation protocol: Error Span Annotation (ESA); a toy sketch of an annotation record follows below.
Test sets are now finally document-level!
We've added three new language pairs, including English-Spanish, where translations are near-perfect.
For more details, read our findings paper.
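For those unfamiliar with ESA: annotators highlight error spans in the translation, mark their severity, and then give a single 0-100 segment score. Here is a toy sketch of what one annotation record can look like; the field names are illustrative, not the official WMT schema:

```python
# Illustrative ESA-style annotation record; not the official WMT data format.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ESAAnnotation:
    segment: str                              # the translation being judged
    # (start, end, severity): character offsets into `segment`, end-exclusive;
    # severity is e.g. "minor" or "major"
    error_spans: List[Tuple[int, int, str]] = field(default_factory=list)
    score: int = 100                          # final 0-100 segment quality rating

ann = ESAAnnotation(
    segment="The cat sat in the math.",
    error_spans=[(19, 23, "major")],  # "math" flagged as a mistranslation
    score=70,
)
print(ann)
```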
20.11.2024 10:16
Exciting times at this year's WMT24 General MT Shared Task:
Participant numbers increased by over 50%!
Decoder-only architectures are leading the way.
We've introduced a new speech audio modality domain.
Online systems are losing ground to LLMs.
20.11.2024 10:16