Stefano

@sted19.bsky.social

PhD Student @SapienzaNLP Applied Scientist Intern @Amazon Madrid

8 Followers  |  10 Following  |  11 Posts  |  Joined: 21.02.2025  |  1.623

Latest posts by sted19.bsky.social on Bluesky

Estimating Machine Translation Difficulty Machine translation quality has steadily improved over the years, achieving near-perfect translations in recent benchmarks. These high-quality outputs make it difficult to distinguish between state-of...

📄 Paper: arxiv.org/abs/2508.10175

🤗 Models: huggingface.co/collections/...

💻 Code: github.com/zouharvi/tra...

16.09.2025 08:46 | 👍 2    🔁 0    💬 0    📌 0

A huge thanks to my fantastic co-authors: Lorenzo Proietti, @zouharvi.bsky.social, Roberto Navigli, and @kocmitom.bsky.social.

#AI #NLProc #Evaluation

16.09.2025 08:46 | 👍 2    🔁 0    💬 1    📌 0

🤖 We release our best models, sentinel-src-24 and sentinel-src-25! Use them to build more robust evaluations, filter data, or explore applications in other areas such as curriculum learning.

16.09.2025 08:46 | 👍 2    🔁 0    💬 1    📌 0

🔍 Our most surprising finding? LLM-based methods struggle with this task, performing worse than even simple heuristics like sentence length. In contrast, our specialized, trained models are the clear winners.

16.09.2025 08:46 | 👍 3    🔁 0    💬 1    📌 0

In our paper, we:
1️⃣ Define the task and introduce Difficulty Estimation Correlation to evaluate difficulty estimators.
2️⃣ Benchmark a wide range of methods, establishing the first SOTA.
3️⃣ Demonstrate their effectiveness in building more challenging test sets automatically.

16.09.2025 08:46 | 👍 2    🔁 0    💬 1    📌 0

💡 Our solution: increase benchmark difficulty!

What if we could predict in advance which texts are hard to translate? We introduce Translation Difficulty Estimation as a novel task to automatically identify challenging texts for MT systems.

16.09.2025 08:46 | 👍 2    🔁 0    💬 1    📌 0

Our new #EMNLP2025 paper is out: "Estimating Machine Translation Difficulty"! 🚀

Are today's #MachineTranslation systems flawless? When SOTA models all achieve near-perfect scores on standard benchmarks, we hit an evaluation ceiling. How can we tell their true capabilities and drive future progress?

16.09.2025 08:46 | 👍 8    🔁 2    💬 1    📌 2

🤖 We release our best models, sentinel-src-24 and sentinel-src-25! Use them to build more robust evaluations, filter data, or explore applications in other areas such as curriculum learning.

16.09.2025 08:41 | 👍 0    🔁 0    💬 0    📌 0

🔍 Our most surprising finding? LLM-based methods struggle with this task, performing worse than even simple heuristics like sentence length. In contrast, our specialized, trained models are the clear winners.

16.09.2025 08:41 | 👍 0    🔁 0    💬 1    📌 0

In our paper, we:

1️⃣ Define the task and introduce Difficulty Estimation Correlation to evaluate difficulty estimators.
2️⃣ Benchmark a wide range of methods, establishing the first SOTA.
3️⃣ Demonstrate their effectiveness in building more challenging test sets automatically.

16.09.2025 08:41 | 👍 0    🔁 0    💬 1    📌 0

💡 Our solution: increase benchmark difficulty!

What if we could predict in advance which texts are hard to translate? We introduce Translation Difficulty Estimation as a novel task to automatically identify challenging texts for MT systems.

16.09.2025 08:41 | 👍 0    🔁 0    💬 1    📌 0
