📄 Paper: arxiv.org/abs/2508.10175
🤗 Models: huggingface.co/collections/...
💻 Code: github.com/zouharvi/tra...
@sted19.bsky.social
PhD Student @SapienzaNLP, Applied Scientist Intern @Amazon Madrid
A huge thanks to my fantastic co-authors: Lorenzo Proietti, @zouharvi.bsky.social, Roberto Navigli, and @kocmitom.bsky.social. 🙏
#AI #NLProc #Evaluation
🤗 We release our best models, sentinel-src-24 and sentinel-src-25! Use them to build more robust evaluations, filter data, or explore applications in other areas such as curriculum learning.
16.09.2025 08:46
Our most surprising finding? LLM-based methods struggle with this task, performing worse than even simple heuristics like sentence length. In contrast, our specialized, trained models are the clear winners.
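To make the baseline concrete: a sentence-length heuristic of the kind mentioned above can be as simple as scoring each source text by its token count. This is a hypothetical sketch (function name and tokenization are my own; the paper's exact baselines may differ):

```python
def length_difficulty(sources):
    """Sentence-length baseline: score each source text by its
    whitespace-token count, treating longer inputs as harder."""
    return [len(s.split()) for s in sources]

# Rank a few candidate sources from easiest to hardest to translate.
sources = [
    "Hello!",
    "The agreement was signed yesterday.",
    "Notwithstanding the aforementioned caveats, the committee deferred its ruling.",
]
scores = length_difficulty(sources)
ranked = [s for _, s in sorted(zip(scores, sources))]
print(ranked)
```

Crude as it is, a scorer like this already induces a difficulty ranking over a test set, which is all a difficulty estimator needs to produce.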
In our paper, we:
1️⃣ Define the task and introduce Difficulty Estimation Correlation to evaluate difficulty estimators.
2️⃣ Benchmark a wide range of methods, establishing the first SOTA.
3️⃣ Demonstrate their effectiveness in building more challenging test sets automatically.
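The Difficulty Estimation Correlation mentioned above can be pictured as correlating predicted difficulty with the translation error actually observed per segment. This is a minimal illustration with made-up numbers and a plain Pearson correlation — the paper's exact formulation may differ:

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: per-segment predicted difficulty vs. observed
# translation error (e.g., 100 minus the average MT quality score).
predicted_difficulty = [0.2, 0.5, 0.9]
observed_error = [10.0, 30.0, 55.0]
print(pearson(predicted_difficulty, observed_error))
```

A good difficulty estimator should yield a correlation near 1: the segments it flags as hard are exactly the ones systems get wrong.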
💡 Our solution: increase benchmark difficulty!
What if we could predict in advance which texts are hard to translate? We introduce Translation Difficulty Estimation as a novel task to automatically identify challenging texts for MT systems.
Our new #EMNLP2025 paper is out: "Estimating Machine Translation Difficulty"! 🚀
Are today's #MachineTranslation systems flawless? When SOTA models all achieve near-perfect scores on standard benchmarks, we hit an evaluation ceiling. How can we assess their true capabilities and drive future progress?