
Vitalii Hirak

@v-hirak.bsky.social

PhD student in Natural Language Processing and Information Retrieval at the University of Düsseldorf. Working in the EmergentIR project at GESIS Cologne.

7 Followers  |  6 Following  |  9 Posts  |  Joined: 25.01.2025

Latest posts by v-hirak.bsky.social on Bluesky

6/6: We hope our work will inspire further research on the intrinsic difficulty of translating and generating different languages in the age of LLMs, particularly through experimentation with alternative decoding strategies.

For now, I'm looking forward to presenting our work in Rabat, Morocco 🇲🇦

08.02.2026 16:56 | 👍 1 🔁 0 💬 0 📌 0

5/6: When searching for the model's highest-probability translation, we found that languages with more complex morphology and more flexible word order benefit more from a wider beam.

In other words, the standard practice of left-to-right beam search may be suboptimal for these languages.

08.02.2026 16:56 | 👍 1 🔁 0 💬 1 📌 0
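
To make the beam-size point in 5/6 concrete, here is a minimal sketch of left-to-right beam search over a toy next-token distribution. Everything in it is an assumption for illustration: `toy_model`, its probabilities, and the `</s>` end marker are made up, there is no length normalization, and this is not the decoding setup used for NLLB-200 or Tower+. It only shows how a beam of size 1 can commit early to a locally likely token and miss the sequence with the highest overall probability, which a wider beam recovers.

```python
import math


def toy_model(prefix):
    """Hypothetical next-token log-probabilities for a given prefix."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.5, "y": 0.5},
        ("b",): {"z": 0.9, "w": 0.1},
    }
    # Any prefix not in the table can only be followed by end-of-sequence.
    probs = table.get(prefix, {"</s>": 1.0})
    return {token: math.log(p) for token, p in probs.items()}


def beam_search(next_log_probs, beam_size, max_len, eos="</s>"):
    """Return the best (tokens, log_prob) hypothesis found with the given beam."""
    beams = [((), 0.0)]  # unfinished hypotheses: (token tuple, summed log-prob)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for prefix, score in beams:
            for token, log_p in next_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + log_p))
        # Keep only the beam_size highest-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            if prefix[-1] == eos:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len without an end marker
    return max(finished, key=lambda c: c[1])


narrow = beam_search(toy_model, beam_size=1, max_len=4)
wide = beam_search(toy_model, beam_size=2, max_len=4)
print(narrow[0], round(math.exp(narrow[1]), 2))
print(wide[0], round(math.exp(wide[1]), 2))
```

In this toy setup the size-1 beam commits to "a" (probability 0.6) and ends with a sequence of probability 0.30, while the size-2 beam keeps "b" alive and finds the sequence with probability 0.36.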
Spearman correlations between continuous language properties and NLLB-200 chrF++ translation quality scores at beam size k = 5. Source language is English. Sample sizes (i.e. number of target languages) for each property are indicated next to their respective bars. Correlations significant at p < 0.05 are marked with *, at p < 0.01 with **, at p < 0.001 with ***.

4/6: Through correlation and regression experiments, we found that language properties like typological distance, type/token ratio, and head-finality drive translation quality of both NMT models, even after controlling for more trivial factors such as language resourcedness and script similarity.

08.02.2026 16:56 | 👍 1 🔁 0 💬 1 📌 0
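
As a rough illustration of the correlation analysis mentioned in 4/6, the sketch below computes a Spearman rank correlation in plain Python as the Pearson correlation of average ranks. The helper names and the sample values are hypothetical and invented for this example; they are not data or code from the paper, and the significance testing and regression controls described in the post are not covered here.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up illustration values, NOT results from the paper.
type_token_ratio = [0.12, 0.25, 0.31, 0.40, 0.55]
chrf_scores = [58.0, 52.5, 49.0, 50.1, 41.3]
rho = spearman(type_token_ratio, chrf_scores)
print(round(rho, 3))
```

A negative value here would mean that, in the invented sample, languages with a higher type/token ratio tend to have lower chrF++ scores.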
Tower+ 9B chrF++ scores vs. NLLB-200 3.3B chrF++ scores at beam size k = 7. Each point denotes a language pair and is colored by source language, while ▼ denotes target languages officially supported by Tower+. The blue and orange shaded regions indicate language pairs for which either NLLB-200 or Tower+ scores are higher, respectively. Sample size is n = 7 × 52 = 364.

3/6: We analyze 2 NMT models, NLLB-200 and Tower+.

Although current SOTA has shifted to prompting decoder-only LLMs such as Tower+, we find that NLLB achieves higher chrF++ scores on all languages outside Tower's coverage, reaffirming the relevance of encoder-decoders for low-resourced languages.

08.02.2026 16:56 | 👍 1 🔁 0 💬 1 📌 0

2/6: First, we compile a broad set of fine-grained typological and morphosyntactic features for 212 languages in the FLORES+ MT benchmark. We release this set publicly: github.com/v-hirak/expl...

08.02.2026 16:56 | 👍 1 🔁 0 💬 1 📌 0

Assessing the Impact of Typological Features on Multilingual Machine Translation in the Age of Large Language Models
Despite major advances in multilingual modeling, large quality disparities persist across languages. Besides the obvious impact of uneven training resources, typological properties have also been prop...

Our paper has been accepted to #EACL2026 main conference!

Together with @jumelet.bsky.social and @arianna-bis.bsky.social, we study the effect of target language typology on the difficulty of state-of-the-art neural machine translation.

arXiv preprint: arxiv.org/abs/2602.03551

1/6 Our findings ⬇️

08.02.2026 16:56 | 👍 15 🔁 1 💬 1 📌 1

Henry Cavill is a creep though

17.06.2025 10:54 | 👍 0 🔁 0 💬 0 📌 0

They aren't canonizing anything, this show is gonna be as canon as the millions of other people's playthroughs. It's just their take on the story

17.06.2025 10:53 | 👍 0 🔁 0 💬 1 📌 0

Thank you from a Ukrainian, Kala, sincerely 🙏 I love your Mass Effect content

28.02.2025 22:34 | 👍 4 🔁 0 💬 0 📌 0
