Form and Meaning in Intrinsic Multilingual Evaluations
Intrinsic evaluation metrics for conditional language models, such as perplexity or bits-per-character, are widely used in both mono- and multilingual settings. These metrics are rather straightforwar...
New EACL paper (with @mdlhx.bsky.social)! We tested whether comparing perplexity on parallel data across languages is fair. Turns out: it depends. We show that the choice of test set (even with meaning held constant) can flip conclusions about which language is easier to model.
Paper: arxiv.org/abs/2601.10580
28.01.2026 13:25
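Perplexity depends on the tokenizer's segmentation, so a common normalization for cross-language comparison (a minimal sketch of the general idea, not the paper's exact setup) is bits-per-character, which rescales the model's total negative log-likelihood by the character count of the raw text:

```python
import math

def bits_per_char(token_logprobs, num_chars):
    """Tokenizer-independent intrinsic metric: total negative
    log-likelihood (natural-log probabilities) rescaled to bits
    per character of the underlying text."""
    total_nll_nats = -sum(token_logprobs)
    return total_nll_nats / (math.log(2) * num_chars)
```

Because the denominator counts characters rather than tokens, two models with different tokenizers (or two languages with different segmentation granularity) are scored on the same scale.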
Authors: @wpoelman.bsky.social, Thomas Bauwens and @mdlhx.bsky.social
03.11.2025 12:06
Confounding Factors in Relating Model Performance to Morphology
Wessel Poelman, Thomas Bauwens, Miryam de Lhoneux. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.
We are presenting this paper at #EMNLP2025 in the "Multilinguality and Language Diversity" oral session this Wednesday (November 5th) from 11:00-12:30 (UTC+8). Paper: aclanthology.org/2025.emnlp-m... Code: github.com/LAGoM-NLP/Co...
03.11.2025 11:53
Our proposed tokenizer metrics are a step in that direction
03.11.2025 11:53
We disentangle more such factors in an attempt to outline what the "ideal" experiment would look like and how to work backwards to a feasible setup. This way, we outline the requirements to reliably answer whether, and how, morphology relates to language modeling.
03.11.2025 11:53
Finally, we look at experimental factors that confounded the setups and conclusions of prior research. Coarse language grouping is one of several such confounding factors.
03.11.2025 11:53
What's more: using entropy allows for finer-grained ordering of languages than the coarse groupings of "agglutinative" and "fusional".
03.11.2025 11:53
We compute the normalized entropy over each token's distribution of neighbors, and indeed find that agglutinative languages tend to have higher entropy than fusional languages on average.
03.11.2025 11:53
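A minimal sketch of this kind of statistic (my own illustration under assumed details, not the paper's code): count each token's right-neighbor distribution in a corpus, where the number of distinct neighbors is the accessor variety, and normalize the Shannon entropy of that distribution by its maximum:

```python
import math
from collections import Counter, defaultdict

def next_token_entropy(tokens):
    """Normalized entropy of each token's right-neighbor distribution.

    For each token, counts its right neighbors in the corpus; the number
    of distinct neighbors is the accessor variety (AV). Returns entropy
    normalized by log2(AV), so values lie in [0, 1]."""
    neighbors = defaultdict(Counter)
    for left, right in zip(tokens, tokens[1:]):
        neighbors[left][right] += 1
    entropy = {}
    for tok, counts in neighbors.items():
        total = sum(counts.values())
        av = len(counts)  # accessor variety: distinct right neighbors
        if av == 1:
            entropy[tok] = 0.0  # fully predictable next token
            continue
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropy[tok] = h / math.log2(av)  # normalize by maximum entropy
    return entropy
```

For example, on the toy corpus `"a b a c a b".split()`, the token "a" is followed by both "b" and "c" and so gets a non-zero normalized entropy, while "b" is always followed by "a" and scores 0.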
To measure this token ambiguity, we revisit the idea of accessor variety (AV) from Harris (1955) and Feng et al. (2004) by counting which tokens neighbor each other in a corpus, and how often.
03.11.2025 11:53
it is harder to predict the next token. We then hypothesize that this contextual ambiguity is higher in morphologically complex languages.
03.11.2025 11:53
In our new #EMNLP2025 paper, we argue that such statistics should relate directly to what a language model actually does: reliably predicting the next token produced by its tokenizer. We argue that if the most recent token has more contextual ambiguity,
03.11.2025 11:53
When is a language hard to model? Previous research has suggested that morphological complexity both does and does not play a role, but it does so by relating the performance of language models to corpus statistics of words or subword tokens in isolation.
03.11.2025 11:53
Ok, added the ones that were missing from yours to ours
12.08.2025 10:54
✅
12.08.2025 10:48
You're included in the NLP labs starter pack, see go.bsky.app/LKGekew
11.08.2025 09:46
Reminder, a few more days to apply!
03.06.2025 09:35
CLIN35
Computational Linguistics in The Netherlands (CLIN) is a yearly conference on computational linguistics. Each year the conference is organized by a different institution in the Dutch-speaking region. ...
Don't forget! The deadline for submitting your abstract to the #CLIN conference in Leuven is coming: 13th of June! Submitting is easy: name, title of your work, 500-word abstract, done! #nlp #nlproc #compling #llm #ai #dutch clin35.ccl.kuleuven.be
02.06.2025 07:08
We are hiring in #nlproc!!
16.05.2025 08:24
✅
18.02.2025 08:14
I'm looking for a postdoc, ideally to start ASAP!
The work would be in the EU-funded TrustLLM project, focusing on modularisation and language adaptation of LLMs, tokenization, and evaluation benchmarks for multilingual LLMs. The position would be full-time for 2 years with no teaching obligation.
13.12.2024 10:18
We look at the role of English in this evaluation: it can be, and often is, used as an interface to boost task performance, or as a natural language in which to evaluate language understanding. We recommend moving away from task performance as the main goal and focusing on language understanding.
12.12.2024 15:28
🚨 New Account Alert! This is the official account of the *MilaNLP group*. We had to recreate it because it was not indexed.
If you were following us before, please follow us again. If not, now's the perfect time to start!
06.12.2024 14:08
MilaNLP Lab (@milanlp.bsky.social)
The Milan Natural Language Processing Group #NLProc #ML #AI https://milanlproc.github.io/
milanlp.bsky.social is having the same issue, maybe take a look at this github issue here: github.com/bluesky-soci...
02.12.2024 09:39
NLP grad students
Join the conversation
There are too many starter packs.
👉 Here's a list, mostly for NLP, ML, and related areas.
01.12.2024 03:05
Moreover, we advocate for a shift in perspective from seeking a general definition of data quality towards a more language- and task-specific one. Ultimately, we aim for this study to serve as a guide to using Wikipedia for pretraining in a multilingual setting.
29.11.2024 14:02
We evaluate the downstream impact of quality filtering on Wikipedia by training tiny monolingual models on each Wikipedia, finding that quality-based data pruning enables resource-efficient training without hurting performance, especially for low-resource languages (LRLs).
29.11.2024 14:02