
Jaap Jumelet

@jumelet.bsky.social

Postdoc @rug.nl with Arianna Bisazza. Interested in NLP, interpretability, syntax, language acquisition and typology.

719 Followers  |  277 Following  |  38 Posts  |  Joined: 06.10.2023

Latest posts by jumelet.bsky.social on Bluesky


📢Out now in NEJLT!📢

In each of these sentences, a verb that doesn't usually encode motion is being used to convey that an object is moving to a destination.

Given that these usages are rare, complex, and creative, we ask:

Do LLMs understand what's going on in them?

🧵1/7

19.11.2025 13:56 — 👍 14    🔁 3    💬 2    📌 0
Screenshot of a figure with two panels, labeled (a) and (b). The caption reads: "Figure 1: (a) Illustration of messages (left) and strings (right) in toy domain. Blue = grammatical strings. Red = ungrammatical strings. (b) Surprisal (negative log probability) assigned to toy strings by GPT-2."

New work to appear @ TACL!

Language models (LMs) are remarkably good at generating novel well-formed sentences, leading to claims that they have mastered grammar.

Yet they often assign higher probability to ungrammatical strings than to grammatical strings.

How can both things be true? 🧵👇

10.11.2025 22:11 — 👍 83    🔁 19    💬 2    📌 3
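The "surprisal" in the figure above is simply the negative log probability a model assigns to a string. As a rough illustration of how such scores are computed (a minimal sketch assuming the Hugging Face transformers GPT-2 checkpoint; the example strings are made up, not the paper's toy domain):

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def surprisal(text: str) -> float:
    # Total surprisal, -log p(text) in nats, summed over predicted tokens.
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, .loss is the mean negative log-likelihood
        # over the (length - 1) next-token predictions.
        loss = model(input_ids=ids, labels=ids).loss
    return loss.item() * (ids.shape[1] - 1)

# Illustrative pair: a well-formed string vs. a scrambled one.
for s in ["the dog chased the cat", "dog the the chased cat"]:
    print(f"{s!r}: {surprisal(s):.1f} nats")

Nothing guarantees the grammatical string the lower surprisal under this measure, which is exactly the apparent tension the thread is about.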

I'm in Suzhou to present our work on MultiBLiMP, Friday @ 11:45 in the Multilinguality session (A301)!

Come check it out if you're interested in multilingual linguistic evaluation of LLMs (there will be parse trees on the slides! There's still use for syntactic structure!)

arxiv.org/abs/2504.02768

06.11.2025 07:08 — 👍 27    🔁 7    💬 0    📌 0
accepted papers at main conference and findings

accepted papers at TACL and workshops

With only a week left until #EMNLP2025, we are happy to announce all the works we 🐮 will present 🥳 - come and say "hi" to our posters and presentations during the Main and the co-located events (*SEM and workshops). See you in Suzhou ✈️

27.10.2025 11:54 — 👍 16    🔁 5    💬 0    📌 3
BabyBabelLM

For more information check out the website, paper, and datasets:

Website: babylm.github.io/babybabellm/
Paper: arxiv.org/pdf/2510.10159

We hope BabyBabelLM will continue as a 'living resource', both fostering more efficient NLP methods and opening avenues for cross-lingual computational linguistics!

15.10.2025 10:53 — 👍 1    🔁 0    💬 0    📌 0

Next to our training resources, we also release an evaluation pipeline that assesses different aspects of language learning.

We present results for various simple baseline models, but hope this can serve as a starting point for a multilingual BabyLM challenge in future years!

15.10.2025 10:53 — 👍 0    🔁 0    💬 1    📌 0

To deal with data imbalances, we divide languages into three Tiers. This enables better cross-lingual studies and makes it possible for low-resource languages to be part of BabyBabelLM as well.

15.10.2025 10:53 — 👍 0    🔁 0    💬 1    📌 0

With a fantastic team of international collaborators we have developed a pipeline for creating LM training data from resources that children are exposed to.

We release this pipeline and welcome new contributions!

Website: babylm.github.io/babybabellm/
Paper: arxiv.org/pdf/2510.10159

15.10.2025 10:53 — 👍 1    🔁 1    💬 1    📌 0

๐ŸŒIntroducing BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data!

LLMs learn from vastly more data than humans ever experience. BabyLM challenges this paradigm by focusing on developmentally plausible data.

We extend this effort to 45 new languages!

15.10.2025 10:53 — 👍 43    🔁 16    💬 1    📌 3

As kids (in Breda) we often played "1 keer tets", where you were allowed to let the football bounce at most once; I also had no idea that it was a Brabant dialect word.

01.09.2025 14:33 — 👍 1    🔁 0    💬 0    📌 0

Happening now at the SIGTYP poster session! Come talk to Leonie and me about MultiBLiMP!

01.08.2025 10:17 — 👍 20    🔁 2    💬 1    📌 0

I won't arrive in Vienna until tomorrow, but today my star PhD student Marianne is already presenting some of our work:

BLiMP-NL, in which we create a large new dataset for syntactic evaluation of Dutch LLMs, and learn a lot about dataset creation, LLM evaluation and grammatical abilities along the way.

29.07.2025 09:46 — 👍 11    🔁 1    💬 1    📌 0

Congrats and good luck in Canada!

01.07.2025 23:05 — 👍 1    🔁 0    💬 0    📌 0
TurBLiMP: A Turkish Benchmark of Linguistic Minimal Pairs We introduce TurBLiMP, the first Turkish benchmark of linguistic minimal pairs, designed to evaluate the linguistic abilities of monolingual and multilingual language models (LMs). Covering 16 linguis...

Proud to introduce TurBLiMP, the first benchmark of minimal pairs for Turkish, a free-word-order, morphologically rich language!

Pre-print: arxiv.org/abs/2506.13487

Fruit of an almost year-long project by amazing MS student @ezgibasar.bsky.social in collab w/ @frap98.bsky.social and @jumelet.bsky.social

19.06.2025 16:28 — 👍 11    🔁 2    💬 1    📌 3

I don't understand why there isn't more outrage about this:

Recruiting American scientists is being paid for by not giving Dutch academics an inflation correction on their salaries.

1/2

13.06.2025 09:00 — 👍 66    🔁 35    💬 5    📌 7

Ohh cool! Nice to see the interactions-as-structure idea I had back in 2021 is still being explored!

12.06.2025 22:37 — 👍 3    🔁 0    💬 0    📌 0

My paper with @tylerachang.bsky.social and @jamichaelov.bsky.social will appear at #ACL2025NLP! The updated preprint is available on arxiv. I look forward to chatting about bilingual models in Vienna!

05.06.2025 14:18 — 👍 8    🔁 2    💬 1    📌 1
Child-Directed Language Does Not Consistently Boost Syntax Learning in Language Models Seminal work by Huebner et al. (2021) showed that language models (LMs) trained on English Child-Directed Language (CDL) can reach similar syntactic abilities as LMs trained on much larger amounts of ...

"Child-Directed Language Does Not Consistently Boost Syntax Learning in Language Models"

I'm happy to share that the preprint of my first PhD project is now online!

🎊 Paper: arxiv.org/abs/2505.23689

30.05.2025 07:39 — 👍 62    🔁 17    💬 2    📌 3
University: a good idea | Patrick Porter | The Critic Magazine A former student of mine has penned an attack on universities, derived from their own disappointing experience studying Politics and International Relations at the place where I ply my trade. In short...

"A well-delivered lecture isnโ€™t primarily a delivery system for information. It is an ignition point for curiosity, all the better for being experienced in an audience."

Marvellous defence of the increasingly maligned university experience by @patporter76.bsky.social
thecritic.co.uk/university-a...

28.05.2025 10:36 — 👍 59    🔁 19    💬 0    📌 2
Vacancies (OBP) - Georg-August-Universität Göttingen Web pages of the Georg-August-Universität Göttingen

Interested in multilingual tokenization in #NLP? Lisa Beinborn and I are hiring!

PhD candidate position in Göttingen, Germany: www.uni-goettingen.de/de/644546.ht...

PostDoc position in Leuven, Belgium:
www.kuleuven.be/personeel/jo...

Deadline 6th of June

16.05.2025 08:23 — 👍 25    🔁 13    💬 2    📌 2

BlackboxNLP, the leading workshop on interpretability and analysis of language models, will be co-located with EMNLP 2025 in Suzhou this November! 📆

This edition will feature a new shared task on circuits/causal variable localization in LMs, details here: blackboxnlp.github.io/2025/task

15.05.2025 08:21 — 👍 21    🔁 8    💬 3    📌 4

Close your books, test time!
The evaluation pipelines are out, baselines are released & the challenge is on

There is still time to join, and we are excited to learn from you about pretraining and human-model gaps.

*Don't forget to run fastEval on checkpoints
github.com/babylm/evalu...
📈🤖🧠
#AI #LLMS

09.05.2025 14:20 — 👍 10    🔁 4    💬 0    📌 0
Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book? Extremely low-resource (XLR) languages lack substantial corpora for training NLP models, motivating the use of all available resources such as dictionaries and grammar books. Machine Translation from ...

Pleased to announce our paper was accepted at ICLR 2025 as a Spotlight! I will present our poster on Saturday April 26, 3-5pm, Poster #241. Hope to see you there!
arxiv.org/abs/2409.19151

25.04.2025 06:19 — 👍 17    🔁 3    💬 0    📌 1

Sharply written and I fully agree, but it is a bit bitter that the message sits behind a 450-euro paywall :') (thanks for the screenshots!)

23.04.2025 11:40 — 👍 1    🔁 0    💬 1    📌 0

✨ New Paper ✨
[1/] Retrieving passages from many languages can boost retrieval augmented generation (RAG) performance, but how good are LLMs at dealing with multilingual contexts in the prompt?

📄 Check it out: arxiv.org/abs/2504.00597
(w/ @arianna-bis.bsky.social @Raquel_Fernández)

#NLProc

11.04.2025 16:04 — 👍 4    🔁 5    💬 1    📌 1

That is definitely possible indeed, and a potential confounding factor. In RuBLiMP, a Russian benchmark, they defined a way to validate this based on LM probs, but we left that open for future work. The poor performance on low-res langs shows they're definitely not trained on all of UD though!

17.04.2025 19:03 — 👍 1    🔁 0    💬 1    📌 0

✨New paper ✨

Introducing 🌍MultiBLiMP 1.0: A Massively Multilingual Benchmark of Minimal Pairs for Subject-Verb Agreement, covering 101 languages!

We present over 125,000 minimal pairs and evaluate 17 LLMs, finding that support is still lacking for many languages.

🧵⬇️

07.04.2025 14:55 — 👍 78    🔁 22    💬 3    📌 4
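For readers unfamiliar with the minimal-pair setup: a model counts as correct on an item if it assigns higher probability to the grammatical member of the pair. A minimal sketch of that scoring protocol, assuming a Hugging Face causal LM ("gpt2" and the agreement pair below are illustrative stand-ins, not the MultiBLiMP release code):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def logprob(sentence: str) -> float:
    # Total log probability the model assigns to the sentence.
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # .loss is the mean NLL over the (length - 1) next-token predictions.
        nll = model(input_ids=ids, labels=ids).loss.item()
    return -nll * (ids.shape[1] - 1)

# Hypothetical subject-verb agreement pair: (grammatical, ungrammatical).
pairs = [("The keys to the cabinet are on the table.",
          "The keys to the cabinet is on the table.")]

accuracy = sum(logprob(g) > logprob(u) for g, u in pairs) / len(pairs)
print(f"accuracy: {accuracy:.2f}")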

Modern LLMs "speak" hundreds of languages... but do they really?
Multilinguality claims are often based on downstream tasks like QA & MT, while *formal* linguistic competence remains hard to gauge for many languages

Meet MultiBLiMP!
(joint work w/ @jumelet.bsky.social & @weissweiler.bsky.social)

08.04.2025 12:27 — 👍 21    🔁 6    💬 2    📌 1
MultiBLiMP 1.0: A Massively Multilingual Benchmark of Linguistic Minimal Pairs We introduce MultiBLiMP 1.0, a massively multilingual benchmark of linguistic minimal pairs, covering 101 languages, 6 linguistic phenomena and containing more than 125,000 minimal pairs. Our minimal ...

Joint work with @weissweiler.bsky.social and @arianna-bis.bsky.social.
Check out the full paper at arxiv.org/abs/2504.02768!

We have released all our data on Hugging Face: huggingface.co/datasets/jum...

We hope to extend this pipeline to many more phenomena!

07.04.2025 14:55 — 👍 6    🔁 1    💬 0    📌 0
Post image

Person agreement is easier to model than Gender or Number. Sentences with higher overall perplexity lead to less accurate judgements, and models are more likely to pick the wrong inflection if it is split into more tokens. Surprisingly, subject-verb distance has no effect.

07.04.2025 14:55 — 👍 3    🔁 0    💬 1    📌 0
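The token-splitting effect reported above is easy to inspect directly; a minimal sketch assuming the Hugging Face tokenizer API (the word forms and the "gpt2" tokenizer are illustrative):

from transformers import AutoTokenizer

# Count how many subword pieces a tokenizer assigns to a verb form;
# per the finding above, forms split into more tokens are misjudged more often.
tok = AutoTokenizer.from_pretrained("gpt2")
for form in ["walks", "misunderstands", "overcomplicates"]:
    pieces = tok.tokenize(form)
    print(f"{form}: {len(pieces)} tokens -> {pieces}")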
