📢 Out now in NEJLT! 📢
In each of these sentences, a verb that doesn't usually encode motion is being used to convey that an object is moving to a destination.
Given that these usages are rare, complex, and creative, we ask:
Do LLMs understand what's going on in them?
🧵 1/7
19.11.2025 13:56
Screenshot of a figure with two panels, labeled (a) and (b). The caption reads: "Figure 1: (a) Illustration of messages (left) and strings (right) in toy domain. Blue = grammatical strings. Red = ungrammatical strings. (b) Surprisal (negative log probability) assigned to toy strings by GPT-2."
New work to appear @ TACL!
Language models (LMs) are remarkably good at generating novel well-formed sentences, leading to claims that they have mastered grammar.
Yet they often assign higher probability to ungrammatical strings than to grammatical strings.
How can both things be true? 🧵👇
10.11.2025 22:11
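The tension in the thread above comes down to surprisal, the quantity in Figure 1b: negative log probability. A minimal sketch, with invented toy probabilities (not from any real model), of how a model can be a fluent generator yet still rank an ungrammatical string above a grammatical one:

```python
import math

# Hypothetical toy probabilities, purely for illustration.
p_grammatical = 0.012    # probability the LM assigns to a grammatical string
p_ungrammatical = 0.020  # probability it assigns to an ungrammatical string

def surprisal(p: float) -> float:
    """Surprisal in nats of an event with probability p: -log p."""
    return -math.log(p)

# Lower surprisal = higher probability, so here the model "prefers"
# the ungrammatical string despite generating mostly well-formed text.
print(surprisal(p_grammatical) > surprisal(p_ungrammatical))  # True
```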
I'm in Suzhou to present our work on MultiBLiMP, Friday @ 11:45 in the Multilinguality session (A301)!
Come check it out if you're interested in multilingual linguistic evaluation of LLMs (there will be parse trees on the slides! There's still use for syntactic structure!)
arxiv.org/abs/2504.02768
06.11.2025 07:08
accepted papers at main conference and findings
accepted papers at TACL and workshops
With only a week left until #EMNLP2025, we are happy to announce all the work we will present 🥳 - come and say "hi" to our posters and presentations during the Main conference and the co-located events (*SEM and workshops). See you in Suzhou ✈️
27.10.2025 11:54
BabyBabelLM
For more information check out the website, paper, and datasets:
Website: babylm.github.io/babybabellm/
Paper: arxiv.org/pdf/2510.10159
We hope BabyBabelLM will continue as a 'living resource', fostering both more efficient NLP methods, and opening ways for cross-lingual computational linguistics!
15.10.2025 10:53
Alongside our training resources, we also release an evaluation pipeline that assesses different aspects of language learning.
We present results for various simple baseline models, but hope this can serve as a starting point for a multilingual BabyLM challenge in future years!
15.10.2025 10:53
To deal with data imbalances, we divide languages into three Tiers. This better enables cross-lingual studies and makes it possible for low-resource languages to be a part of BabyBabelLM as well.
15.10.2025 10:53
With a fantastic team of international collaborators we have developed a pipeline for creating LM training data from resources that children are exposed to.
We release this pipeline and welcome new contributions!
Website: babylm.github.io/babybabellm/
Paper: arxiv.org/pdf/2510.10159
15.10.2025 10:53
Introducing BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data!
LLMs learn from vastly more data than humans ever experience. BabyLM challenges this paradigm by focusing on developmentally plausible data.
We extend this effort to 45 new languages!
15.10.2025 10:53
As kids (in Breda) we often played "1 keer tets", where you could let the football bounce at most once; I had no idea that that was a Brabant dialect word either.
01.09.2025 14:33
Happening now at the SIGTYP poster session! Come talk to Leonie and me about MultiBLiMP!
01.08.2025 10:17
I'll be in Vienna only from tomorrow, but today my star PhD student Marianne is already presenting some of our work:
BLIMP-NL, in which we create a large new dataset for syntactic evaluation of Dutch LLMs, and learn a lot about dataset creation, LLM evaluation and grammatical abilities on the way.
29.07.2025 09:46
Congrats and good luck in Canada!
01.07.2025 23:05
TurBLiMP: A Turkish Benchmark of Linguistic Minimal Pairs
We introduce TurBLiMP, the first Turkish benchmark of linguistic minimal pairs, designed to evaluate the linguistic abilities of monolingual and multilingual language models (LMs). Covering 16 linguis...
Proud to introduce TurBLiMP, the first benchmark of minimal pairs for Turkish, a free-word-order, morphologically rich language!
Pre-print: arxiv.org/abs/2506.13487
Fruit of an almost year-long project by amazing MS student @ezgibasar.bsky.social in collab w/ @frap98.bsky.social and @jumelet.bsky.social
19.06.2025 16:28
I don't understand why there isn't more of an outcry about this:
Bringing in American scientists is being paid for by giving Dutch academics no inflation correction on their salaries.
1/2
13.06.2025 09:00
Ohh cool! Nice to see the interactions-as-structure idea I had back in 2021 is still being explored!
12.06.2025 22:37
My paper with @tylerachang.bsky.social and @jamichaelov.bsky.social will appear at #ACL2025NLP! The updated preprint is available on arxiv. I look forward to chatting about bilingual models in Vienna!
05.06.2025 14:18
University: a good idea | Patrick Porter | The Critic Magazine
A former student of mine has penned an attack on universities, derived from their own disappointing experience studying Politics and International Relations at the place where I ply my trade. In short...
"A well-delivered lecture isnโt primarily a delivery system for information. It is an ignition point for curiosity, all the better for being experienced in an audience."
Marvellous defence of the increasingly maligned university experience by @patporter76.bsky.social
thecritic.co.uk/university-a...
28.05.2025 10:36
OBP job openings - Georg-August-Universität Göttingen
Interested in multilingual tokenization in #NLP? Lisa Beinborn and I are hiring!
PhD candidate position in Göttingen, Germany: www.uni-goettingen.de/de/644546.ht...
PostDoc position in Leuven, Belgium:
www.kuleuven.be/personeel/jo...
Deadline 6th of June
16.05.2025 08:23
BlackboxNLP, the leading workshop on interpretability and analysis of language models, will be co-located with EMNLP 2025 in Suzhou this November!
This edition will feature a new shared task on circuits/causal variable localization in LMs, details here: blackboxnlp.github.io/2025/task
15.05.2025 08:21
Close your books, test time!
The evaluation pipelines are out, baselines are released, and the challenge is on.
There is still time to join!
We are excited to learn from you about pretraining and human-model gaps.
*Don't forget to run fastEval on checkpoints
github.com/babylm/evalu...
#AI #LLMs
09.05.2025 14:20
Sharply written and I agree entirely, but it is a bit ironic that the message sits behind a 450-euro paywall :') (thanks for the screenshots!)
23.04.2025 11:40
✨ New Paper ✨
[1/] Retrieving passages from many languages can boost retrieval augmented generation (RAG) performance, but how good are LLMs at dealing with multilingual contexts in the prompt?
Check it out: arxiv.org/abs/2504.00597
(w/ @arianna-bis.bsky.social @Raquel_Fernández)
#NLProc
11.04.2025 16:04
That is definitely possible indeed, and a potential confounding factor. In RuBLiMP, a Russian benchmark, they defined a way to validate this based on LM probs, but we left that open for future work. The poor performance on low-res langs shows they're definitely not trained on all of UD though!
17.04.2025 19:03
✨ New paper ✨
Introducing MultiBLiMP 1.0: A Massively Multilingual Benchmark of Minimal Pairs for Subject-Verb Agreement, covering 101 languages!
We present over 125,000 minimal pairs and evaluate 17 LLMs, finding that support is still lacking for many languages.
🧵⬇️
07.04.2025 14:55
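For readers unfamiliar with minimal-pair benchmarks: a model is credited for a pair when it assigns lower surprisal (higher probability) to the grammatical sentence than to its minimally different ungrammatical counterpart. A hypothetical sketch of that scoring rule (not the MultiBLiMP code; the surprisal values are invented):

```python
def minimal_pair_accuracy(pairs):
    """pairs: (surprisal_grammatical, surprisal_ungrammatical) tuples."""
    correct = sum(1 for good, bad in pairs if good < bad)
    return correct / len(pairs)

# Invented surprisal values (nats) for three minimal pairs.
pairs = [
    (23.1, 27.4),  # model prefers the grammatical sentence: correct
    (18.9, 18.2),  # model prefers the ungrammatical one: an error
    (31.0, 35.6),  # correct
]
print(round(minimal_pair_accuracy(pairs), 3))  # 0.667
```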
Modern LLMs "speak" hundreds of languages... but do they really?
Multilinguality claims are often based on downstream tasks like QA & MT, while *formal* linguistic competence remains hard to gauge in lots of languages
Meet MultiBLiMP!
(joint work w/ @jumelet.bsky.social & @weissweiler.bsky.social)
08.04.2025 12:27
Person agreement is easier to model than Gender or Number. Sentences with higher overall perplexity lead to less accurate judgements, and models are more likely to pick the wrong inflection if it is split into more tokens. Surprisingly, subject-verb distance has no effect.
07.04.2025 14:55
Deutsche Gesellschaft für Kognitive Linguistik / German Cognitive Linguistics Association https://www.dgkl-gcla.de/
Next conference: Bielefeld University, 31.08.-02.09.2026 https://t1p.de/dgkl2026
Sustainable Fashion Journalist | Climate Justice | focus on Latin America |
PhD candidate in Computational Linguistics, University of Groningen.
From 🇨🇱, now in the Netherlands
Life is too short for bad wine, bad coffee and bad ontologies.
#NLProc PhD Student at EPFL
PhD Candidate @University of Amsterdam. Working on understanding language generation from visual events, particularly from a "creative" POV!
akskuchi.github.io
professor for natural language processing, head of
BamNLP @bamnlp.de
Duisburg, Stuttgart, Bamberg
#NLProc #emotion #sentiment #factchecking #argumentmining #informationextraction #bionlp
Professor of Language Technology at the University of Helsinki @helsinki.fi
Head of Helsinki-NLP @helsinki-nlp.bsky.social
Member of the Ellis unit Helsinki @ellisfinland.bsky.social
Postdoctoral Scientist at University of Copenhagen. I am currently more focused on developing pixel language models. #nlproc #multilinguality #multimodality
PhD candidate @ University of Amsterdam
evgeniia.tokarch.uk
Sentence processing modeling | Computational psycholinguistics | 1st year PhD student at LLF, CNRS, Université Paris Cité | Currently visiting COLT, Universitat Pompeu Fabra, Barcelona, Spain
https://ninanusb.github.io/
Compling PhD student @UT_Linguistics | prev. CS, Math, Comp. Cognitive Sci @cornell
nuclear reactor maintenance
PhD student at the University of Amsterdam working on vision-language models and cognitive computational neuroscience
You can learn more about me here: https://nikitas-theo.github.io/
The largest workshop on analysing and interpreting neural networks for NLP.
BlackboxNLP will be held at EMNLP 2025 in Suzhou, China
blackboxnlp.github.io
PhD candidate // LLMs // low-resource languages // tartunlp.ai & cs.ut.ee // 🇪🇪
https://helehh.github.io/
🇪🇪 baromeeter.ai