Jelle Zuidema 🟥 @wzuidema

Individual human judgements correlate even more strongly with the difference between a model's scores, but that says nothing about a model's abilities *in the wild*! This is contra Hu et al. '24 (www.pnas.org/doi/10.1073/...) and, most importantly, the paper provides a fresh dataset for use in this debate.

29.07.2025 11:45 — 👍 1 🔁 0 💬 0 📌 0

My favourite image from the paper, illustrating that LLMs are surprisingly weak at judging grammaticality. Human judgment correlates quite strongly with the *difference* in likelihood (or SLOR) that LLMs assign to pairs of grammatical & ungrammatical sentences, but that's the wrong measure.

29.07.2025 11:18 — 👍 3 🔁 0 💬 1 📌 0

#ACL2025

29.07.2025 09:47 — 👍 0 🔁 0 💬 1 📌 0

I'll be in Vienna only from tomorrow, but today my star PhD student Marianne is already presenting some of our work:

BLiMP-NL, in which we create a large new dataset for syntactic evaluation of Dutch LLMs, and learn a lot about dataset creation, LLM evaluation and grammatical abilities along the way.

29.07.2025 09:46 — 👍 10 🔁 1 💬 1 📌 0

BLiMP-NL: A Corpus of Dutch Minimal Pairs and Acceptability Judgments for Language Model Evaluation Abstract. We present a corpus of 8400 Dutch sentence pairs, intended primarily for the grammatical evaluation of language models. Each pair consists of a grammatical sentence and a minimally different...

Find our paper (w/ Michelle Suijkerbuijk, Zoë Prins, Jelle Zuidema @wzuidema.bsky.social, & Stefan Frank @stefanfrank.bsky.social), and dataset here:
📑 doi.org/10.1162/coli...
🗃️ doi.org/10.34973/tj4...

24.07.2025 15:30 — 👍 3 🔁 1 💬 1 📌 0

Experimental Studies on the Cultural Evolution of Language | Annual Reviews Why are languages the way they are? The biases and constraints that explain why languages display the traits they do—instead of other possible ones—include human cognition, social dynamics, communicat...

Here's a 2017 review paper by Monica Tamariz on all kinds of experiments with humans learning to agree on the meaning of signals. Much of it in the context of origins of language; some experiments use gestures to try to avoid biases from existing language.

www.annualreviews.org/content/jour...

05.07.2025 21:24 — 👍 2 🔁 0 💬 0 📌 0

I'm not getting that from the text (the text reads as a clumsy compromise, but one I can live with); I'm only saying that it could be an outcome of the study, if that study is carried out by people who know the subject.

03.07.2025 10:41 — 👍 0 🔁 0 💬 0 📌 0

Fine, but that is the classic debate between fundis and realos. Thanks to the realos, though, we now have rules in Europe, in many countries and at many universities, that promote responsible use of new technology.

03.07.2025 10:38 — 👍 0 🔁 0 💬 0 📌 0

And about the open letters: I share many of the concerns, but they are so full of hyperbole, and the advice "ban AI" is so unrealistic, that they can and will easily be brushed aside. So I'm not very enthusiastic about them.

03.07.2025 08:46 — 👍 0 🔁 0 💬 1 📌 0

Employees certainly have that opportunity now too, but at the same time many staff members lack knowledge about AI. One outcome of such a study could also be a better picture of when AI can be deployed responsibly and when it cannot, and of the possibilities of open-source & energy-efficient AI.

03.07.2025 08:39 — 👍 0 🔁 0 💬 1 📌 0

Hmm, your tirade targets a subtext that I don't immediately recognise in it. Literally it says "an opportunity" and that employees must "get the opportunity", and I don't see much wrong with that.

03.07.2025 07:19 — 👍 0 🔁 0 💬 1 📌 0

Just so I understand: is it because of those two sentences in the agreement, which in somewhat bureaucratic language call for 'a study', that you are going to vote against? Or do you have objections to the rest as well?

03.07.2025 06:59 — 👍 0 🔁 0 💬 1 📌 0

The piece is by the Editorial Board, so without specific author names.

And in my opinion it has a shockingly limited view of the international conflicts of recent decades.

15.06.2025 08:48 — 👍 0 🔁 0 💬 1 📌 0

Interpretability Techniques for Speech Models — Tutorial @ Interspeech 2025

The @interspeech.bsky.social early registration deadline is coming up in a few days!

Want to learn how to analyze the inner workings of speech processing models? 🔍 Check out the programme for our tutorial:
interpretingdl.github.io/speech-inter... & sign up through the conference registration form!

13.06.2025 05:18 — 👍 28 🔁 10 💬 1 📌 2

Deadlines for PhD and Postdoc vacancies coming up: applications open until Monday June 2!

30.05.2025 08:44 — 👍 5 🔁 5 💬 0 📌 1

Am I being very cynical if I think that economic interests at the big law firms have long been more important than "law and justice" and protecting "vulnerable groups"? I agree, mind you, with the outrage and the appeal, but I think only the "fear" is new.

22.05.2025 07:46 — 👍 6 🔁 0 💬 1 📌 0

Amsterdam Lectures in AI and Society ‹ clclab Melanie Mitchell, leading AI researcher from the Santa Fe Institute and key voice in the discussions on the abilities and inabilities of Large Language Models, will speak at Amsterdam Science Park.

This Friday at 3pm, Amsterdam's Computational Linguistics Seminar and the CLClab will host Melanie Mitchell for a special edition of the Amsterdam Lectures in AI and Society. She will speak about "AI's Challenge of Understanding the World". Zoom link/registration:
clclab.netlify.app/2025/05/07/a...

14.05.2025 07:19 — 👍 2 🔁 2 💬 0 📌 0

Exciting to see an extensive study of -ity/-ness & frequency effects in LLMs! The same phenomena already inspired a beautiful analysis of pre-deep learning, Bayesian learning algorithms. Do you know Tim O'Donnell's papers & book?

Productivity and Reuse in Language www.jstor.org/stable/j.ctt...

11.05.2025 07:22 — 👍 0 🔁 0 💬 0 📌 0

I usually don't comment on my typos, but since we're talking AI innovation: isn't it infuriating that Google's gboard still hasn't learned that I never ever mean 'neutral network' when I swype 'neural network'? And that good-old-swypers need to choose between that and MS's even worse SwiftKey?

08.05.2025 17:54 — 👍 3 🔁 0 💬 0 📌 0

TIL that professor Kunihiko Fukushima, inventor of the neocognitron, is still an active researcher in his late 90s. The neocognitron is a convolutional neutral network (CNN); CNNs in turn were the model family with which the deep learning revolution in Artificial Intelligence started. A living legend!

08.05.2025 12:33 — 👍 23 🔁 3 💬 1 📌 1

Isn't what you do in section 4 simply "representational similarity analysis"? I'm surprised not to see that term in the paper.

08.05.2025 06:21 — 👍 1 🔁 0 💬 0 📌 0

Congratulations on finding a nice opportunity to advertise your book. But if you want your hitpiece to be convincing you need better points than "he calls himself a historian but only studied... history" and "I found his secret donors on... his website". Don't waste your writing talent on nastiness.

06.05.2025 19:44 — 👍 1 🔁 0 💬 0 📌 1

Thanks. I mainly hope that the papers prove useful for people developing benchmarks and/or measures! For us, it was a hard paper to write, with so many differences in terms, writing styles, evaluation standards etc between psychology and NLP.

Good to see converging viewpoints!

03.05.2025 19:08 — 👍 4 🔁 0 💬 0 📌 0

Oskar van der Wal's personal website

Thanks for the reference to @hannawallach.bsky.social ++'s paper! So far only scanned it, but it looks like they arrive at similar conclusions as we did in JAIR last year: odvanderwal.nl/2024/paper-c...

So, yes, I agree evaluation in NLP is a bit of a mess, & measurement theory has much to offer!

03.05.2025 16:13 — 👍 5 🔁 0 💬 1 📌 0

✨New paper ✨

Introducing 🌍MultiBLiMP 1.0: A Massively Multilingual Benchmark of Minimal Pairs for Subject-Verb Agreement, covering 101 languages!

We present over 125,000 minimal pairs and evaluate 17 LLMs, finding that support is still lacking for many languages.

🧵⬇️

07.04.2025 14:55 — 👍 75 🔁 22 💬 3 📌 4

Thanks for the reference! I have also wondered "What happened to mirror neurons?", so this paper looks like a useful overview. But what a missed opportunity to not have something like "Reflecting back on the mirror neuron debate" in the title. :).

03.04.2025 08:15 — 👍 2 🔁 0 💬 0 📌 0

Ah, good point - I did not check the rubric (not a reviewer for ICML).

Just for the record: reviewer 1, who gave us a 1 for not citing their favourite paper and for "not being super clear", may return from hell now, and instead spend some time in purgatory.

25.03.2025 13:06 — 👍 2 🔁 0 💬 0 📌 0

Wow, grumpy lot those ICML reviewers!

25.03.2025 12:44 — 👍 1 🔁 0 💬 1 📌 0

Seems like a crazy way to spend these resources. I'm all for helping American scholars move to Europe, but those that already have an ERC grant waiting for them are pretty well off. The extra 1M€ would be better used to help *other* scholars, and not to make them dependent on those lucky colleagues.

25.03.2025 12:24 — 👍 9 🔁 2 💬 1 📌 0

Florence Nightingale: The pioneer statistician | Science Museum Discover how pioneering statistical methods helped prove that widespread reform of hospital care was vital.

I don't know if there are other examples directly from her; the one you linked is the one I have seen multiple times. E.g., here:
www.sciencemuseum.org.uk/objects-and-...

24.03.2025 10:23 — 👍 1 🔁 0 💬 1 📌 0


Latest posts by wzuidema.bsky.social on Bluesky
