Human individual judgements correlate even more strongly with the difference between a model's scores, but that says nothing about a model's abilities *in the wild*! This is contra Hu et al. '24 (www.pnas.org/doi/10.1073/...), & most importantly, provides a fresh dataset for use in this debate.
29.07.2025 11:45
My favourite image from the paper, illustrating that LLMs are surprisingly weak at judging grammaticality. Human judgment correlates quite strongly with the *difference* in likelihood (or SLOR) that LLMs assign to pairs of grammatical & ungrammatical sentences, but that's the wrong measure.
29.07.2025 11:18
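The two measures can be made concrete with a minimal Python sketch. It assumes that per-token log-probabilities have already been obtained from some language model and from a unigram model. SLOR is the syntactic log-odds ratio, (log P_M(s) − log P_u(s)) / |s|. The names `Scored`, `slor` and `pair_difference` are hypothetical, all numbers are made up, and this is not the paper's evaluation code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Scored:
    """Per-token log-probabilities for one sentence (hypothetical inputs)."""
    lm_logprobs: Sequence[float]       # log P_M(w_i | w_1..w_{i-1}) from the LLM
    unigram_logprobs: Sequence[float]  # log P_u(w_i) from a unigram model


def slor(s: Scored) -> float:
    """Syntactic log-odds ratio: (log P_M(s) - log P_u(s)) / |s|."""
    n = len(s.lm_logprobs)
    if n == 0 or n != len(s.unigram_logprobs):
        raise ValueError("need one LM and one unigram log-prob per token")
    return (sum(s.lm_logprobs) - sum(s.unigram_logprobs)) / n


def pair_difference(good: Scored, bad: Scored) -> float:
    """How much higher the grammatical member of a minimal pair scores."""
    return slor(good) - slor(bad)


# Hypothetical minimal pair: same words, one token made less likely by the LLM.
good = Scored(lm_logprobs=[-2.0, -3.0], unigram_logprobs=[-5.0, -6.0])
bad = Scored(lm_logprobs=[-2.0, -7.0], unigram_logprobs=[-5.0, -6.0])
print(slor(good), slor(bad), pair_difference(good, bad))  # 3.0 1.0 2.0
```

Outside a benchmark, a sentence comes alone. Only its own `slor(s)` is available, and there is no ungrammatical twin to compare it against. That is why a strong correlation between human judgments and `pair_difference` does not by itself show that a model can judge grammaticality in the wild.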
#ACL2025
29.07.2025 09:47
I'll be in Vienna only from tomorrow, but today my star PhD student Marianne is already presenting some of our work:
BLIMP-NL, in which we create a large new dataset for syntactic evaluation of Dutch LLMs, and learn a lot about dataset creation, LLM evaluation and grammatical abilities on the way.
29.07.2025 09:46
I don't read that in the text (the text reads as an awkward compromise, but one I can live with). I'm only saying that it could be an outcome of the study, if that study is carried out by people who know what they are doing.
03.07.2025 10:41
Fine, but that is the classic debate between the fundis and the realos. And yet, thanks to the realos, in Europe, in many countries as well as at many universities, we now do have rules that promote the responsible use of new technology.
03.07.2025 10:38
As for the open letters: I share many of the concerns, but they are so full of hyperbole, and the advice to "ban AI" is so unrealistic, that they can easily be brushed aside, and they will be. So I'm not very enthusiastic about them.
03.07.2025 08:46
Sure, employees have that opportunity now as well, but at the same time many staff members lack knowledge about AI. One outcome of such a study could also be a better picture of when AI can and cannot be used responsibly, and of the possibilities offered by open-source & energy-efficient AI.
03.07.2025 08:39
Hmm, your tirade targets a subtext that I don't immediately recognize. Literally, it says "an opportunity" and that employees must "be given the opportunity", and I don't see much wrong with that.
03.07.2025 07:19
Just so I understand: you are going to vote against it because of those two sentences in the agreement that call, in somewhat bureaucratic language, for 'a study'? Or do you also object to the rest?
03.07.2025 06:59
The piece is by the Editorial Board, so it carries no specific author names.
And in my opinion it takes a shockingly limited view of the international conflicts of recent decades.
15.06.2025 08:48
Interpretability Techniques for Speech Models: Tutorial @ Interspeech 2025
The @interspeech.bsky.social early registration deadline is coming up in a few days!
Want to learn how to analyze the inner workings of speech processing models? Check out the programme for our tutorial:
interpretingdl.github.io/speech-inter... & sign up through the conference registration form!
13.06.2025 05:18
Deadlines for PhD and Postdoc vacancies coming up: applications open until Monday June 2!
30.05.2025 08:44
Am I being very cynical if I think that, at the big law firms, economic interests have long mattered more than "law and justice" and protecting "vulnerable groups"? Mind you, I agree with the outrage and with the call to action, but as far as I can tell, only the "fear" is new.
22.05.2025 07:46
Amsterdam Lectures in AI and Society – clclab
Melanie Mitchell, leading AI researcher from the Santa Fe Institute and key voice in the discussions on the abilities and inabilities of Large Language Models, will speak at Amsterdam Science Park.
This Friday at 3pm, Amsterdam's Computational Linguistics Seminar and the CLClab will host Melanie Mitchell for a special edition of the Amsterdam Lectures in AI and Society. She will speak about "AI's Challenge of Understanding the World". Zoom link / registration:
clclab.netlify.app/2025/05/07/a...
14.05.2025 07:19
Exciting to see an extensive study of -ity/-ness & frequency effects in LLMs! The same phenomena already inspired a beautiful analysis using pre-deep-learning Bayesian learning algorithms. Do you know Tim O'Donnell's papers & book?
Productivity and Reuse in Language www.jstor.org/stable/j.ctt...
11.05.2025 07:22
I usually don't comment on my typos, but since we're talking AI innovation: isn't it infuriating that Google's Gboard still hasn't learned that I never, ever mean 'neutral network' when I swype 'neural network'? And that good-old swypers have to choose between that and MS's even worse SwiftKey?
08.05.2025 17:54
TIL that Professor Kunihiko Fukushima, inventor of the neocognitron, is still an active researcher in his late 90s. The neocognitron is a convolutional neutral network (CNN); CNNs, in turn, were the model family with which the deep learning revolution in Artificial Intelligence started. A living legend!
08.05.2025 12:33
Isn't what you do in section 4 simply "representational similarity analysis"? I'm surprised not to see that term in the paper.
08.05.2025 06:21
Congratulations on finding a nice opportunity to advertise your book. But if you want your hit piece to be convincing, you need better points than "he calls himself a historian but only studied... history" and "I found his secret donors on... his website". Don't waste your writing talent on nastiness.
06.05.2025 19:44
Thanks. I mainly hope that the papers prove useful for people developing benchmarks and/or measures! For us, it was a hard paper to write, with so many differences in terms, writing styles, evaluation standards etc between psychology and NLP.
Good to see converging viewpoints!
03.05.2025 19:08
Oskar van der Wal's personal website
Thanks for the reference to @hannawallach.bsky.social ++'s paper! So far I've only scanned it, but it looks like they arrive at similar conclusions to those we reached in JAIR last year: odvanderwal.nl/2024/paper-c...
So, yes, I agree evaluation in NLP is a bit of a mess, & measurement theory has much to offer!
03.05.2025 16:13
✨New paper✨
Introducing MultiBLiMP 1.0: A Massively Multilingual Benchmark of Minimal Pairs for Subject-Verb Agreement, covering 101 languages!
We present over 125,000 minimal pairs and evaluate 17 LLMs, finding that support is still lacking for many languages.
🧵⬇️
07.04.2025 14:55
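Minimal-pair benchmarks in the BLiMP family are commonly scored by checking, for each pair, whether the model assigns a higher log-probability to the grammatical sentence than to its ungrammatical counterpart. Accuracy is then reported per language. The Python sketch below assumes that criterion; whether MultiBLiMP 1.0 scores pairs in exactly this way is an assumption. The function name, language codes and log-probabilities are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One record per minimal pair (hypothetical format):
# (language code, log P(grammatical), log P(ungrammatical)).
Pair = Tuple[str, float, float]


def accuracy_per_language(pairs: Iterable[Pair]) -> Dict[str, float]:
    """Fraction of pairs, per language, where the grammatical sentence wins."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for lang, logp_good, logp_bad in pairs:
        total[lang] += 1
        if logp_good > logp_bad:  # ties count as errors
            correct[lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


# Hypothetical scores for two languages.
scores = [
    ("nld", -10.0, -12.0),
    ("nld", -8.0, -7.5),
    ("fao", -20.0, -19.0),
]
print(accuracy_per_language(scores))  # {'nld': 0.5, 'fao': 0.0}
```

Reporting accuracy per language rather than pooling all pairs keeps a high overall score from hiding languages where a model is near or below chance, which is 0.5 for a two-way choice.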
Thanks for the reference! I have also wondered "What happened to mirror neurons?", so this paper looks like a useful overview. But what a missed opportunity not to have something like "Reflecting back on the mirror neuron debate" in the title :)
03.04.2025 08:15
Ah, good point - I did not check the rubric (not a reviewer for ICML).
Just for the record: reviewer 1, who gave us a 1 for not citing their favourite paper and for "not being super clear", may now return from hell and spend some time in purgatory instead.
25.03.2025 13:06
Wow, grumpy lot those ICML reviewers!
25.03.2025 12:44
Seems like a crazy way to spend these resources. I'm all for helping American scholars move to Europe, but those who already have an ERC grant waiting for them are pretty well off. The extra €1M would be better used to help *other* scholars, and not to make them dependent on those lucky colleagues.
25.03.2025 12:24
#obadiah11 singer of things frazeyford.com
historian, lead author, event moderator, jury secretary, podcaster 'Betrouwbare Bronnen' and now 'cult hero' [per Trouw] and 'secret weapon' [per Volkskrant]
Computational cognitive scientist @ MIT, reverse engineering the mind and engineering more human intelligence in machines.
Professor of Online Communication. Researching the transformation of communication and society. Critical optimist.
Freelancer in the non-profit sector. Former city councillor in Nijmegen and Amsterdam and former chair of the FNV members' parliament. Campaigns on climate and human rights. www.anderegeluiden.nl
Researcher in TTS | Interested in text processing for TTS, phonetics, prosody, evaluation, multilingual processing and underresourced languages. | Currently self-employed. Member of ISCA Board, Technical Program Chair @ Interspeech 2025
Linguist (Semantics, Cognition, Computation)
PI of ABSTRACTION (ERC-2021-STG-101039777)
https://linktr.ee/mariannabolognesi
I work at Sakana AI → @sakanaai.bsky.social
https://sakana.ai/careers
Engineer by day | Nature & landscape photographer | Birds, bugs, spiders | Friend of animals that scare people | Be decent & kind | she / her | IG heyjencross
www.jencrossphoto.com
academic at @phoneticslab.bsky.social :: speech production, vocal tract imaging, dynamical systems, computational modelling :: https://samkirkham.github.io
Human-computer interaction researcher. PhD from University of Minnesota. Tacoma, WA. Mastodon: zwlevonian@hci.social
VP and Distinguished Scientist at Microsoft Research NYC. AI evaluation and measurement, responsible AI, computational social science, machine learning. She/her.
One photo a day since January 2018: https://www.instagram.com/logisticaggression/
First Workshop on Technical AI Governance. ICML 2025, Vancouver.
Prof. Public Philosophy EUR, cinema lover, literature. Daily podcast: Zin van de Dag. Columnist for NRC
Assistant Professor in Neuroscience at the Donders Institute & Radboudumc.
Oscillations, language, the visual system, source reconstruction methods, and decoding. Open source enthusiast. https://britta-wstnr.github.io
Research director @Inria, Head of @flowersInria lab, prev. @MSFTResearch @SonyCSLParis
Artificial intelligence, cognitive sciences, sciences of curiosity, language, self-organization, autotelic agents, education, AI and society
http://www.pyoudeyer.com
Thinking about multimodal representations | Postdoc at UCPH/Pioneer Centre for AI (DK).
Associate Professor at Polytechnique Montreal and Mila.