Dialogue Is Not Enough to Make a Communicative BabyLM
(But Neither Is Developmentally Inspired Reinforcement Learning)
Francesca Padovani1∗ Bastian Bunzeck2∗ Manar Ali2 Omar Momen2
Arianna Bisazza1 Hendrik Buschmeier2 Sina Zarrieß2
1Center for Language and Cognition (CLCG), University of Groningen
2CRC 1646 – Linguistic Creativity in Communication, Bielefeld University
f.padovani@rug.nl bastian.bunzeck@uni-bielefeld.de
                                                
    
    
    
    
            As part of this year's BabyLM challenge, we (researchers from @gronlp.bsky.social and @clausebielefeld.bsky.social) diverged from the established pretraining paradigm by training only on dialogue data from CHILDES.
               
            
            
                28.10.2025 12:53 — 👍 16    🔁 3    💬 1    📌 0                      
            
         
            
        
            
            
            
            
            
    
    
    
    
            Preprint alert! We release BabyBabelLM, a multilingual benchmark of developmentally plausible training data. I was responsible for German and Polish data as well as various child-directed wikis. Immensely rewarding project with exceptionally cool co-authors. 🥳🚀
               
            
            
                14.10.2025 17:19 — 👍 11    🔁 3    💬 0    📌 1                      
            
         
            
        
            
            
            
            
                                                
                                                
    
    
    
    
            𝐃𝐨 𝐲𝐨𝐮 𝐫𝐞𝐚𝐥𝐥𝐲 𝐰𝐚𝐧𝐭 𝐭𝐨 𝐬𝐞𝐞 𝐰𝐡𝐚𝐭 𝐦𝐮𝐥𝐭𝐢𝐥𝐢𝐧𝐠𝐮𝐚𝐥 𝐞𝐟𝐟𝐨𝐫𝐭 𝐥𝐨𝐨𝐤𝐬 𝐥𝐢𝐤𝐞? 🇨🇳🇮🇩🇸🇪
Here’s the proof! 𝐁𝐚𝐛𝐲𝐁𝐚𝐛𝐞𝐥𝐋𝐌 is the first Multilingual Benchmark of Developmentally Plausible Training Data available for 45 languages to the NLP community 🎉
arxiv.org/abs/2510.10159
               
            
            
                14.10.2025 17:01 — 👍 40    🔁 16    💬 2    📌 1                      
            
         
            
        
            
            
            
            
            
    
    
    
    
            Happening in an hour! 🥳
               
            
            
                23.09.2025 13:36 — 👍 1    🔁 0    💬 0    📌 0                      
            
         
            
        
            
            
            
            
            
    
    
    
    
            If you are at #IWCS, then you should not miss Sanne’s talk “Not Just Who or What: Modeling the Interaction of Linguistic and Annotator Variation in Hateful Word Interpretation” (Sanne Hoeken, Özge Alacam, Dong Nguyen, Massimo Poesio, Sina Zarrieß), tomorrow at 16:30! 🕟
@sannehoeken.bsky.social
               
            
            
                22.09.2025 10:15 — 👍 4    🔁 1    💬 0    📌 1                      
            
         
            
        
            
            
            
            
                                                
                                            Sina in front of a slide with different size circles
                                                
    
    
    
    
            Sina Zarrieß is giving the KONVENS keynote on training BabyLMs #nlproc
The slide shows the number of words a 12yo human has seen in their lifetime compared to the number of words typical language models have seen in training #llm
               
            
            
                11.09.2025 11:43 — 👍 6    🔁 3    💬 0    📌 0                      
            
         
            
        
            
            
            
            
                                                
                                                
    
    
    
    
            Happening now: Sina’s keynote on our BabyLM work. 🥳
               
            
            
                11.09.2025 11:34 — 👍 5    🔁 0    💬 0    📌 1                      
            
         
            
        
            
            
            
            
                                                
                                                
    
    
    
    
            Great first day at #KONVENS2025 today. Looking forward to another engaging day with a keynote by Sina Zarrieß tomorrow 🤓
@clausebielefeld.bsky.social
               
            
            
                10.09.2025 20:36 — 👍 2    🔁 1    💬 1    📌 0                      
            
         
            
        
            
            
            
            
            
    
    
    
    
            Don’t miss Sina’s keynote on BabyLMs at #konvens tomorrow!
               
            
            
                10.09.2025 11:09 — 👍 3    🔁 0    💬 0    📌 0                      
            
         
            
        
            
            
            
            
                                                
                                                
    
    
    
    
            Final keynote of #semdial by David Schlangen on “Meaningful Interaction with Unreal Speakers?” 😇💬
               
            
            
                05.09.2025 09:32 — 👍 2    🔁 0    💬 1    📌 0                      
            
         
            
        
            
            
            
            
            
    
    
    
    
            Final day at #semdial2025 #bialogue: four more presentations, one keynote and hopefully many engaging discussions. Let's go!
               
            
            
                05.09.2025 06:11 — 👍 0    🔁 1    💬 0    📌 0                      
            
         
            
        
            
            
            
            
                                                
                                                
    
    
    
    
            Second #semdial keynote by Robert Hawkins on “Foraging for common ground”
               
            
            
                04.09.2025 14:03 — 👍 3    🔁 0    💬 0    📌 0                      
            
         
            
        
            
            
            
            
                                                
                                                
    
    
    
    
            Day 2 of #semdial starts with a session on LMs and dialogue systems 🤩
               
            
            
                04.09.2025 06:40 — 👍 3    🔁 0    💬 0    📌 0                      
            
         
            
        
            
            
            
            
                                                
                                                
    
    
    
    
            Actually yes! Dialogue differs distinctly from monologues in terms of phonetic features and in the production of novel phonetic forms!
               
            
            
                03.09.2025 09:41 — 👍 2    🔁 0    💬 0    📌 0                      
            
         
            
        
            
            
            
            
                                                
                                                
    
    
    
    
            Leonie Schade asks whether it takes two to do an articulatory tango 😁
               
            
            
                03.09.2025 09:24 — 👍 6    🔁 1    💬 1    📌 0                      
            
         
            
        
            
            
            
            
            
    
    
    
    
            And the second talk features contributions by our PI Sina Zarrieß. 🤩
               
            
            
                03.09.2025 08:35 — 👍 6    🔁 0    💬 1    📌 0                      
            
         
            
        
            
            
            
            
            
    
    
    
    
            #semdial has begun 💬
               
            
            
                03.09.2025 07:33 — 👍 1    🔁 0    💬 0    📌 0                      
            
         
            
        
            
            
            
            
                                                
                                                
    
    
    
    
            #semdial is about to begin 🥳
               
            
            
                03.09.2025 07:01 — 👍 2    🔁 2    💬 1    📌 0                      
            
         
            
        
            
            
            
            
            
    
    
    
    
            Program: semdial2025.github.io/program/
Proceedings: purl.org/semdial/2025...
               
            
            
                02.09.2025 20:11 — 👍 0    🔁 0    💬 0    📌 0                      
            
         
            
        
            
            
            
            
                                                
                                                
    
    
    
    
            #semdial2025, the long-awaited #bialogue conference starts tomorrow! We are looking forward to three wonderful conference days, featuring three exciting keynotes, and many oral and poster presentations on the semantics and pragmatics of dialogue. 👄💬
Check out the program and proceedings below. 👇
               
            
            
                02.09.2025 20:10 — 👍 3    🔁 0    💬 1    📌 1                      
            
         
            
        
            
            
            
            
                                                
                                                
    
    
    
    
            Let’s go!
               
            
            
                01.08.2025 10:00 — 👍 3    🔁 0    💬 0    📌 0                      
            
         
            
        
            
            
            
            
            
    
    
    
    
            Is simpler child-directed language easier to learn?
Check out our CoNLL paper "Do Construction Distributions Shape Formal Language Learning in German BabyLMs?" 
@conll-conf.bsky.social
               
            
            
                01.08.2025 09:24 — 👍 2    🔁 2    💬 1    📌 0                      
            
         
            
        
            
        
            
            
            
            
            
    
    
    
    
            Our PI Sina will give an oral presentation on "Components of Creativity: Language Model-based Predictors for Clustering and Switching in Verbal Fluency" at @conll-conf.bsky.social in 45 minutes. Come check it out if you are at @aclmeeting.bsky.social  #ACL2025NLP
               
            
            
                01.08.2025 09:13 — 👍 4    🔁 1    💬 1    📌 0                      
            
         
            
        
            
            
            
            
                                                
                                                
    
    
    
    
            Impromptu dinner after @conll-conf.bsky.social #ACL2025NLP, connecting Bielefeld and the Netherlands over Greek food 😇👌
               
            
            
                31.07.2025 17:17 — 👍 6    🔁 0    💬 0    📌 0                      
            
         
            
        
            
            
            
            
                                                
                                                
    
    
    
    
            Happening now: catch Simeon, Manar and Larissa presenting their paper “Are Multimodal Large Language Models Pragmatically Competent Listeners in Simple Reference Resolution Tasks?” in hall X5. #ACL2025NLP
               
            
            
                28.07.2025 16:00 — 👍 3    🔁 1    💬 0    📌 0                      
            
         
            
        
            
            
            
            
            
    
    
    
    
            It’s actually 41. 🙃
               
            
            
                28.07.2025 08:46 — 👍 3    🔁 0    💬 0    📌 0                      
            
         
            
        
            
            
            
            
                                                
                                                
    
    
    
    
            Happening now: Clara, Judith and Sina present their poster
“Can LLMs Ground when they (Don’t) Know: A Study on Direct and Loaded Political Questions”
(Poster board 45) #ACL2025NLP
               
            
            
                28.07.2025 08:45 — 👍 3    🔁 0    💬 1    📌 0                      
            
         
            
        
            
            
            
            
                                                
                                            Overview of CLAUSE papers at ACL
                                                
    
    
    
    
            The CLAUSE group from Bielefeld University is looking forward to next month’s ACL in Vienna, where we will be presenting quite a few papers. 🥳
Feel free to get in touch if you want to know more. 😇
               
            
            
                26.06.2025 07:55 — 👍 9    🔁 6    💬 0    📌 0                      
            
         
            
        
            
            
            
            
                                                
            
         
    
         
        
            
        
                            
                    
                    
                                            Glossa: a journal of general linguistics is a Diamond Open Access journal owned and controlled by the linguistics scholarly community, with no financial barriers to publishing for authors. https://www.glossa-journal.org
                                     
                            
                    
                    
                                            Postdoctoral Researcher at the Princeton Social Neuroscience Lab
Incoming Assistant Professor at Texas A&M - 2026
                                     
                            
                    
                    
                                            You can learn more about me here: https://nikitas-theo.github.io/
                                     
                            
                    
                    
                                            ReDICo: Researching Digital Interculturality Co-operatively. redico.eu
An interdisciplinary project exploring intercultural practices and discourses within digital spaces and beyond. 
Join our research hub: hub.redico.eu
                                     
                            
                    
                    
                                            Asst Prof UCSD Cognitive Science
language development | cognitive development | learning
https://mzettersten.github.io/
(he/his)
                                     
                            
                    
                    
                                            Postdoctoral researcher at @UniPotsdam Topics: Computational Models for Multimodal Understanding, Conversational AI, Multimodal Hate Speech/Emotion/Sentiment
Website: https://sherzod-hakimov.github.io/
                                     
                            
                    
                    
                                            Queer. Catholic. Immigrant. Computational linguist. Often earnest but here to learn the art of shitposting. they/them
Languages: en, de, fr, sv
This account mixes personal and professional content. Also available at https://mastodon.social/@_dmh
                                     
                            
                    
                    
                                            2nd year PhD Student at @gronlp.bsky.social  🐮 - University of Groningen 
Language Acquisition - NLP
                                     
                            
                    
                    
                                            asst prof @Stanford linguistics | director of social interaction lab 🌱 | bluskies about computational cognitive science & language
                                     
                            
                    
                    
                                            Cognitive scientist. Language, Categorization, social interactions. 
Lover of sea, dance & cats 🍉🌈
🎓 Postdoctoral researcher for ABSTRACTION ERC project at University of Bologna. @abstractionerc.bsky.social
                                     
                            
                    
                    
                                            Professora del Departament de Filologia Catalana de la U. d'Alacant. Fraseòloga. Entre la lingüística, la traducció and anything in between.
https://orcid.org/0000-0003-1235-8301
                                     
                            
                    
                    
                                            ELLIS PhD Student at MaiNLP
@ellis.eu @mainlp.bsky.social @munichcenterml.bsky.social
Semi-serious runner for Berlin Track Club and my sanity
                                     
                            
                    
                    
                                            Researcher at Cohere | Multilingual LLM evaluation
                                     
                            
                    
                    
                                            👉 Please now follow @interspeech.bsky.social instead!
                                     
                            
                    
                    
                                            PhD student @ Saarland University
                                     
                            
                    
                    
                                            Associate professor of comparative literature and digital humanist
                                     
                            
                    
                    
                                            Language technology research group at the University of Helsinki @helsinki.fi
                                     
                            
                    
                    
                                            « Ὅσον ζῇς φαίνου »
Researcher in computational linguistics at University of Zurich
                                     
                            
                    
                    
                                            PhD student at @gesis.org & @hhu.de, computational linguist, researching linguistic factors in (annotation) disagreement and language model behavior.
                                     
                            
                    
                    
                                            PhD student at ILLC - University of Amsterdam 🌷
Interested in linguistics and interpretability.