Francois Meyer

@francois-meyer.bsky.social

PhD student at the University of Cape Town, working on text generation for low-resource, morphologically complex languages. https://francois-meyer.github.io/ Cape Town, South Africa

51 Followers  |  426 Following  |  7 Posts  |  Joined: 19.11.2024

Latest posts by francois-meyer.bsky.social on Bluesky

๐ŸŒIntroducing BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data!

LLMs learn from vastly more data than humans ever experience. BabyLM challenges this paradigm by focusing on developmentally plausible data.

We extend this effort to 45 new languages!

15.10.2025 10:53 · 👍 43    🔁 16    💬 1    📌 3
๐ƒ๐จ ๐ฒ๐จ๐ฎ ๐ซ๐ž๐š๐ฅ๐ฅ๐ฒ ๐ฐ๐š๐ง๐ญ ๐ญ๐จ ๐ฌ๐ž๐ž ๐ฐ๐ก๐š๐ญ ๐ฆ๐ฎ๐ฅ๐ญ๐ข๐ฅ๐ข๐ง๐ ๐ฎ๐š๐ฅ ๐ž๐Ÿ๐Ÿ๐จ๐ซ๐ญ ๐ฅ๐จ๐จ๐ค๐ฌ ๐ฅ๐ข๐ค๐ž? ๐Ÿ‡จ๐Ÿ‡ณ๐Ÿ‡ฎ๐Ÿ‡ฉ๐Ÿ‡ธ๐Ÿ‡ช

Here's the proof! BabyBabelLM is the first Multilingual Benchmark of Developmentally Plausible Training Data, covering 45 languages, now available to the NLP community 🎉

arxiv.org/abs/2510.10159

14.10.2025 17:01 · 👍 42    🔁 16    💬 2    📌 1

Today our poster will be up at @loreslm.bsky.social Poster Session #2 (2-3pm local time, Abu Dhabi).

It's also available online at Whova: whova.com/portal/webap...

20.01.2025 06:43 · 👍 1    🔁 0    💬 0    📌 1

This work was carried out by three great UCT CS Honours students: Alexis, Charl, and Hishaam.

14.01.2025 07:11 · 👍 0    🔁 0    💬 0    📌 0

This work unites two directions of research: cognitively plausible modelling and NLP for low-resource languages. We hope more researchers pursue work at the intersection of these two subfields, since they share the goal of improving data-efficiency in the era of scaling.

14.01.2025 07:11 · 👍 0    🔁 0    💬 1    📌 0

However, unlike in the original BabyLM challenge, our isiXhosa BabyLMs do not outperform all skylines. We attribute this to a lack of developmentally plausible isiXhosa data. The success of English BabyLMs is due to both modelling innovations and highly curated pretraining data.

14.01.2025 07:11 · 👍 0    🔁 0    💬 1    📌 0

We pretrain two of the top BabyLM submissions (ELC-BERT and MLSM) for isiXhosa and evaluate them on isiXhosa POS tagging, NER, and topic classification. The BabyLMs outperform an isiXhosa RoBERTa, and ELC-BERT even outperforms XLM-R on two tasks.

14.01.2025 07:11 · 👍 0    🔁 0    💬 1    📌 0
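As a concrete illustration of the evaluation setup described above, here is a minimal sketch of fine-tuning a pretrained encoder on isiXhosa POS tagging with HuggingFace transformers. It uses xlm-roberta-base (the XLM-R skyline mentioned in the thread) as a stand-in checkpoint, and a toy sentence with made-up tags; this is not the paper's pipeline, just the standard token-classification recipe such an evaluation would follow.

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Toy tag set and one toy isiXhosa sentence, purely for illustration.
labels = ["NOUN", "VERB", "PRON", "PUNCT"]
words = ["Molo", "mhlobo", "wam", "."]
tags = [1, 0, 2, 3]  # illustrative word-level tags, not gold annotations

# Stand-in checkpoint: swap in the actual BabyLM weights where available.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForTokenClassification.from_pretrained(
    "xlm-roberta-base", num_labels=len(labels)
)

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

# Align word-level tags to subword tokens: label each word's first subword,
# mask the rest (and special tokens) with -100 so the loss ignores them.
aligned, prev = [], None
for wid in enc.word_ids():
    aligned.append(tags[wid] if wid is not None and wid != prev else -100)
    prev = wid

out = model(**enc, labels=torch.tensor([aligned]))
out.loss.backward()  # one fine-tuning gradient step (optimizer loop omitted)
print("loss:", float(out.loss))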

The BabyLM challenge (babylm.github.io) produced new sample-efficient architectures. We investigate the potential of BabyLMs to improve LMs for low-resource languages with limited pretraining data. As a case study we use isiXhosa, a language with corpora similar in size to the BabyLM strict-small track (10M words).

14.01.2025 07:11 · 👍 0    🔁 0    💬 1    📌 0

Our paper "BabyLMs for isiXhosa: Data-Efficient Language Modelling in a Low-Resource Context" will be presented at The First Workshop on Language Models for Low-Resource Languages at #COLING2025 in Abu Dhabi.

Paper: arxiv.org/pdf/2501.03855

14.01.2025 07:09 · 👍 1    🔁 1    💬 1    📌 1
