This work was heavily inspired by many amazing works such as:
www.nature.com/articles/s41...
arxiv.org/abs/2404.01413
arxiv.org/abs/2311.09807
arxiv.org/abs/2402.0704
P.S. This project wraps up my PhD research exploring how to leverage human sciences (psychology, cultural evolution) to better evaluate and understand LLMs.
I am now on the job market for EU-based remote roles in industry (LLM Researcher/Engineer). I’d love to connect! 👋
This was done with:
@kovacgrgur.bsky.social*, Jérémy Perez*, Rémy Portelas, Peter Ford Dominey, @pyoudeyer.bsky.social
(*equal contribution)
In the FlowersTeam, INRIA
Caveat: Model collapse is a nascent field, and studies currently make many assumptions with respect to real-world dynamics. Here we explore one such assumption (homogeneity of data), but many more remain to be explored!
18.12.2025 14:37 — 👍 0 🔁 0 💬 1 📌 0
Implication: These two takeaways together imply that different internet domains could exhibit different collapse dynamics, depending on the data properties of each domain.
Finding 2: The effects are within-domain. For LLMs trained on multiple domains, drops in one domain (e.g. Reddit) are influenced by that domain's own properties, not those of other domains (e.g. Twitter/X or Wikipedia); the effects do not spill over across domains.
Finding 1: Human data properties influence collapse dynamics. Some properties of human data (lexical diversity, Gaussianity) are associated with larger drops in both the quality and semantic diversity of generated text, while others (quality, semantic diversity) are associated with smaller drops.
We used an iterative chain design (iteratively fine-tuning base LLMs on data generated by previously fine-tuned models).
We then used regression analysis to find associations between human data properties and relative drops in the quality and semantic diversity of LLM-generated data.
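The chain design above can be illustrated with a toy sketch (this is not the paper's actual pipeline: the "model" here is just a hypothetical stand-in that resamples tokens from its training data, and "lexical diversity" is a simple type-token ratio):

```python
import random

def lexical_diversity(tokens):
    """Type-token ratio: unique tokens / total tokens."""
    return len(set(tokens)) / len(tokens)

def resample(tokens, n, rng):
    """Toy stand-in for 'fine-tune a model, then sample from it':
    each generation's data is drawn from the previous generation's output."""
    return [rng.choice(tokens) for _ in range(n)]

def relative_diversity_drop(vocab_size=1000, generations=5, seed=0):
    rng = random.Random(seed)
    # Toy "human" corpus with maximal lexical diversity.
    data = [f"word{i}" for i in range(vocab_size)]
    div0 = lexical_diversity(data)
    # Iterative chain: generation k is trained on generation k-1's output.
    for _ in range(generations):
        data = resample(data, vocab_size, rng)
    # Relative drop, the quantity a regression could relate to data properties.
    return (div0 - lexical_diversity(data)) / div0
```

Even this toy loop loses diversity over generations (sampling with replacement drops roughly a 1/e fraction of unique tokens per step), which is the kind of relative drop the regression analysis associates with properties of the initial human data.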
#LLMs are trained on internet data, which increasingly contains more synthetic data. These LLMs then generate new online data, which will be used to train future LLMs.
Will this closed loop result in future models generating data of lower quality and diversity (i.e. collapse)?
📄 Paper: arxiv.org/abs/2504.03814
Will the influx of synthetic data lead to uniform #ModelCollapse across the internet?
Our recent #EMNLP2025 (Oral) paper suggests a nuanced picture: different collapse dynamics might emerge in different internet domains based on the properties of human data in those domains! 🧵
What's wrong with evaluating #LLMs after a single interaction? Come find out @iclr-conf.bsky.social and learn how cultural attraction theory can help us do better. Poster #288, 10 am.
23.04.2025 22:11 — 👍 7 🔁 2 💬 1 📌 2
🚀 Introducing 🧭MAGELLAN—our new metacognitive framework for LLM agents! It predicts its own learning progress (LP) in vast natural language goal spaces, enabling efficient exploration of complex domains.🌍✨Learn more: 🔗 arxiv.org/abs/2502.07709 #OpenEndedLearning #LLM #RL
24.03.2025 15:09 — 👍 9 🔁 3 💬 1 📌 4
The leaderboard is explained in our previous tweet (haven't transferred it to Bluesky yet) 😐:
x.com/KovacGrgur/s...
Llama 3.3 is great, but Nemotron is still the leader on our StickToYourRole Leaderboard!
Nemotron 🥇
Llama 3.3 🥈
huggingface.co/spaces/flowe...
I'm excited to announce that this work has been accepted at
@blog.neurips.cc.web.brid.gy 🧠🤖 We hope to spark conversations on goal selection in biological and artificial agents.
Check it out at openreview.net/forum?id=Gbq...
With Cédric Colas, Pierre-Yves Oudeyer, & Anne Collins
🚨New preprint🚨
When testing LLMs with questions, how can we know they did not see the answers during training? In this new paper, we propose a simple, fast, out-of-the-box method to spot contamination in short texts, with @stepalminteri.bsky.social and Pierre-Yves Oudeyer!