
Grgur Kovač

@kovacgrgur.bsky.social

AI Researcher at INRIA in the Flowers team. https://grgkovac.github.io Twitter: @KovacGrgur

40 Followers  |  63 Following  |  13 Posts  |  Joined: 21.11.2024

Posts by Grgur Kovač (@kovacgrgur.bsky.social)

This work was heavily inspired by many amazing works, such as:
www.nature.com/articles/s41...
arxiv.org/abs/2404.01413
arxiv.org/abs/2311.09807
arxiv.org/abs/2402.0704

18.12.2025 14:37 — 👍 0    🔁 0    💬 0    📌 0

P.S. This project wraps up my PhD research exploring how to leverage human sciences (psychology, cultural evolution) to better evaluate and understand LLMs.
I am now on the job market for EU-based remote roles in industry (LLM Researcher/Engineer). I’d love to connect! 👋

18.12.2025 14:37 — 👍 2    🔁 0    💬 1    📌 0

This was done with:
@kovacgrgur.bsky.social*, Jérémy Perez*, Rémy Portelas, Peter Ford Dominey, @pyoudeyer.bsky.social
(*equal contribution)
In the Flowers team, INRIA

18.12.2025 14:37 — 👍 0    🔁 0    💬 1    📌 0

Caveat: Model collapse is a nascent field, and current studies make many assumptions with respect to real-world dynamics. Here we explore one assumption (homogeneity of data), but many more remain to be explored!

18.12.2025 14:37 — 👍 0    🔁 0    💬 1    📌 0

Implication: These two takeaways together imply that different internet domains could exhibit different collapse dynamics, depending on the data properties of each domain.

18.12.2025 14:37 — 👍 0    🔁 0    💬 1    📌 0

Finding 2: The effects are within-domain. For LLMs trained on multiple domains, drops in one domain (e.g. Reddit) are influenced by that domain's own properties, not those of other domains (e.g. Twitter/X or Wikipedia); i.e., the effects do not spill over across domains.

18.12.2025 14:37 — 👍 0    🔁 0    💬 1    📌 0

Finding 1: Human data properties influence collapse dynamics. Some properties of the human data (lexical diversity, Gaussianity) are associated with bigger drops in both the quality and semantic diversity of generated text, while others (quality, semantic diversity) are associated with smaller drops.
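
To make these properties concrete, here is a minimal sketch of plausible operationalizations of two of them. The paper's exact metric definitions may differ, so treat the function names and choices below as illustrative assumptions:

```python
# Illustrative (assumed) operationalizations of two human-data properties.
import numpy as np
from scipy import stats

def lexical_diversity(texts):
    """Type-token ratio over the pooled corpus (higher = more varied vocabulary)."""
    tokens = [tok for text in texts for tok in text.split()]
    return len(set(tokens)) / len(tokens)

def gaussianity(embeddings):
    """p-value of a normality test on a 1D random projection of text
    embeddings, shape (n_texts, dim); higher = more Gaussian-looking."""
    rng = np.random.default_rng(0)
    projection = embeddings @ rng.normal(size=embeddings.shape[1])
    return stats.normaltest(projection).pvalue
```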

18.12.2025 14:37 — 👍 0    🔁 0    💬 1    📌 0

We use an iterative chain design: we iteratively fine-tune base LLMs on data generated by the previously fine-tuned models.

We then use regression analysis to find associations between properties of the human data and relative drops in the quality and semantic diversity of LLM-generated data.
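
A minimal sketch of this setup, with toy stand-ins for the fine-tuning/generation stack and for the measured values; the structure mirrors the idea of the analysis, not the paper's exact code:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def finetune(base_model, texts):
    # Stand-in for fine-tuning a fresh copy of the base LLM on `texts`.
    return {"base": base_model, "trained_on": len(texts)}

def generate(model, n_samples):
    # Stand-in for sampling texts from the fine-tuned model.
    return [f"generated text {i}" for i in range(n_samples)]

def run_chain(base_model, human_texts, n_generations=5):
    """Each generation is fine-tuned on the previous generation's outputs."""
    texts, snapshots = human_texts, []
    for _ in range(n_generations):
        model = finetune(base_model, texts)        # always start from the base model
        texts = generate(model, len(human_texts))  # training data for the next step
        snapshots.append(texts)
    return snapshots

# Regression across chains: relate properties of each chain's initial human
# data to the relative drop in quality/diversity of the generated data.
rng = np.random.default_rng(0)
properties = rng.normal(size=(30, 4))    # toy values; measured from real corpora in practice
relative_drops = rng.normal(size=30)     # toy values; e.g. (q_human - q_genN) / q_human
reg = LinearRegression().fit(properties, relative_drops)
print(dict(zip(["lexical_diversity", "gaussianity", "quality",
                "semantic_diversity"], reg.coef_.round(3))))
```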

18.12.2025 14:37 — 👍 0    🔁 0    💬 1    📌 0

#LLMs are trained on internet data, which contains an increasing share of synthetic data. These LLMs then generate new online data, which will in turn be used to train future LLMs.

Will this closed loop result in future models generating data of lower quality and diversity (i.e. collapse)?

18.12.2025 14:37 — 👍 0    🔁 0    💬 1    📌 0
Recursive Training Loops in LLMs: How training data properties modulate distribution shift in generated data? Large language models (LLMs) are increasingly used in the creation of online content, creating feedback loops as subsequent generations of models will be trained on this synthetic data. Such loops wer...

📄 Paper: arxiv.org/abs/2504.03814

18.12.2025 14:37 — 👍 0    🔁 0    💬 1    📌 0

Will the influx of synthetic data lead to uniform #ModelCollapse across the internet?
Our recent #EMNLP2025 (Oral) paper suggests a nuanced picture: different collapse dynamics might emerge in different internet domains based on the properties of human data in those domains! 🧵

18.12.2025 14:37 — 👍 1    🔁 1    💬 1    📌 0

What's wrong with evaluating #LLMs after a single interaction? Come find out at @iclr-conf.bsky.social and learn how cultural attraction theory can help us do better. Poster #288, 10 am.

23.04.2025 22:11 — 👍 7    🔁 2    💬 1    📌 2
MAGELLAN: Metacognitive predictions of learning progress guide... Open-ended learning agents must efficiently prioritize goals in vast possibility spaces, focusing on those that maximize learning progress (LP). When such autotelic exploration is achieved by LLM...

🚀 Introducing 🧭MAGELLAN—our new metacognitive framework for LLM agents! It predicts its own learning progress (LP) in vast natural language goal spaces, enabling efficient exploration of complex domains.🌍✨Learn more: 🔗 arxiv.org/abs/2502.07709 #OpenEndedLearning #LLM #RL
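
For intuition, here is a tiny sketch of the learning-progress signal that frameworks like MAGELLAN build on: LP for a goal as the change in estimated competence over time. The class below is an illustrative assumption, not MAGELLAN's actual implementation:

```python
from collections import deque

class LPEstimator:
    """Toy per-goal LP: |recent success rate - older success rate|."""
    def __init__(self, window=100):
        self.window = window
        self.history = {}  # goal (str) -> recent 0/1 outcomes

    def update(self, goal, success):
        self.history.setdefault(goal, deque(maxlen=self.window)).append(float(success))

    def learning_progress(self, goal):
        outcomes = list(self.history.get(goal, []))
        if len(outcomes) < 4:
            return 0.0  # not enough evidence yet
        half = len(outcomes) // 2
        old, recent = outcomes[:half], outcomes[half:]
        return abs(sum(recent) / len(recent) - sum(old) / len(old))
```

A per-goal table like this cannot generalize to goals it has never tried, which, per the abstract, is what MAGELLAN's learned metacognitive predictions address in vast natural-language goal spaces.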

24.03.2025 15:09 — 👍 9    🔁 3    💬 1    📌 4

The leaderboard is explained in our previous tweet (haven't transferred it to Bluesky yet) 😐:
x.com/KovacGrgur/s...

10.12.2024 14:15 — 👍 0    🔁 0    💬 0    📌 0

Llama 3.3 is great, but Nemotron is still the leader in our StickToYourRole Leaderboard!
Nemotron 🥇
Llama 3.3 🥈

huggingface.co/spaces/flowe...

10.12.2024 14:15 — 👍 4    🔁 0    💬 1    📌 0

I'm excited to announce that this work has been accepted at
@blog.neurips.cc.web.brid.gy 🧠🤖 We hope to spark conversations on goal selection in biological and artificial agents.

Check it out at openreview.net/forum?id=Gbq...

With Cédric Colas, Pierre-Yves Oudeyer, & Anne Collins

18.11.2024 20:20 — 👍 15    🔁 6    💬 1    📌 1

🚨New preprint🚨
When testing LLMs with questions, how can we know they did not see the answers in their training data? In this new paper, we propose a simple, fast, out-of-the-box method to spot contamination in short texts, with @stepalminteri.bsky.social and Pierre-Yves Oudeyer!
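
For flavor, here is one common style of contamination probe: compare the model's log-likelihood of the exact text against shuffled variants, since memorized text tends to be far more likely in its exact order. This is a generic sketch of that idea, not necessarily the method proposed in the paper (the `gpt2` model is an illustrative stand-in):

```python
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative stand-in model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def mean_logprob(text):
    """Mean per-token log-probability of `text` under the model."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token cross-entropy
    return -loss.item()

def contamination_score(text, n_shuffles=20, seed=0):
    """How much more likely the exact word order is than shuffled orders."""
    rng = random.Random(seed)
    words = text.split()
    shuffled_lps = []
    for _ in range(n_shuffles):
        w = words[:]
        rng.shuffle(w)
        shuffled_lps.append(mean_logprob(" ".join(w)))
    return mean_logprob(text) - sum(shuffled_lps) / n_shuffles
```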

15.11.2024 13:47 — 👍 9    🔁 4    💬 1    📌 0