To learn more about the extraordinary PhD work of Nicolas Yax: www.linkedin.com/posts/pierre...
📖 Blog: developmentalsystems.org/phylolm
📄 ICLR 2025 paper: iclr.cc/virtual/2025...
💻 Code: github.com/Nicolas-Yax/...
Hugging Face online demo: t.co/9wEdav3LZA
This work was led by the outstanding @nicolasyax.bsky.social and co-supervised by @stepalminteri.bsky.social and me.
The blog walks the full intellectual journey — from cat genetics to Dawkins' memes, from myth phylogenetics to the evolutionary landscape of modern AI. Written to be accessible to anyone curious about evolution, culture, and intelligence. 🌱
Why does this matter? 300+ models appear daily on @huggingface. Training transparency is declining. Tools like PhyloLM could matter for AI governance, safety monitoring, and simply mapping the evolutionary landscape of what's out there.
Result: surprisingly accurate reconstruction of known model family trees. And a few insights on the ancestry of closed-source models 👀
Our method PhyloLM (ICLR 2025) treats language models as populations of text.
"Genes" = short prompt contexts.
"Alleles" = their completions.
Apply classic genetic distance measures → build phylogenetic trees. No prior knowledge of training history needed.
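For readers who want to see the gene/allele analogy in code, here is a minimal sketch. It assumes Nei's standard genetic distance as one example of a "classic genetic distance measure"; the toy prompts, completions, and the names `model_a`, `model_b`, `model_c` are hypothetical and are not taken from the PhyloLM code.

```python
import math
from collections import Counter


def allele_frequencies(population):
    """Turn {gene: [sampled completions]} into {gene: {allele: frequency}}."""
    freqs = {}
    for gene, alleles in population.items():
        counts = Counter(alleles)
        total = sum(counts.values())
        freqs[gene] = {allele: n / total for allele, n in counts.items()}
    return freqs


def nei_distance(pop_x, pop_y):
    """Nei's standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy)).

    Only genes (prompt contexts) sampled for both models are compared.
    """
    fx, fy = allele_frequencies(pop_x), allele_frequencies(pop_y)
    genes = fx.keys() & fy.keys()
    if not genes:
        raise ValueError("the two models share no genes (prompt contexts)")
    jxy = sum(fx[g].get(a, 0.0) * p for g in genes for a, p in fy[g].items())
    jx = sum(p * p for g in genes for p in fx[g].values())
    jy = sum(p * p for g in genes for p in fy[g].values())
    # Clamp to 1.0 to absorb floating-point rounding (identity is at most 1).
    identity = min(1.0, jxy / math.sqrt(jx * jy))
    return math.inf if identity == 0 else -math.log(identity)


# Hypothetical toy data: each "gene" is a prompt context, each list holds
# the "alleles" (completions) sampled from one model for that context.
model_a = {"The cat": ["sat", "sat", "sat", "ran"], "Once upon": ["a", "a", "a", "the"]}
model_b = {"The cat": ["sat", "sat", "ran", "ran"], "Once upon": ["a", "a", "a", "the"]}
model_c = {"The cat": ["meowed", "meowed", "meowed", "sat"], "Once upon": ["my", "my", "my", "a"]}

print("a vs b:", nei_distance(model_a, model_b))
print("a vs c:", nei_distance(model_a, model_c))
```

A matrix of such pairwise distances is the kind of input that standard tree-building algorithms (for example neighbor joining) take; that step is left out here.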
A good evolutionary marker needs 3 properties:
→ compressed & universal representation
→ moderate rate of change
→ functional grounding
DNA has all three. So do high-level myth features. And so does an LLM's core identity: its probability distribution over text.
Now the key question: can you apply the same method to LLMs? Without access to their code, weights, or training data?
Mostly yes. 🧬
The myth phylogenies match known human migration patterns — shown beautifully by the outstanding work of Julien d'Huy. Same logic, different substrate: not DNA, but narrative structure as evolutionary marker.
Can you reconstruct the evolutionary tree of cat breeds just by looking at their genes? Yes — biologists do this routinely.
Can you do the same with ancient myths, tracing how stories evolved across continents by comparing narrative structure? To a large extent, yes.
New blog post: The Phylogenetics of Artifacts — inferring the evolution of cultural objects, artificial life forms, and language models.
From cat genetics to ancient myths to LLMs. 🧬 1/n
Wow, some of my old #EvolutionOfLanguage #EoL pals may have just done something huge for #AILaw data attribution #LLM #AIGovernance. Specifically @pyoudeyer.bsky.social @nicolasyax.bsky.social @stepalminteri.bsky.social
Evolutionary biology can track LLM phylogeny!
developmentalsystems.org/phylolm
Major new system and result from the team: SOAR is an open-source self-improving genAI system pushing the frontier of ARC performance using program synthesis.
It relies on using LLMs as self-improving smart operators for evolutionary search.
Congratulations @jul-p.bsky.social on this major achievement and @ccolas.bsky.social for the amazing co-supervision! Your work was magic in developing this self-improving system, which pushes the frontier of what can be done with program synthesis and open-source methods and models on the ARC challenge!
Using LLMs to advance the cognitive science of collectives
Very interesting new paper by @sucholutsky.bsky.social, Katherine Collins, @norijacoby.bsky.social, @billdthompson, and @roberthawkins.bsky.social:
arxiv.org/pdf/2506.00052
Samuel Bianchini's work, built on the algorithms of @pyoudeyer.bsky.social, highlighted by the Jeu de Paume in its AI exhibition
Thanks for the link! To be more precise, these are algorithms developed by members of my team, here in particular @eplantec.bsky.social @clemmoulinfrier.bsky.social @hamongautier.bsky.social, and the Flow Lenia system sites.google.com/view/flowlen...
Humans' ability to invent their own games & goals is at the core of open-ended learning.
Understanding and computationally modeling how they do it would shed light on human cognition and help build open-ended AI.
A great step in this direction: the new paper by Guy Davidson et al.
I reviewed "These Strange New Minds: How AI Learned to Talk and What It Means" by Chris Summerfield.
melaniemitchell.me/EssaysConten...
royalsocietypublishing.org/toc/rstb/202...
Cool
Another one.
Here is a cool Flow-Lenia simulation, which is a continuous cellular automaton with two distinct features:
1. Mass conservation - sum of all activations is constant through time
2. Localized update rules - enables simulations with differently behaving matter on the same grid
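To make the two properties concrete, here is a toy sketch. It is not Flow-Lenia's actual update rule: it is a hypothetical 1D ring where each cell passes a fraction of its mass to its right-hand neighbour, and where `rate` gives each cell its own local rule. As a simplification, the rate is attached to the grid position here. The function name and parameters are made up for illustration.

```python
def conservative_step(mass, rate):
    """One update on a ring: cell i moves rate[i] * mass[i] to cell i + 1.

    Mass is only moved between cells, never created or destroyed, so the
    total stays constant (up to floating-point rounding).
    """
    n = len(mass)
    new_mass = [0.0] * n
    for i in range(n):
        moved = rate[i] * mass[i]
        new_mass[i] += mass[i] - moved      # what stays in place
        new_mass[(i + 1) % n] += moved      # what flows to the neighbour
    return new_mass


# Two kinds of "matter" on the same grid: fast-flowing (0.5) and slow (0.1).
rate = [0.5, 0.5, 0.1, 0.1, 0.5]
state = [0.0, 4.0, 1.0, 0.0, 0.0]
initial_total = sum(state)

for _ in range(10):
    state = conservative_step(state, rate)

print("total before:", initial_total, "total after:", sum(state))
```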
Very happy that our paper received both the EvoApps best paper and the EvoStar best student paper! Congrats team!!
Emergent Kin Selection of Altruistic Feeding via Non-Episodic Neuroevolution
arxiv.org/abs/2411.10536
Check out @nicolasyax.bsky.social's thread about our paper (co-supervised by @pyoudeyer.bsky.social), where we show that evolutionary tree reconstruction can be successfully applied to LLMs to map their relations and predict their performance! Currently at @iclr-conf.bsky.social
And thank you @gaiamolinaro.bsky.social for presenting the work! Thanks also to Jérémy Perez for leading the project, and to the whole team: Corentin Leger, @kovacgrgur.bsky.social, @ccolas.bsky.social, @clemmoulinfrier.bsky.social, @maximederex.bsky.social
Generative AI is a cultural transmission technology:
it plays a growing role in generation, selection and transmission of ideas/opinions in human society 🧠🔄🌐
And yet we understand very little about these dynamics at this point 🤔❓
A step forward is our #ICLR2025 paper! 👇