Thanks to The Leverhulme Trust (RF-2023-408) for supporting this work, and to the reviewers and associate editor whose feedback greatly improved the manuscript.
Paper: doi.org/10.1093/molbev/msag041
@jomcinerney.bsky.social
Writing a book about horizontal gene transfer and non-treelike evolution. Bioinformatics, Evolutionary Biology. Pangenomes. Chair in Evolutionary Biology. 🇮🇪 http://github.com/mol-evol/panGPT
This connects to real tools. Transformer-based genome models (DNABERT, Evo) can calculate perplexity directly. AlphaFold confidence scores estimate structural perplexity. Flux Balance Analysis handles metabolic perplexity. The framework is testable now.
27.02.2026 10:45
The practical shift: instead of asking "what does this gene do?" we should ask "what can this gene become?" Synthetic biology, antimicrobial development, and evolutionary prediction all become questions of context engineering rather than gene optimisation alone.
27.02.2026 10:45
It also explains open vs closed pangenomes. Open pangenomes (like E. coli) arise when large population sizes can detect small fitness advantages and high environmental variability creates many contexts where accessory genes pay off, despite integration costs.
27.02.2026 10:45
The framework predicts pangenome structure. Core genes = low perplexity across contexts. Rare accessory genes = high perplexity generally, but strong benefits in specific contexts. The U-shaped frequency distribution falls out naturally.
27.02.2026 10:45
This explains why HGT has a fitness cost: not because transferred genes are broken, but because they arrive into a genome optimised for different statistical patterns. Over time, codon adaptation, regulatory rewiring, and compensatory mutations reduce perplexity. The gene becomes "expected."
27.02.2026 10:45
Perplexity operates across multiple dimensions: codon usage, protein structure, regulatory compatibility, metabolic integration, protein-protein interactions, chromosomal organisation, and gene co-occurrence patterns. Each contributes to the fitness cost of genomic novelty.
27.02.2026 10:45
This leads to the concept of "genomic perplexity", borrowed from information theory. Perplexity measures how "surprised" a model is by a sequence. A horizontally transferred gene landing in a new genome is a high-perplexity token: statistically unexpected in that context.
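For readers who want the information-theory definition made concrete, here is a minimal sketch of perplexity as the exponential of the mean negative log-probability. The per-codon probabilities are invented for illustration and do not come from the paper or from any real genome model; the function name and the example values are assumptions.

```python
import math


def perplexity(probabilities):
    """Return exp(mean negative log-probability) for a sequence of token probabilities.

    Each probability is what some model (hypothetical here) assigned to the
    token that actually occurred, given its context.
    """
    if not probabilities:
        raise ValueError("need at least one probability")
    if any(p <= 0.0 or p > 1.0 for p in probabilities):
        raise ValueError("probabilities must lie in (0, 1]")
    mean_nll = -sum(math.log(p) for p in probabilities) / len(probabilities)
    return math.exp(mean_nll)


# Illustrative, made-up values: a resident gene whose codons the host model
# expects, and a newly transferred gene whose codons it finds surprising.
native_gene = [0.5, 0.4, 0.5, 0.25]
transferred_gene = [0.05, 0.1, 0.02, 0.1]

print("native:", perplexity(native_gene))
print("transferred:", perplexity(transferred_gene))
```

A uniform choice among k equally likely tokens gives a perplexity of exactly k, so the number can be read as an effective count of alternatives the model was weighing.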
27.02.2026 10:45
I propose that evolution shapes genomes not to encode fixed functions, but to optimise probability distributions of functional outcomes across the contexts organisms actually encounter. Selection acts on these distributions, not on singular gene activities.
27.02.2026 10:45
Modern language models (transformers) succeed because they learn probability distributions over outcomes given context. They use "attention": each word's contribution depends on other words in the sequence. Epistasis is the biological equivalent. A gene's effect depends on what else is in the genome.
27.02.2026 10:45
This is exactly the problem NLP faced for decades. Trying to understand language through fixed word definitions failed. The breakthrough came when researchers stopped assigning fixed meanings and started treating words as things whose meaning emerges from context.
27.02.2026 10:45
We've long asked "what does this gene do?" But the same gene can be essential in one strain and dispensable in another. SP_0185 in Streptococcus pneumoniae is a magnesium transporter: lethal to lose in some strains, irrelevant in others. Same gene. Different context. Different function.
27.02.2026 10:45
New paper out in MBE! 🧵
"Genomic Perplexity and the Evolution of Context-Dependent Function"
The big idea: genes don't have fixed functions. Function emerges from context - genomic, cellular, environmental. And we can quantify this. academic.oup.com/mbe/article/...
MENI is back! Join us in Dublin this August 2026 for our 3rd Meeting for Microbial Evolution in Ireland. We are delighted to have @rachelmwheatley.bsky.social @drrebeccajhall.bsky.social @jpjhall.bsky.social and @tweethinking.bsky.social join us as keynote speakers this year. miniurl.com/MENI
18.02.2026 12:16
You can change the denomination to euro. It's general for people with a finite amount of time and more than one option for what grant to write.
29.01.2026 13:35
Very interesting paper. www.science.org/doi/10.1126/...
24.01.2026 13:24
Ever wondered if writing a grant proposal was actually worth your time? Presenting... The Grant Portfolio Evaluator (for entertainment only) mol-evol.github.io/grant-evalua...
15.01.2026 14:22
PanForest: predicting genes in genomes using random forests academic.oup.com/bioinformati... #jcampubs
12.01.2026 14:30
Dissecting Phylogenetic Support: Unified Decay Indices, AU Tests, and Branch-Site Specific Visualizations. https://www.biorxiv.org/content/10.64898/2025.12.05.692543v1
05.12.2025 23:33
Oh nooooo.
15.01.2026 17:34
It's for "entertainment" purposes only :)
15.01.2026 14:43
Interested in pangenomics? This paper just out from Alan Beavan (not on socials), @blackpassiflora.bsky.social and @jomcinerney.bsky.social is a must: doi.org/10.1093/bioi...
15.01.2026 12:27
Latest preprint with @hairyllama.bsky.social and @evol-molly.bsky.social. This is a program that helps calculate several phylogenetic support measures.
05.12.2025 23:56
Open source & freely available:
github.com/alanbeavan/PanForest
Congrats to Alan on leading this work!
Outputs both prediction accuracy (how predictable is each gene?) and feature importance (which genes matter most for each prediction?). Useful for understanding genome organisation, synthetic biology, and molecular ecology.
14.01.2026 22:25
Case study: antimicrobial resistance genes. Certain AMR genes reliably predict other AMR genes for the same drug. But we also found unexpected associations with genes NOT previously linked to resistance. New targets for investigation?
14.01.2026 22:25
We tested it on 1,000 E. coli genomes with ~12,700 accessory genes. Runs in ~5 hours on 8 processors. Scales to Network of Life pangenomes.
14.01.2026 22:25
The core insight: genes don't distribute randomly across genomes. Some genes "like" to co-occur, others avoid each other. PanForest learns these patterns and tells you which genes are predictable from their genomic context.
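To make "co-occur" versus "avoid" concrete, here is a small sketch that scores a pair of genes with the phi coefficient over presence/absence vectors. This is not PanForest's method (PanForest uses random forests); it is a simpler pairwise statistic swapped in for illustration. The function name and toy vectors are hypothetical.

```python
import math


def phi_coefficient(gene_a, gene_b):
    """Phi coefficient between two presence/absence vectors (1 = present, 0 = absent).

    Positive values mean the genes tend to co-occur across genomes,
    negative values mean they tend to avoid each other.
    """
    if len(gene_a) != len(gene_b):
        raise ValueError("vectors must cover the same genomes")
    # Counts for the 2x2 contingency table.
    n11 = sum(1 for a, b in zip(gene_a, gene_b) if a and b)
    n10 = sum(1 for a, b in zip(gene_a, gene_b) if a and not b)
    n01 = sum(1 for a, b in zip(gene_a, gene_b) if not a and b)
    n00 = sum(1 for a, b in zip(gene_a, gene_b) if not a and not b)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # One gene is present in all genomes or in none: no association defined.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


# Toy data: each position is one genome (made-up values).
gene_x = [1, 1, 0, 0, 1, 0]
gene_y = [1, 1, 0, 0, 1, 0]  # always found together with gene_x
gene_z = [0, 0, 1, 1, 0, 1]  # never found together with gene_x

print("x vs y:", phi_coefficient(gene_x, gene_y))
print("x vs z:", phi_coefficient(gene_x, gene_z))
```

A pairwise score like this only captures two-gene relationships, whereas a random forest predicts each gene from many other genes at once.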
14.01.2026 22:25
New paper out in Bioinformatics! PanForest uses random forests to predict gene presence/absence in bacterial genomes based on other genes present. Joint work with Alan Beavan & Maria Rosa Domingo-Sananes.
doi.org/10.1093/bioinformatics/btag005