James McInerney

@jomcinerney.bsky.social

Writing a book about horizontal gene transfer and non-treelike evolution. Bioinformatics, Evolutionary Biology. Pangenomes. Chair in Evolutionary Biology. 🇮🇪 http://github.com/mol-evol/panGPT

3,982 Followers  |  6,471 Following  |  280 Posts  |  Joined: 31.08.2023

Posts by James McInerney (@jomcinerney.bsky.social)

Genomic Perplexity and the Evolution of Context-Dependent Function. Abstract. The fundamental principle that selection acts on a gene's function often assumes implicitly that this function is fixed and intrinsic. However, e

Thanks to The Leverhulme Trust (RF-2023-408) for supporting this work, and to the reviewers and associate editor whose feedback greatly improved the manuscript.
Paper: doi.org/10.1093/molbev/msag041

27.02.2026 10:45 - 👍 2    🔁 0    💬 0    📌 0

This connects to real tools. Transformer-based genome models (DNABERT, Evo) can calculate perplexity directly. AlphaFold confidence scores estimate structural perplexity. Flux Balance Analysis handles metabolic perplexity. The framework is testable now.

27.02.2026 10:45 - 👍 0    🔁 0    💬 1    📌 0

The practical shift: instead of asking "what does this gene do?" we should ask "what can this gene become?" Synthetic biology, antimicrobial development, and evolutionary prediction all become questions of context engineering rather than gene optimisation alone.

27.02.2026 10:45 - 👍 0    🔁 0    💬 1    📌 0

It also explains open vs closed pangenomes. Open pangenomes (like E. coli) arise when large population sizes can detect small fitness advantages and high environmental variability creates many contexts where accessory genes pay off - despite integration costs.

27.02.2026 10:45 - 👍 0    🔁 0    💬 1    📌 0

The framework predicts pangenome structure. Core genes = low perplexity across contexts. Rare accessory genes = high perplexity generally, but strong benefits in specific contexts. The U-shaped frequency distribution falls out naturally.

27.02.2026 10:45 - 👍 0    🔁 0    💬 1    📌 0

This explains why HGT has a fitness cost - not because transferred genes are broken, but because they arrive into a genome optimised for different statistical patterns. Over time, codon adaptation, regulatory rewiring, and compensatory mutations reduce perplexity. The gene becomes "expected."

27.02.2026 10:45 - 👍 0    🔁 0    💬 1    📌 0

Perplexity operates across multiple dimensions: codon usage, protein structure, regulatory compatibility, metabolic integration, protein-protein interactions, chromosomal organisation, and gene co-occurrence patterns. Each contributes to the fitness cost of genomic novelty.

27.02.2026 10:45 - 👍 0    🔁 0    💬 1    📌 0

This leads to the concept of "genomic perplexity", borrowed from information theory. Perplexity measures how "surprised" a model is by a sequence. A horizontally transferred gene landing in a new genome is a high-perplexity token - statistically unexpected in that context.

27.02.2026 10:45 - 👍 1    🔁 0    💬 1    📌 0
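As an illustration of the perplexity idea in the post above, here is a minimal Python sketch. It is not the method from the paper: it assumes a deliberately simple model, a smoothed codon-frequency table learned from a few made-up "host" genes, and scores a sequence as the exponential of its mean negative log-probability per codon. The function names (`train_codon_model`, `perplexity`) and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def codon_list(seq):
    """Split a DNA string into complete codons, dropping a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def train_codon_model(genes, pseudocount=1.0):
    """Estimate smoothed codon probabilities from a list of host gene sequences."""
    counts = Counter(c for g in genes for c in codon_list(g))
    alphabet = [a + b + c for a in "ACGT" for b in "ACGT" for c in "ACGT"]
    total = sum(counts.values()) + pseudocount * len(alphabet)
    return {c: (counts[c] + pseudocount) / total for c in alphabet}


def perplexity(seq, model):
    """Perplexity = exp(mean negative log-probability per codon)."""
    codons = codon_list(seq)
    if not codons:
        raise ValueError("sequence contains no complete codon")
    nll = -sum(math.log(model[c]) for c in codons) / len(codons)
    return math.exp(nll)


# Toy data (assumption): two short "host" genes with a strong GCT/AAA bias.
host_genes = ["ATGGCTGCTGCTAAA", "ATGGCTAAAGCTGCT"]
model = train_codon_model(host_genes)

native = perplexity("ATGGCTGCTAAA", model)   # uses codons the host favours
foreign = perplexity("ATGCGACGACGA", model)  # uses codons the host never uses

print(f"native-like gene perplexity: {native:.1f}")
print(f"foreign-like gene perplexity: {foreign:.1f}")
```

Under this toy model the sequence built from unseen codons scores a higher perplexity than the one built from codons the host already uses, which is the sense of "statistically unexpected" in the post. A transformer genome model would replace the codon table with context-dependent probabilities.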

I propose that evolution shapes genomes not to encode fixed functions, but to optimise probability distributions of functional outcomes across the contexts organisms actually encounter. Selection acts on these distributions, not on singular gene activities.

27.02.2026 10:45 - 👍 2    🔁 0    💬 1    📌 0

Modern language models (transformers) succeed because they learn probability distributions over outcomes given context. They use "attention": each word's contribution depends on other words in the sequence. Epistasis is the biological equivalent. A gene's effect depends on what else is in the genome.

27.02.2026 10:45 - 👍 0    🔁 0    💬 1    📌 0

This is exactly the problem NLP faced for decades. Trying to understand language through fixed word definitions failed. The breakthrough came when researchers stopped assigning fixed meanings and started treating words as things whose meaning emerges from context.

27.02.2026 10:45 - 👍 1    🔁 1    💬 1    📌 0

We've long asked "what does this gene do?" But the same gene can be essential in one strain and dispensable in another. SP_0185 in Streptococcus pneumoniae is a magnesium transporter - lethal to lose in some strains, irrelevant in others. Same gene. Different context. Different function.

27.02.2026 10:45 - 👍 0    🔁 0    💬 1    📌 0

New paper out in MBE! 🧵
"Genomic Perplexity and the Evolution of Context-Dependent Function"
The big idea: genes don't have fixed functions. Function emerges from context - genomic, cellular, environmental. And we can quantify this. academic.oup.com/mbe/article/...

27.02.2026 10:45 - 👍 58    🔁 16    💬 2    📌 0

MENI is back! Join us in Dublin this August 2026 for our 3rd Meeting for Microbial Evolution in Ireland. We are delighted to have @rachelmwheatley.bsky.social @drrebeccajhall.bsky.social @jpjhall.bsky.social and @tweethinking.bsky.social join us as keynote speakers this year. miniurl.com/MENI

18.02.2026 12:16 - 👍 39    🔁 28    💬 2    📌 1

You can change the denomination to euro. It's general for people with a finite amount of time and more than one option for what grant to write.

29.01.2026 13:35 - 👍 1    🔁 0    💬 0    📌 0

Bacteriophages mobilize bacterial defense systems via lateral transduction. Bacteriophages and PICIs spread bacterial defenses via lateral transduction, shaping microbial immunity and pathogen evolution.

Very interesting paper. www.science.org/doi/10.1126/...

24.01.2026 13:24 - 👍 12    🔁 5    💬 1    📌 0

Grant Portfolio Evaluator. Rank and prioritize research funding opportunities. Calculate EV/hour and decide which grants are worth your time.

Ever wondered if writing a grant proposal was actually worth your time? Presenting... the Grant Portfolio Evaluator (for entertainment only): mol-evol.github.io/grant-evalua...

15.01.2026 14:22 - 👍 22    🔁 6    💬 1    📌 2
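The link preview describes the tool as calculating "EV/hour". A minimal sketch of that kind of arithmetic follows, assuming expected value means award amount times estimated success probability, divided by the hours needed to write the proposal. This is a guess at the general idea and not the tool's actual formula. The `Grant` class, its fields, and the example figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Grant:
    name: str
    award: float         # amount received if funded, in any currency
    success_rate: float  # estimated probability of being funded, 0..1
    hours: float         # estimated hours needed to write the proposal

    def ev_per_hour(self) -> float:
        """Expected value of the award per hour of writing time."""
        return self.award * self.success_rate / self.hours


def rank(grants):
    """Return grants sorted from highest to lowest expected value per hour."""
    return sorted(grants, key=lambda g: g.ev_per_hour(), reverse=True)


# Made-up portfolio for illustration only.
portfolio = [
    Grant("Large consortium bid", award=2_000_000, success_rate=0.05, hours=400),
    Grant("Small fellowship", award=60_000, success_rate=0.25, hours=40),
]

for g in rank(portfolio):
    print(f"{g.name}: {g.ev_per_hour():.0f} per hour")
```

With these invented numbers the small fellowship ranks first, because its higher success rate and lower writing time outweigh the smaller award.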

PanForest: predicting genes in genomes using random forests. Abstract. Motivation: The presence or absence of some genes in a genome can influence whether other genes are likely to be present or absent. Understanding t

PanForest: predicting genes in genomes using random forests academic.oup.com/bioinformati... #jcampubs

12.01.2026 14:30 - 👍 20    🔁 7    💬 0    📌 0

Dissecting Phylogenetic Support: Unified Decay Indices, AU Tests, and Branch-Site Specific Visualizations. https://www.biorxiv.org/content/10.64898/2025.12.05.692543v1

05.12.2025 23:33 - 👍 3    🔁 1    💬 0    📌 1

Oh nooooo.

15.01.2026 17:34 - 👍 1    🔁 0    💬 0    📌 0

It's for "entertainment" purposes only :)

15.01.2026 14:43 - 👍 0    🔁 0    💬 1    📌 0

Interested in pangenomics? This paper just out from Alan Beavan (not on socials), @blackpassiflora.bsky.social and @jomcinerney.bsky.social is a must: doi.org/10.1093/bioi...

15.01.2026 12:27 - 👍 12    🔁 5    💬 0    📌 0

Latest preprint with @hairyllama.bsky.social and @evol-molly.bsky.social: a program that helps calculate several phylogenetic support measures.

05.12.2025 23:56 - 👍 11    🔁 8    💬 0    📌 1

GitHub - alanbeavan/PanForest: Pangenome analysis using random forests. Contribute to alanbeavan/PanForest development by creating an account on GitHub.

Open source & freely available:
github.com/alanbeavan/PanForest
Congrats to Alan on leading this work! 🎉

14.01.2026 22:25 - 👍 1    🔁 0    💬 0    📌 0

Outputs both prediction accuracy (how predictable is each gene?) and feature importance (which genes matter most for each prediction?). Useful for understanding genome organisation, synthetic biology, and molecular ecology.

14.01.2026 22:25 - 👍 0    🔁 0    💬 1    📌 0

Case study: antimicrobial resistance genes. Certain AMR genes reliably predict other AMR genes for the same drug. But we also found unexpected associations with genes NOT previously linked to resistance. New targets for investigation?

14.01.2026 22:25 - 👍 0    🔁 0    💬 1    📌 0

We tested it on 1,000 E. coli genomes with ~12,700 accessory genes. Runs in ~5 hours on 8 processors. Scales to Network of Life pangenomes.

14.01.2026 22:25 - 👍 0    🔁 0    💬 1    📌 0

The core insight: genes don't distribute randomly across genomes. Some genes "like" to co-occur, others avoid each other. PanForest learns these patterns and tells you which genes are predictable from their genomic context.

14.01.2026 22:25 - 👍 1    🔁 0    💬 1    📌 0
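To make the co-occurrence idea in the post above concrete, here is a small Python sketch. It does not reproduce PanForest, which uses random forests. As a simpler stand-in it computes the pairwise phi coefficient on a toy gene presence/absence matrix, where values near +1 mean two genes tend to appear together and values near -1 mean they tend to avoid each other. The gene names and data are invented.

```python
import math
from itertools import combinations


def phi(a, b):
    """Phi coefficient between two presence/absence vectors of 0s and 1s."""
    n11 = sum(1 for x, y in zip(a, b) if x and y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if not x and y)
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # A gene present in all or none of the genomes carries no signal.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


# Toy matrix (assumption): one column per genome, 1 = gene present.
presence = {
    "geneA": [1, 1, 1, 0, 0, 0],
    "geneB": [1, 1, 1, 0, 0, 0],  # always found with geneA
    "geneC": [0, 0, 0, 1, 1, 1],  # never found with geneA
}

for g1, g2 in combinations(presence, 2):
    print(f"{g1} vs {g2}: phi = {phi(presence[g1], presence[g2]):+.2f}")
```

A pairwise statistic like this only captures two-gene associations. A random forest, as used in the paper, can in principle pick up how combinations of many genes predict the presence of another.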

New paper out in Bioinformatics! PanForest uses random forests to predict gene presence/absence in bacterial genomes based on other genes present. Joint work with Alan Beavan & Maria Rosa Domingo-Sananes.
🔗 doi.org/10.1093/bioinformatics/btag005

14.01.2026 22:25 - 👍 9    🔁 3    💬 1    📌 0