John Lovell

@jotlovell.bsky.social

Helping to make genomics useful for crop improvement, ecology, evolutionary biology, and conservation. HudsonAlpha Genome Sequencing Center and DOE Joint Genome Institute.

687 Followers  |  371 Following  |  89 Posts  |  Joined: 12.09.2023  |  2.4841

Latest posts by jotlovell.bsky.social on Bluesky

Do you know ~60% of human SVs fall in ~1% of GRCh38? See our new preprint: arxiv.org/abs/2509.23057 and the companion blog post on how we started this project and longdust: lh3.github.io/2025/09/29/o.... Work with Alvin Qin

30.09.2025 02:19 | 👍 75    🔁 27    💬 0    📌 0

Just an outrageous amount of structural variation in pennycress. While not yet reproductively isolated, it's likely these shredded pericentromeres contribute to some reproductive incompatibilities.

29.09.2025 17:33 | 👍 11    🔁 5    💬 0    📌 0
Structure and sequence evolution in the pennycress (Thlaspi arvense) pangenome Summary · Eukaryotic genomes harbor many forms of variation, including nucleotide diversity and structural polymorphisms, which experience natural selection and contribute to genome evolution and biod...

New (with @joannarifkin.bsky.social, @jotlovell.bsky.social, @spicybotrytis.bsky.social, and many more): we created seven new high-quality genomes and explored pangenomic variation in the emerging oilseed crop pennycress (Thlaspi arvense). 1/

28.09.2025 17:25 | 👍 55    🔁 23    💬 1    📌 1

C. elegans is a real animal and we set out to understand how it comes to have its distinctive biogeography. Its ancestral center of diversity is in the higher elevation forests of Hawaii. Its closest relatives are spread across east Asia. Did they travel from Asia? [Preprint 🧵]

24.09.2025 20:33 | 👍 164    🔁 79    💬 5    📌 8

Don't forget to apply to our position in Evolutionary Genetics at U of South Carolina!

In my experience it is a fantastic place to start a new lab, with friendly and supportive colleagues and many junior faculty in EEB!

Review starts in 1 week (Oct 1)!

24.09.2025 13:32 | 👍 9    🔁 14    💬 0    📌 0

I haven't slept for seven days either … that would be too long

16.09.2025 15:22 | 👍 1    🔁 0    💬 0    📌 0
It's Time to Rethink the Academic Tenure Process. Opinion | To fight the war on science, higher education needs to reimagine the most important career milestone for faculty.

And yes, now is the right time.

New for @undark.org

"The fundamental problem with the tenure process is that it has struggled to recognize that knowledge is curated, created, and consumed differently today than even a decade ago."

undark.org/2025/09/11/o...

11.09.2025 14:09 | 👍 35    🔁 12    💬 1    📌 6

Very nice! Thanks for the link.
This is something you find when you dig deep enough. We've been looking for ways to harmonize annotations since we saw a similar pattern among pecan genomes in 2021 (buried in the SI tho). www.nature.com/articles/s41...

19.08.2025 00:26 | 👍 2    🔁 0    💬 1    📌 0

We can try it, but my guess is way worse. So many false positives.

18.08.2025 17:51 | 👍 2    🔁 1    💬 1    📌 0

So, tl;dr: gene PAV and CDS variation is highly dependent on annotation method. Carefully choose, re-annotate, and integrate your pangenome if you want to trust the results

Preprint led by @tomasbruna.bsky.social, Avinash Sreedasyam, and @avril-m-harder.bsky.social. Support from @jgi.doe.gov.

18.08.2025 16:51 | 👍 6    🔁 1    💬 0    📌 0

Furthermore, even within fully present ('core') gene families we noticed a disturbing trend: identical sequence was not annotated with identical gene structures 20-50% of the time w/in annotation methods and 40-70% of the time btw methods.
IGC re-annotation is not perfect, but reduces this to 5-15%.

18.08.2025 16:51 | 👍 5    🔁 1    💬 2    📌 0

But what about within methods? Is using the same method enough to trust PAV? The answer here is less obvious, but method clearly matters.

Within two groups that annotated 7 and 23 soybean genomes there were 3x & 2x more PAVs than IGC; these pangenomes are not as 'open' as reported.

18.08.2025 16:51 | 👍 1    🔁 1    💬 1    📌 0

These results clearly show that 'naive' integration of existing annotations is not a good idea, especially among genomes that were annotated with similar but not identical methods.

18.08.2025 16:51 | 👍 3    🔁 1    💬 1    📌 0

In other words, while gene PAV similarity of IGC re-annotated genomes recapitulates known relatedness, clustering by original annotation PAV simply distinguished which consortium did the annotation (and did not reflect evolutionary relationships): PAV across the original annotations is largely artifactual.

18.08.2025 16:51 | 👍 4    🔁 3    💬 1    📌 0
correlation between assembly and annotation similarity

To develop a baseline, we re-annotated the genomes with exactly the same 'Integrated Gene Caller' (IGC) pipeline. IGC annotations had ⬆️ BUSCO and ⬇️ false positives, yet more than halved PAV%. Critically, assembly-based relatedness predicted PAV similarity from IGC but not original annotations.

18.08.2025 16:51 | 👍 1    🔁 1    💬 1    📌 0
PAV tabulation in cotton and soybean

We downloaded 'original' genome annotations directly from Soybase and Cottengen repos and calculated gene families from OrthoFinder. In both species there were WAY more PAVs than we expected: ~140k (86%) and ~90k (62%) of gene families were absent in ≥1 soybean and cotton genome, respectively.

18.08.2025 16:51 | 👍 1    🔁 1    💬 1    📌 0
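
The tabulation described in the post above (gene families absent in ≥1 genome) can be sketched in a few lines. This is only an illustration, not the preprint's pipeline: the function name `pav_fraction` and the toy counts are invented here, and the input is assumed to be a per-family table of gene copy counts per genome, similar in spirit to a gene-count table derived from orthogroups.

```python
from typing import Dict, List, Tuple


def pav_fraction(gene_counts: Dict[str, List[int]]) -> Tuple[int, float]:
    """Return (number, fraction) of gene families absent in at least one genome.

    gene_counts maps a family ID to its copy number in each genome.
    A family is treated as showing PAV if any genome has zero copies.
    """
    if not gene_counts:
        return 0, 0.0
    n_variable = sum(1 for counts in gene_counts.values() if min(counts) == 0)
    return n_variable, n_variable / len(gene_counts)


# Toy data, invented for illustration: rows are gene families, columns are genomes.
toy_counts = {
    "OG0000001": [1, 1, 1],  # present everywhere ('core')
    "OG0000002": [2, 0, 1],  # absent in one genome
    "OG0000003": [0, 0, 1],  # absent in two genomes
    "OG0000004": [1, 3, 1],  # copy number varies, but never absent
}

n_pav, frac_pav = pav_fraction(toy_counts)
print(f"{n_pav} of {len(toy_counts)} families show PAV ({frac_pav:.0%})")
```

Note that this counts a family as variable whenever the annotation lacks a member in some genome, so it inherits exactly the annotation artifacts the thread warns about.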

To study causes of gene PAV, we looked for species with (1) a history of polyploidy, (2) relatively low amounts of genetic variation, and (3) the availability of many high-quality reference genomes with independent RNA-seq evidenced gene annotation. Soybean and cotton popped to the top of the list.

18.08.2025 16:51 | 👍 1    🔁 1    💬 1    📌 0
table of n 1:1 orthologs and %PAV in plants and animals

We first looked at how divergence time correlates with gene PAV in pairs of plant and animal genomes that were annotated with the same method (mostly NCBI refseq).

While PAV generally scales with divergence time, it is 2-4X more common in plants, especially those with a history of polyploidy.

18.08.2025 16:51 | 👍 2    🔁 1    💬 1    📌 0
pangenome 'expansion' curves for cotton and soybean

Determining presence-absence variation (PAV) across reference genomes is a major goal of pangenome analysis. It turns out that A LOT of gene PAV is due to methodological artifacts.

We explore the causes of this in soybean and cotton datasets in our recent preprint: www.biorxiv.org/content/10.1...

18.08.2025 16:51 | 👍 36    🔁 17    💬 3    📌 2

Motherfucker wrote one sloppy paper in April 2020 and instead of being like oops, shit, my bad, he has kept doubling down until now he's killing most promising medical technology of the past quarter century rather than going to therapy.

14.08.2025 04:30 | 👍 2628    🔁 366    💬 25    📌 19

Funding and support from: @energygov.bsky.social (especially The Office of Biological and Environmental Research), Bill and Melinda Gates Foundation, and many others. 🙏

06.08.2025 20:32 | 👍 0    🔁 1    💬 0    📌 0

This work is part of a global collaboration across many groups. In particular, Todd Mockler, who tragically passed in 2023, and the contributions of many scientists at @danforthcenter.bsky.social made much of this work possible. The pangenome was built by scientists at @jgi.doe.gov & HudsonAlpha.

06.08.2025 20:32 | 👍 1    🔁 1    💬 1    📌 0

Combined, these results illustrate the power of pangenomics for trait discovery ... but they also highlight how far we have to go. Integrated methods to probe and iteratively update variant calls in pangenome frameworks really are needed to bridge the gap between resources and stakeholders

06.08.2025 20:32 | 👍 0    🔁 0    💬 1    📌 0
The distribution of identity % for the four BGC pangenome groups across all northwestern sub-saharan Africa members of the diversity panel; colors follow

the phenotypic and climate distribution of the four clusters; annual precipitation is shown just for the region highlighted in panel E while dhurrin content is from all phenotyped members of the diversity panel

There are three major haplotypes that each harbor several large structural variants but few coding variants. While the evidence for single-marker associations was limited, these three typable haplotypes segregate major variation in dhurrin concentration and drought severity of source habitat

06.08.2025 20:32 | 👍 0    🔁 1    💬 1    📌 0
The 33 pangenome references were clustered into four BGC groups, and a 'recombinant' grey unclustered group for 'IRAT204', based on kmer similarity. The tubemap shows SVs ≥5kb with sequences shared across specific haplotypes (i.e., nodes in the pangenome graph) indicated by transparent rectangles

Finally, we combined pangenome-informed haplotype classification and tests of drought adaptation by probing the biosynthetic gene cluster that produces dhurrin, a secondary metabolite known to enhance drought stress tolerance and resistance against chewing insect herbivory ...

06.08.2025 20:32 | 👍 0    🔁 1    💬 1    📌 0

However, at a 110km scale, adjacent populations in the Northern (driest) part of the Sahel were far more different from each other than those from less drought-prone regions. This is in part caused by the admixture and potential local adaptation of different deeply diverged botanical types.

06.08.2025 20:32 | 👍 0    🔁 1    💬 1    📌 0

We then explored variation in our resequencing panel to understand how deep divergence at SHATTERING1 and genome-wide diversity are shaped by climate, geography, and migration.

At a continent-wide scale, gene flow was higher between populations in drought-prone habitats ...

06.08.2025 20:32 | 👍 1    🔁 1    💬 1    📌 0
A Simplified sequence tube map showing only large INDEL variants (≥1kb) between genomes with the three Sh1 haplotypes. Representatives of each of the haplotypes were aligned to the longest sh1-3 haplotype and identical (black) or variable (grey) 100bp alignments are shown in the dotplots, revealing an insertion relative to sh1-1 and an exact segmental duplicate in sh1-3. The tubemap coordinates (BTx623) are loosely translated to RTx430 coordinate system following the grey blocks below the tubemap and the major structural mutation bounds are shown with dotted vertical lines. Positions of uninformative (dark grey), non-unique (light grey), and diagnostic (colors follow tubemap) kmers are shown below each dotplot. B The most closely matching haplotype is shown for resequencing libraries in genome-wide principal component space (see Supplementary Fig. 2 for details). Ancestry proportions for georeferenced African members of the diversity panel (top: pies on the map) and all libraries (bottom: bar chart) illustrate previously observed strong spatial and genome-wide population genetic structure relative to Sh1.

Here, we explored several genomic regions of interest and re-classified haplotype associations across our diversity panel using diagnostic short sequences. The approach discovered and genotyped a previously unknown exact >2kb duplication in the domestication locus SHATTERING1

06.08.2025 20:32 | 👍 0    🔁 1    💬 1    📌 0

In theory, all of the genetic variation that a breeder could use can be found in the pangenome; however, conducting hypothesis tests in a pangenome framework and extracting causal variants that are not visible in single reference methods remains quite challenging.

06.08.2025 20:32 | 👍 0    🔁 1    💬 1    📌 0
Panacus pangenome expansion curves. Black bars are the total sequence/genes found across the entire pangenome with that many members, and grey bars are the remainder that segregates across the pangenome.

OrthoFinder (phylogenetically hierarchical orthogroup) pangenome expansion curves. Black bars are the total sequence/genes found across the entire pangenome with that many members, and grey bars are the remainder that segregates across the pangenome.

We integrated these resources into a 'pangenome', a queryable resource w/ all available DNA sequences. We complemented the pangenome w/ RNA-seq evidenced protein-coding gene annotations of all 33 members. These resources capture the vast majority of gene presence-absence and DNA sequence variation.

06.08.2025 20:32 | 👍 0    🔁 1    💬 1    📌 0
