Jim Shaw

@jimshaw.bsky.social

Postdoc at Dana-Farber and Harvard Med with Heng Li (@lh3lh3.bsky.social). Prev: UBC / UofT. I like thinking about biological sequence analysis and its applications to metagenomics / microbial genomics. https://jim-shaw-bluenote.github.io

1,148 Followers  |  504 Following  |  107 Posts  |  Joined: 20.09.2023  |  2.1593

Latest posts by jimshaw.bsky.social on Bluesky


Release SeqKit v2.13.0 (10-year-old birthday version) · shenwei356/seqkit Changelog SeqKit is 10 years old! SeqKit v2.13.0 - 2026-02-28 seqkit: add support for reading and writing LZ4 compression format. new command: seqkit sample2: improved seqkit sample by @stahiga....

Can't wait to release a 10-year-old birthday version for SeqKit!

- 10 years
- 2 papers, 3500 citations
- 20 contributors
- 40 subcommands
- 880 commits
- 500 issues
- 685.5K Bioconda total downloads

Thank you all, dear contributors and users!
I'll keep maintaining it.

github.com/shenwei356/s...

27.02.2026 13:25 | 👍 80    🔁 26    💬 5    📌 1
Overview - AllTheBacteria documentation

Courtesy of @martibartfast.bsky.social , we have a new release of AllTheBacteria which adds another 322,920 assemblies, covering all ENA (illumina, isolate) prokaryotes to May 2025.
allthebacteria.readthedocs.io/en/latest/ov...

26.02.2026 15:48 | 👍 59    🔁 28    💬 0    📌 3
Detecting foldback artifacts in long-reads - BMC Genomics Long-read sequencing data is useful for detecting large and complex structural variations; however, technical artifacts can lead to false structural variant calls. In our analyses, we became aware of ...

Our paper on foldback artifacts in long-read sequencing is now published in BMC Genomics!

We introduce Breakinator to flag foldback and chimeric artifacts across library types, sequencers, and chemistries.

Paper: link.springer.com/article/10.1...

With Matthew Meyerson and @lh3lh3.bsky.social

24.02.2026 16:02 | 👍 17    🔁 6    💬 1    📌 0

Excited to share our latest work on early-life Bifidobacteria. We built a global genomic atlas of 4,000+ genomes from 48 countries for B. infantis and B. longum, expanding previous resources by 15-fold.

Read the press release @sangerinstitute.bsky.social: www.sanger.ac.uk/news_item/mi...

20.02.2026 15:23 | 👍 5    🔁 1    💬 1    📌 0

Were you aware that these are available for the community:
- an efficient fasta/q parser in Rust
- a quick rolling hash library
- a minimizer library
All in Rust 🦀

19.02.2026 14:21 | 👍 28    🔁 8    💬 1    📌 0

Super happy that the AllTheBacteria hypothetical proteins are now in AFDB - hopefully we can start to understand the function of some of them at least 😁

18.02.2026 04:46 | 👍 39    🔁 7    💬 0    📌 0

Great to see that Emu's still being worked on and updated!

Indeed, I think Emu and savont will serve different niches in the future and it's very cool to see your development. Thanks for the informative post.

13.02.2026 21:07 | 👍 1    🔁 0    💬 0    📌 0
Recent updates Contribute to treangenlab/emu development by creating an account on GitHub.

I enjoyed reading your post @jimshaw.bsky.social and learning about Savont, thank you for the comparison to Emu and kudos on your new tool!

relevant to points raised in the post, the Emu development team has compiled our follow-up thoughts here: github.com/treangenlab/...

13.02.2026 20:02 | 👍 7    🔁 3    💬 1    📌 0
Addressing pandemic-wide systematic errors in the SARS-CoV-2 phylogeny - Nature Methods This Resource paper presents a global SARS-CoV-2 phylogenetic tree of 4,471,579 high-quality genomes consistently constructed by Viridian, an efficient amplicon-aware assembler.

A long time ago in a galaxy far away, there was a SARS-CoV-2 pandemic. Our paper, led by @martibartfast.bsky.social
a) correcting errors in 4.5 million genomes & their phylogeny
b) improving representation of the Global South in public data
www.nature.com/articles/s41...
(thread 1/n)

09.02.2026 15:16 | 👍 135    🔁 65    💬 3    📌 6

If you have an interest in mixing computation and experiments to understand microbial evolution (for example antibiotic resistance) and you think you might be a good fit for a postdoc in my lab, reach out. If it seems it might be a fit I'm happy to help you frame it as AI for Biology per this call

06.02.2026 01:04 | 👍 25    🔁 20    💬 2    📌 0
QuadRank: Engineering a High Throughput Rank Given a text, a query $\mathsf{rank}(q, c)$ counts the number of occurrences of character $c$ among the first $q$ characters of the text. Space-efficient methods to answer these rank queries form an i...

Now also on arxiv:
arxiv.org/abs/2602.04103

05.02.2026 10:53 | 👍 5    🔁 3    💬 0    📌 0
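
The rank query defined in the QuadRank abstract above can be illustrated with a minimal Python sketch. This is not QuadRank's method: it is a naive prefix-count table that spends one integer per character per position, whereas the abstract is about space-efficient structures. The function names `build_prefix_counts` and `rank` are hypothetical, chosen only for this illustration.

```python
def build_prefix_counts(text: str) -> dict[str, list[int]]:
    """Naive table: counts[c][i] is the number of times c occurs in text[:i]."""
    counts = {c: [0] * (len(text) + 1) for c in set(text)}
    for i, ch in enumerate(text):
        for c, column in counts.items():
            # Carry the previous count forward, adding 1 if this position holds c.
            column[i + 1] = column[i] + (1 if c == ch else 0)
    return counts


def rank(counts: dict[str, list[int]], q: int, c: str) -> int:
    """rank(q, c): occurrences of c among the first q characters of the text."""
    column = counts.get(c)
    # A character that never appears in the text has rank 0 everywhere.
    return column[q] if column is not None else 0


counts = build_prefix_counts("ACGTACGA")
print(rank(counts, 5, "A"))  # the first 5 characters are "ACGTA"
```

Each query is a single table lookup, but the table grows with text length times alphabet size, which is the cost that succinct rank structures are designed to avoid.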
GTDB - skani calculator An interface to compute pairwise ANI of NCBI genomes using the GTDB taxonomy.

The online ANI calculator (skani) that allows you to compare your genome data with reference sequences in GTDB saves so much time and faffing around. :)

02.02.2026 20:27 | 👍 12    🔁 2    💬 0    📌 2
GuFi phages represent the most prevalent viral family-level clusters in the human gut microbiome
Despite being important ecological modulators of the gut microbiome, bacteriophage diversity and function remain under-characterized. We show that short-read metagenomic surveys can miss even globally highly prevalent viral family-level clusters (VFCs), that can be readily assembled and characterized with long-read metagenomic data from a relatively small cohort (n=109). While gut Bacteroidota phages have been the prevailing focus in the literature, we show that highly prevalent gut phage families frequently have Firmicutes hosts (termed GuFi phages), with broad host ranges verified using proximity-ligation (Hi-C) sequencing data. High-throughput sequencing of virus-like particles from fecal samples detected frequent enrichment of GuFi phages across samples, revealing their under-appreciated impact on the gut microbiome. We report the first in vitro induction and imaging of members of prevalent GuFi clades including the candidate orders Heliusvirales, Astravirales (VFC 2) and Suryavirales (VFC 4). Our findings underscore the importance of GuFi phages with broad host ranges in the gut microbiome, and the utility of long-read sequencing for viral discovery, paving the way for deeper insights into the role of bacteriophages in human health and disease.
Competing Interest Statement: IL is an employee of Phase Genomics.
Funding: National Medical Research Council, 23-0614; National Research Foundation, NRFI09-0015; A*STAR, C210812044

Thrilled to share our labor of love over the last 5 years 🤩

Leveraging long-read metagenomics (@nanoporetech.com) we identified some of the most prevalent gut phage families that have previously been overlooked in short-read based studies. [1/5]

Read more here: www.biorxiv.org/content/10.6...

30.01.2026 10:15 | 👍 12    🔁 7    💬 1    📌 0
Multiple protein structure alignment at scale with FoldMason Protein structure is conserved beyond sequence, making multiple structural alignment (MSTA) essential for analyzing distantly related proteins. Computational prediction methods have vastly extended our repository of available protein structures, ...

Multiple protein structure alignment at scale with FoldMason | Science https://www.science.org/doi/full/10.1126/science.ads6733?af=R

29.01.2026 23:31 | 👍 11    🔁 4    💬 0    📌 0

I've just released #rust-htslib 1.0. After a long time with a pretty stable API usage of rust-htslib in production, it feels like the right time to finally move to 1.0. Most important change is probably a switch to thread-safe pointers in BAM record handling. github.com/rust-bio/rus...

29.01.2026 10:37 | 👍 17    🔁 7    💬 1    📌 0

Awesome to hear, let me know how it goes.

28.01.2026 23:39 | 👍 1    🔁 0    💬 1    📌 0

Work supervised by @lh3lh3.bsky.social and in collab with @mkddueholm.bsky.social and his wonderful team.

Savont is still quite new, so feedback is more than welcome.

4/4

28.01.2026 18:47 | 👍 1    🔁 0    💬 2    📌 0

Using ASVs seems to avoid some issues with low-abundance, false-positive species calls from read mapping. Of course, ASVs will be less sensitive, though.

For more preliminary results and tests on actual data, see github.com/bluenote-157...

3/4

28.01.2026 18:45 | 👍 0    🔁 0    💬 1    📌 0

Preliminary: savont gets ASVs with ~10-30x less depth than previous denoising methods for *nanopore R10.4 amplicons*.

Essentially, previous short-read methods have difficulties with longer reads and higher error rates. Savont uses new techniques to handle this.

2/4

28.01.2026 18:45 | 👍 0    🔁 0    💬 1    📌 0
GitHub - bluenote-1577/savont: Amplicon sequencing variants from 16s ONT R10.4 / HiFi long reads

Announcing a new tool for "denoising" long-read amplicon sequences: savont.

Savont enables amplicon sequence variants (ASVs) directly from nanopore (or HiFi) long reads. Tested on 16S nanopore amplicons -- seems to work okay.

1/4

github.com/bluenote-157...

28.01.2026 18:45 | 👍 51    🔁 28    💬 1    📌 2
P2 Solo announcement and the trade-offs of a more stable ONT a blog for miscellaneous bioinformatics stuff

New blog post with some thoughts on @nanoporetech.com and their recent announcement that the P2 Solo will be discontinued:
rrwick.github.io/2026/01/21/p...

21.01.2026 03:38 | 👍 21    🔁 14    💬 0    📌 0

We just released #anvio v9, "eunice" 🎉

This version represents over 2,000 changes in the codebase since v8, increasing the total number of programs in the anvi'o ecosystem to 176.

Read the release notes:

github.com/merenlab/anv...

Visit our up-to-date web page:

anvio.org

20.01.2026 11:48 | 👍 71    🔁 34    💬 2    📌 3
Mirdita Lab - Laboratory for Computational Biology & Molecular Machine Learning Mirdita Lab builds scalable bioinformatics methods.

My time in @martinsteinegger.bsky.social's group is ending, but I'm staying in Korea to build a lab at Sungkyunkwan University School of Medicine. If you or someone you know is interested in molecular machine learning and open-source bioinformatics, please reach out. I am hiring!
mirdita.org

20.01.2026 11:07 | 👍 105    🔁 54    💬 7    📌 1
HLi Lab - Vacancies Openings

I am looking for a postdoc to develop high-performance algorithms in computational genomics. Email or DM me if interested. For more information, see hlilab.github.io/vacancies. RTs appreciated!

14.01.2026 15:44 | 👍 43    🔁 64    💬 1    📌 0

Phold's manuscript is now available @narjournal.bsky.social thanks to @susiegriggo.bsky.social @npbhavya.bsky.social @vijinim.bsky.social @linsalrob.bsky.social @martinsteinegger.bsky.social @milot.bsky.social @eunbelivable.bsky.social & others not on bsky #phagesky academic.oup.com/nar/article/...

14.01.2026 05:10 | 👍 83    🔁 44    💬 1    📌 1

Now published in Algorithms for Molecular Biology: link.springer.com/article/10.1.... Key message: a tiny CNN model with 7k parameters can capture main splice signals across vertebrates+insect and halves the minimap2 & miniprot junction error rate. I always use this new feature now.

06.01.2026 23:02 | 👍 58    🔁 20    💬 1    📌 0

🎉 New year, NEW PREPRINT!

Bacteria exhibit astonishing genetic diversity, but where do new genes come from?

My best friend Arya Kaul (/labmate in the @baym lab) investigates how advantageous deletions can spawn new genes - "deletion-born fusions." 🧵:

06.01.2026 16:09 | 👍 49    🔁 30    💬 1    📌 2

Proud to announce SimPhyNI, a new tool for bacterial GWAS with higher precision and scalability than existing tools. Try it out and let us know what you think!!

05.01.2026 14:55 | 👍 66    🔁 32    💬 5    📌 1
Release Version 0.17.0 · ksahlin/strobealign Changes #504: Introduce collinear chaining as the new default mapping and alignment method, replacing NAMs. This reproduces the Minimap2 chaining algorithm providing better mapping accuracy. NAMs ...

Strobealign v0.17.0 is out! github.com/ksahlin/stro...

29.12.2025 05:49 | 👍 8    🔁 4    💬 1    📌 0
Gene-specific selective sweeps are pervasive across human gut microbiomes - Nature Development and application of the integrated linkage disequilibrium score (iLDS) reveals both selective pressures impacting the human gut microbiome and the mechanisms by which gut bacteria adapt to ...

Grateful to share our paper on gene-specific selective sweeps in human gut microbiomes, now out in Nature! It has been a joy to work with @rwolff.bsky.social, whose insights and hard work made this possible.
www.nature.com/articles/s41...

17.12.2025 18:53 | 👍 147    🔁 69    💬 10    📌 3
