Ben J Woodcroft

@benjwoodcroft.bsky.social

Yet another microbial bioinformatician, group leader, dad github.com/wwood https://research.qut.edu.au/cmr/team/ben-woodcroft/

384 Followers  |  136 Following  |  61 Posts  |  Joined: 20.11.2024

Latest posts by benjwoodcroft.bsky.social on Bluesky

Public MAG datasets not available at NCBI or ENA Some metagenome assembled genome (MAG) datasets are not available in the standard locations (NCBI / ENA / etc) for a variety of reasons. Here you can contribute new ones you come across. To be recorde...

In the meantime, we've been collecting a list of these at tinyurl.com/mag-collecti.... Feel free to add more you find.

See also GlobDB from @daanspeth.bsky.social which incorporates some of these into a new MAG collection arxiv.org/abs/2506.11896

24.10.2025 08:41 | 👍 11    🔁 5    💬 0    📌 0
FAQ Documentation for SingleM

Yes normal behaviour. See wwood.github.io/singlem/FAQ for the formula - most windows are 60bp, and so if your reads are uniform length you get that.

But you are looking at the OTU table there, perhaps you want the taxonomic profile output (which is a more final output)?

23.07.2025 22:27 | 👍 1    🔁 0    💬 1    📌 0

Trimmed reads are bad news when they become short, but if they remain 100bp+ then you should be fine I reckon.
2/2

23.07.2025 05:35 | 👍 1    🔁 0    💬 1    📌 0
wwood/singlem Novelty-inclusive microbial (and now dsDNA phage) community profiling of shotgun metagenomes - wwood/singlem

Thanks - strange that your Lyrebird experience wasn't good. Please report errors (what did you do?) at github.com/wwood/single... or just via email. We test installation inclusive of DB download at github.com/wwood/single...

But a new version of the Lyrebird DB is incoming btw.
1/2

23.07.2025 05:35 | 👍 0    🔁 0    💬 2    📌 0
the treasure trove of all sequencing datasets

Excited to share a new preprint from the lab with @ryandhindsa.bsky.social ! www.biorxiv.org/content/10.1...

Led by @sherrynyeo.bsky.social, @erinmayc.bsky.social, and friends, we continue our journey to find viral DNA in our favorite place-- the overlooked and discarded reads in existing data! 1/

22.07.2025 21:58 | 👍 69    🔁 37    💬 1    📌 4
FAQ Documentation for SingleM

Thanks for kind words. By UCEs you mean e.g. 16S? It actually does this already, and tests pass. But it isn't the most efficient and code is a bit crusty and db is out of date, since it doesn't get used much. See wwood.github.io/singlem/FAQ

19.07.2025 10:51 | 👍 1    🔁 0    💬 1    📌 0
singlem-benchmarking/bin at main · wwood/singlem-benchmarking Contribute to wwood/singlem-benchmarking development by creating an account on GitHub.

This is great @titus.idyll.org (though to be picky it's SingleM or singlem, not singleM). We wrote a few parsers for other formats at github.com/wwood/single... - it'd be nice if not everyone needed to reinvent (and use standard names for things like coverage inclusive vs exclusive of children).

18.07.2025 23:39 | 👍 1    🔁 0    💬 1    📌 0

I wonder if AI could do a good job of that integration. I'd love to learn some Haskell actually, just need to find the time..

18.07.2025 23:33 | 👍 1    🔁 0    💬 1    📌 0
Nanopore by thepatientwait · Pull Request #208 · wwood/singlem Working Nanopore build. Important changes: DIAMOND prefilter Uses --range-culling + related args for DIAMOND. These results are now streamed to help memory. Sequences are indexed using gene names ...

There is also a branch that takes nanopore reads as input, which works reasonably well. We are putting some final code touches on it, but maybe helpful - github.com/wwood/single...

18.07.2025 01:07 | 👍 2    🔁 1    💬 1    📌 0

Good good, or could be better?

17.07.2025 23:24 | 👍 1    🔁 0    💬 1    📌 0

Good q. I imagine that new-chemistry nanopore should be fine. You can check by running the supplemented package on your genomes and making sure the expected number of markers is detected. It should be in line with MAG completeness.

17.07.2025 23:22 | 👍 1    🔁 0    💬 1    📌 0

cheers Daan - here's a thread explaining some of the deets bsky.app/profile/benj...

16.07.2025 22:07 | 👍 0    🔁 0    💬 0    📌 0

Thanks for spreading the word @jcamthrash.bsky.social - there's an explanatory thread at bsky.app/profile/benj...

16.07.2025 22:05 | 👍 6    🔁 2    💬 0    📌 0

Thanks for considering it for publication. There's an explanation thread at bsky.app/profile/benj...

16.07.2025 22:03 | 👍 0    🔁 0    💬 0    📌 0

Much appreciated @bcoltman.bsky.social

16.07.2025 22:02 | 👍 0    🔁 0    💬 0    📌 0
Comprehensive taxonomic identification of microbial species in metagenomic data using SingleM and Sandpiper Nature Biotechnology - Novel microbial species in metagenomes are identified using conserved regions within universal marker genes.

Thanks for reading this thread - share link at rdcu.be/ewqLW

16.07.2025 21:59 | 👍 6    🔁 0    💬 0    📌 0

Thanks also to the reviewers including Alice McHardy - very fair and helpful we thought.

16.07.2025 21:59 | 👍 0    🔁 0    💬 1    📌 0

Many many to thank, particularly @aroneys.bsky.social @rossenzhao.bsky.social Mitchell Cunningham, Linda Blackall, Gene Tyson, @cmrqut.bsky.social and dozens of people who have helped with the software, ms, and everyone who tolerated my enthusiasm.

16.07.2025 21:59 | 👍 1    🔁 0    💬 1    📌 0
SingleM supplement Documentation for SingleM

SingleM is BYO genome, you can add your MAGs to the refDB to get profiles which include both known species and your novel MAGs. wwood.github.io/singlem/tool...

16.07.2025 21:59 | 👍 5    🔁 3    💬 1    📌 0

Novel lineage detection + 700k profiles makes it possible to recover novel MAGs from taxa you care about. We recovered new genera from the underrepresented Muirbacteria, Wallbacteria, Riflebacteria and Fusobacteria phyla by assembling the right metagenomes.

16.07.2025 21:59 | 👍 1    🔁 1    💬 1    📌 0
Screenshot of the sandpiper website front page

@ace-gtdb.bsky.social R226-based profiles from 700k public metagenomes are at sandpiper.qut.edu.au. Search for your fave microbe by GTDB taxonomy there to see its prevalence and community profiles. Got something novel? Get in touch.

16.07.2025 21:59 | 👍 4    🔁 1    💬 1    📌 0
GitHub - wwood/smafa: Biological sequence aligner for pre-aligned sequences Biological sequence aligner for pre-aligned sequences - wwood/smafa

A new @rust-lang.org approach also helps - conserved regions are already aligned to each other so distance calcs become a vector similarity search problem. github.com/wwood/smafa Big distance => novel species. Props to @viralinstruction.bsky.social for awesome PR.

16.07.2025 21:59 | 👍 2    🔁 0    💬 1    📌 0
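
To illustrate the idea in the post above: when sequences are already aligned to the same columns, the distance between two of them is just a position-by-position comparison, so finding the closest reference is a nearest-neighbour search. The sketch below is not smafa's implementation; it is a minimal pure-Python illustration using Hamming distance, and the sequences, the reference names, and the `nearest` helper are all made up for the example.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching columns between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest(query: str, references: dict[str, str]) -> tuple[str, int]:
    """Return (reference name, distance) for the closest reference."""
    return min(
        ((name, hamming(query, seq)) for name, seq in references.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical pre-aligned window sequences (toy length of 12 columns).
refs = {
    "species_A": "ATGGCTAAAGGT",
    "species_B": "ATGGCAAAGGGA",
}

hit_close = nearest("ATGGCTAAAGGA", refs)  # differs from species_A in one column
hit_far = nearest("TTCCCTGGAGCA", refs)    # far from every reference

print(hit_close, hit_far)
```

A query whose best hit is still many mismatches away, like the second one here, is the kind of case the post describes as "big distance => novel species"; where to draw that line is not specified in the post.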
Walltime and RAM usage of SingleM.

Fast and RAM-efficient since most raw reads are swiftly ignored. We optimise an up-front DIAMOND BLASTX-based method. Thanks @bbuchfink.bsky.social / Serratus for makeidx

16.07.2025 21:59 | 👍 1    🔁 0    💬 1    📌 0
Benchmarking of SingleM when challenged by novel phyla, genera and species. Performance of SingleM substantially greater than other tools, particularly for higher levels of novelty.

Perhaps most strikingly, it detects microbes that aren't in the ref db, correctly weighting their relative abundance.

16.07.2025 21:59 | 👍 2    🔁 0    💬 1    📌 0
Benchmarking results of several community profilers, showing good performance of SingleM.

It's accurate on communities of known species / non-rep strains (though can struggle with low abundance species where coverage <1X)

16.07.2025 21:59 | 👍 1    🔁 0    💬 1    📌 0
A cartoon of a sequence alignment, showing how only reads which fully cover the 20aa window of the HMM are retained for further downstream analysis.

SingleM is a new approach to metagenome profiling. It uses conserved regions within marker genes (20aa) spanned by individual short reads. Concentrating analysis on these regions makes things easier.

16.07.2025 21:59 | 👍 1    🔁 0    💬 1    📌 0
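
The window rule described in the post and its image caption (only reads that fully cover the 20aa window are kept) can be sketched as a simple interval check. This is an illustration only, not SingleM's code: the coordinates, the half-open convention, and the `read_spans_window` helper are assumptions made for the example. A 20aa window corresponds to 60 nucleotides.

```python
WINDOW_AA = 20
WINDOW_NT = WINDOW_AA * 3  # 20 amino acids = 60 nucleotides


def read_spans_window(read_start: int, read_end: int, window_start: int,
                      window_nt: int = WINDOW_NT) -> bool:
    """True if the read covers the whole window.

    Coordinates are half-open [start, end) positions on the marker gene
    (a convention assumed for this sketch).
    """
    window_end = window_start + window_nt
    return read_start <= window_start and read_end >= window_end


# Hypothetical 150 bp read alignments as (start, end) on the gene.
reads = [(100, 250), (180, 330), (210, 360)]
window_start = 200  # window occupies [200, 260)

kept = [r for r in reads if read_spans_window(*r, window_start)]
print(kept)
```

Only the middle read is retained: the first ends inside the window and the third starts inside it, so neither spans all 60 positions.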

This is after the huge mining efforts of SPIRE @tsbschm.bsky.social, @borklab.bsky.social, SMAG, OceanMAGs etc.

16.07.2025 21:59 | 👍 0    🔁 0    💬 1    📌 0
Density plot of known species fractions found in various environments. Soil and sediment samples rarely contain >25% known species.

In environmental samples 75% lack a genome/MAG (abundance-weighted). We aren't near to a genome for all species, not even close.

16.07.2025 21:59 | 👍 5    🔁 0    💬 1    📌 0
Logo for the Sandpiper website

Out in @natbiotech.nature.com: Metagenome taxonomy profilers usually ignore unknown species. SingleM is an accurate profiler which doesn't, even detecting phyla with no MAGs. Profiles of 700,000 metagenomes at sandpiper.qut.edu.au. A 🧵

16.07.2025 21:59 | 👍 130    🔁 71    💬 7    📌 9
GenomeFISH: genome-based fluorescence in situ hybridisation for strain-level visualisation of microbial communities Abstract. Fluorescence in situ hybridisation (FISH) is a powerful tool for visualising the spatial organisation of microbial communities. However, traditio

Very excited to share the first paper out of my Postdoc @CMR:

GenomeFISH: genome-based fluorescence in situ hybridisation for strain-level visualisation of microbial communities.
@sjmcilroy.bsky.social @jamesvolmer.bsky.social @benjwoodcroft.bsky.social

doi.org/10.1093/isme...

🧵1/7

08.07.2025 02:30 | 👍 63    🔁 27    💬 4    📌 2

@benjwoodcroft is following 20 prominent accounts