Li Song

@mourisl.bsky.social

毛利元光. Assistant Prof at the Department of Biomedical Data Science at Dartmouth College. Research on bioinformatics, algorithms. Lab page: mourisl.github.io

242 Followers  |  363 Following  |  10 Posts  |  Joined: 18.11.2024

Latest posts by mourisl.bsky.social on Bluesky

Congratulations!!!!!!!!!!

07.06.2025 03:19 — 👍 1    🔁 0    💬 1    📌 0

Introns have to come from somewhere, right? @celineh2ooo.bsky.social and I looked at multiple genome alignments with 1000s of genomes and found 342 cases where humans (and our relatives) had gained a new intron. Still not sure where these come from, but it's a fascinating question

04.06.2025 20:13 — 👍 42    🔁 12    💬 2    📌 1

Neng Huang developed longcallR for joint SNP calling and phasing from long RNA-seq reads, AND for identifying allele-specific splicing/junctions (ASJ). Although ASJs of statistical significance are rare, a large fraction involve unannotated junctions. In Rust!

30.05.2025 14:54 — 👍 16    🔁 7    💬 0    📌 0

Industry friends, now is the time for MUCH more speaking out on behalf of academic colleagues under duress. Here are core open source methods that many of your products doubtlessly depend on either directly or indirectly (see en.wikipedia.org/wiki/HMMER) being abruptly defunded. Make noise.

29.05.2025 14:39 — 👍 76    🔁 50    💬 1    📌 0
myloasm - metagenomic assembly with (noisy) long reads

Announcing myloasm, a new long-read (ONT R10/PacBio) metagenome assembler that I've been working on during my postdoc in the Heng Li lab (@lh3lh3.bsky.social).

myloasm-docs.github.io

28.05.2025 17:53 — 👍 131    🔁 77    💬 5    📌 3
Partitioned Multi-MUM finding for scalable pangenomics Pangenome collections are growing to hundreds of high-quality genomes. This necessitates scalable methods for constructing pangenome alignments that can incorporate newly-sequenced assemblies. We prev...

Excited to share a new update to Mumemto, scaling MUM and conserved element finding to any size pangenome! Preprint out now w/ @benlangmead.bsky.social.
Mumemto scales to the new HPRC v2 release and beyond, and can merge in future assemblies without any recomputation! 1/n

27.05.2025 19:35 — 👍 27    🔁 15    💬 2    📌 2

Centrifuger has updated its pre-built index list to include the exciting new GTDB release r226 for taxonomic classification of sequencing data: github.com/mourisl/cent.... There is also a gtdb+refseq human/virus/fungi/contaminants index, which will hopefully be useful for human microbiome studies.

27.05.2025 15:58 — 👍 3    🔁 0    💬 0    📌 0

Great 🧵 by Pierre on the Kaminari paper! In short, Kaminari is a simple and elegant, but highly effective index for approximate colored k-mer queries. The simplicity leads to very fast query, but with accuracy consistent with (or exceeding) best-in-class solutions; a very fun collaboration indeed!

27.05.2025 15:41 — 👍 10    🔁 2    💬 0    📌 0
Efficient evidence-based genome annotation with EviAnn For many years, machine learning-based ab initio gene finding approaches have been the central components of eukaryotic genome annotation pipelines, and they remain so today. The reliance on these app...

Bioinformatics folks: check out our @biorxivpreprint on a new, very efficient and accurate system for automated genome annotation, EviAnn, led by my colleague Aleksey Zimin: www.biorxiv.org/content/10.1...

13.05.2025 17:52 — 👍 54    🔁 22    💬 1    📌 0

Congratulations!!!!

09.05.2025 21:14 — 👍 1    🔁 0    💬 1    📌 0
Inside UniProt Rich Epitope Information Comes to UniProt Mammalian immune responses are mediated by interactions between antigens and immune system compo...

Check out our latest collaboration with UniProt, who has integrated over 700,000 experimentally validated epitopes to enhance its protein entries with detailed immune response information. This data is accessible via the UniProt Feature Viewer and API! 💻🔬🧪 #collaboration #immunology #proteins

09.05.2025 00:35 — 👍 2    🔁 1    💬 0    📌 0
WABI 2025 WABI Conference on Algorithms in Bioinformatics

The deadline for WABI 2025 has been extended (but is still rapidly approaching) wabiconf.github.io/2025/

* abstract deadline: May 12 (AoE)
* paper deadline: May 15 (AoE)

Consider submitting your exciting algorithmic bioinformatics work to the WABI conference!

07.05.2025 19:14 — 👍 10    🔁 11    💬 0    📌 2

Thank you!

04.05.2025 14:46 — 👍 0    🔁 0    💬 0    📌 0

Forgot to run dustmasker on the genomes before creating a Centrifuger index, and indeed saw some misclassifications. It took a while to figure out; lesson learned... I need to implement a built-in masking step like Kraken2's in case I forget to do it in the future.

04.05.2025 06:25 — 👍 0    🔁 0    💬 1    📌 0
Parsing GTF and FASTA files using the eccLib Library Summary: Leveraging the Python/C API, eccLib was developed as a high-performance library designed for parsing genomic files and analysing genomic contexts. To the best of the authors' knowledge, it…

Parsing GTF and FASTA files using the eccLib Library www.biorxiv.org/content/10.1... 🧬🖥️🧪 gitlab.platinum.edu.pl/eccdna/eccLib

29.04.2025 18:30 — 👍 6    🔁 3    💬 0    📌 1
GitHub - ArcInstitute/xsra: An efficient CLI to extract sequences from the SRA

Extracting @NCBI SRA files with fasterq-dump can require 17x the size of the accession while decompressing. Our new tool xsra extracts sequences at 5x throughput with significantly less disk usage, built-in compression, and optional BINSEQ outputs

github.com/arcInstitute...

29.04.2025 21:03 — 👍 39    🔁 15    💬 2    📌 1
AllTheBacteria

Small update from AllTheBacteria (allthebacteria.org). Assemblies can be bulk downloaded from OSF as before, or you can now get individual assemblies from AWS. We now also have a LexicMap index on AWS, so you can align your favourite gene against 2.4million bacteria (next post for price estimates)

29.04.2025 15:36 — 👍 47    🔁 23    💬 1    📌 2
The Department of Human Genetics at the University of Utah is sponsoring the Rising Stars in Genetics and Genomics symposium!

- We are seeking nominations by June 1.
- September 18-19, 2025
- Please share with the star postdocs that you know.

docs.google.com/forms/d/e/1F...

28.04.2025 17:20 — 👍 53    🔁 43    💬 1    📌 1
The sequence analysis session of #RECOMB2025 is off to a great start with @jimshaw.bsky.social presenting devider, a new algorithm for haplotyping small sequences from long-read sequencing.

www.biorxiv.org/content/10.1...

27.04.2025 01:27 — 👍 26    🔁 6    💬 1    📌 0
If you want to check if a human gene has copy-number changes or lands in a complex region, try pangene.bioinweb.org. Recently updated with more and better assemblies.

26.04.2025 01:06 — 👍 44    🔁 13    💬 1    📌 0

Time to build a new index!!

24.04.2025 01:54 — 👍 7    🔁 1    💬 0    📌 0
Short RNA-seq read alignment with minimap2

Minimap2-2.29 released with the support of short RNA-seq read alignment. More explanation and results here: lh3.github.io/2025/04/18/s...

18.04.2025 21:53 — 👍 29    🔁 7    💬 0    📌 0
Efficient near telomere-to-telomere assembly of Nanopore Simplex reads Telomere-to-telomere (T2T) assembly is the ultimate goal for de novo genome assembly. Existing algorithms capable of near T2T assembly all require Oxford Nanopore Technologies (ONT) ultra-long reads w...

Preprint on hifiasm Nanopore-only assembly. Led by Haoyu Cheng: www.biorxiv.org/content/10.1...

18.04.2025 21:54 — 👍 139    🔁 77    💬 5    📌 6
Short RNA-seq read alignment with minimap2

minimap2 adds support for short read spliced RNA-seq alignment! lh3.github.io/2025/04/18/s...

18.04.2025 21:58 — 👍 34    🔁 8    💬 1    📌 1

Happy Birthdays, Ben and Rob! Very 2-power day!

02.04.2025 18:51 — 👍 2    🔁 0    💬 0    📌 0
Schematic figures showing global pairwise alignment algorithms

A worked example for each algorithm

Schematic figures showing various modes, such as semi-global, local, and extension alignment

New set of thesis figures on pairwise alignment just dropped!
- schematic and worked example for many algorithms
- alignment modes

27.03.2025 16:30 — 👍 33    🔁 9    💬 2    📌 0
GitHub - fulcrumgenomics/fqgrep: Grep for FASTQ files

fqgrep release 1.1.0 now speeds up searching FASTQ files!

Thank you to both Markus Schlegel from @activegroupgmbh.bsky.social for updating seq_io and Nicholas D. Crosbie of grepq for some competition and inspiration.

See more: github.com/fulcrumgenom...

14.03.2025 17:45 — 👍 12    🔁 5    💬 1    📌 0
Schematic of the mod-bucket algorithm: all k-mer hashes are partitioned into s buckets via their remainder mod s. Then, in each bucket the smallest hash is selected.

Just published simd-sketch, a crate for fast bucket sketches.
It's 7x to 30x faster than BinDash, by using the simd-minimizers crate for fast hashing, and a nearly branch-free implementation.

Here's a blogpost with a survey of minhash history & methods, and evals:

curiouscoding.nl/posts/simd-s...

14.03.2025 00:35 — 👍 12    🔁 9    💬 1    📌 0
