Congratulations!!!!!!!!!!
07.06.2025 03:19 — 👍 1 🔁 0 💬 1 📌 0
@mourisl.bsky.social
毛利元光. Assistant Prof at the Department of Biomedical Data Science at Dartmouth College. Research on bioinformatics, algorithms. Lab page: mourisl.github.io
Introns have to come from somewhere, right? @celineh2ooo.bsky.social and I looked at multiple genome alignments with 1000s of genomes and found 342 cases where humans (and our relatives) had gained a new intron. Still not sure where these come from, but it's a fascinating question
04.06.2025 20:13 — 👍 42 🔁 12 💬 2 📌 1
Neng Huang developed longcallR for joint SNP calling and phasing from long RNA-seq reads, AND for identifying allele-specific splicing/junctions (ASJ). Although ASJs of statistical significance are rare, a large fraction involve unannotated junctions. In Rust!
30.05.2025 14:54 — 👍 16 🔁 7 💬 0 📌 0
Industry friends, now is the time for MUCH more speaking out on behalf of academic colleagues under duress. Here are core open source methods that many of your products doubtlessly depend on either directly or indirectly (see en.wikipedia.org/wiki/HMMER) being abruptly defunded. Make noise.
29.05.2025 14:39 — 👍 76 🔁 50 💬 1 📌 0
Announcing myloasm, a new long-read (ONT R10/PacBio) metagenome assembler that I've been working on during my postdoc in the Heng Li lab (@lh3lh3.bsky.social).
myloasm-docs.github.io
Excited to share a new update to Mumemto, scaling MUM and conserved element finding to any size pangenome! Preprint out now w/ @benlangmead.bsky.social.
Mumemto scales to the new HPRC v2 release and beyond, and can merge in future assemblies without any recomputation! 1/n
Centrifuger has updated its pre-built index list to include the exciting new GTDB release r226 for taxonomic classification of sequencing data: github.com/mourisl/cent.... There is also a gtdb+refseq human/virus/fungi/contaminants index, which will hopefully be useful for human microbiome studies.
27.05.2025 15:58 — 👍 3 🔁 0 💬 0 📌 0
Great 🧵 by Pierre on the Kaminari paper! In short, Kaminari is a simple and elegant, but highly effective index for approximate colored k-mer queries. The simplicity leads to very fast query, but with accuracy consistent with (or exceeding) best-in-class solutions; a very fun collaboration indeed!
27.05.2025 15:41 — 👍 10 🔁 2 💬 0 📌 0
Bioinformatics folks: check out our @biorxivpreprint on a new, very efficient and accurate system for automated genome annotation, EviAnn, led by my colleague Aleksey Zimin: www.biorxiv.org/content/10.1...
13.05.2025 17:52 — 👍 54 🔁 22 💬 1 📌 0
Congratulations!!!!
09.05.2025 21:14 — 👍 1 🔁 0 💬 1 📌 0
Check out our latest collaboration with UniProt, who has integrated over 700,000 experimentally validated epitopes to enhance its protein entries with detailed immune response information. This data is accessible via the UniProt Feature Viewer and API! 💻🔬🧪 #collaboration #immunology #proteins
09.05.2025 00:35 — 👍 2 🔁 1 💬 0 📌 0
The deadline for WABI 2025 has been extended (but is still rapidly approaching) wabiconf.github.io/2025/
* abstract deadline: May 12 (AoE)
* paper deadline: May 15 (AoE)
Consider submitting your exciting algorithmic bioinformatics work to the WABI conference!
Thank you!
04.05.2025 14:46 — 👍 0 🔁 0 💬 0 📌 0
Forgot to run dustmasker on the genomes before creating a Centrifuger index and indeed saw some misclassifications. It took a while to figure out, lesson learned... I need to implement a built-in masking step like Kraken2's in case I forget to do it in the future.
04.05.2025 06:25 — 👍 0 🔁 0 💬 1 📌 0
Parsing GTF and FASTA files using the eccLib Library www.biorxiv.org/content/10.1... 🧬🖥️🧪 gitlab.platinum.edu.pl/eccdna/eccLib
29.04.2025 18:30 — 👍 6 🔁 3 💬 0 📌 1
Extracting @NCBI SRA files with fasterq-dump can require 17x the size of the accession while decompressing. Our new tool xsra extracts sequences at 5x throughput with significantly less disk usage, built-in compression, and optional BINSEQ outputs
github.com/arcInstitute...
Small update from AllTheBacteria (allthebacteria.org). Assemblies can be bulk downloaded from OSF as before, or you can now get individual assemblies from AWS. We now also have a LexicMap index on AWS, so you can align your favourite gene against 2.4 million bacteria (next post for price estimates)
29.04.2025 15:36 — 👍 47 🔁 23 💬 1 📌 2
The Department of Human Genetics at the University of Utah is sponsoring the Rising Stars in Genetics and Genomics symposium!
- We are seeking nominations by June 1.
- September 18-19, 2025
- Please share with the star postdocs that you know.
docs.google.com/forms/d/e/1F...
The sequence analysis session of #RECOMB2025 is off to a great start with @jimshaw.bsky.social presenting devider, a new algorithm for haplotyping small sequences from long-read sequencing.
www.biorxiv.org/content/10.1...
If you want to check if a human gene has copy-number changes or lands in a complex region, try pangene.bioinweb.org. Recently updated with more and better assemblies.
26.04.2025 01:06 — 👍 44 🔁 13 💬 1 📌 0
Time to build a new index!!
24.04.2025 01:54 — 👍 7 🔁 1 💬 0 📌 0
Minimap2-2.29 released with the support of short RNA-seq read alignment. More explanation and results here: lh3.github.io/2025/04/18/s...
18.04.2025 21:53 — 👍 29 🔁 7 💬 0 📌 0
Preprint on hifiasm Nanopore-only assembly. Led by Haoyu Cheng: www.biorxiv.org/content/10.1...
18.04.2025 21:54 — 👍 139 🔁 77 💬 5 📌 6
minimap2 adds support for short read spliced RNA-seq alignment! lh3.github.io/2025/04/18/s...
18.04.2025 21:58 — 👍 34 🔁 8 💬 1 📌 1
Happy Birthdays, Ben and Rob! Very 2-power day!
02.04.2025 18:51 — 👍 2 🔁 0 💬 0 📌 0
Schematic figures showing global pairwise alignment algorithms
A worked example for each algorithm
Schematic figures showing various modes, such as semi-global, local, and extension alignment
New set of thesis figures on pairwise alignment just dropped!
- schematic and worked example for many algorithms
- alignment modes
fqgrep release 1.1.0 now speeds up searching FASTQ files!
Thank you to both Markus Schlegel from @activegroupgmbh.bsky.social for updating seq_io and Nicholas D. Crosbie of grepq for some competition and inspiration.
See more: github.com/fulcrumgenom...
Schematic of the mod-bucket algorithm: all k-mer hashes are partitioned into s buckets via their remainder mod s. Then, in each bucket the smallest hash is selected.
Just published simd-sketch, a crate for fast bucket sketches.
It's 7x to 30x faster than BinDash, by using the simd-minimizers crate for fast hashing, and a nearly branch-free implementation.
Here's a blogpost with a survey of minhash history & methods, and evals:
curiouscoding.nl/posts/simd-s...
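A minimal Python sketch of the mod-bucket scheme described in the figure caption above: k-mer hashes are partitioned into s buckets by their remainder mod s, and the smallest hash in each bucket is kept. This is only an illustration, not the simd-sketch implementation. The function names `kmer_hash`, `bucket_sketch`, and `similarity`, the BLAKE2b-based hash, the forward-strand-only k-mers, and the bucket-agreement similarity estimate are all assumptions made for the example.

```python
import hashlib


def kmer_hash(kmer: str) -> int:
    """Stable 64-bit hash of a k-mer (illustrative choice, not the hash simd-sketch uses)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bucket_sketch(seq: str, k: int, s: int, hash_fn=kmer_hash) -> list:
    """Return s values: the smallest k-mer hash per bucket, or None for an empty bucket."""
    sketch = [None] * s
    for i in range(len(seq) - k + 1):
        h = hash_fn(seq[i:i + k])
        b = h % s  # bucket index is the remainder mod s
        if sketch[b] is None or h < sketch[b]:
            sketch[b] = h  # keep the smallest hash seen in this bucket
    return sketch


def similarity(a: list, b: list) -> float:
    """Fraction of buckets (non-empty in at least one sketch) where both sketches agree."""
    used = [(x, y) for x, y in zip(a, b) if x is not None or y is not None]
    if not used:
        return 0.0
    return sum(x == y for x, y in used) / len(used)
```

For example, `similarity(bucket_sketch(seq_a, 21, 1024), bucket_sketch(seq_b, 21, 1024))` gives a rough estimate of how similar the k-mer sets of two sequences are; a real tool would typically also use canonical k-mers and a much faster hash.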