Complex genetic variation in nearly complete human genomes - Nature
Using sequencing and haplotype-resolved assembly of 65 diverse human genomes, complex regions including the major histocompatibility complex and centromeres are analysed.
Two papers in today's issue of @nature.com: 1) we assemble 65 genomes to near completion, including centromeres and the MHC. tinyurl.com/3huhax6w. 2) we sequence 1,019 genomes from the 1kGP with long reads, revealing SVs down to low allele frequencies tinyurl.com/wbx3we9x.
23.07.2025 15:12
Sassy: Searching Short DNA Strings in the 2020s https://www.biorxiv.org/content/10.1101/2025.07.22.666207v1
26.07.2025 18:46
Congratulations to Rayan Chikhi (Institut Pasteur), head of the "Sequence Bioinformatics" unit, for securing the ERC Proof of Concept 2025 for his project ENZYMINER!
@rayan.chiki.bsky.social
#Bioinformatics
24.07.2025 15:10
After Tim Hunt won the Nobel, he said, "We do science because we like discovering things about the world...and then boasting about what we found".
Any one individual can argue about their own motivation, but it would be naive to dispute that's an accurate description of many people.
11.07.2025 13:44
OReO: optimizing read order for practical compression
Abstract. Motivation: Recent advances in high-throughput and third-generation sequencing technologies have created significant challenges in storing and mana
Paper alert!
We present OReO, a tool that reorders long-read datasets so that they compress efficiently with ANY universal compressor like gz, zstd, xz ...
TLDR: You can get state of the art compression WITHOUT a dedicated compressor/decompressor!
academic.oup.com/bioinformati...
A thread!
03.07.2025 10:52
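A toy sketch of why read order can matter for a universal compressor. This is not OReO's algorithm: the synthetic reads, the function names, and the use of plain sorting as the "reordering" are all assumptions made for illustration. zlib only finds matches inside a 32 KB window, so two similar reads that sit far apart in the file cannot help each other, while the same reads placed next to each other can.

```python
import random
import zlib


def make_reads(n_regions=200, read_len=1000, seed=0):
    """Build a synthetic dataset (hypothetical): two identical reads per region.

    The second copy of each read is placed n_regions reads after the first,
    which is far beyond zlib's 32 KB match window.
    """
    rng = random.Random(seed)
    regions = ["".join(rng.choices("ACGT", k=read_len)) for _ in range(n_regions)]
    return regions + regions


def compressed_size(reads):
    """Size in bytes of the newline-joined reads after zlib compression."""
    return len(zlib.compress("\n".join(reads).encode("ascii"), 9))


reads = make_reads()
original_size = compressed_size(reads)

# Stand-in for a real reordering step: sorting puts identical reads side by side.
reordered_size = compressed_size(sorted(reads))

print(f"original order:  {original_size} bytes")
print(f"reordered:       {reordered_size} bytes")
```

The decompressor here is the stock zlib one; only the order of the input changed.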
I worked with Thomas during a three-month research visit during his PhD, and it resulted in a paper in NAR. I highly recommend him. doi.org/10.1093/nar/...
02.07.2025 11:48
🖥️🧬 WABI '25 will have not only excellent keynotes but also an exciting program of papers. The titles and abstracts of all accepted WABI '25 papers are now available on the conference website (wabiconf.github.io/2025/talks/). I'm looking forward to seeing these talks!
25.06.2025 18:40
5/n
The code and paper are available:
paper: tinyurl.com/4svc3xhu
code: github.com/medvedevgrou...
25.06.2025 13:19
4/n
The traditional "repeat-oblivious" estimator can *overestimate mutation rates by an order of magnitude* on repetitive data. In contrast, the new estimator remains accurate across a broad range of rates and repetitive sequences (e.g. RBMY gene, α-satellite centromeres).
25.06.2025 13:19
3/n
Capturing the full repeat structure in the estimator is pretty hard and possibly not even needed. Instead, we account for the most pertinent part of the repeat structure in the estimator and the rest of the structure is accounted for in the bias formula.
25.06.2025 13:19
2/n
Tools such as Mash estimate the mutation rate via k-mer Jaccard similarity, assuming *non-repetitive* sequences. But in highly repetitive regions (e.g., α-satellite DNA), these estimates break down. We derive a novel estimator by relaxing the non-repetitive assumption.
25.06.2025 13:19
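For readers unfamiliar with the repeat-oblivious baseline mentioned in this post, here is a minimal sketch of it. It uses the distance formula published for Mash, D = -(1/k) ln(2j / (1 + j)), where j is the k-mer Jaccard similarity. The function names are hypothetical, and exact k-mer sets are used here in place of the MinHash sketches Mash actually computes. This is not the new repeat-aware estimator from the preprint.

```python
import math


def kmers(seq, k):
    """Set of distinct k-mers in seq (repeated k-mers collapse to one entry)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k):
    """Exact k-mer Jaccard similarity between two sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 1.0


def mash_distance(j, k):
    """Mutation-rate estimate from Jaccard similarity j, per the Mash formula."""
    if j <= 0.0:
        return 1.0  # no shared k-mers: report the maximum distance
    return min(1.0, -math.log(2.0 * j / (1.0 + j)) / k)


# A highly repetitive toy sequence: ten bases but only two distinct 3-mers.
repeat = "ACACACACAC"
mutated = "ACACAGACAC"  # one substitution in the middle

j = jaccard(repeat, mutated, 3)
print(len(kmers(repeat, 3)), j, mash_distance(j, 3))
```

The toy input shows the situation the post describes: with so few distinct k-mers, a single substitution moves the Jaccard similarity a long way, which is outside the non-repetitive setting the formula assumes.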
🧵1/n
Estimating mutation rates using k-mers is fast, but what happens when repeats dominate the genome?
In a new preprint, Haonan Wu, Antonio Blanca, and I propose a *repeat-aware* estimator that's accurate even in centromeres.
25.06.2025 13:19
GitHub - COMBINE-lab/QCatch: Quality Control downstream of alevin-fry / simpleaf
Quality Control downstream of alevin-fry / simpleaf - COMBINE-lab/QCatch
We are thrilled to introduce QCatch, a fast, command-line QC reporting tool built for alevin-fry & simpleaf single-cell data! Led by @ygao61.bsky.social & in collaboration with Dongze He 🧬🖥️. The preprint is available at bit.ly/4neSznl. Read more below: 1/3
23.06.2025 12:27
Preprint alert!
Our new abundance index, REINDEER2, is out!
It's cheap to build and update, offers tunable abundance precision at the k-mer level, and delivers very high query throughput.
Short thread!
www.biorxiv.org/content/10.1...
github.com/Yohan-Hernan...
19.06.2025 09:12
Also: what are the bottlenecks in your data processing?
Specifically, I'm looking for reasonably well defined & understood and widely used methods that could use a fresh high-throughput implementation.
Stuff like sketching, maybe assembly, ...
Surely, many pipelines could be sped up 10x ;)
20.06.2025 15:14
My thought is that you can't drop it because of that UNLESS you truly have no access to a compute node with more memory
16.06.2025 13:33
5/4 This is a draft manuscript and we hope to receive feedback from the community. You can submit a GitHub issue using github.com/medvedevgrou... or email the authors privately
12.06.2025 11:28
4/4 We further clarify common misconceptions, e.g. the confusion between uniformity and regularity, the discrepancy between the original SimHash for vectors and the folklore version commonly used for estimating similarities among sets.
12.06.2025 11:26
3/4 We propose a categorization of hashing methods based on their properties, design goals, and application context.
12.06.2025 11:26
2/4 We provide a comprehensive overview of hash functions used in genomics. Hashing is central to many genomic tasks, but we found no good treatment that describes the wide variety of hash functions employed in these applications.
12.06.2025 11:26
1/4 Hash functions in genomic sequence analysis (tinyurl.com/4kk9ccmt) : a new survey written together with Ke Chen, Xiang Li, Qian Shi, and Mingfu Shao. Before submitting it, we are posting it online to get feedback from the community.
12.06.2025 11:26
Hi Rob, thanks for clarifying. If I now understand you correctly, the severity of the criticism in your original post was due more to the lack of source code than the choice of license, right?
08.06.2025 11:00
Congratulations!
08.06.2025 10:57
I was trying to understand the license but couldn't spot the issue. Could you point to the specific problematic language?
05.06.2025 08:44
Actually, if you open that "Source code" tarball, it doesn't contain any source code!
04.06.2025 17:39
Slides from my talk (with @kamilsjaron.bsky.social) on a history of k-mers in bioinformatics: rayan.chikhi.name/pdf/2025-kme...
03.06.2025 09:25
Professor of Algorithmic and Microbial Genomics at the University of Bath (UK). Pangenomes, drug resistance (esp TB), data structures for DNA search, plasmid evolution, global microbial surveillance. Open Data, reproducibility
Genomics, Bioinformatics.
github.com/tobiasrausch
CSE PhD student at Penn State
Theoretical and algorithmic bioinformatics
https://wu-haonan.github.io/
Epigenetic Inheritance, Neuroscience & anything biology-related
https://www.odedrechavilab.com/
TED: https://shorturl.at/myFTY
Huberman Lab Podcast: https://youtu.be/CDUetQMKM6g
Computational Biology at Goethe University Frankfurt
Professor in Bioinformatics - University of Montpellier
Institute of Evolutionary Sciences of Montpellier
Member of the Institut Universitaire de France (IUF)
President of the French Society of Bioinformatics
Computational Biology, Bioinformatics, Transcriptomics, Postdoc at CHOP, Penn State PhD
https://x-zang.github.io/
Inria Senior researcher.
Head of the https://team.inria.fr/genscale/ at Inria and Irisa.
Algorithmics for sequencing data analyses, genomics and metagenomics.
Assoc. Prof. of Computer Science, Biology, and the Huck Institutes of the Life Sciences. Algorithms for sequence analysis, probability, knowledge graphs, and dabbling in ML/AI
Research group in Lille, Fr.
Ph.D. in Mathematics
Currently postdoc at @bonsaiseqbioinfo.bsky.social, in Lille.
Investigating patterns (substructures) in structured data (sequences, trees, graphs) of predominantly biological origin.
More at https://fingels.github.io/
Assoc. Professor, Graph Algorithms and Bioinformatics @ U. Helsinki
https://www.cs.helsinki.fi/u/tomescu/
Curses back at dimensionality. High throughput, some output.
Assistant Professor, Gilbert S. Omenn Department of Computational Medicine & Bioinformatics, University of Michigan
Computer scientist and tennis player
Computer scientist at Charles University, Prague 🇨🇿 I like all kinds of efficient algorithms and data structures for large datasets || also ⛰️🇺🇦 https://iuuk.mff.cuni.cz/~vesely/
Researcher at UCSC Genomics Institute. Space-efficient data structures and pangenome graphs.
The 29th International Conference on Research in Computational Molecular Biology. recomb.org
April 26 - 29, 2025