The simplex algorithm is super efficient. 80 years of experience says it runs in linear time. Nobody can explain _why_ it is so fast.
We invented a new algorithm analysis framework to find out.
@imartayan.bsky.social
PhD student in algorithmic bioinformatics at @bonsaiseqbioinfo.bsky.social. Interested in randomized algorithms and space-efficient data structures https://igor.martayan.org
Really exciting that the preprint on Barbell, a new demultiplexer, is finally out!
It's the first tool that builds on Sassy, the approximate-DNA-searching tool that @rickbitloo.bsky.social and I developed earlier this year, specifically with this application in mind.
1/6 Movi 2 is here: faster and more space-efficient for pangenome queries. Its fastest mode uses half the memory of Movi 1 while running ~30% faster. github.com/mohsenzakeri...
21.10.2025 20:00 — 👍 44 🔁 24 💬 1 📌 2
Movi 2: Fast and Space-Efficient Queries on Pangenomes. #Pangenomes #SequenceQueries #Genomics #Bioinformatics @biorxiv-genomic.bsky.social 🧬 🖥️
www.biorxiv.org/content/10.1...
So what's the equivalent of `perf record && perf report` on a MacBook?
I want to see the generated assembly and which lines are hot.
It doesn't happen that often: an article published in Nature puts my community (sequence bioinformatics) in the spotlight. Shall I tell you about it?
www.nature.com/articles/d41...
"OpenZL is our answer to the tension between the performance of format-specific compressors and the maintenance simplicity of a single executable binary."
engineering.fb.com/2025/10/06/d...
We are alarmed by reports that Germany is on the verge of a catastrophic about-face, reversing its longstanding and principled opposition to the EU’s Chat Control proposal which, if passed, could spell the end of the right to privacy in Europe. signal.org/blog/pdfs/ge...
03.10.2025 16:14 — 👍 4000 🔁 2430 💬 41 📌 145
#RECOMB2026 is now accepting submissions and we'd love to see your best work!
📌 Abstract registration: Nov 7, 2025
📌 Full paper submission: Nov 14, 2025
📜 More info: recomb.org/recomb2026/call_for_papers.html
🦒Long read giraffe is out!🦒
Mapping long reads to pangenome graphs is ~10x faster than with GraphAligner, with veeery slightly better mapping accuracy, short variant calling, and SV genotyping than GraphAligner or Minimap2
Alice: fast and haplotype-aware assembly of high-fidelity reads based on MSR sketching https://www.biorxiv.org/content/10.1101/2025.09.29.679204v1
01.10.2025 01:47 — 👍 7 🔁 6 💬 0 📌 0
Looking for people to test the latest version of simd-sketch.
It's now 2x as fast at sketching, and supports skipping over kmers containing N and other ambiguous bases (which is only ~35% slower).
'cargo install simd-sketch' is right there under your fingertips ;)
github.com/RagnarGrootK...
There are millions of openly available microbial genomes, but searching them can be slow.
Until now 🥁
Introducing LexicMap, a new alignment tool that lets scientists search these data in minutes, helping track antibiotic resistance, trace outbreaks, and more.
www.ebi.ac.uk/about/news/r...
🦠
You can support my proposal to the Cour des Comptes to examine public procurement contracts with travel agencies, notably in the ESR (higher education and research):
participationcitoyenne.ccomptes.fr/processes/co...
#RECOMB2026 will be in Thessaloniki, Greece on May 26-29, 2026. Satellites on May 24-25. Save the date!
The #RECOMB2026 conference will take place in Thessaloniki on May 26-29, 2026. The satellite events will be held on May 24-25, 2026. Mark the date!
Delighted to finally announce a preprint describing the Q100 project! “A complete diploid human genome benchmark for personalized genomics” For which we finished HG002 to near-perfect accuracy: www.biorxiv.org/content/10.1... 🧵[1/14]
22.09.2025 17:01 — 👍 96 🔁 57 💬 4 📌 4
Depends on how full it is I guess, negative queries are fast when you don't need probing
22.09.2025 11:07 — 👍 1 🔁 0 💬 1 📌 0
Critical part of the President's new $100,000 charge for H-1B visas: The Administration can also offer a $100,000 discount to any person, company, or industry that it wants. Replacing rules with arbitrary discretion.
Want visas? You know who to call and who to flatter.
Minimap2 is very much the hammer in
"When all you have is a hammer, everything looks like a nail."
Blogged about how zstd --long fills the gap between fast and slow-but-high-ratio genome compression methods log.bede.im/2025/09/12/z...
12.09.2025 15:07 — 👍 18 🔁 9 💬 0 📌 3
Sometimes you meet absolutely incredible bioinfo-magicians.
It was a huge privilege when @shenwei356.bsky.social
joined our group for a year on an @embl.org sabbatical.
While here, he developed a new way of aligning to
millions of bacteria, called LexicMap 1/n
www.nature.com/articles/s41...
Preprint out for myloasm, our new nanopore / HiFi metagenome assembler!
Nanopore's getting accurate, but
1. Can this lead to better metagenome assemblies?
2. How, algorithmically, to leverage them?
with co-author Max Marin @mgmarin.bsky.social, supervised by Heng Li @lh3lh3.bsky.social
1 / N
A wonderful day for @bonsaiseqbioinfo.bsky.social, with @camillemrcht.bsky.social's and @npmalfoy.bsky.social's HDR defenses.
Congrats for the amazing work!
It's a great chance for the team to have you both!
We are glad to announce that the next workshop “Data Structures in Bioinformatics” (DSB 2026) will take place in Venice, Italy, on *February 18-19*, 2026. dsb-meeting.github.io/DSB2026/ Book the dates! #DSB26
01.09.2025 18:10 — 👍 14 🔁 8 💬 1 📌 0
📣 Deacon 0.8.0 available on Bioconda
- Much faster search and depletion through improved work distribution on multicore systems. My fastq.gz benchmark now runs at 400Mbp/s on Apple M1.
- Dual default match thresholds for greater accuracy
Details: github.com/bede/deacon/...
1k downloads! 🐥
A grid of plots comparing the performance of a binary heap, 8-ary heap, 4-ary heap, quickheap, and radix heap, all taken from various Rust crates. The top 4 plots are for u32 values, the bottom 4 for u64 values. Each column is a different type of input. The left is heapsort: insert n times then pop n times. The second column does groups of push followed by 4x (pop, push), and then the reverse. The last two columns have linearly/randomly increasing data. On the first two data sets, the quickheap is by far the best, showing much less degradation from cache misses than the d-ary heaps.
So, the heap I invented over the weekend was introduced as the quickheap by Navarro and Paredes around 2006!
Basically: on each pop, do just enough quicksort to find the smallest element.
My implementation (the 1st??) is 2x to 4x faster than d-ary and binary heaps.
curiouscoding.nl/posts/quickh...
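To make "just enough quicksort" concrete, here is a minimal Python sketch of the idea. It is an illustration under assumptions, not the implementation from the post and not the in-place array layout of Navarro and Paredes: the class name `QuickHeap` and the `layers`/`pivots` lists are hypothetical, and each partition step copies into new lists for readability.

```python
import random


class QuickHeap:
    """Priority queue that partitions lazily, only as far as a pop requires.

    Hypothetical sketch: layers[0] has no upper bound, and layers[i + 1]
    holds the values strictly smaller than pivots[i]. Pivots are strictly
    decreasing, so the minimum always lives in the deepest non-empty layer.
    """

    def __init__(self):
        self.layers = [[]]
        self.pivots = []

    def __len__(self):
        return sum(len(layer) for layer in self.layers)

    def push(self, x):
        # Walk down while x is below the pivot guarding the next layer.
        i = 0
        while i < len(self.pivots) and x < self.pivots[i]:
            i += 1
        self.layers[i].append(x)

    def pop(self):
        # Discard exhausted deepest layers together with their pivots.
        while len(self.layers) > 1 and not self.layers[-1]:
            self.layers.pop()
            self.pivots.pop()
        if not self.layers[-1]:
            raise IndexError("pop from empty QuickHeap")
        while True:
            layer = self.layers[-1]
            p = random.choice(layer)
            smaller = [x for x in layer if x < p]
            if not smaller:
                # Nothing is below the pivot, so the pivot is the minimum.
                layer.remove(p)
                return p
            # One quicksort partition step: the smaller part becomes a new,
            # deeper layer and the rest stays where it was.
            self.layers[-1] = [x for x in layer if x >= p]
            self.layers.append(smaller)
            self.pivots.append(p)
```

Popping every element yields them in sorted order, and the partitions created by earlier pops are reused by later ones, which is where the savings over re-sorting come from.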
There was a workshop on 25 years of the FM-index and the CSA after SEA. I would have liked to attend, but I had other commitments. The invited speakers were Giovanni Manzini and Roberto Grossi, as the other purpose of the workshop was to present them with Festschrifts for their 60th birthdays. 1/6
08.08.2025 09:49 — 👍 9 🔁 5 💬 1 📌 0
With 3 threads, the middle thread processes the reads starting in the middle third of the fasta file.
Little writeup on the speed of fasta parsers, at last.
Basically: both needletail and paraseq process input linearly, and thus have a limit around 4 GB/s.
By giving each thread its own slice of the input file, we're limited by RAM bandwidth instead :)
curiouscoding.nl/posts/fasta-...
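The slicing scheme can be sketched in a few lines of Python. This is a hedged illustration of the ownership rule only (a record belongs to the thread whose byte range contains its `>` header), not code from the writeup: `next_header`, `parse_slice`, and `parse_parallel` are hypothetical names, the file is assumed to be already in memory as `bytes`, and Python threads will not show the speedup because of the GIL.

```python
from concurrent.futures import ThreadPoolExecutor


def next_header(data: bytes, pos: int) -> int:
    """Offset of the first record header ('>' at line start) at or after pos.

    Returns len(data) when no further header exists.
    """
    while True:
        pos = data.find(b">", pos)
        if pos == -1:
            return len(data)
        if pos == 0 or data[pos - 1] == 0x0A:  # preceded by '\n'
            return pos
        pos += 1


def parse_slice(data: bytes, lo: int, hi: int):
    """Parse the records whose header starts inside the byte range [lo, hi)."""
    start, end = next_header(data, lo), next_header(data, hi)
    body = data[start:end]
    if not body:
        return []
    records = []
    # body starts with '>', so drop it and split on the remaining headers.
    for chunk in body[1:].split(b"\n>"):
        header, _, seq = chunk.partition(b"\n")
        records.append((header.decode(), seq.replace(b"\n", b"").decode()))
    return records


def parse_parallel(data: bytes, threads: int = 3):
    """Give each thread an equal byte range and concatenate results in order."""
    bounds = [len(data) * i // threads for i in range(threads + 1)]
    with ThreadPoolExecutor(threads) as pool:
        parts = list(
            pool.map(parse_slice, [data] * threads, bounds[:-1], bounds[1:])
        )
    return [record for part in parts for record in part]
```

A record that straddles a slice boundary is read to its end by the thread that owns its header, and the next thread skips forward to the first header in its own range, so no record is parsed twice or dropped.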
🧬🖥️ I am strongly of the opinion that bioinformatics needs to move away entirely from text-based and "loosely" structured file formats for essentially any type of data. File formats should be binary-first, and designed for *correct* and *efficient* machine parsing 1/3
05.08.2025 16:39 — 👍 72 🔁 16 💬 15 📌 0
Congrats! Is it publicly available?
04.08.2025 13:31 — 👍 1 🔁 0 💬 1 📌 0