
Igor Martayan

@imartayan.bsky.social

PhD student in algorithmic bioinformatics at @bonsaiseqbioinfo.bsky.social. Interested in randomized algorithms and space-efficient data structures https://igor.martayan.org

741 Followers  |  305 Following  |  59 Posts  |  Joined: 20.09.2023  |  2.1357

Latest posts by imartayan.bsky.social on Bluesky

Beyond Smoothed Analysis: Analyzing the Simplex Method by the Book Narrowing the gap between theory and practice is a longstanding goal of the algorithm analysis community. To further progress our understanding of how algorithms work in practice, we propose a new alg...

The simplex algorithm is super efficient. 80 years of experience says it runs in linear time. Nobody can explain _why_ it is so fast.

We invented a new algorithm analysis framework to find out.

27.10.2025 01:43 — 👍 162    🔁 41    💬 5    📌 9

Really exciting that the preprint on Barbell, a new demultiplexer, is finally out!
It's the first tool that builds on Sassy, the approximate-DNA-searching tool that @rickbitloo.bsky.social and I developed earlier this year, specifically with this application in mind.

23.10.2025 21:28 — 👍 20    🔁 15    💬 2    📌 0
GitHub - mohsenzakeri/Movi: Fast, Cache-Efficient, and Scalable Queries on Pangenomes

1/6 Movi 2 is here: faster and more space-efficient for pangenome queries. Its fastest mode uses half the memory of Movi 1 while running ~30% faster. github.com/mohsenzakeri...

21.10.2025 20:00 — 👍 44    🔁 24    💬 1    📌 2

Movi 2: Fast and Space-Efficient Queries on Pangenomes. #Pangenomes #SequenceQueries #Genomics #Bioinformatics @biorxiv-genomic.bsky.social 🧬 🖥️
www.biorxiv.org/content/10.1...

21.10.2025 13:49 — 👍 6    🔁 5    💬 0    📌 0

So what's the equivalent of `perf record && perf report` on a MacBook?

I want to see the generated assembly and which lines are hot.

11.10.2025 13:48 — 👍 3    🔁 2    💬 1    📌 1
‘Google for DNA’ brings order to biology’s big data MetaGraph compresses vast data archives into a search engine for scientists, opening up new frontiers of biological discovery.

It's not that often that an article published in Nature puts my community (sequence bioinformatics) in the spotlight. Shall I tell you about it?
www.nature.com/articles/d41...

09.10.2025 15:00 — 👍 25    🔁 14    💬 1    📌 1

"OpenZL is our answer to the tension between the performance of format-specific compressors and the maintenance simplicity of a single executable binary."
engineering.fb.com/2025/10/06/d...

06.10.2025 20:58 — 👍 13    🔁 5    💬 2    📌 0

We are alarmed by reports that Germany is on the verge of a catastrophic about-face, reversing its longstanding and principled opposition to the EU’s Chat Control proposal which, if passed, could spell the end of the right to privacy in Europe. signal.org/blog/pdfs/ge...

03.10.2025 16:14 — 👍 4000    🔁 2430    💬 41    📌 145
RECOMB 2026 | CALL FOR PAPERS

#RECOMB2026 is now accepting submissions and we'd love to see your best work!

📌 Abstract registration: Nov 7, 2025
📌 Full paper submission: Nov 14, 2025

📜 More info: recomb.org/recomb2026/call_for_papers.html

02.10.2025 12:00 — 👍 7    🔁 4    💬 0    📌 0

🦒Long read giraffe is out!🦒
Mapping long reads to pangenome graphs is ~10x faster than with GraphAligner, with veeery slightly better mapping accuracy, short variant calling, and SV genotyping than GraphAligner or Minimap2

02.10.2025 06:28 — 👍 42    🔁 22    💬 1    📌 0

Alice: fast and haplotype-aware assembly of high-fidelity reads based on MSR sketching https://www.biorxiv.org/content/10.1101/2025.09.29.679204v1

01.10.2025 01:47 — 👍 7    🔁 6    💬 0    📌 0
GitHub - RagnarGrootKoerkamp/simd-sketch: Compute bottom-s sketches and s-buckets sketches, using simd-minimizers crate.

Looking for people to test the latest version of simd-sketch.

It's now 2x as fast at sketching, and supports skipping over kmers containing N and other ambiguous bases (which is only ~35% slower).

'cargo install simd-sketch' is right there under your fingertips ;)

github.com/RagnarGrootK...

01.10.2025 14:38 — 👍 12    🔁 4    💬 2    📌 0
How to rapidly search the world’s microbial DNA By making the world’s microbial DNA easier to explore, LexicMap helps researchers track outbreaks, study antibiotic resistance, and understand microbial diversity.

There are millions of openly available microbial genomes, but searching them can be slow.

Until now 🥁

Introducing LexicMap, a new alignment tool that lets scientists search these data in minutes, helping track antibiotic resistance, trace outbreaks, and more.

www.ebi.ac.uk/about/news/r...
🦠

30.09.2025 09:47 — 👍 41    🔁 16    💬 1    📌 1
Public procurement contracts for travel agencies - Contributions - 2025 - Help us enrich our work programme - Participation platform of the Cour des Comptes. Public administrations and operators, notably those in higher education and research, use travel agencies (FCM, TravelPlanet…) for hotel...

You can support my proposal to the Cour des Comptes to examine public procurement contracts for travel agencies, notably in the ESR (higher education and research):
participationcitoyenne.ccomptes.fr/processes/co...

25.09.2025 10:28 — 👍 85    🔁 78    💬 8    📌 4

#RECOMB2026 will be in Thessaloniki, Greece on May 26-29, 2026. Satellites on May 24-25. Save the date!

Το συνέδριο #RECOMB2026 θα πραγματοποιηθεί στη Θεσσαλονίκη, στις 26-29 Μαΐου 2026. Οι δορυφορικές εκδηλώσεις θα διεξαχθούν στις 24-25 Μαΐου 2026. Σημειώστε την ημερομηνία!

26.09.2025 15:03 — 👍 22    🔁 13    💬 0    📌 1
A complete diploid human genome benchmark for personalized genomics Human genome resequencing typically involves mapping reads to a reference genome to call variants; however, this approach suffers from both technical and reference biases, leaving many duplicated and ...

Delighted to finally announce a preprint describing the Q100 project! “A complete diploid human genome benchmark for personalized genomics” For which we finished HG002 to near-perfect accuracy: www.biorxiv.org/content/10.1... 🧵[1/14]

22.09.2025 17:01 — 👍 96    🔁 57    💬 4    📌 4

Depends on how full it is I guess, negative queries are fast when you don't need probing

22.09.2025 11:07 — 👍 1    🔁 0    💬 1    📌 0

Critical part of the President's new $100,000 charge for H-1B visas: The Administration can also offer a $100,000 discount to any person, company, or industry that it wants. Replacing rules with arbitrary discretion.

Want visas? You know who to call and who to flatter.

20.09.2025 13:40 — 👍 12599    🔁 4765    💬 735    📌 660

Minimap2 is very much the hammer in
"When all you have is a hammer, everything looks like a nail."

16.09.2025 20:18 — 👍 12    🔁 1    💬 1    📌 0

Blogged about how zstd --long fills the gap between fast and slow-but-high-ratio genome compression methods log.bede.im/2025/09/12/z...

12.09.2025 15:07 — 👍 18    🔁 9    💬 0    📌 3
Efficient sequence alignment against millions of prokaryotic genomes with LexicMap - Nature Biotechnology LexicMap uses a fixed set of probes to efficiently query gene sequences for fast and low-memory alignment.

Sometimes you meet absolutely incredible bioinfo-magicians.
It was a huge privilege when @shenwei356.bsky.social
joined our group for a year on an @embl.org sabbatical.
While here, he developed a new way of aligning to
millions of bacteria, called LexicMap 1/n
www.nature.com/articles/s41...

10.09.2025 09:12 — 👍 188    🔁 99    💬 5    📌 4

Preprint out for myloasm, our new nanopore / HiFi metagenome assembler!

Nanopore's getting accurate, but

1. Can this lead to better metagenome assemblies?
2. How, algorithmically, to leverage them?

with co-author Max Marin @mgmarin.bsky.social, supervised by Heng Li @lh3lh3.bsky.social

1 / N

07.09.2025 23:34 — 👍 114    🔁 79    💬 5    📌 5

A wonderful day for @bonsaiseqbioinfo.bsky.social, with @camillemrcht.bsky.social's and @npmalfoy.bsky.social's HDR defenses.
Congrats on the amazing work!

The team is lucky to have you both!

04.09.2025 16:40 — 👍 11    🔁 4    💬 2    📌 2
DSB 2026 Venice - February 18-19 Workshop Data Structures in Bioinformatics

We are glad to announce that the next workshop “Data Structures in Bioinformatics” (DSB 2026) will take place in Venice, Italy, on *February 18-19*, 2026. dsb-meeting.github.io/DSB2026/ Book the dates! #DSB26

01.09.2025 18:10 — 👍 14    🔁 8    💬 1    📌 0
Release 0.8.0 · bede/deacon Faster filtering on multicore systems through improved work allocation using the Paraseq library (@noamteyssier). Filtering at >1Gbp/s is possible with uncompressed long sequences, and >500Mbp/s is...

📣 Deacon 0.8.0 available on Bioconda
- Much faster search and depletion through improved work distribution on multicore systems. My fastq.gz benchmark now runs at 400Mbp/s on Apple M1.
- Dual default match thresholds for greater accuracy

Details: github.com/bede/deacon/...

1k downloads! 🐥

11.08.2025 18:55 — 👍 15    🔁 6    💬 2    📌 0
A grid of plots comparing the performance of a binary heap, 8-ary heap, 4-ary heap, quickheap, and radix heap, all taken from various Rust crates.
The top 4 plots are for u32 values, the bottom 4 for u64 values.
Each column is a different type of input. The left is heapsort: insert n times then pop n times. The second column does groups of push followed by 4x (pop, push), and then the reverse. The last two columns have linearly/randomly increasing data.

On the first two data sets, the quickheap is by far the best, showing much less degradation from cache misses than the d-ary heaps.


So, the heap I invented over the weekend was introduced as the quickheap by Navarro and Paredes around 2006!
Basically: on each pop, do just enough quicksort to find the smallest element.

My implementation (the 1st??) is 2x to 4x faster than d-ary and binary heaps.

curiouscoding.nl/posts/quickh...
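
To make "just enough quicksort" concrete, here is a minimal Python sketch of the idea. It is a simplified illustration, not the implementation from the post nor the exact data structure from the Navarro and Paredes paper: the class name `QuickHeap`, the array layout (values decreasing towards the end of the list), and the random pivot choice are all assumptions made for this example.

```python
import random


class QuickHeap:
    """Lazy min-heap sketch: values decrease towards the end of `data`."""

    def __init__(self):
        self.data = []
        # Ascending indices of pivots. For every pivot index p:
        #   data[i] >= data[p] for i < p, and data[i] <= data[p] for i > p.
        self.pivots = []

    def __len__(self):
        return len(self.data)

    def push(self, x):
        data, pivots = self.data, self.pivots
        data.append(x)  # opens a hole at the end of the last segment
        hole = len(data) - 1
        # Walk pivots from the last one backwards. While x is larger than a
        # pivot, slide that pivot one slot to the right and move the hole
        # to the pivot's old position.
        for i in range(len(pivots) - 1, -1, -1):
            p = pivots[i]
            if x <= data[p]:
                break
            data[hole] = data[p + 1]
            data[p + 1] = data[p]
            pivots[i] = p + 1
            hole = p
        data[hole] = x

    def pop(self):
        data, pivots = self.data, self.pivots
        if not data:
            raise IndexError("pop from empty QuickHeap")
        last = len(data) - 1
        # Partition only the segment after the last pivot, and repeat until
        # a pivot sits in the final slot: that pivot is the minimum.
        while not pivots or pivots[-1] != last:
            lo = pivots[-1] + 1 if pivots else 0
            r = random.randint(lo, last)
            data[r], data[last] = data[last], data[r]
            pv = data[last]
            store = lo
            for j in range(lo, last):
                if data[j] > pv:  # larger values go to the front
                    data[store], data[j] = data[j], data[store]
                    store += 1
            data[store], data[last] = data[last], data[store]
            pivots.append(store)
        pivots.pop()
        return data.pop()
```

Each pop only partitions the part of the array that can still contain the minimum, and the pivots it leaves behind are reused by later pops, which is where the amortized savings come from.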

11.08.2025 23:32 — 👍 10    🔁 2    💬 2    📌 0
SEA 2025

There was a workshop on 25 years of the FM-index and the CSA after SEA. I would have liked to attend, but I had other commitments. The invited speakers were Giovanni Manzini and Roberto Grossi, as the other purpose of the workshop was to present them Festschrifts for their 60th birthdays. 1/6

08.08.2025 09:49 — 👍 9    🔁 5    💬 1    📌 0
With 3 threads, the middle thread processes the reads starting in the middle third of the fasta file.


Little writeup on the speed of fasta parsers, at last.

Basically: both needletail and paraseq process input linearly, and thus have a limit around 4 GB/s.

By giving each thread its own slice of the input file, we're limited by RAM bandwidth instead :)

curiouscoding.nl/posts/fasta-...
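
The slicing idea can be sketched in a few lines of Python. This is only an illustration of the partitioning rule (each thread handles the records that start inside its slice), not the code from the writeup: the function names `record_start_at_or_after`, `count_chunk`, and `parallel_count` are hypothetical, the input is assumed to be an uncompressed FASTA buffer already in memory, and Python threads will not show the real speedup because of the GIL.

```python
from concurrent.futures import ThreadPoolExecutor


def record_start_at_or_after(buf: bytes, pos: int) -> int:
    """Index of the first FASTA header ('>' at line start) at or after pos."""
    if pos == 0 and buf[:1] == b">":
        return 0
    # A header always follows a newline, so search for "\n>".
    i = buf.find(b"\n>", max(pos - 1, 0))
    return len(buf) if i < 0 else i + 1


def count_chunk(chunk: bytes) -> tuple[int, int]:
    """Return (number of records, number of bases) in a run of whole records."""
    records = bases = 0
    for line in chunk.splitlines():
        if line.startswith(b">"):
            records += 1
        else:
            bases += len(line)
    return records, bases


def parallel_count(buf: bytes, n_threads: int) -> tuple[int, int]:
    # Cut the buffer into n equal slices, then move each cut forward to the
    # next record start, so that every record belongs to exactly one thread.
    bounds = [record_start_at_or_after(buf, len(buf) * i // n_threads)
              for i in range(n_threads)]
    bounds.append(len(buf))
    chunks = [buf[bounds[i]:bounds[i + 1]] for i in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(count_chunk, chunks))
    return sum(r for r, _ in results), sum(b for _, b in results)
```

With three threads, the middle one gets exactly the records whose header falls in the middle third of the buffer, and a record that straddles a cut is handled entirely by the thread in whose slice it starts.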

06.08.2025 17:42 — 👍 17    🔁 5    💬 1    📌 0

🧬🖥️ I am strongly of the opinion that bioinformatics needs to move away entirely from text-based and "loosely" structured file formats for essentially any type of data. File formats should be binary-first, and designed for *correct* and *efficient* machine parsing 1/3

05.08.2025 16:39 — 👍 72    🔁 16    💬 15    📌 0

Congrats! Is it publicly available?

04.08.2025 13:31 — 👍 1    🔁 0    💬 1    📌 0
