Pierre Peterlongo @pierrepeterlongo

Good Friday Evening news: we updated back_to_sequences
(find the origin of kmers)

- Faster
- Supports multiline fasta files
- Much easier installation: see github.com/pierrepeterl...

21.11.2025 17:03 — 👍 4 🔁 1 💬 0 📌 0

Efficient and accurate search in petabase-scale sequence repositories - Nature MetaGraph enables scalable indexing of large sets of DNA, RNA or protein sequences using annotated de Bruijn graphs.

The MetaGraph paper is out in Nature; it showed up in my feeds today! Congratulations to Mikhail Karasikov, @gxxxr.bsky.social, @akkah21.bsky.social and all of the other authors (whom I'd love to follow on Bluesky if I can find you ;P) www.nature.com/articles/s41...

09.10.2025 14:40 — 👍 36 🔁 15 💬 1 📌 0

Preprint out for myloasm, our new nanopore / HiFi metagenome assembler!

Nanopore's getting accurate, but

1. Can this lead to better metagenome assemblies?
2. How, algorithmically, to leverage them?

with co-author Max Marin @mgmarin.bsky.social, supervised by Heng Li @lh3lh3.bsky.social

1 / N

07.09.2025 23:34 — 👍 114 🔁 80 💬 5 📌 5

❗ I clearly consider this THE most important result of the last decade for exploiting and democratizing genomic data.
I think there will be a "before" and an "after" logan and logan-search
github.com/IndexThePlan...
logan-search.org
Have a look at this thread

04.09.2025 12:35 — 👍 9 🔁 3 💬 0 📌 0

🤝 Amazing collaboration with @jermp.bsky.social, @yhhshb.bsky.social, @robp.bsky.social, Victor Levallois, and Bertrand Le Gal, and the help of @yoann.bsky.social. 8/8

27.05.2025 12:06 — 👍 3 🔁 0 💬 0 📌 0

🌊 On metagenomic data, other tools such as kmindex are good alternatives. At the same time, Kaminari consistently ranks as one of the fastest tools across all data types, generating the smallest indexes (or the lowest FPR). 7/8

27.05.2025 12:06 — 👍 1 🔁 0 💬 1 📌 0

💾 For a fixed false positive rate, Kaminari uses up to 37x less space than COBS while being an order of magnitude faster to build and query. 6/8

27.05.2025 12:06 — 👍 2 🔁 0 💬 1 📌 0

📊 Experimental results show Kaminari's superiority in index size and query performance across various genomic datasets. 5/8

27.05.2025 12:06 — 👍 1 🔁 0 💬 1 📌 0

🧬 Kaminari's design leverages properties of k-mer minimizers for compact space and fast query time, as inspired by the techniques proposed in Fulgor. 4/8

27.05.2025 12:06 — 👍 1 🔁 0 💬 1 📌 0
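
To illustrate the minimizer property this post refers to, here is a minimal Python sketch. It is not Kaminari's implementation: the function names are hypothetical, and lexicographic order is assumed for simplicity (practical tools typically order m-mers by a hash value instead).

```python
def minimizer(kmer: str, m: int) -> str:
    """Return the smallest m-mer of `kmer` (lexicographic order, an assumption)."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def kmers_with_minimizers(seq: str, k: int, m: int):
    """Yield (k-mer, minimizer) pairs for every k-mer of `seq`."""
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        yield kmer, minimizer(kmer, m)


if __name__ == "__main__":
    pairs = list(kmers_with_minimizers("GATTACAGATTACA", k=7, m=3))
    for kmer, mini in pairs:
        print(kmer, mini)
    # Consecutive k-mers often share a minimizer, so an index keyed on
    # minimizers has fewer distinct keys than one keyed on k-mers.
    print(len({kmer for kmer, _ in pairs}), "distinct k-mers,",
          len({mini for _, mini in pairs}), "distinct minimizers")
```

On this toy sequence, runs of overlapping k-mers map to the same minimizer, which is the kind of redundancy a minimizer-based index can exploit.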

GitHub - yhhshb/kaminari: 雷 - kaminari (thunder/lightning)

💻 We implemented Kaminari in C++17, available under the MIT license at github.com/yhhshb/kaminari. Additional results and reproducibility info at github.com/vicLeva/benchmarks_kaminari. 3/8

27.05.2025 12:06 — 👍 1 🔁 0 💬 1 📌 0

🔍 Key findings include:
- Use of minimizers and integer compression for indexing.
- Lower memory footprint and faster query times.
- Minimal impact of false positives on result ranking, using the Rank-Biased Overlap (RBO) metric.
2/8

27.05.2025 12:06 — 👍 2 🔁 0 💬 1 📌 0
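
For readers unfamiliar with the Rank-Biased Overlap metric mentioned above, here is a small Python sketch of its basic truncated form. It is an illustration only, not the evaluation code of the paper: the function name, the persistence value `p=0.9`, and the document identifiers are assumptions.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated Rank-Biased Overlap of two rankings (lists without duplicates).

    Sums (1 - p) * p**(d - 1) * |A[:d] & B[:d]| / d over depths d up to the
    length of the shorter list. Because the infinite sum is cut short, this
    underestimates the full RBO; no extrapolation is applied.
    """
    depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        agreement = len(seen_a & seen_b) / d  # overlap of the two top-d prefixes
        score += (1 - p) * p ** (d - 1) * agreement
    return score


if __name__ == "__main__":
    exact = ["doc3", "doc1", "doc7", "doc2"]    # hypothetical exact ranking
    approx = ["doc3", "doc1", "doc2", "doc7"]   # hypothetical ranking with false positives
    print(rbo_truncated(exact, exact), rbo_truncated(exact, approx))
```

Because the weights decay with depth, disagreements near the top of the ranking lower the score more than disagreements further down.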

📜 Excited to share insights from our recent paper: "Kaminari: a resource-frugal index for approximate colored k-mer queries". The study aims to efficiently identify documents containing a query string, focusing on DNA strings. www.biorxiv.org/content/10.1... 🧬 🖥️ 1/8

27.05.2025 12:06 — 👍 25 🔁 16 💬 1 📌 1

Thanks guys for your precious feedback. I modified the code accordingly.

25.03.2025 14:15 — 👍 1 🔁 0 💬 0 📌 0

GitHub - pierrepeterlongo/hyperloglog_kmer_counter

Hi @imartayan.bsky.social I wanted to run distinct-kmers, but I faced limitations as my input data contains non-ACGTacgt characters. Thus I created this github.com/pierrepeterl...
(again extremely simple)

25.03.2025 11:09 — 👍 2 🔁 0 💬 1 📌 0

That's correct.
I just created this: github.com/pierrepeterl... This is yet another HLL (HyperLogLog) k-mer counter, but a hyper simple one. And I did not find a way to accumulate the k-mer counts over several input datasets.

25.03.2025 11:08 — 👍 0 🔁 0 💬 1 📌 0
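
As a rough illustration of the technique discussed in the two posts above, here is a minimal HyperLogLog sketch in Python. It is not the code of the linked repository: the class, the `add_kmers` helper, the choice of BLAKE2b as hash, and the policy of skipping k-mers with non-ACGT characters are all assumptions. The `merge` method relies on a general property of HyperLogLog (sketches with identical parameters can be combined by taking the register-wise maximum), which is one possible way to cover several inputs; nothing here describes what the linked tool exposes.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch estimating a number of distinct items."""

    def __init__(self, precision: int = 12):
        self.p = precision
        self.m = 1 << precision              # number of registers
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")            # 64-bit hash value
        index = h >> (64 - self.p)                   # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit among the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def merge(self, other: "HyperLogLog") -> None:
        """Union with a sketch of the same precision (register-wise maximum)."""
        if other.p != self.p:
            raise ValueError("sketches must share the same precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)        # bias constant, valid for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:            # small-range correction
            return self.m * math.log(self.m / zeros)
        return raw


def add_kmers(sketch: HyperLogLog, seq: str, k: int) -> None:
    """Add every k-mer of `seq` made only of A, C, G, T (case-insensitive)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):                 # skip k-mers containing e.g. N
            sketch.add(kmer)
```

With 2^12 registers the expected relative error is on the order of a few percent, so this gives an estimate rather than an exact count.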

GitHub - pierrepeterlongo/distinct-kmers: How many distinct k-mers are there in a sequence?

@imartayan.bsky.social I needed a version of distinct_kmers for multiple fasta/fastq.
I created this fork github.com/pierrepeterl...
I'm almost ashamed that this code modification is public, but maybe it can be useful.

24.03.2025 17:50 — 👍 1 🔁 0 💬 1 📌 0

I added the notion of insertion order (mentioning your name). However, I don't get the point of the mergeability issue.

20.03.2025 10:50 — 👍 0 🔁 0 💬 1 📌 0

Note that the "conservative update" is also something we implemented (without describing it) in fimpera github.com/lrobidou/fim...

20.03.2025 07:59 — 👍 1 🔁 0 💬 1 📌 0

Thanks again for this pointer @benlangmead.bsky.social. What I described is the same idea, adapted to the case where items are added on the fly, without knowing their final abundance.
The "conservative update" technique is suited to the case where items are added together with their abundance.

20.03.2025 07:59 — 👍 1 🔁 0 💬 1 📌 0

Oh! Amazing results. That's the difference between you and a Rust beginner.
I'll try to understand your code.

18.03.2025 18:28 — 👍 1 🔁 0 💬 0 📌 0

Thanks Ben, I'll look at this.

18.03.2025 18:17 — 👍 2 🔁 0 💬 1 📌 0

Results: slightly longer insertion time, but 2 to 3 times lower abundance overestimations.

18.03.2025 16:31 — 👍 2 🔁 0 💬 1 📌 0

In short: when adding an element to a counting Bloom filter (cBF), increment only those of its counters that hold the minimal stored value.

18.03.2025 16:31 — 👍 1 🔁 0 💬 1 📌 0
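
The idea in the post above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code from the blog post: the class name, the `minimal_increase` flag, the filter size, and the use of salted BLAKE2b hashes are all hypothetical choices.

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter with an optional minimal-increase insertion."""

    def __init__(self, size=1024, num_hashes=3, minimal_increase=True):
        self.size = size
        self.num_hashes = num_hashes
        self.minimal_increase = minimal_increase
        self.counters = [0] * size

    def _positions(self, item):
        # One salted hash per hash function; a set avoids touching the
        # same counter twice when two hashes collide.
        positions = set()
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(item.encode(), digest_size=8,
                                     salt=seed.to_bytes(8, "little")).digest()
            positions.add(int.from_bytes(digest, "little") % self.size)
        return positions

    def add(self, item):
        positions = self._positions(item)
        if self.minimal_increase:
            # Increment only the counters currently holding the minimal value.
            lowest = min(self.counters[p] for p in positions)
            positions = [p for p in positions if self.counters[p] == lowest]
        for p in positions:
            self.counters[p] += 1

    def count(self, item):
        # The estimate is the minimum over the item's counters; it can
        # overestimate the true abundance but never underestimates it.
        return min(self.counters[p] for p in self._positions(item))
```

Each insertion still raises the item's minimum by exactly one, so the estimate stays an upper bound, while counters that were already inflated by collisions are left untouched.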

Maybe the simplest idea to decrease overestimations of a counting bloom filter. A trivial observation + 10 lines of code.
I'm surprised it has not been described before. Please comment if this is not the case.
Blog post here:
pierrepeterlongo.github.io/2025/03/17/m... 🧪🧬🖥️

18.03.2025 16:31 — 👍 7 🔁 2 💬 2 📌 1

Yes, ntCard helps a lot and its precision on reads is impressive. In my case, though, I wanted the exact number on a genome.

30.01.2025 18:40 — 👍 1 🔁 0 💬 0 📌 0

I wanted something that used as little memory as possible. I don't want to count kmers, but only know the number of unique kmers. So jellyfish, KMC, ... are too advanced for this simple task.

30.01.2025 17:08 — 👍 3 🔁 0 💬 0 📌 0

GitHub - pierrepeterlongo/unique_kmer_counter: Count number of unique kmers from fasta or fasta.gz files

Today I wanted to know the number of unique 27-mers in the hg38 human genome (spoiler: there are 2.49 billion). I found no tool for doing this, so I wrote this one: github.com/pierrepeterl...

It may help.
Please use it / improve it.

🧬💻 #bioinformatics

30.01.2025 16:37 — 👍 16 🔁 3 💬 3 📌 1
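
To make the task in the post above concrete, here is a tiny Python sketch of exact distinct k-mer counting. It is not the linked tool: the function name is hypothetical, k-mers are taken on the forward strand only (no canonical form), and k-mers containing characters other than A, C, G, T are skipped, all of which are assumptions. It keeps every k-mer in a Python set, so it only illustrates the idea on small inputs and does not address the low-memory goal at genome scale.

```python
def count_distinct_kmers(sequences, k):
    """Exact number of distinct k-mers over an iterable of sequences."""
    alphabet = set("ACGT")
    seen = set()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= alphabet:   # skip k-mers containing e.g. N
                seen.add(kmer)
    return len(seen)


if __name__ == "__main__":
    print(count_distinct_kmers(["ACGTACGT", "acgtNNNN"], k=4))
```

Unlike a k-mer counter, nothing is stored per k-mer beyond its presence, which is all this question requires.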

We are back in the Town Theatre for a great lecture on Alignment, by @rayanchikhi.bsky.social! 🧬💻 #evomics2025 #genomics #bioinformatics

08.01.2025 09:45 — 👍 22 🔁 7 💬 1 📌 0

bsky.app/profile/pier...
Applications for this position are still open. If you're passionate about large-scale science, we'd love to hear from you.
🧬 & 🖥️

08.01.2025 11:27 — 👍 2 🔁 2 💬 0 📌 0

🚨🚨🚨
We are hiring
🚨🚨🚨

After the creation of logan-search (see: bsky.app/profile/pier...) we are offering a 2-year engineer position to continue its development and optimization.

With @rayanchikhi.bsky.social and @tlemane.bsky.social

Details + applications: recrutement.inria.fr/public/class...

12.12.2024 14:30 — 👍 12 🔁 14 💬 1 📌 2

Pierre Peterlongo

Latest posts by pierrepeterlongo.bsky.social on Bluesky
