Good Friday Evening news: we updated back_to_sequences
(find the origin of kmers)
- Faster
- Can consider multiline fasta files
- Much easier installation: see github.com/pierrepeterl...
@pierrepeterlongo.bsky.social
Inria Senior researcher. Head of the GenScale team (https://team.inria.fr/genscale/) at Inria and Irisa. Algorithmics for sequencing data analyses, genomics and metagenomics.
The Metagraph paper is out in Nature; it showed up in my feeds today! Congratulations to Mikhail Karasikov, @gxxxr.bsky.social, @akkah21.bsky.social and all of the other authors (whom I'd love to follow on Bluesky if I can find you ;P) www.nature.com/articles/s41...
09.10.2025 14:40
Preprint out for myloasm, our new nanopore / HiFi metagenome assembler!
Nanopore's getting accurate, but
1. Can this lead to better metagenome assemblies?
2. How, algorithmically, to leverage them?
with co-author Max Marin @mgmarin.bsky.social, supervised by Heng Li @lh3lh3.bsky.social
1 / N
I clearly consider this result as THE most important result achieved over this last decade for exploiting and democratizing genomic data.
I think there will be a "before" and an "after" logan and logan-search
github.com/IndexThePlan...
logan-search.org
Have a look at this thread
Amazing collaboration with @jermp.bsky.social, @yhhshb.bsky.social, @robp.bsky.social, Victor Levallois, and Bertrand Le Gal, and the help of @yoann.bsky.social. 8/8
27.05.2025 12:06
On metagenomic data, other tools such as kmindex are good alternatives. At the same time, Kaminari consistently ranks as one of the fastest tools across all data types, generating the smallest indexes (or the lowest FPR). 7/8
27.05.2025 12:06
💾 For fixed false positive rates, it uses up to 37x less space than COBS while being an order of magnitude faster to build and query. 6/8
27.05.2025 12:06
Experimental results show Kaminari's superiority in index size and query performance across various genomic datasets. 5/8
27.05.2025 12:06
🧬 Kaminari's design leverages properties of k-mer minimizers for compact space and fast query time, as inspired by the techniques proposed in Fulgor. 4/8
27.05.2025 12:06
💻 We implemented Kaminari in C++17, available under the MIT license at github.com/yhhshb/kaminari. Additional results and reproducibility info at github.com/vicLeva/benchmarks_kaminari. 3/8
27.05.2025 12:06
Key findings include:
- Use of minimizers and integer compression for indexing.
- Lower memory footprint and faster query times.
- Minimal impact of false positives on result ranking, using the Rank-Biased Overlap (RBO) metric.
2/8
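As a rough illustration of the minimizer idea mentioned in this thread: the sketch below assumes a minimizer is the lexicographically smallest m-mer of a k-mer, which is one common definition. The function names `minimizer` and `group_by_minimizer` are hypothetical, and this is not Kaminari's code; its actual minimizer ordering and index layout are not described in these posts.

```python
def minimizer(kmer, m):
    """Return the lexicographically smallest m-mer of `kmer` (assumed ordering)."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def group_by_minimizer(sequence, k, m):
    """Group the k-mers of `sequence` by their minimizer.

    Consecutive k-mers often share a minimizer, so an index can store
    information per minimizer instead of per k-mer.
    """
    groups = {}
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        groups.setdefault(minimizer(kmer, m), []).append(kmer)
    return groups


if __name__ == "__main__":
    for mini, kmers in group_by_minimizer("GATTACA", k=5, m=3).items():
        print(mini, kmers)
```

Indexing by minimizer rather than by k-mer reduces the number of keys, at the cost of possible false positives when different k-mers share a minimizer.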
Excited to share insights from our recent paper: "Kaminari: a resource-frugal index for approximate colored k-mer queries". The study aims to efficiently identify documents containing a query string, focusing on DNA strings. www.biorxiv.org/content/10.1... 🧬 🖥️ 1/8
27.05.2025 12:06
Thanks guys for your precious feedback. I modified the code accordingly.
25.03.2025 14:15
Hi @imartayan.bsky.social I wanted to run distinct-kmers, but I faced limitations as my input data contains non-ACGTacgt characters. Thus I created this github.com/pierrepeterl...
(again extremely simple)
That's correct.
I just created this github.com/pierrepeterl... This is yet another HLL k-mer counter, but a very simple one. And I did not find a way to accumulate the k-mer counts for several input datasets.
@imartayan.bsky.social I needed a version of distinct_kmers for multiple fasta/fastq.
I created this fork github.com/pierrepeterl...
I'm almost ashamed that this code modification is public, but maybe it can be useful.
I added the notion of insertion order (mentioning your name). However, I don't get the point of the mergeability issue.
20.03.2025 10:50
Note that the "conservative update" is also something we implemented (without describing it) in fimpera github.com/lrobidou/fim...
20.03.2025 07:59
Thanks again for this pointer @benlangmead.bsky.social. What I described is the same idea, adapted to the case where items are added on the fly, without their final abundance.
The "conservative update" technique is suited to the case where items are added together with their abundance.
HO! Amazing results. The difference between you and a Rust beginner.
I'll try to understand your code.
Thanks Ben - I'll look at this.
18.03.2025 18:17
Results: slightly longer insertion time, but 2 to 3 times lower abundance overestimations.
18.03.2025 16:31
In short: when adding an element to a cBF, increase only the counters that hold the minimal stored value.
18.03.2025 16:31
Maybe the simplest idea to decrease overestimations of a counting Bloom filter. A trivial observation + 10 lines of code.
I'm surprised it has not been described before. Please comment if this is not the case.
Blog post here:
pierrepeterlongo.github.io/2025/03/17/m... 🧪🧬🖥️
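A minimal sketch of the idea discussed in this thread, under stated assumptions: the class name, the method names and the blake2b-based hashing are all hypothetical choices for illustration, not the code from the blog post or from fimpera. `add` is the usual counting Bloom filter insertion; `add_min_only` increments only the counters that currently hold the minimal value among the item's positions.

```python
import hashlib


class CountingBloomFilter:
    """Toy counting Bloom filter (illustrative only)."""

    def __init__(self, size, num_hashes):
        self.size = size
        self.num_hashes = num_hashes
        self.counts = [0] * size

    def _positions(self, item):
        # Assumed hashing scheme: one salted blake2b digest per hash function.
        return {
            int.from_bytes(
                hashlib.blake2b(item.encode(), digest_size=8, salt=bytes([i])).digest(),
                "big",
            ) % self.size
            for i in range(self.num_hashes)
        }

    def add(self, item):
        """Standard insertion: increment every counter of the item."""
        for p in self._positions(item):
            self.counts[p] += 1

    def add_min_only(self, item):
        """Increment only the counters holding the item's minimal value."""
        positions = self._positions(item)
        lowest = min(self.counts[p] for p in positions)
        for p in positions:
            if self.counts[p] == lowest:
                self.counts[p] += 1

    def estimate(self, item):
        """Estimated abundance: the minimum over the item's counters."""
        return min(self.counts[p] for p in self._positions(item))


if __name__ == "__main__":
    standard, min_only = CountingBloomFilter(32, 3), CountingBloomFilter(32, 3)
    for kmer in ["ACGT", "ACGT", "CGTA", "GTAC", "ACGT", "TACG"]:
        standard.add(kmer)
        min_only.add_min_only(kmer)
    print(standard.estimate("ACGT"), min_only.estimate("ACGT"))
```

With both insertion methods the estimate never falls below the true count, since each insertion raises the item's minimum by exactly one. The min-only variant touches fewer counters, so its estimates can only be equal to or lower than the standard ones.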
Yes, ntCard helps a lot and its precision is impressive on reads. But I wanted the exact number on a genome.
30.01.2025 18:40
I wanted something that used as little memory as possible. I don't want to count kmers, but only know the number of unique kmers. So jellyfish, KMC, ... are too advanced for this simple task.
30.01.2025 17:08
Today I wanted to know the number of unique 27-mers in the hg38 human genome (spoiler: there are 2.49 billion). I found no tool for doing this. So I wrote this: github.com/pierrepeterl...
It may help.
Please use it / improve it.
🧬💻 #bioinformatics
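To make the task concrete, here is a small sketch of exact distinct k-mer counting. It is not the tool linked above: the function name `distinct_kmers` is hypothetical, k-mers are taken from the forward strand only (whether the tool uses canonical k-mers is not stated), and any k-mer containing a non-ACGT character is skipped. A Python set needs far too much memory for hg38, so this only illustrates the definition on small inputs.

```python
def distinct_kmers(sequence, k):
    """Return the set of distinct k-mers made only of A, C, G, T (case-insensitive)."""
    seen = set()
    seq = sequence.upper()
    valid_run = 0  # length of the current run of valid nucleotides
    for i, base in enumerate(seq):
        valid_run = valid_run + 1 if base in "ACGT" else 0
        if valid_run >= k:
            # The last k characters are all valid: record this k-mer.
            seen.add(seq[i - k + 1:i + 1])
    return seen


if __name__ == "__main__":
    # The N breaks the sequence, so no k-mer spans it.
    print(len(distinct_kmers("ACGTNACGT", 3)))
```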
We are back in the Town Theatre for a great lecture on Alignment, by @rayanchikhi.bsky.social! 🧬💻 #evomics2025 #genomics #bioinformatics
08.01.2025 09:45
bsky.app/profile/pier...
Applications for this position are still open. If you're passionate about large-scale science, we'd love to hear from you.
🧬 & 🖥️
🚨🚨🚨
We are hiring
🚨🚨🚨
After the creation of logan-search (see: bsky.app/profile/pier...) we are offering a 2-year engineer position to continue its development and optimization.
With @rayanchikhi.bsky.social and @tlemane.bsky.social
Details + applications: recrutement.inria.fr/public/class...