Paul Medvedev

@pashadag.bsky.social

Algorithmic Bioinformatics Researcher and Teacher. Posts about research results and educational/mentorship topics (for details, see http://bit.ly/380vX22).

1,831 Followers  |  158 Following  |  80 Posts  |  Joined: 07.09.2023  |  2.2466

Latest posts by pashadag.bsky.social on Bluesky

Efficient and accurate search in petabase-scale sequence repositories - Nature MetaGraph enables scalable indexing of large sets of DNA, RNA or protein sequences using annotated de Bruijn graphs.

After years of research and continuous refinement, we're thrilled to share that our paper on the MetaGraph framework, enabling Petabase-scale search across sequencing data, has been published today in Nature (www.nature.com/articles/s41...)

08.10.2025 20:56 - 👍 23    🔁 15    💬 2    📌 2

And it's posted! If you're interested and eligible, please consider applying through the UMD portal: umd.wd1.myworkdayjobs.com/en-US/UMCP/j....

If you're a PI working in algorithmic genomics (& you can recommend my lab to your top graduating students ;P), please let them know!

08.10.2025 16:53 - 👍 17    🔁 16    💬 0    📌 3
Burrows-Wheeler Indexing - YouTube Videos on: (a) the Burrows-Wheeler Transform (BWT), (b) the FM Index, which uses the BWT to construct a full-text index, (c) Wheeler graphs, (d) r-index, an...

I've added 7 videos to my Burrows-Wheeler indexing playlist (www.youtube.com/playlist?lis...), rounding out the r-index series and adding a 5-part series on the move structure. Now 27 videos in that playlist. I aim to add videos on prefix-free parsing, PBWT, Wheeler languages/automata in the future.

07.10.2025 14:17 - 👍 55    🔁 16    💬 2    📌 1

Sounds like someone is trying to solve a bidirected flow problem..

07.10.2025 03:15 - 👍 2    🔁 0    💬 1    📌 0

i've let the person in charge know

03.10.2025 19:11 - 👍 2    🔁 0    💬 0    📌 0

There seems to be a self-contradiction within the CFP, since it also says: "Submissions to peer-reviewed journals other than the partnering ones are also allowed.."

03.10.2025 17:53 - 👍 3    🔁 0    💬 2    📌 0
Alice: fast and haplotype-aware assembly of high-fidelity reads based on MSR sketching We introduce Mapping-friendly Sequence Reduction (MSR) sketches, a sketching method for high-fidelity (HiFi) long reads, and Alice, an assembler that operates directly on these sketches. MSR produces ...

Our preprint on our new metagenomic HiFi assembler Alice is out 🥳 Based on a *new sketching method* (🧵1/6)
👉 Preprint www.biorxiv.org/content/10.1...
👉 Github github.com/rolandfaure/...

03.10.2025 14:51 - 👍 20    🔁 13    💬 2    📌 0

Alice: fast and haplotype-aware assembly of high-fidelity reads based on MSR sketching https://www.biorxiv.org/content/10.1101/2025.09.29.679204v1

01.10.2025 01:47 - 👍 7    🔁 6    💬 0    📌 0

It could be. Or it could be that the decision process is not consistent? Hard to tell...

28.09.2025 22:13 - 👍 2    🔁 0    💬 1    📌 0

I see. Do you know if the list of papers that are posted there get disseminated somehow through mail lists or social media?

26.09.2025 17:48 - 👍 1    🔁 0    💬 1    📌 0

They do, but they did not accept our paper. From what we understood, it was because it was a review paper and not novel research

26.09.2025 17:46 - 👍 0    🔁 0    💬 0    📌 0

#RECOMB2026 will be in Thessaloniki, Greece on May 26-29, 2026. Satellites on May 24-25. Save the date!

The #RECOMB2026 conference will take place in Thessaloniki on May 26-29, 2026. The satellite events will be held on May 24-25, 2026. Save the date!

26.09.2025 15:03 - 👍 20    🔁 13    💬 0    📌 1

Hi Gaurav, I'm not sure what you mean. (But it sounds like you are asking for a library with all these implemented in one place? That would be quite an undertaking! As these things are always evolving, I'd guess it would also not age well.)

25.09.2025 21:25 - 👍 1    🔁 0    💬 0    📌 0

I guess that if all one wants is to just have a doi for the pdf, there are various options (zenodo, HAL). But if one is looking to have the title "advertised" broadly (as happens with a biorxiv or arxiv preprint), then that's the hard part

25.09.2025 21:19 - 👍 2    🔁 0    💬 1    📌 0
Conway–Bromage–Lyndon (CBL): an exact, dynamic representation of k-mer sets. Abstract. Summary: In this article, we introduce the Conway–Bromage–Lyndon (CBL) structure, a compressed, dynamic and exact method for representing k-mer sets. Originating from Conway and Bromage's concept, CBL innovatively employs the smallest cyclic rotations of k-mers, akin to Lyndon words, to leverage lexicographic redundancies. In order to support dynamic operations and set operations, we propose a dynamic bit vector structure that draws a parallel with Elias-Fano's scheme. This structure is encapsulated in a Rust library, demonstrating a balanced blend of construction efficiency, cache locality, and compression. Our findings suggest that CBL outperforms existing dynamic k-mer set methods. Unique to this work, CBL stands out as the only known exact k-mer structure offering in-place set operations. Its different combined abilities position it as a flexible Swiss knife structure for k-mer set management. Availability and implementation: https://github.com/imartayan/CBL.

This might be an option, though I'm very confused by how it works. For example, I see a recent paper there:

hal.science/hal-04764426v1

but when I open it, its headers indicate it is a biorxiv preprint. Why is it duplicating it?

In another example,

hal.science/pasteur-0327...

I can't find a pdf

25.09.2025 21:13 - 👍 0    🔁 0    💬 1    📌 0

I've seen it used for storing datasets but I haven't seen it used for pre-prints. If you have any examples, let me know!

25.09.2025 21:10 - 👍 0    🔁 0    💬 1    📌 0

I hadn't heard of it before, but looking at their webpage: "Effective 8/25/2025, we will be suspending submissions to this generalist server hosted by OSF Preprints."

25.09.2025 21:07 - 👍 1    🔁 0    💬 0    📌 0

I can appreciate the perspective of bioRxiv about not taking reviews (arXiv is not transparent about their policy). But at the end of the day, the community needs some way to disseminate pre-print reviews other than just putting them in a shared dropbox folder :(

25.09.2025 13:25 - 👍 1    🔁 0    💬 0    📌 0

If you're wondering why we're hosting the pre-print via dropbox, it's because arXiv (and bioRxiv) did not accept it (because it is a review). It's a bit disconcerting, because a review is precisely the type of paper that would benefit a lot from pre-publication dissemination and feedback.

25.09.2025 13:25 - 👍 12    🔁 3    💬 9    📌 0

Thank you folks for your feedback on our survey about Hash functions in genomic sequence analysis. We've updated the paper and you can see the new version here: tinyurl.com/4kk9ccmt.

25.09.2025 13:21 - 👍 8    🔁 5    💬 0    📌 1

Are you referring to the randomness of the *location*? If yes, you could plot the distribution of distances between adjacent errors and overlay it with what would be expected under a Poisson model

23.09.2025 13:15 - 👍 1    🔁 0    💬 1    📌 0
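
The suggestion above can be sketched in a few lines. This is a minimal illustration, not anyone's published method: it assumes errors are given as integer positions along a sequence, and it uses the geometric distribution as the discrete counterpart of the exponential gap lengths implied by a Poisson model. The function names, the simulated data, and the 1% error rate are all hypothetical.

```python
import random
from collections import Counter

def gap_histogram(positions):
    """Count distances between adjacent error positions."""
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return Counter(gaps), len(gaps)

def expected_gap_counts(rate, n_gaps, max_gap):
    """Expected number of gaps of length 1..max_gap if every position is an
    error independently with probability `rate` (geometric gap lengths)."""
    return {d: n_gaps * rate * (1.0 - rate) ** (d - 1)
            for d in range(1, max_gap + 1)}

# Hypothetical data: simulated errors at a 1% per-base rate on 200 kb.
random.seed(0)
length, true_rate = 200_000, 0.01
errors = [i for i in range(length) if random.random() < true_rate]

observed, n_gaps = gap_histogram(errors)
est_rate = len(errors) / length  # rate estimated from the data itself
expected = expected_gap_counts(est_rate, n_gaps, max_gap=10)

# Side-by-side table; in practice one would plot the two series instead.
for d in range(1, 11):
    print(f"gap={d:2d}  observed={observed.get(d, 0):4d}  expected={expected[d]:7.1f}")
```

If the error locations are clustered rather than random, the observed counts at short gaps would be expected to exceed the model's counts.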
Figure 1. (A) ANI quantifies the similarity between two genomes. ANI can be defined as the number of aligned positions where the two aligned bases are identical, divided by the total number of aligned bases. Historically, ANI was calculated using a single gene family for multiple sequence alignment. Another approach finds orthologous genes between two genomes and reports the average similarity between their CDSs. This method was later extended to whole-genome alignment by identifying local alignments and excluding supplementary alignments with lower similarity. (B) Different ANI tools employ various approaches in calculating ANI values. ANIm, OrthoANI, and FastANI use aligners to identify homologous regions, whereas Mash uses k-mer hashing to estimate similarities. Only alignments with higher similarity represented by green arrows are included in ANI calculations, while red arrows, corresponding to paralogs, are excluded. (C) The proposed benchmarking method evaluates the performance of different tools using both real and simulated data. It assumes that more distantly related species on the phylogenetic tree should have lower ANI similarities. This is measured by calculating the statistics of Spearman rank correlation. We expect a negative correlation between ANI and the tree distance (scatter plot on the right).
https://academic.oup.com/bib/article/doi/10.1093/bib/bbaf267/8160681


Excited to share our EvANI benchmarking workflow, published in Briefings in Bioinformatics doi.org/10.1093/bib/...
Computing average nucleotide identity (ANI) is neither conceptually nor computationally trivial. Its definition has evolved over years, with different meanings and assumptions (1/5)

21.09.2025 15:26 - 👍 28    🔁 12    💬 1    📌 0
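
To make the figure caption's definition concrete, here is a rough sketch of the two quantities it mentions: ANI as identical aligned bases over total aligned bases, and the Spearman rank correlation between ANI and tree distance. It is not the EvANI workflow. The function names are hypothetical, the toy alignment and distances are made up, and excluding gap columns from the denominator is an assumption, since tools differ on this point.

```python
import math

def ani(aligned_a, aligned_b):
    """Fraction of aligned (non-gap) columns whose two bases are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no aligned bases")
    return sum(x == y for x, y in pairs) / len(pairs)

def ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Toy alignment: 4 identical bases out of 5 gap-free columns.
print(ani("ACGT-A", "ACCTTA"))

# Made-up tree distances and ANI values for four genome pairs.
tree_distance = [0.1, 0.2, 0.5, 0.9]
ani_values = [0.99, 0.95, 0.85, 0.80]
print(spearman(tree_distance, ani_values))
```

In this toy case ANI falls strictly as tree distance grows, so the correlation is strongly negative, which is the direction the caption says a well-behaved ANI tool should show.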

Preprint out for myloasm, our new nanopore / HiFi metagenome assembler!

Nanopore's getting accurate, but

1. Can this lead to better metagenome assemblies?
2. How, algorithmically, to leverage them?

with co-author Max Marin @mgmarin.bsky.social, supervised by Heng Li @lh3lh3.bsky.social

1 / N

07.09.2025 23:34 - 👍 110    🔁 76    💬 5    📌 5

congratulations to you both

07.09.2025 12:18 - 👍 0    🔁 0    💬 1    📌 0

🌎👩‍🔬 For 15+ years biology has accumulated petabytes (million gigabytes) of 🧬 DNA sequencing data 🧬 from the far reaches of our planet. 🦠🍄🌵

Logan now democratizes efficient access to the world's most comprehensive genetics dataset. Free and open.

doi.org/10.1101/2024...

03.09.2025 08:39 - 👍 215    🔁 118    💬 3    📌 16
Complex genetic variation in nearly complete human genomes - Nature Using sequencing and haplotype-resolved assembly of 65 diverse human genomes, complex regions including the major histocompatibility complex and centromeres are analysed.

Two papers in today's issue of @nature.com: 1) we assemble 65 genomes to near completion, including centromeres and the MHC. tinyurl.com/3huhax6w. 2) we sequence 1,019 genomes from the 1kGP with long reads, revealing SVs down to low allele frequencies tinyurl.com/wbx3we9x.

23.07.2025 15:12 - 👍 55    🔁 24    💬 1    📌 2
FAMSA2 enables accurate multiple sequence alignment at protein-universe scale We introduce FAMSA2, an algorithm that produces high-accuracy multiple protein sequence alignments with unprecedented speed. Across structural, phylogenetic, and functional benchmarks, FAMSA2 matches ...

Interested in a tool that aligns millions of proteins in minutes with quality similar to or better than the state-of-the-art utilities? Please take a look at our FAMSA2 paper: www.biorxiv.org/content/10.1...
and GH repo: github.com/refresh-bio/...

19.07.2025 21:28 - 👍 49    🔁 28    💬 3    📌 0

Sassy: Searching Short DNA Strings in the 2020s https://www.biorxiv.org/content/10.1101/2025.07.22.666207v1

26.07.2025 18:46 - 👍 7    🔁 3    💬 0    📌 0

Congratulations to Rayan Chiki (Institut Pasteur), head of the "Sequence Bioinformatics" unit, for securing the ERC Proof of Concept 2025 for his project ENZYMINER! 👏

@rayan.chiki.bsky.social

#Bioinformatics

24.07.2025 15:10 - 👍 60    🔁 13    💬 4    📌 2

After Tim Hunt won the Nobel, he said, "We do science because we like discovering things about the world...and then boasting about what we found".

Any one individual can argue about their own motivation, but it would be naive to dispute that's an accurate description of many people.

11.07.2025 13:44 - 👍 15    🔁 4    💬 2    📌 0
