
Andre Kahles

@akkah21.bsky.social

30 Followers  |  23 Following  |  11 Posts  |  Joined: 18.12.2023  |  1.7298

Latest posts by akkah21.bsky.social on Bluesky

Hi bioinformatics, genomics and CS friends! Please help me spread the word. I'm hiring a postdoc! Come work on cutting-edge method development in algorithmic genomics with me and my group at @umdscience.bsky.social! 🖥️🧬

10.10.2025 13:02 | 👍 29  🔁 37  💬 0  📌 3

Thanks Rob! Much appreciated.

09.10.2025 15:06 | 👍 1  🔁 0  💬 0  📌 0
MetaGraph - Biological Sequence Search: Petabase-Scale Search for DNA, RNA & Amino acids

We invite you to try out MetaGraph at metagraph.ethz.ch, learn more about our framework in the paper (nature.com/articles/s41...), or start building indexes from your own data (github.com/ratschlab/me...).

08.10.2025 20:56 | 👍 4  🔁 0  💬 0  📌 0

We would like to thank the bioinformatics community for years of support and openness. A special thanks to the Logan effort, whose contig set we use as input for one of our largest indexes.

08.10.2025 20:56 | 👍 4  🔁 0  💬 1  📌 0

While MetaGraph provides a lossless representation of the input k-mer set, it is not a lossless compression of the raw reads. To reach petabase scale, we remove noisy k-mers prior to indexing, a step that we show has only minimal impact on search sensitivity.

08.10.2025 20:56 | 👍 3  🔁 0  💬 1  📌 0
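The thread does not say how noisy k-mers are identified. One common heuristic, assumed here purely for illustration, is abundance filtering: k-mers seen fewer than some threshold number of times are treated as likely sequencing errors and dropped. The function names and the `min_count` threshold below are hypothetical, not MetaGraph's actual cleaning procedure.

```python
from collections import Counter
from typing import Iterable, Set


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every length-k substring across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def filter_noisy_kmers(reads: Iterable[str], k: int, min_count: int = 2) -> Set[str]:
    """Keep only k-mers seen at least min_count times.

    Assumption: low-abundance k-mers are treated as noise; min_count is a
    hypothetical threshold, not a value taken from the MetaGraph paper.
    """
    counts = count_kmers(reads, k)
    return {kmer for kmer, c in counts.items() if c >= min_count}


# The second read ends in a likely error (A instead of T), so its last k-mer is rare.
reads = ["ACGTACGT", "ACGTACGA", "TTTTGGGG"]
print(sorted(filter_noisy_kmers(reads, k=4, min_count=2)))  # ['ACGT', 'CGTA', 'GTAC', 'TACG']
```

This also shows why the resulting index is lossless with respect to the filtered k-mer set but not the raw reads: the singleton k-mers (here `ACGA` and the k-mers of the third read) are gone and cannot be recovered.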

We show that MetaGraph indexes are both scalable and cost-efficient to query: searching 1 Mbp of sequence against the entire SRA costs less than $1 on standard cloud infrastructure, making petabase-scale biological data truly searchable and accessible.

08.10.2025 20:56 | 👍 5  🔁 1  💬 1  📌 0

Our indexes support fast exact matching as well as alignment with edits. Labels can represent sample metadata, coordinates or quantification values. We can store 10,000 human transcriptome samples in under 160 GB and return position-wise expression for any queried sequence.

08.10.2025 20:56 | 👍 5  🔁 1  💬 1  📌 0
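To make "position-wise expression" concrete, here is a rough sketch that assumes a plain dictionary mapping each k-mer to per-sample abundances and uses exact k-mer matching only. The names `annotation` and `query_expression` are hypothetical; this is not MetaGraph's API or its compressed annotation format, and it does not model alignment with edits.

```python
from typing import Dict, List

# Hypothetical toy annotation: k-mer -> {sample label: abundance}.
Annotation = Dict[str, Dict[str, int]]


def query_expression(query: str, k: int, annotation: Annotation, sample: str) -> List[int]:
    """For each k-mer start position in query, return the abundance recorded
    for sample via exact lookup (0 if the k-mer or the sample is absent)."""
    profile = []
    for i in range(len(query) - k + 1):
        labels = annotation.get(query[i:i + k], {})
        profile.append(labels.get(sample, 0))
    return profile


annotation: Annotation = {
    "ACG": {"sample_A": 5, "sample_B": 1},
    "CGT": {"sample_A": 4},
    "GTT": {"sample_B": 7},
}
print(query_expression("ACGTT", 3, annotation, "sample_A"))  # [5, 4, 0]
print(query_expression("ACGTT", 3, annotation, "sample_B"))  # [1, 0, 7]
```

A real index would store these counts in compressed form rather than in a hash map; the sketch only shows what a per-position profile for one sample looks like.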

We have already processed more than 10 Petabases of raw sequence data from the SRA and make the compressed indexes publicly available for search (metagraph.ethz.ch), download and cloud-based access.

08.10.2025 20:56 | 👍 4  🔁 1  💬 1  📌 0

At its core, MetaGraph represents all input sequences as labeled, succinct de Bruijn graphs: a highly compressed yet fully searchable structure. Each k-mer carries metadata labels that remain interactively queryable through a flexible API.

08.10.2025 20:56 | 👍 4  🔁 0  💬 1  📌 0
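The idea of a labeled de Bruijn graph can be sketched in a few lines. In this toy version, nodes are k-mers, an edge u -> v exists whenever u[1:] == v[:-1], and each k-mer stores a set of labels. It is a plain hash-based stand-in, assumed for illustration; it is not a succinct representation and is not MetaGraph's implementation. The class `LabeledDBG` and its methods are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

ALPHABET = "ACGT"


class LabeledDBG:
    """Toy node-centric de Bruijn graph with per-k-mer label sets.

    Hash-based for readability; a succinct encoding would store the same
    k-mer set and labels in far less space.
    """

    def __init__(self, k: int) -> None:
        self.k = k
        self.labels: Dict[str, Set[str]] = defaultdict(set)

    def add_sequence(self, seq: str, label: str) -> None:
        # Every k-mer of the sequence becomes a node tagged with the label.
        for i in range(len(seq) - self.k + 1):
            self.labels[seq[i:i + self.k]].add(label)

    def successors(self, kmer: str) -> List[str]:
        # Edges are implicit: try each one-character extension of the suffix.
        return [kmer[1:] + c for c in ALPHABET if kmer[1:] + c in self.labels]

    def labels_of(self, kmer: str) -> Set[str]:
        # .get() avoids inserting unknown k-mers into the defaultdict.
        return set(self.labels.get(kmer, set()))


g = LabeledDBG(k=3)
g.add_sequence("ACGTA", "sample_1")
g.add_sequence("CGTTT", "sample_2")
print(g.successors("CGT"))             # ['GTA', 'GTT']
print(sorted(g.labels_of("CGT")))      # ['sample_1', 'sample_2']
```

Because edges are implied by (k-1)-overlaps, only the k-mer set and its labels need to be stored, which is what makes compact encodings of such graphs possible.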

Modern biology produces vast amounts of raw sequencing data: genomes, transcriptomes, and protein sequences. MetaGraph provides a unified computational framework to index, query, and reason across this landscape of biological information.

08.10.2025 20:56 | 👍 5  🔁 1  💬 1  📌 0

The following thread describes the main ideas and results of this joint work with @gxxxr.bsky.social @karasikov.bsky.social @adamant-pwn.bsky.social @HarunMustafa416

08.10.2025 20:56 | 👍 3  🔁 0  💬 1  📌 0
Efficient and accurate search in petabase-scale sequence repositories (Nature): MetaGraph enables scalable indexing of large sets of DNA, RNA or protein sequences using annotated de Bruijn graphs.

After years of research and continuous refinement, we’re thrilled to share that our paper on the MetaGraph framework, which enables petabase-scale search across sequencing data, has been published today in Nature (www.nature.com/articles/s41...)

08.10.2025 20:56 | 👍 28  🔁 17  💬 3  📌 2
