Martin Steinegger 🇺🇦

@martinsteinegger.bsky.social

Developing data-intensive computational methods • PI @ Seoul National University 🇰🇷 • #FirstGen • he/him • Hauptschüler

4,147 Followers  |  520 Following  |  189 Posts  |  Joined: 30.06.2023

Latest posts by martinsteinegger.bsky.social on Bluesky

Very much looking forward to it!

30.09.2025 02:20 - 👍 1    🔁 0    💬 0    📌 0

#RECOMB2026 will be in Thessaloniki, Greece on May 26-29, 2026. Satellites on May 24-25. Save the date!

The #RECOMB2026 conference will take place in Thessaloniki on May 26-29, 2026. The satellite events will be held on May 24-25, 2026. Save the date!

26.09.2025 15:03 - 👍 20    🔁 13    💬 0    📌 1

Thank you Caroline! :)

24.09.2025 09:11 - 👍 0    🔁 0    💬 0    📌 0

We are excited to share GPN-Star, a cost-effective, biologically grounded genomic language modeling framework that achieves state-of-the-art performance across a wide range of variant effect prediction tasks relevant to human genetics.
www.biorxiv.org/content/10.1...
(1/n)

22.09.2025 05:29 - 👍 165    🔁 87    💬 4    📌 5

Exon finding seems very well suited for GPU acceleration. Worth revisiting exonerate. :)
DP remains very powerful and aligns well with AI approaches, whether via scoring schemes or tokenized data (e.g. Foldseek).

21.09.2025 14:34 - 👍 1    🔁 0    💬 0    📌 0

Thank you! Yes, it uses DP to compute the maximal ungapped score, followed by a GPU-based Gotoh–Smith–Waterman, so no k-mer index is required. The drawback is that you can't trade sensitivity for speed, but full DP searches against UniProt in milliseconds open up many exciting applications.

21.09.2025 14:05 - 👍 5    🔁 2    💬 1    📌 0
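
The two stages described in the post above (an ungapped DP prefilter followed by Gotoh-style affine-gap Smith-Waterman) can be sketched on the CPU in plain Python. This is only an illustration under stated assumptions, not the MMseqs2-GPU implementation: the function names, the toy match/mismatch scoring, the gap penalties, and the `min_ungapped` threshold are all hypothetical, and real tools use substitution matrices and GPU kernels.

```python
def simple_score(a, b, match=2, mismatch=-1):
    # Toy scoring (assumption); real searches use a substitution matrix.
    return match if a == b else mismatch


def max_ungapped_score(query, target, score):
    """Best-scoring ungapped local segment over all diagonals."""
    best = 0
    n, m = len(query), len(target)
    # Diagonal d pairs query[i] with target[i + d].
    for d in range(-(n - 1), m):
        running = 0
        for i in range(max(0, -d), min(n, m - d)):
            # Reset to 0 whenever the running segment score goes negative.
            running = max(0, running + score(query[i], target[i + d]))
            best = max(best, running)
    return best


def gotoh_local_score(query, target, score, gap_open=3, gap_extend=1):
    """Local alignment score with affine gaps (Gotoh recurrences).

    The first gapped position costs gap_open, each further one gap_extend.
    """
    neg_inf = float("-inf")
    m = len(target)
    h_prev = [0] * (m + 1)      # H values of the previous row
    f = [neg_inf] * (m + 1)     # best score ending in a vertical gap, per column
    best = 0
    for i in range(1, len(query) + 1):
        h_cur = [0] * (m + 1)
        e = neg_inf             # best score ending in a horizontal gap
        for j in range(1, m + 1):
            e = max(e - gap_extend, h_cur[j - 1] - gap_open)
            f[j] = max(f[j] - gap_extend, h_prev[j] - gap_open)
            diag = h_prev[j - 1] + score(query[i - 1], target[j - 1])
            h_cur[j] = max(0, diag, e, f[j])
            best = max(best, h_cur[j])
        h_prev = h_cur
    return best


def two_stage_search(query, database, min_ungapped=6):
    """Keep targets passing the ungapped filter, then rescore with Gotoh."""
    hits = []
    for name, target in database.items():
        if max_ungapped_score(query, target, simple_score) < min_ungapped:
            continue
        hits.append((name, gotoh_local_score(query, target, simple_score)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Because both stages are dynamic programming over the full sequences, no k-mer index is built; the cost is that every target is scored, so there is no seed-sensitivity setting to trade for speed.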

Technically yes, but UniProt is highly redundant, so searches against an unclustered database could produce extremely long lists, potentially overwhelming the interface. What's your use-case?

21.09.2025 09:18 - 👍 3    🔁 0    💬 1    📌 0

One of the shared first authors just joined Bsky. Welcome Alex @achancond.bsky.social

21.09.2025 08:24 - 👍 6    🔁 0    💬 0    📌 0

This work was only possible through the great work of Felix Kallenborn, Alejandro Chacon, Christian Hundt, Hassan Sirelkhatim, @kdidi.bsky.social, @sooyoung-cha.bsky.social, @machine.learning.bio, @milot.bsky.social, Bertil Schmidt n/n

21.09.2025 08:06 - 👍 4    🔁 0    💬 1    📌 0

We are currently integrating Grace and Blackwell optimizations and further speeding up the algorithms in MMseqs2-GPU and structure prediction. Below is a sneak peek of our current progress. 5/n
📄 research.nvidia.com/labs/dbr/ass...

21.09.2025 08:06 - 👍 4    🔁 0    💬 1    📌 0

My first email to Johannes Söding, my later PhD advisor, proposed a GPU-accelerated HHblits. But GPUs in 2012 had many limitations. Now they are widely deployed and massive number crunchers. I am happy that together with @unimainz.bsky.social and NVIDIA we were finally able to build MMseqs2-GPU. 4/n

21.09.2025 08:06 - 👍 5    🔁 0    💬 1    📌 0

Homology retrieval grounds ML systems to produce reliable predictions. MMseqs2 is already used in Boltz1/2, BioEmu, MSA-Pairformer, Chai-1, BioNeMo, Protenix, etc. MMseqs2-GPU can enable these and next-gen models to integrate fast homology retrieval for end-to-end GPU inference. 3/n

21.09.2025 08:06 - 👍 4    🔁 0    💬 1    📌 0

Below we show GPU-accelerated Foldseek, searching 128 structures against AFDB50 (54 million structures). On 128 CPU cores this takes ~120 seconds, whereas a single GPU completes it in ~25 seconds. 2/n

21.09.2025 08:06 - 👍 4    🔁 0    💬 1    📌 0
GPU-accelerated homology search with MMseqs2 - Nature Methods Graphics processing unit-accelerated MMseqs2 offers tremendous speedups for homology retrieval from metagenomic databases, query-centered multiple sequence alignment generation for structure predictio...

MMseqs2-GPU sets new standards in single query search speed, allows near instant search of big databases, scales to multiple GPUs and is fast beyond VRAM. It enables ColabFold MSA generation in seconds and sub-second Foldseek search against AFDB50. 1/n
📄 www.nature.com/articles/s41...
💿 mmseqs.com

21.09.2025 08:06 - 👍 173    🔁 64    💬 4    📌 2

GPU-accelerated MMseqs2 offers tremendous speedup for homology retrieval, protein structure prediction with ColabFold, and protein structure search with Foldseek. @martinsteinegger.bsky.social @milot.bsky.social @machine.learning.bio

www.nature.com/articles/s41...

18.09.2025 20:09 - 👍 81    🔁 21    💬 0    📌 0

EcoFoldDB: Protein Structure-Guided Functional Profiling of Ecologically Relevant Microbial Traits at the Metagenome Scale enviromicro-journals.onlinelibrary.wiley.com/doi/10.1111/...

17.09.2025 13:22 - 👍 9    🔁 3    💬 1    📌 0
Screen binders using ipSAE
YouTube video by ProteinDesignStudio

pip install ipsae
from www.linkedin.com/in/ullah-sam...

www.youtube.com/watch?v=A5ph...
PyPI pypi.org/project/ipsae/
His github fork github.com/ullahsamee/I...
My github github.com/DunbrackLab/...
Paper www.biorxiv.org/content/10.1...
For designed protein binders www.biorxiv.org/content/10.1...

16.09.2025 19:21 - 👍 19    🔁 6    💬 1    📌 1

Preprint:
Highly efficient protein structure prediction on NVIDIA RTX Blackwell and Grace-Hopper
nvda.ws/4n4xzz9

Visit the NVIDIA Digital Biology Labs website to find more information like this:
t.co/R9ufEZrGEA

16.09.2025 13:59 - 👍 14    🔁 4    💬 0    📌 0
De novo discovery of conserved gene clusters in microbial genomes with Spacedust - Nature Methods This work presents Spacedust, a tool for de novo identification of conserved gene clusters from metagenomic data.

Spacedust: a tool for de novo identification of conserved gene clusters from metagenomic data.
@ruoshiz.bsky.social @milot.bsky.social

www.nature.com/articles/s41...

16.09.2025 13:25 - 👍 44    🔁 23    💬 0    📌 1

hey bluesky 👋 visa hurdles mean I'm looking for opportunities outside the US. I'm a computational biologist (bacterial + phage genomics, postdoc in Koonin's group @ NIH). I am interested in teaming up on funding apps. reach out if this resonates!

15.09.2025 17:26 - 👍 70    🔁 91    💬 1    📌 3

Was about to write the same.

06.09.2025 16:11 - 👍 1    🔁 0    💬 0    📌 0

Funny, I had the same question on my mind today.

03.09.2025 17:17 - 👍 2    🔁 0    💬 0    📌 0

@pedrobeltrao.bsky.social here you can see the clustering at 90% identity.

03.09.2025 09:53 - 👍 5    🔁 0    💬 0    📌 0

Also, this dataset only contains complete ORFs (containing both a start and a stop codon). Metagenomic samples, especially soil, are often much harder to assemble, which frequently results in incomplete ORFs.

03.09.2025 09:41 - 👍 0    🔁 0    💬 1    📌 0
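
The "complete ORF" criterion in the post above (both a start and a stop codon present) could be checked roughly as follows. This is a minimal sketch and not the dataset's actual filter: the function name is hypothetical, and it assumes the standard stop codons and `ATG` as the only start codon, although many bacteria also use alternative starts such as `GTG` or `TTG`.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def is_complete_orf(seq, start_codons=("ATG",)):
    """Return True if seq starts with a start codon, ends with a stop codon,
    is a whole number of codons, and has no in-frame stop in between."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    has_start = codons[0] in start_codons
    has_stop = codons[-1] in STOP_CODONS
    no_internal_stop = not any(c in STOP_CODONS for c in codons[1:-1])
    return has_start and has_stop and no_internal_stop
```

A gene fragment cut off at a contig edge fails this check because it lacks the start codon, the stop codon, or both, which is how fragmented assemblies end up with incomplete ORFs.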

🌎👩‍🔬 For 15+ years biology has accumulated petabytes (million gigabytes) of 🧬 DNA sequencing data 🧬 from the far reaches of our planet. 🦠🍄🌵

Logan now democratizes efficient access to the world's most comprehensive genetics dataset. Free and open.

doi.org/10.1101/2024...

03.09.2025 08:39 - 👍 215    🔁 118    💬 3    📌 16

Exciting to see our protein binder design pipeline BindCraft published in its final form in @Nature! This has been an amazing collaborative effort with Lennart, Christian, @sokrypton.org, Bruno and many other amazing lab members and collaborators.

www.nature.com/articles/s41...

27.08.2025 16:14 - 👍 304    🔁 109    💬 14    📌 11

Does anyone know of a recent comparison of the main structural classification schemes of proteins and guidance on when to choose one? Something like this but including ECOD and perhaps seq-based schemes like Pfam, SUPERFAMILY and CDD.

Img source (2020)
pubmed.ncbi.nlm.nih.gov/32302382/

27.08.2025 02:16 - 👍 5    🔁 1    💬 1    📌 0

#structuralphylogenetics #strphy #3di

22.08.2025 12:20 - 👍 21    🔁 10    💬 0    📌 0

Congratulations!

22.08.2025 12:58 - 👍 0    🔁 0    💬 0    📌 0

The ColabFold server has 80 cores (Intel Xeon E7-8891 v2) with 4 TB RAM (using ~1 TB).

19.08.2025 17:20 - 👍 1    🔁 0    💬 0    📌 0
