Excited to share that the last paper of my PhD is now published in PRX Life!
We introduce RAG-ESM, a retrieval-augmented framework that makes pretrained protein language models (like ESM2) homology-aware with minimal training cost.
Paper: journals.aps.org/prxlife/abst...
21.08.2025 16:13
Protein-protein interactions studied by @cyrilmalbranke.bsky.social #PragueBioML @elixircz.bsky.social
22.08.2025 10:57
[8/8] Resources:
• Training dataset
• 4 pre-trained models (XS to L)
• Code & interactive notebooks
huggingface.co/collections/...
github.com/Bitbol-Lab/P...
21.08.2025 13:55
[7/8] In conclusion, ProteomeLM performs strongly across species and benchmarks for both PPI prediction and gene essentiality prediction. It makes proteome-wide analysis more practical, easing large-scale studies, including of complex eukaryotic proteomes.
Gene essentiality: ProteomeLM outperforms ESM-C, and its predictions are good on E. coli, S. cerevisiae and minimal cells
[6/8] Beyond PPIs: ProteomeLM predicts gene essentiality across diverse taxa (e.g. E. coli, yeast, minimal cells), highlighting its potential for broad downstream applications.
Bar plot showing the speed improvement over classical DCA methods
Number of predictions as a function of recall, showing the performance leap from classical DCA methods to ProteomeLM on the human interactome (AUROC from 0.73 to 0.826)
Supervised PPI prediction performance on the D-SCRIPT dataset for four organisms
[5/8] This enables unsupervised and supervised PPI prediction at proteome scale in minutes, several orders of magnitude faster than coevolution-based methods such as DCA.
Try it here: github.com/Bitbol-Lab/P...
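For a sense of scale, a proteome of 4,000 proteins contains 4,000 × 3,999 / 2 ≈ 8 million unordered protein pairs, and an unsupervised screen has to rank all of them. The minimal Python sketch below is not the ProteomeLM code. It shows how one attention matrix can score every pair at once. It assumes an (N, N) array of attention coefficients for one proteome (for example, averaged over selected heads) and symmetrizes it as A_ij + A_ji. The function name `rank_candidate_pairs` and this scoring rule are hypothetical.

```python
from __future__ import annotations

import numpy as np


def rank_candidate_pairs(attention: np.ndarray, top_k: int) -> list[tuple[int, int, float]]:
    """Rank all unordered protein pairs of one proteome by symmetrized attention.

    Assumption: `attention` is an (N, N) array of attention coefficients between
    the N proteins of a proteome, e.g. averaged over a chosen set of heads.
    """
    n = attention.shape[0]
    if attention.shape != (n, n):
        raise ValueError("attention must be a square (N, N) array")
    # Attention is not symmetric, so score the pair {i, j} by A_ij + A_ji.
    sym = attention + attention.T
    # Enumerate each unordered pair once (i < j), skipping self-attention.
    rows, cols = np.triu_indices(n, k=1)
    scores = sym[rows, cols]
    # Sort by decreasing score and keep the top_k candidate interactions.
    order = np.argsort(-scores, kind="stable")[:top_k]
    return [(int(rows[o]), int(cols[o]), float(scores[o])) for o in order]


# Toy example: proteins 0 and 2 attend to each other more than any other pair.
A = np.zeros((4, 4))
A[0, 2], A[2, 0], A[1, 3] = 0.75, 0.5, 0.25
print(rank_candidate_pairs(A, top_k=2))  # [(0, 2, 1.25), (1, 3, 0.25)]
```

For very large proteomes, `np.argpartition` or chunking over rows would avoid sorting and materializing every pair index at once.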
Heatmap showing that ProteomeLM attention heads can distinguish interacting from non-interacting pairs in E. coli, S. cerevisiae and H. sapiens
[4/8] Key finding: attention heads spontaneously encode protein-protein interaction networks. Some heads reach an AUC of 0.92 in discriminating interacting from non-interacting pairs.
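The AUC quoted above was computed by the authors on their own benchmark. The Python sketch below is a hedged illustration of what such a number measures. It scores each labelled pair (i, j) by one head's symmetrized attention A_ij + A_ji, which is an assumption; the preprint specifies the actual procedure. It then computes the AUROC as the probability that a random interacting pair outscores a random non-interacting one, with ties counted as one half. The helper names `auroc` and `head_auc` are hypothetical.

```python
import numpy as np


def auroc(pos_scores, neg_scores) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    # Compare every positive score with every negative score.
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


def head_auc(attention, interacting, non_interacting) -> float:
    """AUROC of one head's (N, N) attention at separating labelled protein pairs.

    Assumption: a pair (i, j) is scored by the symmetrized coefficient A_ij + A_ji.
    """
    att = np.asarray(attention, dtype=float)

    def score(pairs):
        return [att[i, j] + att[j, i] for i, j in pairs]

    return auroc(score(interacting), score(non_interacting))


# Toy head: the interacting pair (0, 1) outscores both non-interacting pairs.
A = np.array([[0.0, 0.6, 0.1],
              [0.4, 0.0, 0.2],
              [0.1, 0.2, 0.0]])
print(head_auc(A, interacting=[(0, 1)], non_interacting=[(0, 2), (1, 2)]))  # 1.0
```

The pairwise comparison uses memory proportional to (number of positives × number of negatives). For large benchmarks, scikit-learn's `roc_auc_score` computes the same quantity from labels and scores.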
[3/8] Encoding strategy: instead of a positional encoding, ProteomeLM introduces a functional encoding based on orthologous groups. The model can thus combine this functional information with the context provided by the other proteins in the proteome, without relying on gene order. This is especially important in eukaryotes, where gene order is less conserved.
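The thread does not detail how the orthology-based encoding is built, so the Python sketch below is only one plausible reading. Every name in it is hypothetical. A table holds one vector per orthologous group, plus a fallback row for proteins without a group. Each protein's input is its protein language model embedding plus the vector of its group. Because the lookup uses the group rather than the gene's position, shuffling gene order leaves each protein's input unchanged, and orthologs in different species receive the same encoding. In a trained model the table would be learned; here it is randomly initialized.

```python
import numpy as np


class OrthologGroupEncoding:
    """Hypothetical functional encoding: one vector per orthologous group.

    A positional encoding would be indexed by gene order; this one is indexed
    by the orthologous group a protein belongs to.
    """

    def __init__(self, group_ids, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Map each group identifier (assumed to be a string) to a table row.
        self.index = {g: k for k, g in enumerate(sorted(set(group_ids)))}
        # One extra last row is shared by proteins with no assigned group.
        self.table = rng.normal(scale=0.02, size=(len(self.index) + 1, dim))

    def __call__(self, protein_groups):
        fallback = len(self.index)
        rows = [self.index.get(g, fallback) for g in protein_groups]
        return self.table[rows]


def encode_proteome(embeddings, protein_groups, encoding):
    """Add each protein's group vector to its (N, dim) language-model embedding."""
    return np.asarray(embeddings, dtype=float) + encoding(protein_groups)


# Toy example: the encoding follows the proteins when gene order is shuffled.
enc = OrthologGroupEncoding(["OG1", "OG2", "OG3"], dim=4)
emb = np.ones((3, 4))
groups = ["OG2", "OG1", None]  # None: a protein without an assigned group
x = encode_proteome(emb, groups, enc)
perm = [2, 0, 1]
x_shuffled = encode_proteome(emb[perm], [groups[p] for p in perm], enc)
print(np.allclose(x[perm], x_shuffled))  # True
```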
Figure 1. ProteomeLM Architecture
[2/8] Training objective: ProteomeLM uses a custom masked language modeling task, predicting masked ESM-C representations of proteins within the proteome.
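Below is a minimal Python sketch of a masked-embedding objective of this kind, under stated assumptions. Each proteome is an (N, d) array of per-protein embeddings (ESM-C embeddings in ProteomeLM). A random fraction of proteins is replaced by a mask vector, and the loss is the mean squared error on the masked proteins only. The masking rate, mask vector, mean-squared-error loss and function names are placeholders rather than the settings of the preprint. The transformer itself is left out.

```python
import numpy as np


def mask_proteome(embeddings, mask_rate, mask_vector, rng):
    """Hide a random subset of proteins by overwriting their embeddings.

    Returns the corrupted (N, d) input and a boolean (N,) array marking hidden proteins.
    """
    x = np.array(embeddings, dtype=float, copy=True)
    n = x.shape[0]
    n_mask = max(1, int(round(mask_rate * n)))
    hidden = np.zeros(n, dtype=bool)
    hidden[rng.choice(n, size=n_mask, replace=False)] = True
    x[hidden] = mask_vector  # broadcast the (d,) mask vector over hidden rows
    return x, hidden


def masked_reconstruction_loss(predicted, target, hidden):
    """Mean squared error between predicted and true embeddings of hidden proteins only."""
    diff = np.asarray(predicted)[hidden] - np.asarray(target)[hidden]
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
target = rng.normal(size=(10, 8))  # stand-in for per-protein ESM-C embeddings
inputs, hidden = mask_proteome(target, 0.2, np.zeros(8), rng)
prediction = inputs  # placeholder: a real model would predict from `inputs`
print(hidden.sum(), masked_reconstruction_loss(prediction, target, hidden))
```

Restricting the loss to hidden proteins means the model is only rewarded for inferring embeddings it cannot see, which is what forces it to use the rest of the proteome as context.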
ProteomeLM: A proteome-scale language model allowing fast prediction of protein-protein interactions and gene essentiality across taxa
Language models starting from biological sequence data are advancing many inference problems, both at the scale of single proteins, and at the scale of genomic neighborhoods. In this paper, we introduce ProteomeLM, a transformer-based language model that reasons on entire proteomes from species spanning the tree of life. Leveraging protein language model embeddings, ProteomeLM is trained to reconstruct masked protein embeddings using the whole proteomic context. It thus learns contextualized protein representations reflecting proteome-scale functional constraints. We show that ProteomeLM spontaneously captures protein-protein interactions (PPI) in its attention coefficients. We demonstrate that it screens whole interactomes orders of magnitude faster than amino-acid coevolution-based methods, and substantially outperforms them. We further develop ProteomeLM-PPI, a supervised PPI prediction network that combines ProteomeLM embeddings and attention coefficients, and achieves state-of-the-art performance across species and benchmarks. Finally, we introduce ProteomeLM-Ess, a supervised predictor of gene essentiality that generalizes across diverse taxa. Our results highlight the power of proteome-scale language models for addressing function and interactions at the organism level.
Competing Interest Statement: The authors have declared no competing interest.
Funding: European Research Council, https://ror.org/0472cxd90, 851173
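The abstract says ProteomeLM-PPI combines ProteomeLM embeddings and attention coefficients. The Python sketch below shows one hypothetical way to build the input for a pair (i, j). It concatenates order-independent combinations of the two contextual embeddings (elementwise product and absolute difference) with one symmetrized attention score per head. A classifier trained on labelled pairs would then consume these vectors. The feature choice and the name `pair_features` are assumptions, not the architecture described in the preprint.

```python
import numpy as np


def pair_features(embeddings, attention, i, j):
    """Hypothetical symmetric feature vector for the protein pair (i, j).

    embeddings: (N, d) contextual protein embeddings.
    attention:  (H, N, N) attention coefficients, one matrix per head.
    """
    e_i, e_j = embeddings[i], embeddings[j]
    # Order-independent combinations of the two embeddings.
    emb_part = np.concatenate([e_i * e_j, np.abs(e_i - e_j)])
    # One symmetrized attention score per head.
    att_part = attention[:, i, j] + attention[:, j, i]
    return np.concatenate([emb_part, att_part])


rng = np.random.default_rng(0)
E = rng.normal(size=(3, 4))  # 3 proteins, embedding size 4
A = rng.random((2, 3, 3))    # 2 attention heads
f = pair_features(E, A, 0, 2)
print(f.shape)                                    # (10,)
print(np.allclose(f, pair_features(E, A, 2, 0)))  # True
```

Making every feature symmetric in i and j guarantees that the predicted interaction score does not depend on which protein of the pair is listed first.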
[1/8] New preprint! With Gionata Paolo Zalaffi & Anne-Florence Bitbol, we introduce ProteomeLM, a transformer that processes entire proteomes (prokaryotes and eukaryotes), enabling ultra-fast protein-protein interaction (PPI) prediction across the tree of life.
www.biorxiv.org/content/10.1...