Does anyone know whether there's a functioning API to ESMFold?
(api.esmatlas.com/foldSequence... gives me Service Temporarily Unavailable)
30.09.2025 14:11
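For anyone wanting to test the endpoint themselves, here is a minimal stdlib sketch. The full path (`/foldSequence/v1/pdb/`) is the commonly documented one for the ESM Atlas fold API, but treat it as an assumption and verify before relying on it; the service POSTs a raw amino-acid sequence and returns PDB text on success (a 503 matches the "Service Temporarily Unavailable" symptom above).

```python
from urllib import request

# Assumed full path for the truncated URL in the post; verify against current docs.
ESMFOLD_URL = "https://api.esmatlas.com/foldSequence/v1/pdb/"

def fold_sequence(seq: str, timeout: float = 120.0) -> str:
    """POST a raw amino-acid sequence; the service returns PDB text on success."""
    req = request.Request(ESMFOLD_URL, data=seq.encode(), method="POST")
    with request.urlopen(req, timeout=timeout) as resp:  # raises on 5xx errors
        return resp.read().decode()
```

Calling `fold_sequence("MKTAYIAKQR...")` either returns PDB text or raises `urllib.error.HTTPError`, which makes the outage easy to distinguish from a client-side problem.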
Fig. 6: Low-N GFP design.
We can use METL for low-N protein design. We trained METL on Rosetta simulations of GFP biophysical attributes and only 64 experimental examples of GFP brightness. It designed fluorescent variants with 5 and 10 mutations, including some whose mutations fall entirely outside the training set. 7/
11.09.2025 17:00
Fig. 5: Function-specific simulations improve METL pretraining for GB1.
A powerful aspect of pretraining on biophysical simulations is that the simulations can be customized to match the protein function and experimental assay. Our expanded simulations of the GB1-IgG complex with Rosetta InterfaceAnalyzer improve METL predictions of GB1 binding. 6/
11.09.2025 17:00
Fig. 3: Comparative performance across extrapolation tasks.
We also benchmark METL on four types of difficult extrapolation. For instance, positional extrapolation provides training data from some sequence positions and tests predictions at different sequence positions. Linear regression completely fails in this setting. 5/
11.09.2025 17:00
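To make the positional-extrapolation setting concrete, here is a small sketch of such a split. The variant-string format (`"A23G,T45K"`) and helper names are hypothetical, not from the paper: train variants touch only a chosen set of positions, test variants touch none of them, so a model must generalize to positions it has never seen mutated.

```python
def positions(variant: str) -> set[int]:
    """Extract mutated positions from a variant string like 'A23G,T45K'."""
    return {int(m[1:-1]) for m in variant.split(",")}

def positional_split(variants: list[str], train_positions: set[int]):
    """Train variants touch only train positions; test variants touch none of them."""
    train = [v for v in variants if positions(v) <= train_positions]
    test = [v for v in variants if positions(v).isdisjoint(train_positions)]
    return train, test

variants = ["A23G", "T45K", "A23G,T45K", "L7P", "L7P,A23G"]
train, test = positional_split(variants, train_positions={23, 7})
# train: ["A23G", "L7P", "L7P,A23G"]; test: ["T45K"]
```

Variants that straddle both position sets (here `"A23G,T45K"`) land in neither split, which keeps train and test positions strictly disjoint.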
Fig. 2: Comparative performance of Linear, Rosetta total score, EVE, RaSP, Linear-EVE, ESM-2, ProteinNPT, METL-Global and METL-Local across different training set sizes.
We compare these approaches on deep mutational scanning datasets with increasing training set sizes. Biophysical pretraining helps METL generalize well with small training sets. However, augmented linear regression with EVE scores is great on some of these assays. 4/
11.09.2025 17:00
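The "augmented linear regression" idea can be sketched in a few lines: one-hot encode each variant's mutations and append the pretrained model's variant score (an EVE-style score here) as one extra feature, then fit ridge regression in closed form. All names and numbers below are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

def featurize(variants, all_mutations, scores):
    """One-hot encode each variant's mutations, append an EVE-style score column."""
    idx = {m: j for j, m in enumerate(all_mutations)}
    X = np.zeros((len(variants), len(all_mutations) + 1))
    for i, (muts, s) in enumerate(zip(variants, scores)):
        for m in muts:
            X[i, idx[m]] = 1.0
        X[i, -1] = s  # augmentation: pretrained model's score as one extra feature
    return X

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

all_muts = ["A23G", "T45K", "L7P"]
variants = [["A23G"], ["T45K"], ["A23G", "T45K"], ["L7P"]]
eve_scores = [0.2, -0.5, -0.1, 0.4]   # hypothetical unsupervised scores
y = np.array([1.0, -1.2, -0.3, 0.9])  # hypothetical measured fitness
w = ridge_fit(featurize(variants, all_muts, eve_scores), y)
```

With tiny training sets the unsupervised score column carries most of the signal, which is one plausible reading of why Linear-EVE is competitive on some assays.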
METL models pretrained on Rosetta biophysical attributes learn different protein representations than general protein language models like ESM-2 or protein family-specific models like EVE. These new representations are valuable for machine learning-guided protein engineering. 3/
11.09.2025 17:00
Most protein language models train on natural protein sequence data and use the underlying evolutionary signals to score sequence variants. Instead, METL trains on @rosettacommons.bsky.social data, learning from simulated biophysical attributes of the sequence variants we select. 2/
11.09.2025 17:00
Biophysics-based protein language models for protein engineering - Nature Methods
Mutational effect transfer learning (METL) is a protein language model framework that unites machine learning and biophysical modeling. Transformer-based neural networks are pretrained on biophysical ...
The journal version of "Biophysics-based protein language models for protein engineering" with @philromero.bsky.social is live! Mutational Effect Transfer Learning (METL) is a protein language model trained on biophysical simulations that we use for protein engineering. 1/
doi.org/10.1038/s415...
11.09.2025 17:00
🚨 New paper 🚨
Can protein language models help us fight viral outbreaks? Not yet. Here's why 🧵
1/12
17.08.2025 03:42
Distribution of the top 10 docking scores from molecules with high- and low-relevance BioAssays as context for different proteins.
There are many more results and controls in the paper. Here's how the best (most negative) docking scores change when we use relevant assays, irrelevant assays, or no assays as context for generation with GPT-4o. In the majority of cases, but not all, relevant context helps. 6/
18.07.2025 15:13
This generally has the desired effects across multiple LLMs and queried protein targets, with the caveat that our core results are based on AutoDock Vina scores. Assessing generated molecules with docking is admittedly frustrating. 5/
18.07.2025 15:13
We embed the BioAssay data into a vector database, retrieve initial candidate assays, and do further LLM-based filtering and summarization. We select some active and inactive molecules from the BioAssay data table. This is all used for in-context learning and molecule generation. 4/
18.07.2025 15:13
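The retrieve-then-prompt step described above can be sketched as cosine-similarity retrieval over embedded assay descriptions followed by prompt assembly. Everything here is a hypothetical illustration of the workflow, not the Assay2Mol code: the toy 3-D vectors stand in for real text embeddings, and the prompt wording is invented.

```python
import numpy as np

def top_k_assays(query_vec, assay_vecs, k=2):
    """Cosine-similarity retrieval of the k nearest assay descriptions."""
    A = np.asarray(assay_vecs, dtype=float)
    q = np.asarray(query_vec, dtype=float)
    sims = A @ q / (np.linalg.norm(A, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(sims)[::-1][:k]

def build_prompt(target, summaries, actives, inactives):
    """Assemble an in-context generation prompt from assay text and molecules."""
    lines = [f"Target: {target}", "Relevant assays:"]
    lines += [f"- {s}" for s in summaries]
    lines.append("Active molecules (SMILES): " + ", ".join(actives))
    lines.append("Inactive molecules (SMILES): " + ", ".join(inactives))
    lines.append("Propose new candidate actives for this target.")
    return "\n".join(lines)

# toy 3-D "embeddings" standing in for real text-embedding vectors
assay_vecs = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0]]
idx = top_k_assays([1, 0, 0], assay_vecs, k=2)  # -> indices of the 2 nearest
```

Showing both actives and inactives in the prompt gives the LLM a contrastive signal, not just positive examples.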
SSB-PriA antibiotic-resistance target AlphaScreen
PubChem BioAssays can contain a lot of information about why and how an assay was run. Here's an example from our collaborators. pubchem.ncbi.nlm.nih.gov/bioassay/127...
There are now 1.7M PubChem BioAssays ranging in scale from a few tested molecules to high-throughput screens. 2/
18.07.2025 15:13
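BioAssay records like the one linked above can be pulled programmatically. A minimal stdlib sketch, assuming PubChem's PUG REST interface and its documented URL pattern for assay descriptions (verify the endpoint shape against the current PUG REST docs before depending on it):

```python
from urllib import request
import json

PUG_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def assay_description_url(aid: int) -> str:
    """PUG REST endpoint for one BioAssay's description record, as JSON."""
    return f"{PUG_BASE}/assay/aid/{aid}/description/JSON"

def fetch_assay_description(aid: int, timeout: float = 30.0) -> dict:
    """Fetch and parse the description for one assay ID (makes a network call)."""
    with request.urlopen(assay_description_url(aid), timeout=timeout) as resp:
        return json.load(resp)
```

The free-text protocol and description fields in these records are exactly the kind of context the retrieval step feeds to the LLM.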
The Assay2Mol workflow. A chemist provides a target description, which is used to retrieve BioAssays from the pre-embedded vector database. After filtering for relevance, the BioAssays are summarized by an LLM. The BioAssay ID is then used to retrieve experimental tables. The final molecule generation prompt is formed by combining the description, summarization, and selected test molecules with associated test outcomes, enabling the LLM to generate relevant active molecules.
Our preprint introduces Assay2Mol, which uses PubChem chemical screening data as context when generating molecules with large language models. It uses assay descriptions and protocols to find relevant assays, then provides that text plus active/inactive molecules as context for generation. 1/
18.07.2025 15:13
Nobody is commenting on this little nugget from Fig 1?
18.07.2025 14:55
Some complexes can be huge, though that's not what you'd use this model for. The mammalian nuclear pore complex has ~800 nucleoporins and a molecular weight of ~100 MDa. doi.org/10.1016/j.tc...
30.05.2025 14:22
Isn't PKZILLA-1 the new champ? 45k amino acids www.uniprot.org/uniprotkb/A0...
30.05.2025 14:06
New methods are revolutionizing biology: an interview with Martin Steinegger
Martin Steinegger, who is the only non-DeepMind-affiliated author of the AlphaFold2 Nature paper, offers unique insights and personal reflections.
Happy to share this interview with Weijie Zhao from NSR at #OxfordUniversityPress. It covers questions I'm often asked: why I chose Korea, AlphaFold2, my unconventional journey into academia, and research insights. Thanks again for the fun conversation.
academic.oup.com/nsr/article/...
19.05.2025 12:08
PDB101: Learn: Other Resources: Commemorating 75 Years of Discovery and Innovation at the NSF
Download images celebrating NSF and PDB milestones
To honor the 75th anniversary of @NSF, RCSB PDB Intern Xinyi Christine Zhang created posters to celebrate the science made possible by the NSF and RCSB PDB.
Explore these images and learn how protein research is changing our world. #NSFfunded #NSF75
pdb101.rcsb.org/lear...
08.05.2025 16:18
I am assuming that responsibility extends far beyond Bluesky and that he also agrees to co-sign a personal loan I am applying for and walk the new puppy I adopted.
04.04.2025 14:38
My first post is a niche and personal shout out to @michaelhoffman.bsky.social, the person who asked me most often if I am on Bluesky yet.
03.04.2025 23:23
Professor of EECS and Statistics at UC Berkeley. Mathematical and computational biologist.
assistant professor at Princeton University interested in biological and chemical data
Associate professor at ETH Zurich, studying the cellular consequences of genetic variation. Affiliated with the Swiss Institute of Bioinformatics and a part of the LOOP Zurich.
Discover the Languages of Biology
Build computational models to (help) solve biology? Join us! https://www.deboramarkslab.com
DM or mail me!
Biologist who navigates the oceans of diversity through space-time
Protein evolution, metagenomics, AI/ML/DL
Website https://miangoaren.github.io/
Director of Institute for Computational Genomic Medicine at Goethe University Frankfurt https://cgm.uni-frankfurt.de/
Studying gene regulation and transcription factor binding with machine learning. Assoc Prof at Penn State.
Computational biologist. Faculty @DukeU. Co-founder http://martini.ai. Prev @MIT_CSAIL. Did quant investing for a while, before returning to research.
https://singhlab.net
Chair of the Department of Biomedical Informatics at the University of Colorado School of Medicine. Research: transcriptomics, machine learning, public data - pick two of three. He/him. Views mine, not employer's.
Computational biology, machine learning, AI, RNA, cancer genomics. My views are my own. https://www.morrislab.ai
He/him/his
Developing data intensive computational methods • PI @ Seoul National University • #FirstGen • he/him • Hauptschüler
Lab studying molecular evolution of proteins and viruses. Affiliated with Fred Hutch & HHMI.
https://jbloomlab.org/
Principal Researcher in BioML at Microsoft Research. He/him/他. yangkky.github.io
Ray and Stephanie Lane Professor of Computational Biology at CMU School of Computer Science. https://www.cs.cmu.edu/~jianma/
Machine learning for biology | Stanford and Arc Institute
Computational protein engineering & synthetic biochemistry
Opinions my own
https://linktr.ee/ddelalamo
Protein and coffee lover, father of two, professor of biophysics and sudo scientist at the Linderstrøm-Lang Centre for Protein Science, University of Copenhagen
Associate Prof @HarvardMed. Microbial evolution, antibiotic resistance, mobile genetic elements, algorithms, phages, molecular biotech, etc. Basic research is the engine of progress.
baymlab.hms.harvard.edu
AI4Science researcher. Associate Professor @CSHL. My lab advances AI for genomics and healthcare!
http://koo-lab.github.io