Will be interesting to see this compared to experiments
New OpenFold3 preview out! (OF3p2)
It closes the gap to AlphaFold3 for most modalities.
Most critically, we're releasing everything, including training sets & configs, making OF3p2 the only current AF3-based model that is functionally trainable & reproducible from scratch🧵1/9
OK, I'm going to go with the theory that Sam is seeing how insane the things he says can get while still being reported solemnly in the media and nodded through in interviews. Because that way at least I won't go mad.
Yes! Though maybe if a protein has enough (close/easy to find) homologs, there’s a good chance that it was?
I feel attacked
doi.org/10.1140/epje...
Holy smoke. What ultimately happened???
A public database of binding-site predictions in the human proteome and a google colab notebook to use the model yourself can be found here: github.com/sokrypton/af...
New paper showing that much of the apparent success of protein language models in predicting mutational effects is a mirage: These models mostly memorize sites. 1/
www.biorxiv.org/content/10.6...
This is fully in line with our experience. One advantage of the pLMs is that, once they have been trained, they offer an easy way to assess positional conservation without having to build alignments.
Community perspective:
Toward a unified framework for determining conformational ensembles of disordered proteins 🍝
with a framework for experimental data acquisition, computational ensemble generation & validation
Led by @hamidrgh.bsky.social, Silvio Tosatto & Alex Monzon
doi.org/10.1038/s415...
New paper from former PhD student @tkschulze.bsky.social on supervised learning of protein variant effects across large-scale mutagenesis datasets
MAVE/DMS experiments provide large amounts of data for benchmarking variant effect predictors, but may be difficult to use in supervised learning. 1/5
This turned into a rant by accident.
We are always thinking about this and are facing it right now: new molecules, some new routes to them with some new chemistry, and some new assays for the biology that slash costs by three orders of magnitude (not for this research, but per sample moving forward)
We also see lab-to-lab variation despite using the same approaches (the top three panels are all from Doug's lab)
bsky.app/profile/lind...
As we discuss briefly in the paper, it should in principle be possible to learn the calibration curves from the MAVE data alone, but it's not easy
Thanks. One point of our paper was also to highlight that having calibration curves can be useful both for training and benchmarking, in particular when the latter occurs on a shared and physically meaningful scale.
As for tests on unseen data, I only know of CAGI (genomeinterpretation.org)
Review from Fia B. Larsen in @rhp-lab.bsky.social with everything you always wanted to know about proteasomal control of transcription factors (but were afraid to ask about)
Proteasomal control of transcription factors: mechanisms, regulation and dysregulation.
doi.org/10.1007/s000...
Supervised learning of protein variant effects across large-scale mutagenesis datasets
onlinelibrary.wiley.com/doi/10.1002/...
@tkschulze.bsky.social, Lasse Blaabjerg, @mcagiada.bsky.social
See also: Effects of residue substitutions on the cellular abundance of proteins
doi.org/10.7554/eLif...
5/5
Thea therefore built an approach to train models that takes this dataset-to-dataset variability into account via specific "standard curves", thus enabling training a model on the high-throughput data while learning to predict on the abundance scale only available in low-throughput experiments. 4/5
This variation makes it hard to perform supervised learning because a VAMP-seq score of, say, 0.5 can mean quite different things in different datasets (see paper for a discussion of why that's the case). 3/5
Thea collected VAMP-seq data from the literature on how variants impact protein abundance, and showed that while there is a high correlation between abundance (as measured in low-throughput) and the sequencing-based VAMP-seq scores, the relationship may be non-linear and vary across datasets. 2/5
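The point made in this thread, that the same VAMP-seq score can correspond to different abundances in different datasets, can be illustrated with a toy sketch. This is not the method from the paper: the power-law curve form, the function names `fit_power_curve` and `score_to_abundance`, and the synthetic datasets "A" and "B" are all assumptions made purely for illustration.

```python
import numpy as np

def fit_power_curve(abundance, score, gammas=None):
    """Fit score ~ abundance**gamma by grid search over gamma.

    The power-law form is a hypothetical stand-in for a dataset-specific
    "standard curve"; real curves may look quite different.
    """
    if gammas is None:
        gammas = np.linspace(0.2, 5.0, 97)  # grid with step 0.05
    abundance = np.asarray(abundance, dtype=float)
    score = np.asarray(score, dtype=float)
    errors = [np.mean((abundance ** g - score) ** 2) for g in gammas]
    return float(gammas[int(np.argmin(errors))])

def score_to_abundance(score, gamma):
    """Invert the fitted curve to place a score on the shared abundance scale."""
    clipped = np.clip(np.asarray(score, dtype=float), 0.0, 1.0)
    return clipped ** (1.0 / gamma)

# Synthetic calibration variants with known (low-throughput) abundance.
abundance = np.linspace(0.05, 1.0, 20)

# Two made-up datasets whose scores relate to abundance in different ways.
datasets = {"A": abundance ** 0.5, "B": abundance ** 2.0}

# One curve per dataset.
gammas = {name: fit_power_curve(abundance, s) for name, s in datasets.items()}

# The same raw score of 0.5 maps to different abundances in each dataset.
a_from_A = float(score_to_abundance(0.5, gammas["A"]))
a_from_B = float(score_to_abundance(0.5, gammas["B"]))
print(gammas, a_from_A, a_from_B)
```

In this toy setting the calibration variants are measured on both scales, which is what makes the per-dataset fit straightforward; learning such curves from the high-throughput data alone is a harder problem.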
That may be the situation in some cases, but not in the one I refer to. I don’t think it would have made sense to publish the method separately from this application, but strategically it might have been better. So in that case it is sort of the opposite of the (very real) issue you describe
I don't know. We tried to describe the new technique with a few equations in the main text as well as a flow-chart and benchmarking using synthetic data, targeting what I thought was the right audience. And we did cut some corners compared to the full Bayesian approach.
But the key idea—a Bayesian framework that “back-propagates” deviations between simulations and experiments to update force-field parameters efficiently—was placed in the Supporting Information. In hindsight, that was a mistake: the main conceptual advance was hidden, so few noticed it.
Don’t hide the good stuff in the SI
In 2008, we published a paper on parameterizing force fields for unfolded proteins using NMR data, developing an early HPS model for intrinsically disordered proteins. The paper showed the idea, validation with synthetic data, and applications to real proteins.
FMLWY are wrong, so that’s indeed more than half that are correct, if we accept the weird locations of the ring heteroatoms, how the OHs are connected, and the wrong names for DE
Fancy a fresh preprint for Friday? When we were first getting involved with single-molecule FRET, there weren't any standard protein molecules that suited our applications to help us develop our pipeline. So we built some! A universal protein ladder for FRET. 🧵 1/
We have started a project trying to predict the interactions/structures of all yeast protein pairs using an AlphaFold pooling approach. We are making the current dataset open and we welcome collaborations.
www.evocellnet.com/2026/03/mapp...
The Division of Biological Physics @mpipks.bsky.social seeks a Research Group Leader in #biophysics, #softmatter physics, or related areas. (Further particulars in the ad.)
Apply by the 3rd of April 2026 at pks.mpg.de/bprgl to join us in Dresden!
Have a look at our latest publication:
Cupriavidus necator as an alternative source for 15N/13C isotopic enrichment of proteins expressed in insect cells for NMR - enabling structural studies of disease-relevant targets.
Full text link:
link.springer.com/article/10.1...