⚠️ Caution when using ProteinGYM binary classification: some DMS_binarization_cutoff values appear inverted, and some are unreasonable given that the DMS_score shows a clear bimodal distribution.
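A quick way to screen for this kind of problem, as a minimal sketch: it assumes you have already loaded one assay's DMS_score values into a list and read its DMS_binarization_cutoff. The function names, the 5% class-balance limit, and the two-means comparison are illustrative assumptions, not part of ProteinGYM.

```python
from statistics import mean


def two_means_threshold(scores, iters=100):
    """1D two-means clustering; returns the midpoint between the two cluster centers."""
    lo, hi = min(scores), max(scores)
    for _ in range(iters):
        mid = (lo + hi) / 2
        low = [s for s in scores if s <= mid]
        high = [s for s in scores if s > mid]
        if not low or not high:
            break  # all scores fall on one side; nothing to split
        new_lo, new_hi = mean(low), mean(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # converged
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def check_cutoff(scores, cutoff, min_frac=0.05):
    """Flag a cutoff that puts almost every variant in one class (assumed 5% limit)."""
    frac_above = sum(s > cutoff for s in scores) / len(scores)
    return {
        "frac_above": frac_above,
        "two_means_threshold": two_means_threshold(scores),
        "suspicious": frac_above < min_frac or frac_above > 1 - min_frac,
    }
```

A large gap between the stored cutoff and the two-means threshold on a clearly bimodal assay is a reason to plot the histogram. This check cannot tell whether a cutoff is inverted, since that depends on the assay's score direction.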
11.06.2025 15:57
@chaohou.bsky.social
Protein dynamic, Multi conformation, Language model, Computational biology | Postdoc @Columbia | PhD 2023 & Bachelor 2020 @PKU1898 http://chaohou.netlify.app
How can we better understand pathogenic variants in intrinsically disordered regions (IDRs)? How do models such as AlphaMissense and ESM1b predict pathogenicity, when these regions typically exhibit lower genomic conservation than ordered regions? Read more:
doi.org/10.1101/2025...
Relationship between perplexity and zero-shot performance.
Protein language model likelihoods are better zero-shot predictors of mutation effects when the model's perplexity on the wild-type sequence is between 3 and 6.
www.biorxiv.org/content/10.1...
Read our preprint here: www.biorxiv.org/content/10.1...
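For reference, perplexity here can be read as the exponential of the negative mean log-probability the model assigns to the wild-type residues. A minimal sketch, assuming you already have per-residue natural-log probabilities from your model of choice (how they are obtained, e.g. masked or unmasked scoring, is not specified in the post; the function names are hypothetical):

```python
import math


def perplexity(log_probs):
    """exp(-mean log p) over the wild-type residues; log_probs are natural logs."""
    if not log_probs:
        raise ValueError("need at least one per-residue log-probability")
    return math.exp(-sum(log_probs) / len(log_probs))


def in_reported_range(ppl, low=3.0, high=6.0):
    """True if perplexity lies in the 3-6 window mentioned in the post."""
    return low <= ppl <= high
```

A model that is uniformly uncertain over the 20 amino acids has perplexity 20, and a model that has memorized the sequence approaches 1, so the 3-6 window sits between those extremes.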
29.04.2025 17:55
Why do large protein language models like ESM2-15B underperform compared to medium-sized ones like ESM2-650M in predicting mutation effects? 🤔
We dive into this issue in our new preprint, bringing insights into model scaling for mutation effect prediction. 🧬
4/n We also compared SeqDance's attention with ESM2-35M.
17.04.2025 14:43
3/n Here is the comparison on viral proteins in ProteinGYM (our models have 35M parameters).
17.04.2025 14:43
2/n On the mega-scale protein stability dataset, it's clear that ESM2's performance is correlated with the number of homologs in its training set, but our models show robust performance for proteins without homologs in the training set.
17.04.2025 14:42
1/n To perform zero-shot fitness prediction, we use our models SeqDance/ESMDance to predict dynamic properties of both wild-type and mutated sequences. The relative changes between them are used to infer mutation effects.
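The post does not say how the relative changes are aggregated, so the following is only an illustrative sketch of the idea, not the SeqDance/ESMDance scoring code. It assumes each sequence has already been turned into a flat list of predicted dynamic properties; the function name, the `eps` stabilizer, and the mean-absolute aggregation are all hypothetical choices.

```python
def relative_change_score(wt_props, mut_props, eps=1e-8):
    """Negative mean absolute relative change between mutant and wild-type properties.

    Hypothetical aggregation: a larger predicted perturbation gives a lower score.
    """
    if not wt_props or len(wt_props) != len(mut_props):
        raise ValueError("property lists must be non-empty and the same length")
    changes = [abs(m - w) / (abs(w) + eps) for w, m in zip(wt_props, mut_props)]
    return -sum(changes) / len(changes)
```

Under this convention a mutant whose predicted properties match the wild type scores 0, and scores become more negative as the predicted dynamics diverge.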
17.04.2025 14:41
We have updated our protein language model trained on structure dynamics. Our new models show significantly better zero-shot performance on mutation effects of designed and viral proteins compared to ESM2. Check the new preprint here: www.biorxiv.org/content/10.1...
17.04.2025 14:40
SeqDance: A Protein Language Model for Representing Protein Dynamic Properties https://www.biorxiv.org/content/10.1101/2024.10.11.617911v1
15.10.2024 16:49