Amy Lu

@amyxlu.bsky.social

AI for drug discovery at Isomorphic Labs. Prev: PhD @ UC Berkeley | 🇨🇦

884 Followers  |  198 Following  |  10 Posts  |  Joined: 20.02.2024

Latest posts by amyxlu.bsky.social on Bluesky

Mirdita Lab - Laboratory for Computational Biology & Molecular Machine Learning. Mirdita Lab builds scalable bioinformatics methods.

My time in @martinsteinegger.bsky.social's group is ending, but I'm staying in Korea to build a lab at Sungkyunkwan University School of Medicine. If you or someone you know is interested in molecular machine learning and open-source bioinformatics, please reach out. I am hiring!
mirdita.org

20.01.2026 11:07 — 👍 105    🔁 54    💬 7    📌 1

Just coincidentally found GenBank Release 84.0 from 1994 in the neighboring lab. Anyone out there with an even older version?

26.01.2025 02:28 — 👍 80    🔁 13    💬 6    📌 2

In case you missed our ML for proteins seminar on CHEAP compression for protein embeddings back in October, here it is -- thanks @megthescientist.bsky.social for doing so much for the MLxProteins community 🫢

28.12.2024 03:00 — 👍 16    🔁 1    💬 2    📌 0

• introduced "zero-shot prediction" as the task of predicting a bioassay's outcome from pLM likelihoods
• commented on biases in the evolutionary signal of the Tree of Life data used to train pLMs (a favorite paper I read in 2024: shorturl.at/fbC7g)

16.12.2024 06:29 — 👍 8    🔁 1    💬 1    📌 0
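
A minimal sketch of that zero-shot scoring idea, assuming a hypothetical masked pLM `plm` that maps token ids to logits; the names here are placeholders, not any specific library's API:

```python
import torch

@torch.no_grad()
def zero_shot_score(plm, tokens, pos, wt_id, mut_id, mask_id):
    """Mask one position, then compare the log-probabilities the pLM
    assigns to the mutant vs. wild-type residue there; higher scores
    mean the model finds the mutation more plausible."""
    masked = tokens.clone()                     # tokens: (1, L) token ids
    masked[0, pos] = mask_id
    logp = plm(masked).log_softmax(dim=-1)      # (1, L, vocab)
    return (logp[0, pos, mut_id] - logp[0, pos, wt_id]).item()
```

Summing such per-position scores over all mutated positions is one common way to turn this into a variant-level score.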

Thanks @workshopmlsb.bsky.social for letting us share our work!

🔗📄 bit.ly/plaid-proteins

15.12.2024 22:27 — 👍 22    🔁 2    💬 1    📌 0

Another straightforward application is generation, either by next-token sampling or MaskGIT-style denoising. We made the tokenized version of CHEAP with generation in mind, but decided to go with diffusion on continuous embeddings instead; I think either would've worked

10.12.2024 01:04 — 👍 6    🔁 0    💬 0    📌 0
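
A rough sketch of the MaskGIT-style alternative mentioned above: start from all-masked latent tokens and iteratively commit the most confident predictions under a cosine schedule. `token_model`, `MASK_ID`, and the step count are hypothetical placeholders, not the released CHEAP/PLAID API:

```python
import math
import torch

MASK_ID = 0      # hypothetical id reserved for the [MASK] token
NUM_STEPS = 8    # parallel-decoding refinement steps

@torch.no_grad()
def maskgit_sample(token_model, seq_len, device="cpu"):
    """Begin fully masked; each step fills every masked slot, then
    re-masks the least confident positions per a cosine schedule."""
    tokens = torch.full((1, seq_len), MASK_ID, dtype=torch.long, device=device)
    for step in range(NUM_STEPS):
        probs = token_model(tokens).softmax(dim=-1)   # (1, L, vocab)
        conf, pred = probs.max(dim=-1)                # per-position confidence
        still_masked = tokens.eq(MASK_ID)
        # Never re-mask positions that were already committed.
        conf = conf.masked_fill(~still_masked, float("inf"))
        # Cosine schedule: many slots stay masked early, none at the end.
        num_remask = int(math.cos(math.pi / 2 * (step + 1) / NUM_STEPS) * seq_len)
        if num_remask > 0:
            low = conf.topk(num_remask, largest=False).indices  # (1, k)
            pred[0, low[0]] = MASK_ID
        tokens = torch.where(still_masked, pred, tokens)
    return tokens
```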

We trained a model to co-generate protein sequence and structure by working in the ESMFold latent space, which encodes both. PLAID only requires sequences for training but generates all-atom structures!

Really proud of @amyxlu.bsky.social's effort leading this project end-to-end!

09.12.2024 14:58 — 👍 57    🔁 11    💬 2    📌 0

immensely grateful for awesome collaborators on this work: Wilson Yan, Sarah Robinson, @kevinkaichuang.bsky.social, Vladimir Gligorijevic, @kyunghyuncho.bsky.social, Rich Bonneau, Pieter Abbeel, @ncfrey.bsky.social 🫢

06.12.2024 17:44 — 👍 3    🔁 0    💬 0    📌 0

6/ We'll get to share PLAID as an oral presentation at MLSB next week 🥳 In the meantime, check out:

📄Preprint: biorxiv.org/content/10.1...
👩‍💻Code: github.com/amyxlu/plaid
🏋️Weights: huggingface.co/amyxlu/plaid...
🌐Website: amyxlu.github.io/plaid/
🍦Server: coming soon!

06.12.2024 17:44 — 👍 6    🔁 1    💬 1    📌 0
conditioning on organism and function shows that PLAID has learned active site residues and sidechain positions!

5/🚀 ...and when prompted by function, PLAID learns sequence motifs at active sites & directly outputs sidechain positions, which backbone-only methods such as RFDiffusion can't do out-of-the-box.

The residues aren't directly adjacent, suggesting that the model isn't simply memorizing training data:

06.12.2024 17:44 — 👍 5    🔁 2    💬 1    📌 0
unconditional generations from PLAID

4/ On unconditional generation, PLAID produces high-quality, diverse structures, especially at longer sequence lengths where previous methods underperform...

06.12.2024 17:44 — 👍 5    🔁 0    💬 1    📌 0
noising by a diffusion schedule in the latent space doesn't always correspond to the same corruption in the sequence and structure space...

3/ I was pretty stuck until building out the CHEAP (bit.ly/cheap-proteins) autoencoders that compressed & smoothed out the latent space: interestingly, gradual noise added to the ESMFold latent space doesn't actually corrupt the sequence and structure until the final forward diffusion timesteps 🤔

06.12.2024 17:44 — 👍 4    🔁 0    💬 1    📌 0
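
A rough illustration of that observation (a sketch, not the actual experiment code): apply the closed-form forward process q(x_t | x_0) to an ESMFold-style latent and decode it at several timesteps to see when corruption appears. `encode_seq` and `decode_latent` are hypothetical stand-ins for the frozen ESMFold encoder and decoders:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # standard linear schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative product abar_t

def noise_latent(x0, t):
    """Closed-form forward diffusion:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = torch.randn_like(x0)
    return alpha_bar[t].sqrt() * x0 + (1.0 - alpha_bar[t]).sqrt() * eps

# Hypothetical usage with a made-up sequence:
# x0 = encode_seq("MKTAYIAKQR...")                      # frozen encoder
# for t in (100, 500, 900, 990):
#     seq, coords = decode_latent(noise_latent(x0, t))  # frozen decoders
#     print(t, seq)  # per the post, sequence often survives until late t
```
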
how does the PLAID approach work?

2/💡Co-generating sequence and structure is hard. A key insight is that to get embeddings of the ESMFold latent space during training, we only need sequence inputs.

For inference, we can sample latent embeddings & use frozen sequence/structure decoders to get all-atom structure:

06.12.2024 17:44 — 👍 4    🔁 0    💬 1    📌 0
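
A schematic sketch of that training/inference split, under stated assumptions: `esmfold_encode`, `denoiser`, `reverse_sample`, `seq_decoder`, and `struct_decoder` are hypothetical placeholders rather than the actual PLAID API:

```python
import torch

T = 1000
alpha_bar = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, T), dim=0)

def training_step(seqs, esmfold_encode, denoiser, optimizer):
    """Training touches only sequences: embed them, noise the latents,
    and regress the added noise (standard epsilon-prediction loss)."""
    x0 = esmfold_encode(seqs)                        # (B, L, D) latents
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    ab = alpha_bar[t].view(-1, 1, 1)
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps   # q(x_t | x_0)
    loss = (denoiser(x_t, t) - eps).pow(2).mean()
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

@torch.no_grad()
def generate(reverse_sample, seq_decoder, struct_decoder, shape):
    """Inference: sample a latent with the trained denoiser, then use
    frozen decoders to read out sequence and all-atom structure."""
    x0_hat = reverse_sample(shape)                   # reverse diffusion loop
    return seq_decoder(x0_hat), struct_decoder(x0_hat)
```

The point of the split: structures never enter the training loop; they appear only when the frozen decoders are applied at sampling time.
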
overview of results for PLAID!

1/🧬 Excited to share PLAID, our new approach for co-generating protein sequences and all-atom structures by sampling from the latent space of ESMFold. This requires only sequences during training, which unlocks more data and annotations:

bit.ly/plaid-proteins
🧡

06.12.2024 17:44 — 👍 121    🔁 37    💬 1    📌 3
