Mirdita Lab - Laboratory for Computational Biology & Molecular Machine Learning
Mirdita Lab builds scalable bioinformatics methods.
My time in @martinsteinegger.bsky.social's group is ending, but I'm staying in Korea to build a lab at Sungkyunkwan University School of Medicine. If you or someone you know is interested in molecular machine learning and open-source bioinformatics, please reach out. I am hiring!
mirdita.org
20.01.2026 11:07 · 105 likes · 54 reposts · 7 replies · 1 quote
Just coincidentally found GenBank Release 84.0 from 1994 in the neighboring lab. Anyone out there with an even older version?
26.01.2025 02:28 · 80 likes · 13 reposts · 6 replies · 2 quotes
In case you missed our ML for proteins seminar on CHEAP compression for protein embeddings back in October, here it is -- thanks @megthescientist.bsky.social for doing so much for the MLxProteins community 🫶
28.12.2024 03:00 · 16 likes · 1 repost · 2 replies · 0 quotes
• introduced "zero-shot prediction" as the task of predicting a bioassay's outcome from pLM likelihoods (a scoring sketch follows this post)
• commented on biases in the evolutionary signals from the tree of life used to train pLMs (a favorite paper I read in 2024: shorturl.at/fbC7g)
16.12.2024 06:29 · 8 likes · 1 repost · 1 reply · 0 quotes
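For the first bullet, here is a minimal sketch of what zero-shot scoring with pLM likelihoods can look like: mask each residue, ask a masked protein language model for the probability of the true token, and sum the log-probabilities. The ESM-2 checkpoint, the pseudo-log-likelihood rule, and the example sequence are illustrative assumptions, not necessarily the protocol from the talk.

```python
# Zero-shot variant scoring sketch: rank sequences by masked pseudo-log-likelihood
# under a protein language model (ESM-2 via Hugging Face transformers).
import torch
from transformers import AutoTokenizer, EsmForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")
model = EsmForMaskedLM.from_pretrained("facebook/esm2_t6_8M_UR50D").eval()

@torch.no_grad()
def pseudo_log_likelihood(seq: str) -> float:
    """Mask each position in turn and sum the log-probability of the true residue."""
    ids = tokenizer(seq, return_tensors="pt")["input_ids"]  # [1, L+2] incl. BOS/EOS
    total = 0.0
    for pos in range(1, ids.shape[1] - 1):                  # skip special tokens
        masked = ids.clone()
        masked[0, pos] = tokenizer.mask_token_id
        logp = model(masked).logits[0, pos].log_softmax(-1)
        total += logp[ids[0, pos]].item()
    return total

# Higher score = the sequence looks more "natural" to the pLM, which is the
# zero-shot proxy for a bioassay outcome. The sequence here is a toy example.
print(pseudo_log_likelihood("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"))
```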
Thanks @workshopmlsb.bsky.social for letting us share our work!
bit.ly/plaid-proteins
15.12.2024 22:27 · 22 likes · 2 reposts · 1 reply · 0 quotes
Another straightforward application is generation, either by next-token sampling or MaskGIT-style denoising. We made the tokenized version of CHEAP to do generation, and decided to go with diffusion on continuous embeddings instead, but I think either would've worked.
10.12.2024 01:04 · 6 likes · 0 reposts · 0 replies · 0 quotes
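Since the post above contrasts next-token sampling, MaskGIT-style denoising, and continuous diffusion, here is a toy sketch of the MaskGIT-style option: start from an all-mask canvas and, on a cosine schedule, commit the most confident predictions while re-masking the rest. The `denoiser` stub, vocabulary size, and step count are assumptions standing in for a real bidirectional model over tokenized (e.g. CHEAP) codes.

```python
# Toy MaskGIT-style sampler over discrete tokens.
import math
import torch

VOCAB, MASK = 512, 512          # 512 codebook entries plus one mask id (hypothetical)
L, STEPS = 64, 8                # sequence length, refinement steps

def denoiser(tokens: torch.Tensor) -> torch.Tensor:
    """Stub returning random logits [L, VOCAB]; a real model conditions on tokens."""
    return torch.randn(tokens.shape[0], VOCAB)

tokens = torch.full((L,), MASK)
for step in range(STEPS):
    probs = denoiser(tokens).softmax(-1)
    conf, pred = probs.max(-1)                  # per-position confidence and guess
    conf[tokens != MASK] = float("inf")         # never re-mask committed tokens
    frac_masked = math.cos(math.pi / 2 * (step + 1) / STEPS)  # cosine schedule
    keep = conf.topk(L - int(frac_masked * L)).indices
    new = torch.full((L,), MASK)
    new[keep] = torch.where(tokens[keep] != MASK, tokens[keep], pred[keep])
    tokens = new
print(tokens)  # fully committed discrete codes after the final step
```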
We trained a model to co-generate protein sequence and structure by working in the ESMFold latent space, which encodes both. PLAID only requires sequences for training but generates all-atom structures!
Really proud of @amyxlu.bsky.social's effort leading this project end-to-end!
09.12.2024 14:58 · 57 likes · 11 reposts · 2 replies · 0 quotes
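A minimal sketch of the training side as the post describes it, i.e. sequences are enough: a frozen encoder (standing in for ESMFold's latent trunk) maps tokens to embeddings, and a denoiser is trained with the standard epsilon-prediction diffusion objective. Every module, shape, and schedule below is an illustrative stub, not the PLAID code.

```python
# "Sequences only" training sketch: frozen encoder -> latents -> diffusion loss.
import torch
import torch.nn as nn

D, T = 128, 1000                                   # latent dim, diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)      # cumulative product \bar{alpha}_t

frozen_encoder = nn.Embedding(33, D)               # stub for the ESMFold latent trunk
frozen_encoder.requires_grad_(False)
denoiser = nn.Sequential(nn.Linear(D, 256), nn.GELU(), nn.Linear(256, D))
opt = torch.optim.AdamW(denoiser.parameters(), lr=1e-4)

def train_step(seq_tokens: torch.Tensor) -> float:
    x0 = frozen_encoder(seq_tokens)                # [L, D] latents from sequence alone
    t = torch.randint(0, T, (1,))
    eps = torch.randn_like(x0)
    ab = alpha_bar[t]
    xt = ab.sqrt() * x0 + (1 - ab).sqrt() * eps    # forward process q(x_t | x_0)
    loss = ((denoiser(xt) - eps) ** 2).mean()      # predict the injected noise
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

print(train_step(torch.randint(0, 33, (64,))))     # one step on a random toy sequence
```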
immensely grateful for awesome collaborators on this work: Wilson Yan, Sarah Robinson, @kevinkaichuang.bsky.social, Vladimir Gligorijevic, @kyunghyuncho.bsky.social, Rich Bonneau, Pieter Abbeel, @ncfrey.bsky.social 🫶
06.12.2024 17:44 · 3 likes · 0 reposts · 0 replies · 0 quotes
6/ We'll get to share PLAID as an oral presentation at MLSB next week 🥳 In the meantime, check out:
Preprint: biorxiv.org/content/10.1...
Code: github.com/amyxlu/plaid
Weights: huggingface.co/amyxlu/plaid...
Website: amyxlu.github.io/plaid/
Server: coming soon!
06.12.2024 17:44 · 6 likes · 1 repost · 1 reply · 0 quotes
conditioning on organism and function shows that PLAID has learned active site residues and sidechain positions!
5/ ...and when prompted by function, PLAID learns sequence motifs at active sites & directly outputs sidechain positions, which backbone-only methods such as RFDiffusion can't do out of the box.
The residues aren't directly adjacent, suggesting that the model isn't simply memorizing training data:
06.12.2024 17:44 · 5 likes · 2 reposts · 1 reply · 0 quotes
unconditional generations from PLAID
4/ On unconditional generation, PLAID generates high-quality and diverse structures, especially at longer sequence lengths where previous methods underperform...
06.12.2024 17:44 · 5 likes · 0 reposts · 1 reply · 0 quotes
noising by a diffusion schedule in the latent space doesn't always correspond to the same corruption in the sequence and structure space...
3/ I was pretty stuck until building out the CHEAP (bit.ly/cheap-proteins) autoencoders that compressed & smoothed out the latent space: interestingly, gradual noise added to the ESMFold latent space doesn't actually corrupt the sequence and structure until the final forward diffusion timesteps 🤔
06.12.2024 17:44 · 4 likes · 0 reposts · 1 reply · 0 quotes
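A sketch of the kind of probe behind that observation: corrupt a latent with the forward process q(x_t | x_0) at increasing t and check when the decoded output actually changes. The random latent and linear stub decoder are assumptions; with the real frozen ESMFold decoder this is where the "nothing corrupts until the final timesteps" effect would show up.

```python
# Probe: how much does forward-diffusion noise at time t change the decoded sequence?
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1 - betas, 0)

torch.manual_seed(0)
PROJ = torch.randn(128, 20)                        # stub frozen "sequence decoder"

def decode(x: torch.Tensor) -> torch.Tensor:
    """Stub decoder: project latents to 20 amino-acid logits, take the argmax."""
    return (x @ PROJ).argmax(-1)

x0 = torch.randn(64, 128)                          # stand-in ESMFold latent [L, D]
clean = decode(x0)
for t in [100, 500, 900, 999]:
    ab = alpha_bar[t]
    xt = ab.sqrt() * x0 + (1 - ab).sqrt() * torch.randn_like(x0)  # q(x_t | x_0)
    same = (decode(xt) == clean).float().mean().item()
    print(f"t={t}: {same:.0%} of decoded residues unchanged")
```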
how does the PLAID approach work?
2/💡 Co-generating sequence and structure is hard. A key insight is that to get embeddings of the ESMFold latent space during training, we only need sequence inputs.
For inference, we can sample latent embeddings & use frozen sequence/structure decoders to get all-atom structure:
06.12.2024 17:44 · 4 likes · 0 reposts · 1 reply · 0 quotes
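The inference recipe from the post above, sketched end to end: ancestral DDPM sampling in the latent space, then frozen decoders map the final latent to sequence and coordinates. The denoiser and both decoders are stubs, and the DDPM update is just one sampler choice; PLAID itself uses ESMFold's frozen heads.

```python
# Inference sketch: reverse diffusion in latent space + frozen decoders.
import torch
import torch.nn as nn

D, T = 128, 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1 - betas
alpha_bar = torch.cumprod(alphas, 0)

denoiser = nn.Linear(D, D)                 # stub epsilon-predictor
seq_decoder = nn.Linear(D, 20)             # stub frozen sequence head
struct_decoder = nn.Linear(D, 3)           # stub frozen structure head -> xyz

@torch.no_grad()
def sample(length: int):
    x = torch.randn(length, D)             # x_T ~ N(0, I)
    for t in range(T - 1, -1, -1):         # ancestral DDPM reverse steps
        eps = denoiser(x)
        mean = (x - betas[t] / (1 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    aa = seq_decoder(x).argmax(-1)         # residues from the frozen sequence head
    xyz = struct_decoder(x)                # coordinates from the frozen structure head
    return aa, xyz

seq, coords = sample(64)
print(seq.shape, coords.shape)             # torch.Size([64]) torch.Size([64, 3])
```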
overview of results for PLAID!
1/𧬠Excited to share PLAID, our new approach for co-generating sequence and all-atom protein structures by sampling from the latent space of ESMFold. This requires only sequences during training, which unlocks more data and annotations:
bit.ly/plaid-proteins
🧵
06.12.2024 17:44 · 121 likes · 37 reposts · 1 reply · 3 quotes
biology 🧬, microscopes 🔬, and computers 🖥️. Preferably all at once.
this bio is left as an exercise to the reader
emilyliu.me
Stanford Bioengineering PhD candidate / Biological AI in Brian Hie's lab at Arc Institute
https://samuelking.cargo.site
enjoying and bemoaning biology. phd student
@columbia prev. @harvardmed @ginkgo @yale
Research Engineer at Google DeepMind. AlphaFold, LLMs, Physics and Civic Tech. tfgg.me
Learning from algorithms that learn from data about gene regulation.
Founder/CEO @ AINovo Biotech. Prev: AI + Genomics @ Stanford Computer Science, computational drug discovery @ ETH, Harvard. AI Optimist for Human Health.
phd student @ berkeley ai, vision + language
https://people.eecs.berkeley.edu/~graceluo
https://x.com/graceluo_
Scientist, #MachineLearning and #AI for Molecular Sciences. Scuba Diver. Loves @cecclementi.bsky.social
Research Scientist @prescientdesign @Genentech Former PhD @umdcs
ayaismail.com
Studying genomics, machine learning, and fruit. My code is like our genomes -- most of it is junk.
Assistant Professor UMass Chan
Previously IMP Vienna, Stanford Genetics, UW CSE.
Doudna Lab at UC Berkeley, Innovative Genomics Institute founder, CRISPR co-inventor and Nobel laureate innovativegenomics.org/
@_angie_chen at the other place
PhD student @NYU, formerly at @Princeton
Interested in LLMs/NLP, pastries, and running. She/her.
AI researcher in music, audio, LLMs.
Bioinformatics PhD Student in the Ren lab at UCSD. Modeling gene regulation across species.