Neil Thomas

@countablyfinite.bsky.social

Research Scientist; AI + Biology thomas-a-neil.github.io

855 Followers  |  432 Following  |  13 Posts  |  Joined: 31.10.2023  |  1.5593

Latest posts by countablyfinite.bsky.social on Bluesky

With Tom Lehrer's passing, I suppose this is a moment to share the story of the prank he played on the National Security Agency, and how it went undiscovered for nearly 60 years.

27.07.2025 21:01 | 👍 8539    🔁 3601    💬 143    📌 715

Stats friends... what would your estimator be if you were interested in a similar question as this study that is lighting Bluesky on fire tonight? 1/x

11.07.2025 00:08 | 👍 23    🔁 3    💬 5    📌 2

1/4
🚀 Announcing the 2025 Protein Engineering Tournament.

This year’s challenge: design PETase enzymes, which degrade the type of plastic in bottles. Can AI-guided protein design help solve the climate crisis? Let’s find out! ⬇️

#AIforBiology #ClimateTech #ProteinEngineering #OpenScience

08.07.2025 16:26 | 👍 22    🔁 20    💬 1    📌 4

We're sponsoring the use of ESM3 and ESMC to help researchers engineer improved PETase enzymes in the @AlignBio 2025 Protein Engineering Tournament.

Get started using ESMC to predict protein function and ESM3 to generate new enzymes here: github.com/evolutionary...

08.07.2025 18:01 | 👍 7    🔁 2    💬 0    📌 1

Today I remembered my first QM parameterization of a small molecule failed miserably (turn volume ON for a full experience)

26.03.2025 21:18 | 👍 29    🔁 4    💬 3    📌 1

NIH funding supporting the HMMER and Infernal software projects has been terminated. NIH states that our work, as well as all other federally funded research at Harvard, is of no benefit to the US.

22.05.2025 12:42 | 👍 286    🔁 232    💬 37    📌 46
Learning millisecond protein dynamics from what is missing in NMR spectra
Many proteins’ biological functions rely on interconversions between multiple conformations occurring at micro- to millisecond (µs-ms) timescales. A lack of standardized, large-scale experimental data ...

Next Tues (4/29) at **4:30PM** ET, we will have @ginaelnesr.bsky.social and @hkws.bsky.social present "Learning millisecond protein dynamics from what is missing in NMR spectra"

Paper: biorxiv.org/content/10.1...

Sign up on our website for zoom links!

22.04.2025 21:08 | 👍 19    🔁 11    💬 0    📌 2

Thrilled to see my digital art on the cover of Trends Genet. The two binary strings represent reverse-complementary DNA sequences (00=A, 01=C, 10=G, 11=T) and the connecting rectangles represent “embeddings” learned by DNA language models. Pls check out our article as well: doi.org/10.1016/j.ti...

07.04.2025 15:01 | 👍 69    🔁 13    💬 0    📌 1
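
The 2-bit encoding mentioned in the post above can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the article: the function names (`to_bits`, `from_bits`, `reverse_complement_bits`) are hypothetical, and the only thing taken from the post is the mapping 00=A, 01=C, 10=G, 11=T. One property that follows from this mapping is that complementary bases have bit-inverted codes (A=00 and T=11, C=01 and G=10), so a reverse complement can be computed by flipping every bit and reversing the order of the 2-bit pairs.

```python
# Mapping taken from the post: 00=A, 01=C, 10=G, 11=T.
# Everything else (function names, structure) is a hypothetical illustration.
ENCODE = {"A": "00", "C": "01", "G": "10", "T": "11"}
DECODE = {bits: base for base, bits in ENCODE.items()}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_bits(seq: str) -> str:
    """Encode a DNA string as a binary string, two bits per base."""
    return "".join(ENCODE[base] for base in seq.upper())


def from_bits(bits: str) -> str:
    """Decode a binary string (two bits per base) back into DNA."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string length must be even")
    return "".join(DECODE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def reverse_complement(seq: str) -> str:
    """Reverse complement computed on the letters, for comparison."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def reverse_complement_bits(bits: str) -> str:
    """Reverse complement computed directly on the bit string.

    Flipping every bit complements each base under this encoding;
    reversing the order of the 2-bit pairs reverses the sequence.
    """
    if len(bits) % 2 != 0:
        raise ValueError("bit string length must be even")
    flipped = "".join("1" if b == "0" else "0" for b in bits)
    pairs = [flipped[i:i + 2] for i in range(0, len(flipped), 2)]
    return "".join(reversed(pairs))
```

For example, `to_bits("AAGC")` gives `"00001001"`, and applying `reverse_complement_bits` to that string gives the same bits as `to_bits(reverse_complement("AAGC"))`.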

Small proteins can be more complex than they look!

We know proteins fluctuate between different conformations- but by how much? How does it vary from protein to protein? Can highly stable domains have low stability segments? @ajrferrari.bsky.social experimentally tested >5,000 domains to find out!

26.03.2025 16:21 | 👍 84    🔁 36    💬 4    📌 0

Gene synthesis is often the most expensive part of protein engineering with generative models.

Happy to have played a small part in this work, where Chase developed a method for precision library construction at scale, with per-gene costs as low as $1.50.

@philromero.bsky.social

24.03.2025 17:24 | 👍 63    🔁 23    💬 1    📌 0
Scalable and cost-efficient custom gene library assembly from oligopools
Advances in metagenomics, deep learning, and generative protein design have enabled broad in silico exploration of sequence space, but experimental characterization is still constrained by the cost an...

🎉 Congrats to Chase on her new preprint! She developed OMEGA, a simple method for assembling custom gene panels for as little as $1.50 per gene. Big step forward for protein engineering and design! 🧬
www.biorxiv.org/content/10.1...

24.03.2025 16:50 | 👍 57    🔁 14    💬 2    📌 3

So exciting to think what we will be able to do as we pair scaled library assembly techniques like these with ML-designed libraries and high throughput screening!

24.03.2025 17:38 | 👍 4    🔁 0    💬 0    📌 0

Protein dynamics was the first research to enchant me >10yrs ago, but I left in PhD bc I couldn't find big experimental data to evaluate models.

Today w @ginaelnesr.bsky.social, I'm thrilled to share the big dynamics data I've been dreaming of, and the mdl we trained w them: Dyna-1.
๐Ÿ“: rb.gy/de5axp

20.03.2025 15:02 | 👍 85    🔁 25    💬 2    📌 1

Protein function often depends on protein dynamics. To design proteins that function like natural ones, how do we predict their dynamics?

@hkws.bsky.social and I are thrilled to share the first big, experimental datasets on protein dynamics and our new model: Dyna-1!

🧵

20.03.2025 15:02 | 👍 102    🔁 38    💬 5    📌 5
GitHub - google-deepmind/nuclease_design: ML-guided enzyme engineering
ML-guided enzyme engineering. Contribute to google-deepmind/nuclease_design development by creating an account on GitHub.

All of our data is available! We released a deeply sampled, 55k variant library of NucB’s enzymatic function.

Get started here: github.com/google-deepm...

12.03.2025 17:18 | 👍 1    🔁 0    💬 0    📌 0
Engineering highly active and diverse nuclease enzymes by ML and high-throughput screening
YouTube video by ML for protein engineering seminar series

If you’re interested in learning more, check out our @ml4proteins.bsky.social seminar talk on this work

www.youtube.com/watch?v=eGNE...

12.03.2025 17:18 | 👍 1    🔁 0    💬 1    📌 0

This was a collaborative effort between myself, David Belanger, Lucy Colwell and our whole team: Chenling Xu, Hanson Lee, Kathleen Hirano, Kosuke Iwai, Vanja Polic, Kendra Nyberg, Kevin Hoff, Lucas Frenz, Charlie Emrich, Jun Kim, Mariya Chavarha, Abi Ramanan, Jeremy Agresti

12.03.2025 17:18 | 👍 0    🔁 0    💬 1    📌 0

This campaign was completed in 2021! Since then, the field has evolved tremendously. We’re excited about work that pushes forward:
1) Multi-objective optimization
2) Generative models (e.g. ESM3, ProGen, RFDiffusion)
3) Synergy with randomized library design
… to name a few

12.03.2025 17:18 | 👍 1    🔁 0    💬 1    📌 0

Multiple Sequence Alignments (MSAs) were also powerful for zero-shot design! Without any assay data, and even without structure or large-scale pretraining, we were able to design improved NucB variants with as many as 9 mutations from the wildtype.

12.03.2025 17:18 | 👍 0    🔁 0    💬 1    📌 0

We found that in a head-to-head comparison of ML-guided design versus high-throughput directed evolution, our ML system could design higher activity variants, with lots more diversity!

12.03.2025 17:18 | 👍 0    🔁 0    💬 1    📌 0

How did we engineer NucB? We used a variety of methods, encapsulated in our “TeleProt” framework (which “teleports” past holes in the protein fitness landscape!) We balance evolutionary (VAEs) and assay-labeled data (CNNs) when designing libraries, and adjust as data accumulates

12.03.2025 17:18 | 👍 0    🔁 0    💬 1    📌 0

Our protein engineering target was NucB, an endonuclease that can break down biofilms and has potential as a treatment for chronic wounds. It is active at pH 9, but needs to be highly active at pH 7 to unlock its potential as a therapeutic.

12.03.2025 17:18 | 👍 0    🔁 0    💬 1    📌 0
Engineering highly active nuclease enzymes with machine learning and high-throughput screening
Thomas et al. introduce TeleProt, a framework for guiding protein library design with machine learning, and validate it in an enzyme engineering campaign to optimize the endonuclease NucB. Across 4 ro...

So proud to see our work on machine learning + enzyme design just published! www.cell.com/cell-systems...

Fun collaboration between Google X, DeepMind, and Triplebar that we hope can be a template for integrating ML and high throughput screening in protein engineering

12.03.2025 17:18 | 👍 27    🔁 6    💬 2    📌 2

How did we engineer NucB? We used a variety of methods, encapsulated in our “TeleProt” framework (which “teleports” past holes in the protein fitness landscape!) We balance evolutionary (VAEs) and assay-labeled data (CNNs) when designing libraries, and adjust as data accumulates

12.03.2025 17:02 | 👍 0    🔁 0    💬 0    📌 0

Our protein engineering target was NucB, an endonuclease that can break down biofilms and has potential as a treatment for chronic wounds. It is active at pH 9, but needs to be highly active at pH 7 to unlock its potential as a therapeutic.

12.03.2025 17:02 | 👍 0    🔁 0    💬 1    📌 0

I'll be in Vancouver for NeurIPS / MLSB from Dec 13-16! If you're interested in meeting up, especially to discuss protein language models, reach out! :)

06.12.2024 22:57 | 👍 4    🔁 0    💬 0    📌 0
Model Scale vs. Performance curves for ESM C models, with comparisons to ESM2 and other protein LMs. ESMC performs better than existing state of the art for the same model parameter scale.


Introducing ESM Cambrian, a new family of protein language models, focused on creating representations of the underlying biology of proteins.

04.12.2024 17:45 | 👍 52    🔁 16    💬 1    📌 2
Ultrafast classical phylogenetic method beats large protein...
Amino acid substitution rate matrices are fundamental to statistical phylogenetics and evolutionary biology. Estimating them typically requires reconstructed trees for massive amounts of aligned...

Large protein language models can learn complex epistatic interactions, but how much does that help with predicting variant effects? In this NeurIPS article, we show that classical independent-sites phylogenetic models can outperform pLMs on this task.
1/7
openreview.net/forum?id=H7m...

16.11.2024 20:41 | 👍 92    🔁 44    💬 2    📌 2

@countablyfinite is following 20 prominent accounts