
Anthony Gitter

@anthonygitter.bsky.social

Computational biologist; Associate Prof. at University of Wisconsin-Madison; Jeanne M. Rowe Chair at Morgridge Institute

79 Followers  |  36 Following  |  34 Posts  |  Joined: 03.04.2025

Latest posts by anthonygitter.bsky.social on Bluesky

Ancient amino acid sets enable stable protein folds
Early proteins likely arose from a chemically limited set of amino acids available through prebiotic chemistry, raising a central question in molecular evolution: could such primitive compositions yie...

Can proteins fold and function with half of the amino acid alphabet?
Using only 10 residues, we designed stable, mutation-resilient structures: no aromatics or basics involved.
A minimalist foundation for ancient biology and synthetic design. tinyurl.com/37t8br4v
#ProteinDesign #OriginsOfLife

03.11.2025 16:48 | 👍 25    🔁 11    💬 1    📌 0
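(Not from the preprint, just a toy aside: checking whether a sequence stays inside a reduced 10-letter alphabet is a one-liner. The residue set below is a commonly cited "early" alphabet and is my assumption; the paper's exact set may differ.)

```python
# Hypothetical 10-residue "early" alphabet (no aromatics, no basics);
# the preprint's exact set may differ from this assumption.
EARLY_ALPHABET = set("ADEGILPSTV")

def uses_reduced_alphabet(seq: str, alphabet: set = EARLY_ALPHABET) -> bool:
    """Return True if every residue in seq belongs to the reduced alphabet."""
    return set(seq.upper()) <= alphabet

print(uses_reduced_alphabet("GADSLTEVIP"))  # True
print(uses_reduced_alphabet("GWKRHFY"))     # False: aromatics/basics present
```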

Mingchen replied to me on Twitter that it's also on bioRxiv now www.biorxiv.org/content/10.6...

23.01.2026 15:41 | 👍 2    🔁 0    💬 0    📌 0
Mirdita Lab - Laboratory for Computational Biology & Molecular Machine Learning
Mirdita Lab builds scalable bioinformatics methods.

My time in @martinsteinegger.bsky.social's group is ending, but I'm staying in Korea to build a lab at Sungkyunkwan University School of Medicine. If you or someone you know is interested in molecular machine learning and open-source bioinformatics, please reach out. I am hiring!
mirdita.org

20.01.2026 11:07 | 👍 105    🔁 54    💬 7    📌 1
Know when to co-fold'em
This is the official web page for the James Fraser Lab at UCSF.

I'm really excited to break up the holiday relaxation time with a new preprint that benchmarks AlphaFold3 (AF3)/"co-folding" methods with 2 new stringent performance tests.

Thread below - but first some links:
A longer take:
fraserlab.com/2025/12/29/k...

Preprint:
www.biorxiv.org/content/10.6...

29.12.2025 22:25 | 👍 72    🔁 30    💬 5    📌 2

New preprint 🚨
Imagine (re)designing a protein via inverse folding. AF2 predicts the designed sequence to a structure with pLDDT 94 & you get 1.8 Å RMSD to the input. Perfect design?
What if I told u that the structure has 4 solvent-exposed Trp and 3 Pro where a Gly should be?

Why to be wary 🧵👇

16.12.2025 15:15 | 👍 57    🔁 21    💬 4    📌 1
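(A minimal sketch of the kind of structural sanity check this thread argues for, using Biopython's Shrake-Rupley implementation to flag solvent-exposed tryptophans in a predicted structure. The file name and the 50 Å² exposure cutoff are my assumptions, not the thread's exact criteria.)

```python
# Sketch: flag solvent-exposed tryptophans in a designed structure.
# Requires Biopython >= 1.79; the 50 A^2 cutoff is an arbitrary assumption.
from Bio.PDB import PDBParser
from Bio.PDB.SASA import ShrakeRupley

structure = PDBParser(QUIET=True).get_structure("design", "design.pdb")
ShrakeRupley().compute(structure, level="R")  # per-residue SASA

exposed_trp = [
    (res.get_parent().id, res.id[1])          # (chain, residue number)
    for res in structure.get_residues()
    if res.get_resname() == "TRP" and res.sasa > 50.0
]
print(f"{len(exposed_trp)} solvent-exposed Trp:", exposed_trp)
```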
GitHub - AnantharamanLab/protein_set_transformer: Protein Set Transformer (PST) framework for training protein-language-model-based genome language models. Inference is possible for viral genomes using our pretrained viral foundation model.

Cody also put in a ton of extra work to make the code organized and usable in the GitHub repo: github.com/Anantharaman...

It links to a Colab notebook for model inference, training data, and pretrained models.

15.12.2025 21:55 | 👍 1    🔁 1    💬 0    📌 0
Protein Set Transformer: a protein-based genome language model to power high-diversity viromics - Nature Communications
A genome language model, Protein Set Transformer, trained on viral datasets, uncovers evolutionary rules of protein content and organization driving precise virus identification, host prediction, and ...

Excited for our new paper on a genome language model for viruses in @natcomms.nature.com: "Protein Set Transformer: a protein-based genome language model to power high-diversity viromics"! Led by PhD student Cody Martin in collaboration with @anthonygitter.bsky.social

doi.org/10.1038/s414...

15.12.2025 18:40 | 👍 10    🔁 4    💬 1    📌 0

Thanks, I didn't realize Rogue Scholar minted DOIs

12.12.2025 20:22 | 👍 0    🔁 0    💬 0    📌 0

Use @prereview.bsky.social for preprints and something else for other manuscripts?

12.12.2025 16:49 | 👍 1    🔁 0    💬 0    📌 0

What are good places to post an unsolicited manuscript peer review these days? I don't have a blog. I read manuscripts across arXiv, bioRxiv, ChemRxiv, OpenReview, random white papers, journals, etc. Do I dump it on Zenodo, post it here, and send it to the authors?

12.12.2025 16:49 | 👍 0    🔁 0    💬 2    📌 0

Our Assay2Mol manuscript was published at EMNLP 2025: doi.org/10.18653/v1/...

See the preprint thread below for a summary of the methodology, results, and code. We added more control experiments in this version related to protein sequence identity and generated molecule size.

21.11.2025 15:19 | 👍 0    🔁 0    💬 0    📌 0

@hkws.bsky.social and I are creating the Madison AI for Proteins (MAIP) group to discuss early-stage research at monthly meetups, share computational resources, and grow this local community. Visit mad-ai-proteins.github.io to sign up for announcements and watch for our 2026 events.

20.11.2025 16:29 | 👍 1    🔁 0    💬 0    📌 0

This looks like a fantastic resource to study human kinase signalling. So much MS instrument time.

19.11.2025 06:18 | 👍 13    🔁 3    💬 0    📌 0

Something fun and sciencey is coming soon to Madison

14.11.2025 17:20 | 👍 0    🔁 0    💬 0    📌 1
Protein Language Model Fitness Is a Matter of Preference
Leveraging billions of years of evolution, scientists have trained protein language models (pLMs) to understand the sequence and structure space of proteins aiding in the design of more functional pro...

Looks very interesting. Can I think of this like a more extreme form of the evotuning from UniRep or doi.org/10.1101/2024... except it uses one sequence instead of the sequence plus homologs?

23.10.2025 22:23 | 👍 3    🔁 0    💬 1    📌 0
MPAC
Multi-omic Pathway Analysis of Cells (MPAC) integrates multi-omic data for understanding cellular mechanisms. It predicts novel patient groups with distinct pathway profiles as well as identifying ke...

Bioconductor R package: bioconductor.org/packages/MPAC

Shiny app to explore results in manuscript: connect.doit.wisc.edu/content/122/

10.10.2025 14:56 | 👍 0    🔁 0    💬 0    📌 0

MPAC uses PARADIGM as the probabilistic model but makes many improvements:
- data-driven omic data discretization
- permutation testing to eliminate spurious predictions
- full workflow and downstream analyses in an R package
- Shiny app for interactive visualization

10.10.2025 14:56 | 👍 0    🔁 0    💬 1    📌 0
Overview of the MPAC workflow. MPAC calculates inferred pathway levels (IPLs) from real and permuted CNA and RNA data. It filters real IPLs using the permuted IPLs to remove spurious IPLs. Then, MPAC focuses on the largest pathway subset network with filtered IPLs to compute GO term enrichment, predict patient groups, and identify key group-specific proteins.

The journal version of our Multi-omic Pathway Analysis of Cells (MPAC) software is now out: doi.org/10.1093/bioi...

MPAC uses biological pathway graphs to model DNA copy number and gene expression changes and infer activity states of all pathway members.

10.10.2025 14:56 | 👍 2    🔁 1    💬 1    📌 0
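(MPAC itself ships as an R/Bioconductor package; purely as a sketch of the permutation-filtering idea described in the workflow caption above, assuming IPLs arrive as NumPy arrays and using a simple two-sided percentile cutoff, neither of which is MPAC's exact procedure, the logic is roughly:)

```python
import numpy as np

def filter_ipls(real_ipl, perm_ipls, alpha=0.05):
    """Zero out inferred pathway levels (IPLs) that fall inside the
    null distribution built from permuted data.

    real_ipl  : (n_entities,) IPLs from real CNA/RNA data
    perm_ipls : (n_perms, n_entities) IPLs from permuted data
    """
    lo = np.quantile(perm_ipls, alpha / 2, axis=0)
    hi = np.quantile(perm_ipls, 1 - alpha / 2, axis=0)
    keep = (real_ipl < lo) | (real_ipl > hi)  # outside the permutation null
    return np.where(keep, real_ipl, 0.0)

rng = np.random.default_rng(0)
real = rng.normal(size=20)
perm = rng.normal(size=(1000, 20))
print(filter_ipls(real, perm))  # spurious IPLs are zeroed
```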
🧬 Use ESMFold Online | Neurosnap
Bulk protein structure prediction model that only requires a single amino acid sequence as input. Much faster than AlphaFold2 since no MSAs are required (but slightly less accurate too).

I found out that Neurosnap offers ESMFold via API neurosnap.ai/service/ESMF...

I may test how many calls are possible with the free academic plan to see if it is worthwhile to update my repo.

09.10.2025 02:25 | 👍 2    🔁 1    💬 0    📌 0
Biophysics-based protein language models for protein engineering - Nature Methods
Mutational effect transfer learning (METL) is a protein language model framework that unites machine learning and biophysical modeling. Transformer-based neural networks are pretrained on biophysical simulation data to capture fundamental relationships between protein sequence, structure and energetics.

AI + physics for protein engineering 🚀
Our collaboration with @anthonygitter.bsky.social is out in Nature Methods! We use synthetic data from molecular modeling to pretrain protein language models. Congrats to Sam Gelman and the team!
🔗 www.nature.com/articles/s41...

01.10.2025 19:07 | 👍 5    🔁 1    💬 0    📌 0

Does anyone know whether there's a functioning API to ESMFold?

(api.esmatlas.com/foldSequence... gives me Service Temporarily Unavailable)

30.09.2025 14:11 | 👍 3    🔁 1    💬 2    📌 0
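(For reference, the documented usage of that endpoint is a plain POST with the raw sequence as the request body, returning a PDB file when the service is up; a minimal call looks like this.)

```python
# Minimal call to the ESM Atlas folding endpoint (when the service is up).
import requests

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQ"
resp = requests.post(
    "https://api.esmatlas.com/foldSequence/v1/pdb/",
    data=sequence,
    timeout=120,
)
resp.raise_for_status()  # raises on 503 Service Temporarily Unavailable
with open("prediction.pdb", "w") as fh:
    fh.write(resp.text)   # response body is the predicted PDB structure
```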
GitHub - gitter-lab/metl: Mutational Effect Transfer Learning (METL) framework for pretraining and finetuning biophysics-informed protein language models

The main GitHub repo github.com/gitter-lab/m... links to the extensive resources for running Rosetta simulations at scale to generate new training data, training METL models, running our models, and accessing our datasets. 8/

11.09.2025 17:00 | 👍 0    🔁 0    💬 0    📌 0
Fig. 6: Low-N GFP design.

We can use METL for low-N protein design. We trained METL on Rosetta simulations of GFP biophysical attributes and only 64 experimental examples of GFP brightness. It designed fluorescent variants with 5 and 10 mutations, including some with mutations entirely outside those in the training set. 7/

11.09.2025 17:00 | 👍 0    🔁 0    💬 1    📌 0
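(METL's actual fine-tuning code is in the gitter-lab repos linked in the repo post above; as a generic sketch of the low-N setup, a ridge head over fixed pretrained embeddings is the simplest baseline. The embed function below is a hypothetical placeholder, not METL's API, and the data are random.)

```python
# Generic low-N sketch: ridge regression on fixed pretrained embeddings.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def embed(seqs):
    """Placeholder for a pretrained encoder (e.g. METL); returns (n, d)."""
    return rng.normal(size=(len(seqs), 256))

train_X = embed([f"variant_{i}" for i in range(64)])  # 64 labeled examples
train_y = rng.normal(size=64)                         # e.g. GFP brightness

model = Ridge(alpha=1.0).fit(train_X, train_y)
scores = model.predict(embed([f"design_{i}" for i in range(10)]))
print("best candidate index:", int(np.argmax(scores)))
```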
Fig. 5: Function-specific simulations improve METL pretraining for GB1.

A powerful aspect of pretraining on biophysical simulations is that the simulations can be customized to match the protein function and experimental assay. Our expanded simulations of the GB1-IgG complex with Rosetta InterfaceAnalyzer improve METL predictions of GB1 binding. 6/

11.09.2025 17:00 | 👍 0    🔁 0    💬 1    📌 0
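(For readers who script Rosetta from Python, the same kind of interface analysis can be driven through PyRosetta's InterfaceAnalyzerMover; a sketch assuming a two-chain complex in complex.pdb with the interface between chains A and B, not the paper's exact protocol or command line.)

```python
# Sketch: scoring a binding interface with PyRosetta's InterfaceAnalyzerMover.
# Assumes a two-chain complex in complex.pdb with the interface between
# chains A and B; requires a PyRosetta install.
from pyrosetta import init, pose_from_pdb
from pyrosetta.rosetta.protocols.analysis import InterfaceAnalyzerMover

init("-mute all")
pose = pose_from_pdb("complex.pdb")

iam = InterfaceAnalyzerMover("A_B")  # interface between chains A and B
iam.apply(pose)
print("interface dG:", iam.get_interface_dG())
```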
Fig. 3: Comparative performance across extrapolation tasks.

We also benchmark METL on four types of difficult extrapolation. For instance, positional extrapolation provides training data from some sequence positions and tests predictions at different sequence positions. Linear regression completely fails in this setting. 5/

11.09.2025 17:00 | 👍 0    🔁 0    💬 1    📌 0
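(A sketch of how a positional-extrapolation split can be constructed; my illustration of the idea rather than the paper's exact protocol: any variant touching a held-out position goes to the test set.)

```python
# Sketch of a positional-extrapolation split: variants are routed by the
# sequence positions they mutate, not sampled at random.
def positional_split(variants, test_positions):
    """variants: list of variants, each a list of (position, new_aa) mutations."""
    train, test = [], []
    for muts in variants:
        positions = {pos for pos, _ in muts}
        if positions & test_positions:
            test.append(muts)   # touches a held-out position
        else:
            train.append(muts)
    return train, test

variants = [[(3, "A")], [(7, "G"), (12, "S")], [(12, "T")], [(5, "V")]]
train, test = positional_split(variants, test_positions={7, 12})
print(len(train), "train /", len(test), "test")  # 2 train / 2 test
```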
Fig. 2: Comparative performance of Linear, Rosetta total score, EVE, RaSP, Linear-EVE, ESM-2, ProteinNPT, METL-Global and METL-Local across different training set sizes.

We compare these approaches on deep mutational scanning datasets with increasing training set sizes. Biophysical pretraining helps METL generalize well with small training sets. However, augmented linear regression with EVE scores is great on some of these assays. 4/

11.09.2025 17:00 | 👍 0    🔁 0    💬 1    📌 0
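(The augmented linear regression baseline concatenates simple sequence features with an evolutionary score; a sketch under my assumptions, with one-hot mutation features plus one EVE-score column and random placeholder data:)

```python
# Sketch of EVE-augmented linear regression: one-hot variant features
# plus the variant's EVE score as one extra input column.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_variants, n_features = 200, 100
onehot = rng.integers(0, 2, size=(n_variants, n_features)).astype(float)
eve_scores = rng.normal(size=(n_variants, 1))  # placeholder EVE scores
y = rng.normal(size=n_variants)                # placeholder assay measurements

X = np.hstack([onehot, eve_scores])            # the augmentation step
model = Ridge(alpha=1.0).fit(X, y)
print("EVE column weight:", model.coef_[-1])
```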

METL models pretrained on Rosetta biophysical attributes learn different protein representations than general protein language models like ESM-2 or protein family-specific models like EVE. These new representations are valuable for machine learning-guided protein engineering. 3/

11.09.2025 17:00 | 👍 1    🔁 0    💬 1    📌 0

Most protein language models train on natural protein sequence data and use the underlying evolutionary signals to score sequence variants. Instead, METL trains on @rosettacommons.bsky.social data, learning from simulated biophysical attributes of the sequence variants we select. 2/

11.09.2025 17:00 | 👍 0    🔁 0    💬 1    📌 0
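(For a concrete sense of the representations being compared, general pLM embeddings like ESM-2's can be extracted with the fair-esm package; a minimal mean-pooling sketch:)

```python
# Minimal sketch: mean-pooled ESM-2 embedding for one sequence (fair-esm).
import torch
import esm

model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
model.eval()
batch_converter = alphabet.get_batch_converter()

_, _, tokens = batch_converter([("protein1", "MKTVRQERLKSIVRILERSKEPVSGAQ")])
with torch.no_grad():
    out = model(tokens, repr_layers=[33])
embedding = out["representations"][33][0, 1:-1].mean(dim=0)  # drop BOS/EOS
print(embedding.shape)  # torch.Size([1280])
```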
Biophysics-based protein language models for protein engineering - Nature Methods
Mutational effect transfer learning (METL) is a protein language model framework that unites machine learning and biophysical modeling. Transformer-based neural networks are pretrained on biophysical ...

The journal version of "Biophysics-based protein language models for protein engineering" with @philromero.bsky.social is live! Mutational Effect Transfer Learning (METL) is a protein language model trained on biophysical simulations that we use for protein engineering. 1/

doi.org/10.1038/s415...

11.09.2025 17:00 | 👍 13    🔁 2    💬 1    📌 0
Chemical Language Model Linker: Blending Text and Molecules with Modular Adapters
The development of large language models and multimodal models has enabled the appealing idea of generating novel molecules from text descriptions. Generative modeling would shift the paradigm from relying on large-scale chemical screening to find molecules with desired properties to directly generating those molecules. However, multimodal models combining text and molecules are often trained from scratch, without leveraging existing high-quality pretrained models. Training from scratch consumes more computational resources and prohibits model scaling. In contrast, we propose a lightweight adapter-based strategy named Chemical Language Model Linker (ChemLML). ChemLML blends the two single domain models and obtains conditional molecular generation from text descriptions while still operating in the specialized embedding spaces of the molecular domain. ChemLML can tailor diverse pretrained text models for molecule generation by training relatively few adapter parameters. We find that the choice of molecular representation used within ChemLML, SMILES versus SELFIES, has a strong influence on conditional molecular generation performance. SMILES is often preferable despite not guaranteeing valid molecules. We raise issues in using the entire PubChem data set of molecules and their associated descriptions for evaluating molecule generation and provide a filtered version of the data set as a generation test set. To demonstrate how ChemLML could be used in practice, we generate candidate protein inhibitors and use docking to assess their quality and also generate candidate membrane permeable molecules.

The journal version of our paper 'Chemical Language Model Linker: Blending Text and Molecules with Modular Adapters' is out doi.org/10.1021/acs....

ChemLML is a method for text-based conditional molecule generation that uses pretrained text models like SciBERT, Galactica, or T5.

22.08.2025 13:36 | 👍 0    🔁 0    💬 0    📌 0
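(On the SMILES validity point the abstract raises: the standard check is whether RDKit can parse the string, and generated molecules that fail it are typically discarded. A minimal sketch:)

```python
# Validity check for generated SMILES: RDKit returns None for unparseable strings.
from rdkit import Chem

generated = ["CCO", "c1ccccc1", "C(("]  # last one is syntactically invalid
valid = [s for s in generated if Chem.MolFromSmiles(s) is not None]
print(f"{len(valid)}/{len(generated)} valid:", valid)
```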
