Anthony Gitter

@anthonygitter.bsky.social

Computational biologist; Associate Prof. at University of Wisconsin-Madison; Jeanne M. Rowe Chair at Morgridge Institute

56 Followers  |  27 Following  |  12 Posts  |  Joined: 03.04.2025

Latest posts by anthonygitter.bsky.social on Bluesky

Assay2Mol: large language model-based drug design using BioAssay context
Scientific databases aggregate vast amounts of quantitative data alongside descriptive text. In biochemistry, molecule screening assays evaluate the functional responses of candidate molecules against...

Paper: arxiv.org/abs/2507.12574
GitHub: github.com/gitter-lab/A...
Datasets: doi.org/10.5281/zeno...

7/

18.07.2025 15:13 · 👍 1  🔁 0  💬 0  📌 0
Distribution of the top 10 docking scores from molecules with high- and low-relevance BioAssays as context for different proteins.

There are many more results and controls in the paper. Here's how the best (most negative) docking scores change when we use relevant assays, irrelevant assays, or no assays as context for generation with GPT-4o. In the majority of cases, but not all, relevant context helps. 6/

18.07.2025 15:13 · 👍 1  🔁 0  💬 1  📌 0
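A minimal sketch of the comparison in post 6/, assuming the docking scores for each context condition are already available as plain lists. Lower (more negative) AutoDock Vina scores indicate stronger predicted binding, so the "best" scores come first after an ascending sort. The function names, the toy scores, and the use of the mean as a one-number summary are illustrative assumptions; the paper plots full distributions, and none of these numbers are its results.

```python
from statistics import mean


def top_k_scores(scores: list[float], k: int = 10) -> list[float]:
    """Return the k best docking scores, best first.

    Lower (more negative) AutoDock Vina scores mean stronger predicted binding,
    so an ascending sort puts the best scores first. If fewer than k scores
    exist, all of them are returned.
    """
    return sorted(scores)[:k]


def summarize_conditions(scores_by_condition: dict[str, list[float]], k: int = 10) -> dict[str, float]:
    """Mean of the top-k scores for each context condition (an illustrative summary)."""
    return {cond: mean(top_k_scores(s, k)) for cond, s in scores_by_condition.items()}


# Made-up Vina scores (kcal/mol) for one hypothetical target; not data from the paper.
scores = {
    "relevant assays": [-9.1, -8.7, -8.5, -7.9, -7.2, -6.8],
    "irrelevant assays": [-7.8, -7.5, -7.1, -6.9, -6.0, -5.5],
    "no assays": [-7.4, -7.0, -6.6, -6.2, -5.9, -5.1],
}
summary = summarize_conditions(scores, k=3)
for condition, value in summary.items():
    print(f"{condition}: {value:.2f}")
```

A single summary number hides the spread, which is why the distribution in the figure is more informative; the mean is used here only to keep the sketch short.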

This generally has the desired effects across multiple LLMs and queried protein targets, with the caveat that our core results are based on AutoDock Vina scores. Assessing generated molecules with docking is admittedly frustrating. 5/

18.07.2025 15:13 · 👍 1  🔁 0  💬 1  📌 0

We embed the BioAssay data into a vector database, retrieve initial candidate assays, and do further LLM-based filtering and summarization. We select some active and inactive molecules from the BioAssay data table. This is all used for in-context learning and molecule generation. 4/

18.07.2025 15:13 · 👍 1  🔁 0  💬 1  📌 0
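The retrieval and example-selection steps in post 4/ can be sketched roughly as follows, assuming each BioAssay description has already been embedded as a vector. The toy three-dimensional vectors, the placeholder assay IDs, cosine similarity as the ranking metric, the "first n" selection rule, and all helper names are hypothetical; the thread does not specify the embedding model, the vector database, or how molecules are picked, and the LLM-based filtering and summarization step is omitted.

```python
import math

# Hypothetical helpers sketching retrieval and example selection; not the Assay2Mol code.


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query: list[float], index: dict[str, list[float]], top_n: int = 3) -> list[str]:
    """Return the IDs of the top_n assays most similar to the query embedding."""
    ranked = sorted(index, key=lambda aid: cosine(query, index[aid]), reverse=True)
    return ranked[:top_n]


def select_examples(table: list[tuple[str, str]], n_active: int = 2, n_inactive: int = 2):
    """Take the first n active and n inactive molecules (SMILES) from an assay table.

    Taking the first rows is an assumption made for brevity.
    """
    actives = [smi for smi, outcome in table if outcome == "active"][:n_active]
    inactives = [smi for smi, outcome in table if outcome == "inactive"][:n_inactive]
    return actives, inactives


# Toy embeddings standing in for pre-embedded BioAssay descriptions.
index = {
    "AID_A": [1.0, 0.0, 0.0],
    "AID_B": [0.7, 0.7, 0.0],
    "AID_C": [0.0, 0.0, 1.0],
}
query = [0.9, 0.1, 0.0]  # toy embedding of the chemist's target description
hits = retrieve(query, index, top_n=2)
print(hits)  # ['AID_A', 'AID_B']

# Toy data table for one retrieved assay: (SMILES, outcome).
table = [("CCO", "inactive"), ("c1ccccc1O", "active"), ("CC(=O)O", "inactive"),
         ("c1ccncc1", "active"), ("CCN", "active")]
actives, inactives = select_examples(table, n_active=2, n_inactive=1)
print(actives, inactives)  # ['c1ccccc1O', 'c1ccncc1'] ['CCO']
```

A real vector database would use an approximate nearest-neighbor index instead of sorting every assay, but with 1.7M assays the ranking idea is the same.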
Data mining of PubChem bioassay records reveals diverse OXPHOS inhibitory chemotypes as potential therapeutic agents against ovarian cancer - Journal of Cheminformatics
Focused screening on target-prioritized compound sets can be an efficient alternative to high throughput screening (HTS). For most biomolecular targets, compound prioritization models depend on prior ...

A proof of concept study from our collaborators showed that mining this PubChem data successfully identified new candidates for a target phenotype, oxidative phosphorylation doi.org/10.1186/s133....

We wanted to generalize that for any new query and assess the effectiveness. 3/

18.07.2025 15:13 · 👍 1  🔁 0  💬 1  📌 0
SSB-PriA antibiotic resistant target AlphaScreen

PubChem BioAssays can contain a lot of information about why and how an assay was run. Here's an example from our collaborators. pubchem.ncbi.nlm.nih.gov/bioassay/127...

There are now 1.7M PubChem BioAssays ranging in scale from a few tested molecules to high-throughput screens. 2/

18.07.2025 15:13 · 👍 1  🔁 0  💬 1  📌 0
The Assay2Mol workflow. A chemist provides a target description, which is used to retrieve BioAssays from the pre-embedded vector database. After filtering for relevance, the BioAssays are summarized by an LLM. The BioAssay ID is then used to retrieve experimental tables. The final molecule generation prompt is formed by combining the description, summarization, and selected test molecules with associated test outcomes, enabling the LLM to generate relevant active molecules.

Our preprint introduces Assay2Mol, which uses PubChem chemical screening data as context when generating molecules with large language models. It uses assay descriptions and protocols to find relevant assays, then uses that text plus active/inactive molecules as context for generation. 1/

18.07.2025 15:13 · 👍 1  🔁 1  💬 1  📌 0
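The final prompt assembly from the workflow caption (target description, assay summaries, and selected test molecules with their outcomes) might look something like this. The section headers, instruction wording, function name, and toy inputs are hypothetical placeholders, not the actual Assay2Mol prompt template.

```python
def build_generation_prompt(
    target_description: str,
    assay_summaries: list[str],
    tested_molecules: list[tuple[str, str]],
) -> str:
    """Combine the three context sources named in the workflow into one prompt string.

    The section headers and instruction wording are hypothetical placeholders.
    """
    lines = ["Target description:", target_description, "", "Relevant BioAssay summaries:"]
    lines += [f"- {summary}" for summary in assay_summaries]
    lines += ["", "Tested molecules (SMILES<TAB>outcome):"]
    lines += [f"{smiles}\t{outcome}" for smiles, outcome in tested_molecules]
    lines += ["", "Propose new molecules likely to be active against this target, one SMILES per line."]
    return "\n".join(lines)


# Toy inputs; the summary would come from the LLM summarization step in practice.
prompt = build_generation_prompt(
    "Toy description of a hypothetical protein target.",
    ["Toy summary of a retrieved screening assay."],
    [("c1ccccc1O", "active"), ("CCO", "inactive")],
)
print(prompt)
```

Keeping active and inactive molecules side by side with explicit outcomes is what lets the LLM use them as in-context examples rather than just a list of structures.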

Nobody is commenting on this little nugget from Fig 1?

18.07.2025 14:55 · 👍 34  🔁 5  💬 3  📌 1
MLCB
The 20th Machine Learning in Computational Biology (MLCB) meeting will be a two-day hybrid conference, September 10-11, 9am-5pm ET, with the in-person component at the New York Genome Center, NYC. Reg...

That's what MLCB became after it was rejected as a NeurIPS workshop www.mlcb.org

Maybe a possible partner?

07.07.2025 18:29 · 👍 2  🔁 0  💬 0  📌 0

Some complexes can be huge, not that that is what you'd use this model for. The mammalian nuclear pore complex has ~800 nucleoporins and a molecular weight of ~100 MDa. doi.org/10.1016/j.tc...

30.05.2025 14:22 · 👍 3  🔁 0  💬 1  📌 0

Isn't PKZILLA-1 the new champ? 45k amino acids www.uniprot.org/uniprotkb/A0...

30.05.2025 14:06 · 👍 2  🔁 0  💬 1  📌 0
New methods are revolutionizing biology: an interview with Martin Steinegger
Martin Steinegger, who is the only non-DeepMind-affiliated author of the AlphaFold2 Nature paper, offers unique insights and personal reflections.

Happy to share this interview with Weijie Zhao from NSR at #OxfordUniversityPress. It covers questions I'm often asked: why I chose Korea, AlphaFold2, my unconventional journey into academia, and research insights. Thanks again for the fun conversation.
📄 academic.oup.com/nsr/article/...

19.05.2025 12:08 · 👍 76  🔁 19  💬 0  📌 1
PDB101: Learn: Other Resources: Commemorating 75 Years of Discovery and Innovation at the NSF
Download images celebrating NSF and PDB milestones

To honor the 75th anniversary of @NSF, RCSB PDB Intern Xinyi Christine Zhang created posters to celebrate the science made possible by the NSF and RCSB PDB.
Explore these images and learn how protein research is changing our world. #NSFfunded #NSF75
pdb101.rcsb.org/lear...

08.05.2025 16:18 · 👍 16  🔁 10  💬 1  📌 1

I am assuming that responsibility extends far beyond Bluesky and that he also agrees to co-sign a personal loan I am applying for and walk the new puppy I adopted.

04.04.2025 14:38 · 👍 1  🔁 0  💬 1  📌 0

My first post is a niche and personal shout out to @michaelhoffman.bsky.social, the person who asked me most often if I am on Bluesky yet.

03.04.2025 23:23 · 👍 22  🔁 2  💬 4  📌 1
