Paper: arxiv.org/abs/2507.12574
GitHub: github.com/gitter-lab/A...
Datasets: doi.org/10.5281/zeno...
7/
@anthonygitter.bsky.social
Computational biologist; Associate Prof. at University of Wisconsin-Madison; Jeanne M. Rowe Chair at Morgridge Institute
Distribution of the top 10 docking scores from molecules with high- and low-relevance BioAssays as context for different proteins.
There are many more results and controls in the paper. Here's how the best (most negative) docking scores change when we use relevant assays, irrelevant assays, or no assays as context for generation with GPT-4o. In the majority of cases, but not all, relevant context helps. 6/
18.07.2025 15:13
This generally has the desired effects across multiple LLMs and queried protein targets, with the caveat that our core results are based on AutoDock Vina scores. Assessing generated molecules with docking is admittedly frustrating. 5/
18.07.2025 15:13
We embed the BioAssay data into a vector database, retrieve initial candidate assays, and do further LLM-based filtering and summarization. We select some active and inactive molecules from the BioAssay data table. This is all used for in-context learning and molecule generation. 4/
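A minimal sketch of the retrieve-then-prompt idea described in 4/, not the Assay2Mol implementation. Everything here is an assumption for illustration: the word-count `embed` function stands in for a real text embedding model, the assay records, IDs, and SMILES strings are made-up placeholders, and the LLM-based filtering and summarization step is left out entirely.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a text embedding: lowercase word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, assays, k=2):
    """Return the k assays whose descriptions are most similar to the query."""
    q = embed(query)
    ranked = sorted(
        assays,
        key=lambda a: cosine(q, embed(a["description"])),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, assays, n_per_class=2):
    """Combine the target description, assay text, and labeled molecules."""
    lines = [f"Target: {query}"]
    for assay in assays:
        lines.append(f"Assay {assay['id']}: {assay['description']}")
        for label in ("active", "inactive"):
            for smiles in assay[label][:n_per_class]:
                lines.append(f"  {smiles} -> {label}")
    lines.append("Propose new molecules likely to be active.")
    return "\n".join(lines)

# Hypothetical assay records; IDs and SMILES are placeholders, not PubChem data.
ASSAYS = [
    {"id": "A1", "description": "kinase inhibition assay for EGFR",
     "active": ["CCO"], "inactive": ["CCC"]},
    {"id": "A2", "description": "bacterial growth inhibition screen",
     "active": ["CCN"], "inactive": ["C"]},
    {"id": "A3", "description": "EGFR kinase binding assay",
     "active": ["c1ccccc1O"], "inactive": ["CC"]},
]

query = "EGFR kinase inhibitor"
selected = retrieve(query, ASSAYS, k=2)
prompt = build_prompt(query, selected)
print(prompt)
```

The resulting prompt would then be passed to an LLM. How many active and inactive molecules to include per assay is a free choice in this sketch (`n_per_class`).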
18.07.2025 15:13
A proof-of-concept study from our collaborators showed that mining this PubChem data successfully identified new candidates for a target phenotype, oxidative phosphorylation doi.org/10.1186/s133....
We wanted to generalize that for any new query and assess the effectiveness. 3/
SSB-PriA antibiotic resistant target AlphaScreen
PubChem BioAssays can contain a lot of information about why and how an assay was run. Here's an example from our collaborators. pubchem.ncbi.nlm.nih.gov/bioassay/127...
There are now 1.7M PubChem BioAssays ranging in scale from a few tested molecules to high-throughput screens. 2/
The Assay2Mol workflow. A chemist provides a target description, which is used to retrieve BioAssays from the pre-embedded vector database. After filtering for relevance, the BioAssays are summarized by an LLM. The BioAssay ID is then used to retrieve experimental tables. The final molecule generation prompt is formed by combining the description, summarization, and selected test molecules with associated test outcomes, enabling the LLM to generate relevant active molecules.
Our preprint Assay2Mol uses PubChem chemical screening data as context when generating molecules with large language models. It uses assay descriptions and protocols to find relevant assays, then uses that text plus active/inactive molecules as context for generation. 1/
18.07.2025 15:13
Nobody is commenting on this little nugget from Fig 1?
18.07.2025 14:55
That's what MLCB became after it was rejected as a NeurIPS workshop www.mlcb.org
Maybe a possible partner?
Some complexes can be huge, not that that is what you'd use this model for. The mammalian nuclear pore complex has ~800 nucleoporins and a molecular weight of ~100 MDa. doi.org/10.1016/j.tc...
30.05.2025 14:22
Isn't PKZILLA-1 the new champ? 45k amino acids www.uniprot.org/uniprotkb/A0...
30.05.2025 14:06
Happy to share this interview with Weijie Zhao from NSR at #OxfordUniversityPress. It covers questions I'm often asked: why I chose Korea, AlphaFold2, my unconventional journey into academia, and research insights. Thanks again for the fun conversation.
academic.oup.com/nsr/article/...
To honor the 75th anniversary of @NSF, RCSB PDB Intern Xinyi Christine Zhang created posters to celebrate the science made possible by the NSF and RCSB PDB.
Explore these images and learn how protein research is changing our world. #NSFfunded #NSF75
pdb101.rcsb.org/lear...
I am assuming that responsibility extends far beyond Bluesky and that he also agrees to co-sign a personal loan I am applying for and walk the new puppy I adopted.
04.04.2025 14:38
My first post is a niche and personal shout out to @michaelhoffman.bsky.social, the person who asked me most often if I am on Bluesky yet.
03.04.2025 23:23