Chris Mungall

@cmungall.bsky.social

Berkeley Lab, Environmental Genomics and Systems Biology division. #GeneOntology #MonarchInitiative #AllianceGenome #NationalMicrobiomeDataCollaborative #OBOFoundry.

1,044 Followers  |  515 Following  |  570 Posts  |  Joined: 13.09.2023

Latest posts by cmungall.bsky.social on Bluesky

Phage Foundry

📣 New preprint from us at phagefoundry.org 📣
A solid machine learning framework to predict strain-level phage-host interactions across diverse bacterial genera from genome sequences alone. Avery Noonan from the Arkin Lab led this massive effort.
www.biorxiv.org/content/10.1...

16.11.2025 17:58  |  👍 26  |  🔁 15  |  💬 1  |  📌 0
Chris Mungall (@Cmungall@genomic.social) Attached: 1 image How can we scale up manual classification of chemical structures in databases like ChEBI? Can we help curators place new structures into classes like "terpenoid", based on their che...

See the thread (from the original arXiv preprint) over on Mastodon: genomic.social/@Cmungall/11...

03.10.2025 14:58  |  👍 1  |  🔁 0  |  💬 0  |  📌 0
Chemical classification program synthesis using generative artificial intelligence - Journal of Cheminformatics Accurately classifying chemical structures is essential for cheminformatics and bioinformatics, including tasks such as identifying bioactive compounds of interest, screening molecules for toxicity to humans, finding non-organic compounds with desirable material properties, or organizing large chemical libraries for drug discovery or environmental monitoring. However, manual classification is labor-intensive and difficult to scale to large chemical databases. Existing automated approaches either rely on manually constructed classification rules, or are deep learning methods that lack explainability. This work presents an approach that uses generative artificial intelligence to automatically write chemical classifier programs for classes in the Chemical Entities of Biological Interest (ChEBI) database. These programs can be used for efficient deterministic run-time classification of SMILES structures, with natural language explanations. The programs themselves constitute an explainable computable ontological model of chemical class nomenclature, which we call the ChEBI Chemical Class Program Ontology (C3PO). We validated our approach against the ChEBI database, and compared our results against deep learning models and a naive SMARTS pattern based classifier. C3PO outperforms the naive classifier, but does not reach the performance of state of the art deep learning methods. However, C3PO has a number of strengths that complement deep learning methods, including explainability and reduced data dependence. C3PO can be used alongside deep learning classifiers to provide an explanation of the classification, where both methods agree. The programs can be used as part of the ontology development process, and iteratively refined by expert human curators.

We developed and evaluated a method to learn Python chemical structure classifiers using LLMs. These can give classifications+explanations at runtime. With @jannahastings.bsky.social @justaddcoffee.bsky.social Noel O'Boyle, Daniel Korn, Adnan Malik jcheminf.biomedcentral.com/articles/10....

03.10.2025 14:57  |  👍 8  |  🔁 0  |  💬 1  |  📌 0
A busy tool wall in a shed. At the bottom there are instructions saying "Find the 10 hidden enhancers!" Across the wall between the tools are 10 enhancers, represented as DNA helices, but they are difficult to find in the style of a "hidden object" puzzle. Original photo by Lachlan Donald, https://www.flickr.com/photos/lox/9408028555

Hiding in plain sight - how close are we to mapping ALL 🧬enhancers🧬 in the genome?

Our new paper by Mannion et al. takes a systematic look at "hidden enhancers" and why they remain so hard to find. With @mosterwalder.bsky.social, @jlopezrios.bsky.social & many more

www.nature.com/articles/s41...

08.08.2025 18:09  |  👍 45  |  🔁 20  |  💬 2  |  📌 2
rbio1-training scientific reasoning LLMs with biological world models as soft verifiers Reasoning Models are typically trained against verification mechanisms in formally specified systems such as code or symbolic math. However, in open domains like biology, we do not generally have acce...

Check out the pre-print here www.biorxiv.org/content/10.1.... Not sure if the other authors beyond @tkaraletsos.bsky.social are on bsky #CellBiology #AI #GeneSky #genomics #VirtualCell

25.08.2025 00:41  |  👍 1  |  🔁 0  |  💬 0  |  📌 0

One super pedantic minor ontological pet peeve is the use of the term "simulation", since that leads me to expect an agent-based or physics-style simulation of cell perturbations. But in fact this pattern could be used for those too! And I guess the terminological horse has long bolted here..

25.08.2025 00:41  |  👍 3  |  🔁 0  |  💬 1  |  📌 0
two dalek robots are standing next to each other in a room with the words 76totterslane above them Alt: Dalek meme: Daleks saying "EX-PLAIN" (alluding to use of technique to make foundation models, the "daleks", more explainable)

But of course rBio is very cool independent of my nerdy obsession with FMs using ontologies/KGs! This general distillation pattern is likely to be very useful for integrating knowledge with the weights in massive omics FMs..

25.08.2025 00:41  |  👍 1  |  🔁 0  |  💬 1  |  📌 0

For another use of ontologies in genomic foundation models, see the recent AlphaGenome paper bsky.app/profile/cmun...

25.08.2025 00:41  |  👍 0  |  🔁 0  |  💬 1  |  📌 0
yoda from star wars is smoking a cigarette and says `` teach you i will '' Alt: Yoda meme: "Teach you I will". Alluding to using the ontology as a "teacher" in the RL loop

Aside: I find that too many "defenses" of ontologies/KGs in the face of genAI fall back on a kind of GraphRAG use case, where the ontology/KG is used as some kind of bulwark against hallucination. Valid... but they can do so much more! Using as teacher in RL-loop on reasoner traces is v cool!

25.08.2025 00:41  |  👍 1  |  🔁 0  |  💬 1  |  📌 0
Table 1. Verifiers used during RL training and their descriptions, as well as example prompts.
Verifiers: "Exp" is experimental; "MLP" is multi-layer perceptron; "TF" is Transcriptformer; "GO" is
Gene Ontology.

In order to fine-tune the reasoner model, the authors used three kinds of soft verifiers in the RL loop - experimental (e.g. CRISPRi knockdown), "simulation" (e.g. Transcriptformer), and knowledge-based. For knowledge-based, they used GO @geneontology.bsky.social!

25.08.2025 00:41  |  👍 1  |  🔁 0  |  💬 1  |  📌 0
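
A rough illustration of how soft verifiers could feed an RL reward, as a minimal sketch under stated assumptions and not the rBio authors' code. It assumes each verifier returns a probability that perturbing one gene changes expression of another, and that the reward is the probability the verifier assigns to the model's yes/no answer. The names `soft_reward`, `make_overlap_verifier`, `combined_reward`, the equal default weighting, and the use of annotation-set overlap as a stand-in for a knowledge-based score are all hypothetical.

```python
from typing import Callable, Dict, Optional, Set

# Hypothetical interface: a soft verifier maps (perturbed gene, target gene)
# to a probability in [0, 1] that the perturbation changes the target.
SoftVerifier = Callable[[str, str], float]


def soft_reward(answer_yes: bool, p_yes: float) -> float:
    """Reward is the probability the verifier assigns to the model's answer."""
    if not 0.0 <= p_yes <= 1.0:
        raise ValueError("verifier score must be a probability in [0, 1]")
    return p_yes if answer_yes else 1.0 - p_yes


def make_overlap_verifier(annotations: Dict[str, Set[str]]) -> SoftVerifier:
    """Toy knowledge-based verifier (assumption): Jaccard overlap of term sets."""

    def verify(gene_a: str, gene_b: str) -> float:
        terms_a = annotations.get(gene_a, set())
        terms_b = annotations.get(gene_b, set())
        union = terms_a | terms_b
        if not union:
            return 0.5  # no knowledge about either gene: stay uninformative
        return len(terms_a & terms_b) / len(union)

    return verify


def combined_reward(
    answer_yes: bool,
    gene_a: str,
    gene_b: str,
    verifiers: Dict[str, SoftVerifier],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted average of per-verifier soft rewards (equal weights by default)."""
    if not verifiers:
        raise ValueError("at least one verifier is required")
    weights = weights or {name: 1.0 for name in verifiers}
    total = sum(weights[name] for name in verifiers)
    weighted = sum(
        weights[name] * soft_reward(answer_yes, verify(gene_a, gene_b))
        for name, verify in verifiers.items()
    )
    return weighted / total


if __name__ == "__main__":
    # Placeholder genes and term labels, not real annotations.
    kb = make_overlap_verifier({"GENE_A": {"T1", "T2"}, "GENE_B": {"T2", "T3"}})
    verifiers = {"experimental": lambda a, b: 0.9, "knowledge": kb}
    print(combined_reward(True, "GENE_A", "GENE_B", verifiers))
```

Because every verifier returns a graded score rather than a hard pass/fail, a confident wrong answer is penalised more than a hedged one, which is the sense of "soft" assumed here.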
Fig 7 from paper - The figure shows three example responses to the query "Is a knockdown of ISCA2 in RPE1 cells likely to result in differential expression of CEP295?", each demonstrating different reasoning strategies.

Basic answer: ISCA2 is linked to cell cycle progression and DNA repair, so its knockdown could affect CEP295 expression, though experimental data would be needed to confirm directionality.

Chain-of-Thought: Provides background: ISCA2 is involved in cell cycle regulation; CEP295 in cilia formation. Knockdown of ISCA2 may influence cell cycle-related genes but there's no direct evidence connecting it to CEP295 regulation.

Self-aware Chain-of-Thought: Notes ISCA2's role in autophagy and related processes, but emphasizes that its relationship to CEP295 is indirect. Suggests that literature review would be required for confirmation, and stresses the absence of direct experimental evidence, while acknowledging possible indirect effects.

Overall, all answers converge on the idea that ISCA2 knockdown could plausibly influence CEP295 but highlight the uncertainty and need for direct experimental validation.

The applications of this are very interesting, allowing for interrogation in natural language, as well as background reasoning over the wealth of biology in the literature. So you can ask what happens to other genes if you knock down a gene in a cell type, and get a biological explanation

25.08.2025 00:41  |  👍 3  |  🔁 0  |  💬 1  |  📌 0
rBio: Reasoning Model Trained on Virtual Cell Simulations Scientists can ask complex biological questions in plain language and get predictions about gene interactions.

Very exciting to see the research from @cziscience.bsky.social on rBio, distilling a black box foundation model (in this case a "virtual cell" perturbation model) into a smaller reasoner LLM. And it uses ontologies as part of RL! chanzuckerberg.com/blog/rbio-re...

25.08.2025 00:41  |  👍 17  |  🔁 7  |  💬 1  |  📌 1

The Alliance webinar for August is this Thursday (Aug 21, noon EDT), on Ontologies and the Alliance, presented by Chris Mungall. You can preregister for the Zoom link here: forms.gle/GzMnmwK23SzP...; please preregister by midnight EDT Wednesday Aug 20.

18.08.2025 16:51  |  👍 1  |  🔁 3  |  💬 1  |  📌 1

I don't have those details to hand, but this affects the 3 subcontract sites too...

15.08.2025 21:47  |  👍 0  |  🔁 0  |  💬 0  |  📌 0

This is terrible news, and not just for fly research: Drosophila is a key model organism that helps us understand shared biological pathways and the systems that underpin many human diseases 💔💔💔

15.08.2025 15:37  |  👍 5  |  🔁 4  |  💬 1  |  📌 0
FlyBase:Contribute to FlyBase - FlyBase Wiki

FlyBase needs your help! We ask that European labs continue to contribute to Cambridge, UK FlyBase, whereas US and other non-European labs can contribute to US FlyBase. For more information and how to donate: wiki.flybase.org/wiki/FlyBase...

15.08.2025 12:45  |  👍 129  |  🔁 159  |  💬 3  |  📌 26

mültifaceted crüe

14.08.2025 04:27  |  👍 1  |  🔁 0  |  💬 0  |  📌 0

Mild frustration and eye rolling against the machine

14.08.2025 04:23  |  👍 1  |  🔁 0  |  💬 0  |  📌 0
Chris Mungall: collaborative knowledge graphs in the life sciences Chris Mungall is an expert on building knowledge graphs for the life sciences with a wide variety of scientific collaborators.

@cmungall.bsky.social tackles complex #knowledgeManagement challenges in the life sciences with well-honed collaborative methods and AI-augmented computational tooling, streamlining #ontology creation and #knowledgeGraph building.

knowledgegraphinsights.com/chris-mungall/

05.08.2025 12:57  |  👍 5  |  🔁 2  |  💬 0  |  📌 0
Congratulations to Ruth Lovering from the University College London, UK for winning the Exceptional Contributions to Biocuration, Lifetime Achievement Award 2025. A photo of Ruth is next to the text.

Exceptional Contributions to Biocuration - Lifetime Achievement Award winner: Ruth Lovering
Ruth has contributed extensively to the curation of key resources such as HGNC, Gene Ontology (GO), and IMEx, and has been instrumental in developing curation standards. Ruth is a past chair of the ISB EC.

30.07.2025 16:04  |  👍 2  |  🔁 1  |  💬 1  |  📌 0
Congratulations to Kimberly Van Auken from the California Institute of Technology, USA for winning the Exceptional Contributions to Biocuration, Advanced Career Award 2025. A photo of Kimberly is next to the text.

Exceptional Contributions to Biocuration - Advanced Career Award winner: Kimberly Van Auken
Kimberly's career reflects expertise, sustained innovation, & dedicated service to community. She's contributed to many projects, including WormBase, the Gene Ontology, & the Alliance of Genome Resources.

30.07.2025 16:04  |  👍 2  |  🔁 1  |  💬 1  |  📌 0
Congratulations to Tiago Lubiana from the University of Sao Paulo, Brazil for winning the Exceptional Contributions to Biocuration, Early Career Award 2025. A photo of Tiago is next to the text.

Exceptional Contributions to Biocuration - Early Career Award winner: Tiago Lubiana.
Tiago's a passionate and motivated scientist interested in linked open data, ontologies, the semantic web, and their application in modeling cells and cell types. He is active in many curation projects & with ISB.

30.07.2025 16:04  |  👍 5  |  🔁 1  |  💬 1  |  📌 3

@severaltimes.bsky.social talking about the BioPortal MCP at #BOSC2025 / #BOKR2025 #ISMBECCB2025

22.07.2025 13:30  |  👍 5  |  🔁 2  |  💬 0  |  📌 0

Nice: Chris is raising the topic of AI-assisted coding. There are huge advantages of Agentic AI applications here, but there are also risks (watch this space for more on this topic) #ISMBECCB2025 #BOSC2025

22.07.2025 10:49  |  👍 4  |  🔁 2  |  💬 1  |  📌 0

Just as Chris launched into the topic of knowledge censorship, the AI transcription and the slides stopped working. Coincidence?? 🤔 #BOSC2025

22.07.2025 11:26  |  👍 7  |  🔁 6  |  💬 1  |  📌 0
Room at BOSC/BOKR keynote

A full room during @cmungall.bsky.social's #BOSC2025 / #BOKR2025 keynote talk at #ISMBECCB2025

22.07.2025 11:34  |  👍 5  |  🔁 4  |  💬 0  |  📌 0

Looking forward to seeing many of you this week at ISMB, and talking to you about our work on agentic AI and knowledge bases! Doubly honored to be selected for this joint session, the open-bio community has been a huge influence on how I approach ontologies and knowledge base development.

20.07.2025 14:19  |  👍 5  |  🔁 2  |  💬 0  |  📌 0

And thank you to the ENCODE team who laid the groundwork for this AI work ten years ago, and took such care with annotating using standard ontologies pmc.ncbi.nlm.nih.gov/articles/PMC...

27.06.2025 18:04  |  👍 15  |  🔁 2  |  💬 0  |  📌 0

🙏🙏🙏 Thank you to the AlphaGenome team for clearly pointing out and attributing your usage in the manuscript, and for highlighting this clearly in the -- this helps those of us trying to structure and standardize the data...

27.06.2025 18:04  |  👍 1  |  🔁 0  |  💬 1  |  📌 0
code segment from Colab notebook:

output = dna_model.predict_sequence(
    sequence='GATTACA'.center(2048, 'N'),  # Pad to valid sequence length.
    requested_outputs=[dna_client.OutputType.DNASE],
    ontology_terms=['UBERON:0002048'],  # Lung.
)

And the example Colab notebook shows how you can use UBERON terms in API calls to explore tissue specificity. Nice! colab.research.google.com/github/googl...

27.06.2025 18:04  |  👍 1  |  🔁 0  |  💬 1  |  📌 0
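
Since the notebook call above takes ontology terms as plain strings such as 'UBERON:0002048', a small pre-flight check can catch malformed IDs before they reach any API. The following is a minimal sketch, not part of AlphaGenome: the helper names `curie_to_purl` and `check_terms` are hypothetical, and the default of accepting only the UBERON prefix is an assumption chosen for the example. The PURL form follows the standard OBO pattern of replacing the colon with an underscore.

```python
import re
from typing import Iterable, List, Sequence

OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"

# OBO-style CURIE: an alphanumeric prefix, a colon, and a numeric local ID.
_CURIE_PATTERN = re.compile(r"^([A-Za-z][A-Za-z0-9]*):(\d+)$")


def curie_to_purl(curie: str) -> str:
    """Expand an OBO-style CURIE such as 'UBERON:0002048' to its PURL."""
    match = _CURIE_PATTERN.match(curie.strip())
    if match is None:
        raise ValueError(f"not an OBO-style CURIE: {curie!r}")
    prefix, local_id = match.groups()
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"


def check_terms(
    terms: Iterable[str],
    allowed_prefixes: Sequence[str] = ("UBERON",),  # assumption for this sketch
) -> List[str]:
    """Return whitespace-stripped terms, raising on malformed or unexpected IDs."""
    checked = []
    for term in terms:
        cleaned = term.strip()
        curie_to_purl(cleaned)  # raises ValueError if the shape is wrong
        prefix = cleaned.split(":", 1)[0]
        if prefix not in allowed_prefixes:
            raise ValueError(f"unexpected ontology prefix in {cleaned!r}")
        checked.append(cleaned)
    return checked


if __name__ == "__main__":
    terms = check_terms(["UBERON:0002048"])  # lung, as in the notebook snippet
    print(terms, curie_to_purl(terms[0]))
```

The checked list could then be passed as the `ontology_terms` argument shown in the notebook snippet; this only validates the shape of the ID, not whether the term exists in the ontology.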
