Yunha Hwang

@microyunha.bsky.social

Building genomic intelligence @ Tatta Bio

1,237 Followers  |  1,111 Following  |  28 Posts  |  Joined: 04.12.2023

Latest posts by microyunha.bsky.social on Bluesky

Gaia - Tatta Bio

We are building this infrastructure for the scientific community, and we invite feedback and collaboration from researchers at every stage. We are grateful to
the Moore Foundation for their generous support in making this project possible. Stay tuned for more updates!

www.tatta.bio/gaia

02.06.2025 16:23 - 👍 0    🔁 1    💬 0    📌 0
Today's sequence data infrastructure is set up for failure in the age of AI. Building an open and collaborative sequence platform for both Human and AI scientists.

At Tatta Bio, we have been thinking deeply about the sequence-to-function problem. We believe that before AI can power functional prediction, we first need to rethink how we curate, manage, and share sequence data. Here, we share our initial ideas on what we are building next:

02.06.2025 16:23 - 👍 7    🔁 4    💬 1    📌 0
Assemblies of long-read metagenomes suffer from diverse errors
Genomes from metagenomes have revolutionised our understanding of microbial diversity, ecology, and evolution, propelling advances in basic science, biomedicine, and biotechnology. Assembly algorithms...

I am very happy (and anxious) to share with you our most recent work in which we evaluated four of the most popular long-read assemblers,

www.biorxiv.org/content/10.1...

and tell you just a little bit about it in the following 🧵

28.04.2025 08:07 - 👍 127    🔁 69    💬 5    📌 7

I am so grateful for all the support I received from my mentors, colleagues and collaborators over the years: @pgirguis.bsky.social, @sokrypton.org, @simrouxvirus.bsky.social, @alexjprobst.bsky.social, @annedekas.bsky.social

28.04.2025 14:57 - 👍 2    🔁 0    💬 1    📌 0

It’s been an incredible journey building Tatta Bio with @ancornman1.bsky.social to advance AI infrastructure for biology, and I will continue to further our mission as chief scientist.

28.04.2025 13:47 - 👍 2    🔁 0    💬 1    📌 0

My lab will couple ML and high throughput experimentation to harness the remarkable functional diversity of microbial genomes. If you are excited about the intersection of AI and microbiology, please get in touch!

28.04.2025 13:47 - 👍 1    🔁 0    💬 1    📌 0

It’s official! 🎉 I’m thrilled to announce that I will be joining MIT as an assistant professor in a shared appointment between Biology, EECS and the Schwarzman College of Computing this fall.

28.04.2025 13:47 - 👍 65    🔁 3    💬 9    📌 0
Job Board | Notion Overview

Tatta Bio is growing! We are hiring *two positions* in Business Development and Software Engineering to lead the development of AI-enabled scientific software for open science and biological sequence interpretation. Please check out the job postings at www.tatta.bio/careers and share widely!

24.03.2025 16:29 - 👍 5    🔁 2    💬 0    📌 0

Our thoughts too! (stay tuned 👀) 😉

18.12.2024 22:37 - 👍 0    🔁 0    💬 0    📌 0

As we improve Gaia Agent, we want to hear your feedback on the agent predictions. If you have suggestions on how we can increase its capabilities, please reach out! This was a major collaborative effort with @cong-ml.bsky.social, @joshuakravitz.com, @nishantjha.org, @ancornman1.bsky.social @Tatta Bio

17.12.2024 13:38 - 👍 7    🔁 1    💬 0    📌 0
Gaia Agent: Context-Aware Functional Insights at Scale - Tatta Bio
An AI biologist discovers previously uncharacterized systems in the Mtb genome.

We tested Gaia Agent's capabilities with hypothetical genes in Mycobacterium tuberculosis. In our blog, we detail our in silico validation of Gaia Agent-predicted membrane transporter and lanthipeptide biosynthesis loci that were uncharacterized despite decades of Mtb research. Read more:

17.12.2024 13:38 - 👍 5    🔁 0    💬 1    📌 0

Like a human biologist, Gaia Agent considers sequence, structure and genomic context to *think* about functions of novel genes, drastically accelerating our ability to predict functions of billions of unannotated proteins across the tree of life.

17.12.2024 13:38 - 👍 2    🔁 0    💬 1    📌 0

Can LLM agents discover novel protein functions? Introducing Gaia Agent 🌎 🤖: an AI biologist capable of reasoning across genomic contexts to predict functions of proteins! Gaia Agent is now integrated with Gaia Search at gaia.tatta.bio

17.12.2024 13:38 - 👍 38    🔁 13    💬 2    📌 1

If you are at #NeurIPS2024 don't miss @ancornman1.bsky.social's talk on OMG/gLM2 at 9AM! @workshopmlsb.bsky.social East meeting room 11,12

15.12.2024 16:21 - 👍 12    🔁 3    💬 0    📌 0
The OMG dataset: An Open MetaGenomic corpus for mixed-modality genomic language modeling
Biological language model performance depends heavily on pretraining data quality, diversity, and size. While metagenomic datasets feature enormous biological diversity, their utilization as pretraini...

Excited to be at #NeurIPS this week. @ancornman1.bsky.social will give a spotlight talk at the @workshopmlsb.bsky.social on gLM2/OMG! Please reach out if you want to chat about gLM2/OMG/Gaia and our latest projects 😇

www.biorxiv.org/content/10.1...

10.12.2024 16:01 - 👍 9    🔁 3    💬 0    📌 0
MIBiG 4.0: advancing biosynthetic gene cluster curation through global collaboration
Abstract. Specialized or secondary metabolites are small molecules of biological origin, often showing potent biological activities with applications in ag

Are you working on natural products? We’ve just released version 4.0 of the MIBiG data standard and repository! It now includes 3059 biosynthetic gene clusters, thanks to the combined efforts of 288 expert contributors. A thread: (1/8) academic.oup.com/nar/advance-...

10.12.2024 08:05 - 👍 92    🔁 53    💬 4    📌 12
overview of results for PLAID!

1/🧬 Excited to share PLAID, our new approach for co-generating sequence and all-atom protein structures by sampling from the latent space of ESMFold. This requires only sequences during training, which unlocks more data and annotations:

bit.ly/plaid-proteins
🧵

06.12.2024 17:44 - 👍 122    🔁 37    💬 1    📌 4

you can search for eukaryotic sequences too, and you might find interesting homology to microbial proteins! (the current database you search against is microbial)

23.11.2024 23:39 - 👍 1    🔁 0    💬 0    📌 0

Our Big Fantastic Virus Database (BFVD) is now published in NAR! It contains protein structure predictions of major viral clades, enhanced by petabase-scale homology search and it's explorable on the web.
🌐 bfvd.foldseek.com
💾 bfvd.steineggerlab.workers.dev
📄 academic.oup.com/nar/advance-...

23.11.2024 21:12 - 👍 339    🔁 127    💬 6    📌 5

Great question, translation tables 11 and 4 should be covered, and we have seen translation table 15 being accounted for in some cases. @apcamargo.bsky.social

22.11.2024 16:16 - 👍 1    🔁 0    💬 1    📌 0

and we are live on biorxiv! bsky.app/profile/bior...

21.11.2024 22:50 - 👍 8    🔁 2    💬 0    📌 0

Thank you! We are building additional features (e.g. bookmarks, tags, comments), stay tuned for updates!

19.11.2024 19:59 - 👍 0    🔁 0    💬 0    📌 0

Great suggestion -- noted!

19.11.2024 18:51 - 👍 3    🔁 0    💬 0    📌 0

We cluster all protein embeddings across all 100 retrieved contexts, and then the top 5 most frequently occurring clusters are colored!

19.11.2024 18:13 - 👍 4    🔁 0    💬 1    📌 0
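
A rough illustration of the coloring step described in the reply above, assuming every protein in each retrieved context has already been assigned a cluster ID (the post does not say which clustering method is used). The function name, palette, and input layout are hypothetical and not taken from Gaia:

```python
from collections import Counter

# Hypothetical palette: one color per highlighted cluster, grey for the rest.
PALETTE = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00"]
DEFAULT_COLOR = "#cccccc"


def color_top_clusters(contexts, top_n=5):
    """Color proteins by cluster, highlighting only the most frequent clusters.

    contexts: list of retrieved genomic contexts, each a list of cluster IDs
              (one ID per protein, in genomic order).
    Returns a list of the same shape holding one color string per protein.
    """
    # Count how often each cluster occurs across all retrieved contexts.
    counts = Counter(cid for ctx in contexts for cid in ctx)
    # Keep the top_n most frequent clusters and give each its own color.
    top = [cid for cid, _ in counts.most_common(top_n)]
    color_of = {cid: PALETTE[i % len(PALETTE)] for i, cid in enumerate(top)}
    # Proteins outside the top clusters fall back to the default color.
    return [[color_of.get(cid, DEFAULT_COLOR) for cid in ctx] for ctx in contexts]


if __name__ == "__main__":
    demo = [["c1", "c2", "c3"], ["c1", "c2"], ["c1"]]
    for row in color_top_clusters(demo, top_n=2):
        print(row)
```

With `top_n=5` and 100 contexts as input, this would mirror the behavior the reply describes: only the five most common clusters get a distinct color.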

This is a fantastic resource! Yes this is possible and we plan on expanding our database in the coming months, it would make sense to include AllTheBacteria

19.11.2024 15:39 - 👍 6    🔁 0    💬 1    📌 0

A huge shoutout to nishantjha.bsky.social, Joshua Kravitz, Jacob West-Roberts, apcamargo.bsky.social, simrouxvirus.bsky.social & Andre Cornman for awesome teamwork, & a big TY to those who participated in user interviews. Gaia is in active development so please reach out with ideas or suggestions.

19.11.2024 15:07 - 👍 6    🔁 0    💬 1    📌 0
Introducing Gaia: Context-Aware Protein Search Across Genomic Datasets - Tatta Bio
Gaia is an embedding-based search engine for sequences.

Blog: www.tatta.bio/blog/gaia
gLM2_embed: huggingface.co/tattabio/gLM...
OG_prot_90: huggingface.co/datasets/tat...

19.11.2024 15:07 - 👍 3    🔁 0    💬 1    📌 0

Check out our preprint www.tatta.bio/gaia-paper for benchmarking results. In our manuscript, we showcase how Gaia can be used for annotating uncharacterized #phage proteins and discovering putative biosynthetic gene clusters!

19.11.2024 15:07 - 👍 4    🔁 0    💬 1    📌 0
Gaia

In order to make this search maximally interpretable, we built a web application that integrates existing tools (HMMer, sequence alignments, ESMFold) with genomic context visualizations. Gaia is freely available on gaia.tatta.bio, please share your feedback!

19.11.2024 15:07 - 👍 4    🔁 0    💬 2    📌 0

Gaia searches the protein universe comprising 85M clusters across hundreds of thousands of microbial genomes. Embedding-based search takes ~0.2s per sequence, which is at least two orders of magnitude faster than BLASTp, allowing for real-time search.

19.11.2024 15:07 - 👍 3    🔁 0    💬 1    📌 0
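
For readers unfamiliar with embedding-based search, here is a minimal sketch of the general idea: sequences are represented as vectors, and a query is matched to its nearest neighbors by cosine similarity. This is a brute-force toy under stated assumptions, not Gaia's implementation. The post does not describe how Gaia indexes its 85M clusters, and the function names and toy vectors below are hypothetical:

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def search(query, database, k=3):
    """Return the k database entries most similar to the query embedding.

    query:    embedding of the query sequence (list of floats).
    database: dict mapping an entry name to its embedding.
    Returns a list of (name, cosine_similarity), best match first.
    """
    q = normalize(query)
    scored = []
    for name, emb in database.items():
        e = normalize(emb)
        scored.append((sum(a * b for a, b in zip(q, e)), name))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(name, score) for score, name in scored[:k]]


if __name__ == "__main__":
    toy_db = {"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [1.0, 1.0]}
    for name, score in search([1.0, 0.1], toy_db, k=2):
        print(name, round(score, 3))
```

A linear scan like this would not reach sub-second latency over tens of millions of entries; a production system would typically rely on a vector index instead.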
