Junhao (Bear) Xiong

@junhaobearxiong.bsky.social

Machine learning for computational biology. PhD student at Berkeley EECS.

11 Followers  |  15 Following  |  11 Posts  |  Joined: 17.11.2024

Latest posts by junhaobearxiong.bsky.social on Bluesky

On a personal note, it is at once surreal, gratifying and humbling to be part of a wet-dry collab. I’m so grateful to my collaborators (also great friends) for making it real + keeping it fun! Also thankful our buildings (BAIR and @innovativegenomics.bsky.social) are right next to each other :)

31.05.2025 15:48 | 👍 2 | 🔁 0 | 💬 0 | 📌 0
Guide your favorite protein sequence generative model
Generative machine learning models on sequences are transforming protein engineering. However, no principled framework exists for conditioning these models on auxiliary information, such as experiment...

Preprint link: arxiv.org/abs/2505.04823
Paper link to “Unlocking Guidance for Discrete State-Space Diffusion and Flow Models”: openreview.net/forum?id=Xsg...

31.05.2025 15:45 | 👍 1 | 🔁 0 | 💬 1 | 📌 0

This work was only made possible through an incredible interdisciplinary collaboration between the Listgarten lab and @savagecatsonly.bsky.social. All kudos go to the amazing team that I’m super grateful to be part of: @hnisonoff.bsky.social @marialukarska.bsky.social (and Ishan and Luke)

31.05.2025 15:45 | 👍 1 | 🔁 0 | 💬 1 | 📌 0

The guided library in round 2 showed significantly higher activity than the initial unguided library in the experimental base editing assay.

31.05.2025 15:45 | 👍 1 | 🔁 0 | 💬 1 | 📌 0

We didn't just validate in silico - we also synthesized & tested proteins in the lab. We used ProteinGuide to engineer an adenine base editor for high activity: generated 2,000 variants → tested in bacteria → used results to guide 2,000 new designs.

31.05.2025 15:45 | 👍 1 | 🔁 0 | 💬 1 | 📌 0

In our third task, we demonstrate the generality of ProteinGuide beyond amino acid sequences, to structure tokens. In particular, we guide ESM3 to generate backbone structures (as tokens) with specified CATH fold class labels.

31.05.2025 15:45 | 👍 1 | 🔁 0 | 💬 1 | 📌 0

In our second task, we guided ESM3 to redesign enzyme sequences so that they are predicted to belong to specific enzyme classes, using CLEAN, a published classifier for enzyme commission (EC) numbers.

31.05.2025 15:45 | 👍 1 | 🔁 0 | 💬 1 | 📌 0

In our first task, we guided ProteinMPNN with experimental stability measurements from the @grocklin.bsky.social lab to generate amino acid sequences encoding proteins that are more stable than those ProteinMPNN generates on its own.

31.05.2025 15:45 | 👍 1 | 🔁 0 | 💬 1 | 📌 0

To illustrate the potential of ProteinGuide, we applied it, in silico, to three tasks, using two representative, well-known protein generative models, ProteinMPNN and ESM3. Across these three tasks, we observed that guidance, as expected, led to the desired outcome.

31.05.2025 15:45 | 👍 1 | 🔁 0 | 💬 1 | 📌 0

We leverage the fact that masked language models (MLMs, e.g., ESM3), order-agnostic autoregressive (OA-AR) models (e.g., ProteinMPNN), and masking-based diffusion models are actually equivalent. This lets us apply our previously developed guidance methodology for discrete diffusion and flow models to MLMs and OA-AR models.

31.05.2025 15:45 | 👍 1 | 🔁 0 | 💬 1 | 📌 0
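
For readers who want a concrete picture of what "guidance" can mean when unmasking one position, here is a minimal toy sketch. It is not taken from the preprint and may differ from the actual ProteinGuide procedure: it only illustrates the generic Bayes-rule idea of reweighting a model's per-token probabilities by a property predictor. The function name `guided_distribution`, the `strength` parameter, and all the numbers are hypothetical.

```python
def guided_distribution(base_probs, predictor_probs, strength=1.0):
    """Reweight base token probabilities by a property predictor (toy sketch).

    base_probs[i]      : assumed p(token i | context) from a pre-trained model
    predictor_probs[i] : assumed p(property | token i, context) from a predictor
    strength           : hypothetical exponent; 0 recovers the base model
    """
    weights = [p * (q ** strength) for p, q in zip(base_probs, predictor_probs)]
    total = sum(weights)
    if total == 0:
        raise ValueError("all guided weights are zero; cannot normalize")
    return [w / total for w in weights]


# Made-up numbers for a three-token vocabulary at one masked position.
base = [0.5, 0.3, 0.2]        # unguided model prefers token 0
predictor = [0.1, 0.6, 0.3]   # predictor favors token 1 for the desired property

guided = guided_distribution(base, predictor)
unguided = guided_distribution(base, predictor, strength=0.0)
```

With these made-up inputs the guided distribution shifts its mass toward token 1, while `strength=0.0` leaves the base distribution unchanged.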

Guide your favorite protein generative model with experimental data? Meet ProteinGuide - a method to condition pre-trained models on properties without retraining. We validated it both in silico by guiding ProteinMPNN and ESM3 on 3 tasks and in vitro by engineering base editors.

31.05.2025 15:45 | 👍 11 | 🔁 5 | 💬 1 | 📌 2
