Jeremy Parker Yang @jeremyparkeryang

Predicting the effect of CRISPR-Cas9-based epigenome editing

How well can deep learning models predict the effect of modifying chromatin on gene expression???

Our work -- led by Sanjit Batra and Alan Cabrera when they were in @yun-s-song.bsky.social’s and Isaac Hilton’s labs -- tries to answer this.

🧵🧬🧪

elifesciences.org/reviewed-pre...

30.05.2025 02:45 — 👍 14 🔁 3 💬 1 📌 0

Congrats to Nathaniel and Sri for their exciting work teaching protein language models to generate beyond what evolution has explored. They introduce Reinforcement Learning from eXperimental Feedback (RLXF) to steer generation toward enhanced and non-natural functions
www.biorxiv.org/content/10.1...

08.05.2025 18:25 — 👍 8 🔁 2 💬 0 📌 0

Relationship between perplexity and zero-shot performance.

Protein language model likelihoods give better zero-shot mutation effect predictions when the model has a perplexity of 3-6 on the wild-type sequence.
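For anyone who wants the definition behind that number: perplexity is the exponential of the negative mean per-token log-likelihood. The sketch below is illustrative only and is not taken from the linked preprint. The function name `perplexity` and the example log-probabilities are assumptions made up for this example.

```python
import math


def perplexity(token_log_probs):
    """Return exp(-mean(log p)) for a list of natural-log token probabilities.

    `token_log_probs` is a hypothetical input: one log-probability per residue
    of the wild-type sequence, as a protein language model might assign.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_log_prob = sum(token_log_probs) / len(token_log_probs)
    return math.exp(-mean_log_prob)


# Made-up example: a model that is uniform over the 20 standard amino acids
# assigns log(1/20) to every residue, so its perplexity is about 20.
uniform_log_probs = [math.log(1 / 20)] * 50
print(perplexity(uniform_log_probs))
```

Read this way, a perplexity of 3-6 roughly means the model has narrowed each position down to a handful of plausible residues, compared with 20 for a model that knows nothing about the sequence.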

www.biorxiv.org/content/10.1...

30.04.2025 18:18 — 👍 11 🔁 3 💬 0 📌 0

MPRAbase a Massively Parallel Reporter Assay database

One of the toughest parts of the massively parallel reporter assay field, where each assay measures thousands of elements or more, is that hundreds of pubs use them but there has been no central repo to easily locate the results... until now! Great collab w/ Jingjing Zhao, Ilias G-S, and @nadavahituv.bsky.social!!

22.04.2025 18:52 — 👍 24 🔁 9 💬 0 📌 0

Tahoe-100M: A Giga-Scale Single-Cell Perturbation Atlas for Context-Dependent Gene Function and Cellular Modeling Building predictive models of the cell requires systematically mapping how perturbations reshape each cell's state, function, and behavior. Here, we present Tahoe-100M, a giga-scale single-cell atlas ...

@thejohnnyyu.bsky.social, @therealnima.bsky.social, and I are excited to tell you about Tahoe-100M! The largest publicly available single-cell dataset that measures the effect of 1200 genes on 50 cell line models. The Vevo team has outdone itself. #Tahoe100M www.biorxiv.org/content/10.1...

25.02.2025 13:25 — 👍 81 🔁 34 💬 1 📌 6

Semantic mining of functional de novo genes from a genomic language model Generative genomics models can design increasingly complex biological systems. However, effectively controlling these models to generate novel sequences with desired functions remains a major challeng...

Really enjoyed this creative paper using "DNA prompts" to guide Evo in designing genomic sequences. Curious to see how this research evolves.

www.biorxiv.org/content/10.1...

19.12.2024 03:40 — 👍 4 🔁 0 💬 1 📌 0

EXTRA-seq: a genome-integrated extended massively parallel reporter assay to quantify enhancer-promoter communication Precise control of gene expression is essential for cellular function, but the mechanisms by which enhancers communicate with promoters to coordinate this process are not fully understood. While seque...

Finally out! We present EXTRA-seq, a new EXTended Reporter Assay to quantify endogenous enhancer-promoter communication at kb scale!
www.biorxiv.org/content/10.1...
A 🧵 about what it can do:
#SynBio #DeepLearning #GeneRegulation

16.12.2024 14:39 — 👍 83 🔁 34 💬 5 📌 6

Can we bypass the resource bottleneck of pretraining genomic foundation models (FMs)? Our work, L2G, repurposes pretrained language LLMs for genomics via cross-modal transfer, matching fine-tuned genomic FMs. Kudos to Wenduo & fantastic collab w/ @atalwalkar.bsky.social. L2G, language to genome; L2G, life’s too good!

11.12.2024 13:41 — 👍 9 🔁 3 💬 0 📌 1

Screenshot of the paper.

Even as an interpretable ML researcher, I wasn't sure what to make of Mechanistic Interpretability, which seemed to come out of nowhere not too long ago.

But then I found the paper "Mechanistic?" by @nsaphra.bsky.social and @sarah-nlp.bsky.social, which clarified things.

20.11.2024 08:00 — 👍 232 🔁 28 💬 7 📌 2

Leveraging genomic deep learning models for non-coding variant effect prediction The majority of genetic variants identified in genome-wide association studies of complex traits are non-coding, and characterizing their function remains an important challenge in human genetics. Gen...

Super excited to share our review on genomic deep learning models for non-coding variant effect prediction, with Ayesha Bajwa and Nilah Ioannidis. We’d like this review to be a useful resource, and welcome any feedback, comments, or questions! 1/4

arxiv.org/abs/2411.11158

20.11.2024 01:31 — 👍 34 🔁 13 💬 1 📌 1


Latest posts by jeremyparkeryang.bsky.social on Bluesky
