Yun S. Song

@yun-s-song.bsky.social

Professor of EECS and Statistics at UC Berkeley. Mathematical and computational biologist.

602 Followers  |  114 Following  |  35 Posts  |  Joined: 14.11.2024

Latest posts by yun-s-song.bsky.social on Bluesky

(1/4) 🧬 Why Sequence the Genomes of Earth’s Biodiversity?
The Earth BioGenome Project 🌍 is a global network of initiatives working together to create a complete genome library for all eukaryotic life, from mushrooms 🍄 to mammals 🐘.
#biodiversity #genomes #sequence #earthbiogenome #education #stem

29.07.2025 20:59

Germinal center clonal diversity trees as a musical score, a great image to start @victora.bsky.social's CCII seminar, "Replaying germinal center evolution on a quantified affinity landscape"
#GerminalCenter #Immunology
www.ccii.med.kyoto-u.ac.jp/en/event/the...

02.07.2025 02:42

In vivo mapping of mutagenesis sensitivity of human enhancers (Nature). Human enhancers contain a high density of sequence features that are required for their normal in vivo function.

www.nature.com/articles/s41...

18.06.2025 21:20

The 2026 Probabilistic Modeling in Genomics (ProbGen) meeting will be held at UC Berkeley, March 25-28, 2026. We have an amazing list of keynote speakers and session chairs:
probgen2026.github.io

Please help spread the news.

06.06.2025 17:52

Replaying germinal center evolution on a quantified affinity landscape. Darwinian evolution of immunoglobulin genes within germinal centers (GC) underlies the progressive increase in antibody affinity following antigen exposure. Whereas the mechanics of how competition be...

Wanted to highlight our latest preprint--a huge effort by multiple people and labs, but led primarily by @wsdewitt.github.io, Tatsuya Araki, and Ashni Vora, in a very close wet-dry collaboration with @matsen.bsky.social’s lab at the Hutch

www.biorxiv.org/content/10.1...

05.06.2025 14:28
CRISPRpedia: Chapter 5
CRISPR & Ethics

Check out CRISPRpedia, our resource on all things #CRISPR! The latest chapter is on CRISPR & ethics: innovativegenomics.org/crisprpedia/...

CRISPRpedia features 85+ original illustrations that are free to download & use for non-commercial purposes!

#STEMeducation #STEMed #bioethics #SciArt

04.06.2025 17:13
Predicting the effect of CRISPR-Cas9-based epigenome editing

How well can deep learning models predict the effect of modifying chromatin on gene expression???

Our work -- led by Sanjit Batra and Alan Cabrera when they were in @yun-s-song.bsky.social’s and Isaac Hilton’s labs -- tries to answer this.

🧡🧬🧪

elifesciences.org/reviewed-pre...

30.05.2025 02:45

Thank you, Anjali! It was great to see you.

31.05.2025 05:16

From Likelihood to Fitness: Improving Variant Effect Prediction in Protein and Genome Language Models. Generative models trained on natural sequences are increasingly used to predict the effects of genetic variation, enabling progress in therapeutic design, disease risk prediction, and synthetic biolog...

New preprint in collaboration with @paulinanunezv.bsky.social supervised by @jonnyfrazer.bsky.social and Mafalda Dias – we propose a simple approach to improving zero-shot variant effect prediction in pre-existing protein and genome language models: 🧢 1/n

www.biorxiv.org/content/10.1...

26.05.2025 17:30

In our Forward-Equivalent model, there are no deaths (i.e., the FE death rate is zero) and the sampling fraction is 1 for the present, so lineages that will not be in the reconstructed tree for the sample do not need to be simulated at all.

23.05.2025 22:36

We believe that our work opens new avenues for testing existing inference methods as well as developing new ones (e.g., based on machine learning or Approximate Bayesian Computation, both of which require large amounts of training data).
n/n

23.05.2025 21:02

GitHub - songlab-cal/forward-equivalent-trees: Efficiently simulating multi-type birth-death processes via forward-equivalent parameter mapping

The Forward-Equivalent model we construct enables much more realistic simulations and benchmark studies. We have implemented our algorithm in a publicly available Python software package. Code available at github.com/songlab-cal/...
8/n

23.05.2025 21:02

This enables us to develop an efficient simulation algorithm that scales linearly with the ascertained tree size, independent of the population size, thereby speeding up simulation by orders of magnitude when the full population size is extremely large.
7/n

23.05.2025 21:02
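
As a rough illustration of why the cost can track the sampled tree rather than the population, here is a minimal Python sketch for the single-type, constant-rate case only (birth rate λ, death rate μ, present-day sampling probability ρ). It assumes the classical reconstructed-process result that lineages reaching the sample branch at rate λ(1 − E(s)), where E(s) is the probability that a lineage alive s time units before the present leaves no sampled descendant, and it simulates that time-varying pure-birth process by Poisson thinning. The function names are hypothetical, it assumes λ > 0 and ρ > 0, and it is not the algorithm in songlab-cal/forward-equivalent-trees, which also handles multiple types and mutation.

```python
import math
import random


def extinction_prob(s, birth, death, rho):
    """E(s): probability that a lineage alive s time units before the present
    leaves no sampled descendant (single type, constant rates).

    Closed-form solution of dE/ds = death - (birth + death) E + birth E^2
    with E(0) = 1 - rho.
    """
    r = birth - death
    if abs(r) < 1e-12:  # critical case birth == death
        return 1.0 - rho / (1.0 + rho * birth * s)
    c = birth * (1.0 - rho) - death
    return 1.0 - rho * r / (rho * birth + c * math.exp(-r * s))


def simulate_fe_tree(birth, death, rho, t_max, seed=0):
    """Simulate only lineages that reach the sample (hypothetical helper).

    Runs a pure-birth process with per-lineage rate birth * (1 - E(s)) from
    s = t_max (one founding lineage, assumed to have at least one sampled
    descendant) down to s = 0, using thinning. Returns
    (branching_times, proposals); the tree has len(branching_times) + 1 tips.
    """
    rng = random.Random(seed)

    def rate(s):
        return birth * (1.0 - extinction_prob(s, birth, death, rho))

    # 1 - E(s) is monotone in s, so the largest rate on [0, t_max] is at an endpoint.
    bound = max(rate(0.0), rate(t_max))
    n, s, proposals, branching_times = 1, t_max, 0, []
    while True:
        s -= rng.expovariate(n * bound)  # next candidate event, going toward the present
        if s <= 0.0:
            break
        proposals += 1
        if rng.random() < rate(s) / bound:  # accept with probability rate(s) / bound
            branching_times.append(s)
            n += 1
    return branching_times, proposals


times, proposals = simulate_fe_tree(birth=2.0, death=1.5, rho=1e-6, t_max=30.0, seed=42)
print(len(times) + 1, "sampled tips from", proposals, "proposals")
```

Every candidate event is proposed on a lineage that already belongs to the reconstructed tree, so the work depends on the number of sampled tips (and on how much the rate varies over time), not on how many unsampled individuals the full population would contain.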

Specifically, we prove that, for any multi-type BDMS model, there exists a mathematically equivalent model with complete sampling and no death; we call this the Forward-Equivalent (FE) model. Our proof is constructive and hence we can find an explicit equivalent model.
6/n

23.05.2025 21:02
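
To make the idea concrete in the simplest setting, here is a minimal Python sketch of the single-type, constant-rate case (birth rate λ, death rate μ, present-day sampling probability ρ, no mutation). It relies on the classical reconstructed-process argument, not on the paper's multi-type construction, which is not reproduced here. Let E(s) be the probability that a lineage alive s time units before the present leaves no sampled descendant, so dE/ds = μ − (λ + μ)E + λE² with E(0) = 1 − ρ. A lineage that reaches the sample splits into two lineages that both reach it at rate λ(1 − E(s)), and it never dies, which gives a pure-birth, fully sampled process with a time-varying birth rate. The helper names are hypothetical.

```python
import math


def extinction_prob(s, birth, death, rho):
    """E(s): probability that a lineage alive s time units before the present
    leaves no sampled descendant (single type, constant rates).

    Closed-form solution of dE/ds = death - (birth + death) E + birth E^2
    with E(0) = 1 - rho.
    """
    r = birth - death
    if abs(r) < 1e-12:  # critical case birth == death
        return 1.0 - rho / (1.0 + rho * birth * s)
    c = birth * (1.0 - rho) - death
    return 1.0 - rho * r / (rho * birth + c * math.exp(-r * s))


def fe_birth_rate(s, birth, death, rho):
    """Birth rate, at time s before the present, of the equivalent process
    with no death and complete sampling (hypothetical helper name)."""
    return birth * (1.0 - extinction_prob(s, birth, death, rho))


birth, death, rho = 2.0, 1.5, 0.01
for s in (0.0, 1.0, 5.0, 20.0):
    print(s, fe_birth_rate(s, birth, death, rho))
```

In this special case the equivalent birth rate equals λρ at the present and approaches λ − μ far in the past when λ > μ.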

When the full population size is in the billions or more, as is the case in many biological settings, the full simulation is intractable. In our work, we derive rigorous theoretical results and develop efficient algorithms to remove the above computational bottleneck.
5/n

23.05.2025 21:02

Therefore, to simulate an evolutionary tree for a sample, existing methods first simulate the full population tree and then prune unobserved lineages. This approach can be extremely inefficient if the death rate is high or the sampling probability is low.
4/n

23.05.2025 21:02
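
For contrast, the naive simulate-then-prune approach can be sketched in a few lines of Python for a single-type model: simulate every individual with the Gillespie algorithm, then sample each survivor with probability ρ. This is a hypothetical illustration, not code from the paper; the pruning step is left implicit, since only the sampled tips and their ancestors would survive it.

```python
import random


def simulate_then_prune(birth, death, rho, t_max, seed=0, max_pop=10**6):
    """Naive single-type birth-death-sampling simulation (hypothetical helper).

    Simulates every individual forward from one ancestor for time t_max with
    the Gillespie algorithm, then samples each survivor with probability rho.
    Returns (individuals_simulated, tips_sampled).
    """
    rng = random.Random(seed)
    alive, simulated, t = 1, 1, 0.0
    while alive > 0:
        t += rng.expovariate(alive * (birth + death))  # waiting time to next event
        if t >= t_max:
            break
        if rng.random() < birth / (birth + death):  # birth event
            alive += 1
            simulated += 1
        else:  # death event
            alive -= 1
        if alive > max_pop:
            raise RuntimeError("population too large for naive simulation")
    sampled = sum(1 for _ in range(alive) if rng.random() < rho)
    return simulated, sampled


total_simulated, total_sampled = 0, 0
for rep in range(200):
    a, b = simulate_then_prune(birth=1.0, death=0.9, rho=0.01, t_max=20.0, seed=rep)
    total_simulated += a
    total_sampled += b
print(f"simulated {total_simulated} individuals to obtain {total_sampled} sampled tips")
```

With a high death rate and a low sampling probability, most simulated individuals never contribute to the sample, which is the waste described above.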

Developing and benchmarking inference methods relies on extensive tree simulation, but due to death and incomplete sampling in BDMS models, the tree describing the ancestral relationship of an observed sample represents only a partial history of the full population.
3/n

23.05.2025 21:02

Multi-type birth-death-mutation-sampling (BDMS) models – a general class of stochastic processes with birth, death, mutation, and incomplete sampling – have a wide variety of applications in evolutionary biology. Phylogenetic trees are central objects in these studies.
2/n

23.05.2025 21:02

How can one efficiently simulate phylodynamics for populations with billions of individuals, as is typical in many applications, e.g., viral evolution and cancer genomics? In this work with M. Celentano, @wsdewitt.github.io, & S. Prillo, we provide a solution. doi.org/10.1073/pnas...
1/n

23.05.2025 21:02
The patient, KJ, reaching out after infusion of the CRISPR therapy, with a big smile! Photo credit Children's Hospital of Philadelphia

In a medical breakthrough, a team including IGI’s @urnov.bsky.social & @giannikopoulosp.bsky.social created an on-demand #CRISPR therapy for an infant with a deadly gene mutation: it was developed, approved, and delivered to the patient in just 6 months.

Read more: ow.ly/G0Bg50VTonC

#RareDisease 🧬

15.05.2025 17:04
You Can Fix Your DNA... Starting Now (feat. Nobel Prize Winner)
YouTube video by Cleo Abram

Jennifer Doudna @jenniferdoudna.bsky.social @doudna-lab.bsky.social speaks with Cleo Abram on the history and future of #CRISPR 🧬. Watch here: youtu.be/0OXaanDHENI?...

19.05.2025 16:16

Overfitting is among the most conceptually interesting problems in machine learning.
I am happy about several new phenomena we began to understand with Pierfrancesco Urbani.
Alert: mostly non-rigorous! (Celebrating Jorge Kurchan)
web.stanford.edu/~montanar/OT...

30.04.2025 20:23

If you want to check if a human gene has copy-number changes or lands in a complex region, try pangene.bioinweb.org. Recently updated with more and better assemblies.

26.04.2025 01:06

Congratulations, Graham! Very well deserved, indeed.

25.04.2025 01:10

Thank you for using my illustration for the cover. I am thrilled and honored!

08.04.2025 16:05

Thrilled to see my digital art on the cover of Trends Genet. The two binary strings represent reverse-complementary DNA sequences (00=A, 01=C, 10=G, 11=T) and the connecting rectangles represent “embeddings” learned by DNA language models. Pls check out our article as well: doi.org/10.1016/j.ti...

07.04.2025 15:01
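
The cover's encoding has a neat property that a few lines of Python can check. With the 2-bit code 00=A, 01=C, 10=G, 11=T, each base's Watson-Crick complement is its bitwise NOT (A↔T is 00↔11, C↔G is 01↔10), so the reverse complement of an encoded string is obtained by reversing its 2-bit chunks and flipping every bit. A minimal sketch; the function names are hypothetical and not taken from the article.

```python
# 2-bit code from the cover: 00=A, 01=C, 10=G, 11=T
CODE = {"A": "00", "C": "01", "G": "10", "T": "11"}
BASE = {bits: base for base, bits in CODE.items()}


def encode(seq: str) -> str:
    """Encode a DNA string as a binary string, two bits per base."""
    return "".join(CODE[b] for b in seq.upper())


def decode(bits: str) -> str:
    """Decode a binary string (length a multiple of 2) back to DNA."""
    return "".join(BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def revcomp_bits(bits: str) -> str:
    """Reverse-complement an encoded sequence without decoding it.

    Under this code the complement of a base is the bitwise NOT of its two
    bits, so we reverse the order of the 2-bit chunks and flip every bit.
    """
    chunks = [bits[i:i + 2] for i in range(0, len(bits), 2)]
    flip = str.maketrans("01", "10")
    return "".join(chunk.translate(flip) for chunk in reversed(chunks))


seq = "GATTACA"
bits = encode(seq)
print(bits)                        # 10001111000100
print(decode(revcomp_bits(bits)))  # TGTAATC
```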

An earlier thread on TraitGym by Gonzalo can be found here:
bsky.app/profile/gonz...
n/n

04.03.2025 19:54

We think that data curation of enhancers and other functional non-coding elements is an important avenue to explore for improving alignment-free gLMs and our lab has a promising approach to tackle this problem. Please reach out if you are interested. Happy to collaborate!
6/n

04.03.2025 19:54

Overall, we see Evo 2 as a big advancement over previous alignment-free gLMs and expect it to be very useful, especially in the majority of species on Earth that lack good alignments. Scaling appears very promising when coupled with the right data composition.
5/n

04.03.2025 19:54

We also find that GPN-Promoter (152M params) performs comparably to Evo 2 40B when evaluating only on promoters, suggesting that if you only care about one region, you need vastly fewer parameters (here, roughly 260 times fewer).
4/n

04.03.2025 19:54
