Sarah Gurev

@sarahgurev.bsky.social

Postdoc @ Debbie Marks Lab, Harvard | Prev. PhD @ MIT EECS || ML for Proteins + Viruses 🦠

88 Followers  |  187 Following  |  13 Posts  |  Joined: 29.09.2023  |  2.1503

Latest posts by sarahgurev.bsky.social on Bluesky

Large AI models are reported to achieve high accuracy (AUROC) predicting pathogenic variants across the genome.

A preprint reports that the predictions are based on splice variants. Using only this info (no sequences, no AI) achieves AUROC=0.944 across noncoding variants.

1/2

09.09.2025 23:01 - 👍 13    🔁 7    💬 1    📌 0

Near real-time data on the human neutralizing antibody landscape to influenza virus to inform vaccine-strain selection in September 2025
The hemagglutinin of human influenza virus evolves rapidly to erode neutralizing antibody immunity. Twice per year, new vaccine strains are selected with the goal of providing maximum protection again...

In new study led by @ckikawa.bsky.social, we provide near real-time data on human neutralizing antibody landscape to influenza by measuring ~26,000 titers to >100 recent viral strains

Data can inform vaccine selection & evolutionary/epidemiological modeling
www.biorxiv.org/content/10.1...

08.09.2025 21:48 - 👍 56    🔁 30    💬 1    📌 1

Recent advances in the inference of deep viral evolutionary history | Journal of Virology
Phylogenetic studies examining the origins, emergence, and spread of viruses have arguably been one of the most active and successful areas of evolutionary biology and form the bedrock of the flourishing field of genomic epidemiology. This, in part, reflects the ability of viruses, particularly those with RNA genomes, to evolve at rates much greater than their cellular counterparts (1). The rapid rate at which viruses evolve and accumulate mutations enables evolutionary signals to be identified through comparative genomics at short timescales relevant for outbreak investigation and response. The integration of phylogenetics and epidemiology, known as phylodynamics, has become a vital tool in response to numerous viral outbreaks, epidemics, and pandemics, including Ebola (2), Zika (3), and, more recently, COVID-19 (4) and mpox (5).

There’s been a bunch of new approaches looking at deep viral evolutionary history. We’ve put together a mini review highlighting some recent advancements in structural phylogenetics and time-dependent rate models and what they could do for the field 🦠
🔗 journals.asm.org/doi/full/10....

25.08.2025 20:32 - 👍 25    🔁 13    💬 2    📌 2

Divergent viral phosphodiesterases for immune signaling evasion
Cyclic dinucleotides (CDNs) and other short oligonucleotides play fundamental roles in immune system activation in organisms ranging from bacteria to humans. In response, viruses use phosphodiesterase...

Excited to share our new preprint co-led by @jnoms.bsky.social!

Here we reveal an exceptional diversity of viral 2H phosphodiesterases (PDEs) that enable immune evasion by selectively degrading oligonucleotide-based messengers. This 2H PDE fold has evolved striking substrate breadth & specificity.

22.08.2025 19:02 - 👍 42    🔁 28    💬 2    📌 2

Thanks!

20.08.2025 20:50 - 👍 0    🔁 0    💬 0    📌 0

Variant effect prediction with reliability estimation across priority viruses
Viruses pose a significant threat to global health due to their rapid evolution, adaptability, and increasing potential for cross-species transmission. While advances in machine learning and the growi...

🦠 The future of pathogen forecasting needs rigorous benchmarks and domain-specific modeling, not only bigger PLMs. EVEREST is a step in that direction.

🔗 Paper: biorxiv.org/content/10.1...
💻 Code + data: github.com/debbiemarksl...
12/12

17.08.2025 03:42 - 👍 5    🔁 1    💬 1    📌 0

🙏 Amazing collaboration co-led with Noor Youssef and Navami Jain, @deboramarks.bsky.social, and our funders @cepi.net!
11/12

17.08.2025 03:42 - 👍 2    🔁 0    💬 1    📌 0

This matters for:
⚠️ Future-proof vaccine and therapeutics design
⚠️ Monitoring of high-pandemic risk viruses
⚠️ Dual-use biosecurity risk assessment

Without reliable models, we risk underestimating viral evolution and overestimating our ability to counter it.
10/12

17.08.2025 03:42 - 👍 3    🔁 0    💬 1    📌 0

EVEREST highlights:
✅ Where models fail, and why
✅ Which viruses are least/most predictable
✅ How to estimate per-protein, model-specific reliability
✅ Concrete steps to improve ML for viral mutation prediction
9/12

17.08.2025 03:42 - 👍 6    🔁 0    💬 2    📌 0

🌍 Current models fail to reliably predict mutations in more than half of the high-priority viruses identified by the WHO.
8/12

17.08.2025 03:42 - 👍 4    🔁 0    💬 1    📌 0

💪 Is bigger always better? Maybe not for other taxa but for viruses - yes! For viruses, models continue to improve with increased numbers of parameters.
7/12

17.08.2025 03:42 - 👍 4    🔁 0    💬 1    📌 0

Why? Viruses are severely underrepresented in training datasets (<1%) and are further downsampled after common clustering approaches.
6/12

17.08.2025 03:42 - 👍 8    🔁 0    💬 1    📌 0

📉 Despite the hype, protein language models trained across the “protein universe” are outperformed by even the simplest, site-independent alignment-based model.
5/12

17.08.2025 03:42 - 👍 13    🔁 2    💬 1    📌 0

💭 Imagine: It’s Day 0 of an outbreak and there’s little experimental data. Computational mutational effect predictions could provide valuable information... if we could trust them. Can we?

EVEREST doesn’t just assess performance. It also quantifies reliability for new viruses.
4/12

17.08.2025 03:42 - 👍 2    🔁 0    💬 1    📌 0

🚀 To find out, we built EVEREST: Evolutionary Variant Effect prediction with Reliability ESTimation.

We benchmark models across 45 viral deep mutational scanning datasets spanning >340,000 mutations.
3/12

17.08.2025 03:42 - 👍 2    🔁 0    💬 1    📌 0

🦠 Protein language models (PLMs) have shown impressive performance in predicting mutation effects. But... viruses are a different beast.

They evolve fast, cross species, and are under pressure from host immunity. Do PLMs still work here?
2/12

17.08.2025 03:42 - 👍 3    🔁 0    💬 1    📌 0

🚨 New paper 🚨

Can protein language models help us fight viral outbreaks? Not yet. Here’s why 🧵👇
1/12

17.08.2025 03:42 - 👍 42    🔁 19    💬 3    📌 0

Protein Structure Informed Bacteriophage Genome Annotation with Phold
Bacteriophage (phage) genome annotation is essential for understanding their functional potential and suitability for use as therapeutic agents. Here we introduce Phold, an annotation framework utilis...

Stoked to finally have a preprint out for Phold, our tool that uses protein structural information to enhance phage genome annotation #phagesky 1/n

www.biorxiv.org/content/10.1...

08.08.2025 07:10 - 👍 131    🔁 65    💬 5    📌 3

Scaling down protein language modeling with MSA Pairformer
Recent efforts in protein language modeling have focused on scaling single-sequence models and their training data, requiring vast compute resources that limit accessibility. Although models that use ...

Excited to share work with Zhidian Zhang, @milot.bsky.social, @martinsteinegger.bsky.social, and @sokrypton.org
biorxiv.org/content/10.1...
TLDR: We introduce MSA Pairformer, a 111M parameter protein language model that challenges the scaling paradigm in self-supervised protein language modeling 🧵

05.08.2025 06:29 - 👍 94    🔁 43    💬 1    📌 1

Pathoplexus | Pathoplexus July Update
Pathoplexus is a new, open-source database dedicated to the efficient sharing of human viral pathogen genomic data, fostering global collaboration and public health response.

Some great new features and updates from the awesome Pathoplexus project. This is a new open pathogen genome database that can provide access to your sequences under a use-restricted license but also feed directly into INSDC (EBI, GenBank, etc.) when you are ready. pathoplexus.org/news/2025-07...

15.07.2025 14:05 - 👍 45    🔁 24    💬 1    📌 1

🚨 New paper 🚨 RNA modeling just got its own Gym! 🏋️ Introducing RNAGym, large-scale benchmarks for RNA fitness and structure prediction.
🧵 1/9

18.06.2025 19:35 - 👍 40    🔁 16    💬 1    📌 1

End-to-end differentiable homology search for protein fitness prediction.

@yaringal.bsky.social @deboramarks.bsky.social @pascalnotin.bsky.social

arxiv.org/abs/2506.089...

11.06.2025 19:00 - 👍 32    🔁 9    💬 0    📌 0

Hello everyone! I am pleased to share information on the first-ever Computational Structural Virology Symposium, held August 4th on Zoom and highlighting work in this emerging field. You can register for this event here: forms.gle/CNiqskMwQEuV.... Please re-post!

12.06.2025 20:31 - 👍 68    🔁 52    💬 2    📌 6

@sarahgurev is following 20 prominent accounts