
Gherman Novakovsky

@gnovakovsky.bsky.social

PhD, Illumina AI lab

157 Followers  |  137 Following  |  20 Posts  |  Joined: 08.01.2024

Latest posts by gnovakovsky.bsky.social on Bluesky

Research Associate · Academic Job Category: Faculty Non Bargaining · Job Title: Research Associate · Department: Wasserman Laboratory | Department of Medical Genetics | Faculty of Medicine (Wyeth Wasserman) · Posting End Date: Augu...

I'm hiring a Bioinformatics Research Associate for the Silent Genomes Project. PhD required, restricted to Canadians, work must be performed in British Columbia.

Great for those who love pipelines, whole genome data and work with a social purpose.
ubc.wd10.myworkdayjobs.com/ubcfacultyjo...

22.07.2025 20:42 · 👍 6 🔁 6 💬 0 📌 1

@saramostafavi.bsky.social (@Genentech) & I (@Stanford) r excited to announce co-advised postdoc positions for candidates with deep expertise in ML for bio (especially sequence to function models, causal perturbational models & single cell models). See details below. Pls RT 1/

19.06.2025 20:55 · 👍 55 🔁 40 💬 1 📌 3

Yes, that's exactly what it is. Predicting the difference here is important.

10.06.2025 16:40 · 👍 1 🔁 0 💬 0 📌 0

This ensures the model focuses on the actual variant and doesn't overfit to correlated but irrelevant features, which leads to better generalization.

10.06.2025 06:47 · 👍 1 🔁 0 💬 1 📌 0

Certainly! Here the entire model is shared: two copies see inputs that differ only at a single base pair (a variant of interest), and the model weights are tuned to learn the difference in effect size correctly.

10.06.2025 06:47 · 👍 1 🔁 0 💬 1 📌 0
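A minimal sketch of the twin setup described in this reply, assuming a toy linear scorer in place of the real network. The names one_hot, shared_model, twin_loss_and_grad and train are hypothetical illustrations, not PromoterAI code. Both towers call the same function with the same weights, and only the predicted ref-to-alt difference is compared with the measured effect:

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """One-hot encode a DNA string into an (L, 4) float array."""
    out = np.zeros((len(seq), 4))
    out[np.arange(len(seq)), [BASES.index(b) for b in seq]] = 1.0
    return out


def shared_model(weights, x):
    """Toy stand-in for the shared network: a linear score over the one-hot input."""
    return float(np.sum(weights * x))


def twin_loss_and_grad(weights, ref, alt, observed_delta):
    """Both towers use the SAME weights; only the alt - ref difference is supervised."""
    x_ref, x_alt = one_hot(ref), one_hot(alt)
    pred_delta = shared_model(weights, x_alt) - shared_model(weights, x_ref)
    err = pred_delta - observed_delta
    # Gradient of 0.5 * err**2. In this linear toy the sequence shared by ref
    # and alt cancels out, so only the variant position receives a gradient.
    grad = err * (x_alt - x_ref)
    return 0.5 * err**2, grad


def train(pairs, length, lr=0.1, steps=100):
    """Plain gradient descent over (ref, alt, measured_effect) triples."""
    weights = np.zeros((length, 4))
    for _ in range(steps):
        for ref, alt, delta in pairs:
            _, grad = twin_loss_and_grad(weights, ref, alt, delta)
            weights -= lr * grad
    return weights


# Hypothetical variant: T>A at position 3 with a measured effect of 1.5.
ref, alt = "ACGTACGT", "ACGAACGT"
w = train([(ref, alt, 1.5)], length=len(ref))
pred = shared_model(w, one_hot(alt)) - shared_model(w, one_hot(ref))
print(round(pred, 6))                         # → 1.5
print(np.abs(np.delete(w, 3, axis=0)).max())  # → 0.0 (weights away from the variant untouched)
```

In a deep network the cancellation is not exact, but the loss still supervises only the difference between the two copies, which is what pushes the model toward the variant itself.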

Great question! That's our best guess as well, and we highlight this in the paper by noting that MPRA data from individual cell lines may have limitations for variant interpretation.

10.06.2025 06:46 · 👍 1 🔁 0 💬 0 📌 0

Huge thanks to the amazing Illumina team; this was an incredible learning experience! I'm excited to keep pushing forward as we develop models to tackle gene expression and non-coding variant interpretation. (16/)

29.05.2025 23:57 · 👍 2 🔁 0 💬 0 📌 0

A complementary thread from my colleague Kishore Jaganathan @kjaganatha.bsky.social bsky.app/profile/kjag... (15/)

29.05.2025 23:57 · 👍 2 🔁 0 💬 1 📌 0
Predicting expression-altering promoter mutations with deep learning Only a minority of patients with rare genetic diseases are currently diagnosed by exome sequencing, suggesting that additional unrecognized pathogenic variants may reside in non-coding sequence. Here,...

Want to learn more about PromoterAI?
📄 Read the paper: science.org/doi/10.1126/...
💻 Explore the code & precomputed scores: github.com/Illumina/Pro.... (14/)

29.05.2025 23:57 · 👍 21 🔁 3 💬 1 📌 1

We followed up by testing promoter variants in Mendelian genes using MPRA. Surprisingly, PromoterAI was more effective than MPRA at prioritizing variants linked to patient phenotypes, highlighting limitations of MPRA for rare disease interpretation. (13/)

29.05.2025 23:57 · 👍 1 🔁 0 💬 2 📌 0

Although adding species such as mouse did not substantially improve variant effect prediction, it did help with ensembling. The final model is therefore an ensemble of two: one trained on human only and one trained on mouse and human together. (12/)

29.05.2025 23:57 · 👍 1 🔁 0 💬 1 📌 0
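As a sketch of the two-member ensemble, assuming the members' variant effects are simply averaged (the post does not say how they are combined), with hypothetical toy scoring functions standing in for the human-only and mouse+human models:

```python
import numpy as np


def ensemble_variant_effect(models, ref_seq, alt_seq):
    """Average the predicted effect (alt minus ref) over the ensemble members.

    Each model is any callable mapping a sequence to a scalar score.
    """
    deltas = [model(alt_seq) - model(ref_seq) for model in models]
    return float(np.mean(deltas))


# Hypothetical toy members standing in for the two trained models.
def human_only(seq):
    return seq.count("G") + seq.count("C")


def mouse_human(seq):
    return 0.5 * seq.count("A")


print(ensemble_variant_effect([human_only, mouse_human], "ACGT", "ACGA"))  # → 0.25
```

Averaging helps most when members make partly independent errors, which is one plausible reason a differently trained second model improves the ensemble even if it is not better on its own.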

In the Genomics England rare disease cohort, functional promoter variants predicted by PromoterAI were enriched in phenotype-matched Mendelian genes. These variants accounted for an estimated 6% of the rare disease genetic burden. (11/)

29.05.2025 23:57 · 👍 2 🔁 0 💬 1 📌 0
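To make "enriched" concrete: one common way to quantify enrichment is an odds ratio from a 2x2 table. The sketch below uses made-up counts and a Woolf confidence interval; the post does not state which statistic or counting unit the paper used.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf 95% CI for a 2x2 table (all cells must be > 0).

                              phenotype-matched gene   other gene
    predicted functional               a                   b
    not predicted functional           c                   d
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, (lo, hi)


# Made-up counts, for illustration only.
or_, (lo, hi) = odds_ratio_with_ci(30, 70, 100, 900)
print(round(or_, 2))  # → 3.86
```

An interval that excludes 1, as here, is the usual sign that the enrichment is unlikely to be chance.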

In the UK Biobank cohort, PromoterAI's predicted promoter variant effects correlated strongly with measured protein levels and quantitative traits, suggesting that promoter variants contribute meaningfully to phenotypic variation in the general population. (10/)

29.05.2025 23:57 · 👍 1 🔁 0 💬 1 📌 0

PromoterAI's embeddings split promoters into three distinct classes: P1 (~9K genes, ubiquitously active), P2 (~3K genes, bivalent chromatin), E (~6K genes, enhancer-like). The E class, enriched for TATA boxes, may reflect enhancers co-opted as promoters. (9/)

29.05.2025 23:57 · 👍 1 🔁 0 💬 1 📌 0
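One way to obtain this kind of split is to cluster promoter embeddings. The sketch below runs plain k-means with k = 3 on synthetic vectors (group sizes chosen in the same 9:3:6 proportion); k-means is a swapped-in technique, and the paper may have used a different clustering method.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Plain k-means on the rows of X with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest already-chosen center.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster becomes empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic "promoter embeddings": three well-separated groups in 8 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(90, 8)),
    rng.normal(10.0, 0.5, size=(30, 8)),
    rng.normal(-10.0, 0.5, size=(60, 8)),
])
labels, _ = kmeans(X, k=3)
print(sorted(np.bincount(labels).tolist()))  # → [30, 60, 90]
```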

Fine-tuning improved PromoterAI's ability to predict the direction of motif effects, a known weakness of multitask models. Before fine-tuning, the model often recognized motifs but got the direction of their effect wrong; after fine-tuning, its predictions aligned better with the data. (8/)

29.05.2025 23:57 · 👍 2 🔁 1 💬 1 📌 0
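A hypothetical metric for the "direction" problem: the fraction of variants whose predicted effect has the same sign as the measured one. A model that detects a motif but flips its effect would score poorly here even if its magnitudes looked reasonable.

```python
import numpy as np


def direction_accuracy(pred, observed, min_abs_obs=0.0):
    """Fraction of variants whose predicted effect has the same sign as the observed one."""
    pred, observed = np.asarray(pred), np.asarray(observed)
    keep = np.abs(observed) > min_abs_obs  # optionally ignore near-zero measurements
    return float(np.mean(np.sign(pred[keep]) == np.sign(observed[keep])))


# Hypothetical predictions and measurements for four motif-altering variants.
pred = [0.8, -0.5, 0.3, -0.1]
obs = [1.2, -0.9, -0.4, -0.2]
print(direction_accuracy(pred, obs))  # → 0.75
```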

We used our list of gene expression outliers to explore their effect on transcription factor binding sites. Our results show that new variants more readily cause outlier gene expression by disrupting existing regulatory components than by creating new ones. (7/)

29.05.2025 23:57 · 👍 1 🔁 0 💬 1 📌 0
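A sketch of how a variant can be labeled as disrupting or creating a binding site, assuming a simple position weight matrix (PWM) scan. The TATA-like motif, the match probability and the score threshold are all hypothetical:

```python
import numpy as np

BASES = "ACGT"


def log_odds_pwm(consensus, p_match=0.85, background=0.25):
    """Toy log2-odds PWM: the consensus base gets p_match, the other three share the rest."""
    p_other = (1.0 - p_match) / 3.0
    pwm = np.full((len(consensus), 4), np.log2(p_other / background))
    for i, base in enumerate(consensus):
        pwm[i, BASES.index(base)] = np.log2(p_match / background)
    return pwm


def best_site_score(pwm, seq):
    """Best motif score over all windows of seq (forward strand only, for brevity)."""
    width = pwm.shape[0]
    return max(
        sum(pwm[j, BASES.index(seq[i + j])] for j in range(width))
        for i in range(len(seq) - width + 1)
    )


def classify_variant(pwm, ref, alt, threshold):
    """Label a variant by whether it removes or introduces a motif hit."""
    s_ref, s_alt = best_site_score(pwm, ref), best_site_score(pwm, alt)
    if s_ref >= threshold > s_alt:
        return "disrupts"
    if s_alt >= threshold > s_ref:
        return "creates"
    return "neutral"


pwm = log_odds_pwm("TATA")  # hypothetical TATA-like motif
print(classify_variant(pwm, "GGTATAGG", "GGTACAGG", threshold=5.0))  # → disrupts
print(classify_variant(pwm, "GGTACAGG", "GGTATAGG", threshold=5.0))  # → creates
```

Counting "disrupts" versus "creates" labels among outlier-associated variants is one simple way to compare the two mechanisms.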

We also fine-tuned Enformer and Borzoi on our promoter variant set. While their performance improved, both models still lagged behind PromoterAI. Notably, before fine-tuning, PromoterAI outperformed Enformer and performed on par with Borzoi. (6/)

29.05.2025 23:57 · 👍 4 🔁 1 💬 1 📌 0

When it comes to predicting the expression effects of promoter variants, PromoterAI achieved the best performance across benchmarks spanning RNA, proteins, QTLs, and MPRA. (5/)

29.05.2025 23:57 · 👍 1 🔁 0 💬 1 📌 0

The second step was to fine-tune the model on a carefully curated list of rare promoter variants linked to aberrant gene expression. Fine-tuning used a twin-network setup to ensure generalization across unseen genes and datasets. (4/)

29.05.2025 23:57 · 👍 3 🔁 0 💬 2 📌 0
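The post does not describe how aberrant expression was called. As a hypothetical illustration, a simple approach flags (gene, sample) pairs with an extreme z-score across samples; rare promoter variants carried by the flagged samples would then be candidates for a fine-tuning set. Real pipelines typically add normalization and covariate correction that this sketch omits.

```python
import numpy as np


def expression_outliers(expr, z_threshold=2.5):
    """Return (gene, sample) pairs whose expression z-score across samples is extreme.

    expr: (n_genes, n_samples) array, assumed already normalised (e.g. log-scale).
    """
    mu = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, keepdims=True)
    # Genes with zero variance get z = 0 instead of a division-by-zero warning.
    z = (expr - mu) / np.where(sd > 0, sd, np.inf)
    genes, samples = np.nonzero(np.abs(z) >= z_threshold)
    return list(zip(genes.tolist(), samples.tolist()))


expr = np.array([
    [10, 10, 10, 10, 10, 10, 10, 10, 10, 50],  # one sample far above the rest
    [5, 6, 5, 6, 5, 6, 5, 6, 5, 6],            # ordinary variation
    [7, 7, 7, 7, 7, 7, 7, 7, 7, 7],            # constant gene
], dtype=float)
print(expression_outliers(expr))  # → [(0, 9)]
```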

First, we pre-trained PromoterAI to predict histone marks, TF binding, DNA accessibility, and CAGE signal from genomic sequence. The key differences from models like Enformer and Borzoi are that we predict at single-base-pair resolution and use only TSS-centered regions. (3/)

29.05.2025 23:57 · 👍 2 🔁 0 💬 1 📌 0
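A sketch of the input side of this setup, assuming a TSS-centered window of 2 × flank bases, one-hot encoding, and one target value per base and per assay track. The window size, track count and helper names are hypothetical:

```python
import numpy as np

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def tss_window(chrom_seq, tss, flank, strand="+"):
    """Sequence of length 2 * flank centred on a 0-based TSS, N-padded at chromosome ends.

    Minus-strand windows are reverse-complemented so every window reads 5'->3'
    relative to the gene.
    """
    start, end = tss - flank, tss + flank
    core = chrom_seq[max(0, start):min(len(chrom_seq), end)]
    seq = "N" * max(0, -start) + core + "N" * max(0, end - len(chrom_seq))
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


def one_hot(seq):
    """(L, 4) one-hot encoding; N (or any non-ACGT base) becomes an all-zero row."""
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            out[i, j] = 1.0
    return out


window = tss_window("ACGTACGTAC", tss=2, flank=4)
x = one_hot(window)
# Single-base-pair resolution: one target value per position and per assay track.
n_tracks = 3  # hypothetical number of histone/TF/accessibility/CAGE tracks
targets = np.zeros((len(window), n_tracks), dtype=np.float32)
print(window, x.shape, targets.shape)  # → NNACGTAC (8, 4) (8, 3)
```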

PromoterAI is built from transformer-inspired blocks called metaformers, but instead of attention we use depthwise convolutions, making it a fully convolutional model. We believe CNN-based methods have not been surpassed yet and remain a great choice for genomics tasks. (2/)

29.05.2025 23:57 · 👍 3 🔁 0 💬 1 📌 0
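A minimal NumPy sketch of a MetaFormer-style block with a depthwise-convolution token mixer. The layer sizes, the unscaled layer norm and the tanh GELU are simplifying assumptions, not PromoterAI's actual configuration:

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalise each position over its channels (no learned scale/shift, for brevity)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def gelu(x):
    """tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def depthwise_conv1d(x, kernels):
    """x: (L, C); kernels: (K, C) with odd K. Each channel has its own filter; 'same' zero padding."""
    k = kernels.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for i in range(k):
        out += xp[i:i + x.shape[0]] * kernels[i]
    return out


def metaformer_block(x, dw_kernels, w1, w2):
    """MetaFormer block: norm -> token mixer -> residual, then norm -> MLP -> residual."""
    x = x + depthwise_conv1d(layer_norm(x), dw_kernels)  # token mixer (instead of attention)
    x = x + gelu(layer_norm(x) @ w1) @ w2                # channel MLP, applied per position
    return x


rng = np.random.default_rng(0)
L, C, K, H = 16, 8, 9, 32  # hypothetical length, channels, kernel size, MLP width
x = rng.normal(size=(L, C))
params = (rng.normal(size=(K, C)) * 0.1, rng.normal(size=(C, H)) * 0.1, rng.normal(size=(H, C)) * 0.1)
y = metaformer_block(x, *params)
print(y.shape)  # → (16, 8)
```

Because the token mixer is a convolution, perturbing one position changes a single block's output only within K // 2 positions of it; stacking blocks widens the receptive field.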

Excited to share my first contribution here at Illumina! We developed PromoterAI, a deep neural network that accurately identifies non-coding promoter variants that disrupt gene expression. 🧵 (1/)

29.05.2025 23:57 · 👍 60 🔁 21 💬 1 📌 1
Decoding Plasticity Regulators and Transition Trajectories in Glioblastoma with Single-cell Multiomics Glioblastoma (GB) is one of the most lethal human cancers, marked by profound intratumoral heterogeneity and near-universal treatment resistance. Cellular plasticity, the capacity of cancer cells to t...

Two massive glioblastoma papers, datasets, trajectories, insights, and... a very cool new method for GRN inference, scDORI, from @steglelab.bsky.social @oliverstegle.bsky.social @bayraktarlab.bsky.social & Moritz Mall www.biorxiv.org/content/10.1... www.biorxiv.org/content/10.1...

16.05.2025 07:31 · 👍 38 🔁 15 💬 1 📌 0

🧠 Excited to share my main PhD project! We mapped the regulatory rules governing Glioblastoma plasticity using single-cell multi-omics and deep learning. This work is part of a two-paper series with @bayraktarlab.bsky.social @oliverstegle.bsky.social and @moritzmall.bsky.social, Preprint at endπŸ§΅πŸ‘‡

16.05.2025 10:04 · 👍 76 🔁 29 💬 1 📌 6
scPrediXcan integrates deep learning methods and single-cell data into a cell-type-specific transcriptome-wide association study framework Zhou et al. introduce scPrediXcan, a novel transcriptome-wide association study framework that integrates the deep learning-based model ctPred for cell-type-specific expression prediction. Applied to ...

Check out our scPrediXcan paper
www.cell.com/cell-genomic...
Led by the talented @Charles_Zhou12 and supervised by @MengjieChen6 and me, with thanks to many contributors.

scPrediXcan integrates deep learning and single cell expression data into a powerful cell type specific TWAS framework.

14.05.2025 22:41 · 👍 14 🔁 6 💬 0 📌 1
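For readers new to TWAS, a PrediXcan-style sketch: genotypes are turned into predicted expression, which is then tested for association with a trait. Per the abstract above, scPrediXcan replaces the expression predictor with the deep-learning model ctPred for cell-type-specific predictions; the linear weights and simulated data here are only stand-ins.

```python
import numpy as np


def predicted_expression(dosages, weights):
    """Genetically predicted expression: (n_individuals, n_snps) @ (n_snps,)."""
    return dosages @ weights


def twas_association(pred_expr, phenotype):
    """Simple linear regression of phenotype on predicted expression; returns (beta, t)."""
    x = pred_expr - pred_expr.mean()
    y = phenotype - phenotype.mean()
    beta = (x @ y) / (x @ x)
    resid = y - beta * x
    se = np.sqrt((resid @ resid) / (len(y) - 2) / (x @ x))
    return beta, beta / se


# Simulated data: 2,000 individuals, 5 SNPs with allele frequency 0.3.
rng = np.random.default_rng(0)
dosages = rng.binomial(2, 0.3, size=(2000, 5)).astype(float)
weights = np.array([0.4, -0.2, 0.1, 0.0, 0.3])  # hypothetical eQTL weights
expr_hat = predicted_expression(dosages, weights)
phenotype = 0.5 * expr_hat + rng.normal(size=2000)
beta, t = twas_association(expr_hat, phenotype)
print(f"beta={beta:.2f}, t={t:.1f}")
```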

So excited to see Yawei's manuscript on long-range MPRAs out! Some really great insights into distal enhancer regulation 🙂

23.04.2025 21:10 · 👍 15 🔁 4 💬 1 📌 0

Day 1 of #RECOMB2025-SEQ starts with @rayanchikhi.bsky.social and an introduction to Logan, a planetary-scale effort to assemble everything.

Currently cataloguing about 700k virus species!

📄 www.biorxiv.org/content/10.1...

24.04.2025 01:12 · 👍 20 🔁 4 💬 1 📌 0

Our latest work now online in Cell:

Rewriting regulatory DNA to dissect and reprogram gene expression

Our new method (Variant-EFFECTS) uses high-throughput prime editing + flow sorting + sequencing to precisely measure effects of noncoding variants on gene expression

Thread 👇

17.04.2025 18:26 · 👍 116 🔁 28 💬 1 📌 2
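As a hypothetical illustration of a sort-and-sequence readout (the thread does not spell out the Variant-EFFECTS estimator), an edit's effect can be summarized as the log2 ratio of its allele frequency in the high- versus low-expression bin:

```python
import math


def sort_bin_log2_enrichment(low_counts, high_counts, pseudocount=0.5):
    """log2 ratio of the edited-allele frequency in the high- vs low-expression bin.

    Each argument is a (ref_reads, alt_reads) tuple from sequencing one sorted bin.
    Positive values mean the edit is enriched in high-expressing cells.
    """
    def alt_freq(ref_reads, alt_reads):
        return (alt_reads + pseudocount) / (ref_reads + alt_reads + 2 * pseudocount)

    return math.log2(alt_freq(*high_counts) / alt_freq(*low_counts))


# Made-up read counts for one edited variant.
effect = sort_bin_log2_enrichment(low_counts=(900, 100), high_counts=(600, 400))
print(round(effect, 2))  # → 1.99
```

The pseudocount keeps the ratio finite when an allele is absent from one bin.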
Shift augmentation improves DNA convolutional neural network indel effect predictions Determining genetic variant effects on molecular phenotypes like gene expression is a task of paramount importance to medical genetics. DNA convolutional neural networks (CNNs) attain state-of-the-art...

⚑️ Our latest preprint is on bioRxiv!

Shift augmentation improves DNA convolutional neural network indel effect predictions

www.biorxiv.org/content/10.1...

16.04.2025 21:13 · 👍 12 🔁 5 💬 2 📌 0
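A minimal sketch of shift augmentation for one-hot DNA input, assuming zero (N) padding and uniformly drawn shifts; the preprint's actual shift range and padding scheme are not given in the post. Presumably the motivation is that an indel moves all downstream sequence relative to the input window, so training on shifted copies makes the model less sensitive to that offset.

```python
import numpy as np


def shift_one_hot(x, shift):
    """Shift an (L, 4) one-hot sequence by `shift` positions (positive = toward the 3' end).

    Vacated positions are filled with all-zero rows, i.e. treated as N.
    """
    out = np.zeros_like(x)
    if shift > 0:
        out[shift:] = x[:-shift]
    elif shift < 0:
        out[:shift] = x[-shift:]
    else:
        out[:] = x
    return out


def random_shift(x, max_shift, rng):
    """Draw a shift uniformly from [-max_shift, max_shift] and apply it."""
    return shift_one_hot(x, int(rng.integers(-max_shift, max_shift + 1)))


x = np.eye(4)  # one-hot encoding of "ACGT"
rng = np.random.default_rng(0)
batch = np.stack([random_shift(x, max_shift=2, rng=rng) for _ in range(3)])
print(batch.shape)  # → (3, 4, 4)
```

If the model predicts per-position targets, the same shift would have to be applied to the targets so inputs and labels stay aligned.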

We released our preprint on the CREsted package. CREsted allows for complete modeling of cell type-specific enhancer codes from scATAC-seq data. We demonstrate CREsted's robust functionality in various species and tissues, and validate our findings in vivo: www.biorxiv.org/content/10.1...

03.04.2025 14:30 · 👍 74 🔁 38 💬 1 📌 5
