Jeffrey Pullin

@jeffreypullin.bsky.social

PhD Student, MRC Biostatistics Unit, University of Cambridge | Gates Cambridge Scholar | Bioinformatics, genetics, single-cell, statistics | Australian 🇦🇺

113 Followers  |  220 Following  |  34 Posts  |  Joined: 08.02.2024

Latest posts by jeffreypullin.bsky.social on Bluesky

🔔 Paper alert! Extremely excited to share a preprint from our lab! Spearheaded by @axel-schmidt.bsky.social, a super talented medical & computational geneticist, we studied latent Epstein-Barr virus (EBV) infection at population-scale.

Interested in how this works & what we found? Read along! 👇

22.07.2025 16:10 | 👍 20    🔁 12    💬 1    📌 2

Super excited to see this out. What started as some math in a grant in 2020, to a student deciding to take this on in 2022, to published in 2025.

These things can take time and patience is key!

21.07.2025 18:54 | 👍 57    🔁 17    💬 3    📌 2

Thanks for those kind words, Davis! I caught the eQTL bug in your lab and it's great to finally contribute to the field.

23.07.2025 08:39 | 👍 1    🔁 0    💬 0    📌 0

Unfortunately not yet! This version of quasar does not support cell-level data nor interaction testing, but those are the two biggest features I want to add. The next part of my PhD will likely focus on finer resolution single-cell eQTLs, so watch this space :)

22.07.2025 21:19 | 👍 2    🔁 0    💬 1    📌 0
Link preview: Flexible and efficient count-distribution and mixed-model methods for eQTL mapping with quasar
Identifying genetic variants that affect gene expression, expression quantitative trait loci (eQTLs), is a major focus of modern genomics. Today, various methods exist for eQTL mapping, each using dif...

Very excited to share new work from my PhD on a new software package for eQTL mapping: quasar. The quasar software package is a C++ program designed to provide flexible and efficient eQTL mapping. www.medrxiv.org/content/10.1...

22.07.2025 10:15 | 👍 40    🔁 17    💬 2    📌 1

Finally a big thanks to @chr1sw.bsky.social for her support throughout this project and we welcome any and all feedback on the software and paper!

22.07.2025 10:15 | 👍 0    🔁 0    💬 0    📌 0

In addition, we provide mathematical intuition for why negative binomial mixed models give very similar results to Poisson mixed models and study the interaction between methods for computing gene-level p-values and FDR methods.

22.07.2025 10:15 | 👍 0    🔁 0    💬 1    📌 0
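
As a rough illustration of the gene-level p-value and FDR step mentioned in the post above, here is a minimal sketch. It assumes one common combination: a Bonferroni correction of the smallest variant-level p-value within each gene, followed by Benjamini-Hochberg across genes. These choices, and the function and gene names, are hypothetical stand-ins for illustration; they are not taken from quasar or the paper, which compares several such methods.

```python
def gene_level_pvalues(variant_pvals):
    """Bonferroni-correct the smallest variant p-value within each gene.

    variant_pvals maps a gene name to its list of variant-level p-values.
    (Assumed aggregation rule, chosen only for illustration.)
    """
    return {
        gene: min(1.0, len(pvals) * min(pvals))
        for gene, pvals in variant_pvals.items()
    }


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [1.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def egenes(variant_pvals, fdr=0.05):
    """Genes whose adjusted gene-level p-value passes the FDR threshold."""
    gene_p = gene_level_pvalues(variant_pvals)
    genes = list(gene_p)
    adjusted = benjamini_hochberg([gene_p[g] for g in genes])
    return [g for g, q in zip(genes, adjusted) if q <= fdr]


# Toy input with made-up p-values.
toy = {
    "GENE_A": [0.001, 0.2, 0.5],
    "GENE_B": [0.02, 0.04],
    "GENE_C": [0.3, 0.9, 0.6, 0.7],
}
print(egenes(toy))
```

The two steps interact: a more conservative gene-level rule shrinks the set of small p-values that the FDR procedure sees, which is one reason the choice of each step cannot be evaluated in isolation.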
Statistical power of negative binomial and linear model methods across the OneK1K dataset. a) Number of eQTLs detected by the quasar linear model and negative binomial GLM with adjusted profile likelihood dispersion estimation methods across all cell types in the OneK1K dataset. b) Number of eGenes detected by the quasar linear model and negative binomial GLM with adjusted profile likelihood dispersion estimation methods across all cell types in the OneK1K dataset.

When comparing methods we found that mixed model methods did not have better performance, but that, as previously reported, count distribution methods increased power. Overall we recommend the negative binomial GLM model, using the APL, as the method with the best overall performance.

22.07.2025 10:15 | 👍 0    🔁 0    💬 1    📌 0
a) Histograms of the Pearson correlation of −log10-transformed variant-level p-values for each gene, correlating the output of quasar against that of the existing method that uses the same statistical model (LM: tensorQTL, NB-GLM: jaxQTL, LMM: apex). All results are computed for the B IN cluster. b) Speed of methods across the three representative cell types. All methods were run on CPUs. Methods are labelled by the options used to run them: for tensorQTL and jaxQTL, ‘cis’ computes significance at the level of genes while ‘cis nominal’ computes significance at the level of variants.

When run on CPUs, quasar is quite a bit faster (up to ~40x) than existing methods, while producing concordant output when the statistical model aligns.

22.07.2025 10:15 | 👍 0    🔁 0    💬 1    📌 0

We compared quasar to three existing eQTL mapping methods (tensorQTL, jaxQTL and apex) in a pseudobulk analysis of the OneK1K dataset and used the flexibility of quasar to compare different models without confounding by implementation.

22.07.2025 10:15 | 👍 0    🔁 0    💬 1    📌 0
Bar charts of number of discoveries across different tools and thresholds in a paper about eQTL mapping

2. We also show that negative binomial models can fail to appropriately control the Type 1 error, which we fix in quasar by implementing the Cox-Reid adjusted profile likelihood (APL), a core part of edgeR and DESeq2.

22.07.2025 10:15 | 👍 0    🔁 0    💬 1    📌 0
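
For readers unfamiliar with the Cox-Reid adjusted profile likelihood mentioned in the post above, its standard form for a negative binomial GLM with a log link can be written as below. The notation is an assumption made here for illustration ($\phi$ is the dispersion, $X$ the design matrix, $\hat\beta_\phi$ the coefficients fitted with $\phi$ held fixed, $\hat\mu_i$ the fitted means); the details of quasar's implementation may differ.

```latex
\mathrm{APL}(\phi)
  = \ell\bigl(\phi;\, y, \hat\beta_\phi\bigr)
  - \tfrac{1}{2} \log \det\bigl(X^\top W X\bigr),
\qquad
W = \operatorname{diag}\!\left(\frac{\hat\mu_i}{1 + \phi\,\hat\mu_i}\right)
```

The dispersion estimate is the value of $\phi$ that maximises $\mathrm{APL}(\phi)$. The determinant term penalises the profile likelihood for the regression coefficients estimated from the same data, which matters most when the number of covariates is not small relative to the sample size.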

1. In mixed models a recurring challenge has been how to approximate the (very slow!) calculation of the score test variance. We introduce and implement a trace-based approx, which can be computed in O(n) time in LMMs. Our derivation also clarifies the effectiveness of the approx used in regenie.

22.07.2025 10:15 | 👍 0    🔁 0    💬 1    📌 0

Compared to other eQTL mapping methods, quasar implements a much wider variety of statistical models: the linear model, Poisson and negative binomial GLMs, the linear mixed model and Poisson and negative binomial GLMMs. Beyond this versatility, quasar has two pieces of novel methodology:

22.07.2025 10:15 | 👍 0    🔁 0    💬 1    📌 0
Link preview: Borzoi-informed fine mapping improves causal variant prioritization in complex trait GWAS
Genome-wide association studies (GWAS) have identified thousands of trait-associated loci. Prioritizing causal variants within these loci is critical for characterizing trait biology. Statistical fine...

I'm excited to share work on a research direction my team has been advancing: connecting machine learning derived genetic variant embeddings to downstream tasks in human genetics. This work was led by the amazing Divyanshi Srivastava! www.biorxiv.org/content/10.1...

21.07.2025 14:50 | 👍 32    🔁 15    💬 2    📌 0
🚨New preprint just dropped 🚨
medrxiv.org/content/10.1101/2025.06.24.25330216
The main output from my PhD is finally public and we're SUPER excited about the findings! If you're interested in what we learnt about IBD with a massive 700+ sample sc-eQTL dataset of the gut, read on!

08.07.2025 08:51 | 👍 37    🔁 14    💬 1    📌 2

I'd be interested in why that is, especially as I've spent the last ~6 months implementing a different part of the edgeR machinery in a mixed model context

20.06.2025 15:32 | 👍 1    🔁 0    💬 0    📌 0
Link preview: GitHub - jeffreypullin/qtl-zoo: A glossary/zoo of QTL types!

Finally also adding pcQTLs to my "QTL Zoo" github.com/jeffreypulli...!

20.06.2025 11:46 | 👍 3    🔁 0    💬 1    📌 0

That's true! I think the approach still has a lot of value though to uncover the right weights/variables for each cluster, it would just be interesting if you can recover the pc-eQTLs as sum/difference QTLs.

11.06.2025 06:53 | 👍 0    🔁 0    💬 0    📌 0
Link preview: Variant-specific priors clarify colocalisation analysis
Author summary: Evaluating whether two traits, such as disease risk and gene expression, are affected by the same genetic variants is crucial for understanding the molecular mechanisms through which ge...

Very happy that my first PhD paper is now out in PLOS Genetics! journals.plos.org/plosgenetics.... We describe our implementation of variant-specific priors in coloc. We show that using distance to the TSS as information about which variants are causal can improve colocalisation performance, 1/n

09.06.2025 10:45 | 👍 23    🔁 8    💬 1    📌 1

I agree completely and you're right that what I said only holds exactly when the variables are positively correlated. I guess I'm really wondering whether the 2 cluster cases can also be interpreted as sum-expression or difference-expression QTLs in many cases.

10.06.2025 10:35 | 👍 1    🔁 0    💬 1    📌 0

PC2 QTLs would also be found as QTLs for the difference of the gene expression levels. Then they could be simply interpreted as tuning the difference in gene expression levels. Thanks!

09.06.2025 20:59 | 👍 0    🔁 0    💬 1    📌 0

Such a cool paper! In the case of a cluster of two genes, simulations suggest (shorturl.at/Pbh8Z) that the two PCs are going to be highly correlated with the sum and difference of the expression levels. I wonder in your second two-gene example (NLRC3, CLUAP1) whether the 1/n

09.06.2025 20:59 | 👍 2    🔁 0    💬 1    📌 0
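
The sum/difference observation in the post above can be checked with a small simulation. This is a minimal sketch under stated assumptions, not the simulation linked in the post: two positively correlated, standardised "expression" vectors are drawn from a bivariate normal, and all function names and parameter values are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardise(x):
    """Centre to mean 0 and scale to unit variance."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / n)
    return [(a - m) / sd for a in x]


def pc_vs_sum_and_difference(rho=0.6, n=2000, seed=1):
    """Correlate PC1 with the sum and PC2 with the difference of two genes."""
    rng = random.Random(seed)
    g1, g2 = [], []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        g1.append(a)
        g2.append(rho * a + math.sqrt(1 - rho ** 2) * b)  # corr(g1, g2) ~ rho
    z1, z2 = standardise(g1), standardise(g2)

    # Entries of the 2x2 covariance matrix [[v1, c], [c, v2]].
    v1 = sum(a * a for a in z1) / n
    v2 = sum(b * b for b in z2) / n
    c = sum(a * b for a, b in zip(z1, z2)) / n

    # Angle of the major principal axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * c, v1 - v2)
    pc1 = [math.cos(theta) * a + math.sin(theta) * b for a, b in zip(z1, z2)]
    pc2 = [-math.sin(theta) * a + math.cos(theta) * b for a, b in zip(z1, z2)]

    total = [a + b for a, b in zip(z1, z2)]
    diff = [a - b for a, b in zip(z1, z2)]
    # PC signs are arbitrary, so compare absolute correlations.
    return abs(pearson(pc1, total)), abs(pearson(pc2, diff))


r_sum, r_diff = pc_vs_sum_and_difference()
print(round(r_sum, 4), round(r_diff, 4))
```

After standardisation the two variances are equal, so the principal axes sit at 45 degrees and the correspondence is essentially exact. With unstandardised expression and unequal variances the PCs would only approximate the sum and difference.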

Thanks for those examples, super interesting!

09.06.2025 20:24 | 👍 1    🔁 0    💬 1    📌 0

Interesting! Do you have a strong prior about whether the 'causal path' normally goes through one of the regulated genes, or through all/most of them?

09.06.2025 19:36 | 👍 0    🔁 0    💬 3    📌 0

Can you use variant level information in colocalisation? Yes! Will it improve accuracy on average? Yes! Will it make a substantial difference? Not using any information we could think of.

Very nice work by @jeffreypullin.bsky.social to adapt coloc to enable these questions to be addressed.

09.06.2025 15:58 | 👍 9    🔁 5    💬 1    📌 0
Standard methods are equivalent to a flashlight, looking at each gene independently. We combine signals from multiple genes, turning a floodlight onto the genome.

Excited to share my first PhD paper in the @sbmontgom.bsky.social lab with @tamigj.bsky.social (www.biorxiv.org/content/10.1...)! Standard QTL methods treat each gene independently. But what if a single variant regulates multiple nearby genes at once, what we call “allelic proxitropy”? 🧵 ⬇️

08.06.2025 17:38 | 👍 91    🔁 33    💬 6    📌 4

although the improvement wasn't as large as we initially expected. Big thanks to @chr1sw.bsky.social for her support throughout this project!

09.06.2025 10:45 | 👍 0    🔁 0    💬 0    📌 0
Link preview: Focus on single gene effects limits discovery and interpretation of complex trait-associated variants
Standard QTL mapping approaches consider variant effects on a single gene at a time, despite abundant evidence for allelic pleiotropy, where a single variant can affect multiple genes simultaneously. ...

“Focus on single gene effects limits discovery and interpretation of complex trait-associated variants”
Very interesting preprint by Kathryn Lawrence @tamigj.bsky.social @sbmontgom.bsky.social
Good arguments to move beyond single-gene-at-a-time approaches 🧪🧬
www.biorxiv.org/content/10.1...

07.06.2025 10:46 | 👍 43    🔁 16    💬 1    📌 0
