Jeffrey Pullin

@jeffreypullin.bsky.social

PhD Student, MRC Biostatistics Unit, University of Cambridge | Gates Cambridge Scholar | Bioinformatics, genetics, single-cell, statistics | Australian 🇦🇺

130 Followers  |  260 Following  |  43 Posts  |  Joined: 08.02.2024

Latest posts by jeffreypullin.bsky.social on Bluesky

Good point! Indeed I just checked and we don't see colocalisation between IFI6 and the GWAS

03.10.2025 15:25 - 👍 0    🔁 0    💬 1    📌 0

Makes perfect sense! I think it's super interesting that IFI6 is the regulated gene as it has an antiviral function but is not thought to affect HHV-7

03.10.2025 14:17 - 👍 1    🔁 0    💬 1    📌 0
Preview
Host control of latent Epstein-Barr virus infection Epstein-Barr virus (EBV) is a herpes virus that infects around 90-95% of the global population, and is associated with numerous autoimmune and neoplastic diseases. EBV persists in B cells as a life-lo...

It's awesome isn't it! There have also been two other recent EBV viral load GWAS: www.medrxiv.org/content/10.1... and www.biorxiv.org/content/10.1...

03.10.2025 14:14 - 👍 1    🔁 0    💬 1    📌 0

Great to see this out! Seeing the thumbnail reminded me that SP110 was recently identified as a GWAS hit for HHV7 viral load

02.10.2025 13:49 - 👍 1    🔁 0    💬 1    📌 1
Preview
Trans-eQTL mapping prioritises USP18 as a negative regulator of interferon response at a lupus risk locus - Nature Communications | Open Targets Out now in Nature Comms: the largest trans-eQTL meta-analysis in a single cell type! An Open Targets team led by Krista Freimann and Kaur Alasoo analysed 3,734 lymphoblastoid cell line samples across...

Out now in Nature Comms: the largest trans-eQTL meta-analysis in a single cell type!

An Open Targets team led by Krista Freimann and @kauralasoo.bsky.social analysed 3,734 lymphoblastoid cell line samples across nine cohorts, identifying four robust loci

www.nature.com/articles/s41...

02.10.2025 11:58 - 👍 7    🔁 1    💬 1    📌 1
Preview
Clinical and genetic spectrum of Fanconi anemia in Australia and New Zealand Fanconi anemia (FA) is a rare genetic condition that predisposes to progressive bone marrow failure, a specific spectrum of malignancies, including he…

I feel incredibly privileged to share this study on Fanconi anaemia, based on a small but important cohort. This work describes the genetics and clinical outcomes of patients in Australia and New Zealand with a diagnosis of FA.

www.sciencedirect.com/science/arti...

11.09.2025 03:50 - 👍 7    🔁 5    💬 1    📌 0
Preview
Evaluating multi-ancestry genome-wide association methods: Statistical power, population structure, and practical implications Multi-ancestry GWASs enhance discovery in diverse populations, but optimal methods remain debated. Using theory, simulations, and analyses from the UK Biobank and All of Us, we show that pooled analys...

Multi-ancestry GWAS can increase power and precision, but how should we analyze them? Pooled or stratified? We answer that question in a paper out today in AJHG, led by Julie Dias and Haoyu Zhang. 1/7 www.cell.com/ajhg/fulltex...

02.09.2025 15:26 - 👍 27    🔁 10    💬 2    📌 0
Preview
Genetic regulation of cell type-specific chromatin accessibility shapes immune function and disease risk

Understanding how genetic variation influences gene regulation at the single-cell level is crucial for elucidating the mechanisms underlying complex diseases. However, limited large-scale single-cell multi-omics data have constrained our understanding of the regulatory pathways that link variants to cell type-specific gene expression. Here we present chromatin accessibility profiles from 3.5 million peripheral blood mononuclear cells (PBMCs) across 1,042 donors, generated using single-cell ATAC-seq and multiome (RNA+ATAC) sequencing, with matched whole-genome sequencing, generated as part of the TenK10K program. We characterized 440,996 chromatin peaks across 28 immune cell types and mapped 243,273 chromatin accessibility quantitative trait loci (caQTLs), 60% of which are cell type-specific. Integration with TenK10K scRNA-seq data (5.4 million PBMCs) identified 31,688 candidate cis-regulatory elements colocalized with eQTLs; over half (52.5%) show evidence of causal effects mediated via chromatin accessibility. Integrating caQTLs with GWAS summary statistics for 16 diseases and 44 blood traits uncovered 9.8% - 30.0% more colocalized signals compared with using eQTLs alone, many of which have not been reported in prior studies. We demonstrate cell type-specific mechanisms, such as a regulatory effect on IRGM acting through altered promoter chromatin accessibility in CD8 effector memory T cells but not in naive cells. Using a graph neural network, we inferred peak-to-gene relationships from unpaired multiome data by incorporating caQTL and eQTL signals, achieving up to 80% higher accuracy compared to using paired multiome data without QTL information. This improvement further enhanced gene regulatory network inference, leading to the identification of 128 additional transcription factor (TF)-target gene pairs (a 22% increase). These findings provide an unprecedented single-cell map of chromatin accessibility and genetic variation in human circulating immune cells, establishing a powerful resource for dissecting cell type-specific regulation and advancing our understanding of genetic risk for complex diseases.

### Competing Interest Statement

L.C., E.B.D., and K.K.H.F. are employed at Illumina Inc. D.G.M. is a paid advisor to Insitro and GSK, and receives research funding from Google and Microsoft, unrelated to the work described in this manuscript. G.A.F reports grants from National Health and Medical Research Council (Australia), grants from Abbott Diagnostic, Sanofi, Janssen Pharmaceuticals, and NSW Health. G.A.F reports honorarium from CSL, CPC Clinical Research, Sanofi, Boehringer-Ingelheim, Heart Foundation, and Abbott. G.A.F serves as Board Director for the Australian Cardiovascular Alliance (past President), Executive Committee Member for CPC Clinical Research, Founding Director and CMO for Prokardia and Kardiomics, and Executive Committee member for the CAD Frontiers A2D2 Consortium. In addition, G.A.F serves as CMO for the non-profit, CAD Frontiers, with industry partners including, Novartis, Amgen, Siemens Healthineers, ELUCID, Foresite Labs LLC, HeartFlow, Canon, Cleerly, Caristo, Genentech, Artyra, and Bitterroot Bio, Novo Nordisk and Allelica. In addition, G.A.F has the following patents: "Patent Biomarkers and Oxidative Stress" awarded USA May 2017 (US9638699B2) issued to Northern Sydney Local Health District, "Use of P2X7R antagonists in cardiovascular disease" PCT/AU2018/050905 licensed to Prokardia, "Methods for treatment and prevention of vascular disease" PCT/AU2015/000548 issued to The University of Sydney/Northern Sydney Local Health District, "Methods for predicting coronary artery disease" AU202290266 issued to The University of Sydney, and the patent "Novel P2X7 Receptor Antagonists" PCT/AU2022/051400 (23.11.2022), International App No: WO/2023/092175 (01.06.2023), issued to The University of Sydney.

### Funding Statement

A.X. is supported by NHMRC Investigator grant 2033018. J.E.P. is supported by NHMRC Investigator grant 2034556, and a Fok Family Fellowship; D.G.M. is supported by an NHMRC investigator grant (2009982). G.A.F. and the BioHEART Study have been supported by NHMRC Investigator Grant, NSW Health Office of Health and Medical Research, and the NSW Health Statewide Biobank scheme.

### Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The Human Research Ethics Committee of St Vincent's Hospital gave ethical approval for this work. The National Statement on Ethical Conduct in Human Research of the National Health and Medical Research Council gave ethical approval for this work.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes

Raw caQTL summary statistics will be available at Zenodo website prior to acceptance. https://github.com/powellgenomicslab/tenk10k_phase1_multiome

New preprint alert: tinyurl.com/tenk10k-multiome. Excited to share our analysis on the impact of genetic variants on single-cell chromatin accessibility in blood, using scATAC-seq and WGS from over 1,000 donors and 3.5M nuclei as part of TenK10K phase 1 🧬
🧵👇 (1/n)

01.09.2025 11:59 - 👍 17    🔁 12    💬 1    📌 2
Preview
Ultra-fast genetic colocalisation across millions of traits Colocalisation is a powerful approach to assess if two genetic association signals are likely to share a causal variant. However, association analyses in large biobanks and molecular quantitative trai...

After 1.5 years of work in @kauralasoo.bsky.social’s lab, we finally published my preprint! We introduce gpu-coloc, a GPU-accelerated implementation of coloc, show comparability to CLPP and aim to provide practical guidelines. Now accessible on bioRxiv: www.biorxiv.org/content/10.1...

27.08.2025 12:19 - 👍 15    🔁 2    💬 0    📌 1
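
For readers unfamiliar with the colocalisation test mentioned in the post above, here is a minimal CPU sketch of the standard single-causal-variant coloc calculation (Wakefield approximate Bayes factors combined into posteriors for hypotheses H0 to H4). It is not the gpu-coloc implementation; the function names, the prior standard deviation of 0.15 and the priors `p1`, `p2`, `p12` are illustrative assumptions.

```python
import numpy as np
from scipy.special import logsumexp


def log_abf(beta, se, prior_sd=0.15):
    """Per-variant log approximate Bayes factor (Wakefield).

    prior_sd is an assumed prior standard deviation of the effect size.
    """
    beta = np.asarray(beta, dtype=float)
    se = np.asarray(se, dtype=float)
    z = beta / se
    r = prior_sd**2 / (prior_sd**2 + se**2)  # shrinkage factor
    return 0.5 * (np.log1p(-r) + r * z**2)


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 for two traits in one region.

    H3 = distinct causal variants, H4 = one shared causal variant.
    Both inputs must cover the same variants in the same order.
    """
    labf1 = np.asarray(labf1, dtype=float)
    labf2 = np.asarray(labf2, dtype=float)
    s1 = logsumexp(labf1)
    s2 = logsumexp(labf2)
    s12 = logsumexp(labf1 + labf2)
    # All variant pairs minus the same-variant pairs, on the log scale.
    ratio = np.minimum(np.exp(s12 - (s1 + s2)), 1.0)
    with np.errstate(divide="ignore"):
        log_pairs = s1 + s2 + np.log1p(-ratio)
    log_h = np.array([
        0.0,
        np.log(p1) + s1,
        np.log(p2) + s2,
        np.log(p1) + np.log(p2) + log_pairs,
        np.log(p12) + s12,
    ])
    return np.exp(log_h - logsumexp(log_h))
```

Calling `coloc_posteriors(log_abf(beta1, se1), log_abf(beta2, se2))` returns five probabilities that sum to one; the last entry is the posterior for a shared causal variant.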
Preview
Exploiting pleiotropy to enhance variant discovery with functional false discovery rates - Nature Computational Science This study introduces a cost-effective strategy called surrogate functional false discovery rates to increase power in genome-wide association studies by leveraging genetic correlations (or pleiotropy...

Excited to see our (w/ @chr1sw.bsky.social) work published in @natcomputsci.nature.com! We developed a new framework, surrogate functional false discovery rate (sffdr), that integrates summary statistics of related traits to improve power in GWASs.

Paper: www.nature.com/articles/s43...

25.08.2025 17:04 - 👍 10    🔁 4    💬 1    📌 0

Thanks for that clarification! So perhaps the challenge is really constructing accurate PGS for binary phenotypes in non-ascertained diseases?

25.08.2025 10:53 - 👍 0    🔁 0    💬 1    📌 0

Such cool work! Do you think this work can inform optimal prior structures for trans-eQTL discovery models? That is, suggest the right amount of pooling/shrinkage over genes

22.08.2025 20:16 - 👍 0    🔁 0    💬 0    📌 0

Thrilled to share the second half of my PhD work here!

We show how data on expression quantitative trait loci (eQTL) relates to the structure of gene regulatory networks (GRN). Much of the GRN / eQTL picture is unmapped, but what we do have says a lot… (1/)

doi.org/10.1101/2025...

22.08.2025 19:50 - 👍 69    🔁 25    💬 6    📌 1

Thanks for those thoughts! Plenty to consider.

18.08.2025 15:10 - 👍 1    🔁 0    💬 0    📌 0
Preview
Biallelic variants in the non-coding RNA gene RNU4-2 cause a recessive neurodevelopmental syndrome with distinct white matter changes Genetic variants in RNU4-2, which encodes U4, a key non-coding small nuclear RNA (snRNA) component of the major spliceosome, were recently shown to cause a prevalent neurodevelopmental disorder (NDD) ...

I am absolutely delighted to share our work describing a new *recessive* condition caused by variants in #RNU4-2. Yes, that #RNU4-2!

tinyurl.com/3j9r56s8
@rociorius.bsky.social @yuyangchen.bsky.social @gregfindlay.bsky.social @dgmacarthur.bsky.social @cassimons.bsky.social @nickywhiffin.bsky.social

18.08.2025 11:22 - 👍 32    🔁 9    💬 1    📌 6

Does anyone have any intuition for why that is? And if they are so hard to construct, why are we now countenancing them in high-stakes contexts!? 2/2

18.08.2025 13:44 - 👍 0    🔁 0    💬 1    📌 0
Preview
LDAK-KVIK performs fast and powerful mixed-model association analysis of quantitative and binary phenotypes - Nature Genetics LDAK-KVIK is a mixed-model association method for genome-wide studies that optimizes computational performance and power. LDAK-KVIK can also perform gene-based tests and produces state-of-the-art poly...

Really enjoying reading this paper describing the new GWAS method LDAK-KVIK - I'm particularly struck by the results describing the "difficulty of constructing accurate PGS for binary phenotypes." 1/2 www.nature.com/articles/s41...

18.08.2025 13:44 - 👍 3    🔁 0    💬 2    📌 0
Preview
Higher eQTL power reveals signals that boost GWAS colocalization Expression quantitative trait locus (eQTL) studies in human cohorts typically detect at least one regulatory signal per gene, and have been proposed as a way to explain mechanisms of genetic liability...

Excited to share this preprint from first author Jon Rosen, a postdoctoral fellow in the @klmohlke.bsky.social lab and my lab. We examine eQTL study sample size and how this affects signal discovery and rates of colocalization with GWAS.

www.biorxiv.org/content/10.1...

18.08.2025 12:18 - 👍 47    🔁 16    💬 3    📌 0
"Manhattan plot" for DecodeME's principal genome-wide association study (GWAS) showing 6 genome-wide significant associations, and 2 additional signals that are significant in DecodeME's other GWAS.

Initial findings from the DecodeME genome-wide association study of myalgic encephalomyelitis/chronic fatigue syndrome now on medRXiv
www.medrxiv.org/content/10.1...

09.08.2025 08:32 - 👍 57    🔁 23    💬 2    📌 1

🔔 Paper alert! Extremely excited to share a preprint from our lab! Spearheaded by @axel-schmidt.bsky.social, a super talented medical & computational geneticist, we studied latent Epstein-Barr virus (EBV) infection at population-scale.

Interested in how this works & what we found? Read along! 👇

22.07.2025 16:10 - 👍 22    🔁 12    💬 1    📌 2

Super excited to see this out. What started as some math in a grant in 2020, to a student deciding to take this on in 2022, to published in 2025.

These things can take time and patience is key!

21.07.2025 18:54 - 👍 58    🔁 17    💬 3    📌 2

Thanks for those kind words Davis! I caught the eQTL bug in your lab and it's great to finally contribute to the field

23.07.2025 08:39 - 👍 1    🔁 0    💬 0    📌 0

Unfortunately not yet! This version of quasar does not support cell-level data or interaction testing, but those are the two biggest features I want to add. The next part of my PhD will likely focus on finer resolution single-cell eQTLs, so watch this space :)

22.07.2025 21:19 - 👍 2    🔁 0    💬 1    📌 0
Preview
Flexible and efficient count-distribution and mixed-model methods for eQTL mapping with quasar Identifying genetic variants that affect gene expression, expression quantitative trait loci (eQTLs), is a major focus of modern genomics. Today, various methods exist for eQTL mapping, each using dif...

Very excited to share new work from my PhD on a new software package for eQTL mapping: quasar. The quasar software package is a C++ program designed to provide flexible and efficient eQTL mapping. www.medrxiv.org/content/10.1...

22.07.2025 10:15 - 👍 42    🔁 17    💬 2    📌 1

Finally a big thanks to @chr1sw.bsky.social for her support throughout this project and we welcome any and all feedback on the software and paper!

22.07.2025 10:15 - 👍 0    🔁 0    💬 0    📌 0

In addition, we provide mathematical intuition for why negative binomial mixed models give very similar results to Poisson mixed models and study the interaction between methods for computing gene-level p-values and FDR methods.

22.07.2025 10:15 - 👍 0    🔁 0    💬 1    📌 0
Statistical power of negative binomial and linear model methods across the OneK1K dataset. a) Number of eQTLs detected by the quasar linear model and negative binomial GLM with adjusted profile likelihood dispersion estimation methods across all cell types in the OneK1K dataset. b) Number of eGenes detected by the quasar linear model and negative binomial GLM with adjusted profile likelihood dispersion estimation methods across all cell types in the OneK1K dataset.

When comparing methods, we found that mixed model methods did not have better performance, but that, as previously reported, count distribution methods increased power. Overall we recommend the negative binomial GLM, using the APL, as the method with the best overall performance.

22.07.2025 10:15 - 👍 0    🔁 0    💬 1    📌 0
a) Histograms of Pearson correlation of −log10-transformed variant-level p-values for each gene, correlating the output of quasar against that of the method that uses the same statistical model (LM: tensorQTL, NB-GLM: jaxQTL, LMM: apex). All results are computed for the B IN cluster. b) Speed of methods across the three representative cell types. All methods were run on CPUs. Methods are labelled by the options used to run them: for tensorQTL and jaxQTL, ‘cis’ computes significance at the level of genes while ‘cis nominal’ computes significance at the level of variants.

When run on CPUs, quasar is quite a bit faster (up to ~40x) than existing methods, while producing concordant output when the statistical model aligns.

22.07.2025 10:15 - 👍 0    🔁 0    💬 1    📌 0
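
The concordance check described in the caption above (per-gene Pearson correlation of −log10 p-values between two tools) could be computed roughly as follows. This is only a sketch: the dictionary layout (gene ID mapped to an array of variant-level p-values, with variants in the same order for both tools) and the function names are assumptions, not the paper's code.

```python
import numpy as np


def neglog10_correlation(p_a, p_b, floor=1e-300):
    """Pearson correlation of -log10 p-values from two methods.

    p-values are clipped at `floor` (an assumed value) to avoid log10(0).
    """
    a = -np.log10(np.clip(np.asarray(p_a, dtype=float), floor, 1.0))
    b = -np.log10(np.clip(np.asarray(p_b, dtype=float), floor, 1.0))
    return float(np.corrcoef(a, b)[0, 1])


def per_gene_concordance(pvals_a, pvals_b):
    """Correlation for every gene present in both result sets."""
    return {
        gene: neglog10_correlation(pvals_a[gene], pvals_b[gene])
        for gene in pvals_a
        if gene in pvals_b
    }
```

A histogram of the returned values, one per gene, would give a plot of the kind described in panel a).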

We compared quasar to three existing eQTL mapping methods (tensorQTL, jaxQTL and apex) in a pseudobulk analysis of the OneK1K dataset and used the flexibility of quasar to compare different models without confounding by implementation.

22.07.2025 10:15 - 👍 0    🔁 0    💬 1    📌 0
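
As background on the "pseudobulk" step mentioned above: single-cell counts are typically collapsed to one profile per donor within each cell type before eQTL mapping. The sketch below assumes sum aggregation of raw counts and uses hypothetical argument names; the preprocessing used in the paper may differ.

```python
import numpy as np


def pseudobulk(counts, donor_ids, cell_types, target_type):
    """Sum raw counts over the cells of one cell type within each donor.

    counts: (n_cells, n_genes) array of raw counts.
    Returns (donors, matrix) where matrix is (n_donors, n_genes).
    """
    counts = np.asarray(counts)
    donor_ids = np.asarray(donor_ids)
    cell_types = np.asarray(cell_types)
    keep = cell_types == target_type
    donors, inverse = np.unique(donor_ids[keep], return_inverse=True)
    out = np.zeros((donors.size, counts.shape[1]), dtype=counts.dtype)
    # Unbuffered add so repeated donor indices accumulate correctly.
    np.add.at(out, inverse, counts[keep])
    return donors, out
```

Count-based models such as the negative binomial GLM work on these summed counts directly, whereas linear models usually need a normalisation and transformation step first.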
Bar charts of number of discoveries across different tools and thresholds in a paper about eQTL mapping

2. We also show that negative binomial models can fail to appropriately control the Type 1 error, which we fix in quasar by implementing the Cox-Reid adjusted profile likelihood (APL), a core part of edgeR and DESeq2.

22.07.2025 10:15 - 👍 0    🔁 0    💬 1    📌 0
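
To make the Cox-Reid adjusted profile likelihood (APL) mentioned above concrete, here is a small NumPy/SciPy sketch of dispersion estimation for a log-link negative binomial GLM with variance mu + phi * mu^2. It is not quasar's C++ code: the function names, the search bounds on phi, and the assumption that the first column of `X` is an intercept are all illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln


def fit_nb_mean(y, X, phi, n_iter=50, tol=1e-8):
    """IRLS for the regression coefficients at a fixed dispersion phi."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean() + 1e-8)  # assumes column 0 is the intercept
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        w = mu / (1.0 + phi * mu)        # working weights (log link)
        z = eta + (y - mu) / mu          # working response
        xtw = X.T * w
        beta_new = np.linalg.solve(xtw @ X, xtw @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def nb_loglik(y, mu, phi):
    """Negative binomial log-likelihood with size parameter 1 / phi."""
    r = 1.0 / phi
    return np.sum(
        gammaln(y + r) - gammaln(r) - gammaln(y + 1.0)
        + r * np.log(r / (r + mu)) + y * np.log(mu / (r + mu))
    )


def adjusted_profile_loglik(log_phi, y, X):
    """Profile log-likelihood minus the Cox-Reid term 0.5 * log|X'WX|."""
    phi = np.exp(log_phi)
    mu = np.exp(X @ fit_nb_mean(y, X, phi))
    w = mu / (1.0 + phi * mu)
    _, logdet = np.linalg.slogdet((X.T * w) @ X)
    return nb_loglik(y, mu, phi) - 0.5 * logdet


def estimate_dispersion(y, X, bounds=(1e-4, 10.0)):
    """Maximise the APL over phi (bounds are assumed, not from the paper)."""
    res = minimize_scalar(
        lambda lp: -adjusted_profile_loglik(lp, y, X),
        bounds=(np.log(bounds[0]), np.log(bounds[1])),
        method="bounded",
    )
    return float(np.exp(res.x))
```

The adjustment term penalises the profile likelihood for the degrees of freedom spent on estimating the regression coefficients, which matters most when the number of samples is small relative to the number of covariates.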