
Stephen Burgess

@stevesphd.bsky.social

Medical statistician, working with genetic data to disentangle causation from correlation. Author of a book on Mendelian randomization.

695 Followers  |  169 Following  |  105 Posts  |  Joined: 16.11.2024

Latest posts by stevesphd.bsky.social on Bluesky

Thanks to Janne for leading this, and the team at @FinnGen_FI led by @johanneskettune for allowing us to perform bespoke analyses in their cohort!

07.10.2025 08:59

Negative studies are difficult to interpret (and publish) - there are legitimate reasons why the result may not replicate in a different study population. However, we did not see encouraging evidence from our attempted replication analysis.

07.10.2025 08:59

While we cannot rule out low power, we did not find any associations between PCSK9 variants and breast cancer survival in datasets other than that of the original Cell paper. In contrast, variants in the HMGCR gene were associated with breast cancer survival.

07.10.2025 08:59

For the BCAC data, we weren't able to replicate the original analysis exactly - we couldn't restrict to older women or to those with Stage 2/3 cancer, or consider a recessive model. For FinnGen, we were able to replicate the original analysis exactly - but the sample size was much smaller.

07.10.2025 08:59

We did not replicate their finding in any analysis using published consortium (BCAC) data from Morra et al. on 91,686 breast cancer cases with 7531 breast cancer-specific deaths, or in FinnGen (4648 breast cancer patients).

07.10.2025 08:59

There may be good reasons for this, but from a purely statistical point of view, reporting associations for a single SNP under a non-additive model with restricted participant eligibility raises concerns about possible selective reporting.

07.10.2025 08:59
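To make the selective-reporting concern concrete, here is an illustration, not an analysis from either paper. If k analysis variants are each tested at α = 0.05 and, as a simplifying assumption, the tests are independent under the null, the chance that at least one is "significant" is 1 - (1 - α)^k. The function name `prob_any_significant` and the count of eight variants (2 genetic models × 2 age restrictions × 2 stage restrictions) are hypothetical.

```python
def prob_any_significant(n_analyses: int, alpha: float = 0.05) -> float:
    """P(at least one p < alpha) across n_analyses tests, ASSUMING independent nulls."""
    return 1.0 - (1.0 - alpha) ** n_analyses

# Hypothetical: 2 genetic models x 2 age restrictions x 2 stage restrictions
n_variants = 2 * 2 * 2
print(f"{prob_any_significant(1):.3f}")           # prints 0.050 (one pre-specified analysis)
print(f"{prob_any_significant(n_variants):.3f}")  # prints 0.337 (best of eight variants)
```

Variants fitted to overlapping participants are correlated, so the real inflation would typically be smaller than this independent-test figure, but it can still be well above the nominal 5%.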

We were curious why they did not examine genetic associations in large publicly available datasets on breast cancer survival. Additionally, they estimated associations using a recessive allele model, and limited their analysis to women over 50 with Stage 2 or 3 breast cancer.

07.10.2025 08:59
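For readers less familiar with genetic models: an additive model codes each person by their number of copies of the effect allele (0, 1 or 2), whereas a recessive model only distinguishes carriers of two copies from everyone else. A minimal sketch, assuming genotypes are stored as effect-allele counts; the function `code_genotype` is hypothetical.

```python
from typing import List

def code_genotype(allele_counts: List[int], model: str) -> List[int]:
    """Recode effect-allele counts (0, 1, 2) under a given genetic model (hypothetical helper)."""
    if model == "additive":    # per-allele effect: 0, 1, 2
        return list(allele_counts)
    if model == "dominant":    # any copy of the effect allele: 0, 1, 1
        return [1 if c >= 1 else 0 for c in allele_counts]
    if model == "recessive":   # only two copies count: 0, 0, 1
        return [1 if c == 2 else 0 for c in allele_counts]
    raise ValueError(f"unknown model: {model}")

genotypes = [0, 1, 2, 1, 0, 2]
for m in ("additive", "dominant", "recessive"):
    print(m, code_genotype(genotypes, m))
```

Under recessive coding, heterozygotes are grouped with non-carriers, so the contrast rests on the homozygous group, which for an allele with frequency p is only about p² of the sample under Hardy-Weinberg equilibrium.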
A commonly inherited human PCSK9 germline variant drives breast cancer metastasis via LRP1 receptor - PubMed Identifying patients at risk for metastatic relapse is a critical medical need. We identified a common missense germline variant in proprotein convertase subtilisin/kexin type 9 (PCSK9) (rs562556, V47...

A recent Cell paper (pubmed.ncbi.nlm.nih.gov/39657676/) reported links between PCSK9 and breast cancer metastasis using a variety of approaches, including genetic associations - however, associations were estimated in small samples (n=1456).

07.10.2025 08:59
PCSK9 and breast cancer survival: a Mendelian Randomization study Background: Proprotein convertase subtilisin/kexin type 9 (PCSK9) is well known for its causal effects on the lipid metabolism. A recent study identified an association between rs562556 within PCSK9 a...

New pre-print: "PCSK9 and breast cancer survival: a Mendelian Randomization study" www.medrxiv.org/content/10.1... led by Janne Pott. Brief thread:

07.10.2025 08:59
PSI The community dedicated to leading and promoting the use of statistics within the healthcare industry for the benefit of patients.

The full recording of the PSI webinar on instrumental variable methods given by @jack_bowdenjack and me is available online: psiweb.org/vod/item/efs...

06.10.2025 09:34
Variable information across SNPs in GWAS data can cause false rejections of colocalisation which can be resolved by proportional colocalisation tests Fine-mapping is now a standard post-GWAS analysis, but it has been shown to be potentially inaccurate for large meta-analysis GWAS. We show how this can be caused by variable amounts of statistical in...

A pre-print for Chris Wallace's colocPropTest method for proportional colocalization is available online: www.biorxiv.org/content/10.1... @chr1sw.bsky.social

23.09.2025 12:41

In conclusion, we need to be cautious when using population-based biobanks to investigate rare diseases. Case definitions should be developed that do not rely solely on hospital episode statistics and ICD codes.

23.09.2025 12:37
Venn diagram

4) PAH prevalence in the All of Us dataset based on electronic health records is far higher than expected, and many identified "cases" do not have a corresponding medication prescription consistent with PAH.

23.09.2025 12:37

3) MR investigations of the effects of similar (but aetiologically distinct) conditions on PAH risk show effects in the population-based biobanks, but not in the clinically validated dataset. This suggests that the population-based biobanks suffer from case contamination.

23.09.2025 12:37

2) GWAS hits from population-based biobanks do not replicate in the clinically validated dataset, even after accounting for its lower power. These hits also lack biological support.

23.09.2025 12:37

In this work, we show: 1) GWAS hits with biological support from a clinically validated PAH dataset do not replicate in population-based biobanks, despite their larger sample sizes and greater numbers of "cases".

23.09.2025 12:37

For common diseases, the answer may be yes. But for rare diseases such as pulmonary arterial hypertension (PAH, ~50 cases per million), even misclassifying a tiny fraction of the population as cases (0.01%, i.e. 100 per million, twice the number of true cases) can contaminate results.

23.09.2025 12:37
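The arithmetic behind the rare-disease point can be made concrete. Assuming, for illustration, that every true PAH case is recorded (sensitivity of 1) and that 0.01% of non-cases are wrongly coded as cases, the share of recorded "cases" that are genuine (the positive predictive value) is computed below. The function name and the 5% prevalence used for a "common disease" are hypothetical.

```python
def positive_predictive_value(prevalence: float, false_positive_rate: float,
                              sensitivity: float = 1.0) -> float:
    """Fraction of recorded cases that are true cases (hypothetical helper)."""
    true_cases = sensitivity * prevalence
    false_cases = false_positive_rate * (1.0 - prevalence)  # misclassified non-cases
    return true_cases / (true_cases + false_cases)

misclassified = 1e-4  # 0.01% of non-cases coded as cases (ASSUMED reading of the thread)
print(f"{positive_predictive_value(50e-6, misclassified):.2f}")  # PAH, ~50 per million: prints 0.33
print(f"{positive_predictive_value(0.05, misclassified):.3f}")   # hypothetical 5% prevalence: prints 0.998
```

Under these assumptions roughly two in three recorded PAH "cases" would be misclassified, whereas the same error rate barely affects a common disease.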

Population-based biobanks allow epidemiological analyses, including genome-wide association studies (GWAS), to be performed in large samples. These have largely replaced smaller disease-specific cohorts, which were often constructed using clinically validated outcome data. Is bigger better?

23.09.2025 12:37
A cautionary note on the naive use of general-population biobanks to study pulmonary arterial hypertension, with a focus on Mendelian randomization - PubMed

New manuscript led by @BarWoolf, "A cautionary note on the naive use of general-population biobanks to study pulmonary arterial hypertension", is now published in the European Respiratory Journal @ERSpublications: pubmed.ncbi.nlm.nih.gov/40967765/. Summary follows:

23.09.2025 12:37

Thanks to @amymariemason and @BarWoolf for working on this together, and to @ChatGPTapp for helping to get the ball rolling with the writing, even if we overruled you in many places!

22.07.2025 11:18

...but the initial text needed a lot of work - it struggled to synthesize the ideas, and the structure was not great. Maybe a better prompt? Some of the ideas we seeded in the prompt ended up less important in the eventual submission.

22.07.2025 11:18

But it did cut down the overall writing time - I would estimate by around 50%. This is a topic that has been in my head for several years, and I don't think I would have got round to writing it otherwise. It was much better at writing the abstract and cover letter...

22.07.2025 11:18

To be honest, I was a bit disappointed with the draft - in particular, the simulation study was incorrect and quite limited in scope (we hoped it would do well with this). We ended up re-writing large chunks of text, although some vestiges remain in the final submission.

22.07.2025 11:18

A subtext to this work is that it is the first manuscript I've written where the first draft was generated by ChatGPT - we used the Deep Research function. The AI prompt is in the appendix, and we will share the full machine-written draft (pre-edits) with the community.

22.07.2025 11:18

...because, for context stratification, the subgroups by definition differ in other factors, as they come from different centres. In conclusion, the idea may work in some cases, but even when it does, it is somewhat limited in scope and interpretation.

22.07.2025 11:18

Additionally, differences in centre-stratified estimates may arise for a variety of reasons: non-linearity, but also other differences between centres. The same is true for other stratification methods, but potentially worse for context stratification...

22.07.2025 11:18
Figure 2

However, the separation between mean exposure levels in centres is far smaller than that between subgroups defined by the residual-based or doubly-ranked methods, so non-linearity can only be assessed over a much narrower range.

22.07.2025 11:18
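The residual-based and doubly-ranked methods referred to here can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the residual method stratifies on the exposure minus its linear prediction from the instrument, and the doubly-ranked method ranks individuals by the instrument into pre-strata of size K, then by exposure within each pre-stratum. Both helper names are hypothetical, the simulated data are made up, and sending members of a leftover partial pre-stratum to the lowest strata is a simplification.

```python
import numpy as np

def residual_strata(g, x, n_strata):
    """Hypothetical helper: stratify on the exposure minus its linear prediction from the IV."""
    slope, intercept = np.polyfit(g, x, 1)              # OLS fit of exposure on instrument
    resid = x - (intercept + slope * g)                  # "residual exposure"
    cuts = np.quantile(resid, np.linspace(0, 1, n_strata + 1)[1:-1])
    return np.digitize(resid, cuts)                      # labels 0 .. n_strata - 1

def doubly_ranked_strata(g, x, n_strata):
    """Hypothetical helper: rank by IV into pre-strata, then by exposure within each."""
    strata = np.empty(len(g), dtype=int)
    order_g = np.argsort(g, kind="stable")               # individuals sorted by instrument
    for start in range(0, len(g), n_strata):
        block = order_g[start:start + n_strata]          # one pre-stratum (last may be partial)
        # k-th lowest exposure within the pre-stratum goes to stratum k;
        # a partial final pre-stratum only fills the lowest strata (simplification)
        strata[block] = np.argsort(np.argsort(x[block], kind="stable"), kind="stable")
    return strata

# Hypothetical simulated data: a continuous genetic score and a linearly related exposure
rng = np.random.default_rng(1)
n = 10_000
g = rng.normal(size=n)
x = 0.5 * g + rng.normal(size=n)

for name, s in (("residual", residual_strata(g, x, 4)),
                ("doubly-ranked", doubly_ranked_strata(g, x, 4))):
    means = [round(float(x[s == k].mean()), 2) for k in range(4)]
    print(name, "mean exposure by stratum:", means)
```

Printing the mean exposure in each stratum shows the spread of exposure levels such methods achieve, which is what the post contrasts with the spread of centre means.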
Figure 1 right panel

We can perform MR analyses in each centre, obtaining context-stratified MR estimates that can be analysed using a heterogeneity test or trend test (i.e. meta-regression).

22.07.2025 11:18
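A minimal sketch of the two analyses, assuming per-centre MR estimates with standard errors and a mean exposure level for each centre; the numbers and function names are hypothetical. The heterogeneity test uses Cochran's Q (to be compared with a chi-squared distribution on k - 1 degrees of freedom), and the trend test is a fixed-effect, inverse-variance weighted regression of the estimates on centre mean exposure.

```python
import math
from typing import List, Tuple

def cochran_q(est: List[float], se: List[float]) -> Tuple[float, int]:
    """Cochran's Q for heterogeneity of per-centre estimates, with degrees of freedom."""
    w = [1 / s**2 for s in se]                                  # inverse-variance weights
    pooled = sum(wi * b for wi, b in zip(w, est)) / sum(w)      # IVW pooled estimate
    q = sum(wi * (b - pooled) ** 2 for wi, b in zip(w, est))
    return q, len(est) - 1

def meta_regression(est: List[float], se: List[float],
                    mean_exposure: List[float]) -> Tuple[float, float]:
    """Fixed-effect weighted regression of estimates on centre mean exposure: (slope, SE)."""
    w = [1 / s**2 for s in se]
    xbar = sum(wi * x for wi, x in zip(w, mean_exposure)) / sum(w)
    bbar = sum(wi * b for wi, b in zip(w, est)) / sum(w)
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, mean_exposure))
    sxy = sum(wi * (x - xbar) * (b - bbar) for wi, x, b in zip(w, mean_exposure, est))
    return sxy / sxx, math.sqrt(1 / sxx)

estimates = [0.10, 0.20, 0.30]      # hypothetical per-centre MR estimates
std_errs = [0.10, 0.10, 0.10]       # hypothetical standard errors
centre_means = [40.0, 50.0, 60.0]   # hypothetical mean exposure in each centre
q, df = cochran_q(estimates, std_errs)
slope, slope_se = meta_regression(estimates, std_errs, centre_means)
print(f"Q = {q:.2f} on {df} df; slope = {slope:.3f} (SE {slope_se:.4f})")
# prints: Q = 2.00 on 2 df; slope = 0.010 (SE 0.0071)
```

The trend test targets a specific pattern (estimates changing linearly with mean exposure), whereas Q detects any heterogeneity, whatever its cause.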
Table 2 from manuscript

An alternative is to stratify on existing structure in the data, such as recruitment centres. For instance, in UK Biobank, average vitamin D levels differ across centres - higher in the south-west, lower in Scotland.

22.07.2025 11:18

A naive approach is to stratify on the exposure directly. But this induces bias, as the exposure is a collider: a common effect of the IV and of exposure-outcome confounders. Alternative approaches can work (residual-based and doubly-ranked methods), but rely on untestable assumptions.

22.07.2025 11:18
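A small simulation illustrates the collider problem described in this post. It assumes a data-generating model chosen for illustration: the instrument G and confounder U are independent standard normals, and both raise the exposure X. G and U are uncorrelated overall, but become correlated once we condition on the exposure, so within an exposure stratum the IV is no longer independent of the confounder.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 100_000
g = rng.normal(size=n)            # instrument (ASSUMED: e.g. a standardised genetic score)
u = rng.normal(size=n)            # unmeasured exposure-outcome confounder
x = g + u + rng.normal(size=n)    # exposure depends on both G and U: a collider

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

high = x > np.median(x)           # naive stratification on the exposure itself
r_all = corr(g, u)
r_high = corr(g[high], u[high])
print(f"corr(G, U) overall:          {r_all:+.3f}")
print(f"corr(G, U) in top half of X: {r_high:+.3f}")
```

Under this model the overall correlation should be close to zero, while the within-stratum correlation is roughly -0.27, so any stratum-specific IV estimate is confounded.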
