Stephen Burgess

@stevesphd.bsky.social

Medical statistician, work with genetic data to disentangle causation from correlation. Author of book on Mendelian randomization.

712 Followers  |  169 Following  |  137 Posts  |  Joined: 16.11.2024  |  1.9499

Latest posts by stevesphd.bsky.social on Bluesky

📣📣📣 Excited for our lab's latest preprint, led by Chief Ben-Eghan! www.medrxiv.org/content/10.1...

tl;dr We identify protein vQTLs in multiple ancestries then use MVMR to show independent effects of mean & variance on disease, suggesting targeting protein variance could have therapeutic potential.

22.11.2025 11:42 | 👍 29    🔁 9    💬 1    📌 0
From School To The Skilled Workforce - Policy Exchange: This new report by Policy Exchange makes the case that University Technical Colleges (UTCs) can play a vital role in addressing the UK's profoun...

Thanks to Will Bickford Smith for co-leading this, and the people at Policy Exchange for commissioning this work. Great to see this work published! Policy document: policyexchange.org.uk/publication/..., Statistical report: policyexchange.org.uk/wp-content/u....

14.11.2025 10:30 | 👍 0    🔁 0    💬 0    📌 0

Based on student attainment data, UTC students perform less well at English at age 16, but equally well at maths (and potentially better at maths for disadvantaged students).

14.11.2025 10:30 | 👍 0    🔁 0    💬 1    📌 0

Based on leaver data, UTC students were more likely to go into apprenticeships, potentially more likely to go into employment, and no more likely (and possibly less likely) to have no sustained outcome.

14.11.2025 10:30 | 👍 0    🔁 0    💬 1    📌 0

Differences in overall education participation were maintained across surveys, but differences in further education participation appeared to attenuate to zero over time.

14.11.2025 10:30 | 👍 0    🔁 0    💬 1    📌 0

Based on student destination data, UTC students were consistently less likely to be in sustained education, but more likely to be in sustained employment, and less likely to not have a sustained outcome compared with students at comparable schools.

14.11.2025 10:30 | 👍 1    🔁 0    💬 1    📌 0

We performed a doubly-robust analysis, matching on these variables but also adjusting for them in a regression model. We also adjusted for the proportion of pupils who are boys (%BOYS), but didn't match on this variable (as schools with an unbalanced sex ratio are often atypical).

14.11.2025 10:30 | 👍 0    🔁 0    💬 1    📌 0
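A minimal sketch of combining matching with regression adjustment, as described in the post above. It is a deliberately simplified stand-in, not the report's model: it adjusts for a single covariate (%BOYS) by regressing within-set outcome differences on within-set covariate differences, so the intercept plays the role of the adjusted UTC effect. The function names and the input layout are assumptions made for illustration.

```python
from statistics import mean


def within_set_differences(matched_sets):
    """For each matched set, return (covariate difference, outcome difference).

    Each set is a dict with hypothetical keys:
      "utc":      (outcome, pct_boys) for the UTC
      "controls": list of (outcome, pct_boys) for its matched schools
    Differences are UTC minus the mean of its matched controls.
    """
    diffs = []
    for s in matched_sets:
        y_utc, x_utc = s["utc"]
        y_ctl = mean(y for y, _ in s["controls"])
        x_ctl = mean(x for _, x in s["controls"])
        diffs.append((x_utc - x_ctl, y_utc - y_ctl))
    return diffs


def adjusted_effect(matched_sets):
    """Ordinary least squares of outcome difference on covariate difference.

    Returns (intercept, slope). The intercept is the estimated UTC-vs-control
    difference at zero covariate imbalance.
    """
    diffs = within_set_differences(matched_sets)
    xs = [d[0] for d in diffs]
    ys = [d[1] for d in diffs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("covariate differences do not vary; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope
```

Working with within-set differences keeps the comparison local to each matched set, while the regression step absorbs any remaining imbalance in a covariate that was not matched on.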

We matched on three variables: proportion of pupils with English as an additional language (%EAL), proportion with special educational needs (%SEN), and proportion with free school meals (%FSM).

14.11.2025 10:30 | 👍 0    🔁 0    💬 1    📌 0

For most outcomes, we performed a matched analysis, matching each UTC with 5 similar schools in the same Local Educational Authority, and comparing outcomes within each matched set. This analysis uses less data than an analysis of the full dataset, but more relevant data.

14.11.2025 10:30 | 👍 0    🔁 0    💬 1    📌 0
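The matching step in the post above can be sketched roughly as follows. This is an illustration only, not the report's code: the `School` record, the `match_utcs` function, and the use of unscaled Euclidean distance on the three matching variables named in the thread (%EAL, %SEN, %FSM) are assumptions, since the posts do not say which distance measure was used.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class School:
    name: str
    lea: str       # Local Education Authority
    is_utc: bool
    eal: float     # % pupils with English as an additional language
    sen: float     # % pupils with special educational needs
    fsm: float     # % pupils with free school meals


def match_utcs(schools, k=5):
    """Map each UTC name to the names of its k nearest non-UTC schools.

    Candidates are restricted to the UTC's own LEA; "nearest" means smallest
    Euclidean distance on (eal, sen, fsm). If an LEA has fewer than k
    candidates, all of them are returned.
    """
    matches = {}
    for utc in (s for s in schools if s.is_utc):
        pool = [s for s in schools if not s.is_utc and s.lea == utc.lea]
        target = (utc.eal, utc.sen, utc.fsm)
        pool.sort(key=lambda s: dist((s.eal, s.sen, s.fsm), target))
        matches[utc.name] = [s.name for s in pool[:k]]
    return matches
```

Restricting the candidate pool to the same LEA discards most schools in the country, which is the trade-off the post describes: less data, but more relevant data.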

We wanted to benchmark UTC performance for exam results and leaver outcomes. The analysis was challenging in a number of ways. How to conduct a like-with-like comparison of UTCs with similar secondary schools that are not UTCs using publicly-available data?

14.11.2025 10:30 | 👍 0    🔁 0    💬 1    📌 0

University Technical Colleges (UTCs) are non-selective state-funded “free” secondary schools in the UK (free = outside the control of the local education authority, as well as non-paying) that focus on science and technology.

14.11.2025 10:30 | 👍 0    🔁 0    💬 1    📌 0

Happy to have co-authored a technical report published by @Policy_Exchange on University Technical Colleges. My role was limited to the statistical report, not the policy document - although I proofread the latter to ensure our analyses were correctly represented.

14.11.2025 10:30 | 👍 0    🔁 0    💬 1    📌 0

Thanks to @hwang_seongwon for leading the project, to @jeffreypullin.bsky.social for performing code review, and to @chr1sw.bsky.social and John Whittaker for co-supervising - has been a fun project so far, and look forward to getting feedback from the community!

08.11.2025 15:40 | 👍 2    🔁 0    💬 0    📌 0

However, like all statistical methods, it has limitations, and results should not be thought of as unquestionable truth. It is likely that the differences between datasets in other applications are similar to or stronger than those we considered here.

08.11.2025 15:40 | 👍 1    🔁 0    💬 1    📌 0

In conclusion, while all methods were well-calibrated in the baseline scenario, they struggled to declare colocalization to different degrees when the datasets varied in terms of platform and population. Colocalization can be a valuable tool for triaging and prioritizing.

08.11.2025 15:40 | 👍 1    🔁 1    💬 1    📌 0

This was not intended to be a fair comparison - fairness is impossible to achieve. For example, coloc-SuSiE was judged to support colocalization if there was high PP.H4 for any pair of credible sets. Rather, we wanted to compare methods as they would typically be used.

08.11.2025 15:40 | 👍 1    🔁 0    💬 1    📌 0

We acknowledge that there are many legitimate reasons why we may observe non-colocalization for the same protein when using estimates from different platforms / populations. Also, we acknowledge that different methods use different standards of evidence.

08.11.2025 15:40 | 👍 1    🔁 0    💬 1    📌 0

Enumeration methods tended to outperform proportional methods in most scenarios. However, no single approach dominated in all scenarios, with coloc-SuSiE reporting the highest rate of colocalization in Case 1, Case 2B, and Case 4; colocPropTest in Case 2F; and coloc in Case 3.

08.11.2025 15:40 | 👍 1    🔁 0    💬 1    📌 0

In these cases, results were more mixed. We observed frequent disagreement between methods as to whether there was colocalization, non-colocalization, or insufficient evidence. In the worst-case scenario, colocalization was only agreed by all four methods for 20% of proteins.

08.11.2025 15:40 | 👍 1    🔁 0    💬 1    📌 0
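One way to tabulate the kind of cross-method agreement described in the post above is sketched here. The three call labels, the `agreement_summary` function, and the input layout (protein mapped to method mapped to call) are assumptions for illustration; the preprint's own thresholds and code are not shown in the thread.

```python
VALID_CALLS = {"colocalization", "non-colocalization", "insufficient"}


def agreement_summary(calls_by_protein):
    """Summarise how often methods agree.

    calls_by_protein: {protein: {method: call}}, where each call is one of
    VALID_CALLS. Returns the proportion of proteins for which every method
    declares colocalization, and the proportion for which all methods make
    the same call (whatever it is).
    """
    n = len(calls_by_protein)
    if n == 0:
        raise ValueError("no proteins supplied")
    all_coloc = 0
    unanimous = 0
    for protein, calls in calls_by_protein.items():
        distinct = set(calls.values())
        if not distinct <= VALID_CALLS:
            raise ValueError(f"unexpected call for {protein}: {distinct - VALID_CALLS}")
        if distinct == {"colocalization"}:
            all_coloc += 1
        if len(distinct) == 1:
            unanimous += 1
    return {"all_colocalization": all_coloc / n, "unanimous": unanimous / n}
```

Reporting both proportions separates "all methods agree there is colocalization" from "all methods agree on something", which can differ when several methods return insufficient evidence.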

We then consider associations with the same protein, but measured on different platforms (Olink vs SomaLogic in British [Case 2B] and Finnish [Case 2F] populations), and measured in different populations (British vs Finnish for Olink [Case 3] and SomaLogic [Case 4]).

08.11.2025 15:40 | 👍 1    🔁 0    💬 1    📌 0

In the baseline context, we split the UK Biobank Pharma Proteomics Project in two at random, and tested associations for the same protein in one half of the data versus the other half of the data (Case 1). Unsurprisingly, all methods performed well in this context.

08.11.2025 15:40 | 👍 1    🔁 0    💬 1    📌 0
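The random split used for the baseline case can be sketched as below. The function name, the seed argument, and the use of participant identifiers as input are assumptions; the thread only states that the dataset was split in two at random.

```python
import random


def split_in_half(sample_ids, seed=0):
    """Randomly partition identifiers into two disjoint halves.

    A dedicated Random instance with a fixed seed makes the split
    reproducible. With an odd number of identifiers, the second half
    receives the extra one.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]
```

Association estimates would then be computed separately in each half, so that the two sets of summary statistics come from the same platform and population but from non-overlapping individuals.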

We perform colocalization for protein-coding gene regions with ≥1 pQTL across four datasets using four colocalization methods: coloc, coloc-SuSiE, prop.coloc, and colocPropTest in a range of contexts.

08.11.2025 15:40 | 👍 1    🔁 0    💬 1    📌 0
Systematic comparison of colocalization methods using protein quantitative trait loci: Colocalization is frequently performed as a step to triage findings from genetic investigations linking molecular and disease data. However, the reliability and consistency of the various colocalizati...

New pre-print: "Systematic comparison of colocalization methods using protein quantitative trait loci" led by @hwang_seongwon at www.biorxiv.org/content/10.1.... Which method does best? Find out!

08.11.2025 15:40 | 👍 19    🔁 4    💬 1    📌 0

Big thanks to all co-authors for contributing to this: @amymariemason.bsky.social, @VerenaZuber, @explodecomputer, Elena, @IamYuXu, Amanda, @BarWoolf, @eliasallara, @dpsg108, and @OpeSoremekun. Feedback would be very welcome!

27.10.2025 08:43 | 👍 2    🔁 0    💬 0    📌 0

Critical is what we can assume is shared between populations, and what is different - are we clear what we are assuming can be borrowed? And is it reasonable to borrow that information?

27.10.2025 08:43 | 👍 2    🔁 0    💬 1    📌 0

When analysing non-European data, there is often a compromise between only including the most relevant data to the target population, and including all available data from any population - we describe some approaches to this taken in the literature.

27.10.2025 08:43 | 👍 1    🔁 0    💬 1    📌 0
DAG with dashed arrows sources of potential heterogeneity

The green dashed arrows indicate potential mechanisms that would lead to heterogeneity and hence differences in MR estimates between populations - examples of each are given in Table 1.

27.10.2025 08:43 | 👍 1    🔁 0    💬 1    📌 0

There are many reasons why an MR estimate (or any epidemiological estimate) may differ between populations. We would opine that a true biological difference between population groups is rarely the most likely explanation for a difference.

27.10.2025 08:43 | 👍 1    🔁 0    💬 1    📌 0

The target population will likely depend on the question. For environmental exposures, geographic definitions may be best. For socially patterned exposures, cultural (ethnic) definitions. For genetic exposures, ancestral definitions.

27.10.2025 08:43 | 👍 1    🔁 0    💬 1    📌 0

For instance, if we say "South Asians are at elevated risk of COVID-19", do we mean individuals living in South Asia? Do we mean individuals with South Asian ancestral heritage? Or do we mean individuals following South Asian cultural practices? These populations overlap, but they are distinct.

27.10.2025 08:43 | 👍 1    🔁 0    💬 1    📌 0
