Some Open Problems in Probability that are Relevant to Applied Statistics (my talk this Wed noon at the Columbia statistics department student seminar)
statmodeling.stat.columbia.edu/2026/02/10/m...
New paper on the problem of "missing regulation" (limited overlap between GWAS signals and eQTLs) from Shamil Sunyaev's lab. Led by Noah Connally.
Our work on the generalizability of polygenic scores (PGS) from the @arbelharpak.bsky.social Lab is now officially out!
We examine the accuracy of PGS predictions at the individual level. We make 3 observations that expose gaps in our understanding of PGS “portability.”
rdcu.be/e0LAr
(1/27)
Insightful paper on the importance of phenotypic scale when testing for interactions involving genetic variants (specifically, GxE effects). From Iain Mathieson's and Andy Dahl's labs, and led by Manuela Costantino.
Registration for the 2026 NY Area Population Genetics meeting is now open, at events.simonsfoundation.org/e0mEoL?rt=8k.... Registration is free but required; if you are submitting an abstract, note that the deadline is *January 30th*.
Happy to highlight an essay I wrote together with @marcdemanuel.bsky.social,
@natanaels.bsky.social and Anastasia Stolyarova, trying to think through what sets the mutation rate of a cell type in an animal species: www.biorxiv.org/content/10.6... 1/n
GWAS has been an incredible discovery tool for human genetics: it regularly identifies *causal* links from 1000s of SNPs to any given trait. But mechanistic interpretation is usually difficult.
Our latest work on causal models for this is out yesterday:
www.nature.com/articles/s41...
A short🧵:
Delighted that our paper about the distribution of genomic spans of clades/edges in genealogies (ARGs), and using this for detecting inversions and other SVs (and other phenomena that cause local disruption of recombination) is out in MBE academic.oup.com/mbe/article/... (1/n)
SuSiE 2.0: improved methods and implementations for genetic fine-mapping and phenotype prediction https://www.biorxiv.org/content/10.1101/2025.11.25.690514v1
🚨 New preprint from the lab!
We’re excited to share “Improving population-scale disease prediction through multi-omics integration” by Ng et al. www.medrxiv.org/content/10.1...
...for interactions involving the HLA region in collaboration with @y-luo.bsky.social.
Thanks for reading!
...might allow for detecting many more of these effects.
I'd like to thank Sile Hu for his help and Simon Myers for his supervision. 🙏
I'm also very grateful to @mollyprz.bsky.social for generous financial support in the final stages of the project.
In ongoing work, we are testing...
...are partly mediated through modulating the effects of other SNPs.
Another takeaway is that we find more interactions for molecular phenotypes than for more complex and polygenic phenotypes (probably due to greater statistical power to detect them), and so novel proteomics datasets...
...functional relationships between genes.
Moreover, many phenotypes (more than half of those we analysed) show interactions, and in fact some well-known hits from standard GWASs (at FTO for obesity or TCF7L2 for diabetes, for example) have effects on disease-relevant phenotypes that...
...the Wnt signalling pathway (itself important in diabetes aetiology) which points to the potential relevance of this interaction in the architecture of this disease.
Our results show that, even though interactions explain very little phenotypic variance, they can be useful by pointing to...
...to partition PGSs and test the same 144 hits for interactions with partitioned scores. We identify 12 interactions, including one between the strongest T2D-associated SNP found to date (at TCF7L2) and the KDM2A TF for HbA1c levels. KDM2A has been found to interact physically with TCF7L2 within...
...and IL33 for eosinophil levels, which could reflect a functional interaction between these genes recently implicated in eosinophilic asthma.
We then look for interactions that are more precise than SNP-by-PGS but broader than SNP-by-SNP: we use data on transcription factor binding motifs...
...for SNP-by-SNP interactions but within a much smaller search space, and allows us to find 38 pairs (of which 32 are novel to our knowledge).
Our results recover and extend a known network involving ABO, FUT2 and TREH for alkaline phosphatase. Another highlight is an interaction between ALOX15...
...the effect of the PGS on the trait; or the effect of a SNP varies depending on polygenic background. Our signals include well-known disease risk variants at APOE, FTO and TCF7L2.
We then take these 144 associations and look for pairwise interactions genome-wide. This is a classic search...
We develop a method to test for interactions between SNPs and polygenic scores (PGSs) and apply it to 97 quantitative phenotypes in the @ukbiobank.bsky.social, identifying 144 associations for 52 different traits.
These can be interpreted in two equivalent ways: the genotype at a locus alters...
...a linear model of genotype > phenotype.
Interactions can help with understanding biological mechanisms by identifying different parts of the genome whose statistical effects on a phenotype are interdependent – and which are therefore likely to also interact functionally within a pathway.
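For readers who want to see what a statistical interaction term in a linear model looks like in practice, here is a minimal sketch on simulated data. It is not the method from the preprint: the function name `interaction_test`, the effect sizes, the allele frequency and the sample size are all made up for illustration, and a real analysis would also need covariates, relatedness and scale checks.

```python
import numpy as np

def interaction_test(y, snp, pgs):
    """Fit y ~ 1 + snp + pgs + snp*pgs by ordinary least squares.

    Returns (estimate, standard error, t statistic) for the product term.
    Hypothetical helper for illustration only.
    """
    X = np.column_stack([np.ones_like(y), snp, pgs, snp * pgs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof              # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the OLS estimates
    se = np.sqrt(np.diag(cov))
    return beta[3], se[3], beta[3] / se[3]

# Simulated data (all values are assumptions, not taken from the study)
rng = np.random.default_rng(0)
n = 20000
snp = rng.binomial(2, 0.3, size=n).astype(float)  # genotype coded 0/1/2
pgs = rng.standard_normal(n)                      # standardised polygenic score
y = 0.1 * snp + 0.5 * pgs + 0.05 * snp * pgs + rng.standard_normal(n)

b, se, t = interaction_test(y, snp, pgs)
print(f"interaction estimate = {b:.3f}, SE = {se:.3f}, t = {t:.2f}")
```

The coefficient on `snp * pgs` measures how much the per-allele effect of the SNP changes per unit of PGS, which is the same number as how much the PGS slope changes per allele.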
GWASs have been hugely successful in finding genetic associations but understanding the function of associated loci remains a great challenge.
We address this question from the angle of genetic interactions (epistasis): statistical interaction terms between genetic variants in...
Excited to share a preprint of my PhD project looking at interactions between SNPs and polygenic scores in the UK Biobank!
A thread... 🧵
www.medrxiv.org/content/10.1...
New study in #GENETICS from @anaignatieva.bsky.social and @linoafferreira.bsky.social shows how ancestral recombination graphs can help detect "phantom" genetic interaction signals that arise due to genealogy and not because of epistasis. buff.ly/TQARoDp
Our paper about how ancestral recombination graphs can be used to detect "phantom" genetic interaction signals (that arise due to the genealogy, rather than "real" epistasis) is out in Genetics! Nice thread here by @linoafferreira.bsky.social
academic.oup.com/genetics/adv...
Thank you!
We hope this approach will enable others to search for (and perhaps find!) epistatic effects in cis, and through this to learn more about the genetic basis of complex phenotypes.
Thanks for reading! (end 🧵)
In contrast, our method only requires publicly available WGS data for samples of similar ancestral background (we use 1KGP) whose information is efficiently encoded in the form of an ARG. This makes it applicable in settings where WGS data is not available (including for non-human species).
it was very difficult to rigorously test for such effects.
WGS data or dense imputation panels allowed for checking whether any neighbouring variant accounted for a putative interaction but only if this data was available for the same sample in which the epistasis testing was done.
evidence *against* the existence of a problematic variant.
This allows us to quantify the observed evidence either for or against a potential interaction being real.
Epistasis between variants in cis could be common (or at least less rare than that between variants farther apart) but until now...