@edwardhkennedy.bsky.social
assoc prof of statistics & data science at Carnegie Mellon https://www.ehkennedy.com/ interested in causality, machine learning, nonparametrics, public policy, etc
Awesome!
06.04.2025 14:59
Text from van der Vaart, "Asymptotic Statistics" Ch 27, http://www.stat.yale.edu/~pollard/Books/LeCamFest/VanderVaart.pdf
"The theorem may have looked somewhat too complicated to gain popularity. Nevertheless Hájek's result, for general locally asymptotically normal models and general loss functions, is now considered the final result in this direction. Hájek wrote: 'The proof that local asymptotic minimax implies local asymptotic admissibility was first given by LeCam (1953, Theorem 14). ... Apparently not many people have studied Le Cam's paper so far as to read this very last theorem, and the present author is indebted to Professor LeCam for giving him the reference.' Not reading to the end of Le Cam's papers became not uncommon in later years. His ideas have been regularly rediscovered."
Went to look up textbook results after getting the nagging feeling that an ML paper was reinventing classical ideas, and found this gem:
"Not reading to the end of Le Cam's papers became not uncommon in later years. His ideas have been regularly rediscovered."
At least they're in good company.
Ok I think I'll stop now :) I'm always amazed at how ahead of its time this work was.
It's too bad it's not as widely known among us causal+ML people
Once you have a pathwise differentiable parameter, a natural estimator is a debiased plug-in, which subtracts off the avg of estimated influence fn
Pfanzagl gives this 1-step estimator here - in causal inference this is exactly the doubly robust / DML estimator you know & love!
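As a concrete illustration of the debiased plug-in described above (my own sketch, not code from the thread): for the average treatment effect with binary treatment a, outcome y, and given nuisance estimates (outcome regressions muhat1, muhat0 and propensity scores pihat, all evaluated at the observed covariates), the one-step estimator corrects the plug-in by the average of the estimated influence function.

```python
import numpy as np

def one_step_ate(y, a, pihat, muhat1, muhat0):
    """One-step (doubly robust / AIPW) estimator of E[Y(1) - Y(0)]:
    the plug-in E_n[muhat1 - muhat0], debiased by the average of the
    estimated (efficient) influence function's correction term."""
    plug_in = np.mean(muhat1 - muhat0)
    # average estimated influence-function correction (inverse-weighted residuals)
    correction = np.mean(a / pihat * (y - muhat1)
                         - (1 - a) / (1 - pihat) * (y - muhat0))
    return plug_in + correction
```

With correctly specified nuisances the correction term has mean zero, so the estimator stays centered even when the plug-in alone would be biased.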
Pfanzagl uses pathwise differentiability above, but w/regularity conditions this is just a distributional Taylor expansion, which is easier to think about
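In one common sign convention (my sketch of the notation; take the exact signs as an assumption), the distributional Taylor / von Mises expansion of a functional psi around the truth P, evaluated at an estimate, reads:

```latex
\psi(\bar{P}) - \psi(P)
  = \int \varphi(z; \bar{P}) \, d(\bar{P} - P)(z)
    + R_2(\bar{P}, P)
```

Here \varphi is the (mean-zero) influence function and R_2 is a second-order remainder. Since \varphi(\cdot;\bar{P}) averages to zero under \bar{P}, the plug-in bias is, up to R_2, minus the mean of the estimated influence function, which is exactly what the one-step estimator corrects with a sample average.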
I note this in my tutorial here:
www.ehkennedy.com/uploads/5/8/...
Also v related to so-called "Neyman orthogonality" - worth separate thread
Here's Pfanzagl on the gradient of a functional/parameter, aka derivative term in a von Mises expansion, aka influence function, aka Neyman-orthogonal score
Richard von Mises first characterized smoothness this way for stats in the 30s/40s! eg:
projecteuclid.org/journals/ann...
From twitter:
A short thread:
It amazes me how many crucial ideas underlying now-popular semiparametrics (aka doubly robust parameter/functional estimation / TMLE / double/debiased/orthogonal ML etc etc) were first proposed many decades ago.
I think this is widely under-appreciated!
The m-estimator logic certainly relies on "exactly correct"
Once you start moving to "close enough", to me that means you're no longer getting precise root-n rates with the nuisances. Then you'll have to deal with the bias/variance consequences just as if you were using flexible ML
And here for more specific discussion:
arxiv.org/pdf/2405.08525
I think DR estimation vs inference are two quite different things and we need different assumptions to make them work
If we really rely on 2 parametric models, we should of course use a variance estimator recognizing this. But this is more about how we model nuisances vs DR estimator itself
Also our paper here suggests strictly more assumptions are needed for DR inference vs estimation:
arxiv.org/pdf/2305.04116
I find it much more believable that I could estimate both nuisances consistently, but at slower rates, vs that I could pick 2 parametric models (without looking at data) & happen to get one exactly correct
11.02.2025 14:43
Hm, not sure I agree with this logic...
To me the beautiful thing about the DR estimator is you can get away with estimating both nuisances at slower rates (as long as the product is < 1/sqrt(n))
This opens the door to using much more flexible methods - random forests, lasso, ensembles, etc etc
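A minimal sketch of what this looks like with cross-fitting (my own illustration; `fit_mu` and `fit_pi` are hypothetical placeholders for any flexible learner, e.g. a random forest or lasso, returning prediction functions):

```python
import numpy as np

def crossfit_dr_ate(y, a, x, fit_mu, fit_pi, n_folds=2, seed=0):
    """Cross-fitted doubly robust ATE estimator: nuisances are fit on
    training folds and evaluated on held-out folds, so flexible ML
    learners can be plugged in; the estimator can be root-n consistent
    as long as the product of nuisance errors is o(1/sqrt(n))."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    ifvals = np.empty(n)
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        # fit nuisances on the training folds only
        mu1 = fit_mu(x[tr][a[tr] == 1], y[tr][a[tr] == 1])
        mu0 = fit_mu(x[tr][a[tr] == 0], y[tr][a[tr] == 0])
        pi = fit_pi(x[tr], a[tr])
        m1, m0, p = mu1(x[te]), mu0(x[te]), pi(x[te])
        # estimated influence-function values on the held-out fold
        ifvals[te] = (m1 - m0 + a[te] / p * (y[te] - m1)
                      - (1 - a[te]) / (1 - p) * (y[te] - m0))
    return ifvals.mean(), ifvals.std(ddof=1) / np.sqrt(n)
```

Any `(x_train, y_train) -> predictor` pair works for the nuisance arguments, so swapping a parametric model for a random forest or an ensemble is a one-line change.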
"Randomized trials should be used to answer any causal question that can be so studied...
But the reality is that observational methods are used everyday to answer pressing causal questions that cannot be studied in randomized trials."
- Jamie Robins, 2002
tinyurl.com/4yuxfxes
tinyurl.com/zncp39mr
What's the best paper you read this year?
27.12.2024 17:02
Here's the recent paper!
bsky.app/profile/edwa...
Thank you Alec for leading this project, I learned a lot! This paper has a very useful study of what contrasts are feasible in situations with many treatments and positivity violations, including necessary assumptions and efficient one-step estimators. Check it out!
13.12.2024 23:53
New-ish paper alert! arxiv.org/abs/2410.13522
We tackle the challenge of comparing multiple treatments when some subjects have zero prob. of receiving certain treatments. Eg, provider profiling: comparing hospitals (the "treatments") for patient outcomes. Positivity violations are everywhere.
Found slides by Ankur Moitra (presented at a TCS For All event) on "How to do theoretical research." Full of great advice!
My favourite: "Find the easiest problem you can't solve. The more embarrassing, the better!"
Slides: drive.google.com/file/d/15VaT...
TCS For all: sigact.org/tcsforall/
@bonv.bsky.social presented this at NYU this week -- terrific work with an excellent presentation (no surprise there)! I found the connections to higher-order estimators and the orthogonalizing property of the U-stat kernel fascinating & illuminating.
13.12.2024 19:05
led by the amazing Matteo Bonvini @bonv.bsky.social
www.matteobonvini.com
has lots of connections to doubly robust inference
academic.oup.com/biomet/artic...
arxiv.org/abs/1905.00744
arxiv.org/abs/2107.06124
Should we use structure-agnostic (arxiv.org/abs/2305.04116) or smooth (arxiv.org/pdf/1512.02174) models for causal inference?
Why not both?
Here we propose novel hybrid smooth+agnostic model, give minimax rates, & new optimal methods
arxiv.org/pdf/2405.08525
-> fast rates under weaker conditions
I see renewed discussion on #statsky about the interpretation of confidence intervals. I will leave here this quote from Larry Wasserman's All of Statistics, which I love. Controlling one's lifetime proportion of studies with an interval that does not contain the parameter is surely desirable!
06.12.2024 14:44
Table of contents of the monograph
Reminder/plug: my graduate-level monograph on "Topics and Techniques in Distribution Testing" (FnT Comm. and Inf Theory, 2022).
- ccanonne.github.io/survey-topic... [Latest draft+exercise solns, free]
- nowpublishers.com/article/Deta... [Official pub]
- github.com/ccanonne/sur... [LaTeX source]
I strongly believe that the average treatment effect is given way too much prominence in economics and econometrics. ATE can be informative, but it can also badly mislead policy makers and decision makers. If we know the joint distribution of potential outcomes, then we may be able to better calibrate the policy. I hope that Kolmogorov bounds will become a part of the modern econometrician's toolkit. A good place to learn more about this approach is Fan and Park (2010). Mullahy (2018) explores this approach in the context of health outcomes.

Chuck Manski revolutionized econometrics with the introduction of set identification. He probably does not think so, but Chuck has changed the way many economists and most econometricians think about problems. We think much harder about the assumptions we are making. Are the assumptions credible? We are much more willing to present bounds on estimates, rather than make non-credible assumptions to get point estimates. Manski's natural bounds allow the researcher to estimate the potential effect of the policy with minimal assumptions. These bounds may not be informative, but that in and of itself is informative. Stronger assumptions may lead to more informative results but at the risk that the assumptions, not the data, determine the results.
From Chris Adams's "Learning Microeconometrics with R":
25.11.2024 02:20
If by distribution of the treatment effect you mean the distribution of the counterfactual difference Y1-Y0, then (a) this is not a soft intervention (it asks what would happen if all were treated vs control), and (b) that distribution (other than the mean) is not identified even when all confounders are measured
25.11.2024 14:47
The main motivation is that soft / stochastic intervention effects are typically much more realistic/plausible than more standard effects - instead of asking what would happen if every single person got treatment vs control, they ask what would happen if the treatment distribution shifted slightly
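For concreteness (the standard identification result, added here for illustration, not quoted from the post): if a stochastic intervention replaces the observed treatment mechanism with a user-specified distribution Q(a | x), then under exchangeability and positivity (on the support of Q) the counterfactual mean is identified by

```latex
\mathbb{E}[Y(Q)]
  = \int \int \mathbb{E}[Y \mid X = x, A = a] \, dQ(a \mid x) \, dP(x)
```

Taking Q to be a point mass at a = 1 recovers the usual "everyone treated" mean, while a slight shift of the observed propensity scores gives an incremental, and typically more realistic, effect.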
24.11.2024 15:45
"There's no way you can just sit down & do a 'big thing', or at least I can't. So I just went back to doing lots of little things, & hoping that some of them will turn out okay. Statistics is a wonderfully forgiving field... all you have to do is get an idea & keep at it."
- Brad Efron #statsquotes