Ryan Batten, PhD(c) @ryanbatten

For example, say we have propensity scores for both groups. However, there is a lack of overlap.

We decide to focus on the area where there is overlap.

We do this by applying overlap weights.

The population these results apply to would be the overlap population!

2/2

18.12.2024 17:51 — 👍 1 🔁 0 💬 0 📌 0
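
A minimal sketch of the overlap weights mentioned in the post above, using the standard definition (treated units get 1 - e, control units get e, where e is the propensity score). The scores are made-up illustrative values, and `overlap_weight` is a hypothetical helper name, not code from the thread.

```python
def overlap_weight(propensity: float, treated: bool) -> float:
    """Overlap (ATO) weight: 1 - e for treated units, e for control units."""
    return 1.0 - propensity if treated else propensity


# Hypothetical propensity scores, from "almost never treated" to "almost always treated".
scores = [0.05, 0.30, 0.50, 0.70, 0.95]

for e in scores:
    w_treated = overlap_weight(e, treated=True)
    w_control = overlap_weight(e, treated=False)
    # e * (1 - e) is how strongly the weighted population emphasizes this score.
    emphasis = e * (1.0 - e)
    print(f"e={e:.2f}  treated w={w_treated:.2f}  control w={w_control:.2f}  emphasis={emphasis:.4f}")

# The score that receives the most emphasis in the overlap population.
most_emphasized = max(scores, key=lambda e: e * (1.0 - e))
print("most emphasized score:", most_emphasized)
```

Units with scores near 0.5 (where both groups are well represented) dominate, and units near 0 or 1 are smoothly down-weighted rather than trimmed. That is why the target population ends up defined by the method.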

The average treatment effect in the overlap population (ATO) can be a tricky causal estimand. Why?

The ATO is a little different than other estimands.

Often, it's not well defined before the analysis.

This is because there are many ways to define the population.

Instead, it's based on the statistical method.

1/2

18.12.2024 17:51 — 👍 0 🔁 0 💬 1 📌 0

The third installment of the “how should we actually construct our causal graphs anyway” series is out now! 👇🏼

Nick & I ask the question: can we just get an LLM to tell us what belongs on the graph?

17.12.2024 22:03 — 👍 44 🔁 13 💬 1 📌 0

Causal inference on human behaviour - Nature Human Behaviour In this Review, Drew Bailey et al. present an accessible, non-technical overview of key challenges for causal inference in studies of human behaviour as well as methodological solutions to these chall...

A few papers I think worth reading. Mostly open access.

Causal inference is hard:

www.nature.com/articles/s41...

17.12.2024 01:27 — 👍 165 🔁 53 💬 10 📌 6

The more obscure a statistical analysis method, the more I question the design.

Not saying it's wrong, but I'd have questions about why a more "common" approach wasn't used.

16.12.2024 15:33 — 👍 2 🔁 0 💬 0 📌 0

Bootstrapping is sort of a semi-Bayesian approach when you think about it

15.12.2024 00:57 — 👍 3 🔁 0 💬 0 📌 0

Calling bullshit - a skill that every applied statistician should master. Unfortunately many of the younger statisticians I’ve worked with sometimes lack the bravery to do so. The book looks like a must-have. #Statistics #StatsSky @carlbergstrom.com @carlzimmer.bsky.social

14.12.2024 13:22 — 👍 59 🔁 20 💬 4 📌 0

A common critique of Bayesian methods is that priors are arbitrary. I think that's a good thing. It's an assumption, like much of science.

Better to be explicit about assumptions (e.g., DAGs, priors) than implicit

14.12.2024 18:13 — 👍 1 🔁 0 💬 0 📌 0

ggplot2 is like electricity. I don't need it to survive, but I much prefer it

14.12.2024 18:10 — 👍 4 🔁 2 💬 1 📌 0

Noncollapsibility, confounding, and sparse-data bias. Part 1: The oddities of odds To prevent statistical misinterpretations, it has long been advised to focus on estimation instead of statistical testing. This sound advice brings with it the need to choose the outcome and effect measures on which to focus. Measures based on odds or their logarithms have often been promoted due to their pleasing statistical properties, but have an undesirable property for risk summarization and communication: Noncollapsibility, defined as a failure of the measure when taken on a group to equal a simple average of the measure when taken on the group's members or subgroups.

Don't think this is the paper you're referencing but there's one from Sander Greenland (2021) talking about non-collapsibility (aka why using marginal effects for certain measures gives a different result than conditional effects)

Paper: www.jclinepi.com/article/S089...

12.12.2024 19:39 — 👍 2 🔁 0 💬 1 📌 0
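
A small worked example of non-collapsibility, as a sketch with made-up risks (not numbers from the Greenland paper). Two equally sized strata, treatment independent of stratum, so there is no confounding. The odds ratio is the same in both strata, yet the marginal odds ratio is smaller. The risk difference, by contrast, collapses.

```python
def odds(p: float) -> float:
    return p / (1.0 - p)


def odds_ratio(p_treated: float, p_control: float) -> float:
    return odds(p_treated) / odds(p_control)


# Hypothetical outcome risks in two equally sized strata (assumed values).
strata = [
    {"risk_control": 0.2, "risk_treated": 0.5},
    {"risk_control": 0.5, "risk_treated": 0.8},
]

# Conditional (stratum-specific) measures.
conditional_ors = [odds_ratio(s["risk_treated"], s["risk_control"]) for s in strata]
conditional_rds = [s["risk_treated"] - s["risk_control"] for s in strata]

# Marginal risks: a simple average, because the strata are the same size
# and treatment does not depend on stratum.
marginal_control = sum(s["risk_control"] for s in strata) / len(strata)
marginal_treated = sum(s["risk_treated"] for s in strata) / len(strata)

marginal_or = odds_ratio(marginal_treated, marginal_control)
marginal_rd = marginal_treated - marginal_control

print("conditional ORs:", [round(x, 3) for x in conditional_ors])  # both about 4
print("marginal OR:    ", round(marginal_or, 3))                   # about 3.449
print("conditional RDs:", [round(x, 3) for x in conditional_rds])  # both 0.3
print("marginal RD:    ", round(marginal_rd, 3))                   # 0.3
```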

It can be tempting to think of propensity scores as a prediction problem. This is problematic. Why?

In prediction models, any variable that helps can be included.

In causal inference, this can cause bias, e.g., collider bias.

Instead, use a directed acyclic graph (DAG) for variable selection.

11.12.2024 13:49 — 👍 3 🔁 0 💬 0 📌 0
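
A rough simulation of the collider point in the post above. Everything here is an assumption for illustration: treatment `z` is assigned at random and has no effect on outcome `y`, while `c` is caused by both. `c` predicts treatment well, so a prediction-minded model would want it. For simplicity the sketch conditions on `c` by restricting to one stratum rather than fitting a propensity model.

```python
import random

random.seed(2024)


def mean(values):
    return sum(values) / len(values)


def diff_in_means(data):
    """Mean outcome among treated minus mean outcome among controls."""
    treated = [y for z, y, _ in data if z == 1]
    control = [y for z, y, _ in data if z == 0]
    return mean(treated) - mean(control)


rows = []
for _ in range(20_000):
    z = 1 if random.random() < 0.5 else 0   # treatment, assigned at random
    y = random.gauss(0.0, 1.0)              # outcome, NOT affected by treatment
    c = z + y + random.gauss(0.0, 0.5)      # collider: caused by both z and y
    rows.append((z, y, c))

# Without conditioning on the collider, the estimate is close to the true effect (zero).
crude = diff_in_means(rows)

# Conditioning on the collider (here: keeping only rows with c > 1) creates
# an association between z and y that is not causal.
conditioned = diff_in_means([row for row in rows if row[2] > 1.0])

print(f"crude difference:       {crude:.3f}")
print(f"conditioned on c > 1.0: {conditioned:.3f}")
```

The conditioned estimate is clearly negative even though the true effect is zero, which is the kind of bias a DAG would flag before `c` ever reached the model.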

Fantastic to see simulation on the list! After learning how to use simulations, I use them almost every day

11.12.2024 02:11 — 👍 1 🔁 0 💬 0 📌 0

ALT: a man in a suit and tie stands in front of two other men

Percentages > 100%...

11.12.2024 02:02 — 👍 1 🔁 0 💬 0 📌 0

Fantastic initiative! Especially useful for papers using simulations

11.12.2024 01:34 — 👍 0 🔁 0 💬 0 📌 0

For IPTW, which causal estimand was it? If it was the ATE, then it's estimating something different from PSM, which typically targets the ATT.

The causal estimand impacts several areas. It's important to keep in mind.

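PS: Four common estimands:
- ATE (average treatment effect)
- ATT (average treatment effect in the treated)
- ATU (average treatment effect in the untreated)
- ATO (average treatment effect in the overlap population)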

3/3

09.12.2024 18:43 — 👍 0 🔁 0 💬 0 📌 0
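
To make "did they estimate the same thing?" concrete, here is a sketch using the standard propensity score weights for each estimand. The data-generating process is made up: the treatment effect is `2 * x`, and units with larger `x` are more likely to be treated, so the estimands genuinely differ. The true propensity score is used instead of a fitted one (an assumption to keep the example short).

```python
import random

random.seed(42)


def ps_weight(e: float, z: int, estimand: str) -> float:
    """Standard propensity score weight for a unit with score e and treatment z."""
    if estimand == "ATE":
        return z / e + (1 - z) / (1 - e)
    if estimand == "ATT":
        return z + (1 - z) * e / (1 - e)
    if estimand == "ATU":
        return z * (1 - e) / e + (1 - z)
    if estimand == "ATO":
        return z * (1 - e) + (1 - z) * e
    raise ValueError(f"unknown estimand: {estimand}")


def weighted_effect(data, estimand: str) -> float:
    """Weighted mean outcome in the treated minus weighted mean in the controls."""
    totals = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # z -> [sum of w*y, sum of w]
    for e, z, y in data:
        w = ps_weight(e, z, estimand)
        totals[z][0] += w * y
        totals[z][1] += w
    return totals[1][0] / totals[1][1] - totals[0][0] / totals[0][1]


data = []
for _ in range(20_000):
    x = random.random()                         # covariate, uniform on [0, 1]
    e = 0.1 + 0.8 * x                           # true propensity score (assumed known)
    z = 1 if random.random() < e else 0
    y = x + 2.0 * x * z + random.gauss(0.0, 1.0)  # treatment effect is 2 * x
    data.append((e, z, y))

estimates = {name: weighted_effect(data, name) for name in ("ATE", "ATT", "ATU", "ATO")}
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

Under this setup the true ATE is 1.0, the true ATT is about 1.27, and the true ATU is about 0.73. A comparison that pits an ATE method against an ATT method would see a gap that has nothing to do with which method is "better".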

- Inverse Probability of Treatment Weighting (IPTW)
- Propensity Score Matching (PSM)

We simulate some data and choose the metrics to evaluate them.

Then we compare the methods.

We decide that one is better than the other.

That may be true...but did they estimate the same thing?

2/3

09.12.2024 18:43 — 👍 0 🔁 0 💬 1 📌 0

Choosing a causal estimand is important. Why?

To make sure the research question is answered!

Certain methods can only estimate specific estimands. This is important when comparing methods.

Let's use an example.

Imagine we want to compare two methods:

1/3

09.12.2024 18:43 — 👍 0 🔁 0 💬 1 📌 0

The best way to improve your analysis:

Plot your data

08.12.2024 22:07 — 👍 0 🔁 0 💬 0 📌 0

I find the same thing! One area where it's helpful is condensing emails (when possible)

08.12.2024 21:50 — 👍 1 🔁 0 💬 0 📌 0

My take:

A frequentist approach assumes the parameter has a fixed (but unknown) true value. Take y = mx+b. A frequentist view assumes m is fixed.

A determinist view would be similar, assuming there is a fixed set of values.

(no refs, but interested in any you find!)

08.12.2024 16:27 — 👍 3 🔁 0 💬 0 📌 0

Same 😂

07.12.2024 14:34 — 👍 1 🔁 0 💬 0 📌 0

Too accurate

07.12.2024 13:56 — 👍 0 🔁 0 💬 1 📌 0

This is a good example of how Bayes & Frequentist methods are different paradigms of stats.

Not unlike calculus vs linear algebra.

Both useful, but mixing them is problematic.

06.12.2024 16:58 — 👍 2 🔁 0 💬 0 📌 0

I find the same thing! One solution I'm exploring is to take a previous LinkedIn post and get ChatGPT to condense it.

Have to edit it, but helpful as a starting point!

06.12.2024 15:11 — 👍 1 🔁 0 💬 0 📌 0

Nominal Coverage for GLM (GitHub Gist)

Code for plot: gist.github.com/battenr/e6f5...

06.12.2024 15:10 — 👍 0 🔁 0 💬 0 📌 0

Nominal coverage helped me with confidence intervals:

If you repeat an analysis 1,000 times, coverage is the % of intervals that capture the true effect. The nominal level (e.g., 95%) is what that % is supposed to be.

For 95% CIs, we'd expect ~950/1,000 to include the true value. It's a long-run frequency idea, not a guarantee for any single interval!

06.12.2024 15:09 — 👍 0 🔁 0 💬 1 📌 0
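
A quick simulation of the long-run idea in the post above. This is a sketch, not the code from the linked gist: it uses a plain sample mean with a normal-approximation interval instead of a GLM, and the true mean, standard deviation, and sample size are arbitrary choices.

```python
import math
import random

random.seed(1)

TRUE_MEAN = 10.0    # the "true effect" we are trying to capture (assumed)
N_REPS = 1_000      # number of repeated analyses
SAMPLE_SIZE = 50
Z_95 = 1.96         # normal-approximation critical value for a 95% interval

covered = 0
for _ in range(N_REPS):
    sample = [random.gauss(TRUE_MEAN, 2.0) for _ in range(SAMPLE_SIZE)]
    sample_mean = sum(sample) / SAMPLE_SIZE
    sample_var = sum((x - sample_mean) ** 2 for x in sample) / (SAMPLE_SIZE - 1)
    std_error = math.sqrt(sample_var / SAMPLE_SIZE)

    lower = sample_mean - Z_95 * std_error
    upper = sample_mean + Z_95 * std_error
    if lower <= TRUE_MEAN <= upper:
        covered += 1

coverage = covered / N_REPS
print(f"{covered} of {N_REPS} intervals include the true mean ({coverage:.1%})")
```

The observed share should land close to 95%, but any single interval either contains the true value or it doesn't.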

Great question! For this example, I'm assuming there is no time-varying confounding (tried to keep it simple as an introductory example).

If there is time-varying confounding then there are better methods (like a marginal structural model).

Thanks for the link! Look forward to reading it

06.12.2024 15:06 — 👍 0 🔁 0 💬 0 📌 0

Great that it's on your reading list! Unfortunate that they went out of business

06.12.2024 02:08 — 👍 2 🔁 0 💬 0 📌 0

You should! I actually started a couple months ago

Highly recommend Statistical Rethinking (~85% of the way through it)

06.12.2024 01:00 — 👍 1 🔁 0 💬 1 📌 0

If you're meeting hobbits from The Shire...does that make you Gandalf?

06.12.2024 00:57 — 👍 2 🔁 0 💬 0 📌 0

Latest posts by ryanbatten.bsky.social on Bluesky