
Frank Harrell

@f2harrell.bsky.social

Professor of Biostatistics Vanderbilt University School of Medicine Expert Biostatistics Advisor FDA Center for Drug Evaluation and Research https://hbiostat.org https://fharrell.com

7,693 Followers  |  138 Following  |  1,217 Posts  |  Joined: 12.10.2023

Posts by Frank Harrell (@f2harrell.bsky.social)

Right; you create a circular multiple imputation, nothing wrong with that.

04.03.2026 23:06 | 👍 1  🔁 0  💬 0  📌 1

That's my first choice, because it fully informs you of the importance of the value that is missing for the person being predicted.

04.03.2026 16:26 | 👍 3  🔁 0  💬 0  📌 0

John Fox was an incredible educator, proponent of excellent statistical practices, writer, R developer, and person. We will always miss him. His effect on us will never disappear.

04.03.2026 12:28 | 👍 11  🔁 6  💬 1  📌 0

Very nice interactive demonstration of maximum likelihood estimation. I added this link to hbiostat.org/rmsc/mle #Statistics #StatsSky

04.03.2026 12:24 | 👍 17  🔁 3  💬 0  📌 0

I joined the boycott. Consider joining yourself. All AI is dangerous but ChatGPT's developers are truly devoid of conscience.

04.03.2026 12:16 | 👍 12  🔁 5  💬 2  📌 0

Right, and the more single imputation resembles single-value fill-in methods, the more wrong it gets, because such methods don't preserve the correlation structure of the predictors. This will ruin regression coefficients.

04.03.2026 12:01 | 👍 1  🔁 0  💬 1  📌 0
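A minimal simulation sketch of the point about single-value fill-in, not taken from the thread: two correlated predictors are generated, one is set missing completely at random, and the missing values are replaced by the observed mean. The sample size, correlation, missing fraction, seed, and all function names are illustrative assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=5000, rho=0.7, frac_missing=0.5, seed=42):
    """Return (correlation in complete data, correlation after mean fill-in)."""
    rng = random.Random(seed)
    # x1 and x2 are standard normal with true correlation rho (assumed setup).
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x1]
    # Delete x2 completely at random for a fraction of the rows.
    missing = [rng.random() < frac_missing for _ in range(n)]
    observed = [b for b, m in zip(x2, missing) if not m]
    fill = sum(observed) / len(observed)
    # Single-value fill-in: every missing x2 gets the same observed mean.
    x2_filled = [fill if m else b for b, m in zip(x2, missing)]
    return pearson(x1, x2), pearson(x1, x2_filled)


corr_full, corr_filled = simulate()
print(f"complete data: {corr_full:.3f}, after mean fill-in: {corr_filled:.3f}")
```

Under these assumptions the filled-in correlation shrinks toward zero, by roughly a factor of the square root of the observed fraction, which is the kind of distortion that then carries over into regression coefficients.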

There are many ways, as described in our 2009 Clinical Chemistry article. Multiple imputation is attractive because it quantifies the cost of not collecting certain variables, by giving a range of predictions over multiple imputations before averaging the predictions for an overall estimate.

04.03.2026 12:00 | 👍 1  🔁 0  💬 1  📌 0
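A rough sketch of the idea of predicting for one person with a missing predictor by averaging over multiple imputations, and looking at the spread of the per-imputation predictions. Everything here is a hypothetical stand-in: the logistic coefficients, the normal imputation model for x2 given x1, and the function names are assumptions for illustration, not the method of the 2009 article. A full multiple imputation would also draw the imputation model's parameters, which this sketch omits.

```python
import math
import random
import statistics


def predict_risk(x1, x2, b0=-1.0, b1=0.8, b2=1.2):
    """Hypothetical fitted logistic model; the coefficients are made up."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))


def impute_x2(x1, rng, slope=0.5, resid_sd=0.9):
    """Draw x2 from an assumed imputation model: x2 | x1 ~ N(slope * x1, resid_sd^2)."""
    return rng.gauss(slope * x1, resid_sd)


def mi_prediction(x1, m=20, seed=1):
    """Predict for a person whose x2 is missing, using m imputations."""
    rng = random.Random(seed)
    preds = [predict_risk(x1, impute_x2(x1, rng)) for _ in range(m)]
    return {
        "mean": statistics.fmean(preds),  # overall estimate
        "min": min(preds),                # spread shows the cost of
        "max": max(preds),                # not having measured x2
        "predictions": preds,
    }


result = mi_prediction(x1=1.0)
print(f"mean {result['mean']:.3f}, range {result['min']:.3f} to {result['max']:.3f}")
```

A wide range across imputations suggests the missing variable matters a lot for this person's prediction; a narrow range suggests it matters little.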

Recursive partitioning (classification and regression trees) uses surrogate splits to handle missing data. I used to think that was a good idea, but research has shown otherwise. Coupled with the 10-fold higher sample size needed for trees, I don't think this is a good choice.

04.03.2026 11:58 | 👍 1  🔁 0  💬 2  📌 0
3 Missing Data – Regression Modeling Strategies

I was referring to multiple imputation. Single imputation is seldom a good idea because it completely screws up standard errors of regression coefficient estimates. I do use single imputation when the fraction of missing is tiny. You're right re: omitting Y for single. hbiostat.org/rmsc/missing

03.03.2026 23:11 | 👍 3  🔁 1  💬 2  📌 0

No, it fails because it then under-imputes missing X, which biases regression coefficients towards zero. Full Bayesian models don't impute but rather treat missing data as unknown parameters.

03.03.2026 19:16 | 👍 1  🔁 0  💬 1  📌 0
Dealing with Missing Predictor Values When Applying Clinical Prediction Models Abstract. Background: Prediction models combine patient characteristics and test results to predict the presence of a disease or the occurrence of an event

I hope our 2009 paper gets remembered: academic.oup.com/clinchem/art...
Main advantages of Bayes: gets correct coefficients without using Y as an imputer; exact inference; respects order of data collection flow.

03.03.2026 16:36 | 👍 4  🔁 0  💬 1  📌 0

Richard - hope you touch on full Bayesian models which have a major advantage in this context - not using the outcome variable as an imputer, unlike multiple imputation, which requires Y to be used to impute X, making prospective prediction tricky.

03.03.2026 13:04 | 👍 3  🔁 0  💬 1  📌 0

good morning everyone project your personal imposter syndrome onto this gif ur welcome

03.03.2026 08:08 | 👍 132  🔁 35  💬 5  📌 3

Bayesians say that you should always have parameters for things you know you don't know.

03.03.2026 13:00 | 👍 2  🔁 0  💬 1  📌 0

Unmeasured potential confounders are easy to find in health research, e.g., insurance coverage, health-seeking behavior, diet, family cash on hand, ...

03.03.2026 12:59 | 👍 1  🔁 0  💬 1  📌 0
Bayesian Design

There are dramatic differences between Bayesian and frequentist in sequential designs. Here's an example where the sample size at which a conclusion is reached is dramatically smaller for Bayes: hbiostat.org/bayes/design - see the simulation of the Bayesian multi-goal design. #Statistics #StatsSky

03.03.2026 12:55 | 👍 5  🔁 1  💬 0  📌 0

I hope readers of this have their bullshit detectors set to maximum ...

03.03.2026 12:52 | 👍 14  🔁 1  💬 2  📌 0

I really like this chart: terminology for cause vs describe vs predict.

03.03.2026 12:45 | 👍 37  🔁 12  💬 1  📌 0
Screenshot of the "Does that use a lot of energy?" online app


Hannah Ritchie has built a fun little tool where you can compare energy usage of various products and activities.

This is super helpful imho, because it's so hard to develop intuitions even just about the scales involved here.

hannahritchie.substack.com/p/does-that-...

03.03.2026 09:27 | 👍 139  🔁 61  💬 3  📌 5

Thanks very much for this reference. The wish to find simpler formulas is quite surprising, since it flies in the face of maximum likelihood estimation, which is almost always iterative. Wilson's CI is so easy to program (and has been in the R Hmisc package since 1991).

03.03.2026 12:40 | 👍 1  🔁 0  💬 0  📌 0
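To illustrate how little code the Wilson interval needs, here is a short sketch of the standard Wilson score interval for a binomial proportion, with no pseudocounts and no continuity correction. It is an independent Python illustration, not the Hmisc code; the function name `wilson_ci` and the default `z` for a 0.95 interval are choices made here.

```python
import math


def wilson_ci(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (no continuity correction).

    z is the standard normal quantile; 1.959964 corresponds to a 0.95 interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    z2 = z * z
    denom = 1.0 + z2 / n
    center = (p + z2 / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z2 / (4.0 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point spillover at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


lower, upper = wilson_ci(5, 10)
print(f"5/10: ({lower:.4f}, {upper:.4f})")
```

The interval stays inside [0, 1] and remains usable when the count is 0 or n, which is a large part of its appeal over the simple Wald interval.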
Equality as a Consolation Prize In a secular world, equality is a last attempt to offer some dignity to the weak

Amazingly thought provoking from @ruxandrabio.bsky.social : substack.com/home/post/p-...

03.03.2026 12:37 | 👍 5  🔁 2  💬 2  📌 0

Nice paper to know about. Related to my blog article, the inference from observational data with no randomized samples will come almost entirely from the prior distribution you put on observational study bias.

02.03.2026 15:27 | 👍 1  🔁 0  💬 1  📌 0

My impression was that Wilson confidence intervals worked great without adding pseudocounts. No?

02.03.2026 15:20 | 👍 0  🔁 0  💬 2  📌 0

Nice study. Had they evaluated how many of the 200 studies designed data collection post consideration of confounding, the results would have been far worse.

02.03.2026 15:18 | 👍 1  🔁 0  💬 0  📌 0

The problem with that is that the authors will then label it exploratory but try to reach the same conclusions anyway.

02.03.2026 15:11 | 👍 1  🔁 0  💬 1  📌 0

Yes, teaching design is always first priority. Observational studies are being published all the time now without even having a design phase. I'm going to blog about this before long.

02.03.2026 12:49 | 👍 3  🔁 0  💬 1  📌 0

Even though it can be impossible, to even attempt to do so on a study that had no pre-data-collection design phase is a major problem.

02.03.2026 12:47 | 👍 1  🔁 0  💬 1  📌 0
Researcher 'honestly shocked' to discover name on paper, editor claims misunderstanding. While reviewing her Google Scholar profile to prepare a list of her publications, psychologist Maryam Farhang came across a paper she didn't recognize. The article, in the Journal of Research…

While reviewing her Google Scholar profile to prepare a list of her publications, psychologist Maryam Farhang came across a paper she didn't recognize.

The article included her name and affiliation, but she hadn't written or contributed to the paper in any way.

27.02.2026 21:13 | 👍 53  🔁 14  💬 0  📌 1

The next time they tell you there's no money for healthcare, remember there was money to start a war with Iran.

The next time they tell you there's no money for housing or social supports, remember there was money to bomb a girls elementary school.

There's always money for war.

28.02.2026 23:52 | 👍 470  🔁 172  💬 11  📌 5

In many observational studies it is a push to even call the sample a 'cohort'. For example in electronic health record-based studies we seldom know what makes a patient enter our health system, and anything about the patient that occurred while in a previous system is unknown.

01.03.2026 13:21 | 👍 0  🔁 0  💬 0  📌 0