Image of R code. To reproduce:

library(ggplot2)
library(dplyr)
library(mice, warn.conflicts = FALSE)

imp <- mice(nhanes, m = 5, maxit = 5, seed = 1,
            ignore = rep(c(FALSE, TRUE), c(20, 5)),
            print = FALSE)
impdats <- complete(imp, "all")

train <- lapply(impdats, function(dat) subset(dat, !imp$ignore))
test  <- lapply(impdats, function(dat) subset(dat, imp$ignore))

fits <- lapply(train, function(dat) lm(age ~ bmi + hyp + chl, data = dat))

preds <- predict_mi(object = fits, newdata = test,
                    pool = TRUE, interval = "prediction")
preds

preds %>%
  as.data.frame() %>%
  mutate(case = 1:nrow(preds), y = test[[1]]$age) %>%
  ggplot(aes(x = fit, y = case,
             col = rowSums(is.na(nhanes[imp$ignore, ])) > 0)) +
  geom_point() +
  geom_errorbar(aes(xmin = lwr, xmax = upr)) +
  theme_minimal() +
  scale_color_manual(values = mice::mdc(1:2),
                     labels = c("observed", "missing")) +
  theme(legend.title = element_blank(), legend.position = "bottom") +
  labs(x = "prediction", title = "Pooled prediction intervals")
Cool stuff!
Florian van Leeuwen and I implemented a prediction function in the #mice package that incorporates missing-data uncertainty into prediction intervals.
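For intuition, here is a rough base-R sketch of how a pooled prediction interval could be built by hand with Rubin's rules. This is an assumption about the general idea, not a description of what `predict_mi()` does internally, and names such as `per_imp`, `total` and `pooled` are made up for the illustration.

```r
library(mice)

# Same setup as in the post: the last 5 rows are held out of the imputation model
imp <- mice(nhanes, m = 5, maxit = 5, seed = 1,
            ignore = rep(c(FALSE, TRUE), c(20, 5)), print = FALSE)
impdats <- complete(imp, "all")

# Per imputation: point prediction and prediction variance for the held-out rows
per_imp <- lapply(impdats, function(dat) {
  fit <- lm(age ~ bmi + hyp + chl, data = dat[!imp$ignore, ])
  p <- predict(fit, newdata = dat[imp$ignore, ], se.fit = TRUE)
  list(q = p$fit, u = p$se.fit^2 + p$residual.scale^2)
})
q <- sapply(per_imp, `[[`, "q")  # rows = cases, columns = imputations
u <- sapply(per_imp, `[[`, "u")

m     <- ncol(q)
qbar  <- rowMeans(q)                 # pooled point prediction
ubar  <- rowMeans(u)                 # within-imputation variance
b     <- apply(q, 1, var)            # between-imputation variance
total <- ubar + (1 + 1 / m) * b      # total variance (Rubin's rules)
df    <- (m - 1) * (1 + ubar / ((1 + 1 / m) * b))^2  # Rubin's degrees of freedom

crit   <- qt(0.975, df)
pooled <- data.frame(fit = qbar,
                     lwr = qbar - crit * sqrt(total),
                     upr = qbar + crit * sqrt(total))
pooled
```

The between-imputation term `b` is what widens the interval for cases whose predictors had to be imputed.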
The `predict_mi()` function is available in the current development version: github.com/amices/mice
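To try it, something like the following should work, assuming the `remotes` package is an acceptable way to install from GitHub and that `predict_mi()` is exported by the development build:

```r
# Install the development version of mice from GitHub
# (assumes the 'remotes' package; install it first if needed)
install.packages("remotes")
remotes::install_github("amices/mice")

library(mice)
exists("predict_mi")  # checks whether the function is visible after loading mice
```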
#Rstats #statsky