Matthew Kay

@mjskay.com.bsky.social

Assoc Prof Computer Science and Communication Studies at Northwestern. Infovis, HCI. Author of tidybayes & ggdist R pkgs. he/him. 🏳️‍🌈 https://mjskay.com/ Co-director https://mucollective.northwestern.edu Co-founder https://journalovi.org

11,278 Followers  |  1,026 Following  |  1,497 Posts  |  Joined: 07.05.2023  |  2.2981

Latest posts by mjskay.com on Bluesky

I think @chelseaparlett.bsky.social complained once about having to teach that posterior-predictive functions have names like “pp_check()”

05.08.2025 20:15 - 👍 14    🔁 4    💬 7    📌 0

cumsum()

also this:

06.08.2025 03:04 - 👍 6    🔁 1    💬 3    📌 0
A screenshot of my JSM 2025 Agenda Chat bot, featuring the answer to "when is Mitchell O'Hara-Wild presenting".

There's so many parallel sessions at #JSM2025 that it's hard to choose where to go, so I web-scraped the schedule and made a chatbot to help.

I've made it public, so hopefully it can help you too - try it out 👉 shiny.mitchelloharawild.com/jsm2025/

04.08.2025 15:32 - 👍 17    🔁 8    💬 5    📌 0

hmm allow me to Google the words "frequentist" and "eugenics" and see if I can't drum up an answer to your question

02.08.2025 18:05 - 👍 5    🔁 0    💬 1    📌 0
"I asked the AI" is not research You can't reliably fact-check or research obscure topics by asking an AI. It may know less about the topic than you do.

"I asked the AI" is not research.
clauswilke.substack.com/p/i-asked-th...

02.08.2025 15:10 - 👍 45    🔁 15    💬 0    📌 0

Congrats to Dr. @abhsarma.bsky.social on a successful Ph.D. thesis defense!

"Designing Interactive Systems for Reasoning with Ontological Uncertainty in Data Analysis"

Advised by @mjskay.com and me with Darren Gergle and Fanny Chevalier also on the committee.

02.08.2025 15:24 - 👍 14    🔁 1    💬 0    📌 1
Paula and her committee

Paula defending her dissertation

Congratulations Dr. Paula Kayongo for successfully defending her Ph.D. dissertation!

"Behavioral Information Design for Forecasting Dashboards: Equilibria, Mechanisms, and Calibration"

Thanks to her stellar committee: @jessicahullman.bsky.social, @mjskay.com, and Annie Liang!

01.08.2025 19:58 - 👍 13    🔁 2    💬 0    📌 0

If you've been working hard on an alt.vis25 submission, then good news!!

The alt.vis deadline is extended! It is now on AoE August 6. We look forward to seeing what you've been up to! Keep up the alt.work!!

31.07.2025 14:59 - 👍 3    🔁 2    💬 2    📌 0
Worklytics - Work Analytics Benchmark Report Design and Specialized Data Visualization Design system, visualization, and automation for the Worklytics Annual Benchmarks Report.

How do people work at work?

Also: How do you get quantile dot plots to play nicely in a dataviz 📊 design system across an 89 page report, with hundreds of different plots, each with varying distributions?

New deep-dive case study on my recent project for Worklytics: 3iap.com/work/worklyt...

30.07.2025 18:22 - 👍 31    🔁 8    💬 4    📌 1

An actually interesting take on NA handling in #rstats! That we can treat it like a monad and use function wrappers to handle it instead of arguments like na.rm. I really like this!!

30.07.2025 17:36 - 👍 11    🔁 3    💬 1    📌 0
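
A minimal sketch of the function-wrapper half of that idea, assuming nothing about how the linked post implements it: `na_rm()` is a hypothetical helper that takes a function and returns a version of it that drops missing values first, so the NA policy lives in the wrapper rather than in an `na.rm` argument.

```r
# Hypothetical helper (not from the linked post): wrap any function of a
# vector so that NAs are removed before the function is applied.
na_rm <- function(f) {
  function(x, ...) f(x[!is.na(x)], ...)
}

x <- c(1, NA, 3)

plain_sum    <- sum(x)          # the missing value propagates
wrapped_sum  <- na_rm(sum)(x)   # NA dropped before summing
wrapped_mean <- na_rm(mean)(x)  # the same wrapper works for other functions
```

The same wrapper applies to any function of a vector, including ones that never had an `na.rm` argument.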
Let's talk about NA-s! ARTHUR: Well, what is it you want? HEAD KNIGHT: We want… a paste function that can deal with NA-s! A language for statistical computing clearly needs to be able to deal with missing values, and R has ...

Interesting read about NA handling in #rstats (via rweekly.org/2025-W31.html): www.biobits.be/biofunctor/2...

The overview of the {stats} package's various NA utility functions was informative: `na.pass`, `na.omit`, `na.exclude`, `na.contiguous`, and `na.fail`.

1/2

30.07.2025 16:02 - 👍 20    🔁 6    💬 4    📌 2
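
For readers who have not used these, here is a rough illustration of how four of the {stats} NA actions behave on a small, made-up data frame. The data and variable names are assumptions for illustration only.

```r
# Made-up data with one missing predictor value
df <- data.frame(x = c(1, 2, NA, 4), y = c(2.1, 3.9, 6.2, 8.1))

n_omit <- nrow(na.omit(df))  # rows with any NA are dropped
n_pass <- nrow(na.pass(df))  # object is returned unchanged

# na.fail() signals an error if any value is missing
res_fail <- tryCatch(na.fail(df), error = function(e) "error")

# na.omit and na.exclude drop the same rows when fitting, but na.exclude
# pads residuals (and predictions) back out to the original length with NA
fit_omit <- lm(y ~ x, data = df, na.action = na.omit)
fit_excl <- lm(y ~ x, data = df, na.action = na.exclude)
len_omit <- length(residuals(fit_omit))
len_excl <- length(residuals(fit_excl))
```

The padding from `na.exclude` is what keeps residuals aligned with the rows of the original data.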
Scatterplot matrix alongside a dithered correlation heatmap showing the correlation between each pair of variables (and its uncertainty)

Yeah with a map I think that interpretation is reasonable - it's what that visual form usually means. Makes the technique potentially tricky to apply (and also therefore interesting!)

Some time ago I played with these on correlation heatmaps which don't have that baggage

30.07.2025 06:41 - 👍 3    🔁 0    💬 0    📌 0

Funny @mitchelloharawild.com just mentioned this package to me earlier today! Really glad to see it being done in a usable way

30.07.2025 04:18 - 👍 4    🔁 0    💬 0    📌 0

I really like this idea and TBH I don't think existing empirical work tells us enough --- I'm not even sure what the right parameters would be (apparent "pixel" size, ordering, etc) to make these things work well, and I don't think that's been explored enough to even know if they do work...

30.07.2025 04:17 - 👍 1    🔁 0    💬 1    📌 0
Pixelated display of uncertainty in county-level poverty rates from Lucchesi & Wikle (2017)

Visualization techniques evaluated in Preston & Ma (2022), of which the "dotmaps" are pixelated depictions of uncertainty: (a) interpolation only, (b) interpolation with sensors, (c) ordered dotmap, (d) smoothed dotmap, (e) small multiples, (f) risk contours.

It's been studied a bit, but not enough IMO (occasionally I contemplate it...)

Lucchesi & Wikle (doi.org/10.1002/sta4...) discuss it (and link to earlier work I've only found illegible scans of: doi.org/10.1016/S009...), and Preston & Ma did a study on some variations (doi.org/10.1109/TVCG...)

30.07.2025 04:12 - 👍 9    🔁 2    💬 4    📌 1

Pretty! Interesting use of gradients for data alongside the model

29.07.2025 16:04 - 👍 1    🔁 0    💬 0    📌 0

MOAR yes but I am like 3 deep on "package to write a package to write a package" and I need to pop stuff off the stack

26.07.2025 19:22 - 👍 3    🔁 1    💬 0    📌 0

I'd probably go the other way and make the text colored (but in a thicker font weight to compensate for reduced contrast) and make the glow white to help the text stand out better

26.07.2025 19:06 - 👍 5    🔁 1    💬 1    📌 0

I love Steve but also he likes to get into arguments and that tracks

26.07.2025 17:36 - 👍 1    🔁 0    💬 1    📌 0

goddammit every day is a new "hmm I should make a package for that" #rstats

26.07.2025 17:31 - 👍 31    🔁 2    💬 5    📌 0
Excerpt from linked page showing a dual axis plot alongside its translation to a connected scatterplot, with this text:

Known concepts have new visual features. 
In a connected scatterplot, what does it mean when the line forms a loop? This feature reflects a potentially important pattern in the data, and one that can be tough to detect in more traditional formats. The answer is in our paper, linked above.

@steveharoz.com has a nice page+paper that includes pros/cons of connected scatterplots and how to read them: steveharoz.com/research/con...

26.07.2025 17:30 - 👍 4    🔁 1    💬 1    📌 0

they're always there, waiting...

26.07.2025 17:24 - 👍 3    🔁 0    💬 0    📌 0

Re: z scoring, scaling to have the same mean and sd seems to me a reasonable way to do dual axis charts if you're gonna do them (not indexing)

that's what I did here though no one seemed to notice for some reason...

26.07.2025 17:23 - 👍 3    🔁 1    💬 2    📌 0
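
A small sketch of the scaling described above, using made-up numbers rather than any data from the thread: each series is z-scored, which is equivalent to mapping the second series onto the first one's axis so that both share the same mean and sd. The helper name `z()` and the two vectors are illustrative assumptions.

```r
# Made-up series on very different scales (illustrative only)
a <- c(0.55, 0.60, 0.72, 0.68, 0.80)
b <- c(52, 55, 66, 61, 70)

# z-score: subtract the series' own mean, divide by its own sd
z <- function(v) (v - mean(v)) / sd(v)
za <- z(a)
zb <- z(b)

# Equivalent dual-axis view: put b on a's axis so both have a's mean and sd.
# Labels for the secondary axis would come from inverting this transform.
b_on_a <- mean(a) + sd(a) * zb
```

Unlike indexing, this does not depend on choosing a reference period, only on the window of data being plotted.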
# dualaxes ----
# URL: https://freerangestats.info/blog/2016/08/18/dualaxes
library(readxl)
library(tidyverse)
library(scales)
devtools::install_github("ellisp/ggseas/pkg")
library(ggseas) # for stat_index()
library(grid)
library(gridExtra)

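# Download data from the Reserve Bank of New Zealand
# (create the target folder first so download.file() has somewhere to write)
dir.create("data", showWarnings = FALSE)
download.file("http://www.rbnz.govt.nz/-/media/ReserveBank/Files/Statistics/Key%20graphs/graphdata.xlsx?la=en",
  destfile = "data/rbnz.xlsx", mode = "wb"
)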

# Import some of that data into R and create a numeric TimePeriod variable from the original
# string that shows year and month:
forex <- read_excel(
  path = "data/rbnz.xlsx", sheet = "NZDUSD", skip = 4,
  col_names = c("DATE", "NZDUSD", "TWI"),
  col_types = c("date", "numeric", "numeric")
)

forex <- forex %>%
  mutate(
    year = year(DATE),
    month = month(DATE),
    TimePeriod = year + (month - 0.5) / 12
  ) %>%
  select(-DATE, -year, -month) |> 
  na.omit()

# Create a long, thin ("tidy") version for use with {ggplot2}:
forex_m <- forex %>%
  gather(variable, value, -TimePeriod)

# Set the basic foundation of the coming ggplot graphics:
basicplot <- ggplot(
  data = forex_m,
  mapping = aes(x = TimePeriod, y = value, colour = variable)
) +
  labs(
    x = NULL,
    caption = "Data from RBNZ; graphic by http://ellisp.github.io",
    colour = ""
  )

## facet versions ----
# Good facet plot
basicplot +
  geom_line() +
  facet_wrap(facets = ~variable, scales = "free_y", ncol = 1) +
  ggtitle("Comparing two time series with facets may reduce comparability")

# Good index plot
basicplot +
  stat_index(index.ref = 1) +
  labs(y = "Index (January 1984 = 100)") +
  ggtitle("Usually accepted version of comparing two time series",
    subtitle = "Converted to an index, reference period first point in time"
  )

# Also a good index plot,
# but showing that arbitrary choices are still being made
basicplot +
  stat_index(index.ref = 361) +
  labs(y = "Index (January 2014 = 100)") +
  ggtitle("But then, a different picture?",
    subtitle = "Converted to an index, reference period chosen arbitrarily later in the series"
  )

## connected scatterplot ----
forex %>%
  mutate(label = ifelse(round(TimePeriod - floor(TimePeriod), 3) == 0.042, substring(TimePeriod, 1, 4), "")) %>%
  ggplot(aes(x = NZDUSD, y = TWI, label = label, colour = TimePeriod)) +
  geom_path() +
  geom_text(fontface = "bold") +
  scale_colour_gradientn("", colours = c("grey75", "darkblue")) +
  ggtitle("Connected scatter plot may be the best analytically\nbut is intimidating to non-specialists")

@freerangestats.info on the dangers of dual-axes plots.
freerangestats.info/blog/2016/08...

Amazing how this code still runs even without a lockfile or a Docker environment
#rstats #econsky #dataviz

Code in ALT

26.07.2025 16:17 - 👍 32    🔁 7    💬 4    📌 0

marginalizing over random effects with posterior::rvar()s

24.07.2025 20:54 - 👍 2    🔁 2    💬 1    📌 0
plot of the what i just said

yeah i fit a beta regression model with a 3-df spline on age and by-child random intercepts and computed marginal means by simulating and averaging 1000 children on each posterior draw... just to get the same thing as a LOESS smoooooth of the observations

24.07.2025 21:14 - 👍 32    🔁 2    💬 1    📌 1
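
The simulate-and-average step can be sketched roughly as below. This is not the poster's code: it uses plain vectors instead of posterior::rvar, the "posterior draws" are made-up stand-ins rather than output from a fitted model, and a logit link for the mean is assumed.

```r
set.seed(1)
n_draws    <- 200   # number of stand-in posterior draws (assumption)
n_children <- 1000  # simulated children per draw, as described in the post

# Made-up stand-ins for posterior draws of the linear predictor at one age
# and of the sd of the by-child random intercepts
eta   <- rnorm(n_draws, mean = 0.5, sd = 0.1)
sigma <- abs(rnorm(n_draws, mean = 1, sd = 0.1))

# For each draw: simulate new children's intercepts, map each child to the
# mean scale with the inverse logit, then average over children
marginal_mean <- vapply(seq_len(n_draws), function(i) {
  u <- rnorm(n_children, mean = 0, sd = sigma[i])
  mean(plogis(eta[i] + u))
}, numeric(1))

# Conditional mean for a "typical" child (random intercept of 0), to compare
conditional_mean <- plogis(eta)

# Posterior summary of the marginal mean
quantile(marginal_mean, c(0.025, 0.5, 0.975))
```

Because the inverse logit is nonlinear, the marginal mean is pulled toward 0.5 relative to the conditional mean, which is why the averaging step matters.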

(I interpreted this as "monotonic within groups", globally monotonic I don't have a clever solution for)

24.07.2025 05:03 - 👍 0    🔁 0    💬 0    📌 0

add indices to duplicates, something like this?

index_duplicates = \(x) {
  x = mtfrm(x)
  paste(x, ave(x, x, FUN = seq_along))
}

newmatch = \(x, table, ...) match(index_duplicates(x), index_duplicates(table), ...)

24.07.2025 00:59 - 👍 2    🔁 1    💬 1    📌 0
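
A quick usage sketch of the idea above, repeating the two definitions so it runs on its own. It assumes R 4.3 or later, where `mtfrm()` is available in base R; the example vectors are made up.

```r
# Definitions repeated from the post so this block is self-contained
index_duplicates <- \(x) {
  x <- mtfrm(x)
  # Tag each value with its running count among equal values: "a 1", "a 2", ...
  paste(x, ave(x, x, FUN = seq_along))
}
newmatch <- \(x, table, ...) match(index_duplicates(x), index_duplicates(table), ...)

x   <- c("a", "a", "b")
tbl <- c("a", "b", "a")

plain   <- match(x, tbl)     # 1 1 2: both "a"s hit the first "a" in tbl
indexed <- newmatch(x, tbl)  # 1 3 2: the k-th "a" in x hits the k-th "a" in tbl
```

One thing to watch: a duplicate in `x` with no corresponding duplicate in the table comes back as `NA` rather than reusing an earlier position.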

Lol. Maybe a burner account just for searches?

(I admit I'm only offering bad ideas atm)

23.07.2025 18:31 - 👍 2    🔁 0    💬 0    📌 0
