Xan Gregg

@xangregg.bsky.social

Engineering Fellow at JMP, focused on #DataViz, preferring smoothers over fitted lines. Creator of JMP #GraphBuilder and #PackedBars chart type for high-cardinality Pareto data. #TieDye #LessIsMore

1,715 Followers  |  1,811 Following  |  276 Posts  |  Joined: 04.11.2023

Posts by Xan Gregg (@xangregg.bsky.social)

Here's a line chart version of the same France mortality rate data with z-score on the Y axis and a separate line for each birth year. #dataviz

09.02.2026 18:43 | 👍 1    🔁 0    💬 0    📌 0

Combat exposure seems like a conspicuous factor, but then I would expect a wider orange band accounting for several birth years. I did see a few papers linking it to "in utero influenza exposure".

08.02.2026 18:07 | 👍 0    🔁 0    💬 0    📌 0
Heatmap of birth cohort on the x axis (1900 - 1970) and age on the y axis (0 - 70) showing relative mortality rates within France. Each cell is colored from blue-to-gray-to-orange (-5 to 0 to +5) by the z-score (standard deviations from the mean) when this age/birth year mortality rate is compared to the 20 nearby years at the same age. Vertical orange stripes appear at 1920 and 1946 indicating generally higher mortality rates for those birth cohorts. Diagonal orange stripes appear at birth+age equal to 1918, 1940 and 1945.

Playing with mortality data, I accidentally discovered how being born in the immediate aftermath of the Spanish flu pandemic is associated with later-life increased mortality rates in many European countries. #dataviz for France. Blog post rawdatastudies.com/2026/02/07/b...

08.02.2026 16:28 | 👍 14    🔁 4    💬 2    📌 1
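The z-score coloring described in the heatmap alt text can be sketched roughly as follows. This is a minimal illustration, not the code behind the chart: the function name `local_z_scores`, the choice of 10 neighboring birth years on each side, the exclusion of the cell itself from the reference set, and the sample data are all assumptions.

```python
from statistics import mean, stdev


def local_z_scores(rates, half_window=10):
    """Z-score of each rate against up to `half_window` neighbors on each side.

    Assumptions (not from the post): the cell itself is left out of the
    reference set, and the sample standard deviation is used.
    """
    scores = []
    for i, rate in enumerate(rates):
        start = max(0, i - half_window)
        neighbors = rates[start:i] + rates[i + 1:i + 1 + half_window]
        if len(neighbors) < 2:
            scores.append(None)  # not enough neighbors to estimate spread
            continue
        spread = stdev(neighbors)
        scores.append((rate - mean(neighbors)) / spread if spread > 0 else None)
    return scores


# Made-up rates for one age across consecutive birth years (illustrative only).
rates_at_one_age = [1.0, 2.0, 3.0, 10.0, 3.0, 2.0, 1.0]
z = local_z_scores(rates_at_one_age, half_window=3)
```

Applied to each age row of a birth-year-by-age table, a cohort with unusually high mortality would show up as a vertical run of large positive scores.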

Nice illustration. It seems that any of those definitions of prepandemic baseline would be equally valid/invalid for extrapolation. I don't see that choice explained in the paper.

08.02.2026 13:06 | 👍 0    🔁 0    💬 1    📌 0

cc @jeanfisch.bsky.social who first pointed out this paper

08.02.2026 00:56 | 👍 1    🔁 0    💬 1    📌 0
Line chart of mortality rates for Poland and France since 1990. Dashed lines show the 2015-2019 linear trend extrapolated to the end year of the data (2023). Both rate curves jump in 2020 and fall back to pre-covid levels in 2023. However, Poland's trend line is upward and the extrapolated value is above the recent mortality rate value.

I made this #dataviz trying to understand a study checking for covid displacement deaths. Apparently Poland is 1 of the 3 of 34 countries showing displacement b/c its recent mortality rate is below the pre-covid trend (my HMD data only goes to 2023) jamanetwork.com/journals/jam...

08.02.2026 00:49 | 👍 4    🔁 1    💬 1    📌 0
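The dashed baseline in the chart above (a 2015-2019 linear trend extended to 2023) can be sketched with an ordinary least-squares line. The helper names and the numbers below are hypothetical; they are not HMD values, and the study's own baseline method may differ.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for paired values."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def extrapolate_baseline(rate_by_year, fit_years, target_year):
    """Fit a line over `fit_years` and evaluate it at `target_year`."""
    years = list(fit_years)
    slope, intercept = fit_line(years, [rate_by_year[y] for y in years])
    return slope * target_year + intercept


# Made-up mortality rates (per 1,000), for illustration only.
rates = {2015: 10.0, 2016: 10.2, 2017: 10.4, 2018: 10.6, 2019: 10.8, 2023: 10.9}
expected_2023 = extrapolate_baseline(rates, range(2015, 2020), 2023)
below_trend = rates[2023] < expected_2023
```

With an upward pre-pandemic trend, the extrapolated value can sit above a recent rate that has merely returned to its 2019 level, which is the situation the post describes for Poland.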

More idealism: dataviz researchers no longer need to be so connected to computer science / programming skills.

06.02.2026 15:57 | 👍 1    🔁 0    💬 1    📌 0
Using AI to Replicate (and Extend) Data Visualization Experiments What happens when replicating a study is (almost) a few prompts away?

Interesting blog post and video from @ebertini.bsky.social on using AI for #dataviz research and experiments. filwd.substack.com/p/using-ai-t...

06.02.2026 12:54 | 👍 3    🔁 0    💬 1    📌 0
Two side by side CDF plots, each with a black reference line at y=x and a blue cumulative probability step line. The line for 2024 is completely below the reference line. The blue line for 2025 hugs the reference line, going above and below it.

Two area charts representing the differences in each of the CDF plots from the y=x reference line.

CDF plots are great but have you tried flattened CDF plots? #dataviz
rawdatastudies.com/2026/01/19/s...

20.01.2026 21:24 | 👍 14    🔁 2    💬 0    📌 2
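Going by the alt texts, a flattened CDF plots the empirical CDF minus the y=x reference line. A minimal sketch, assuming the values are already on a 0-to-1 scale so that y=x is the right reference; the function name `flattened_cdf` is hypothetical and the blog post's exact construction may differ.

```python
from bisect import bisect_right


def flattened_cdf(values):
    """Return (x, ecdf(x) - x) pairs for values assumed to lie in [0, 1].

    The empirical CDF at x is the fraction of values <= x; subtracting x
    removes the y = x reference line so only the deviation remains.
    """
    xs = sorted(values)
    n = len(xs)
    return [(x, bisect_right(xs, x) / n - x) for x in xs]


# Evenly spread values track the reference line, so every deviation is zero.
flat = flattened_cdf([0.25, 0.5, 0.75, 1.0])
```

Plotting the second element of each pair as an area around zero gives the "flattened" view, where small departures from the reference are easier to see than in the original step plot.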

Good #dataviz inspirations in slides and notes of less-common-but-useful statistical charts.

11.01.2026 15:16 | 👍 4    🔁 0    💬 0    📌 0

Excellent! What's the significance of the bold numbers in the color swatches, 50, 100, 200, ..., 950 (for 11 colors)? I was surprised they're not equally spaced but then thought I'm missing something important.

09.01.2026 13:38 | 👍 0    🔁 0    💬 1    📌 0
Line chart showing annual CO₂-equivalent emissions (in MtCO₂e) for major non-state fossil-fuel and mining companies from roughly 1935 to 2023. Chevron, ExxonMobil, BP, and Shell rise steeply from the 1950s to peaks in the late 1960s–early 1970s, with Chevron the highest at nearly 1,800 MtCO₂e. Most companies decline after the 1970s, then level off or fluctuate. Coal-focused firms such as Peabody Energy and CONSOL Energy peak later, around the 2000s.

Filled area chart of total yearly CO₂-equivalent emissions (in MtCO₂e) from carbonmajors.org, spanning roughly 1935 to 2023. Emissions rise gradually until the 1950s, accelerate sharply through the 1960s and early 1970s, dip slightly in the late 1970s and early 1980s, then resume steady growth. Totals surpass 20,000 MtCO₂e around 1990 and reach over 30,000 MtCO₂e by the 2010s, remaining near that level through the most recent years.

More #dataviz from carbonmajors.org data. Yearly emissions (smoothed) from top 10 cumulative non-state entities. Didn't expect such a spike in the 70s (without a recovery). Looks like the all-entity sum (area chart) only paused at that point.

06.01.2026 22:08 | 👍 3    🔁 0    💬 0    📌 0

For a dataviz person, it's probably easiest to understand as a new treemap packing algorithm. That is, each area is proportional to some value and the total area is the total sum. But the top values are arranged like a ranked bar chart. In this case, the top bar is too big for a rectangular total.

06.01.2026 12:46 | 👍 1    🔁 0    💬 0    📌 0

The carbon majors database doesn't include countries, so that was just my carelessness in company-to-country mapping.

06.01.2026 00:33 | 👍 1    🔁 0    💬 0    📌 0

Very nice!

05.01.2026 19:20 | 👍 1    🔁 0    💬 0    📌 0
Horizontal packed bar chart ranking entities by cumulative emissions (GtCO₂-equivalent), using data from carbonmajors.org. China (Coal) is by far the largest contributor, followed by USSR (Coal), Saudi Arabia (Oil), Chevron, ExxonMobil, and Russia (Natural Gas). Other major contributors include BP, Shell, India (Coal), Iran (Oil), and China (Oil). Each row shows one dominant producer in blue, with many smaller producers stacked to the right in light gray, illustrating the long tail beyond the top emitters.

Cumulative CO2e emissions since 1854. Packed bar chart of top 10 entities, and all the others in gray. Data from carbonmajors.org, but recoding/combining govt-controlled entities as Country (Fuel). #dataviz

05.01.2026 12:55 | 👍 5    🔁 0    💬 2    📌 0

I came across another paper link.springer.com/article/10.1... with the exact same false Data Availability statement:

No datasets were generated or analysed during the current study.

Digging deeper, I found it's one of the example statements from Nature's guidance at www.nature.com/documents/nr...

04.01.2026 14:01 | 👍 2    🔁 0    💬 0    📌 0

Late follow-up, but looking closer at the article, I see these are essentially bootstrap confidence intervals of the mean. They ran many simulations of the model, and the bands show quantiles of the many simulation results. The labels come from IPCC's "calibrated likelihood" language.

03.01.2026 19:53 | 👍 0    🔁 0    💬 0    📌 0
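The band construction described in the post above (quantiles across many simulation runs at each time step) can be sketched as follows. The 5th and 95th percentiles, the function name `quantile_band`, and the toy runs are assumptions for illustration; the article's actual quantile levels are not stated here.

```python
from statistics import quantiles


def quantile_band(runs, lower_pct=5, upper_pct=95):
    """Per-step (lower, upper) percentiles across simulation runs.

    `runs` is a list of equal-length sequences, one per simulation.
    The percentile levels are illustrative defaults, not from the article.
    """
    band = []
    for values in zip(*runs):  # all runs' values at one time step
        cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
        band.append((cuts[lower_pct - 1], cuts[upper_pct - 1]))
    return band


# Toy ensemble: run k produces the values k and 2k at two time steps.
toy_runs = [[k, 2 * k] for k in range(101)]
band = quantile_band(toy_runs)
```

Each pair gives the lower and upper edge of the shaded band at that step; a line for the median or mean would be drawn separately.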
Data Strips: Quintiles vs. Box Plots – Raw Data Studies Experiments with a new “quintile area” strip plot, prompted by skewed box plots in a biology paper, ended up clarifying why box plots remain so robust.

My venture into quintile area strip plots ended up giving me greater appreciation for the quiet strengths of box plots. #dataviz New blog post: rawdatastudies.com/2026/01/02/d...

02.01.2026 16:32 | 👍 9    🔁 0    💬 0    📌 0

That sounds like what I saw a while back in a Nature paper but still don't know a common name for it. I would guess "quantile bands" but I mainly only see that name used with bands around 2D line charts. bsky.app/profile/xang...

01.01.2026 22:45 | 👍 0    🔁 0    💬 1    📌 0

Thanks. Now I can see how the two end regions might look like one pole when they're about the same "thickness".
A transformation would help the other values look less varied (and help deal with extreme densities, too), but I suspect the ranking of heights will still be prominent. I'll keep at it...

01.01.2026 18:02 | 👍 2    🔁 0    💬 0    📌 0

To be fair to the quintile/quartile area variants, they do perform well for very skewed distributions like these bacterial samples from a PLOS Biology paper that put me on this path. journals.plos.org/plosbiology/...
Experiment yourself at xangregg.github.io/data-strips/ #dataviz

01.01.2026 17:50 | 👍 4    🔁 0    💬 2    📌 0
Chart of four horizontal representations of a sample (n=100) from a Gaussian distribution. Quintile Area, Quartile Area, Grubbs Box and Tukey Box. Of note, the "Area" plots have their box heights sized by density (so their areas represent counts) and show noticeable variation. For the quintile area chart, the five relative heights appear to be 1, 10, 9, 8 and 2.

Quintile area plot: this seemed like a great idea as I lay awake thinking about box plot alternatives. But now I see it's terrible for most distributions, like this random Gaussian sample. It draws too much attention to the meaningless variation. Now at xangregg.github.io/data-strips/ #dataviz

31.12.2025 19:06 | 👍 12    🔁 1    💬 2    📌 1
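The "heights sized by density" idea from the alt text can be sketched like this: each quintile box spans one fifth of the sample, so its height is taken as 0.2 divided by its width, making every box's area equal. This is a rough illustration, not the code behind xangregg.github.io/data-strips/; the function name, the quantile method (Python's default), and the sample are assumptions.

```python
from statistics import quantiles


def quintile_boxes(sample):
    """Return (left, right, height) for five boxes of nominally equal area.

    Each box nominally holds 20% of the sample, so height = 0.2 / width.
    """
    cuts = quantiles(sample, n=5)  # four interior cut points
    edges = [min(sample), *cuts, max(sample)]
    boxes = []
    for left, right in zip(edges, edges[1:]):
        width = right - left
        height = 0.2 / width if width > 0 else float("inf")
        boxes.append((left, right, height))
    return boxes


# Made-up sample with long tails and a tight middle (illustrative only).
boxes = quintile_boxes([0, 10, 11, 12, 13, 14, 15, 16, 17, 30])
```

Because height is inversely proportional to width, small random differences in where the cut points fall turn into visible differences in box height, which is the "meaningless variation" the post complains about.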

I haven't seen that book, but I did worry about such interpretations.

30.12.2025 19:40 | 👍 0    🔁 0    💬 0    📌 0

Thanks. I edited the post to avoid the apparent text encoding/escaping error this morning, and it seems to have propagated through the caches now.

30.12.2025 19:34 | 👍 1    🔁 0    💬 1    📌 0
Daily Greenland surface melt area with 2019 highlighted against a historical reference range.
Line chart showing daily surface melt area of the Greenland ice sheet from April through October, measured in thousands of square kilometers. A blue line represents the median melt area for the 1981–2010 reference period, surrounded by a light blue shaded band indicating the range that contains 80% of historical readings. A red line overlays the year 2019, which follows the historical median early in the season before rising sharply in early June to values well above the reference range. The chart emphasizes the abruptness of the 2019 spike while also showing substantial variability within the historical distribution.
https://www.economist.com/graphic-detail/2019/06/17/the-greenland-ice-sheet-is-melting-unusually-fast

Daily Greenland surface melt area showing multiple extreme melt years alongside a historical reference range.
Line chart of daily Greenland ice sheet surface melt area from April through October, measured in thousands of square kilometers. A dark blue line shows the median melt area for the 1981–2010 reference period, with a shaded band indicating the range containing 80% of historical observations. Several individual high-melt years are overlaid and labeled, including 2002, 2007, and 2012. The year 2019 is highlighted in red and shows a sharp early-season rise that exceeds the reference range but remains lower than the most extreme peaks in 2002 and 2012. By plotting multiple high-melt years together, the chart places the 2019 spike in the context of earlier extreme events rather than presenting it in isolation.
https://www.economist.com/science-and-technology/2019/06/22/greenlands-ice-sheet-is-melting-unusually-fast

Anyone remember this alarming Economist chart from 2019 that accidentally taught us all a lesson on individual observations vs group averages? Not that the melting wasn't bad, but their follow-up added useful context. My deeper #dataviz dive with latest data: rawdatastudies.com/2025/12/29/g...

29.12.2025 14:02 | 👍 20    🔁 3    💬 2    📌 0
Screenshot from a Chartle episode. Trying to guess the marked line in a spaghetti plot. Several close but wrong guesses are highlighted.

I'm still enjoying @chartle.cc, but it would be nice to get a little credit for near misses.

28.12.2025 16:45 | 👍 2    🔁 0    💬 0    📌 0

The kicker from investigating the data behind this Pac-Man radar area chart: it's showing values normalized to themselves, so the "relative efficacy" is necessarily all 1.0 except where zero (which is the mouth).

15.12.2025 15:47 | 👍 2    🔁 0    💬 0    📌 0
From radar charts to curve fitting and back – Raw Data Studies An exploration of radar charts from a recent Nature article, tracing the path from radars to fitted sigmoidal curves to alternate derived summary views.

I turned my Pac-Man radar chart investigation into a blog post, including two AI assists: help extracting tidy data from a semi-structured Excel file and writing alt-texts for the images. #dataviz rawdatastudies.com/2025/12/15/f...

15.12.2025 15:12 | 👍 0    🔁 0    💬 0    📌 1
excerpt of a figure from a Nature article showing three radar charts (and three related 1D heatmaps)
source: https://www.nature.com/articles/s41586-025-09643-2/figures/1

I'm already not a fan of radar charts for #dataviz, but especially when they look like objects. I see Pac-Man, Germany?, and ... Excalibur?

12.12.2025 16:45 | 👍 5    🔁 0    💬 1    📌 0