Paulius Alaburda

@alaburda.bsky.social

Love all things R, data, medicine and energy! Head of Data Analytics @ Ignitis Lithuania 🇱🇹

82 Followers  |  189 Following  |  59 Posts  |  Joined: 17.08.2023  |  1.8868

Latest posts by alaburda.bsky.social on Bluesky

Calling Bullshit: Data Reasoning in a Digital World The world is awash in bullshit. Politicians are unconstrained by facts. Science is conducted by press release. Higher education rewards bullshit over analytic thought. Startup culture elevates bullshi...

Continuing my tour of books I should have already read, Calling Bullshit by @carlbergstrom.com and @jevinwest.bsky.social. Just a delight - an accessible, entertaining, insightful look at various forms of BS. Much like Weapons of Math Destruction, would love a 2025 update of this one.

07.10.2025 00:55 - 👍 364    🔁 71    💬 11    📌 6
How I, a non-developer, read the tutorial you, a developer, wrote for me, a beginner - annie's blog “Hello! I am a developer. Here is my relevant experience: I code in Hoobijag and sometimes jabbernocks and of course ABCDE++++ (but never ABCDE+/^+ are you kidding? ha!) and I like working with ...

"How I, a non-developer, read the tutorial you, a developer, wrote for me, a beginner" by Annie Mueller 😅 😂 😭

anniemueller.com/posts/how-i-...

23.09.2025 07:57 - 👍 323    🔁 95    💬 15    📌 30

BIG NEWS! We've updated the website of the Open Visualization Academy, where you can see all its contributors: openvisualizationacademy.org

This is the announcement in our newsletter: openvisualizationacademy.beehiiv.com/p/we-re-back...

#dataViz #infographics #dataJournalism #dataVis 📊

1/x

25.09.2025 18:05 - 👍 133    🔁 40    💬 4    📌 4
Screenshot of first page of slidecrafting-book.com website

I'm excited to announce a new resource about making slides with quarto and revealjs. This book is the combination of all the work I have done in this area, reordered and polished up.

There isn't a lot of new information yet, but this format allows me to add more easily

slidecrafting-book.com
#quarto

24.09.2025 16:12 - 👍 179    🔁 64    💬 11    📌 6
One mother for two species via obligate cross-species cloning in ants - Nature In a case of obligate cross-species cloning, female ants of Messor ibericus need to clone males of Messor structor to obtain sperm for producing the worker caste, resulting in males from the same mother having distinct genomes and morphologies.

It's never occurred to me that it IS an assumption. This is the most astonishing start to a paper I've read in years:

"Living organisms are assumed to produce same-species offspring. Here, we report a shift from this norm in Messor ibericus, an ant that lays individuals from two distinct species."

24.09.2025 17:26 - 👍 451    🔁 101    💬 19    📌 22

For instance, yesterday I read a paper with a table describing participants' sickness absence days with a mean of 71 and SD = 88. Generating a random (gaussian) sample using these values produces ~20% participants with less than zero sick days.

22.09.2025 13:24 - 👍 3    🔁 1    💬 2    📌 2
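
The roughly 20% figure in the post above can be checked directly. A minimal R sketch, assuming the reported mean and SD are plugged into a normal distribution as the post describes; the seed and sample size are arbitrary choices and do not come from the paper.

```r
# Summary statistics quoted in the post
mean_days <- 71
sd_days <- 88

# Exact share of a normal distribution with these parameters that lies below zero
p_negative <- pnorm(0, mean = mean_days, sd = sd_days)

# Simulation check (seed and sample size are arbitrary)
set.seed(2025)
sim_days <- rnorm(100000, mean = mean_days, sd = sd_days)
share_negative <- mean(sim_days < 0)

round(p_negative, 2)  # about 0.21
```

An SD larger than the mean, for a quantity that cannot go below zero, usually points to a right-skewed distribution, which is why a normal sample built from the table misrepresents the data.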
Review: Dashboards That Deliver, Nightingale Dashboards That Deliver: How to Design, Develop, and Deploy Dashboards That Work, the upcoming book by Andy Cotgreave, Amanda Makulec, Jeffrey Shaffer, and...

📊 The new book Dashboards That Deliver is a tour de force of data visualization and project management expertise.

Emilia Ruzicka reviews how the authors lay bare their expertise and process for everyone to benefit from their decades of experience.

nightingaledvs.com/review-dashb...

22.09.2025 14:12 - 👍 8    🔁 1    💬 0    📌 1
Barcode plots showing the distribution of age of grandmaster, international master, fide master, and candidate master for male and female chess players. Age seems to be less related to title for male players.

For this week's #TidyTuesday chess player rating data, I made an annotated barcode plot to show the distribution of age by title ♟️ It was hard to set a good transparency level for the lines since there's such a difference between the number of male and female players 📊

#RStats #DataViz #ggplot2

22.09.2025 10:39 - 👍 37    🔁 6    💬 5    📌 1
Chartle - A daily chart game Guess the country in red by analysing today's chart

Launch day 🚀

We’ve just released @chartlecc.bsky.social - a daily chart game!

Your job is to guess which country is represented by the red line in today's chart. You get 5 tries, no other clues!

Play today, come back tomorrow for a different chart with new data and share with your chart friends 📈

12.09.2025 13:41 - 👍 113    🔁 50    💬 13    📌 24
DataBS Conf "Data, Behind the Scenes" is a free-to-attend online-only, single track conference centered on the real stories of data work from the folks in the trenches. We’re not here for the latest AI hype, perf...

#DataBS Conf 2025 preshow! We have two talks that we couldn't fit into the schedule but the speakers pre-recorded their talk for us to share before the main event next week!

Both are really good and give me lots of excitement about what we'll see next week.

ti.to/databsconf/d... <- free tix

🧵1/3

18.09.2025 15:43 - 👍 10    🔁 8    💬 1    📌 4

Shannon's slides are always so unbelievably clear and helpful!!!

github.com/shannonpileg...

I'm having "Ohhhhh that's what that means" moments every 10 seconds here.
#positconf2025

18.09.2025 15:09 - 👍 37    🔁 15    💬 2    📌 0

This is pretty cool: UDFs went into #PowerBI yesterday; and today I'm using them in a non-trivial manner in an actual report.

@jaypowerbi.bsky.social Good job if this is you.

17.09.2025 23:59 - 👍 7    🔁 1    💬 1    📌 0
Grid of ternary plots showing the percentage of fat, carbohydrates, and protein for different recipes on allrecipes.com, split by Italian, Cuban, French, Greek, Lebanese, and Japanese cuisines. Italian shows no points with high level of fat.

I normally only see these ternary plots used to show UK election results, but decided to see if I could make them work for this week's #TidyTuesday data from Allrecipes 📊 A little bit tricky to add annotations to but overall, I quite like the result!

#RStats #ggplot2 #DataViz

15.09.2025 12:49 - 👍 63    🔁 5    💬 4    📌 2
A scatterplot on a cream white background with the title "The chicken 🐓 or the egg 🥚?". The square grid is split by four reddish arrows pointing outwards north/south/east/west. A textbox in the top left reads "Based on 2,218 recipes categorized by the cuisine (country, region, culture), this graph shows the proportional frequency of the ingredients chicken vs. egg (x-axis) & butter vs. oil (y-axis) mentioned across recipes of each cuisine". Each point and text in the plot indicates where on a scale from chicken (left) to egg (right) and butter (top) to oil (bottom) the cuisine is located. Turkish is found at the very center of the plot, Chinese and Thai in the bottom left (chicken & oil), whereas Austrian/German/Swiss and particularly Scandinavian are in the top right (egg & butter). An arrow points at the top right data point saying "Scandinavian cuisine is a clear outlier". Visualization: C. Börstell; Data: allrecipes.com via {tastyR} & TidyTuesday; Packages: {ggarrow, ggrepel, ggtext, tidyverse}

The 🍗 or the 🍳? #TidyTuesday

Looking at the ingredients of over 2000 recipes online, where are different cuisines found on the chicken vs. egg (x-axis) and butter vs. oil (y-axis) scales?

As a Scandinavian, I guess I'm part of the egg+butter outlier!

Code: github.com/borstell/tid...

#R4DS

15.09.2025 12:52 - 👍 46    🔁 16    💬 3    📌 3

This week's #TidyTuesday explores a curated collection of recipes from the Allrecipes website. I created donut chart subplots to look like plates and added a fork and knife image to each.
#pydytuesday #dataviz

16.09.2025 16:42 - 👍 18    🔁 4    💬 0    📌 0

The only fun name I've found is GitLab data team's google sheets loader called "sheetload"

17.09.2025 15:42 - 👍 1    🔁 0    💬 0    📌 0

New #rstats blog post: Deep Dive into ellmer: Part 2.

I explore the source code behind ellmer's tool calling functionality:

www.howardbaik.com/posts/deep-d...

Thanks to @hadley.nz, @grrrck.xyz, @atheriel.bsky.social, others for 🐘ellmer🐘!

16.09.2025 15:22 - 👍 13    🔁 2    💬 0    📌 0

SquidSim is the coolest package - it lets you build complex hierarchical data structures and then simulate data from the world you create. The best tool for doing proper power analyses and testing how well your models can uncover the 'truth'. I've been recommending it to everyone!

15.09.2025 17:08 - 👍 71    🔁 18    💬 0    📌 0
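
The idea in the post above (define a hierarchical world, simulate from it, then see whether an analysis recovers the known truth) can be sketched in base R. This does not use squidSim's own functions, which are not shown in the post; the group sizes, variances, and the choice of a one-way ANOVA estimator are all illustrative assumptions.

```r
# Base R sketch of simulating hierarchical data; this is NOT squidSim's API.
# All sizes and variances below are made-up illustration values.
set.seed(42)
n_groups <- 200      # e.g. individuals
n_per <- 10          # repeated observations per individual
sd_between <- 1      # true among-individual SD
sd_within <- 2       # true residual SD

# One random intercept per group, repeated for each of its observations
group <- rep(seq_len(n_groups), each = n_per)
group_effect <- rnorm(n_groups, mean = 0, sd = sd_between)
y <- group_effect[group] + rnorm(n_groups * n_per, mean = 0, sd = sd_within)
sim <- data.frame(group = group, y = y)

# Recover the variance components with one-way ANOVA (method-of-moments) estimators
group_means <- as.numeric(tapply(sim$y, sim$group, mean))
ms_between <- n_per * var(group_means)
ms_within <- sum((sim$y - ave(sim$y, sim$group))^2) / (nrow(sim) - n_groups)
var_between_hat <- (ms_between - ms_within) / n_per
var_within_hat <- ms_within

# Compare with the true variances, 1 (between) and 4 (within)
c(between = var_between_hat, within = var_within_hat)
```

Repeating the simulation many times and counting how often an effect is detected turns the same recipe into a power analysis.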

Had the pleasure of teaching a workshop on #Shiny today.
The {shiny} package can be used to make a web-app that uses R (or Python) under the hood. Ideal for interactive #dataViz

15.09.2025 17:37 - 👍 20    🔁 5    💬 2    📌 0
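
For readers who have not seen one, a minimal sketch of the kind of app the post above describes. The workshop material itself is not shown, so this is a generic illustration: the built-in `faithful` dataset, the input ID `bins`, and the output ID `hist` are arbitrary choices.

```r
library(shiny)

# UI: one slider input and one plot output
ui <- fluidPage(
  titlePanel("Old Faithful waiting times"),
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20),
  plotOutput("hist")
)

# Server: R code that re-runs whenever the slider value changes
server <- function(input, output, session) {
  output$hist <- renderPlot({
    hist(
      faithful$waiting,
      breaks = input$bins,
      main = NULL,
      xlab = "Waiting time between eruptions (min)"
    )
  })
}

# Build the app object; call runApp(app) in an interactive session to launch it
app <- shinyApp(ui = ui, server = server)
```

The UI declares what the page contains, while the server function holds the R code that runs "under the hood" each time an input changes.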
"I'm just a goat, standing in front of a contestant, asking them to choose me"

Monty Hall reminder: the only good reason to want the car is to sell it for more goats.

15.09.2025 07:34 - 👍 22    🔁 7    💬 1    📌 0
If all the world were a monorepo The R ecosystem and the case for extreme empathy in software maintenance

Really insightful post from Julie Tibshirani (spotted on LinkedIn, can't find on Bsky) reflecting on #rstats 's unique governance structure and what other languages can learn from it

jtibs.substack.com/p/if-all-the...

14.09.2025 23:29 - 👍 126    🔁 49    💬 7    📌 8
output from a GAM in the linked essay

Simon Wood, the GOAT of generalized additive models & creator of the mgcv #rstats package, has an Annual Review of Statistics essay on GAMs, available open access #statssky #mlsky

www.annualreviews.org/content/jour...

10.09.2025 02:14 - 👍 89    🔁 41    💬 0    📌 1

The move patterns are VERY frustrating too

09.09.2025 05:45 - 👍 1    🔁 0    💬 0    📌 0
A plot with a top panel with a histogram showing all penguin weights, with a bottom faceted panel with species-specific weight histograms

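# Assumes R >= 4.5, where `penguins` (with a `body_mass` column) ships in the datasets package
library(tidyverse)  # ggplot2 and the pipe-friendly tooling
library(ggtext)     # element_textbox_simple() for the boxed plot titles
library(patchwork)  # stacks the two plots with `/`
library(scales)     # label_comma() for the axis labels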

top_plot <- ggplot(penguins, aes(x = body_mass)) +
  geom_histogram(binwidth = 100, color = "white", boundary = 0) +
  scale_x_continuous(
    breaks = seq(2500, 6500, by = 1000),
    limits = c(2500, 6500),
    labels = label_comma()
  ) +
  labs(title = "All penguins", x = NULL, y = "Count") +
  theme_bw() +
  theme(
    plot.title = element_textbox_simple(
      face = "bold",
      fill = "grey75",
      size = rel(0.85),
      halign = 0,
      linetype = 1,
      linewidth = 0.2,
      padding = margin(5, 5, 5, 5)
    ),
    strip.background = element_rect(fill = "grey92"),
    strip.text = element_text(hjust = 0),
    axis.title.y = element_text(hjust = 1)
  )

bottom_plot <- penguins |>
  ggplot(aes(x = body_mass, fill = species)) +
  geom_histogram(binwidth = 100, color = "white", boundary = 0) +
  scale_x_continuous(
    breaks = seq(2500, 6500, by = 1000),
    limits = c(2500, 6500),
    labels = label_comma()
  ) +
  guides(fill = "none") +
  facet_wrap(vars(species), ncol = 1) +
  labs(x = "Body mass (g)", y = "Count", title = "Specific penguin species") +
  theme_bw() +
  theme(
    plot.title = element_textbox_simple(
      face = "bold",
      fill = "grey75",
      size = rel(0.85),
      halign = 0,
      linetype = 1,
      linewidth = 0.2,
      padding = margin(5, 5, 5, 5)
    ),
    strip.background = element_rect(fill = "grey92"),
    strip.text = element_text(hjust = 0),
    axis.title.x = element_text(hjust = 0),
    axis.title.y = element_text(hjust = 1)
  )

(top_plot / bottom_plot) +
  plot_layout(heights = c(0.25, 0.75))

The {ggh4x} package has neat support for nested facets in ggplot, but it wasn't quite working for a thing I was making, so I made a neat plot with fake nested facets using a combination of {ggtext} and {patchwork}! #rstats

(code here: gist.github.com/andrewheiss/...)

29.08.2025 14:02 - 👍 47    🔁 5    💬 0    📌 0
An abstract pattern of interlocking hexagons, with a central group of colorful and detailed hexagons featuring various logos, characters, and text.

New from Posit! The August Glimpse newsletter is here, featuring the new free IDE Positron, complete with LLM-powered tools Positron Assistant and Databot. Plus, updates to Quarto and Shiny for Python!

Check out the post here: posit.co/blog/posit-g...

#RStats #Python

28.08.2025 15:18 - 👍 22    🔁 5    💬 0    📌 0
Conversation: LLMs and Building Abstractions How should we work with LLMs when growing abstractions?

NEW POST

Unmesh Joshi and I had an interesting email conversation about how when programming with an LLM he likes to grow a language of abstractions.

martinfowler.com/articles/con...

26.08.2025 13:34 - 👍 18    🔁 4    💬 0    📌 1

#statstab #405 Best Practices for Estimating, Interpreting, and Presenting Nonlinear Interaction Effects

Thoughts: Guidance on nonlinear interactions, reporting (probabilities) and visualisations.

#probit #logit #logisticregression #nonlinear #guide

sociologicalscience.com/download/vol...

22.08.2025 19:20 - 👍 44    🔁 10    💬 4    📌 2
Models as Prediction Machines: How to Convert Confusing Coefficients into Clear Quantities

Abstract
Psychological researchers usually make sense of regression models by interpreting coefficient estimates directly. This works well enough for simple linear models, but is more challenging for more complex models with, for example, categorical variables, interactions, non-linearities, and hierarchical structures. Here, we introduce an alternative approach to making sense of statistical models. The central idea is to abstract away from the mechanics of estimation, and to treat models as “counterfactual prediction machines,” which are subsequently queried to estimate quantities and conduct tests that matter substantively. This workflow is model-agnostic; it can be applied in a consistent fashion to draw causal or descriptive inference from a wide range of models. We illustrate how to implement this workflow with the marginaleffects package, which supports over 100 different classes of models in R and Python, and present two worked examples. These examples show how the workflow can be applied across designs (e.g., observational study, randomized experiment) to answer different research questions (e.g., associations, causal effects, effect heterogeneity) while facing various challenges (e.g., controlling for confounders in a flexible manner, modelling ordinal outcomes, and interpreting non-linear models).

Figure illustrating model predictions. On the X-axis the predictor, annual gross income in Euro. On the Y-axis the outcome, predicted life satisfaction. A solid line marks the curve of predictions on which individual data points are marked as model-implied outcomes at incomes of interest. Comparing two such predictions gives us a comparison. We can also fit a tangent to the line of predictions, which illustrates the slope at any given point of the curve.

A figure illustrating various ways to include age as a predictor in a model. On the x-axis age (predictor), on the y-axis the outcome (model-implied importance of friends, including confidence intervals).

Illustrated are 
1. age as a categorical predictor, resulting in the predictions bouncing around a lot with wide confidence intervals
2. age as a linear predictor, which forces a straight line through the data points that has a very tight confidence band and
3. age splines, which lie somewhere in between as they smoothly follow the data but have more uncertainty than the straight line.

Ever stared at a table of regression coefficients & wondered what you're doing with your life?

Very excited to share this gentle introduction to another way of making sense of statistical models (w @vincentab.bsky.social)
Preprint: doi.org/10.31234/osf...
Website: j-rohrer.github.io/marginal-psy...

25.08.2025 11:49 - 👍 942    🔁 283    💬 49    📌 19
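
The "prediction machine" idea in the abstract above can be illustrated by hand. A minimal base R sketch, assuming the built-in `mtcars` data and a simple logistic model, neither of which comes from the preprint; it does not use the marginaleffects API, which wraps this kind of workflow and adds standard errors.

```r
# Manual sketch of the "prediction machine" idea in base R.
# Assumption: mtcars and this model are illustrative only, not from the preprint.
fit <- glm(am ~ hp + wt, data = mtcars, family = binomial())

# Query the model under two counterfactual versions of the same data
observed <- mtcars
heavier <- transform(mtcars, wt = wt + 1)  # every car 1,000 lbs heavier

p_observed <- predict(fit, newdata = observed, type = "response")
p_heavier <- predict(fit, newdata = heavier, type = "response")

# Average comparison: mean change in the predicted probability of a manual gearbox
avg_comparison <- mean(p_heavier - p_observed)
avg_comparison
```

The result is stated on the probability scale, which is easier to read than the log-odds coefficient for `wt` in the model summary.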
A review of spline function procedures in R - BMC Medical Research Methodology Background With progress on both the theoretical and the computational fronts the use of spline modelling has become an established tool in statistical regression analysis. An important issue in splin...

The paper about splines in R that I wish I had known about when I was learning splines in R.

bmcmedresmethodol.biomedcentral.com/articles/10....

22.08.2025 16:51 - 👍 43    🔁 8    💬 2    📌 1
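
The three ways of modelling age described in the figure alt text further up (categorical, linear, spline) can be compared on simulated data. A hedged sketch: the age range, the shape of the true effect, the noise level, and `df = 4` are all invented for illustration, and `ns()` from the splines package is one of several spline options.

```r
library(splines)

# Simulated data; the age range, effect shape, and noise level are invented
set.seed(1)
n <- 2000
age <- sample(18:80, n, replace = TRUE)
y <- 3 + 0.5 * sin(age / 10) + rnorm(n, sd = 1)
dat <- data.frame(age = age, y = y)

fit_categorical <- lm(y ~ factor(age), data = dat)  # one parameter per age
fit_linear <- lm(y ~ age, data = dat)               # straight line
fit_spline <- lm(y ~ ns(age, df = 4), data = dat)   # natural cubic spline

# Mean width of the 95% confidence interval across the observed ages
grid <- data.frame(age = sort(unique(dat$age)))
ci_width <- function(fit) {
  ci <- predict(fit, newdata = grid, interval = "confidence")
  mean(ci[, "upr"] - ci[, "lwr"])
}
widths <- c(
  categorical = ci_width(fit_categorical),
  linear = ci_width(fit_linear),
  spline = ci_width(fit_spline)
)
widths
```

In this setup the interval widths should order as the alt text describes: narrowest for the straight line, widest for the categorical model, with the spline in between.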

my lil reproducibility talk from today / I really wanted to instill in the PhD students some simple first practices and ways to step up your game from there github.com/tjmahr/2025-...

20.08.2025 23:52 - 👍 51    🔁 13    💬 3    📌 3

@alaburda is following 20 prominent accounts