The real magic is ✨regex✨
15.10.2025 18:46
@cborstell.bsky.social
Linguist at the University of Bergen 🇳🇴 #SignLanguages, #linguistics, #RStats & #dataviz
A plot showing the "Prevalence of moderate or severe food insecurity in the total population of European areas (Northern, Eastern, Western, Southern)" as line charts from 2014 to 2024. Annual percentages (lines) and confidence intervals (ribbons). Dotted lines show European average. Data: The Food and Agriculture Organization of the United Nations (FAO); Packages: {tidyverse, ggtext, patchwork, rnaturalearth, scales}; Visualization: C. Börstell. Next to each line chart is a map of Europe with the relevant countries (of the area) filled in with a color coding.
World Food Day #TidyTuesday
FAO's data on moderate/severe food insecurity across areas of Europe
Code: github.com/borstell/tid...
A scatter plot on a white background with blue dots clearly along a linear correlation line between x and y axes
In 2017, I collected a bunch of ratings for a project. The participants were a messy group and the survey tool and design were not ideal, so I always wanted to recollect better data with a better experimental design.
I just did. Seems like the old ratings were pretty solid after all.
A plot that says Euroleague Basketball teams by stadium capacity on an off-white background and black/gray text. Each data point is plotted as a basketball (emoji) and text label with the name of the team(s) playing at the arena, along a parabola simulating the trajectory of a basketball shot towards a schematic basketball hoop located in the bottom right corner. A caption reads: "There are 20 teams in Euroleague Basketball, playing at 19 unique arenas with a median capacity of 12,700 spectators. The biggest arena by capacity is Belgrade Arena which hosts the teams Crvena zvezda Meridianbet and Partizan and seats up to 18,386 people. The smallest arena is Salle Gaston Médecin which hosts Monaco and has a capacity of 5,000. Data: EuroLeague & Wikipedia via {TidyTuesday}; Packages: {tidyverse, ggrepel, ggtext, glue, scales}; Visualization: C. Börstell."
Euroleague Basketball #TidyTuesday
A mini dataset, so I decided to find a way to plot stadium capacity in an interesting way: went with points along the trajectory of a basketball shot! Swoosh!
Code: github.com/borstell/tid...
#R4DS #DataViz
I did! Just one more reply down the thread!
bsky.app/profile/cbor...
Line chart showing a simulation of the Monty Hall problem. Two lines representing the strategies stay vs switch quickly converge around the ⅓ and ⅔ probabilities of winning the car, respectively
I just wanted to simulate the Monty Hall problem, but couldn't get why the host would sometimes pick the door that the contestant had already chosen
🚪🚪🚪
Trying to debug for much longer than needed before realizing that
sample(3, 1)
is interpreted as
sample(1:3, 1)
Could not understand what the issue was, thinking it must be my own code, but finally looked at ?sample documentation... #RStats
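A minimal sketch of that pitfall, assuming the host's candidate doors are kept in a vector that can shrink to a single number. The helper names `host_options`, `safe_sample` and `play` are hypothetical, made up for illustration, and not taken from the original simulation code.

```r
# Doors the host may open: neither the contestant's pick nor the car.
# (Hypothetical helper, not from the original simulation.)
host_options <- function(pick, car) setdiff(1:3, c(pick, car))

# Pitfall: with a single number x >= 1, sample(x, 1) draws from 1:x.
set.seed(1)
opts <- host_options(pick = 1, car = 2)    # only door 3 is left
table(replicate(1000, sample(opts, 1)))    # doors 1, 2 and 3 all show up

# One way around it: sample an index, so a length-one vector stays as it is.
safe_sample <- function(x) x[sample.int(length(x), 1)]
table(replicate(1000, safe_sample(opts)))  # only door 3

# Small simulation built on the safe version.
play <- function() {
  car      <- sample.int(3, 1)
  pick     <- sample.int(3, 1)
  opened   <- safe_sample(host_options(pick, car))
  switched <- setdiff(1:3, c(pick, opened))
  c(stay = pick == car, switch = switched == car)
}
rowMeans(replicate(10000, play()))         # share of wins per strategy
```

Sampling an index with `sample.int(length(x), 1)` is just one option; the examples in `?sample` show a similar `resample()` wrapper.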
A map of Sweden in 10 different panels, each showing the log odds frequency of different 3-letter place name endings in various shades of red (darker means higher prevalence; gray means no data).
Something like this!
Tweaked the calculation to something I had originally intended and included 3-letter endings only.
-arp is very Skåne
I like the distribution of the -red/-röd/-ryd endings, of the same origin
Code: github.com/borstell/maps
I haven't! But I should, I'll see if I can clean it up a bit, I was only trying to experiment with spatial data.
The method is definitely a bit weak as it's only looking among the just over 2000 "tätorter", which are already quite skewed in distribution and the number per municipality.
A graphic in the style of a yellowish page of sheet music. Title says "Taylor's Albums (melodized)". On the right, each album is represented by a line of sheet music. On the left, the text reads: "Each album as a single line of sheet music. The key of each track as a single note. Each note's duration as the relative track duration (z scored) across all albums."
It's super cool!
I used it for a bonus part TidyTuesday two years ago, making a tune out of Taylor Swift song data, saved as an mp3. Spoiler: the tune wasn't/isn't very good!
github.com/borstell/tid...
library(lubridate)
yr <- 2025
# This doesn't work
ymd(yr)
#> Warning message:
#> All formats failed to parse. No formats found.
#> [1] NA
#-------
# TIRED
#-------
# I've done this for *yeeeeears*
ymd(paste0(yr, "-01-01"))
#> [1] "2025-01-01"
#-------
# WIRED
#-------
# The truncated argument tells {lubridate} that it can
# ignore up to 2 (in this case) formats to look for.
# Ordinarily `ymd()` looks for three formats: a year,
# a month, and a day. `truncated = 2` means it can skip
# the month and day parts
ymd(yr, truncated = 2)
#> [1] "2025-01-01"
ymd(c("2025", "2025-10", "2025-10-02"), truncated = 2)
#> [1] "2025-01-01" "2025-10-01" "2025-10-02"
I just learned about the `truncated` argument in {lubridate} functions, which means NO MORE HACKY paste0(year, "-01-01") code to build dates when converting years to dates in #rstats
02.10.2025 17:43
The meanings are more or less:
å = river
berg = mountain
kvarn = mill
mark = land
ö = island
ryd = open land
sta = place
torp = cottage
träsk = marsh
tuna = enclosed area, yard
Full disclosure: I'm 100% not a historical linguist or an onomastician
Tun(a) comes from something like 'enclosed area, yard, plot of land' and is related to "town"!
01.10.2025 17:37
A map of Sweden in 10 different panels, each showing the proportional frequency of different place name endings in various shades of blue
Distributions of endings in Swedish place names
#RStats #DataViz
Theming got a huge overhaul with the latest #ggplot2 release. In honour of that @teunbrand.bsky.social has written a comprehensive deep-dive into styling your plots, covering both old and new functionality. Grab a coffee and dive in!
#rstats
A data visualization in the style of handwritten/-drawn notes on a piece of paper. Title reads "Cranes at Lake Hornborgasjön: Observations of cranes at the Swedish lake peak earlier over time". The graph shows ridgelines along dates from mid-March to end of April (x-axis) across the years 1994 to 2024 (y-axis). The first date each year where that year's maximum number of observations was reached is marked with a little handdrawn X on the year ridgelines. Across all years, there are lines showing the mean and median values for first max observations across years down the middle. X'es are generally farther to the right in early years (i.e. later in the spring) and farther to the left in more recent years (earlier in the spring). In the plot margins, there are embedded images of cranes in flight. Data: "Transtatistik", Naturum Hornborgasjön via TidyTuesday; Image: Gllawm, Wikimedia Commons curid=147233798; Packages: {tidyverse, magick}; Visualization: C. Börstell
I partially contributed to this week's #TidyTuesday dataset of crane observations at Lake Hornborgasjön 🇸🇪
I made a minimalist plot of observation maxima (which occur earlier over time) in the style of written notes.
{magick} magic for images!
Code: github.com/borstell/tid...
#R4DS #DataViz #ggplot2
Logo for the #TidyTuesday Project. The words TidyTuesday, A weekly data project from the Data Science Learning Community (dslc.io) overlaying a black paint splash.
TidyTuesday is a weekly social data project. All are welcome to participate! Please remember to share the code used to generate your results! TidyTuesday is organized by the Data Science Learning Community. Join our Slack for free online help with R and other data-related topics, or to participate in a data-related book club! How to Participate Data is posted to social media every Monday morning. Follow the instructions in the new post for how to download the data. Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. Create a visualization, a model, a shiny app, or some other piece of data-science-related output, using R or another programming language. Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
Large flock of cranes gathered in a grassy field during golden hour, with some birds in flight overhead and autumn-colored vegetation in the background.
Line chart showing the number of cranes per day at Lake Hornborga in Sweden during March and April from 2014-2021. Multiple gray lines represent different years, with the 2018 and 2021 seasons highlighted in blue. The chart shows crane migration patterns starting near zero in early March, building to peak numbers of 15,000-27,000 birds between March 30 and April 8 (highlighted in beige), then declining through late April. The highest recorded count was 27,300 cranes on April 3rd, 2019. Dashed lines indicate days when weather conditions made accurate counting difficult. A note explains that 2018 had unusually cold February temperatures causing later arrival. The chart is credited to Anna Thieme from Länsstyrelsen VG at https://transtat.lansstyrelsen.se/
@dslc.io welcomes you to week 39 of #TidyTuesday! We're exploring Crane Observations at Lake Hornborgasjön, Sweden (1994–2024)!
https://tidytues.day/2025/2025-09-30
#RStats #PyData #JuliaLang #DataViz #tidyverse #r4ds
Since today is the #EuropeanDayOfLanguages, let's not forget that you are not required to keep your languages separate, pure, intact or in any way feel inadequate about your way of using your own language/s.
And don't let anyone take away your co-ownership of the languages you have.
Besides questionable variable names (test, testtest, test2, ...) and working in Untitled for hours, it's the 20+ lines of piping, including pipes inside join functions, all piped into ggplot with additional anonymous functions of subsetting and piping inside the geoms 🫣
24.09.2025 21:17
Them in any other channel:
"We will organize an amazing thing, we will literally not tell you the date, location, contents or any other details here, please see our fb [link]" *logs off for the next 6 months*
And it's particularly sign languages in the Southern Hemisphere that have been added to the Glottolog database in the past 8 years.
23.09.2025 15:10
A line chart with the title "Cumulative increase of documented languages by "family"/grouping across releases of Glottolog: only showing families with 10 or more languages". It shows the cumulative increase over time with releases of the Glottolog language database. "Sign languages" is highlighted in a turquoise line, steadily increasing, having increased by around 26% from version 3.1 to version 5.2.
Because sign language linguistics is a young field, many sign languages were undocumented for a long time, and still are.
In fact, in Glottolog, the "sign language" category ranks near the top in terms of the increase in the number of documented languages across version releases.
A map showing the distribution of the 227 sign languages documented in Glottolog 5.2 (only 224 languages shown as 3 are missing coordinates). The languages are shown as turquoise dots, distributed across all continents.
In the Glottolog language database, there are currently 227 different sign languages documented, and they are found all around the globe!
23.09.2025 15:10
It's the #InternationalDayOfSignLanguages!
The GIF below shows a commonly used international sign for 'sign language'. But is there only a single, universal sign language? Of course not, there are many!
#Linguistics
A plot titled "FIDE chess players by country & birth year: Ranking of the top International Chess Federation (FIDE) countries by the number of rated players (Elo rating ⩾1400) per age group (year of birth). Number of players shown under each flag (percentage of age group in brackets). Numbers under the flags show the number of players (with percentage of totals in brackets)". The plot resembles a chessboard, with a grayish purple background and the rankings being displayed as country flags on top of the chessboard's squares. In the oldest age brackets (left side), European countries are dominating with Germany, Spain and France having the most players. On the right side with the younger age groups, India is quickly rising to the top; in the youngest age group (2010–2021), Sri Lanka is also up-and-coming. Data: FIDE (September 2025) via TidyTuesday; Packages: {ggtext, tidyverse}; Visualization: C. Börstell
Which countries have the most rated chess players per age group? ♟️
India rising to the top in the youngest age groups. #TidyTuesday
github.com/borstell/tid...
#R4DS #DataViz #ggplot2
library(tidyverse)
ggplot(mutate(uncount(tibble(x=LETTERS[1:7],y=rep(1:4,e=2)[-1],z=c(rep("#DDC",4),"#678","#643","#BCD")),y),g=row_number()))+geom_bar(aes(x,fill=I(z),group=g),col="#444")+theme_void()
👨🏻‍💻⛳️ #RStats
Application for a PhD position at our department is now open! We are looking for applicants that are interested in areas of sign language linguistics, general sign language studies, or deaf bilingualism/multilingualism. Deadline is 15th Oct. Please share away! su.varbi.com/en/what:job/...
19.09.2025 05:05
I agree! But I see people teach with the raw notebooks themselves, which to me makes things muddled, and I'm not sure I see the benefits. But for posting it like a report or tutorial, the rendered output with clear separation of comments, code and outputs is much nicer!
17.09.2025 11:00
I've seen a few posts about notebooks vs plain scripts for teaching programming. I've always preferred using plain scripts for coding-in-class/exercises, and only used e.g. #Quarto when it's a tutorial/guide to be read rather than run.
Am I missing some benefit in notebook-type formats? #RStats
This is at Stockholm University, my alma mater & previous workplace. The dept is quite unique in having a fully deaf-led and deaf-majority sign language group, with deaf people in positions from department management to professors, researchers, teachers and assistants, all in a signing environment!
17.09.2025 10:02