Måns Thulin (@mansthulin) — Bluesky Profile

5 days ago

Is your working directory and/or your files on OneDrive? Strangely, this can cause these kinds of problems (the solution being storing things on a local drive instead).

2 weeks ago

No, but I'll check it out. Thanks!

2 weeks ago

train |> ... |> fit(train) gives my soul a papercut

2 weeks ago

Thanks, Di. I too am hoping that these issues will be fixed. Until then I'm sticking to caret in my teaching, as it also does a good job of coordinating machine learning software. I'm reluctant to tell students to use the tidymodels ecosystem because of the issues mentioned in my post.

2 weeks ago

Yeah, I think that's an important difference between tidymodels and ggplot2!

2 weeks ago

Kind of awkward to have to add data at two different steps. But definitely an improvement on the flow recommended in the documentation!

2 weeks ago

Absolutely. But in all the examples you mention, you'd start with data. And starting a pipeline with the data is the de facto standard in R. Creating a separate logic for how to pipe things is not very helpful to beginners.

2 weeks ago

Happy to submit these as issues within the next few days!

2 weeks ago

Great description! And so it boils down to whether you're willing to accept two different logics for how the pipe works. I maintain that the dual logics create more problems than they solve, but I get that some people like the tidymodels approach.

2 weeks ago

Some, yes. All ignored unfortunately. I agree that most of these issues could be fixed (which of course is the reason that I wrote this in the first place!)

2 weeks ago

Glad you liked it!

2 weeks ago

Why I don’t use {tidymodels} – Måns Thulin

While we're discussing what we like and dislike in #Rstats, here's why I don't like tidymodels: mansthulin.se/posts/tidymo...

2 weeks ago

Diff of a change to the 'Getting Started in R - Tinyverse Edition' manuscript with a nice data.table improvement: inside a pair of ( ) we can place each chained operation on its own line, improving readability.

I.e. we have

cw21 <- ( # use '(' to keep ops on separate lines
cw[Time %in% c(0,21)] # i: select rows
[, weight := Weight] # j: mutate
[, Group := factor(Group)]
[, .(Chick,Group,Time,weight)] # j: arrange
[order(Chick,Time)] # i: order
[1:5] # i: subset
)

Josh Goldstein emailed me a nice tip for @rdatatable.bsky.social chaining: if we start a chained `data.table` operation inside a set of parens, we are no longer subject to the 'REPL constraint' and can keep each operation on a line. See ALT text. #rstats

Now in the pdf at github.com/eddelbuettel...
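
A minimal sketch of why the parentheses help, using a plain base R vector as a hypothetical stand-in for the data.table (the names `v`, `res`, and `no_parens` are illustrative only). Inside `( )` the parser keeps reading across newlines, so a line may start with `[`; outside them, the same text fails to parse.

```r
# Hypothetical base R vector standing in for a data.table
v <- c(5, 3, 8, 1)

# Inside ( ), newlines do not end the expression,
# so each chained `[` can sit on its own line.
res <- (
  v
  [v > 2]   # keep values above 2
  [-1]      # drop the first remaining value
  [2]       # take the second remaining value
)

# Without the parentheses, a line starting with `[` is a syntax error
no_parens <- tryCatch(
  parse(text = "v\n[v > 2]"),
  error = function(e) "parse error"
)
```

The same rule is what lets the data.table chain above be split one operation per line.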

2 weeks ago

Well, you shouldn't use Python or MATLAB for statistics. Simple as. 😀

2 weeks ago

2 The basics | Modern Statistics with R

I love the hidden-gem magrittr pipes, but these days I stick with the base pipe. In this case, you can do:

mtcars |> with(cor(disp, mpg))

2 weeks ago

R Medicine 2026 - main graphic - Call for Proposal, deadline March 6

R/Medicine CFP is open 🩺🧪

Deadline: March 6 - still time!

Submit: Talks, Lightning Talks, Demos, Workshops - Using R + Shiny for health, lab, clinical data

First-time speaker? Email for feedback: rmedicine.conference@gmail.com

rconsortium.github.io/RMedicine_we...

#rstats #datascience

2 weeks ago

I love RStudio, but I'm flabbergasted by the fact that @posit.co still haven't made |> the default for the Ctrl+Shift+M keyboard shortcut, despite their using it in e.g. the tidyverse documentation and R4DS. #Rstats

3 weeks ago

Thanks, will do!

3 weeks ago

Bootstrap p-values and confidence intervals for regression models

Looks really nice! Is there an option to print confidence intervals instead of standard errors (the former being more informative)? If you'd be interested in adding bootstrap p-values/CIs as an option, I'd be happy to assist in integrating it with {boot.pval} (mthulin.github.io/boot.pval/ar...)

1 month ago

Quick start – Model to Meaning

I've been playing around with {marginaleffects} in some projects lately, and I really like it. Lots of useful stuff in there! If you work with regression models and haven't checked it out already, I strongly recommend that you do so: marginaleffects.com/bonus/get_st...
#Rstats #Databs

1 month ago

How about penguins |> subset(select = c("island", "bill_len")) |> subset(island == "Biscoe" & bill_len > 55)
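
A small sketch of the same idea on a hypothetical stand-in data frame (`peng` and its values are made up for illustration; the real `penguins` data with a `bill_len` column ships with recent R versions only). It also shows that, as far as I can tell, a single `subset()` call can filter rows and select columns in one step. The native pipe needs R 4.1 or later.

```r
# Hypothetical stand-in for the penguins data
peng <- data.frame(
  island    = c("Biscoe", "Biscoe", "Dream", "Torgersen"),
  bill_len  = c(59.6, 45.2, 58.0, 39.1),
  body_mass = c(6050, 4400, 3700, 3750)
)

# Two chained subset() calls, as in the post
two_steps <- peng |>
  subset(select = c("island", "bill_len")) |>
  subset(island == "Biscoe" & bill_len > 55)

# One call: row condition first, then the columns to keep
one_step <- peng |>
  subset(island == "Biscoe" & bill_len > 55, select = c(island, bill_len))

# Compare the two results
all.equal(two_steps, one_step)
```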

1 month ago

Modern Statistics with R

I cover both base and the tidyverse in Modern Statistics with R (except for plotting, where I focus on ggplot2 and only briefly mention base): www.modernstatisticswithr.com

1 month ago

Gave the first lecture in my introductory statistics for biologists course yesterday, so this should come in handy. 😀 Thanks for sharing!

3 months ago

Will you incorporate LLMs and AI prompting into the course in the future?
No.

Why won’t you incorporate LLMs and AI prompting into the course?
These tools are useful for coding (see this for my personal take on this).

However, they’re only useful if you know what you’re doing first. If you skip the learning-the-process-of-writing-code step and just copy/paste output from ChatGPT, you will not learn. You cannot learn. You cannot improve. You will not understand the code.

In that post, it warns that you cannot use it as a beginner:

…to use Databot effectively and safely, you still need the skills of a data scientist: background and domain knowledge, data analysis expertise, and coding ability.

There is no LLM-based shortcut to those skills. You cannot LLM your way into domain knowledge, data analysis expertise, or coding ability.

The only way to gain domain knowledge, data analysis expertise, and coding ability is to struggle. To get errors. To google those errors. To look over the documentation. To copy/paste your own code and adapt it for different purposes. To explore messy datasets. To struggle to clean those datasets. To spend an hour looking for a missing comma.

This isn’t a form of programming hazing, like “I had to walk to school uphill both ways in the snow and now you must too.” It’s the actual process of learning and growing and developing and improving. You’ve gotta struggle.

This Tumblr post puts it well (it’s about art specifically, but it applies to coding and data analysis too):

Contrary to popular belief the biggest beginner’s roadblock to art isn’t even technical skill it’s frustration tolerance, especially in the age of social media. It hurts and the frustration is endless but you must build the frustration tolerance equivalent to a roach’s capacity to survive a nuclear explosion. That’s how you build on the technical skill. Throw that “won’t even start because I’m afraid it won’t be perfect” shit out the window. Just do it. Just start. Good luck. (The original post has disappeared, but here’s a reblog.)

It’s hard, but struggling is the only way to learn anything.

You might not enjoy code as much as Williams does (or I do), but there’s still value in maintaining coding skills as you improve and learn more. You don’t want your skills to atrophy.

As I discuss here, when I do use LLMs for coding-related tasks, I purposely throw as much friction into the process as possible:

To avoid falling into over-reliance on LLM-assisted code help, I add as much friction into my workflow as possible. I only use GitHub Copilot and Claude in the browser, not through the chat sidebar in Positron or Visual Studio Code. I treat the code it generates like random answers from StackOverflow or blog posts and generally rewrite it completely. I disable the inline LLM-based auto complete in text editors. For routine tasks like generating {roxygen2} documentation scaffolding for functions, I use the {chores} package, which requires a bunch of pointing and clicking to use.

Even though I use Positron, I purposely do not use either Positron Assistant or Databot. I have them disabled.

So in the end, for pedagogical reasons, I don’t foresee me incorporating LLMs into this class. I’m pedagogically opposed to it. I’m facing all sorts of external pressure to do it, but I’m resisting.

You’ve got to learn first.

Some closing thoughts for my students this semester on LLMs and learning #rstats datavizf25.classes.andrewheiss.com/news/2025-12...

3 months ago

This is so useful. I usually add custom information to the box shown when hovering, using the text aesthetic. An example can be found here: www.modernstatisticswithr.com/eda.html#det... #Rstats #statsky #databs

3 months ago

Data Visualisation Gallery: Gallery of data visualisations created by Nicola Rennie.

One of the things that has been on my to do list for a very long time, is building a gallery of all of the charts I've made across #TidyTuesday, #30DayChartChallenge, #30DayMapChallenge, and other miscellaneous projects 📊

And it's finally here!

Link: nrennie.rbind.io/viz-gallery/

#DataViz #RStats

3 months ago

"The difference between the groups is 1.1-2.6 measured using something that is a bit like the median but not quite the median" 😉

3 months ago

They test different hypotheses though, so the Wilcoxon test isn't a like-for-like replacement for the t-test. A bootstrap t-test is my go-to method for tests about means (using {boot.pval}). It has the added benefit of providing confidence intervals, unlike the Wilcoxon test.
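
A rough base R sketch of the idea, assuming simulated data and a hand-rolled studentized bootstrap. This is not the {boot.pval} implementation, and all names here (`x`, `y`, `t_stat`, `B`) are hypothetical. It tests equality of two means and also gives a percentile interval for the difference in means.

```r
set.seed(1)

# Simulated (made-up) samples from two groups
x <- rnorm(30, mean = 0)
y <- rnorm(30, mean = 1)

# Welch-type t statistic for the difference in means
t_stat <- function(a, b) {
  (mean(a) - mean(b)) / sqrt(var(a) / length(a) + var(b) / length(b))
}
t_obs <- t_stat(x, y)

# Centre each group so that the null hypothesis (equal means) holds
xc <- x - mean(x)
yc <- y - mean(y)

# Bootstrap distribution of the t statistic under the null
B <- 2000
t_boot <- replicate(B, t_stat(sample(xc, replace = TRUE),
                              sample(yc, replace = TRUE)))

# Two-sided bootstrap p-value
p_value <- (1 + sum(abs(t_boot) >= abs(t_obs))) / (B + 1)

# Percentile bootstrap interval for the difference in means
diff_boot <- replicate(B, mean(sample(x, replace = TRUE)) -
                          mean(sample(y, replace = TRUE)))
ci <- quantile(diff_boot, c(0.025, 0.975))
```

Both the p-value and the interval concern the means themselves, which is the point of the comparison with the Wilcoxon test.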

3 months ago

Students often ask: “Is this model good enough?”
My reply: “For what?” AUC, precision, F1—none of them matter unless you know what decision you're informing. Always tie metrics to action.

#DataScience #MachineLearning #AI #RStats

3 months ago

It's great to use with pipes! Then everything goes from left to right.
