Tom Houslay

@tomhouslay.bsky.social

Former academic researcher in evolution & animal behaviour. Now: data & biology stuff for industry [human milk, infant nutrition, microbiome, healthy ageing...] Also: dad, wildlife pond enthusiast, ND

132 Followers  |  330 Following  |  9 Posts  |  Joined: 06.11.2023

Posts by Tom Houslay (@tomhouslay.bsky.social)

I’m not arguing with no account defending AI. My grandmother lives in an area being environmentally compromised currently by an ai data center. I remember the color of the lake before they built it and the sludge I see in it when I go to see her. Shut the fuck up about “the benefits of AI” forever.

07.02.2026 15:46 | 👍 7220    🔁 2731    💬 47    📌 35
Screenshot of the Toggle extension documentation website showing the Quick Start page. The left sidebar displays the Toggle logo and navigation menu with sections for Get Started, Features, Demos, and Reference. The main content shows an interactive toggle button demo, explains the ▼ (visible) and ▶ (hidden) chevron indicators, and includes R code examples with a visible "Output" toggle button on the code block. The page demonstrates how the toggle button appears when hovering over code cells.

Toggle extension logo: A blue hexagonal badge featuring a stylized code block with a toggle button in the upper right corner, and a line chart output below it. The word "toggle" appears at the bottom in white lowercase text.

{toggle} does one thing: adds a button to hide code output in #quarto docs.

Took two versions to do that one thing well. Now it works everywhere... tabsets, callouts, nested containers, you name it.

📚 quarto.thecoatlessprofessor.com/toggle/
🐙 github.com/coatless-qua...

29.12.2025 06:48 | 👍 37    🔁 10    💬 0    📌 1
Will you incorporate LLMs and AI prompting into the course in the future?
No.

Why won’t you incorporate LLMs and AI prompting into the course?
These tools are useful for coding (see this for my personal take on this).

However, they’re only useful if you know what you’re doing first. If you skip the learning-the-process-of-writing-code step and just copy/paste output from ChatGPT, you will not learn. You cannot learn. You cannot improve. You will not understand the code.

In that post, it warns that you cannot use it as a beginner:

…to use Databot effectively and safely, you still need the skills of a data scientist: background and domain knowledge, data analysis expertise, and coding ability.

There is no LLM-based shortcut to those skills. You cannot LLM your way into domain knowledge, data analysis expertise, or coding ability.

The only way to gain domain knowledge, data analysis expertise, and coding ability is to struggle. To get errors. To google those errors. To look over the documentation. To copy/paste your own code and adapt it for different purposes. To explore messy datasets. To struggle to clean those datasets. To spend an hour looking for a missing comma.

This isn’t a form of programming hazing, like “I had to walk to school uphill both ways in the snow and now you must too.” It’s the actual process of learning and growing and developing and improving. You’ve gotta struggle.

This Tumblr post puts it well (it’s about art specifically, but it applies to coding and data analysis too):

Contrary to popular belief the biggest beginner’s roadblock to art isn’t even technical skill it’s frustration tolerance, especially in the age of social media. It hurts and the frustration is endless but you must build the frustration tolerance equivalent to a roach’s capacity to survive a nuclear explosion. That’s how you build on the technical skill. Throw that “won’t even start because I’m afraid it won’t be perfect” shit out the window. Just do it. Just start. Good luck. (The original post has disappeared, but here’s a reblog.)

It’s hard, but struggling is the only way to learn anything.

You might not enjoy code as much as Williams does (or I do), but there’s still value in maintaining coding skills as you improve and learn more. You don’t want your skills to atrophy.

As I discuss here, when I do use LLMs for coding-related tasks, I purposely throw as much friction into the process as possible:

To avoid falling into over-reliance on LLM-assisted code help, I add as much friction into my workflow as possible. I only use GitHub Copilot and Claude in the browser, not through the chat sidebar in Positron or Visual Studio Code. I treat the code it generates like random answers from StackOverflow or blog posts and generally rewrite it completely. I disable the inline LLM-based auto complete in text editors. For routine tasks like generating {roxygen2} documentation scaffolding for functions, I use the {chores} package, which requires a bunch of pointing and clicking to use.

Even though I use Positron, I purposely do not use either Positron Assistant or Databot. I have them disabled.

So in the end, for pedagogical reasons, I don’t foresee me incorporating LLMs into this class. I’m pedagogically opposed to it. I’m facing all sorts of external pressure to do it, but I’m resisting.

You’ve got to learn first.

Some closing thoughts for my students this semester on LLMs and learning #rstats datavizf25.classes.andrewheiss.com/news/2025-12...

09.12.2025 20:17 | 👍 331    🔁 99    💬 14    📌 31
If your Shiny app works with a mouse and looks fine on your screen, it may still be unusable for some users. Issues like missing alt text, ARIA misuse, or loose WCAG checks often surface too late. This Thursday’s free webinar shows how to catch them earlier.

🕐 13:00 UK time
Register:

09.12.2025 14:00 | 👍 2    🔁 2    💬 0    📌 0
Screenshot of title page including the following abstract:

To minimise confounding bias and disentangle warranted from unwarranted disparities, researchers examining sentencing discrimination have traditionally sought to control for as many legal factors as possible. However, over the past decade, a growing number of scholars have questioned this strategy, noting that many legal factors are themselves subject to judicial discretion and that controlling for them can introduce post-treatment bias. Here, we use directed acyclic graphs (DAGs) to provide a formal and comprehensive assessment of the different types of bias that may arise from different choices of controls. In addition, we propose a new modelling framework to facilitate the selection of controls and reflect the model uncertainty created by the trade-off inherent in judicially-defined legal factors and other factors with a similar dual causal role. We apply this framework to examine race disparities in US federal courts and gender disparities in the England and Wales magistrates’ court. We find substantial model uncertainty for gender disparities and for race disparities affecting Hispanic offenders, rendering estimates of the latter inconclusive. Disparities against black offenders are more consistent and, under specific conditions, could be interpreted as evidence of direct discrimination.

Thrilled to share my latest paper entitled, "Estimating Discrimination in Sentencing: Distinguishing between Good and Bad Controls"

Led by @jpinasanchez.bsky.social, the paper introduces a framework for examining discrimination in criminal justice processes.

🧵 1/10

publicera.kb.se/ejels/articl...

08.12.2025 10:19 | 👍 75    🔁 34    💬 2    📌 1
It’s the world’s rarest ape. Now a billion-dollar dig for gold threatens its future
Tapanuli orangutans survive only in Indonesia’s Sumatran rainforest where a mine expansion will cut through their home. Yet the mining company says the alternative will be worse

www.theguardian.com/environment/...

09.12.2025 09:22 | 👍 0    🔁 0    💬 0    📌 0
Here are some mammals that like to pop out from under my office at night.

04.12.2025 09:12 | 👍 1    🔁 0    💬 0    📌 0
An Introduction to Writing Your Own ggplot2 Geoms – R Works
The ggextenders club provides inspiration and resources for those venturing into the exciting world of creating custom ggplot2 extensions.

I wrote a lil post on the amazing work that @ginareynolds.bsky.social does championing ggplot2 extension developers and teaching others to build their own!

The post features the Scrollytelling Quarto extension and the group's cute #RStats hex 🐱:

rworks.dev/posts/ggplot...

03.11.2025 15:22 | 👍 69    🔁 16    💬 1    📌 3

Wrote up a little intervention post/explanation for my class about why using LLMs for trying to learn programming (as first time learners!) is bad and detrimental datavizf25.classes.andrewheiss.com/news/2025-11...

02.11.2025 22:17 | 👍 179    🔁 52    💬 4    📌 7

i looked at the methodology for this and it is
a. sex addiction counseling group in texas did a surveymonkey and extrapolated the results to the entire us population which is the sort of research design that earns you an ff on an intro methods class (the extra f is for extra effort), and
b. p-hacked

03.10.2025 02:15 | 👍 9427    🔁 2478    💬 128    📌 139
ALT: a man is covering his mouth with his hands and the word schitts creek is on the bottom right

If colleagues or students share a dataset with you and you make this face, consider sharing these resources with them. ☺️

datamgmtinedresearch.com

1. Organizing data (Ch. 3)
2. Naming variables and files (Ch. 9)
3. Documenting data (Ch. 8)
4. Cleaning data (Ch. 14)

01.10.2025 13:20 | 👍 57    🔁 18    💬 3    📌 0

very much so! thanks for sharing (and lucky for me that I saw it on my monthly 'oh yeah that thing' login here)

02.10.2025 13:51 | 👍 0    🔁 0    💬 0    📌 0
a table about lemurs

a table about students and schools

a table about wines

{tinytable} 0.14.0 for #RStats makes it super easy to draw tables in html, tex, docx, typ, md & png.

There are only a few functions to learn, but don't be fooled! Small 📦s can still be powerful.

Check out the new gallery page for fun case studies.

vincentarelbundock.github.io/tinytable/vi...

29.09.2025 12:44 | 👍 136    🔁 38    💬 1    📌 4

(I also deleted my twitter profile a while ago, which was nice to do but I've realised belatedly that this has broken a bunch of my own links!)

01.10.2025 16:05 | 👍 0    🔁 0    💬 0    📌 0

Useful resource! Good to see ordered beta regression in the mix as I've found that useful recently, although the note on it is linked to a Ben Bolker twitter comment that is now gone as his profile is deleted...

01.10.2025 16:05 | 👍 1    🔁 0    💬 2    📌 0
Best practices and tools in R and Python for statistical processing and visualization of lipidomics and metabolomics data - Nature Communications
Mass spectrometry-based lipidomics and metabolomics generate large, complex datasets requiring effective analysis. Here, authors review key statistical and visualization methods alongside widely used R and Python tools, and provide a GitBook with step-by-step code for accessible, reproducible data analysis.

Thank you for citing #tidyplots 🙏

Jakub Idkowiak et al. Best practices and tools in R and Python for statistical processing and visualization of lipidomics and metabolomics data. Nature Communications (2025).

doi.org/10.1038/s414...

#rstats #dataviz #phd

01.10.2025 15:05 | 👍 10    🔁 5    💬 0    📌 0
Models as Prediction Machines: How to Convert Confusing Coefficients into Clear Quantities

Abstract
Psychological researchers usually make sense of regression models by interpreting coefficient estimates directly. This works well enough for simple linear models, but is more challenging for more complex models with, for example, categorical variables, interactions, non-linearities, and hierarchical structures. Here, we introduce an alternative approach to making sense of statistical models. The central idea is to abstract away from the mechanics of estimation, and to treat models as “counterfactual prediction machines,” which are subsequently queried to estimate quantities and conduct tests that matter substantively. This workflow is model-agnostic; it can be applied in a consistent fashion to draw causal or descriptive inference from a wide range of models. We illustrate how to implement this workflow with the marginaleffects package, which supports over 100 different classes of models in R and Python, and present two worked examples. These examples show how the workflow can be applied across designs (e.g., observational study, randomized experiment) to answer different research questions (e.g., associations, causal effects, effect heterogeneity) while facing various challenges (e.g., controlling for confounders in a flexible manner, modelling ordinal outcomes, and interpreting non-linear models).

Figure illustrating model predictions. On the X-axis the predictor, annual gross income in Euro. On the Y-axis the outcome, predicted life satisfaction. A solid line marks the curve of predictions on which individual data points are marked as model-implied outcomes at incomes of interest. Comparing two such predictions gives us a comparison. We can also fit a tangent to the line of predictions, which illustrates the slope at any given point of the curve.

A figure illustrating various ways to include age as a predictor in a model. On the x-axis age (predictor), on the y-axis the outcome (model-implied importance of friends, including confidence intervals).

Illustrated are:
1. age as a categorical predictor, resulting in the predictions bouncing around a lot with wide confidence intervals
2. age as a linear predictor, which forces a straight line through the data points that has a very tight confidence band and
3. age splines, which lie somewhere in between as they smoothly follow the data but have more uncertainty than the straight line.

Ever stared at a table of regression coefficients & wondered what you're doing with your life?

Very excited to share this gentle introduction to another way of making sense of statistical models (w @vincentab.bsky.social)
Preprint: doi.org/10.31234/osf...
Website: j-rohrer.github.io/marginal-psy...

25.08.2025 11:49 | 👍 1007    🔁 288    💬 47    📌 22

you're extremely welcome, and congrats on a great paper! Great to see that data being used for more cool work :)

01.09.2025 13:25 | 👍 1    🔁 0    💬 0    📌 0
Covariance reaction norms: A flexible method for estimating complex environmental effects on trait (co)variances
Estimating quantitative genetic and phenotypic (co)variances is crucial for investigating evolutionary ecological phenomena such as developmental integration, life history trade-offs and niche spe...

My first solo author paper is now available in early view at Methods in Ecology and Evolution! I develop a covariance reaction norm (CRN) model for estimating continuous, multivariate, and nonlinear environmental effects on G and P matrices.

besjournals.onlinelibrary.wiley.com/doi/10.1111/...

11.08.2025 13:24 | 👍 33    🔁 15    💬 1    📌 0

I got some good answers from my question about how to get back up to date with the world of #rstats

15.07.2025 09:58 | 👍 4    🔁 0    💬 0    📌 0

Thank you so much! Lots of names here I recognise from twitter as well (I really should have gone through my follows list there before deleting my account)

15.07.2025 09:57 | 👍 2    🔁 0    💬 1    📌 0

Stupid question alert but here we go anyway: I long-ago deleted my twitter account, I cannot cope with linkedin, but I need to get back into knowing what's going on in #rstats world. Anyone recommend specific feeds / people / other stuff to drag myself back into it? (or do I just trawl rstats)

14.07.2025 13:54 | 👍 5    🔁 1    💬 4    📌 0