Type S and M errors as a “rhetorical tool”
We recently posted a preprint criticizing the idea of Type S and M errors (https://osf.io/2phzb_v1). From our abstract: “While these conce...
New blog post on Gelman's recent claim that Type S and M errors are intended as a 'rhetorical tool', and on whether I was wrong to believe, in our recent preprint criticizing the idea of Type S and M errors, that they were recommended more routinely. daniellakens.blogspot.com/2025/09/type...
28.09.2025 05:22
We present our new preprint titled "Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation".
We quantify LLM hacking risk through systematic replication of 37 diverse computational social science annotation tasks.
For these tasks, we use a combined set of 2,361 realistic hypotheses that researchers might test using these annotations.
Then, we collect 13 million LLM annotations across plausible LLM configurations.
These annotations feed into 1.4 million regressions testing the hypotheses.
For a hypothesis with no true effect (ground truth $p > 0.05$), different LLM configurations yield conflicting conclusions.
Checkmarks indicate correct statistical conclusions matching ground truth; crosses indicate LLM hacking -- incorrect conclusions due to annotation errors.
Across all experiments, LLM hacking occurs in 31-50% of cases even with highly capable models.
Since minor configuration changes can flip scientific conclusions, from correct to incorrect, LLM hacking can be exploited to present anything as statistically significant.
🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.
Paper: arxiv.org/pdf/2509.08825
12.09.2025 10:33
What corresponds to the Z-test in this analogy? If the P-curve is the W-test then what is the Z-test?
25.09.2025 13:30
More examples of faked institutional email addresses from @deevybee.bsky.social here deevybee.blogspot.com/2022/10/what...
23.09.2025 14:24
Absolutely! I'm planning on getting MET into the stats curriculum in our undergrad business adm program. My favorite resource is Lakens' online book.
16.09.2025 21:01
If it makes sense to test a hypothesis, do minimum effect testing and/or set alpha as a function of sample size.
16.09.2025 16:51
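The two suggestions in the post above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the poster's procedure: the test uses a large-sample normal approximation for a one-sample mean, and `scaled_alpha` uses one simple heuristic (shrinking alpha with the square root of the sample size relative to a reference size of 100). The function names, the reference size, and the default alpha are all hypothetical choices.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def minimum_effect_test(sample, delta):
    """Large-sample z approximation of a minimum-effect test.

    H0: |mu| <= delta, H1: |mu| > delta.
    Returns the smaller of the two one-sided p-values
    (mean above +delta, or mean below -delta).
    Assumption: n is large enough for the normal approximation.
    """
    n = len(sample)
    se = stdev(sample) / sqrt(n)
    m = mean(sample)
    z = NormalDist()
    p_above = 1 - z.cdf((m - delta) / se)  # evidence that mu > +delta
    p_below = z.cdf((m + delta) / se)      # evidence that mu < -delta
    return min(p_above, p_below)


def scaled_alpha(n, alpha_ref=0.05, n_ref=100):
    """Hypothetical heuristic: lower alpha as the sample size grows."""
    return alpha_ref / sqrt(n / n_ref)


if __name__ == "__main__":
    data = [0.8, 1.2] * 50                      # toy sample with mean 1.0
    p = minimum_effect_test(data, delta=0.2)    # smallest effect of interest: 0.2
    print(p < scaled_alpha(len(data)))
```

Here the null hypothesis is a range of effects too small to matter rather than a point value of zero, so a large sample alone is not enough to reject it.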
Can researchers stop AI making up citations?
OpenAI’s GPT-5 hallucinates less than previous models do, but cutting hallucination completely might prove impossible.
"OpenAI is making ‘small steps that are good, but I don’t think we’re anywhere near where we need to be’, says Mark Steyvers, a cognitive science and AI researcher at UC Irvine. ‘It’s not frequent enough that GPT says “I don’t know”.’" www.nature.com/articles/d41...
09.09.2025 03:23
⚡️ Deadline approaching: only one month left to send in your papers and presentation proposals for #CDSM2025!
🚨 Call for Papers: Causal Data Science Meeting 2025 🚨
Nov 12–13, 2025 (Virtual)
Submission Deadline: Sept 30, 2025
30.08.2025 12:40
Models as Prediction Machines: How to Convert Confusing Coefficients into Clear Quantities
Abstract
Psychological researchers usually make sense of regression models by interpreting coefficient estimates directly. This works well enough for simple linear models, but is more challenging for more complex models with, for example, categorical variables, interactions, non-linearities, and hierarchical structures. Here, we introduce an alternative approach to making sense of statistical models. The central idea is to abstract away from the mechanics of estimation, and to treat models as “counterfactual prediction machines,” which are subsequently queried to estimate quantities and conduct tests that matter substantively. This workflow is model-agnostic; it can be applied in a consistent fashion to draw causal or descriptive inference from a wide range of models. We illustrate how to implement this workflow with the marginaleffects package, which supports over 100 different classes of models in R and Python, and present two worked examples. These examples show how the workflow can be applied across designs (e.g., observational study, randomized experiment) to answer different research questions (e.g., associations, causal effects, effect heterogeneity) while facing various challenges (e.g., controlling for confounders in a flexible manner, modelling ordinal outcomes, and interpreting non-linear models).
Figure illustrating model predictions. On the X-axis the predictor, annual gross income in Euro. On the Y-axis the outcome, predicted life satisfaction. A solid line marks the curve of predictions on which individual data points are marked as model-implied outcomes at incomes of interest. Comparing two such predictions gives us a comparison. We can also fit a tangent to the line of predictions, which illustrates the slope at any given point of the curve.
A figure illustrating various ways to include age as a predictor in a model. On the x-axis age (predictor), on the y-axis the outcome (model-implied importance of friends, including confidence intervals).
Illustrated are
1. age as a categorical predictor, resulting in the predictions bouncing around a lot with wide confidence intervals
2. age as a linear predictor, which forces a straight line through the data points that has a very tight confidence band and
3. age splines, which lie somewhere in between as they smoothly follow the data but have more uncertainty than the straight line.
Ever stared at a table of regression coefficients & wondered what you're doing with your life?
Very excited to share this gentle introduction to another way of making sense of statistical models (w @vincentab.bsky.social)
Preprint: doi.org/10.31234/osf...
Website: j-rohrer.github.io/marginal-psy...
25.08.2025 11:49
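The three quantities named in the first figure description above (a prediction, a comparison between two predictions, and a slope at a point on the curve) can be sketched in plain Python. This is a toy illustration only: it does not use the marginaleffects package, and the "fitted model" is a hypothetical logarithmic curve whose coefficients are invented for the example.

```python
from math import log

# Hypothetical fitted model; both coefficients are invented for illustration.
INTERCEPT = 2.0
SLOPE_LOG_INCOME = 0.5


def predict(income):
    """Model-implied life satisfaction at a given annual gross income (EUR)."""
    return INTERCEPT + SLOPE_LOG_INCOME * log(income)


def comparison(income_low, income_high):
    """Difference between the predictions at two incomes of interest."""
    return predict(income_high) - predict(income_low)


def slope(income, step=1.0):
    """Central-difference approximation of the tangent at one income."""
    return (predict(income + step) - predict(income - step)) / (2 * step)


if __name__ == "__main__":
    print(predict(20_000))             # a single prediction
    print(comparison(20_000, 40_000))  # change when income doubles
    print(slope(20_000))               # change per extra euro at 20,000
```

Only `predict` depends on the model; the comparison and the slope are computed purely by querying it, which is what makes this style of workflow independent of the model class.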
"Being Bayesian in a Frequentist World"
New post on "Bayesian dynamic borrowing" in R
Link
25.08.2025 06:19
If you are preparing your bachelor statistics course and would like to add optional material for students to better understand statistics on a conceptual level (see topics in the screenshot), my free textbook provides a state-of-the-art overview. lakens.github.io/statistical_...
25.08.2025 04:54
My video about how LLMs are not search engines has led to many, MANY comments telling me that I should be using Perplexity. Some insisting that Perplexity does not hallucinate.
Out of a list of 26 papers it just provided me (in "Research" mode) 4 were real. FOUR. 85% hallucination rate.
23.08.2025 17:54
CRISPR as a microbial immune system
In 2003, Mojica wrote the first paper suggesting that CRISPR was an innate microbial immune system. The paper was rejected by a series of high-profile journals, including Nature, Proceedings of the National Academy of Sciences, Molecular Microbiology and Nucleic Acids Research, before finally being accepted by Journal of Molecular Evolution in February, 2005.[3][4]
TIL the original paper describing CRISPR, by Francisco Mojica, was rejected by 4 journals and took 2 years to be published
17.08.2025 04:00
1. David Ackerly (UC Berkeley)
While his most-cited work is on leaf size and SLA, he also wrote explicitly about plasticity in leaf traits, including shape, in the context of ecological strategies.
Example: Ackerly (1997), “Allocation, leaf display, and growth in fluctuating light environments: A comparative study of deciduous and evergreen species” (Oecologia). This emphasizes how plasticity in leaf traits mediates adaptation to light.
Just in case there was any doubt, ChatGPT 5.0 still makes up completely random citations that don't exist and should not be used for literature search.
16.08.2025 06:57
5. Most frequentist methods are just *fine* and there's no need to always go full luxury bayesian in every application.
04.08.2025 17:01
Trump, Claiming Weak Jobs Numbers Were “Rigged,” Fires Labor Official
When power is derived from lies, data become the enemy.
www.nytimes.com/2025/08/01/b...
02.08.2025 05:29
Will the videos be released to a broad audience after the conference?
01.08.2025 06:02
From the OpenAI community on Reddit
This is fascinating: www.reddit.com/r/OpenAI/s/I...
Someone “worked on a book with ChatGPT” for weeks and then sought help on Reddit when they couldn’t download the file. Redditors helped them realize ChatGPT had just been roleplaying/lying and there was no file/book…
16.07.2025 20:07
Using time series graphs to make causal claims be like
14.07.2025 12:12
Thick vs thin causality
07.07.2025 15:06
After all these reports of authors adding language instructions for LLM reviews in their papers, I wanted to check this myself, so I downloaded the .tex source from one of these papers.
Here is an example.
(I will not share the identity of the paper)
05.07.2025 17:12
Introducing Papercheck
An Automated Tool to Check for Best Practices in Scientifi...
Very excited to publicly share news about a new tool, Papercheck, that @debruine.bsky.social and I started to develop more than a year ago! In an introductory blog post, we explain our philosophy for automatically checking scientific papers for best practices. daniellakens.blogspot.com/2025/06/intr...
17.06.2025 11:15
Venn (1866) described the problem with the "sharpshooter fallacy": 🎯 painting the bullseye on the spot where a bullet hits a door.
In psychology, we call this "HARKing": Hypothesizing After Results Are Known.
13.06.2025 17:18
Diabolus Ex Machina
This Is Not An Essay
Possibly the best thing I've read about ChatGPT yet.
h/t @melaniemitchell.bsky.social
amandaguinzburg.substack.com/p/diabolus-e...
04.06.2025 04:41
Sample Size Justification
An important step when designing an empirical study is to justify the sample size that will be collected. The key aim of a sample size justification for such studies is to explain how the collected da...
The Journal of Sports Sciences recently made a sample size justification section mandatory. More and more journals are implementing this, as it is a central part of a study! Learn how to create a state-of-the-art sample size justification from my paper: online.ucpress.edu/collabra/art...
02.06.2025 04:50
Our mission: To provide tools and resources to foster a diverse, friendly, and inclusive community of data science learners and practitioners. Join us at https://dslc.io
Expressive probabilistic programming language for writing statistical models. Fast Bayesian inference. Interfaces for Python, Julia, R, and the Unix shell. A rich ecosystem of tools for validation and visualization.
Home https://mc-stan.org/
Data science & Bayes at Novo Nordisk | Author of "The Data Analyst's Guide to Cause and Effect" (Sage, 2026)
www.theissbendixen.com
Hey nerds. It’s a quantitative methods podcast. With attitude. And merch (http://tinyurl.com/qpodmerch).
PhD candidate in management at emlyon business school in France. Research on disruptive contexts and entrepreneurship.
Bayesian Statistician and Data Scientist in the Gaming/Entertainment Industry | Bayesian Statistics, Causal Inference, R, Python, Stan, Decision Theory, Guitar | Former Political Scientist
Researcher and educator, Reader at Cranfield University UK. Interested in how governance structures influence behaviour and well-being. Currently focused on quantification, commensuration, complexity, and the all-present AI. Trained in quants, loving quals
Principal Scientist at Naver Labs Europe, Lead of Spatial AI team. AI for Robotics, Computer Vision, Machine Learning. Austrian in France. https://chriswolfvision.github.io/www/
he/him - writing statistical software at Posit, PBC (née RStudio)
simonpcouch.com, @simonpcouch elsewhere
Academy of Management (AOM) Today offers quick tips and concise analysis of business news and workplace trends featuring perspectives from AOM scholars.
The AI community building the future!
I'm a Professor of International Economics at ESCP Business School in Berlin. I study macro policy in open economies (and some other stuff).
Papers: https://sites.google.com/view/goncalopina
In memory of the astronomer, researcher, educator, communicator, advocate and activist who taught us the importance of understanding science.
Carl Sagan tribute account.
Epidemiologist/mathematician. Professor at London School of Hygiene & Tropical Medicine. Author of The Rules of Contagion and The Perfect Bet. Views own.
New book Proof: The Uncertain Science of Certainty available now: proof.kucharski.io
Research, news, and commentary from Nature, the international science journal. For daily science news, get Nature Briefing: https://go.nature.com/get-Nature-Briefing
Associate Professor at Aarhus University (Denmark) | Psychiatric Epidemiology | Social Epidemiology | Biostatistics | Health metrics
Em. Prof., UC Davis. Many awards, incl. book, teaching, public service. Many books, latest The Art of Machine Learning (uses qeML pkg). Former Editor in Chief, the R Journal. Views mine. heather.cs.ucdavis.edu/matloff.html