Jesper N. Wulff @jnwulff - Bluesky Profile

PubPeer - Bridging the gap: explainable ai for autism diagnosis and pa... There are comments on PubPeer for publication: Bridging the gap: explainable ai for autism diagnosis and parental support with TabPFNMix and SHAP (2025)

Some comments on PubPeer: pubpeer.com/publications...

27.11.2025 20:09 — 👍 65 🔁 3 💬 2 📌 0

Infographic with AI slop published in Nature Scientific Reports

"Runctitiononal features"? "Medical fymblal"? "1 Tol Line storee"? This gets worse the longer you look at it. But it's got to be good, because it was published in Nature Scientific Reports last week: www.nature.com/articles/s41... h/t @asa.tsbalans.se

27.11.2025 09:30 — 👍 2256 🔁 738 💬 206 📌 472

A Quarto document in RStudio with the author field set to "Andrew Heiss"

hahahaha just got an email from someone who was using Claude to generate a boilerplate #QuartoPub document and the LLM *used my name* as the author. The computers are literally trying to be me now 😂🤣🙃🫠

21.11.2025 13:38 — 👍 341 🔁 29 💬 15 📌 2

Which tools are you using for making them?

20.11.2025 18:23 — 👍 2 🔁 0 💬 1 📌 0

Why Is Larry Summers Still Employed? The revelations about the economist’s attempts to pressure a woman into a “relationship”—with guidance from Jeffrey Epstein—should finally disqualify him from teaching students.

I know that at this point it's a subplot in the Epstein files drama, but I feel compelled to point out, once again, that Larry Summers HAS NO BUSINESS teaching students at ANY university ever again!

My latest cries into the abyss, in @thenation.com

www.thenation.com/article/soci...

19.11.2025 16:24 — 👍 2568 🔁 615 💬 70 📌 64

PubPeer - An expert criticism on post-publication peer review platform... There are comments on PubPeer for publication: An expert criticism on post-publication peer review platforms: the case of pubpeer (2025)

A paper critiquing post-publication peer review has numerous made-up references, including a @nature.com article falsely attributed to our Ivan Oransky.
link.springer.com/article/10.1...

16.11.2025 09:11 — 👍 58 🔁 29 💬 1 📌 8

I've used your book several times for a graduate course on Bayes stats for business and data science students with great success. If you divide into beginner and adv. I imagine the first half would fit well for a bachelor level course.

17.11.2025 07:23 — 👍 2 🔁 0 💬 0 📌 0

NEW: Epstein survivors release the most powerful PSA I have ever seen.

Make this go viral so every member of the House of Representatives sees it.

16.11.2025 23:43 — 👍 59194 🔁 36305 💬 1297 📌 2868

Knowledge centre META/e: home for those improving science TU/e has gained a new research centre: META/e. Daniël Lakens and Krist Vaesen were among the founders of this knowledge hub for metascience—research aimed at improving the practice of science itself. ...

TU/e has gained a new research centre: META/e. Daniël Lakens and Krist Vaesen were among the founders of this knowledge hub for metascience—research aimed at improving the practice of science itself. “We want to be a home for every researcher who occasionally wonders: what are we even doing?”

13.11.2025 14:50 — 👍 21 🔁 10 💬 0 📌 1

Yes!!

01.11.2025 07:04 — 👍 4 🔁 0 💬 1 📌 0

Type S and M errors as a “rhetorical tool” We recently posted a preprint criticizing the idea of Type S and M errors ( https://osf.io/2phzb_v1 ). From our abstract: “While these conce...

New blog post on Gelman's recent claim that Type S and M errors are intended as a 'rhetorical tool', and on whether I was wrong, in our recent preprint criticizing the idea of Type S and M errors, to believe they were recommended for more routine use. daniellakens.blogspot.com/2025/09/type...

28.09.2025 05:22 — 👍 11 🔁 6 💬 0 📌 1

We present our new preprint titled "Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation". We quantify LLM hacking risk through systematic replication of 37 diverse computational social science annotation tasks. For these tasks, we use a combined set of 2,361 realistic hypotheses that researchers might test using these annotations. Then, we collect 13 million LLM annotations across plausible LLM configurations. These annotations feed into 1.4 million regressions testing the hypotheses. For a hypothesis with no true effect (ground truth p > 0.05), different LLM configurations yield conflicting conclusions. Checkmarks indicate correct statistical conclusions matching ground truth; crosses indicate LLM hacking -- incorrect conclusions due to annotation errors. Across all experiments, LLM hacking occurs in 31-50% of cases even with highly capable models. Since minor configuration changes can flip scientific conclusions, from correct to incorrect, LLM hacking can be exploited to present anything as statistically significant.

🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.

Paper: arxiv.org/pdf/2509.08825

12.09.2025 10:33 — 👍 269 🔁 96 💬 6 📌 21

What corresponds to the Z-test in this analogy? If the P-curve is the W-test then what is the Z-test?

25.09.2025 13:30 — 👍 0 🔁 0 💬 1 📌 0

More examples of faked institutional email addresses from @deevybee.bsky.social here deevybee.blogspot.com/2022/10/what...

23.09.2025 14:24 — 👍 12 🔁 2 💬 2 📌 1

9 Equivalence Testing and Interval Hypotheses – Improving Your Statistical Inferences This open educational resource contains information to improve statistical inferences, design better experiments, and report scientific research more transparently.

9 Equivalence Testing and Interval Hypotheses – Improving Your Statistical Inferences share.google/tZRu9HekIBdY...

16.09.2025 21:02 — 👍 3 🔁 0 💬 0 📌 0

Absolutely! I'm planning on getting MET into the stats curriculum in our undergrad business adm program. My favorite resource is Lakens' online book.

16.09.2025 21:01 — 👍 1 🔁 0 💬 1 📌 0

If it makes sense to test a hypothesis, do minimum effect testing and/or set alpha as a function of sample size.

16.09.2025 16:51 — 👍 1 🔁 0 💬 1 📌 0

Can researchers stop AI making up citations? OpenAI’s GPT-5 hallucinates less than previous models do, but cutting hallucination completely might prove impossible.

"OpenAI is making “small steps that are good, but I don’t think we’re anywhere near where we need to be”, says Mark Steyvers, a cognitive science and AI researcher at UC Irvine. “It’s not frequent enough that GPT says ‘I don’t know’.”" www.nature.com/articles/d41...

09.09.2025 03:23 — 👍 5 🔁 3 💬 0 📌 0

➡️ Deadline approaching—only one month left to send in your papers and presentation proposals for #CDSM2025!

🚨 𝗖𝗮𝗹𝗹 𝗳𝗼𝗿 𝗣𝗮𝗽𝗲𝗿𝘀: 𝗖𝗮𝘂𝘀𝗮𝗹 𝗗𝗮𝘁𝗮 𝗦𝗰𝗶𝗲𝗻𝗰𝗲 𝗠𝗲𝗲𝘁𝗶𝗻𝗴 𝟮𝟬𝟮𝟱 🚨
📅 𝗡𝗼𝘃 𝟭𝟮–𝟭𝟯, 𝟮𝟬𝟮𝟱 (𝗩𝗶𝗿𝘁𝘂𝗮𝗹)
📥 Submission Deadline: 𝗦𝗲𝗽𝘁 𝟯𝟬, 𝟮𝟬𝟮𝟱

30.08.2025 12:40 — 👍 20 🔁 22 💬 0 📌 1

Models as Prediction Machines: How to Convert Confusing Coefficients into Clear Quantities. Abstract: Psychological researchers usually make sense of regression models by interpreting coefficient estimates directly. This works well enough for simple linear models, but is more challenging for more complex models with, for example, categorical variables, interactions, non-linearities, and hierarchical structures. Here, we introduce an alternative approach to making sense of statistical models. The central idea is to abstract away from the mechanics of estimation, and to treat models as “counterfactual prediction machines,” which are subsequently queried to estimate quantities and conduct tests that matter substantively. This workflow is model-agnostic; it can be applied in a consistent fashion to draw causal or descriptive inference from a wide range of models. We illustrate how to implement this workflow with the marginaleffects package, which supports over 100 different classes of models in R and Python, and present two worked examples. These examples show how the workflow can be applied across designs (e.g., observational study, randomized experiment) to answer different research questions (e.g., associations, causal effects, effect heterogeneity) while facing various challenges (e.g., controlling for confounders in a flexible manner, modelling ordinal outcomes, and interpreting non-linear models).

Figure illustrating model predictions. On the X-axis the predictor, annual gross income in Euro. On the Y-axis the outcome, predicted life satisfaction. A solid line marks the curve of predictions on which individual data points are marked as model-implied outcomes at incomes of interest. Comparing two such predictions gives us a comparison. We can also fit a tangent to the line of predictions, which illustrates the slope at any given point of the curve.

A figure illustrating various ways to include age as a predictor in a model. On the x-axis age (predictor), on the y-axis the outcome (model-implied importance of friends, including confidence intervals). Illustrated are 1. age as a categorical predictor, resulting in the predictions bouncing around a lot with wide confidence intervals, 2. age as a linear predictor, which forces a straight line through the data points that has a very tight confidence band, and 3. age splines, which lie somewhere in between, as the curve smoothly follows the data but has more uncertainty than the straight line.

Ever stared at a table of regression coefficients & wondered what you're doing with your life?

Very excited to share this gentle introduction to another way of making sense of statistical models (w @vincentab.bsky.social)
Preprint: doi.org/10.31234/osf...
Website: j-rohrer.github.io/marginal-psy...

25.08.2025 11:49 — 👍 978 🔁 286 💬 47 📌 20

"Being Bayesian in a Frequentist World"

New post on "Bayesian dynamic borrowing" in R 📚

Link 👇

25.08.2025 06:19 — 👍 10 🔁 2 💬 2 📌 0

If you are preparing your bachelor statistics course and would like to add optional material for students to better understand statistics on a conceptual level (see topics in the screenshot), my free textbook provides a state-of-the-art overview. lakens.github.io/statistical_...

25.08.2025 04:54 — 👍 212 🔁 66 💬 3 📌 4

My video about how LLMs are not search engines has led to many, MANY comments telling me that I should be using Perplexity. Some insisting that Perplexity does not hallucinate.

Out of a list of 26 papers it just provided me (in "Research" mode) 4 were real. FOUR. 85% hallucination rate.

23.08.2025 17:54 — 👍 1505 🔁 468 💬 29 📌 24

CRISPR as a microbial immune system: In 2003, Mojica wrote the first paper suggesting that CRISPR was an innate microbial immune system. The paper was rejected by a series of high-profile journals, including Nature, Proceedings of the National Academy of Sciences, Molecular Microbiology and Nucleic Acids Research, before finally being accepted by Journal of Molecular Evolution in February 2005.

TIL the original paper describing CRISPR, by Francisco Mojica, was rejected by 4 journals and took 2 years to be published

17.08.2025 04:00 — 👍 300 🔁 77 💬 6 📌 10

1. David Ackerly (UC Berkeley) While his most-cited work is on leaf size and SLA, he also wrote explicitly about plasticity in leaf traits, including shape, in the context of ecological strategies. Example: Ackerly (1997), “Allocation, leaf display, and growth in fluctuating light environments: A comparative study of deciduous and evergreen species” (Oecologia). This emphasizes how plasticity in leaf traits mediates adaptation to light.

Just in case there was any doubt, ChatGPT 5.0 still makes up completely random citations that don't exist and should not be used for literature search.

16.08.2025 06:57 — 👍 972 🔁 267 💬 27 📌 23

‼️Cool new paper‼️

Finds that journal data policies in psychology boost sharing statements to ~100%, but only about half of datasets are complete, understandable, reusable.

Open: open.lnu.se/index.php/me...

12.08.2025 16:35 — 👍 17 🔁 4 💬 1 📌 1

5. Most frequentist methods are just *fine* and there's no need to always go full luxury bayesian in every application.

04.08.2025 17:01 — 👍 26 🔁 6 💬 1 📌 0

Trump, Claiming Weak Jobs Numbers Were ‘Rigged,’ Fires Labor Official

When power is derived from lies, data become the enemy.
www.nytimes.com/2025/08/01/b...

02.08.2025 05:29 — 👍 109 🔁 36 💬 5 📌 1

Will the videos be released to a broad audience after the conference?

01.08.2025 06:02 — 👍 0 🔁 0 💬 1 📌 0

Inspiring PDW on using sensitivity analysis in empirical management research. My contribution is to present the sensemakr package by Cinelli & Hazlett (2020) for observational designs. Thanks a lot to the organizers for putting this fantastic session together. #AOM2025

26.07.2025 07:52 — 👍 16 🔁 2 💬 1 📌 0

Latest posts by jnwulff.bsky.social on Bluesky