František Bartoš

@fbartos.bsky.social

PhD Candidate | Psychological Methods | UvA Amsterdam | interested in statistics, meta-analysis, and publication bias | once flipped a coin too many times

888 Followers  |  199 Following  |  97 Posts  |  Joined: 14.11.2024  |  2.1507

Latest posts by fbartos.bsky.social on Bluesky


Everything is ready for the Perspectives on Scientific Error conference that starts tomorrow in Leiden! I look forward to hanging out with the mix of metascientists, philosophers of science, and statisticians! So many old friends will be there (and hopefully some new ones)! #PSE8

10.02.2026 17:10 — 👍 51    🔁 9    💬 0    📌 2

Come to Amsterdam or join online for the full week of JASP workshops (24th-28th of August)! If you can't do the full week or you are only interested in meta-analysis, I will be giving the Meta-Analysis workshop on the 25th of August.

jasp-stats.org/2026/02/05/h...

06.02.2026 09:23 — 👍 4    🔁 1    💬 0    📌 0
Diagram showing four phases of methodological research (Theory, Exploration, Systematic Comparison, Evidence Synthesis) with an arrow indicating that preregistration usefulness increases from early to late phases. Each phase lists its aim, elements, outcome, and an example from factor retention research.

Does it make sense to preregister simulation studies?
This question has sparked a lot of debate.

▶️We* work through the why, when, and how
▶️We discuss different phases of methodological research to clarify where preregistration might (or might not) add value

📝 Preprint: doi.org/10.31234/osf...

04.02.2026 10:40 — 👍 37    🔁 13    💬 1    📌 0

Does this mean that AI/LLMs do not help in education? I personally don't think so. I'm using AI every day and find it incredibly useful. It would be odd if it didn't help with learning at all. However, the current empirical base does not substantiate strong claims.

28.01.2026 20:41 — 👍 1    🔁 1    💬 0    📌 0

The meta-analysis-level re-analysis then further highlights the issue of publication bias: extremely overstated evidence (left) and mean effect size estimates (middle) due to a large degree of publication bias (right).

28.01.2026 20:41 — 👍 1    🔁 0    💬 1    📌 0

We explored several moderators and compared results of studies published before and after 2023 (to assess older AI systems and modern LLMs) but we did not find any meaningful difference.

28.01.2026 20:41 — 👍 0    🔁 0    💬 1    📌 0

Publication bias-adjusted estimates decrease the average effect from d = 0.63 to d = 0.20. More importantly, the between-study heterogeneity is so large that the distribution of effects can range from -1.52 to 1.91! This is a ridiculous variance making the mean meaningless.

28.01.2026 20:41 — 👍 1    🔁 0    💬 1    📌 0
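
To see how heterogeneity can swamp a small mean, here is a rough back-of-the-envelope sketch. It assumes the reported range (-1.52 to 1.91) is a symmetric 95% interval of true effects under a plain normal model, which is my simplification and not necessarily how the preprint computed it; the function names are made up for illustration.

```python
from statistics import NormalDist


def implied_tau(lower, upper, coverage=0.95):
    """Back out the SD of a normal distribution from a central interval.

    Assumption: the interval is symmetric and covers `coverage` of the mass.
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return (upper - lower) / (2 * z)


def effect_range(mu, tau, coverage=0.95):
    """Central interval of true effects under N(mu, tau^2)."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return mu - z * tau, mu + z * tau


# Values quoted in the post: adjusted mean d = 0.20, range -1.52 to 1.91.
tau = implied_tau(-1.52, 1.91)
low, high = effect_range(0.20, tau)

# Share of true effects below zero under the same simplified normal model.
share_negative = NormalDist(mu=0.20, sigma=tau).cdf(0.0)

print(f"implied tau: {tau:.2f}")
print(f"95% range of true effects: {low:.2f} to {high:.2f}")
print(f"share of true effects below zero: {share_negative:.0%}")
```

Under these assumptions the implied between-study SD is several times larger than the mean itself, which is why the average says little about what any single study or setting would show.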

We managed to collect 1,840 effect size estimates from 67 meta-analyses. The distribution of study-level effect sizes shows both a notable skew (funnel plot on the left) and clear selection for positive effects (z-curve plots on the right).

28.01.2026 20:41 — 👍 0    🔁 0    💬 1    📌 0

We recently criticized one meta-analysis on the effect of ChatGPT on learning for failing to adjust for publication bias (bsky.app/profile/fbar...). In a response, the original authors argued that many other meta-analyses find the same effects. So we examined them all.

28.01.2026 20:41 — 👍 0    🔁 0    💬 1    📌 0

We just posted a preprint with a comprehensive meta-meta-analysis of the effects of AI/LLMs on learning.

TLDR:
- 1,840 effect sizes
- extreme between-study heterogeneity
- extreme publication bias
- small average effects (three times lower than usually reported)
(osf.io/preprints/ps...)

28.01.2026 20:41 — 👍 8    🔁 1    💬 1    📌 0
Redefine Statistical Significance Part XXI: Edgeworth Proposed the .005 Criterion Back in 1885 The statistical significance test was not invented by Ronald Fisher. The key idea was already laid out by Francis Ysidro Edgeworth (1845-1926), whose 1885 article “Methods of statistics”…

Edgeworth proposed the alpha=.005 criterion 134 years prior to Benjamin et al. (2019). :-)
www.bayesianspectacles.org/redefine-sta...

14.01.2026 11:41 — 👍 1    🔁 1    💬 0    📌 0
JASP for Quality Control, Example 6: The World's Earliest Recorded Outlier? - JASP Services BV Information hidden in raw data can be revealed most easily by means of statistical visualization techniques (“always plot your data”). To demonstrate I will now analyze the duration of reign (in years...

The world's earliest recorded outlier? Let me know if you have an even older example!

www.jasp-services.com/jasp-for-qua...

24.12.2025 13:09 — 👍 2    🔁 1    💬 0    📌 0

Surprisingly never in the case of publication bias tests :D

23.12.2025 20:01 — 👍 3    🔁 0    💬 1    📌 0

"we did not find any evidence for publication bias (p=0.077)"

23.12.2025 19:49 — 👍 3    🔁 0    💬 1    📌 0

This is also likely to be the last update of this version of the package. Next year, I will introduce breaking changes to the interface with the 4.0 major release, which will make the interface much more similar to metafor.

23.12.2025 10:11 — 👍 1    🔁 0    💬 1    📌 0
Guide to RoBMA Vignettes

As such, it provides an easy-to-apply state-of-the-art Bayesian meta-analytic methodology for most meta-analytic settings!

See an overview of the current functionality with a brief description of all vignettes fbartos.github.io/RoBMA/articl...

23.12.2025 10:11 — 👍 2    🔁 0    💬 1    📌 0
Multilevel Robust Bayesian Meta-Analysis

The Robust Bayesian Meta-Analysis package got updated with additional vignettes explaining how to perform Bayesian model-averaged publication bias-adjusted

- multilevel meta-analysis (cran.r-project.org/web/packages...)
- multilevel meta-regression (cran.r-project.org/web/packages...)

23.12.2025 10:11 — 👍 22    🔁 9    💬 1    📌 0
At the Albert Heijn, You Get About 2% More Potatoes Than What it Says on the Label - JASP Services BV A week ago I started a small quality control project where I measured twenty “1 kg” Albert Heijn (AH) potato bags in order to assess whether or not AH is systematically underfilling them, as some soci...

I recently bought 20 bags of potatoes that, according to the Albert Heijn supermarket, should each contain 1 kg. This turns out to be *false*.

www.jasp-services.com/at-the-alber...

17.12.2025 21:51 — 👍 4    🔁 3    💬 0    📌 0

Yep, it's ridiculous. Those studies should not be published...

Extracting the study-level data from existing meta-analyses is quite feasible, so, there is almost no excuse not to do so.

15.12.2025 14:09 — 👍 2    🔁 0    💬 1    📌 0
OSF

Also, you cannot really evaluate between-study heterogeneity, see e.g. our latest study-level meta-meta-analysis that shows the limitations of the previous meta-analysis-level meta-meta-analysis doi.org/10.31234/osf...

15.12.2025 13:52 — 👍 2    🔁 0    💬 1    📌 0

My main worry is that they might have synthesized the meta-analytic estimates rather than the study-level estimates? The manuscript wasn't super clear on that and the OSF had only meta-analysis level data?
If so, that makes the publication bias adjustment ineffective...

15.12.2025 13:49 — 👍 2    🔁 0    💬 1    📌 0
Do the 1kg Albert Heijn Potato Bags Really Contain 1kg of Potatoes? - JASP Services BV A few days ago I announced a small quality control project where I would measure 20 bags of “1 kg” Albert Heijn (AH) potato bags to assess whether or not AH is systematically underfilling them, as som...

The suspense is building: do the measurements of 20 units indicate that the Albert Heijn underfills its 1 kg bags of potatoes? An interim post on the importance of articulating your predictions *before* seeing the results. :-)

www.jasp-services.com/do-the-1kg-a...

12.12.2025 11:43 — 👍 6    🔁 4    💬 0    📌 0
JASP for Quality Control, Example 4: The Raincloud Plot - JASP Services BV In our last post we discussed the boxplot of the distances to the sun for each of the eight planets in our solar system, as measured in astronomical units (AU; AU=1 is the average distance from the ea...

This week's blog post features "raincloud plots", a relatively recent development in data visualization.

Will the raincloud plot gradually replace the box plot? It just might!

Check out the raincloud plot for the planets in our solar system at

www.jasp-services.com/jasp-for-qua...

03.12.2025 09:39 — 👍 5    🔁 5    💬 0    📌 0

Also, this should not be a reason to stop exercising.
1) There are other benefits of exercise
2) Some populations/exercises show benefit
3) There might be wider effects on cognition; however, the literature is too heterogeneous and contaminated with publication bias to be certain

01.12.2025 16:19 — 👍 13    🔁 0    💬 0    📌 0

I think that the field needs to clean up the published literature a bit. Additional small studies are not going to move the needle at this point; maybe a couple of large-scale, pre-registered studies might provide more insight?

01.12.2025 16:19 — 👍 11    🔁 2    💬 1    📌 0

We also re-analyzed all of the original meta-analyses individually. Many of them are consistent with publication bias: the evidence for and the degree of the pooled effects decrease once publication bias is adjusted for.

01.12.2025 16:19 — 👍 2    🔁 0    💬 1    📌 0

We ran subgroup analyses for each outcome/population/intervention. We found that most results are too heterogeneous to tell (see the wide prediction intervals), but some interventions seem promising and some have substantive evidence against them. See the figures for each outcome.

01.12.2025 16:19 — 👍 1    🔁 0    💬 1    📌 0

First, we found notable publication bias, especially in studies on general cognition and executive function. Importantly, there was extreme between-study heterogeneity (tau ~ 0.3-0.6!). This means that the results were consistent with both large benefits and large harms.

01.12.2025 16:19 — 👍 3    🔁 0    💬 1    📌 0
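
As a minimal illustration of why tau in the 0.3-0.6 range leaves room for both benefit and harm, the sketch below computes the share of true effects that would fall below zero under a simple normal model. The pooled mean of 0.20 is a purely hypothetical placeholder (the thread does not report one here), and `share_below_zero` is an illustrative helper, not something from the paper or from RoBMA.

```python
from statistics import NormalDist

# Hypothetical pooled effect, chosen only for illustration (not from the thread).
HYPOTHETICAL_MEAN = 0.20


def share_below_zero(mu, tau):
    """Share of true effects below zero if effects follow N(mu, tau^2)."""
    return NormalDist(mu=mu, sigma=tau).cdf(0.0)


# The two ends of the tau range quoted in the post.
for tau in (0.3, 0.6):
    share = share_below_zero(HYPOTHETICAL_MEAN, tau)
    print(f"tau = {tau}: {share:.0%} of true effects below zero")
```

With a small positive mean, a sizeable fraction of settings would still be expected to show a negative effect, and that fraction grows as tau increases.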

We were not the only ones to notice; see also @matthewbjane.bsky.social commenting on this when the study came out:
x.com/MatthewBJane...

So, we manually extracted the study-level data from the included meta-analyses and re-evaluated the evidence.

01.12.2025 16:19 — 👍 1    🔁 0    💬 1    📌 0
Effectiveness of exercise for improving cognition, memory and executive function: a systematic umbrella review and meta-meta-analysis Objective To evaluate systematic reviews of randomised controlled trials (RCTs) on the effects of exercise on general cognition, memory and executive function across all populations and ages. Methods...

A previous meta-meta-analysis (doi.org/10.1136/bjsp...) indicated consistent benefits of exercise for cognition across all domains and populations. However, it synthesized meta-analytic estimates and, as such, could not adjust for publication bias or evaluate heterogeneity.

01.12.2025 16:19 — 👍 3    🔁 0    💬 1    📌 0
