xavier roberts-gaal

@xrg.bsky.social

three language models in a trench coat · harvard psych (scholar.harvard.edu/xrg)

141 Followers  |  180 Following  |  31 Posts  |  Joined: 27.09.2023

Latest posts by xrg.bsky.social on Bluesky


It must be very hard to publish null results
Publication practices in the social sciences act as a filter that favors statistically significant results over null findings. While the problem of selection on significance (SoS) is well-known in theory, it has been difficult to measure its scope empirically, and it has been challenging to determine how selection varies across contexts. In this article, we use large language models to extract granular and validated data on about 100,000 articles published in over 150 political science journals from 2010 to 2024. We show that fewer than 2% of articles that rely on statistical methods report null-only findings in their abstracts, while over 90% of papers highlight significant results. To put these findings in perspective, we develop and calibrate a simple model of publication bias. Across a range of plausible assumptions, we find that statistically significant results are estimated to be one to two orders of magnitude more likely to enter the published record than null results. Leveraging metadata extracted from individual articles, we show that the pattern of strong SoS holds across subfields, journals, methods, and time periods. However, a few factors such as pre-registration and randomized experiments correlate with greater acceptance of null results. We conclude by discussing implications for the field and the potential of our new dataset for investigating other questions about political science.

I have a new paper. We look at ~all stats articles in political science post-2010 & show that 94% have abstracts that claim to reject a null. Only 2% present only null results. This is hard to explain unless the research process has a filter that only lets rejections through.
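The filter logic in the post can be made concrete with a little arithmetic. This is a minimal sketch, not the paper's calibrated model: all the numbers below are illustrative assumptions, chosen only to show how acceptance rates that differ by one to two orders of magnitude distort the published record.

```python
# Illustrative sketch of selection on significance (SoS), NOT the
# paper's calibrated publication-bias model. All parameter values
# below are made-up assumptions for demonstration.

def published_share_significant(frac_sig, accept_sig, accept_null):
    """Share of *published* articles with significant results, given
    the pre-publication share of significant results (frac_sig) and
    the acceptance probabilities for significant vs. null findings."""
    frac_null = 1.0 - frac_sig
    pub_sig = frac_sig * accept_sig
    pub_null = frac_null * accept_null
    return pub_sig / (pub_sig + pub_null)

# Suppose half of all completed studies are null, but significant
# results are 30x more likely to be published (within the "one to two
# orders of magnitude" range the paper estimates):
share = published_share_significant(frac_sig=0.5, accept_sig=0.6, accept_null=0.02)
print(round(share, 2))  # 0.97
```

Even with half of the underlying research producing nulls, the published record ends up ~97% significant, which is the sense in which a near-all-significant literature is hard to explain without a filter.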

11.02.2026 17:00 — 👍 635    🔁 222    💬 30    📌 51

If you work at the intersection of computational neuroscience and machine learning, consider applying for this postdoc position (January 2027 start date):
academicpositions.harvard.edu/postings/15868
An opportunity to work with a great group of people across Harvard, MIT, and UC Berkeley.

10.02.2026 19:36 — 👍 71    🔁 48    💬 3    📌 2

We just preprinted a huge meta-meta-analysis examining the effects of exercise on cognition, memory, and executive function

In short
- 2239 effect sizes
- extreme between-study heterogeneity
- extensive publication bias
- some subgroup/exercise-specific effects

More below (doi.org/10.31234/osf...)

01.12.2025 16:19 — 👍 65    🔁 31    💬 1    📌 0
Doctoral Researcher, Philosophy of science: AI in scientific problem solving

A funded PhD position in philosophy of science: AI in scientific problem solving. At @helsinki.fi @tint-philosophy.bsky.social #philsci jobs.helsinki.fi/job/Helsinki...

28.11.2025 14:26 — 👍 22    🔁 18    💬 1    📌 1
Lying and 6 Other Things Babies Learn Early

How and when and why do children use loopholes?

Our research on this hits the Big Time*:

(* Big time = SciShow, with Hank Green)

youtu.be/f7FhKywXRGk?...

(original paper in question: srcd.onlinelibrary.wiley.com/doi/abs/10.1...)

18.11.2025 19:49 — 👍 33    🔁 2    💬 1    📌 1

then you can reply with

12.11.2025 14:42 — 👍 3    🔁 0    💬 0    📌 0

thanks for mentioning our preprint!

we're currently in revision; keen to hear any devastating critiques so the paper is as useful as it can be :)

12.11.2025 14:40 — 👍 6    🔁 0    💬 1    📌 0

🚨New Preprint: We develop a novel task that probes counterfactual thinking without using counterfactual language, and that teases apart genuine counterfactual thinking from related forms of thinking. Using this task, we find that the ability for counterfactual thinking emerges around 5 years of age.

13.10.2025 19:58 — 👍 77    🔁 14    💬 2    📌 0
Scripps Research-led team receives $14.2M NIH award to map the body’s “hidden sixth sense”

The NIH has awarded a $14.2M Director’s Transformative Research Award to a team led by Nobel Prize-winning neuroscientist @ardemp.bskyverified.social, Prof. @liye-tsri.bsky.social and Assoc. Prof. @xinjin.bsky.social to map interoception and build the first atlas of this hidden sixth sense.

10.10.2025 00:06 — 👍 124    🔁 25    💬 0    📌 0
Abstract and results summary

🚨 New preprint 🚨

Across 3 experiments (n = 3,285), we found that interacting with sycophantic (or overly agreeable) AI chatbots entrenched attitudes and led to inflated self-perceptions.

Yet, people preferred sycophantic chatbots and viewed them as unbiased!

osf.io/preprints/ps...

Thread 🧡

01.10.2025 15:16 — 👍 171    🔁 88    💬 5    📌 15

i LOVED getting over it! will check out :)

23.09.2025 20:16 — 👍 2    🔁 0    💬 0    📌 0
When You Fall on Your Face, a Philosophical Designer Succeeds

My friends @foddy.net and @gcuzzillo.bsky.social's game @babystepsgame.bsky.social came out today and it looks amazing. @foddy.net is an artist and philosopher in the truest sense of the words, who just happens to be using video games as his medium at the moment: www.nytimes.com/2025/09/23/a...

23.09.2025 19:59 — 👍 6    🔁 1    💬 2    📌 0

love this really elegant paper spearheaded by Linas!

one of the clearest instances of resource-rational social cognition i've seen

worth a read!

17.09.2025 02:38 — 👍 2    🔁 0    💬 0    📌 0
Transmission networks of long-term and short-term knowledge in a foraging society Abstract. Cultural transmission across generations is key to cumulative cultural evolution. While several mechanismsβ€”such as vertical, horizontal, and obli

💙 New paper! 💙

How is knowledge transmitted across generations in a foraging society?

With @danielredhead.bsky.social
we found: In BaYaka foragers, long-term skills pass in smaller, sparser networks, while short-term food info circulates broadly & reciprocally

academic.oup.com/pnasnexus/ar...

14.09.2025 07:52 — 👍 162    🔁 66    💬 4    📌 5

out now in Open Mind: "People Evaluate Agents Based on the Algorithms That Drive Their Behavior"

by Bigelow & me

Paper: direct.mit.edu/opmi/article...

OSF: osf.io/yzbrq/?view_...

15.09.2025 13:29 — 👍 40    🔁 9    💬 1    📌 0
Pseudo Effects: How Method Biases Can Produce Spurious Findings About Close Relationships - Samantha Joel, John K. Sakaluk, James J. Kim, Devinder Khera, Helena Yuchen Qin, Sarah C. E. Stanton, 2025 Research on interpersonal relationships frequently relies on accurate self-reporting across various relationship facets (e.g., conflict, trust, appreciation). Y...

In a new paper, my colleagues and I set out to demonstrate how method biases can create spurious findings in relationship science, by using a seemingly meaningless scale (e.g., "My relationship has very good Saturn") to predict relationship outcomes. journals.sagepub.com/doi/10.1177/...

10.09.2025 18:18 — 👍 189    🔁 80    💬 11    📌 12

good timing!

Also check out this paper by Jonathan de Quidt, Johannes Haushofer, and Christopher Roth deriving bounds for demand effects in the dictator game (here, we directly replicate their "weak" demand cue in a different sample) www.aeaweb.org/articles?id=...

15.09.2025 18:59 — 👍 4    🔁 0    💬 0    📌 0

yes, thanks for your interest! the preprint is here: osf.io/preprints/ps...

(i never know whether the algorithm penalizes threads with a link in the first post)

15.09.2025 18:41 — 👍 3    🔁 0    💬 1    📌 0

haha, well, at least 4% of people say shape-shifting lizards control the govt.

Also, some great work by Seetahul and Greitemeyer suggests that participants are more likely to react when they think studies will counteract their interests (journals.sagepub.com/doi/full/10....)

15.09.2025 18:39 — 👍 3    🔁 1    💬 0    📌 0

One thing we can't rule out: a mixture of demand compliance AND reactance in the same person (i.e., feeling pulled in both directions). But I'm not sure what kind of experiment could test this easily. A straightforward within-subjects design could be subject to concerns of "meta demand."

15.09.2025 18:33 — 👍 3    🔁 0    💬 0    📌 0

We also don't see a very sharp difference in the standard deviations in both demand conditions (which we'd expect if we have reacters and compliers). Distributions look pretty similar.

15.09.2025 18:33 — 👍 2    🔁 0    💬 1    📌 1

Good point! We address this in study 3 (p. 27), where we fit a mixture model testing for latent classes of compliers and reacters. No latent class exhibited significant evidence of a shift from zero, either in the compliance or reactance direction. The subsample which trended closest was <5% of Ps
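For readers unfamiliar with the latent-class check described here, the idea can be sketched generically. This is not the paper's actual study-3 model, just a stdlib-Python EM fit of a two-component Gaussian mixture; the simulated data deliberately plant a hypothetical shifted "complier" class (unlike the paper's null result) so the recovery of a component mean away from zero is visible.

```python
# Illustrative sketch, NOT the paper's analysis: fit a two-component
# 1-D Gaussian mixture by EM, as one might to look for latent
# "complier"/"reacter" classes whose mean is shifted away from zero.
import math
import random

def em_two_gaussians(xs, iters=200):
    """Fit means, sds, and the weight of component 0 by EM."""
    mu = [min(xs), max(xs)]   # crude but serviceable initialization
    sd = [1.0, 1.0]
    w = 0.5                   # mixing weight of component 0
    for _ in range(iters):
        # E-step: responsibility of component 0 for each observation
        resp = []
        for x in xs:
            p0 = w * math.exp(-0.5 * ((x - mu[0]) / sd[0]) ** 2) / sd[0]
            p1 = (1 - w) * math.exp(-0.5 * ((x - mu[1]) / sd[1]) ** 2) / sd[1]
            resp.append(p0 / (p0 + p1))
        # M-step: re-estimate parameters from responsibilities
        n0 = sum(resp)
        n1 = len(xs) - n0
        mu[0] = sum(r * x for r, x in zip(resp, xs)) / n0
        mu[1] = sum((1 - r) * x for r, x in zip(resp, xs)) / n1
        sd[0] = math.sqrt(sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / n0) or 1e-6
        sd[1] = math.sqrt(sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / n1) or 1e-6
        w = n0 / len(xs)
    return mu, sd, w

random.seed(0)
# Hypothetical data: 80% of responses centered at 0 (no demand response),
# 20% shifted upward (a planted "complier" class).
xs = [random.gauss(0, 1) for _ in range(800)] + [random.gauss(2, 1) for _ in range(200)]
mu, sd, w = em_two_gaussians(xs)
print(sorted(round(m, 1) for m in mu))
```

A significant shift of a recovered component mean away from zero would signal a latent responding class; the thread's point is that no such class showed up in the actual data.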

15.09.2025 18:27 — 👍 2    🔁 0    💬 1    📌 0

No Evidence of Experimenter Demand Effects in Three Online Psychology Experiments: https://osf.io/g6xhf

15.09.2025 16:44 — 👍 5    🔁 4    💬 0    📌 0

Thrilled to work with Lucas Woodley, @rcalcott.bsky.social, & @fierycushman.bsky.social on this project! (Also, glad that many of our causal estimates seem to be unbiased by demand.) Lots more in the paper if you’re interested: osf.io/g6xhf_v1

15.09.2025 17:18 — 👍 6    🔁 0    💬 0    📌 0
meme about demand effects. Darth Maul from Star Wars: The Phantom Menace is igniting his lightsaber in the Naboo palace. The top panel shows Maul igniting the first beam of his lightsaber, with the phrase (from a reviewer, in Comic Sans) "I'm worried these results may be due to demand." The bottom panel shows Maul igniting the second beam of his lightsaber (in dramatic fashion), with the text in large bold font "DEMAND EFFECTS DO NOT EXIST." (Note that we only claim demand effects in online experiments using standard paradigms are weak and/or elusive, and therefore unlikely to bias results. It is a meme :))

In short: do you need to worry about experimenter demand ruining *your* online study? Based on our evidence, probably not.

That's good news for the field! As we argue, demand effects appear, at least in their simplest form, to be more phantom than menace (7/8)

15.09.2025 17:18 — 👍 8    🔁 1    💬 1    📌 0

Then we measured participants' dictator game behavior, moral vignette judgments, and change in ingroup attitudes after an intervention (we used an inert subliminal priming intervention for measurement purposes).

Control and demand conditions were statistically indistinguishable! (6/8)

15.09.2025 17:18 — 👍 4    🔁 0    💬 1    📌 0

To answer this, we used obvious ("We hypothesize...") and subtle demand manipulations ("These images are designed to make you feel more warmth toward the average [conservative/liberal]")

In each case we verified participants correctly understood study hypotheses. (5/8)

15.09.2025 17:18 — 👍 2    🔁 0    💬 1    📌 0

...and demand effects are most often observed with small student samples or very heavy-handed cues ("You will help us if you...")

But modern psychology experiments use experienced online samples and standardized paradigms. Is demand a realistic concern in this setting? (4/8)

15.09.2025 17:18 — 👍 3    🔁 0    💬 1    📌 0

Some background: meta-analysis (@nicholascoles.bsky.social, Morgan Wyatt, & Michael C. Frank) and prior large-scale studies using economic games (@jondequidt.bsky.social, @johanneshaushofer.com, & Christopher Roth) find small though inconsistent demand effects... (3/8)

15.09.2025 17:18 — 👍 2    🔁 0    💬 1    📌 0

In three preregistered studies (N=2,254), we revealed the study’s hypothesis. Participants’ beliefs changed but their behavior didn’t.

In other words, in a dictator game, a moral vignette, and an attitudes intervention, we created experimenter demand but it had no effect! (2/8)

15.09.2025 17:18 — 👍 5    🔁 0    💬 1    📌 0
