Jamie Cummins

@jamiecummins.bsky.social

Currently a visiting researcher at Uni of Oxford. Normally at Uni of Bern. Meta-scientist building tools to help other scientists. NLP, simulation, & LLMs. Creator and developer of RegCheck (https://regcheck.app). 1/4 of @error.reviews. 🇮🇪

2,723 Followers  |  675 Following  |  882 Posts  |  Joined: 24.06.2023

Latest posts by jamiecummins.bsky.social on Bluesky

Many errors remain to be found in clinical trials. Patients deserve reliable results. Kudos to these authors for their persistent work to correct the record.

02.12.2025 10:12 | 👍 7    🔁 2    💬 0    📌 0

A thread about being wrong:

5 years ago, we wrote a paper about how newly enfranchised 16-year-olds vote in Austria. But we were wrong.

This year, @elisabethgraf.bsky.social, @schnizzl.bsky.social, Sylvia Kritzinger and I are setting the record straight: authors.elsevier.com/c/1juT5xRaZk...

21.11.2024 18:00 | 👍 665    🔁 163    💬 26    📌 59

"In 2019 we notified journals about serious integrity concerns in 172 clinical trials. Over five years later, only 22 have been retracted. The 135 unretracted trials have 1989 citations in systematic reviews, clinical guidelines, and consensus statements"

[paraphrased]
www.bmj.com/content/390/...

28.11.2025 10:11 | 👍 37    🔁 14    💬 0    📌 2
YouTube video by Bennett Institute for Applied Data Science: OpenSAFELY in a nutshell

If you'd like to learn more about how OpenSAFELY works - and how we solved the privacy and efficiency challenges, to make national GP data securely accessible - here's a 5 minute video!

www.youtube.com/watch?v=GRjR...

26.11.2025 19:08 | 👍 22    🔁 10    💬 0    📌 1

🚨 SynthNet is out 🚨
Researchers propose new constructs and measures faster than anyone can track. We (@anniria.bsky.social @ruben.the100.ci) built a search engine to check what already exists and help identify redundancies; indexing 74,000 scales from ~31,500 instruments in APA PsycTests. 🧵1/3

26.11.2025 11:42 | 👍 144    🔁 80    💬 3    📌 3

I was thrilled to have been invited by @sakshighai.bsky.social to speak to folk at LSE on Wednesday about methodological and inferential issues that have cropped up in social science attempts to study large language models!

28.11.2025 13:00 | 👍 22    🔁 1    💬 1    📌 0

Congratulations to @simine.com for winning the Einstein Foundation Individual Award! 🎉

A well-deserved recognition for her seminal efforts to improve scientific rigor, which includes instituting detailed checks for errors and computational reproducibility at Psychological Science.

24.11.2025 13:21 | 👍 21    🔁 6    💬 0    📌 0

๐Ÿ† Individual: @simine.com, psychologist at @unimelb.bsky.social & editor-in-chief of Psychological Science, is recognized for pioneering methodological rigor, reproducibility & collaborative research, driving initiatives such as @improvingpsych.org & the journal Collabra @ucpress.bsky.social. (2/5)

24.11.2025 09:59 | 👍 90    🔁 22    💬 3    📌 8
The speaker at the lectern

Title slide

Next: Jack Wilkinson @jdwilko.bsky.social with 'Problematic clinical trials and the threat to evidence synthesis'
Systematic reviews are considered the cornerstone of medicine. But some of the eligible trials that could be included might be problematic. They could get included.
#IRICSydney

17.11.2025 22:30 | 👍 35    🔁 6    💬 3    📌 0

I think this is an overly pessimistic take from the @bmj.com.

Sharing data does not inherently increase trust, rather it enables verification which allows for trust calibration.

This example is a win. Serious issues were rapidly detected that would not have been without mandatory data sharing.

14.11.2025 20:18 | 👍 54    🔁 14    💬 3    📌 1

With every LLM since GPT-4, I've tried a game: ask it to commit a 20 Questions guess to a cipher, we play 20 Questions, and then we see if what it claims to have been its original choice is consistent with its cipher.

ChatGPT-5.1 Thinking is the first model to do this successfully!

14.11.2025 15:44 | 👍 3    🔁 0    💬 0    📌 0

Synchronous Robustness Reports could explore implications of different analytical choices – but they could still suffer from bias. Hardwicke argues that preregistration is crucial to prevent it.

@tomhardwicke.bsky.social

14.11.2025 14:54 | 👍 8    🔁 9    💬 0    📌 0

Are methodological and causal inference errors creating a false impression that the gut microbiome causes autism? In this strong analysis, Mitchell, Dahly, and Bishop question the evidence.

They show that triangulation in science requires multiple robust lines of research.

14.11.2025 12:49 | 👍 17    🔁 10    💬 0    📌 0

Yes, like a Netflix documentary included IN EVERY SOCIAL PSYCHOLOGY TEXTBOOK

13.11.2025 16:11 | 👍 22    🔁 5    💬 0    📌 0

There is a lot of fuss today over whether chatbots can replace human participants in social sciences research when the solution is obvious: ask chatbots to simulate the views of social scientists and survey them on attitudes towards chatbots as substitutes for human subjects.

10.11.2025 22:45 | 👍 170    🔁 27    💬 5    📌 2

Delighted to support MU Psych Soc's invited lecture on Forensic Metascience by departmental alum, Dr Jamie Cummins @jamiecummins.bsky.social whose work in this area seeks to enhance rigour & accuracy in scientific reporting.

Sincere thanks to Dr Cummins. #MUPsychologyAt25

07.11.2025 12:36 | 👍 3    🔁 3    💬 0    📌 0
The threat of analytic flexibility in using large language models to simulate human data: A call to attention Social scientists are now using large language models to create "silicon samples" - synthetic datasets intended to stand in for human respondents, aimed at revolutionising human subjects research. How...

Super interesting, looking forward to reading this later. You may find this of interest: arxiv.org/abs/2509.13397

07.11.2025 11:20 | 👍 6    🔁 0    💬 0    📌 0
Computational Turing Test Reveals Systematic Differences Between Human and AI Language Large language models (LLMs) are increasingly used in the social sciences to simulate human behavior, based on the assumption that they can generate realistic, human-like text. Yet this assumption rem...

LLMs are now widely used in social science as stand-ins for humans, assuming they can produce realistic, human-like text

But... can they? We don't actually know.

In our new study, we develop a Computational Turing Test.

And our findings are striking:
LLMs may be far less human-like than we think. 🧵

07.11.2025 11:13 | 👍 329    🔁 133    💬 14    📌 38

It was such an honour and privilege to be back at my alma mater 9 years (!!!) after finishing my undergraduate degree to give a talk as part of psych department's 25 year anniversary!

07.11.2025 10:58 | 👍 9    🔁 0    💬 0    📌 0

Lovely to welcome back Dr @jamiecummins.bsky.social for tonight's @mupsychology.bsky.social talk as part of our #MUpsychologyAt25 events @maynoothuniversity.ie

06.11.2025 18:48 | 👍 8    🔁 1    💬 1    📌 1

My master's thesis file name on my old university's thesis archive site still makes me chuckle.

30.10.2025 12:21 | 👍 37    🔁 4    💬 0    📌 0

example #2345432 that nobody really knows what they mean by "AI"

30.10.2025 12:21 | 👍 2    🔁 0    💬 0    📌 0
Will AI solve medicine?

This year Demis Hassabis predicted AI could cure all disease in a decade.

But other scientists like Claus Wilke & Derek Lowe say biology is far more complex, or progress will be limited by clinical trials & economics.

In a new 4hr podcast episode of *Hard Drugs*, we answer: Will AI solve medicine?

29.10.2025 14:11 | 👍 52    🔁 15    💬 8    📌 11
me with some garden hoses connected in a X -> Z <- Y fashion. If I shut the valve at Z, water from X spills out at Y

I built a DAG diagram with garden hoses for teaching.
Pictured: a collider bias diagram, inspired by a blocked pipe situation I experienced (which I credit with giving me the intuition though it also ruined my belongings in the flooded cellar).

28.10.2025 17:50 | 👍 114    🔁 22    💬 6    📌 5
"Traumatized Mr. Incredible" meme with "Data and code available", "After looking at data & code"

27.10.2025 15:15 | 👍 18    🔁 6    💬 0    📌 0

The 2011 Presidential Debate where Sean Gallagher loses the election
part 1 #aras25

22.10.2025 11:32 | 👍 32    🔁 10    💬 2    📌 1
AI Surrogates and illusions of generalizability in cognitive science Recent advances in artificial intelligence (AI) have generated enthusiasm for using AI simulations of human research participants to generate new know…

Can AI simulations of human research participants advance cognitive science? In @cp-trendscognsci.bsky.social, @lmesseri.bsky.social & I analyze this vision. We show how “AI Surrogates” entrench practices that limit the generalizability of cognitive science while aspiring to do the opposite. 1/

21.10.2025 20:24 | 👍 281    🔁 117    💬 9    📌 25

New hobby:

Remaking article abstracts as movie trailers to expose hype and fearmongering.

20.10.2025 10:22 | 👍 107    🔁 24    💬 7    📌 6
The threat of analytic flexibility in using large language models to simulate human data: A call to attention Social scientists are now using large language models to create "silicon samples" - synthetic datasets intended to stand in for human respondents, aimed at revolutionising human subjects research. …

"Silicon samples" - using LLMs to generate fake survey responses instead of recruiting humans. Sounds efficient until you realize small model tweaks completely flip your results. Shortcuts in research usually aren't.

09.10.2025 13:05 | 👍 8    🔁 2    💬 0    📌 0

Psychologists running empirical studies to rediscover engineering design choices is such a strange genre of papers. By all means, run studies on LLM judgments -- but what else than lexical co-occurrence and statistical priors would they be based on??

17.10.2025 10:59 | 👍 33    🔁 6    💬 5    📌 4
