27 AI models were ranked by the public and ChatGPT came 8th: these are the models that beat it
A surprising set of results for the big AI chatbots
AI benchmarks show models like GPT-5 might ace scientific reasoning, or Gemini and Claude adapt to new concepts. But which LLMs offer the best user experience?
@tomsguide.com explains how Prolific's AI leaderboard HUMAINE is a big step forward.
#ArtificialIntelligence #MachineLearning #LLM
19.09.2025 15:32
Super excited to share this work with the world!
We wanted to put human preference at the heart of our AI model leaderboard, and to do so in a principled and representative manner.
The result: HUMAINE
Check out the data here and get in touch with any questions: huggingface.co/spaces/Proli...
17.09.2025 18:22
The Probability of Practical Superiority (PPS) Comparison Matrix is available under "Model Comparison" for estimating how likely one model is to outperform another in real-world usage.
Values above 55% suggest one model meaningfully outperforms the other, beyond random chance.
17.09.2025 16:12
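To make the 55% threshold concrete, here is a minimal sketch of how a pairwise comparison matrix could be read. It is an illustration only: it assumes PPS can be approximated by the share of decisive head-to-head comparisons that one model wins, which is a simplification and not HUMAINE's published hierarchical model. The function names, the `wins` layout, and the model names `model-x` and `model-y` are hypothetical.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def practical_superiority(wins_a: int, wins_b: int) -> float:
    """Share of decisive head-to-head comparisons won by model A.

    Assumption: a raw win rate stands in for PPS here; the real
    leaderboard derives its values from a hierarchical model.
    """
    total = wins_a + wins_b
    if total == 0:
        raise ValueError("need at least one decisive comparison")
    return wins_a / total


def is_meaningful(pps: float, threshold: float = 0.55) -> bool:
    """Apply the 'values above 55%' reading from the post."""
    return pps > threshold


def pps_matrix(wins: Dict[Pair, int]) -> Dict[Pair, float]:
    """Build a matrix keyed by (row model, column model).

    wins[(a, b)] is the number of comparisons in which a beat b.
    """
    models = sorted({name for pair in wins for name in pair})
    matrix: Dict[Pair, float] = {}
    for a in models:
        for b in models:
            if a != b:
                matrix[(a, b)] = practical_superiority(
                    wins.get((a, b), 0), wins.get((b, a), 0)
                )
    return matrix


# Hypothetical counts, not real leaderboard data.
example = pps_matrix({("model-x", "model-y"): 60, ("model-y", "model-x"): 40})
```

With these made-up counts, the `model-x` row against `model-y` clears the threshold while the reverse cell does not, which is how a single cell of the matrix would be interpreted.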
HUMAINE uses rigorous methodology to ensure representative and reliable results.
Our methodology includes:
- Comparative assessment
- Statistically rigorous hierarchical modeling
- Four core dimensions of user experience and overall winner
- Commitment to demographic representation
17.09.2025 16:12
Gemini-2.5-Pro takes the #1 spot decisively, with DeepSeek-V3 emerging as the surprise runner-up.
Age groups show highest disagreement on model rankings.
27 models tested across 100K+ human comparisons - one of the largest public AI evaluations to date.
17.09.2025 16:12
Our human-centered evaluation and leaderboard show how people actually experience AI models in the real world.
Rich, multi-dimensional feedback from a diverse, representative sample of real users from our pool - revealing not just which model they prefer, but why.
17.09.2025 16:12
Visual of Prolific's human-centered leaderboard for Large Language Models (LLMs). There are four visible bar graphs with the labels of gemini-2.5-pro, deepseek-chat-v3-0324, magistral-medium-2506, and grok-4. The bars appear on the branded Prolific blue, pink, and white background. The title reads "The benchmark with human experience at the center."
Introducing HUMAINE: the LLM benchmark that puts real human experience first
21,352 human evaluators. 27 models. 22 demographic groups. 5 evaluation dimensions.
In partnership with @hf.co. See insights: huggingface.co/spaces/Proli...
More in thread #ArtificialIntelligence #MLSky #LLM #Developer
17.09.2025 16:12
What's New at Prolific: regional rep samples and more | Prolific
Discover Prolific's September 2025 updates: regional rep samples, new skilled participants, and more
We've also launched new features like regionally-stratified representative samples for the US and UK, upgraded AI-powered audience search, and more.
See everything that's new here: www.prolific.com/resources/wh...
11.09.2025 19:43
New: Periodic identity confirmation checks with video recording to ensure authentic, human feedback in your academic and industry research.
Along with bank-grade ID verification, this ensures participants are who they say they are, not bots, duplicates, or fraud.
#AcademicSky #ResearchIntegrity
11.09.2025 19:43
Prolific Meetup #1: Researchers Building Real-World AI Tooling · Luma
About the Event
Prolific is bringing the AI community together in San Francisco to explore the evolving role of humans in post-training, alignment, evaluation,…
AI/ML researchers in San Francisco, we have a few spaces left at our community event
Join us Sep 11 for this month's theme: Researchers building real-world AI tooling. With speakers Prolific CEO Phelim Bradley and Jiaxin Pei, Postdoc at @stanfordhai.bsky.social.
#MLSky #ArtificialIntelligence
08.09.2025 18:41
Audience Finder is here.
Instantly check if the participants you need for your research are on Prolific. No set-up required.
Just type who you're looking for ("Bilingual psychologists who speak Mandarin") and see live results in seconds.
www.prolific.com/audience-fin... | #AcademicSky
21.08.2025 10:11
YouTube video by Prolific
Keeping Research Real in the Age of AI: LLM Detection, Data Quality, and Fraud Prevention | Prolific
Model-based screening during participant onboarding through Protocol, our proprietary data protection system, where we run internal models to flag LLM-generated responses with high confidence.
More information and tips in this transparent session: youtu.be/MBo50M6etCk
18.08.2025 17:29
Bi-weekly data quality audits where a specialized team of human reviewers benchmark data quality across honesty, transparency, verbosity, and attention.
Authenticity checks, our own tool that detects LLM-generated responses using advanced behavioral analysis, with 98.7% accuracy.
18.08.2025 17:29
In the age of AI, authenticity matters more than ever in academic research. LLM-generated responses, false demographics, and more pose a threat.
VP of Product Sara details tools and internal measures we've implemented to help maintain research integrity (thread)
#AcademicSky #ResearchIntegrity
18.08.2025 17:29
Absolutely. We continuously adapt our systems to ensure high-quality data. Our LLM detection tool uses advanced behavioral analysis (98.7% accuracy), our data protection system also catches LLM use at the onboarding stage, and more. Please do report low-quality IDs in-app and get in touch with feedback.
06.08.2025 12:52
Interesting findings around how agents compete with humans for partnerships by @yaominj.bsky.social, @levinbrinkmann.bsky.social, Anne-Marie Nussberger, Ivan Soraperra, @jfbonnefon.bsky.social, @iyadrahwan.bsky.social, @mpib-berlin.bsky.social.
Taskers sourced via Prolific.
#AI #MLSky #AcademicSky
04.08.2025 20:17
"Through three experiments (N = 975), we found that bots, though more prosocial than humans and linguistically distinguishable, were not selected preferentially when their identity was hidden. Instead, humans misattributed bots' behavior to humans and vice versa."
arxiv.org/abs/2507.13524
04.08.2025 20:17
OSF
New preprint -- we (me, @klempert.bsky.social, Dave Wolk, and Joe Kable) examined whether characteristic aging-related changes in cognition and personality show up on six online platforms (3 crowdsourcing sites -- MTurk, CloudResearch Toolkit, & Prolific, and 3 panels). osf.io/preprints/ps... 1/
28.07.2025 19:27
Sharing a piece I wrote for The AI Journal on who should govern AI: aijourn.com/a-fine-balan...
Our recent @joinprolific.bsky.social polling shows 69.7% of people think AI investment will primarily benefit corporations not the public. They're probably right. [1/3]
#AI #AIGovernance #TechPolicy
28.07.2025 10:14
Great analysis and always a pleasure to support.
24.07.2025 20:05
Original thread: bsky.app/profile/kobi...
Congratulations Kobi and team. Pleased to have supported this project.
24.07.2025 19:56
A still image of the PhD candidate Kobi Hackenburg's AI persuasion research paper titled "The Levers of Political Persuasion with Conversational AI."
The largest investigation of AI persuasion with 76,977 participants across 3 large-scale experiments. Excellent work by @ox.ac.uk PhD candidate @kobihackenburg.bsky.social.
19 LLMs. 707 political issues. 466,769 fact-checkable claims evaluated.
arxiv.org/abs/2507.13919
#AcademicSky #MLSky #PhDSky
24.07.2025 19:56
Sorry to hear this, Jim. We're keen to investigate this jump. You can report it quickly within the platform ("Action" tab), but alternatively could you submit a data quality report ASAP, if you haven't already, so we can investigate the IDs more urgently? Thank you! forms.prolific.com/to/Zv6w7ZWt
18.07.2025 14:13
4) Report data quality concerns in app
If you do experience data quality issues, we want to investigate quickly. To easily report it, click the "Action" dropdown on the submissions page to flag any concerns.
Full list of updates: www.prolific.com/resources/wh...
10.07.2025 14:00
3) Liveness checks for all new participants
These checks use biometric data to ensure participants are real, live people.
10.07.2025 14:00
2) Increased guardrails on new participant acquisition
For example, as a researcher you can bring your own participants by request only.
10.07.2025 14:00
1) More rigorous security when changing email addresses
This makes it harder for accounts to be compromised and helps prevent fraud.
10.07.2025 14:00
An image depicting new product features on Prolific, a platform which provides human data for research.
4 new security enhancements for better data quality in your academic research (thread)
With data quality an absolute priority, yet a challenge, for many researchers, we've made several important updates to strengthen security and maintain data quality on the Prolific platform.
#AcademicSky #PhDSky #Research
10.07.2025 14:00
Webinar: Why AI leaderboards miss the mark
@cohereforai.bsky.social's Oliver Nan, University of Washington's @huashen.bsky.social, and Prolific's Nora Petrova will be discussing where leaderboards are falling short, how to improve them, and more.
www.prolific.com/resources/wh... | #MLSky #AI
07.07.2025 14:01
AI professor. Director, Foundations of Cooperative AI Lab at Carnegie Mellon. Head of Technical AI Engagement, Institute for Ethics in AI (Oxford). Author, "Moral AI - And How We Get There."
https://www.cs.cmu.edu/~conitzer/
Assistant Professor in Social Psychology, Tilburg University
CogSci, Philosophy & AI, Postdoc at Max Planck Institute Berlin.
Professor of Machine Learning at TUBerlin, group leader at PTB. Lab account: @qailabs.bsky.social.
@sparsity@mastodon.social
tu.berlin/uniml/about/head-of-group
Assistant professor at Northwestern Kellogg | human AI collaboration | computational social science | affective computing
Senior research scientist at Los Alamos National Laboratory. Former UCL, UTexas, Alan Turing Institute, Ellis EU. CogSci, AI, Comp Neuro, AI for scientific discovery https://bradlove.org
Professor at Wharton
Author of the WSJ Bestseller #HowToChange
Host of Charles Schwab's #Choiceology podcast
Author of the Milkman Delivers newsletter @ http://katymilkman.substack.com
Website: www.katymilkman.com
Chief Behavioral Scientist | Author
www.michaelhallsworth.com
Assistant Professor of Psychology at Duke University studying kids & culture. Director of the Mind & Culture Lab. Mom x3. Some people just want to watch the world learn.
dorsaamir.com | mindandculturelab.com
Assistant Prof. in Org. Behavior @StanfordGSB | Computational Culture Lab http://comp-culture.org | Social Networks, Cognition, Cultural Evolution, AI
Assistant professor in the science of reading at UNC. I study the intersection of knowledge and text comprehension.
Professor, Researcher, Author, Speaker, Editor
Prof of Ed Psych and Learning Sciences | Making Tech Work For Us, Again | APA & AERA Fellow | Self-regulated learning, epistemic cognition, digital literacy | Journal and Handbook Editor | Book Author | Views are my own. https://linktr.ee/jeffgreene
CNRS Researcher at IAST and Toulouse School of Economics
Working on Cumulative culture, Social learning, Innovation, ...
Research Scientist @MPIB in Berlin working on language evolution and human-machine cultural evolution
Education Professor, Stirling Centre for Research into Curriculum Making, University of Stirling. Mountain and prog rock enthusiast. European. Blog at http://mrpriestley.wordpress.com
Publications at https://www.stir.ac.uk/people/255862
@MPIB-Berlin.bsky.social;
computational social science, disinformation/polarization, crowdsourcing, OSINT;
@MIT.edu @MediaLab.bsky.social, @DFRLab.bsky.social #DigitalSherlocks;
London, England;
https://www.sohandsouza.info
Senior Research Fellow @UniofOxford. Senior Lecturer @UniofExeter. Associate Research Scientist @Max_Planck_CHM. Formerly @MIT @medialab. Moral Machine. Syrian.
Director, Max Planck Center for Humans & Machines http://chm.mpib-berlin.mpg.de | Former prof. @MIT | Creator of http://moralmachine.net | Art: http://instagram.com/iyad.rahwan Web: rahwan.me
Behavioral Science of AI @ Toulouse School of Economics @tse-fr.eu Director of @iast.fr Chair of Moral AI @ Artificial and Natural Intelligence Toulouse Institute https://jfbonnefon.github.io/ I have a successful life with bipolar disorder