Prolific

@joinprolific.bsky.social

The ultimate human data platform to power world-changing AI and research. 🔗 www.prolific.com

272 Followers  |  1,392 Following  |  158 Posts  |  Joined: 02.10.2023  |  2.3355

Latest posts by joinprolific.bsky.social on Bluesky

27 AI models were ranked by the public and ChatGPT came 8th - these are the models that beat it
A surprising set of results for the big AI chatbots

AI benchmarks show models like GPT-5 might ace scientific reasoning, or Gemini and Claude adapt to new concepts. But which LLMs offer the best user experience?

@tomsguide.com explains how Prolific's AI leaderboard HUMAINE is a big step forward.

#ArtificialIntelligence #MachineLearning #LLM

19.09.2025 15:32 - 👍 4    🔁 0    💬 0    📌 0

Super excited to share this work with the world!

We wanted to put human preference at the heart of our AI model leaderboard, and to do so in a principled and representative manner.

The result: HUMAINE

Check out the data here and get in touch with any questions: huggingface.co/spaces/Proli...

17.09.2025 18:22 - 👍 2    🔁 2    💬 0    📌 0
HUMAINE Leaderboard - a Hugging Face Space by ProlificAI
This application helps you analyze human feedback to evaluate AI models. You provide feedback data, and it gives you insights to improve your models.

Get more insights, compare models, and give us a like on Hugging Face ➡️ huggingface.co/spaces/Proli...

Follow for updates on requested models.

17.09.2025 16:12 - 👍 0    🔁 0    💬 0    📌 0

The Probability of Practical Superiority (PPS) Comparison Matrix is available under "Model Comparison" for estimating how likely one model is to outperform another in real-world usage.

Values above 55% suggest that one model meaningfully outperforms the other, beyond random chance.

17.09.2025 16:12 - 👍 0    🔁 0    💬 1    📌 0

HUMAINE uses rigorous methodology to ensure representative and reliable results.

Our methodology includes:

- Comparative assessment
- Statistically rigorous hierarchical modeling
- Four core dimensions of user experience and overall winner
- Commitment to demographic representation

17.09.2025 16:12 - 👍 0    🔁 0    💬 1    📌 0

🏆 Gemini-2.5-Pro takes the #1 spot decisively, with DeepSeek-V3 emerging as the surprise runner-up.

⚖️ Age groups show highest disagreement on model rankings.

📊 27 models tested across 100K+ human comparisons - one of the largest public AI evaluations to date.

17.09.2025 16:12 - 👍 0    🔁 0    💬 1    📌 0

Our human-centered evaluation and leaderboard show how people actually experience AI models in the real world.

Rich, multi-dimensional feedback from a diverse, representative sample of real users from our pool - revealing not just which model they prefer, but why.

17.09.2025 16:12 - 👍 0    🔁 0    💬 1    📌 0
Visual of Prolific's human-centered leaderboard for Large Language Models (LLMs). There are four visible bar graphs with the labels of gemini-2.5-pro, deepseek-chat-v3-0324, magistral-medium-2506, and grok-4. The bars appear on the branded Prolific blue, pink, and white background. The title reads "The benchmark with human experience at the center."

Introducing HUMAINE: the LLM benchmark that puts real human experience first 🎯

21,352 human evaluators. 27 models. 22 demographic groups. 5 evaluation dimensions.

In partnership with @hf.co. See insights: huggingface.co/spaces/Proli...

More in 🧵 #ArtificialIntelligence #MLSky #LLM #Developer

17.09.2025 16:12 - 👍 2    🔁 0    💬 1    📌 1
What's New at Prolific: regional rep samples and more | Prolific
Discover Prolific's September 2025 updates: regional rep samples, new skilled participants, and more

We've also launched new features like regionally-stratified representative samples for the US and UK, upgraded AI-powered audience search, and more.

See everything that's new here: www.prolific.com/resources/wh...

11.09.2025 19:43 - 👍 0    🔁 0    💬 0    📌 0

New: Periodic identity confirmation checks with video recording to ensure authentic, human feedback in your academic and industry research.

Along with bank-grade ID verification, this ensures participants are who they say they are, not bots, duplicates, or fraud.

#AcademicSky #ResearchIntegrity

11.09.2025 19:43 - 👍 3    🔁 1    💬 1    📌 0
Prolific Meetup #1 – Researchers Building Real-World AI Tooling · Luma
About the Event: Prolific is bringing the AI community together in San Francisco to explore the evolving role of humans in post-training, alignment, evaluation,…

AI/ML researchers in San Francisco, we have a few spaces left at our community event 🇺🇸

Join us Sep 11 for this month's theme: Researchers building real-world AI tooling. With speakers Prolific CEO Phelim Bradley and Jiaxin Pei, Postdoc at @stanfordhai.bsky.social. 👇

#MLSky #ArtificialIntelligence

08.09.2025 18:41 - 👍 3    🔁 0    💬 0    📌 0

🔍 Audience Finder is here.

Instantly check if the participants you need for your research are on Prolific. No set-up required.

Just type who you're looking for ("Bilingual psychologists who speak Mandarin") and see live results in seconds.

➡️ www.prolific.com/audience-fin... | #AcademicSky

21.08.2025 10:11 - 👍 1    🔁 0    💬 0    📌 0
Keeping Research Real in the Age of AI: LLM Detection, Data Quality, and Fraud Prevention | Prolific
YouTube video by Prolific

👉 Model-based screening during participant onboarding through Protocol, our proprietary data protection system, where we run internal models to flag LLM-generated responses with high confidence.

More information and tips in this transparent session: youtu.be/MBo50M6etCk

18.08.2025 17:29 - 👍 1    🔁 1    💬 0    📌 0

👉 Bi-weekly data quality audits where a specialized team of human reviewers benchmark data quality across honesty, transparency, verbosity, and attention.

👉 Authenticity checks, our own tool that detects LLM-generated responses using advanced behavioral analysis, with 98.7% accuracy.

18.08.2025 17:29 - 👍 1    🔁 1    💬 1    📌 0

In the age of AI, authenticity matters more than ever in academic research. LLM-generated responses, false demographics, and more pose a threat.

VP of Product Sara details tools and internal measures we've implemented to help maintain research integrity 🧵

#AcademicSky #ResearchIntegrity

18.08.2025 17:29 - 👍 7    🔁 1    💬 1    📌 1

Absolutely. We continuously adapt our systems to ensure high-quality data. Our LLM detection tool uses advanced behavioral analysis (98.7% accuracy), our data protection system also catches LLM use at the onboarding stage, and more. Please do report low-quality IDs in-app and get in touch with feedback.

06.08.2025 12:52 - 👍 1    🔁 0    💬 0    📌 0

Interesting findings around how agents compete with humans for partnerships by @yaominj.bsky.social, @levinbrinkmann.bsky.social, Anne-Marie Nussberger, Ivan Soraperra, @jfbonnefon.bsky.social, @iyadrahwan.bsky.social, @mpib-berlin.bsky.social.

Taskers sourced via Prolific.

#AI #MLSky #AcademicSky

04.08.2025 20:17 - 👍 4    🔁 2    💬 0    📌 0

"Through three experiments (N = 975), we found that bots, though more prosocial than humans and linguistically distinguishable, were not selected preferentially when their identity was hidden. Instead, humans misattributed bots' behavior to humans and vice versa."

arxiv.org/abs/2507.13524

04.08.2025 20:17 - 👍 5    🔁 0    💬 1    📌 0
OSF

New preprint -- we (me, @klempert.bsky.social, Dave Wolk, and Joe Kable) examined whether characteristic aging-related changes in cognition and personality show up on six online platforms (3 crowdsourcing sites -- MTurk, CloudResearch Toolkit, & Prolific, and 3 panels). osf.io/preprints/ps... 1/

28.07.2025 19:27 - 👍 5    🔁 2    💬 1    📌 1

Sharing a piece I wrote for The AI Journal on who should govern AI: aijourn.com/a-fine-balan...

Our recent @joinprolific.bsky.social polling shows 69.7% of people think AI investment will primarily benefit corporations, not the public. They're probably right. [1/3]

#AI #AIGovernance #TechPolicy

28.07.2025 10:14 - 👍 2    🔁 1    💬 1    📌 0

Great analysis and always a pleasure to support.

24.07.2025 20:05 - 👍 0    🔁 0    💬 0    📌 0

Original thread: bsky.app/profile/kobi...

Congratulations Kobi and team. Pleased to have supported this project.

24.07.2025 19:56 - 👍 0    🔁 0    💬 0    📌 0
A still image of PhD candidate Kobi Hackenburg's AI persuasion research paper titled "The Levers of Political Persuasion with Conversational AI."

The largest investigation of AI persuasion with 76,977 participants across 3 large-scale experiments. Excellent work by @ox.ac.uk PhD candidate @kobihackenburg.bsky.social.

19 LLMs. 707 political issues. 466,769 fact-checkable claims evaluated.

arxiv.org/abs/2507.13919

#AcademicSky #MLSky #PhDSky

24.07.2025 19:56 - 👍 9    🔁 3    💬 1    📌 0

Sorry to hear this, Jim. We're keen to investigate this jump. You can report it quickly within the platform ("Action" tab); alternatively, if you haven't already, could you submit a data quality report ASAP so we can investigate the IDs more urgently? Thank you! forms.prolific.com/to/Zv6w7ZWt

18.07.2025 14:13 - 👍 0    🔁 0    💬 0    📌 0

4) Report data quality concerns in app

If you do experience data quality issues, we want to investigate quickly. To easily report it, click the "Action" dropdown on the submissions page to flag any concerns.

Full list of updates: www.prolific.com/resources/wh...

10.07.2025 14:00 - 👍 0    🔁 0    💬 0    📌 0

3) Liveness checks for all new participants

These checks use biometric data to ensure participants are real, live people.

10.07.2025 14:00 - 👍 0    🔁 0    💬 1    📌 0

2) Increased guardrails on new participant acquisition

For example, as a researcher you can bring your own participants by request only.

10.07.2025 14:00 - 👍 0    🔁 0    💬 1    📌 0

1) More rigorous security when changing email addresses

This makes it harder for accounts to be compromised and helps prevent fraud.

10.07.2025 14:00 - 👍 0    🔁 0    💬 1    📌 0
An image depicting new product features on Prolific, a platform which provides human data for research.

4 new security enhancements for better data quality in your academic research 🧵

With data quality both an absolute priority and a challenge for many researchers, we've made several important updates to strengthen security and maintain data quality on the Prolific platform.

#AcademicSky #PhDSky #Research

10.07.2025 14:00 - 👍 3    🔁 0    💬 1    📌 0

🎥 Webinar: Why AI leaderboards miss the mark 🚫

@cohereforai.bsky.social's Oliver Nan, University of Washington's @huashen.bsky.social, and Prolific's Nora Petrova will be discussing where leaderboards are falling short, how to improve them, and more.

🔗 www.prolific.com/resources/wh... | #MLSky #AI

07.07.2025 14:01 - 👍 4    🔁 0    💬 0    📌 1

@joinprolific is following 20 prominent accounts