🏆 Thrilled to share that our HateDay paper has received an Outstanding Paper Award at #ACL2025
Big thanks to my wonderful co-authors: @deeliu97.bsky.social, Niyati, @computermacgyver.bsky.social, Sam, Victor, and @paul-rottger.bsky.social!
Thread 👇 and data available at huggingface.co/datasets/man...
31.07.2025 08:05 — 👍 27 🔁 7 💬 2 📌 1
[ACL 2025] Timetable - Paul Röttger
Let me know if I missed anything in the timetables, and please say hi if you want to chat about sociotechnical alignment, safety, the societal impact of AI, or related topics :) Here is a link to the timetable sheet 👇 See you around!
docs.google.com/spreadsheets...
28.07.2025 06:12 — 👍 3 🔁 2 💬 0 📌 0
I will also be at @tiancheng.bsky.social's oral *today at 1430* in the SRW. Tiancheng will present a non-archival sneak peek of our work on benchmarking the ability of LLMs to simulate group-level human behaviours:
bsky.app/profile/tian...
28.07.2025 06:12 — 👍 4 🔁 3 💬 1 📌 0
Otherwise, you can find me in the audience of the great @manueltonneau.bsky.social oral *today at 1410*. Manuel will present our work on a first globally representative dataset of hate speech on Twitter:
bsky.app/profile/manu...
28.07.2025 06:12 — 👍 4 🔁 2 💬 1 📌 0
Finally, there are a couple of papers on *LLM persuasion* on the schedule today. Particularly looking forward to Jillian Fisher's talk on biased LLMs influencing political decision-making!
28.07.2025 06:12 — 👍 2 🔁 2 💬 1 📌 0
*Pluralism* in human values & preferences (e.g. with personalisation) will also only grow more important for a global diversity of users.
@morlikow.bsky.social is presenting our poster today at 1100. Also hyped for @michaelryan207.bsky.social's work and @verenarieser.bsky.social's keynote!
28.07.2025 06:12 — 👍 5 🔁 3 💬 1 📌 0
Measuring *social and political biases* in LLMs is more important than ever, now that >500 million people use LLMs.
I am particularly excited to check out work on this by @kldivergence.bsky.social @1e0sun.bsky.social @jacyanthis.bsky.social @anjaliruban.bsky.social
28.07.2025 06:12 — 👍 4 🔁 2 💬 1 📌 0
Very excited about all these papers on sociotechnical alignment & the societal impacts of AI at #ACL2025.
As is now tradition, I made some timetables to help me find my way around. Sharing here in case others find them useful too :) 🧵
28.07.2025 06:12 — 👍 26 🔁 6 💬 1 📌 0
📈Out today in @PNASNews!📈
In a large pre-registered experiment (n=25,982), we find evidence that scaling the size of LLMs yields sharply diminishing persuasive returns for static political messages.
🧵:
07.03.2025 18:28 — 👍 39 🔁 20 💬 1 📌 3
For sure -- question format can definitely have some effect, and humans are also inconsistent. The effects we observed for LLMs in our paper, though, went well beyond what one could reasonably expect from humans. All just goes to show we need more realistic evals 🙏
16.02.2025 19:23 — 👍 1 🔁 0 💬 0 📌 0
I also find it striking that the article does not discuss at all in what ways / on which issues the models have supposedly become more "right-wing". All they show is that GPT moves slightly towards the center of the political compass, but what does that actually mean? Sorry if I sound a bit frustrated 😅
15.02.2025 22:59 — 👍 6 🔁 0 💬 1 📌 0
Thanks, Marc! I would not read too much into these results tbh. The PCT has little to do with how people use LLMs, and the validity of the testing setup used here is very questionable. We actually had a paper on exactly this at ACL last year, if you're interested: aclanthology.org/2024.acl-lon...
15.02.2025 22:59 — 👍 9 🔁 0 💬 4 📌 1
Thanks, Marc. My intuition is that model developers may be more deliberate about how they want their models to behave than you frame it here (see GPT model spec or Claude constitution). So I think a lot of what we see is downstream from intentional design choices.
14.02.2025 07:26 — 👍 1 🔁 0 💬 0 📌 0
For claims about *political* bias we can then compare model issue bias to voter stances, as we do towards the end of the paper.
14.02.2025 07:20 — 👍 2 🔁 0 💬 0 📌 0
Thanks, Jacob. We also discussed this when writing the paper. In the end, our definition of issue bias (see the 2nd tweet in the thread, or better, the paper) is descriptive, not normative. At the issue level we say "bias = clear stance tendency across responses". Does that make sense to you?
14.02.2025 07:17 — 👍 7 🔁 0 💬 2 📌 0
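To make the descriptive definition above concrete, here is a minimal sketch of how an issue-level "clear stance tendency across responses" could be operationalised; the labels, counts, and 0.7 threshold are illustrative assumptions, not the paper's actual definition or data.

```python
# Illustrative operationalisation of "bias = clear stance tendency across
# responses" at the issue level. Labels, counts, and the 0.7 threshold are
# placeholders, not the paper's definition or data.
from collections import Counter

# Hypothetical stance labels for many model responses on one issue.
responses = ["supportive"] * 80 + ["neutral"] * 15 + ["opposed"] * 5

counts = Counter(responses)
top_stance, top_count = counts.most_common(1)[0]
tendency = top_count / len(responses)

# Call it a "clear" tendency if one stance dominates the responses.
has_clear_tendency = tendency >= 0.7
print(top_stance, round(tendency, 2), has_clear_tendency)
```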
We are very excited for people to use and expand IssueBench. All links are below. Please get in touch if you have any questions 🤗
Paper: arxiv.org/abs/2502.08395
Data: huggingface.co/datasets/Pau...
Code: github.com/paul-rottger...
13.02.2025 14:08 — 👍 9 🔁 1 💬 0 📌 0
It was great to build IssueBench with amazing co-authors @valentinhofmann.bsky.social, Musashi Hinck, @kobihackenburg.bsky.social, @valentinapy.bsky.social, Faeze Brahman, and @dirkhovy.bsky.social.
Thanks also to the @milanlp.bsky.social RAs, and Intel Labs and Allen AI for compute.
13.02.2025 14:08 — 👍 7 🔁 0 💬 2 📌 0
IssueBench is fully modular and easily expandable to other templates and issues. We also hope that the IssueBench formula can enable more robust and realistic bias evaluations for other LLM use cases such as information seeking.
13.02.2025 14:08 — 👍 5 🔁 0 💬 1 📌 0
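As a rough illustration of the modular "templates × issues" formula described above, here is a minimal sketch of how prompts could be generated by crossing writing-assistance templates with issues; the template strings and issue names are invented for illustration, not taken from the IssueBench dataset.

```python
# Illustrative sketch of the "templates x issues" idea behind IssueBench.
# The templates and issues below are invented; the real benchmark ships its
# own sets (see the Hugging Face dataset linked above).

templates = [
    "Write a short blog post arguing about {issue}.",
    "Help me draft a social media caption on {issue}.",
]

issues = [
    "electric cars",
    "same-sex marriage",
]

# Crossing every template with every issue yields the prompt set.
prompts = [t.format(issue=i) for t in templates for i in issues]

for p in prompts:
    print(p)
```

Expanding the benchmark then only means extending one of the two lists; the full prompt set is regenerated by the same cross product.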
Generally, we hope that IssueBench can bring a new quality of evidence to ongoing discussions about LLM (political) biases and how to address them. With hundreds of millions of people now using LLMs in their everyday life, getting this right is very urgent.
13.02.2025 14:08 — 👍 5 🔁 0 💬 1 📌 0
While the partisan bias is striking, we believe that it warrants research, not outrage. For example, models may express support for same-sex marriage not because Democrats do so, but because they were trained to be “fair and kind”.
13.02.2025 14:08 — 👍 11 🔁 1 💬 2 📌 0
Lastly, we use IssueBench to test for partisan political bias by comparing LLM biases to US voter stances on a subset of 20 issues. On these issues, models are much (!) more aligned with Democrat than Republican voters.
13.02.2025 14:08 — 👍 6 🔁 0 💬 1 📌 2
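For readers curious how such a comparison could look mechanically, here is a rough sketch of matching per-issue model stance tendencies against voter stances; all numbers, issue names, and the nearest-group heuristic are placeholder assumptions, not the paper's method or results.

```python
# Rough sketch of comparing per-issue model stance tendencies with voter
# stances. All numbers and issue names are placeholders, not paper results.

# Share of model responses that are supportive of each issue (0-1).
model_support = {"issue_a": 0.9, "issue_b": 0.2}

# Hypothetical share of Democrat / Republican voters supporting each issue.
voter_support = {
    "issue_a": {"dem": 0.8, "rep": 0.3},
    "issue_b": {"dem": 0.4, "rep": 0.7},
}

# For each issue, check which voter group the model's tendency sits closer to.
for issue, m in model_support.items():
    dem_gap = abs(m - voter_support[issue]["dem"])
    rep_gap = abs(m - voter_support[issue]["rep"])
    closer = "Democrat" if dem_gap < rep_gap else "Republican"
    print(f"{issue}: model tendency {m:.2f} is closer to {closer} voters")
```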
Notably, when there was a difference in bias between models, it was mostly due to Qwen. The two issues with the most divergence both relate to Chinese politics, and Qwen (developed in China) is more positive / less negative about these issues.
13.02.2025 14:08 — 👍 7 🔁 1 💬 1 📌 0
We were very surprised by just how similar LLMs were in their biases. Even across different model families (Llama, Qwen, OLMo, GPT-4), models showed very similar stance patterns across issues.
13.02.2025 14:08 — 👍 10 🔁 1 💬 2 📌 0
Overall, we found that the stronger a model's default stance on an issue, the harder it is to steer the model away from this stance. So if a model defaults to a positive stance on an issue, users will struggle more to make it express the opposite view.
13.02.2025 14:08 — 👍 8 🔁 1 💬 1 📌 1
But before that, we look at steerability:
Models are generally steerable, but will often *hedge* their responses. For example, models will argue that electric cars are bad if you ask them to, but not without also mentioning their benefits (4).
13.02.2025 14:08 — 👍 7 🔁 1 💬 1 📌 0
For example, models are most consistently positive about social justice and environmental issues. Many of these issues are politically contested (e.g. in the US), but for models they are very clear-cut.
We follow up on this further below.
13.02.2025 14:08 — 👍 9 🔁 2 💬 1 📌 0
Finally: results!
First, models express a very consistent stance on ≥70% of the issues in IssueBench. This is surprising since nearly all issues we test lack societal consensus. Yet models are often consistently positive (1, 2) or negative (4, 5).
13.02.2025 14:08 — 👍 7 🔁 1 💬 1 📌 0
For classifying the stance of each LLM response (so we can measure stance tendency), we introduce a response taxonomy that goes beyond just “positive” and “negative”. We also optimise a zero-shot prompt to automate this classification with high accuracy.
13.02.2025 14:08 — 👍 7 🔁 1 💬 1 📌 0
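A minimal sketch of what a zero-shot stance-classification prompt along these lines could look like; the label set, prompt wording, and helper function are illustrative assumptions rather than the paper's exact taxonomy or prompt.

```python
# Minimal sketch of zero-shot stance classification with a taxonomy beyond
# "positive" / "negative". The label set, prompt wording, and helper are
# illustrative assumptions, not the paper's exact setup.

LABELS = ["supportive", "opposed", "neutral / descriptive", "refusal"]

PROMPT = """You will see an LLM response about the issue: {issue}.
Classify the stance of the response as exactly one of: {labels}.

Response:
{response}

Stance:"""


def build_classifier_prompt(issue: str, response: str) -> str:
    """Fill the zero-shot prompt for one (issue, response) pair."""
    return PROMPT.format(issue=issue, labels=", ".join(LABELS), response=response)


# In a real pipeline this prompt would be sent to an LLM judge; here we just
# print it to show the structure.
print(build_classifier_prompt("electric cars", "Electric cars cut emissions, but..."))
```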