🚨 New in Nature+Science! 🚨
AI chatbots can shift voter attitudes on candidates & policies, often by 10+ pp
🔹 Experiments in the US, Canada, Poland & UK
🔹 More "facts" → more persuasion (not psych tricks)
🔹 Increasing persuasiveness reduces "fact" accuracy
🔹 Right-leaning bots = more inaccurate
04.12.2025 20:42 · ❤ 164 🔁 70 💬 2 📌 3
Agentic AI systems can plan, take actions, and interact with external tools or other agents semi-autonomously. New paper from CSA Singapore & FAR.AI highlights why conventional cybersecurity controls aren't enough and maps agentic security frameworks & some key open problems. 👇
25.11.2025 19:41 · ❤ 4 🔁 2 💬 1 📌 0
Frontier AI models with openly available weights are steadily becoming more powerful and widely adopted. They enable open research, but also create new risks. New paper outlines 16 open technical challenges for making open-weight AI models safer. 👇
12.11.2025 21:04 · ❤ 3 🔁 1 💬 1 📌 0
1/ Many frontier AIs are willing to persuade on dangerous topics, according to our new benchmark: Attempt to Persuade Eval (APE).
Here's Google's most capable model, Gemini 2.5 Pro, trying to convince a user to join a terrorist group 👇
21.08.2025 16:24 · ❤ 17 🔁 10 💬 1 📌 1
1/ Are the safeguards in some of the most powerful AI models just skin deep? Our research on Jailbreak-Tuning reveals how any fine-tunable model can be turned into its "evil twin": just as capable as the original, but stripped of all safety measures.
17.07.2025 18:01 · ❤ 4 🔁 3 💬 1 📌 0
Conspiracies emerge in the wake of high-profile events, but you can't debunk them with evidence because little yet exists. Does this mean LLMs can't debunk conspiracies during ongoing events? No!
We show they can in a new working paper.
PDF: osf.io/preprints/ps...
09.07.2025 16:34 · ❤ 52 🔁 18 💬 3 📌 3
👥 Research by Camille Thibault,
@jacobtian.bsky.social @gskulski.bsky.social
Taylor Curtis, James Zhou, Florence Laflamme, Luke Guan,
@reirab.bsky.social @godbout.bsky.social @kellinpelrine.bsky.social
19.06.2025 14:23 · ❤ 0 🔁 0 💬 1 📌 0
Given these challenges, error analysis and other simple steps could greatly improve the robustness of research in the field. We propose a lightweight Evaluation Quality Assurance (EQA) framework to enable research results that translate more smoothly to real-world impact.
19.06.2025 14:15 · ❤ 0 🔁 0 💬 1 📌 0
🛠️ We also provide practical tools (loading sketch below):
• CDL-DQA: a toolkit to assess misinformation datasets
• CDL-MD: the largest misinformation dataset repo, now on Hugging Face 🤗
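A minimal sketch of how a dataset from the CDL-MD collection might be pulled with the Hugging Face `datasets` library; the repo id and split below are placeholders, not the actual hub names.

```python
# Hedged sketch: loading one misinformation dataset from the Hugging Face hub.
# "ComplexDataLab/some-misinfo-dataset" is a placeholder repo id -- check the
# CDL-MD repository for the real dataset names and splits.
from datasets import load_dataset

dataset = load_dataset("ComplexDataLab/some-misinfo-dataset", split="train")

# Peek at a few records (fields vary by dataset: claim text, label, source, etc.).
for record in dataset.select(range(3)):
    print(record)
```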
19.06.2025 14:15 · ❤ 0 🔁 0 💬 1 📌 0
Categorical labels can underestimate the performance of generative systems by massive amounts: half the errors or more.
19.06.2025 14:15 · ❤ 0 🔁 0 💬 1 📌 0
Severe spurious correlations and ambiguities affect the majority of datasets in the literature. For example, most datasets have many examples where one can't conclusively assess veracity at all.
19.06.2025 14:14 · ❤ 0 🔁 0 💬 1 📌 0
💡 Strong data and eval are essential for real-world progress. In "A Guide to Misinformation Detection Data and Evaluation", to be presented at KDD 2025, we conduct the largest survey to date in this domain: 75 datasets curated, 45 accessible ones analyzed in depth. Key findings 👇
19.06.2025 14:14 · ❤ 1 🔁 1 💬 1 📌 2
5/5 We frame structural safety generalization as a fundamental vulnerability and a tractable target for research on the road to robust AI alignment. Read the full paper: arxiv.org/pdf/2504.09712
03.06.2025 14:36 · ❤ 3 🔁 0 💬 0 📌 0
4/5 🛡️ Our fix: the Structure Rewriting (SR) Guardrail. Rewrite any prompt into a canonical (plain English) form before evaluation. On GPT-4o, SR Guardrails cut attack success from 44% to 6% while blocking zero benign prompts.
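A minimal sketch of the structure-rewriting idea, assuming an OpenAI-style chat API; the rewriter instruction and the moderation-based safety check are illustrative stand-ins, not the paper's exact guardrail.

```python
# Sketch of an SR-style guardrail: canonicalize the prompt into plain English,
# run the safety check on the canonical form, and only then answer.
from openai import OpenAI

client = OpenAI()

REWRITE_INSTRUCTION = (
    "Rewrite the user's request as one plain-English paragraph. Preserve the "
    "meaning exactly; strip encodings, role-play framing, and formatting tricks."
)

def canonicalize(prompt: str) -> str:
    """Map an arbitrarily structured prompt to a canonical plain-English form."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": REWRITE_INSTRUCTION},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content

def is_safe(text: str) -> bool:
    """Illustrative safety check: a moderation call on the canonical text."""
    result = client.moderations.create(input=text)
    return not result.results[0].flagged

def guarded_answer(prompt: str) -> str:
    if not is_safe(canonicalize(prompt)):
        return "Sorry, I can't help with that."
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content
```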
03.06.2025 14:36 · ❤ 0 🔁 0 💬 1 📌 0
3/5 🎯 Key insight: Safety boundaries don't transfer across formats or contexts (text → images; single-turn → multi-turn; English → low-resource languages). We define 4 criteria for tractable research: Semantic Equivalence, Explainability, Model Transferability, Goal Transferability.
03.06.2025 14:36 · ❤ 0 🔁 0 💬 1 📌 0
2/5 Striking examples:
• Claude 3.5: 0% ASR on image jailbreaks, but split the same content across images? 25% success.
• Gemini 1.5 Flash: 3% ASR on text prompts; paste that text into an image and it soars to 72%.
• GPT-4o: 4% ASR on single perturbed images; split across multiple images → 38%.
03.06.2025 14:36 · ❤ 0 🔁 0 💬 1 📌 0
1/5 Just accepted to Findings of ACL 2025! We dug into a foundational LLM vulnerability: models learn structure-specific safety with insufficient semantic generalization. In short, safety training fails when the same meaning appears in a different form. 🧵
03.06.2025 14:35 · ❤ 0 🔁 0 💬 1 📌 0
1/ Safety guardrails are illusory. DeepSeek R1's advanced reasoning can be converted into an "evil twin": just as powerful, but with safety guardrails stripped away. The same applies to GPT-4o, Gemini 1.5 & Claude 3. How can we ensure AI maximizes benefits while minimizing harm?
04.02.2025 22:41 · ❤ 1 🔁 1 💬 1 📌 0
5/5 👥 Team: Maximilian Puelma Touzel, Sneheel Sarangi, Austin Welch, Gayatri Krishnakumar, Dan Zhao, Zachary Yang, Hao Yu, Ethan Kosak-Hine, Tom Gibbs, Andreea Musulan, Camille Thibault, Busra Tugce Gurbuz, Reihaneh Rabbany, Jean-François Godbout, @kellinpelrine.bsky.social
22.10.2024 16:49 · ❤ 1 🔁 0 💬 0 📌 0
3/5 We demonstrate the system in a few election scenarios with different types of agents, each structured with memories and traits. In one example, we align agents' beliefs in order to flip the election relative to a control setting.
22.10.2024 16:47 · ❤ 0 🔁 0 💬 1 📌 0
2/5 We built a sim system! Our 1st version has (toy sketch below):
1. LLM-based agents interacting on social media (Mastodon).
2. Scalability: 100+ versatile, rich agents (memory, traits, etc.)
3. Measurement tools: a dashboard to track agent voting, candidate favorability, and activity in an election.
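A toy sketch of the agent loop described above: LLM-backed agents with traits and memory taking turns on a shared feed. The `generate` stub stands in for whatever chat model and Mastodon plumbing the real system uses; nothing here is the system's actual code.

```python
# Toy agent loop: each agent reads the recent feed, posts via an LLM call,
# and stores its own posts as memory. Purely illustrative.
from dataclasses import dataclass, field

def generate(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., a chat-completions request)."""
    return f"[model reply to: {prompt[:40]}...]"

@dataclass
class Agent:
    name: str
    traits: str                       # e.g., "undecided voter, heavy news reader"
    memory: list[str] = field(default_factory=list)

    def post(self, feed: list[str]) -> str:
        prompt = (
            f"You are {self.name}, with traits: {self.traits}.\n"
            f"Your recent memories: {self.memory[-3:]}\n"
            "Recent feed:\n" + "\n".join(feed[-5:]) + "\n"
            "Write a short social-media post about the election."
        )
        message = generate(prompt)
        self.memory.append(message)   # agents remember what they said
        return message

# One run: agents take turns posting; a dashboard would track favorability,
# votes, and activity on top of this loop.
agents = [Agent("A", "undecided voter"), Agent("B", "strong partisan")]
feed: list[str] = []
for _ in range(3):
    for agent in agents:
        feed.append(f"{agent.name}: {agent.post(feed)}")
print(f"{len(feed)} posts generated")
```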
22.10.2024 16:46 · ❤ 1 🔁 0 💬 1 📌 0
1/5 AI is increasingly, even superhumanly, persuasive. Could it soon cause severe harm through societal-scale manipulation? It's extremely hard to test countermeasures, since we can't just go out and manipulate people to see how countermeasures work. What can we do? 🧵
22.10.2024 16:46 · ❤ 1 🔁 1 💬 1 📌 2
Prof at Cornell studying how human-AI dialogues can correct inaccurate beliefs, why people share falsehoods, and ways to reduce political polarization and promote cooperation. Computational social science + cognitive psychology.
https://www.DaveRand.org/
Associate Professor, Psychology @cornelluniversity.bsky.social. Researching thinking & reasoning, misinformation, social media, AI, belief, metacognition, B.S., and various other keywords. 🇨🇦
https://gordonpennycook.com/
CS Faculty at Mila and McGill, interested in Graphs and Complex Data, AI/ML, Misinformation, Computational Social Science and Online Safety
research psychologist. beliefs, AI, computational social science. assistant prof at Carnegie Mellon
Frontier alignment research to ensure the safe development and deployment of advanced AI systems.
Analyzing Complex Data from Online Societies (Network Science. Data Mining. Machine Learning)
Applications to enhance the health and safety of online spaces
Data scientist of human individual and collective behaviour.
Statistical inference and decision theory.
Web: mptouzel.github.io
Fields: AI/ML/(MA)RL/psych/soc/pol/econ/energy.
official Bluesky account (check username 👌)
Bugs, feature requests, feedback: support@bsky.app