LLMs' sycophancy issues are a predictable result of optimizing for user feedback. Even if overtly sycophantic behaviors get fixed, AI exploitation of our cognitive biases may only become more subtle.
Grateful our research on this was featured in @washingtonpost.com by @nitasha.bsky.social!
01.06.2025 18:25
First page of the paper Influencing Humans to Conform to Preference Models for RLHF, by Hatgis-Kessell et al.
Our proposed method of influencing human preferences.
RLHF algorithms assume humans generate preferences according to normative models. We propose a new method for model alignment: influence humans to conform to these assumptions through interface design. Good news: it works!
#AI #MachineLearning #RLHF #Alignment (1/n)
14.01.2025 23:51
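For context on the assumption the post refers to: RLHF pipelines commonly model human preferences with a Bradley-Terry (logistic) model, where the probability of preferring one trajectory over another depends on the difference in their predicted returns. The sketch below is a minimal illustration of that normative model, not code from the paper; the function name and return values are hypothetical.

```python
import math

def bradley_terry_prob(return_a: float, return_b: float) -> float:
    """P(human prefers segment A over B) under the Bradley-Terry
    preference model that RLHF algorithms typically assume: preferences
    are generated logistically from the difference in returns."""
    return 1.0 / (1.0 + math.exp(-(return_a - return_b)))

# Example: if A's predicted return exceeds B's by 1.0, the model
# assumes the human prefers A about 73% of the time.
print(bradley_terry_prob(2.0, 1.0))  # ~0.731
```

Real human annotators deviate from this model in systematic ways; the post's proposal is to reduce that gap from the human side, via interface design, rather than by changing the model.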