
Gillian Hadfield

@ghadfield.bsky.social

Economist and legal scholar turned AI researcher focused on AI alignment and governance. Prof of government and policy and of computer science at Johns Hopkins, where I run the Normativity Lab. Recruiting CS postdocs and PhD students. gillianhadfield.org

1,174 Followers  |  1,147 Following  |  48 Posts  |  Joined: 19.11.2024

Latest posts by ghadfield.bsky.social on Bluesky

(2/2) Insurers profit by preventing losses, not paying claims, so they'll invest in figuring out what actually makes AI safer. Working with Fathom, we're proposing legislation where government sets acceptable risk levels and private evaluators verify companies meet them.

20.11.2025 00:00 – 👍 1    🔁 0    💬 0    📌 0
Insurance companies are trying to avoid big payouts by making AI safer. As government regulation lags, some insurance companies see a business case for pushing AI companies to minimize risk and adopt stronger guardrails.

(1/2) 99% of surveyed businesses have lost money from AI failures; two-thirds lost over $1M, according to Ernst & Young. Insurance companies are stepping in: meet verifiable safety standards, get coverage. Don't meet them, you're on your own. I spoke with NBC News: buff.ly/wkmPooC

20.11.2025 00:00 – 👍 4    🔁 2    💬 1    📌 0
Gillian Hadfield - Alignment is social: lessons from human alignment for AI
Current approaches conceptualize the alignment challenge as one of eliciting individual human preferences and training models to choose outputs that satisfy those preferences. To the extent…

The recording of my keynote from #COLM2025 is now available!

06.11.2025 21:35 – 👍 10    🔁 3    💬 0    📌 0
Gillian Hadfield and Thomas Friedman stand together smiling in an office, each holding their respective books - Hadfield holds 'Rules for a Flat World' and Friedman holds 'The World is Flat.'

I finally got a chance to meet @thomaslfriedman.bsky.social, whose book The World Is Flat inspired my own Rules for a Flat World. I had a great conversation with him and Andrew Freedman about the challenge we find the world facing: how do we build rules for AI that work in a complex world?

31.10.2025 22:39 – 👍 7    🔁 1    💬 0    📌 0

Human cooperation evolved through complex norms and institutions. Now we're introducing powerful new AI actors into our economic systems. At a recent workshop hosted at ASU we explored what evolution teaches us about getting the rules right.

27.10.2025 22:06 – 👍 7    🔁 1    💬 0    📌 0

Grateful to keynote at #COLM2025. Here's what we're missing about AI alignment: Humans don't cooperate just by aggregating preferences; we build social processes and institutions to generate norms that make it safe to trade with strangers. AI needs to work within these same systems, not replace them.

15.10.2025 23:00 – 👍 15    🔁 3    💬 1    📌 0
Talk Isn't Always Cheap: Understanding Failure Modes in Multi-Agent Debate. While multi-agent debate has been proposed as a promising strategy for improving AI reasoning ability, we find that debate can sometimes be harmful rather than helpful. The prior work has exclusively…

Future work should focus on developing smarter debate protocols that weight expertise, discourage blind agreement, and reward critical verification of reasoning. We need to move beyond the naive assumption that 'more talk = better outcomes.' (10/10) arxiv.org/abs/2509.05396

23.09.2025 17:06 – 👍 2    🔁 0    💬 1    📌 0
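The protocol ideas in this closing post can be made concrete. Below is a minimal Python sketch, not from the paper, of a debate round that weights answers by self-reported confidence instead of one-agent-one-vote; the agent interface and the weighting rule are illustrative assumptions.

```python
# Hypothetical sketch of a confidence-weighted debate round. The agent
# interface (each agent returns an answer plus a confidence in [0, 1])
# and the weighting rule are assumptions for illustration, not the
# authors' implementation.
from collections import defaultdict
from typing import Callable

Agent = Callable[[str, list[str]], tuple[str, float]]

def weighted_debate_round(agents: list[Agent], question: str,
                          transcript: list[str]) -> str:
    """Aggregate answers by summed confidence rather than a simple majority
    vote, so a confident dissenter is not drowned out by agreeable peers."""
    votes: dict[str, float] = defaultdict(float)
    for agent in agents:
        answer, confidence = agent(question, transcript)
        votes[answer] += confidence
        transcript.append(f"{answer} (conf={confidence:.2f})")
    # Return the answer with the highest total confidence mass.
    return max(votes, key=votes.get)
```

Discouraging blind agreement could then be a penalty on answers that merely echo the current majority, and rewarding verification would need a check of the cited reasoning itself, which this sketch does not attempt.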

We suspect RLHF training creates sycophantic behavior: models trained to be agreeable may prioritize consensus over critical evaluation. This suggests current alignment techniques might undermine collaborative reasoning.

23.09.2025 17:06 – 👍 2    🔁 0    💬 1    📌 0

Stronger agents were more likely to change from correct to incorrect answers in response to weaker agents' reasoning than vice versa. Models tended to favor agreement over critical evaluation, creating an echo chamber instead of an actual debate.

23.09.2025 17:06 – 👍 1    🔁 0    💬 1    📌 0
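A minimal sketch of how the asymmetry described above could be measured, assuming a per-round record of each agent's answer plus a gold label; the data layout is an illustrative assumption, not the authors' code.

```python
# Hypothetical flip-rate measurement: count correct->incorrect versus
# incorrect->correct answer changes between consecutive debate rounds.
# The history layout (one dict per round, agent name -> answer) is an
# assumption for illustration.
def flip_rates(history: list[dict[str, str]], gold: str) -> tuple[int, int]:
    """Return (harmful_flips, helpful_flips) summed over all agents."""
    harmful = helpful = 0
    for prev, curr in zip(history, history[1:]):
        for agent, old_answer in prev.items():
            new_answer = curr[agent]
            if old_answer == gold and new_answer != gold:
                harmful += 1   # was right, got talked out of it
            elif old_answer != gold and new_answer == gold:
                helpful += 1   # was wrong, corrected by debate
    return harmful, helpful
```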

However, we still observed performance gains on math problems under most conditions, suggesting debate effectiveness depends heavily on the type of reasoning required.

23.09.2025 17:06 – 👍 1    🔁 0    💬 1    📌 0

The impact varies significantly by task type. On CommonSenseQA (a dataset we newly examined), debate reduced performance across ALL experimental conditions.

23.09.2025 17:06 – 👍 1    🔁 0    💬 1    📌 0

Even when stronger models outweighed weaker ones, group accuracy decreased over successive debate rounds. Introducing weaker models into debates produced results worse than when agents hadn't engaged in discussion at all.

23.09.2025 17:06 – 👍 1    🔁 0    💬 1    📌 0

We tested debate effectiveness across three tasks (CommonSenseQA, MMLU, GSM8K) using three different models (GPT-4o-mini, LLaMA-3.1-8B, Mistral-7B) in various configurations.

23.09.2025 17:06 – 👍 1    🔁 0    💬 1    📌 0
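For readers who want the shape of the setup, here is a minimal sketch of the task-by-model grid the post describes; run_debate and load_task are hypothetical placeholders, and the round counts and group sizes are assumed values for illustration.

```python
# Hypothetical sketch of the experimental grid: three tasks, three models,
# varied debate configurations. `run_debate` and `load_task` are placeholder
# callables, and ROUNDS / GROUP_SIZES are assumed, not from the paper.
import itertools

TASKS = ["CommonSenseQA", "MMLU", "GSM8K"]
MODELS = ["gpt-4o-mini", "llama-3.1-8b", "mistral-7b"]
ROUNDS = [1, 2, 3]       # successive debate rounds
GROUP_SIZES = [3, 6]     # agents per debate

def run_grid(run_debate, load_task):
    results = {}
    for task, rounds, size in itertools.product(TASKS, ROUNDS, GROUP_SIZES):
        questions = load_task(task)
        # Mixed-capability groups: cycle the model list so stronger and
        # weaker models debate together.
        group = [MODELS[i % len(MODELS)] for i in range(size)]
        results[(task, rounds, size)] = run_debate(group, questions,
                                                   num_rounds=rounds)
    return results
```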

We found that multi-agent debate among large language models can sometimes harm performance rather than improve it, contradicting the assumption that more discussion leads to better outcomes.

23.09.2025 17:06 – 👍 1    🔁 0    💬 1    📌 0

My lab members Harsh Satija and Andrea Wynn and I have a new preprint examining AI multi-agent debate among diverse models, based on our ICML MAS 2025 workshop.

23.09.2025 17:06 – 👍 2    🔁 0    💬 1    📌 0

Using debate among AI agents has been proposed as a promising strategy for improving AI reasoning capabilities. Our new research shows that this strategy can often have the opposite effect - and the implications for AI deployment are significant. (1/10) arxiv.org/abs/2509.05396

23.09.2025 17:06 – 👍 7    🔁 1    💬 1    📌 1
Jobs. I have postdoc and staff openings for our lab at the Johns Hopkins University in either Baltimore, MD or Washington, DC. Postdoctoral Fellow: We are hiring an interdisciplinary scholar with a track re…

These roles will shape the conversation on AI and provide the opportunity for rich, interdisciplinary collaboration with colleagues and researchers in the Department of Computer Science and the School of Government and Policy.
Please spread the word in your network! 5/5
gillianhadfield.org/jobs/

16.06.2025 18:18 – 👍 0    🔁 0    💬 0    📌 0

We're recruiting a postdoctoral fellow with a track record in computational modeling of AI systems and autonomous AI agent dynamics, and experience with ML systems, to investigate the foundations of human normativity and how to build AI systems aligned with human values. 4/5

16.06.2025 18:17 – 👍 0    🔁 0    💬 0    📌 0

We're hiring an AI Communications Associate to craft and execute a multi-channel strategy that turns leading computer science and public policy research into accessible content for a broad audience of stakeholders. 3/5

16.06.2025 18:16 – 👍 0    🔁 0    💬 0    📌 0

We're hiring an AI Policy Researcher to conduct in-depth research into the technical and policy challenges in AI alignment, safety, and governance, and to produce high-quality research reports, white papers, and policy recommendations. 2/5

16.06.2025 18:15 – 👍 2    🔁 0    💬 0    📌 0

My lab @johnshopkins is recruiting research and communications professionals, and AI postdocs to advance our work ensuring that AI is safe and aligned to human well-being worldwide. 1/5

16.06.2025 18:15 – 👍 1    🔁 1    💬 4    📌 0
Can a market-based regulatory framework help govern AI? New report weighs in โ€” Schwartz Reisman Institute In April 2024, the Schwartz Reisman Institute for Technology and Society (SRI) hosted a workshop that brought together 33 high-level experts to explore the viability of regulatory markets. Over the co...

Our report is now out, chock-a-block with new ideas including insurance partnerships, government oversight of private regulators, building a robust ecosystem, and fostering trust and investment. Check it out here: srinstitute.utoronto.ca/news/co-desi...

12.06.2025 00:34 – 👍 4    🔁 0    💬 0    📌 0

destabilize or harm our communities, economies, or politics. Together with @djjrjr.bsky.social and @torontosri.bsky.social we held a design workshop last year with a stunning group of experts from AI labs, regulatory technology startups, enterprise clients, civil society, academia, and government. 2/3

12.06.2025 00:33 – 👍 0    🔁 0    💬 1    📌 0

Six years ago @jackclarksf.bsky.social and I proposed regulatory markets as a new model for AI governance that would attract more investment (money and brains) in a democratically legitimate way, fostering AI innovation while ensuring these powerful technologies don't 1/3

12.06.2025 00:32 – 👍 4    🔁 1    💬 1    📌 0
Interview with Gillian Hadfield: Normative infrastructure for AI alignment - AIhub

In this insightful interview, AIhub ambassador Kumar Kshitij Patel met @ghadfield.bsky.social, keynote speaker at @ijcai.org, to find out more about her interdisciplinary research, career trajectory, AI alignment, and her thoughts on AI systems in general.

aihub.org/2025/05/22/i...

23.05.2025 14:47 – 👍 5    🔁 3    💬 0    📌 0
AIhub monthly digest: May 2025 – materials design, object state classification, and real-time monitoring for healthcare data - AIhub

Our latest monthly digest features:
-Ananya Joshi on healthcare data monitoring
-AI alignment with @ghadfield.bsky.social
-Onur Boyar on drug and material design
-Object state classification with Filippos Gouidis
aihub.org/2025/05/30/a...

04.06.2025 15:13 – 👍 3    🔁 1    💬 0    📌 0
Voters Were Right About the Economy. The Data Was Wrong. Here's why unemployment is higher, wages are lower and growth less robust than government statistics suggest.

Everyone, including those who think we're building powerful AI to improve lives for everyone, should take seriously how poorly our current economic indicators (unemployment, earnings, inflation) capture the well-being of low- and moderate-income folks. www.politico.com/news/magazin...

15.02.2025 15:58 – 👍 4    🔁 0    💬 0    📌 1
China, US should fight rogue AI risks together, despite tensions: ex-diplomat. Open-source AI models like DeepSeek allow collaborators to find security vulnerabilities more easily, Fu Ying tells Paris' AI Action Summit.

I was at this meeting Mon, and the quality & seriousness of discussion made it a highlight. But Fu Ying is right that forging the cooperation needed, even limited to the extreme risks that threaten everyone, is becoming ever harder. We must keep trying.
www.scmp.com/news/china/d...

14.02.2025 12:15 – 👍 2    🔁 1    💬 0    📌 0

I think that would only require "read" access

05.02.2025 02:18 – 👍 3    🔁 0    💬 0    📌 0

Do we think Musk is using treasury payments data to train, fine-tune, or do inference on AI models? @caseynewton.bsky.social

04.02.2025 21:20 – 👍 8    🔁 1    💬 1    📌 0
