
Cas (Stephen Casper)

@scasper.bsky.social

AI technical governance & risk management research. PhD Candidate at MIT CSAIL. Also at https://x.com/StephenLCasper. https://stephencasper.com/

95 Followers  |  171 Following  |  62 Posts  |  Joined: 17.02.2025

Latest posts by scasper.bsky.social on Bluesky

https://chatgpt.com/share/68344bf6-6ee8-8011-9805-eae9a332f43d

We asked ChatGPT-o4-mini to assess our question design. On a scale from 0 to 10, where 0 indicates anti-regulatory bias, 5 indicates balance, and 10 indicates pro-regulatory bias, ChatGPT-o4-mini offered a median rating of 5 and a mean rating of 5.39.
t.co/nQZbEYaZB6

30.07.2025 11:16 — 👍 0    🔁 0    💬 0    📌 0
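As a rough illustration of the procedure described in the post above, the sketch below asks a model to score each survey question for regulatory bias on the same 0-10 scale and then aggregates the ratings. The question texts, prompt wording, and model identifier are placeholders, not the report's actual setup.

```python
# Hypothetical sketch (not the report's code): rate each survey question for
# regulatory bias on a 0-10 scale with an LLM, then aggregate the ratings.
import re
import statistics
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder question texts; the real survey items are in Appendix A.
questions = [
    "Placeholder survey question 1 about an AI policy proposal.",
    "Placeholder survey question 2 about an AI policy proposal.",
]

ratings = []
for q in questions:
    prompt = (
        "Rate the following survey question for bias on a 0-10 scale, where "
        "0 = anti-regulatory bias, 5 = balanced, and 10 = pro-regulatory bias. "
        f"Reply with a single number.\n\nQuestion: {q}"
    )
    resp = client.chat.completions.create(
        model="o4-mini",  # placeholder for whichever model/version was used
        messages=[{"role": "user", "content": prompt}],
    )
    match = re.search(r"\d+(?:\.\d+)?", resp.choices[0].message.content)
    if match:
        ratings.append(float(match.group()))

print("median:", statistics.median(ratings))
print("mean:", round(statistics.mean(ratings), 2))
```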
Preview
GitHub - thestephencasper/ai-policy-survey Contribute to thestephencasper/ai-policy-survey development by creating an account on GitHub.

The survey and analysis are fully reproducible. See the full survey in Appendix A, and download the data at this link:
github.com/thestephenca...

30.07.2025 11:16 — 👍 0    🔁 0    💬 1    📌 0
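A minimal sketch of what reproducing the headline support numbers from that repository might look like, assuming the data ships as a CSV with one column per proposal and 5-point Likert responses; the file name and column layout below are hypothetical, not the repo's actual schema.

```python
# Hypothetical sketch: load the survey data and compute, for each proposal,
# the share of respondents who support it.
import pandas as pd

df = pd.read_csv("survey_responses.csv")  # placeholder file name

# Assume one column per proposal, with responses on a 5-point Likert scale.
proposal_cols = [c for c in df.columns if c.startswith("proposal_")]

for col in proposal_cols:
    support_share = df[col].isin(["support", "strong support"]).mean()
    print(f"{col}: {support_share:.0%} indicated support or strong support")
```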
Post image

All demographics that we surveyed were supportive of regulatory proposals on average.

30.07.2025 11:16 — 👍 0    🔁 0    💬 1    📌 0
Post image Post image

Respondents were broadly supportive across states and political identification. For 17 of the 18 proposals, a majority of respondents indicated "support" or "strong support". Conservatives expressed net support overall, though less strongly than Liberals.

30.07.2025 11:16 — 👍 0    🔁 0    💬 1    📌 0
Post image

🚨 New report 🚨

What does the public think about **specific** AI policy proposals? We asked 300 working-class adults in CA, IL, and NY.
zenodo.org/records/1656...

30.07.2025 11:16 — 👍 1    🔁 0    💬 1    📌 0

I think a couple of missteps were made in the telling of the story in chapter 13. But overall, I think the book is fantastic. You should check it out if you’re interested.

16.07.2025 01:28 — 👍 1    🔁 0    💬 0    📌 0
Post image

I just finished Empire of AI.

Usually, the story of AI is told as a story about progress in capabilities, driven by R&D. But in this book, @_KarenHao tells it as a story about power, driven by people pursuing it.

16.07.2025 01:28 — 👍 1    🔁 0    💬 1    📌 0

For anyone curious, I have a temporary residency at UK AISI but I’ll be back at MIT in the fall.

13.07.2025 21:33 — 👍 0    🔁 0    💬 0    📌 0
Post image

Hi, I am in Vancouver for ICML and THE technical AI governance workshop. Message me if you’d like to talk about technical governance research or UK AISI’s safeguards work.

13.07.2025 21:33 — 👍 1    🔁 0    💬 1    📌 0
Preview
The Singapore Consensus on Global AI Safety Research Priorities Rapidly improving AI capabilities and autonomy hold significant promise of transformation, but are also driving vigorous debate on how to ensure that AI is safe, i.e., trustworthy, reliable, and secur...

The Singapore Consensus is on arXiv now -- arxiv.org/abs/2506.20702

It offers:
1. An overview of consensus technical AI safety priorities
2. An example of widespread international collab & agreement

27.06.2025 22:24 — 👍 4    🔁 1    💬 0    📌 0
Video thumbnail

Making LLMs robust to tampering attacks might be one of the biggest priorities for safeguards research.

@scasper.bsky.social argues this resistance may predict & upper-bound overall AI robustness, making it a key safety priority over the next year.

23.06.2025 15:32 — 👍 2    🔁 1    💬 1    📌 0
Post image

Let me know if you’d like to talk at @facct.bsky.social!

23.06.2025 12:20 — 👍 2    🔁 0    💬 0    📌 0
Preview
Post-AGI Civilizational Equilibria Workshop | Vancouver 2025 Are there any good ones? Join us in Vancouver on July 14th, 2025 to explore stable equilibria and human agency in a post-AGI world. Co-located with ICML.

It's hard to plan for AGI without knowing what outcomes are even possible, let alone good. So we’re hosting a workshop!

Post-AGI Civilizational Equilibria: Are there any good ones?

Vancouver, July 14th
www.post-agi.org

Featuring: Joe Carlsmith, @richardngo.bsky.social, Emmett Shear ... 🧡

18.06.2025 18:12 — 👍 7    🔁 3    💬 1    📌 0

I recently learned that AI-generated text is sometimes sent to users with Unicode watermarks that use nonstandard Unicode characters.

It would also be nice to have an invisible Unicode character that means "Author does not consent to AI training on this document."

17.06.2025 07:04 — 👍 2    🔁 0    💬 0    📌 0
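For illustration, the sketch below scans a string for invisible or format-control Unicode code points of the kind sometimes used to watermark generated text; the specific code-point list is illustrative, not any official watermarking scheme.

```python
# Minimal sketch: flag invisible or format-control Unicode code points of the
# kind sometimes used to watermark generated text. The explicit set below is
# illustrative; the general category "Cf" check catches most such characters.
import unicodedata

SUSPECT_CODEPOINTS = {
    "\u200b",  # zero-width space
    "\u200c",  # zero-width non-joiner
    "\u200d",  # zero-width joiner
    "\u2060",  # word joiner
    "\ufeff",  # zero-width no-break space / BOM
}

def find_invisible_chars(text: str):
    """Return (index, codepoint, name) for characters that render invisibly."""
    hits = []
    for i, ch in enumerate(text):
        if ch in SUSPECT_CODEPOINTS or unicodedata.category(ch) == "Cf":
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return hits

sample = "This sentence\u200b contains a zero-width space."
print(find_invisible_chars(sample))  # [(13, 'U+200B', 'ZERO WIDTH SPACE')]
```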
Post image

YGTBFKM

11.06.2025 05:20 — 👍 0    🔁 0    💬 0    📌 0
Video thumbnail

🎥 Singapore Alignment Workshop videos are live! Hear from @yoshuabengio.bsky.social @scasper.bsky.social @zicokolter.bsky.social @sivareddyg.bsky.social @shaynelongpre.bsky.social @markbrakel.bsky.social @achan96.bsky.social @teganmaharaj.bsky.social + more. Full playlist 👇

26.05.2025 15:31 — 👍 4    🔁 1    💬 1    📌 0
Post image Post image Post image

There has been a recent rise of open models specifically fine-tuned to lack safeguards. I think it's interesting and politically revealing that the fine-tuners of these models call them "uncensored" instead of alternatives like "unrestricted" or "helpful-only."

26.05.2025 14:23 — 👍 0    🔁 0    💬 0    📌 0

For example, it seems debatable whether creating AI offices, curating voluntary standards, creating infrastructure for voluntary reporting, passing whistleblower protections, regulating the supply chain, defining/clarifying AI torts, or setting up regulatory markets would count.

24.05.2025 21:45 — 👍 0    🔁 0    💬 0    📌 0

There are a lot of things that states could do that would debatably be prohibited, depending on whether "regulating AI" is interpreted to mean that the AI is related to, affected by, or the object of the regulation.

24.05.2025 21:45 — 👍 0    🔁 0    💬 1    📌 0
Post image

🧡A challenge with H.R.1's moratorium on state-level AI regulation is that it's not clear what "regulating" AI means. The bill doesn't define "regulation" in this context.

24.05.2025 21:45 — 👍 0    🔁 0    💬 1    📌 0

I think that right now, some of the biggest barriers to further progress in this space are the lack of good model organisms for safe pretraining, the lack of better benchmarks, and simply limited effort and interest in tamper-resistance research.

03.05.2025 10:13 — 👍 0    🔁 0    💬 0    📌 0
https://arxiv.org/abs/2411.08088

Ultimately, if we succeed at making more tamper-resistant LLMs, this could be the basis for a safety case for closed models, or part of a safer-than-alternatives case for an open model.

See also this paper: t.co/26UsrevbI7

03.05.2025 10:13 — 👍 0    🔁 0    💬 1    📌 0
Post image Post image

Third, despite past failures, I think progress is possible. So far, most work has been under the unlearning paradigm where you start with an LLM, a retain set, and a forget set. I think it's time to think outside this box.

Two recent examples:

03.05.2025 10:13 — 👍 0    🔁 0    💬 1    📌 0
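For context, here is a minimal sketch of the retain/forget unlearning objective the post refers to, assuming a HuggingFace-style causal LM whose forward pass returns a loss when labels are provided; the alpha weighting and batch handling are illustrative.

```python
# Minimal sketch of the standard retain/forget unlearning objective: descend
# the loss on the retain set while ascending it on the forget set. Assumes a
# HuggingFace-style causal LM whose forward pass returns .loss given labels.
def unlearning_loss(model, retain_batch, forget_batch, alpha=1.0):
    """Lower is better: preserve retain-set behavior, remove forget-set behavior."""
    retain_loss = model(**retain_batch, labels=retain_batch["input_ids"]).loss
    forget_loss = model(**forget_batch, labels=forget_batch["input_ids"]).loss
    # Gradient descent on the retain loss, gradient ascent on the forget loss.
    return retain_loss - alpha * forget_loss

def unlearning_step(model, optimizer, retain_batch, forget_batch, alpha=1.0):
    loss = unlearning_loss(model, retain_batch, forget_batch, alpha)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```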
Post image

Second, if an LLM has a capability (even one hidden behind a backdoor trigger), evidence suggests that it can be elicited with few-shot fine-tuning. So, by modus tollens: if the LLM is tamper-resistant, it probably deeply lacks the capability.

Papers you can cite for this:

03.05.2025 10:13 — 👍 1    🔁 0    💬 1    📌 0
Post image

First, LLMs currently remain very vulnerable to few-shot tampering. Here are some papers that I usually cite when making this point. SOTA tamper resistance techniques are pretty ineffective, often conferring resistance for just *dozens* of adversarial fine-tuning steps.

03.05.2025 10:13 — 👍 1    🔁 0    💬 1    📌 0
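As a rough sketch of the kind of few-shot tampering evaluation described above: fine-tune a safeguarded open-weight model for a few dozen steps on adversarial request/response pairs and check whether its refusals survive. The model name, data, and hyperparameters below are placeholders, not any specific paper's setup.

```python
# Hypothetical few-shot tampering evaluation: a few dozen adversarial
# fine-tuning steps, then a probe to see whether refusals survive.
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "some-org/safeguarded-model"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

# A small set of adversarial (request, compliant response) texts -- the
# "few shots" in a few-shot tampering attack.
attack_texts = ["<placeholder harmful request and compliant response>"] * 16

optimizer = AdamW(model.parameters(), lr=2e-5)
num_steps = 50  # "dozens" of fine-tuning steps, per the post above

for step in range(num_steps):
    batch = tokenizer(
        attack_texts[step % len(attack_texts)], return_tensors="pt", truncation=True
    )
    out = model(**batch, labels=batch["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Evaluation: does the model still refuse a held-out harmful prompt?
model.eval()
probe = tokenizer("<placeholder held-out harmful prompt>", return_tensors="pt")
with torch.no_grad():
    completion = model.generate(**probe, max_new_tokens=64)
print(tokenizer.decode(completion[0], skip_special_tokens=True))
```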

🧡 🧡 🧡 I think that making LLMs resistant to exhibiting harmful behavior under few-shot tampering attacks might be one of the most useful goals for AI risk management in the next year.

03.05.2025 10:13 — 👍 1    🔁 1    💬 1    📌 0
Post image

Hi, I’m at ICLR β€” let’s talk about AI evals, safeguards, and technical governance.

24.04.2025 13:28 — 👍 3    🔁 0    💬 0    📌 0

This work was done at MATS, led by Leon Staufer and Mick Yang. Great to collaborate with @ankareuel.bsky.social as well.

21.04.2025 17:10 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
Preview
Audit Cards: Contextualizing AI Evaluations AI governance frameworks increasingly rely on audits, yet the results of their underlying evaluations require interpretation and context to be meaningfully informative. Even technically rigorous evalu...

See the paper here!

arxiv.org/abs/2504.13839

For the full audit card components and checklist, see Appendix C (They were too big to fit into the main paper. 😱)

21.04.2025 17:10 — 👍 0    🔁 0    💬 1    📌 0
Post image

Finally, we interviewed 10 experts in the AI evals space for their firsthand insights on current challenges with high-quality AI audits.

21.04.2025 17:10 — 👍 0    🔁 0    💬 1    📌 0
