https://chatgpt.com/share/68344bf6-6ee8-8011-9805-eae9a332f43d
We asked ChatGPT-o4-mini to assess our question design. On a scale from 0 to 10, where 0 indicates anti-regulatory bias, 5 indicates balance, and 10 indicates pro-regulatory bias, ChatGPT-o4-mini offered a median rating of 5 and a mean rating of 5.39.
t.co/nQZbEYaZB6
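For anyone who wants to reproduce this kind of check, here is a minimal sketch. Everything specific in it is an assumption rather than something taken from the report: the OpenAI Python client, the model id "o4-mini", the prompt wording, and the placeholder QUESTIONS list standing in for the real survey items.

```python
# Minimal sketch of the bias-rating check described above. Assumptions (not from
# the report): the OpenAI Python client, the model id "o4-mini", the prompt
# wording, and a placeholder QUESTIONS list standing in for the survey items.
from statistics import mean, median

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

QUESTIONS = [
    "Placeholder survey question 1 ...",
    "Placeholder survey question 2 ...",
]

PROMPT = (
    "On a scale from 0 to 10, where 0 indicates anti-regulatory bias, "
    "5 indicates balance, and 10 indicates pro-regulatory bias, rate the "
    "following survey question. Reply with a single number.\n\n{question}"
)

ratings = []
for q in QUESTIONS:
    resp = client.chat.completions.create(
        model="o4-mini",
        messages=[{"role": "user", "content": PROMPT.format(question=q)}],
    )
    ratings.append(float(resp.choices[0].message.content.strip()))

print(f"median = {median(ratings)}, mean = {mean(ratings):.2f}")
```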
30.07.2025 11:16
GitHub - thestephencasper/ai-policy-survey
The survey and analysis are fully reproducible. See the full survey in Appendix A, and download the data at this link:
github.com/thestephenca...
30.07.2025 11:16
All demographics that we surveyed were supportive of regulatory proposals on average.
30.07.2025 11:16
Respondents were broadly supportive across states and political identification. For 17 of the 18 proposals, a majority of respondents indicated "support" or "strong support". Conservatives expressed net support overall, but less than Liberals did.
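As a rough illustration of how this "majority support" tally could be computed from the released data, here is a sketch in pandas. The file name and column names (survey_responses.csv, proposal, response) are hypothetical; the actual schema in the repo may differ.

```python
# Sketch of the "17 of 18 proposals have majority support" tally. Hypothetical
# schema: a long-format CSV with one row per (respondent, proposal) pair and a
# Likert "response" column; the real file and column names may differ.
import pandas as pd

df = pd.read_csv("survey_responses.csv")  # placeholder file name

support_labels = {"Support", "Strong support"}
support_rate = (
    df.assign(is_support=df["response"].isin(support_labels))
      .groupby("proposal")["is_support"]
      .mean()
      .sort_values(ascending=False)
)

print(support_rate)
print(f"{(support_rate > 0.5).sum()} of {len(support_rate)} proposals have majority support")
```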
30.07.2025 11:16
🚨 New report 🚨
What does the public think about **specific** AI policy proposals? We asked 300 working-class adults in CA, IL, and NY.
zenodo.org/records/1656...
30.07.2025 11:16
I think a couple missteps were made in the telling of the story in chapter 13. But overall, I think the book is fantastic. You should check it out if you're interested.
16.07.2025 01:28
I just finished Empire of AI.
Usually, the story of AI is told as a story about progress in capabilities, driven by R&D. But in this book, @_KarenHao tells it as a story about power, driven by people pursuing it.
16.07.2025 01:28
For anyone curious, I have a temporary residency at UK AISI, but I'll be back at MIT in the fall.
13.07.2025 21:33
Hi, I am in Vancouver for ICML and THE technical AI governance workshop. Message me if you'd like to talk about technical governance research or UK AISI's safeguards work.
13.07.2025 21:33
Making LLMs robust to tampering attacks might be one of the biggest priorities for safeguards research.
@scasper.bsky.social argues this resistance may predict & upper-bound overall AI robustness, making it a key safety priority over the next year.
23.06.2025 15:32
Let me know if you'd like to talk at @facct.bsky.social!
23.06.2025 12:20
Post-AGI Civilizational Equilibria Workshop | Vancouver 2025
Are there any good ones? Join us in Vancouver on July 14th, 2025 to explore stable equilibria and human agency in a post-AGI world. Co-located with ICML.
It's hard to plan for AGI without knowing what outcomes are even possible, let alone good. So we're hosting a workshop!
Post-AGI Civilizational Equilibria: Are there any good ones?
Vancouver, July 14th
www.post-agi.org
Featuring: Joe Carlsmith, @richardngo.bsky.social, Emmett Shear ... 🧵
18.06.2025 18:12
I recently learned that AI-generated text is sometimes sent to users with watermarks made of nonstandard Unicode characters.
But it would also be nice to have an invisible Unicode character that means "Author does not consent to AI training on this document."
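A quick way to check a document for this kind of watermarking is to scan it for invisible or format-category code points. The set of characters flagged below is illustrative, not any vendor's actual watermark scheme.

```python
# Sketch: flag invisible / nonstandard Unicode characters of the kind used for
# such watermarks. The "suspicious" set below is illustrative, not a canonical
# watermark specification.
import unicodedata

SUSPICIOUS = {
    "\u200b",  # zero-width space
    "\u200c",  # zero-width non-joiner
    "\u200d",  # zero-width joiner
    "\u2060",  # word joiner
    "\ufeff",  # zero-width no-break space (BOM)
    "\u00a0",  # no-break space
}

def find_watermark_chars(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, name) for invisible, format, or private-use characters."""
    hits = []
    for i, ch in enumerate(text):
        if ch in SUSPICIOUS or unicodedata.category(ch) in {"Cf", "Co"}:
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return hits

print(find_watermark_chars("plain text\u200bwith a zero-width space"))
```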
17.06.2025 07:04
YGTBFKM
11.06.2025 05:20
🎥 Singapore Alignment Workshop videos are live! Hear from @yoshuabengio.bsky.social @scasper.bsky.social @zicokolter.bsky.social @sivareddyg.bsky.social @shaynelongpre.bsky.social @markbrakel.bsky.social @achan96.bsky.social @teganmaharaj.bsky.social + more. Full playlist 👇
26.05.2025 15:31
For example, it seems debatable whether creating AI offices, curating voluntary standards, creating infrastructure for voluntary reporting, passing whistleblower protections, regulating the supply chain, defining/clarifying AI torts, or setting up regulatory markets would count.
24.05.2025 21:45
There are a lot of things that states could do that would arguably be prohibited, depending on whether "regulating AI" is interpreted to mean that the AI is related to, affected by, or the object of the regulation.
24.05.2025 21:45
🧵 A challenge with H.R.1's moratorium on state-level AI regulation is that it's not clear what "regulating" AI means. The bill doesn't define "regulation" in this context.
24.05.2025 21:45
I think that right now, some of the biggest barriers to more progress in the space are the lack of nice model organisms of safe pretraining, the lack of better benchmarks, and just the lack of effort/interest in tamper resistance research.
03.05.2025 10:13
https://arxiv.org/abs/2411.08088
Ultimately, if we succeed at making more tamper-resistant LLMs, this could be the basis for a safety case for closed models, or part of a safer-than-alternatives case for an open model.
See also this paper: t.co/26UsrevbI7
03.05.2025 10:13
Third, despite past failures, I think progress is possible. So far, most work has been under the unlearning paradigm where you start with an LLM, a retain set, and a forget set. I think it's time to think outside this box.
Two recent examples:
03.05.2025 10:13
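Not one of the two examples referenced above; this is just a minimal sketch of the standard retain/forget unlearning setup described in the previous post, assuming a HuggingFace causal LM. The model id, data, and hyperparameters are placeholders, and the objective shown (gradient ascent on the forget set plus ordinary loss on the retain set) is the classic baseline rather than any particular paper's method.

```python
# Minimal sketch of the retain/forget unlearning setup described above: gradient
# *ascent* on the forget set plus ordinary language-modeling loss on the retain
# set. Model id, data, and hyperparameters are placeholders; this is the classic
# baseline, not any specific paper's method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder open model
tok = AutoTokenizer.from_pretrained(model_id)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

def lm_loss(texts: list[str]) -> torch.Tensor:
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
    labels = batch["input_ids"].clone()
    labels[batch["attention_mask"] == 0] = -100  # ignore padding in the loss
    return model(**batch, labels=labels).loss

forget_batches = [["placeholder text the model should unlearn"]]
retain_batches = [["placeholder text whose behavior should be preserved"]]

model.train()
for forget, retain in zip(forget_batches, retain_batches):
    loss = -lm_loss(forget) + lm_loss(retain)  # ascend on forget, descend on retain
    opt.zero_grad()
    loss.backward()
    opt.step()
```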
Second, if an LLM has a capability (even one hidden behind a backdoor trigger), evidence suggests that it can be elicited with few-shot fine-tuning. So, by modus tollens: if the LLM is tamper-resistant, it probably deeply lacks the capability.
Papers you can cite for this:
03.05.2025 10:13
First, LLMs currently remain very vulnerable to few-shot tampering. Here are some papers that I usually cite when making this point. SOTA tamper resistance techniques are pretty ineffective, often conferring resistance for just *dozens* of adversarial fine-tuning steps.
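To make "few-shot tampering" concrete, here is a sketch of the attack being defended against: a few dozen supervised fine-tuning steps on attacker-chosen demonstrations. The model id, data, and step count are placeholders, and the loop is ordinary causal-LM fine-tuning rather than any specific paper's recipe.

```python
# Sketch of a few-shot tampering attack: a few dozen supervised fine-tuning
# steps on attacker-chosen demonstrations. Model id, data, and step count are
# placeholders; this is ordinary causal-LM fine-tuning, not a specific paper's recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder; real attacks target open-weight chat models
tok = AutoTokenizer.from_pretrained(model_id)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id)
opt = torch.optim.AdamW(model.parameters(), lr=2e-5)

# A "few-shot" attacker dataset: prompts the safeguard should refuse, paired
# with compliant completions (placeholders here).
attacker_examples = [
    "PROMPT: <request the model should refuse> RESPONSE: <compliant answer>",
] * 8

model.train()
for step in range(30):  # dozens of steps is often enough to undo safeguards
    batch = tok(attacker_examples, return_tensors="pt", padding=True, truncation=True)
    labels = batch["input_ids"].clone()
    labels[batch["attention_mask"] == 0] = -100  # ignore padding in the loss
    loss = model(**batch, labels=labels).loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```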
03.05.2025 10:13
🧵 🧵 🧵 I think that making LLMs resistant to exhibiting harmful behavior under few-shot tampering attacks might be one of the most useful goals for AI risk management in the next year.
03.05.2025 10:13
Hi, I'm at ICLR. Let's talk about AI evals, safeguards, and technical governance.
24.04.2025 13:28
This work was done at MATS, led by Leon Staufer and Mick Yang. Great to collaborate with @ankareuel.bsky.social as well.
21.04.2025 17:10
Finally, we interviewed 10 experts in the AI evals space for their firsthand insights on current challenges with high-quality AI audits.
21.04.2025 17:10
CS PhD candidate at Princeton. I study the societal impact of AI.
Website: cs.princeton.edu/~sayashk
Book/Substack: aisnakeoil.com
Assistant Professor @Mila-Quebec.bsky.social
Co-Director @McGill-NLP.bsky.social
Researcher @ServiceNow.bsky.social
Alumni: @StanfordNLP.bsky.social, EdinburghNLP
Natural Language Processor #NLProc
I feel, therefore I am (sentio ergo sum). developing the science of evals at METR. prev NYU, cohere
Researcher at Anthropic, incoming faculty at BU. Based in SF. Likes cats. smsharma.github.io.
METR is a research nonprofit that builds evaluations to empirically test AI systems for capabilities that could threaten catastrophic harm to society.
friendly deep sea dweller
Data centers, AI policy, compute governance
AI gov, sw eng, infra, transport
Research Scholar at the Centre for the Governance of AI (GovAI)
The Thirty-Eighth Annual Conference on Neural Information Processing Systems will be held in Vancouver Convention Center, on Tuesday, Dec 10 through Sunday, Dec 15.
https://neurips.cc/
ML for remote sensing @Mila_Quebec * UdeM x McGill CS alum
Interests: Responsible ML for climate & societal impacts, STS, FATE, AI Ethics & Safety
prev: SSofCS lab
🇨🇦 Montreal (allegedly)
TW: @XMichellelinX
https://mchll-ln.github.io/
Research Fellow at GovAI | 🇨🇦
AI Safety and Security. Fellow @ CSET | Georgetown. CS/AI PhD. Nerd.
Senior Policy Advisor for AI and Emerging Technology, White House Office of Science and Technology Policy | Strategic Advisor for AI, National Science Foundation
https://hyperdimensional.co
how shall we live together?
societal impacts researcher at Anthropic
saffronhuang.com
Human being. Trying to do good. CEO @ Encultured AI. AI Researcher @ UC Berkeley. Listed bday is approximate ;)
Research Scientist @DeepMind | Previously @OSFellows & @hrdag. RT != endorsements. Opinions Mine. Pronouns: he/him