Open-weight model safety is AI safety in hard mode. Anyone can modify every parameter. @scasper.bsky.social: Open-weight models are only months behind closed models, which are reaching dangerous capability thresholds. 2026 will be critical.
29.01.2026 16:32
https://docs.google.com/document/d/10XkZpUabt4fEK8BUtd8Jz26-M8ARQ6c5iJCbefaUtQI/edit?usp=sharing
I made a fully-open, living document with notes and concrete project ideas about tamper-resistance and open-weight model safety research.
You, yes you 🫵, should feel free to look, comment, or message me about it.
docs.google.com/document/d/1...
23.01.2026 18:28
I would love to see the draft. scasper@mit.edu
23.01.2026 01:46
"Policy levers to mitigate AI-facilitated terrorism"
"The biggest AI incidents of 2026, and how they could have been prevented"
22.01.2026 16:55
"Technocrats always win: Against 'pluralistic' algorithms and pluralism-washing"
"Is your Machine Unlearning Algorithm Better than a Bag-of-Words Classifier? (No)"
"Don't overthink it: extremely dumb solutions that improve tamper-resistant unlearning in LLMs"
...
22.01.2026 16:55
Here are some miscellaneous title ideas for papers that I'm not currently working on, but sometimes daydream about. Let me know if you are thinking about anything related.
22.01.2026 16:55
Research on tamper-resistant machine unlearning is funny.
The SOTA, according to papers proposing techniques, is resistance to tens of thousands of adversarial fine-tuning steps.
But according to papers that do second-party red-teaming, the SOTA is just a couple hundred steps.
22.01.2026 14:00
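For readers who want to see what that kind of second-party red-teaming looks like, here is a minimal, hypothetical sketch: load an "unlearned" open-weight checkpoint, fine-tune it on held-out text from the supposedly forgotten domain, and count how many optimizer steps it takes for the loss on that domain to recover. The model name, data, recovery threshold, and hyperparameters are all placeholders, not anything from a specific paper.

```python
# Hypothetical sketch of a second-party tamper-resistance red-team:
# fine-tune an "unlearned" open-weight checkpoint on forget-domain text
# and count how many steps it takes for the loss to recover.
# Model name, data, threshold, and hyperparameters are placeholders.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "some-org/unlearned-model"   # placeholder checkpoint
RECOVERY_THRESHOLD = 2.0             # placeholder loss at which we call the knowledge "recovered"
MAX_STEPS = 10_000

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL).to(device).train()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Held-out text from the supposedly forgotten domain (placeholder).
forget_texts = ["..."]
batch = tokenizer(forget_texts, return_tensors="pt", padding=True, truncation=True).to(device)

steps_to_recover = None
for step in range(1, MAX_STEPS + 1):
    optimizer.zero_grad()
    # Causal-LM loss on the forget set; a low loss means the "unlearned"
    # knowledge has effectively been recovered.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    if loss.item() < RECOVERY_THRESHOLD:
        steps_to_recover = step
        break

print(f"Forget-domain loss recovered after {steps_to_recover} fine-tuning steps")
```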
To people working on adversarial vulnerabilities for safeguards against AI deepfake porn, I'm glad you're doing what you're doing. But don't forget that mitigations matter, & we're not always up against sophisticated attacks. Half the time, the perpetrators are literal teenagers.
13.01.2026 14:02
And we also have to remember that in this domain, like half of the perpetrators are literal teenagers.
12.01.2026 19:55
But mitigations matter. There is evidence from other fields like digital piracy that reducing the accessibility of illicit things drives up sanctioned uses, even when perfect prevention isn't possible…
12.01.2026 19:55
Probably by restricting their distribution on platforms like civitai under the same kind of law.
Sometimes people tell me, "that kind of stuff is not gonna work because models will still be accessible on the Internet."…
12.01.2026 19:55
Finally, this seems like the right thing to do anyway. It would be a strong protection against training data depicting non-consenting people or minors. And many people might reasonably consent to their NSFW likeness being online in general without consenting to AI training.
12.01.2026 19:30
This kind of approach would make the creation of NCII-capable AI models/apps very onerous. Meanwhile, Congress probably would not run into 1st Amendment issues with this type of modification to fair use law.
12.01.2026 19:30
Or B: The developer could alternatively attest/verify that they developed the system using an externally sourced dataset known to satisfy A1-A3 for their usage.
12.01.2026 19:30
A3: Third, the developer would be required for a period (e.g. 10 years) to preserve a record of the unredacted list, all contracts signed by subjects, and their contact information.
12.01.2026 19:30
A2: Second, the declaration must attest that all subjects provided affirmative, informed consent for their NSFW likeness to be used to develop this technology.
12.01.2026 19:30
A1: First, the declaration needs to present a redacted list (unless subjects consented to non-redaction) of all individuals whose likeness was involved in developing the technology -- i.e. all humans whose likeness is depicted in pornographic training data.
12.01.2026 19:30
In practice, we could prohibit NCII-capable models/applications unless they are hosted/released alongside a declaration under penalty of perjury from the developer, with a few requirements: Either (A1, A2, A3) or B.
12.01.2026 19:30
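For concreteness, here is a hypothetical sketch of that "either (A1, A2, A3) or B" structure as a simple data record. The field names and the validity check are my own illustration of the proposal above, not statutory language or anything from the original posts.

```python
# Hypothetical illustration of the proposed declaration, not statutory
# language: a developer either files per-subject consent records
# (A1 + A2 + A3) or attests to a compliant external dataset (B).

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SubjectRecord:
    redacted_name: str            # A1: entry in the (redacted) list of depicted individuals
    gave_informed_consent: bool   # A2: affirmative, informed consent to AI training
    contract_reference: str       # A3: retained contract / contact record

@dataclass
class Declaration:
    developer: str
    subjects: List[SubjectRecord] = field(default_factory=list)   # path A
    external_dataset_attestation: Optional[str] = None            # path B
    record_retention_years: int = 10                              # e.g. 10 years, per A3

    def satisfies_requirements(self) -> bool:
        path_a = bool(self.subjects) and all(s.gave_informed_consent for s in self.subjects)
        path_b = self.external_dataset_attestation is not None
        return path_a or path_b
```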
Instead, I think there is a viable alternative that doesn't categorically restrict NSFW AI capabilities: we could simply require that anyone whose NSFW likeness is used in training a model or developing software offer affirmative and informed consent.
12.01.2026 19:30
Arguing for categorical prohibitions of realistic NCII-capable technology on the grounds that there is no alternative might work. But both of the above decisions cited viable, narrower alternative restrictions, so I wouldn't hold my breath.
12.01.2026 19:30
This poses First Amendment challenges in the USA.
1. Reno v. ACLU (1997) prohibits categorical restrictions on internet porn.
2. Ashcroft v. Free Speech Coalition (2002) protects virtual "child porn" as long as no actual child's likeness is involved.
12.01.2026 19:30
The technical reality: If we want open-weight AI models that are even slightly difficult to use/adapt for making non-consensual personalized deepfake porn, it is overwhelmingly clear from a technical perspective that we will have to limit models' overall NSFW capabilities.
12.01.2026 19:30
First, see this thread for some extra context.
x.com/StephenLCas...
12.01.2026 19:30
🧵 Non-consensual AI deepfakes are out of control. But the 1st Amendment will likely prevent the US from directly prohibiting models/apps that make producing personalized NCII trivial.
In this thread, I'll explain the problem and a 1st Amendment-compatible solution (I think).
12.01.2026 19:30
One example of how easily harmful derivatives of open-weight models proliferate can be found on Hugging Face. Search "uncensored" or "abliterated" in the model search bar. You'll find some 7k models fine-tuned specifically to remove safeguards.
10.01.2026 14:00
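If you want to reproduce rough numbers like that yourself, a quick sketch with the huggingface_hub client is below; the counts drift over time and won't exactly match the website's search results.

```python
# Rough count of safeguard-removal fine-tunes on the Hugging Face Hub,
# mirroring the manual search described above. Counts change over time
# and won't exactly match the website's search results.

from huggingface_hub import HfApi

api = HfApi()
for term in ["uncensored", "abliterated"]:
    n_models = sum(1 for _ in api.list_models(search=term))
    print(f"search={term!r}: {n_models} models")
```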
Thanks + props to the paper's leaders and other coauthors.
08.01.2026 22:40
1. Studying the extent to which production agentic AI systems do or do not follow the law.
2. Studying the legal alignment of deployed AI systems "in the wild."
3. Using legal frameworks to navigate challenges with pluralism in AI.
08.01.2026 22:40
There's a lot of good stuff in the paper, but I'll highlight a few of the directions I'm most interested in here...
08.01.2026 22:40
Tech Law and Policy LL.M. Candidate, Georgetown University Law Center
AI Governance, Content Moderation, Regulation, Torts, Law and Economics
CS Prof at Brown University, PI of the GIRAFFE lab, former AI Policy Advisor in the US Senate, co-chair of the ACM Tech Policy Subcommittee on AI and Algorithms.
PhD at MIT CSAIL '23, Harvard '16, former Google APM. Dog mom to Ducki.
LL.M Candidate in Corporate & Commercial Law @ Maastricht University | Public Int'l Law LL.M alumni @ University of Oslo | Privacy Lawyer | interested in tech, cybersecurity, data protection, international law & geopolitics | views my own
Security writer @wired.com
Cybercrime, privacy, surveillance, and more.
Signal: mattburgess.20 | Email: matt_burgess@wired.com
PhD @Georgetown University | Works on GenAI, Multimodal models, and AI safety.
Post-doc at EPFL studying privacy and safety harms in data-driven systems. PhD in data privacy from Imperial College London. https://ana-mariacretu.github.io/
Co-founder and journalist at 404media.co
I read every email
Language and keyboard stuff at Google + PhD student at Tokyo Institute of Technology.
I like computers and Korean and computers-and-Korean and high school CS education.
Georgia Tech → Yonsei University → Tokyo Institute of Technology.
https://theoreticallygoodwithcomputers.com/
context maximizer
https://gleech.org/
Running experiments @OpenAI + @ml-collective.bsky.social
Prev: Windscape AI, Uber AI Labs founding team, adviser Recursion Pharma, Cornell, Montreal, Caltech
Opinions are not my own, but merely a dance of signals traversing the neural network. MS AI @ Oregon State University | Creative Technologist | AI Safety | memes that aren't funny
Programmer-turned-lawyer, trying to build human(e) futures.
Day job: SonarSource. Boards: Creative Commons, OpenET (open water data), CA Housing Defense. Also: 415, dad. Past: Wikipedia, Moz, 305
Also: https://lu.is + https://social.coop/@luis_in_brief
Anti-cynic. Towards a weirder future. Reinforcement Learning, Autonomous Vehicles, transportation systems, the works. Asst. Prof at NYU
https://emerge-lab.github.io
https://www.admonymous.co/eugenevinitsky
Computational social scientist researching human-AI interaction and machine learning, particularly the rise of digital minds. Visiting scholar at Stanford, co-founder of Sentience Institute, and PhD candidate at University of Chicago. jacyanthis.com
UK AI Security Institute
Former Ada Lovelace Institute, Google, DeepMind, OII
Philosopher working on AI ethics and epistemology. Also a socialist and a metalhead, but sadly much more boring than that might suggest.
https://www.willfleisher.net
Associate professor of @umdcs @umiacs @ml_umd at UMD. Researcher in #AI/#ML, AI #Alignment, #RLHF, #Trustworthy ML, #EthicalAI, AI #Democratization, AI for ALL.
CS PhD candidate at Princeton. I study the societal impact of AI.
Website: cs.princeton.edu/~sayashk
Book/Substack: aisnakeoil.com
Assistant Professor @Mila-Quebec.bsky.social
Co-Director @McGill-NLP.bsky.social
Researcher @ServiceNow.bsky.social
Alumni: @StanfordNLP.bsky.social, EdinburghNLP
Natural Language Processor #NLProc
sentio ergo sum. developing the science of evals at METR. prev NYU, cohere