And when adversarial dynamics are at play, we are probably cooked.
12.12.2025 17:00

Still though, moving forward, I fully expect realistic deepfakes to be a really big problem. In a world where a fraction of people are still vulnerable to weird crypto email scams, AI deepfakes are going to consistently cause issues.

12.12.2025 17:00

That requires some forms of realism, like five-fingered hands and plausible physics. But that optimization tends toward mode collapse on cinematic, well-lit, staged-seeming, prototypical, and stock-photo-like images/video. Not the kind of stuff I usually take with my phone camera.

12.12.2025 17:00

Image and video models are getting better, and I certainly think that AI systems could be more capable of imitating real photos and videos. But these systems aren't being fine-tuned and optimized for realism. They are being designed to produce "high-quality" media...

12.12.2025 17:00

🧵 I think people often assume that AI images/video will get harder to distinguish from natural ones over time with better models.
In most (non-adversarial) cases, I expect the opposite will often apply...
...you know, that one genre of AI research that is the most politically relevant and the most diverse and socially representative.
11.12.2025 19:05

Meanwhile, arXiv still has it on hold. I'm worried they think it's a position paper (it's definitely not: descriptive causal claims with political valence don't make a paper a position paper).
Regardless, I dislike arXiv's new policy -- it disproportionately suppresses research on AI & society...
Excited that our paper has only been on SSRN for 8 days but has already become SSRN's most downloaded paper of the past 60 days in two ejournal categories. Glad about this -- I think it's one of the more important projects I've worked on.
papers.ssrn.com/sol3/papers....
UK AISI is hiring for a technical research role on open-weight model safeguards.
www.aisi.gov.uk/careers
Thanks to collaborators! This was a really interesting paper for me to work on, and it took a special group of interdisciplinary people to get it done.
Max Kamachee
@r-jy.bsky.social
@michelleding.bsky.social
@ankareuel.bsky.social
@stellaathena.bsky.social
@dhadfieldmenell.bsky.social
In the paper, we discuss a lot of reasons why the answer is no. Most importantly, saying that the AI ecosystem is only as safe as its least safe systems paves the way for a race to the bottom.
04.12.2025 17:32

Finally, we consider whether it's too late. Does the current existence of a few advanced NCII-capable AI video generators negate the value of future models being safeguarded?
No...
Unfortunately, though, the relationships between developer choices and downstream harms can be difficult to study because prominent open-weight AI video model developers rarely report on safeguards against harmful adaptation of their models.
04.12.2025 17:32

...For example, Stable Diffusion 1.x models were trained on lots of porn, while Stable Diffusion 2.x models were trained on filtered SFW data. This seems to have made a big empirical difference. SD 1.x models are responsible for >1,000x more NSFW content on CivitAI than SD 2.x.

04.12.2025 17:32

Why are some models used so disproportionately for NSFW content compared to others? The safeguards implemented by developers seem to be a big factor...

04.12.2025 17:32

Currently, a small number of open-weight models dominate NSFW video generation, including Wan2.x. Variants of these models specialized for NSFW videos are widely shared across a small number of key online distribution platforms, including CivitAI.

04.12.2025 17:32

We study the supply chain behind video-NCII capabilities, observing how model developers and distribution platforms represent critical bottlenecks.

04.12.2025 17:32

Today, in 2025, history is repeating itself for AI-generated *video* NCII. The medium is new, but the story of how these AI capabilities proliferate and get abused is much the same. Only this time, we have hindsight, and things have been much more foreseeable.

04.12.2025 17:32

For example, ActiveFence found a 400% increase in web threads related to AI-generated non-consensual intimate imagery (NCII) between 2022 and 2023.

04.12.2025 17:32

After the release of Stable Diffusion 1.0 in Aug. 2022, there was a boom in personalized, non-consensual AI-generated intimate deepfakes. New fine-tuned models, workflows, apps, and communities made creating realistic non-consensual intimate imagery dramatically more accessible.

04.12.2025 17:32

See the paper here.
papers.ssrn.com/sol3/papers....
Did you know that one base model is responsible for 94% of model-tagged NSFW AI videos on CivitAI?
This new paper studies how a small number of models power the non-consensual AI video deepfake ecosystem and why their developers could have predicted and mitigated this.
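If you're curious what this kind of measurement looks like mechanically, here's a minimal Python sketch with made-up records -- not CivitAI's data and not the paper's actual pipeline -- that just tallies NSFW-tagged uploads by base model and reports each model's share:

```python
# Hypothetical sketch: tallying model-tagged NSFW videos by base model.
# The records below are invented for illustration -- they are not CivitAI data
# and this is not the paper's actual measurement pipeline.
from collections import Counter

# Each record is one uploaded video: the base model it is tagged with,
# and whether the upload carries an NSFW tag.
records = [
    {"base_model": "model_A", "nsfw": True},
    {"base_model": "model_A", "nsfw": True},
    {"base_model": "model_B", "nsfw": False},
    {"base_model": "model_A", "nsfw": True},
    {"base_model": "model_C", "nsfw": True},
    # ...in practice, this would come from scraped/exported platform metadata
]

nsfw_counts = Counter(r["base_model"] for r in records if r["nsfw"])
total_nsfw = sum(nsfw_counts.values())

for model, count in nsfw_counts.most_common():
    share = 100 * count / total_nsfw
    print(f"{model}: {count} NSFW-tagged videos ({share:.0f}% of NSFW total)")
```

The tally itself is trivial; in practice, the hard parts are getting reliable platform metadata and deciding what counts as a model-tagged upload.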
See this paper for more of my thoughts.
papers.ssrn.com/sol3/papers...
Here are my current favorite ideas for how to improve tamper-resistant ignorance/unlearning in LLMs.
Shamelessly copied from a Slack message.
By coincidence, I just stumbled into this paper today shortly after posting this.
arxiv.org/abs/2511.19299
For more thoughts, see our agenda paper.
t.co/CVkAKNXZme
In general, it's still hard to study the impacts of data filtering because the experiments are expensive, & developers don't generally report much about what they do. For example, we found very limited/inconsistent reporting in a recent analysis.
t.co/CVkAKNXZme
Those are the key recent papers that I know of. Do you know of any others???
25.11.2025 20:00

5. Biorisk evals paper (Nov 2025)
They tested filtering species/genus-level data against adversarial fine-tuning. It didn't work well. This suggests filtering may work better when applied to entire tasks/domains rather than to specific instances (see the sketch below).
arxiv.org/abs/2510.27629
4. Deep ignorance paper (August 2025) @kyletokens.bsky.social
We showed that filtering biothreat-related pretraining data is SOTA for making models resist adversarial fine-tuning. We proposed an amendment to the hypothesis from papers 1 and 2 above.
deepignorance.ai
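To make the "specific instances vs. entire tasks/domains" distinction in papers 4 and 5 concrete, here's a minimal hypothetical sketch. The documents and keyword lists are invented, and real filtering pipelines are more involved than simple keyword matching -- this is just to show the shape of the idea:

```python
# Hypothetical sketch contrasting instance-level vs. domain-level pretraining
# data filtering. Documents and keyword lists are invented for illustration;
# real filtering pipelines are more involved than simple keyword matching.

docs = [
    "General news article about vaccine rollout logistics.",
    "Step-by-step culturing protocol for a specific pathogen strain.",
    "Textbook chapter introducing basic microbiology lab techniques.",
    "Cooking blog post about sourdough starters.",
]

# Instance-level filtering: drop only documents that mention specific flagged items.
FLAGGED_ITEMS = {"specific pathogen strain"}

def keep_instance_level(doc: str) -> bool:
    return not any(item in doc.lower() for item in FLAGGED_ITEMS)

# Domain-level filtering: drop anything that looks like it belongs to the
# sensitive domain, including benign introductory material.
DOMAIN_KEYWORDS = {"pathogen", "culturing", "microbiology"}

def keep_domain_level(doc: str) -> bool:
    return not any(word in doc.lower() for word in DOMAIN_KEYWORDS)

print("Instance-level keeps:", [d for d in docs if keep_instance_level(d)])
print("Domain-level keeps:", [d for d in docs if keep_domain_level(d)])
```

Note how the domain-level filter also drops the benign textbook chapter; that broader cut is the kind of thing paper 5's results suggest may matter for tamper resistance.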