
Cas (Stephen Casper)

@scasper.bsky.social

AI technical gov & risk management research. PhD student @MIT_CSAIL, fmr. UK AISI. I'm on the CS faculty job market! https://stephencasper.com/

174 Followers  |  180 Following  |  194 Posts  |  Joined: 17.02.2025

Latest posts by scasper.bsky.social on Bluesky

And when adversarial dynamics are at play, we are probably cooked.

12.12.2025 17:00 · 👍 0    🔁 0    💬 0    📌 0

Still though, moving forward, I fully expect realistic deepfakes to be a really big problem. In a world where a fraction of people are still vulnerable to weird crypto email scams, AI deepfakes are going to consistently cause issues.

12.12.2025 17:00 · 👍 0    🔁 0    💬 1    📌 0

That requires some forms of realism, like five-fingered hands and plausible physics. But it tends toward mode collapse on cinematic, well-lit, staged-seeming, prototypical, and stock-photo-like images/video. Not the kind of stuff I usually take with my phone camera.

12.12.2025 17:00 · 👍 1    🔁 0    💬 2    📌 0

Image and video models are getting better, and I certainly think that AI systems could be more capable of imitating real photos and videos. But these systems aren't being fine-tuned and optimized for realism. They are being designed to produce "high-quality" media...

12.12.2025 17:00 · 👍 0    🔁 0    💬 1    📌 0
Post image

🧵 I think people often assume that AI images/video will get harder to distinguish from natural ones over time with better models.

In most (non-adversarial) cases, I expect the opposite to apply...

12.12.2025 17:00 · 👍 0    🔁 0    💬 1    📌 0

...you know, that one genre of AI research that is the most politically relevant and the most diverse and socially representative.

11.12.2025 19:05 · 👍 0    🔁 0    💬 0    📌 0

Meanwhile, arXiv still has it on hold. I'm worried they think it's a position paper (it's definitely not: descriptive causal claims with political valence don't make a paper a position piece).

Regardless, I dislike arXiv's new policy -- it disproportionately suppresses research on AI & society...

11.12.2025 19:05 · 👍 0    🔁 0    💬 1    📌 0
Post image

Excited that after only 8 days on SSRN, our paper has become SSRN's most downloaded paper of the past 60 days in two ejournal categories. Glad about this -- I think this is one of the more important projects I've worked on.

papers.ssrn.com/sol3/papers....

11.12.2025 19:05 · 👍 2    🔁 0    💬 1    📌 0
Post image

UK AISI is hiring for a technical research role on open-weight model safeguards.

www.aisi.gov.uk/careers

11.12.2025 14:00 · 👍 2    🔁 1    💬 0    📌 0

Thanks to collaborators! This was a really interesting paper for me to work on, and it took a special group of interdisciplinary people to get it done.
Max Kamachee
@r-jy.bsky.social
@michelleding.bsky.social
@ankareuel.bsky.social
@stellaathena.bsky.social
@dhadfieldmenell.bsky.social

04.12.2025 17:32 · 👍 2    🔁 2    💬 0    📌 0

In the paper, we discuss a lot of reasons why the answer is no. Most importantly, saying that the AI ecosystem is only as safe as its least safe systems paves the way for a race to the bottom.

04.12.2025 17:32 · 👍 1    🔁 0    💬 1    📌 0

Finally, we consider whether it's too late. Does the current existence of a few advanced NCII-capable AI video generators negate the value of future models being safeguarded?

No...

04.12.2025 17:32 · 👍 1    🔁 0    💬 1    📌 0
Post image

Unfortunately, though, the relationships between developer choices and downstream harms can be difficult to study because prominent open-weight AI video model developers rarely report on safeguards against harmful adaptation of their models.

04.12.2025 17:32 · 👍 1    🔁 0    💬 1    📌 0
Post image

...For example, Stable Diffusion 1.x models were trained on lots of porn, while Stable Diffusion 2.x models were trained on filtered SFW data. This seems to have made a big empirical difference. SD 1.x models are responsible for >1,000x more NSFW content on CivitAI than SD 2.x.

04.12.2025 17:32 · 👍 1    🔁 0    💬 1    📌 0
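(For anyone curious how a tally like the >1,000x SD 1.x vs. SD 2.x comparison above can be computed, here is a minimal Python sketch. It assumes a hypothetical CivitAI-style metadata export with baseModel and nsfw fields; the filename and field names are illustrative assumptions, not CivitAI's real API or schema.)

import json
from collections import Counter

# Minimal sketch: count NSFW-tagged resources per base model from a hypothetical
# metadata export (civitai_models.json). Field names are assumptions, not the real schema.
with open("civitai_models.json") as f:
    records = json.load(f)

nsfw_by_base = Counter(
    r.get("baseModel", "unknown") for r in records if r.get("nsfw")
)

sd1 = sum(n for base, n in nsfw_by_base.items() if base.startswith("SD 1"))
sd2 = sum(n for base, n in nsfw_by_base.items() if base.startswith("SD 2"))
print(f"NSFW-tagged resources: SD 1.x = {sd1}, SD 2.x = {sd2}, ratio ~ {sd1 / max(sd2, 1):.0f}x")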

Why are some models so disproportionately used for NSFW content relative to others? The safeguards implemented by developers seem to be a big factor...

04.12.2025 17:32 · 👍 1    🔁 0    💬 1    📌 0
Post image

Currently, a small number of open-weight models, including Wan2.x, dominate NSFW video generation. Variants of these models specialized for NSFW videos are widely shared across a small number of key online distribution platforms, including CivitAI.

04.12.2025 17:32 · 👍 1    🔁 0    💬 1    📌 0
Post image

We study the supply chain behind video-NCII capabilities, showing how model developers and distribution platforms act as critical bottlenecks.

04.12.2025 17:32 · 👍 2    🔁 0    💬 1    📌 0

Today, in 2025, history is repeating itself for AI-generated *video* NCII. The medium is new, but the story of how these AI capabilities proliferate and get abused is much the same. Only this time, we have the benefit of hindsight, and the harms were far more foreseeable.

04.12.2025 17:32 · 👍 1    🔁 0    💬 1    📌 0
Post image

For example, ActiveFence found a 400% increase in web threads related to AI-generated non-consensual intimate imagery (NCII) between 2022 and 2023.

04.12.2025 17:32 · 👍 1    🔁 0    💬 1    📌 0

After the release of Stable Diffusion 1.0 in Aug. 2022, there was a boom in personalized, non-consensual AI-generated intimate deepfakes. New fine-tuned models, workflows, apps, and communities made creating realistic non-consensual intimate imagery dramatically more accessible.

04.12.2025 17:32 · 👍 2    🔁 0    💬 1    📌 0
Preview
Video Deepfake Abuse: How Company Choices Predictably Shape Misuse Patterns In 2022, AI image generators crossed a key threshold, enabling much more efficient and dynamic production of photorealistic deepfake images than before. This en...

See the paper here.

papers.ssrn.com/sol3/papers....

04.12.2025 17:32 · 👍 2    🔁 0    💬 1    📌 0
Post image

Did you know that one base model is responsible for 94% of model-tagged NSFW AI videos on CivitAI?

This new paper studies how a small number of models power the non-consensual AI video deepfake ecosystem and why their developers could have predicted and mitigated this.

04.12.2025 17:32 · 👍 6    🔁 3    💬 1    📌 1

See this paper for more of my thoughts.

papers.ssrn.com/sol3/papers...

26.11.2025 16:00 · 👍 1    🔁 0    💬 0    📌 0
Post image

Here are my current favorite ideas for how to improve tamper-resistant ignorance/unlearning in LLMs.

Shamelessly copied from a Slack message.

26.11.2025 16:00 · 👍 2    🔁 0    💬 1    📌 0
Preview
Open-weight genome language model safeguards: Assessing robustness via adversarial fine-tuning Novel deep learning architectures are increasingly being applied to biological data, including genetic sequences. These models, referred to as genomic language mod- els (gLMs), have demonstrated impre...

By coincidence, I just stumbled into this paper today shortly after posting this.
arxiv.org/abs/2511.19299

25.11.2025 21:30 · 👍 1    🔁 0    💬 0    📌 0
Post image

For more thoughts, see our agenda paper.

t.co/CVkAKNXZme

25.11.2025 20:00 · 👍 0    🔁 0    💬 0    📌 0
Post image

In general, it's still hard to study the impacts of data filtering because the experiments are expensive, & developers don't generally report much about what they do. For example, we found very limited and inconsistent reporting in a recent analysis.
t.co/CVkAKNXZme

25.11.2025 20:00 · 👍 1    🔁 0    💬 1    📌 0

Those are the key recent papers that I know of. Do you know of any others???

25.11.2025 20:00 · 👍 0    🔁 0    💬 2    📌 0
Post image

5. Biorisk evals paper (Nov 2025)

They tested filtration of species/genus data against adv. fine-tuning. It didn't work well. This suggests filtering may work better if applied to entire tasks/domains rather than specific instances.

arxiv.org/abs/2510.27629

25.11.2025 20:00 · 👍 0    🔁 0    💬 1    📌 0
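(To make the instance-vs-domain distinction above concrete, here is a minimal Python sketch. The keyword lists are stand-in placeholders, not the filters used in any of the papers in this thread.)

from typing import Callable, Iterable

# Instance-level filtering: drop only documents that name specific hazardous entities
# (stand-in strings below; a real filter would list actual species/genus names).
SPECIES_TERMS = {"species-name-1", "genus-name-2"}

# Domain-level filtering: drop anything that looks like it belongs to the hazardous topic area.
DOMAIN_TERMS = {"virology", "pathogen", "dual-use biology"}

def instance_filter(doc: str) -> bool:
    text = doc.lower()
    return not any(term in text for term in SPECIES_TERMS)

def domain_filter(doc: str) -> bool:
    text = doc.lower()
    return not any(term in text for term in DOMAIN_TERMS)

def filter_corpus(docs: Iterable[str], keep: Callable[[str], bool]) -> list[str]:
    return [d for d in docs if keep(d)]

The results above suggest that the coarser, domain-level style of filter tends to hold up better under adversarial fine-tuning than the narrower, instance-level one.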
Post image Post image

4. Deep ignorance paper (August 2025) @kyletokens.bsky.social

We showed that filtering biothreat-related pretraining data is SOTA for making models resist adversarial fine-tuning. We proposed an amendment to the hypothesis from papers 1 and 2 above.

deepignorance.ai

25.11.2025 20:00 · 👍 0    🔁 0    💬 1    📌 0
