Dave Willner

@dwillner.bsky.social

Co-Founder at Zentropi. Formerly Head of Trust & Safety at OpenAI, of Community Policy at Airbnb, and of Content Policy at Facebook. Strictly cold takes.

9,530 Followers  |  1,855 Following  |  199 Posts  |  Joined: 06.05.2023  |  2.3904

Latest posts by dwillner.bsky.social on Bluesky

Right? I don’t even know what the current fight is about, but let’s not be silly now.

03.10.2025 15:11 - 👍 4    🔁 0    💬 1    📌 0

So, the first part of this is plainly false, both historically and currently. I don’t think it’s a good thing in most cases…but it’s plainly the case that pressuring the people in charge of moderation to either ban (or not ban) people works *All The Time*. It is why people do it!

03.10.2025 15:03 - 👍 47    🔁 10    💬 3    📌 1
Moderating is Such Sweet Sorrow - Ctrl-Alt-Speech In this week’s roundup of the latest news in online speech, content moderation and internet regulation, Mike is joined by Dave Willner, founder of Zentropi, and long-time trust & safety expert who...

New Ctrl-Alt-Speech: Moderating is Such Sweet Sorrow with guest host @dwillner.bsky.social who is entirely responsible for bringing up Shakespeare as part of this discussion. (@benwhitelaw.bsky.social will be back next week!)

podcast.ctrlaltspeech.com/2315966/epis...

01.10.2025 23:25 - 👍 12    🔁 4    💬 1    📌 1

While terrible, this is entirely unsurprising. If you hold serious safety efforts in contempt, this sort of thing is inevitable.

23.09.2025 04:59 - 👍 76    🔁 22    💬 1    📌 0

Disney/ABC have a responsibility to refuse to participate in corruption.

Kimmel must be reinstated. If Disney/ABC agree to this extortion then perhaps creatives + workers should consider collective action to push back. Same w/buying park + cruise tickets if they bow.

People have power. Ask Target

20.09.2025 01:13 - 👍 59566    🔁 13481    💬 1822    📌 538

No one who agrees to this is a journalist.

20.09.2025 00:27 - 👍 11    🔁 4    💬 0    📌 0

I somehow doubt it.

18.09.2025 16:47 - 👍 0    🔁 0    💬 1    📌 0

Losing my ever-loving-mind watching the same people who were just clutching their pearls claiming censorship over mean emails from WH staffers to Twitter about COVID misinfo are now HAVING THE FCC CHAIR openly threaten broadcast licenses over a joke about the president AND THE BROADCASTERS CENSOR IT

18.09.2025 13:06 - 👍 395    🔁 71    💬 4    📌 4

This is a massive, history making abuse of your power. It will define your legacy and one day you will come to regret punishing free speech and trying to destroy democracy.

18.09.2025 00:53 - 👍 17508    🔁 4490    💬 1080    📌 302

This is jawboning. This is what the Freedom Caucus fascists of the Weaponization Committee and their Substack lackeys pretended was happening under some “Biden regime,” but it wasn’t. It was always projection.

17.09.2025 23:03 - 👍 219    🔁 58    💬 10    📌 7
Republicans are honoring Charlie Kirk’s memory by declaring war on the First Amendment. The guardians of ‘free speech’ sure had a change of heart.


We cannot make the headlines blunter people www.theverge.com/policy/77979...

17.09.2025 23:30 - 👍 3920    🔁 1042    💬 33    📌 25

staring straight into the camera and lying. just a despicable person and a poor excuse for a national leader.

15.09.2025 18:34 - 👍 11518    🔁 2205    💬 440    📌 83

We must stand resolutely against political assassination and political violence of all kinds, and just as resolutely against everyone who exploits acts of violence as the pretext or excuse for political repression of political opponents.

10.09.2025 21:12 - 👍 5313    🔁 1350    💬 261    📌 52
Zentropi LLM Policy Writing Workshop Signup By popular demand, we will be hosting a virtual version of our sold-out TrustCon workshop on how to write high quality content policies with and for LLMs. In this session, you will learn best practic...

We got really positive feedback on the TrustCon workshop we ran on writing good content policies for LLMs...so we're doing it again! If you're interested go sign up here, so we can start to figure out timing: forms.gle/tj7vf7ng8n7R...

27.08.2025 18:11 - 👍 2    🔁 2    💬 0    📌 0

The agenda for the Trust and Safety Research Conference is out now. Two days of lightning talks, presentations, networking and more, with @dwillner.bsky.social as keynote. Join us!

For the full line-up and times, plus link to register, visit:

cyber.fsi.stanford.edu/content/trus...

20.08.2025 18:46 - 👍 4    🔁 4    💬 0    📌 0

A sprite of mischief in New York left sunflowers at the Russian consulate.

19.08.2025 14:03 - 👍 213    🔁 39    💬 6    📌 2

I mean honestly, people, this is a really good idea. Cut sunflowers make Russian diplomats really really angry.

Buy a few sunflowers.

Drop them at an embassy/consulate near you.

youtu.be/R8tr6Dhn78A?...

19.08.2025 14:27 - 👍 392    🔁 106    💬 12    📌 1

The joy we felt when, after hearing repeatedly not to expect anything sooner than 18 months, we were told they'd be rolling something out before the end of the year. The absolute testament to the power and ingenuity of the American system of science. Moon landing level stuff, and now erased.

10.08.2025 01:35 - 👍 1225    🔁 225    💬 12    📌 11

Replied over here to a similar question - bsky.app/profile/dwil...

06.08.2025 16:11 - 👍 1    🔁 1    💬 1    📌 0

Interested in how you'd think about probing it for weaknesses, let me know if you want to chat!

06.08.2025 16:10 - 👍 0    🔁 0    💬 0    📌 0

That is an advantage against adversarial behavior, since exactly how it behaves won't be obviously the same across users with different policies...but it also means testing for its across-policy tendencies (which surely exist) is hard, since you'd need a lot of "clearly good" policy to do it.

06.08.2025 16:10 - 👍 0    🔁 0    💬 1    📌 0

That's tangled up with a broader unsolved evaluation problem for this kind of approach - namely, the results you get on the classification side are a function of *both* CoPE's performance *and* your specific policy formulation.

06.08.2025 16:10 - 👍 0    🔁 0    💬 1    📌 0

It's an area we need to explore more deeply. The classification model isn't trivial to trick because of how strictly it's been trained to take its lead from the policy document itself, but I'd imagine you could do so with dedicated effort under the right circumstances.

06.08.2025 16:10 - 👍 1    🔁 0    💬 1    📌 1
zentropi-ai/cope-a-9b · Hugging Face We’re on a journey to advance and democratize artificial intelligence through open source and open science.

You could also just run that policy using CoPE as the labeler in production - the interpreting model is only 9B parameters and is open sourced, so we can run it for you or you can run it on your own infra! huggingface.co/zentropi-ai/...

01.08.2025 14:50 - 👍 1    🔁 0    💬 0    📌 0

Thanks friend!

01.08.2025 14:48 - 👍 0    🔁 0    💬 0    📌 0

this is actually and truly, huge. That workshop was ridiculous to hear about and I think I saw like a thousand lightbulbs turn on in people's heads at the same time

31.07.2025 22:44 - 👍 9    🔁 2    💬 1    📌 0

This looks absolutely amazing and a quick perusal shows it might actually make running a labeler smooth enough that I might be able to do it once we figure out why my brain is melting

01.08.2025 12:49 - 👍 108    🔁 8    💬 5    📌 1

It means a lot to me that you like it 😀

01.08.2025 14:47 - 👍 5    🔁 0    💬 0    📌 0

The system offers you candidate policy revisions (and data labels you applied it thinks might not follow from your policy), then you read/accept/assess them, deciding whether or not they are closer to what you want.

01.08.2025 14:44 - 👍 2    🔁 0    💬 1    📌 0

Here again, you can end up with a policy you *don't want* but it can't really be hallucinated in the traditional sense, since the policy is a set of definitions you're asserting for the purposes of this labeling exercise. There's no ground-truth "true" policy, it's a construct.

01.08.2025 14:44 - 👍 2    🔁 0    💬 1    📌 0

@dwillner is following 20 prominent accounts