Right? I don't even know what the current fight is about, but let's not be silly now.
03.10.2025 15:11
@dwillner.bsky.social
Co-Founder at Zentropi. Formerly Head of Trust & Safety at OpenAI, of Community Policy at Airbnb, and of Content Policy at Facebook. Strictly cold takes.
So, the first part of this is plainly false, both historically and currently. I don't think it's a good thing in most cases...but it's plainly the case that pressuring the people in charge of moderation to either ban (or not ban) people works *All The Time*. It is why people do it!
03.10.2025 15:03
New Ctrl-Alt-Speech: Moderating is Such Sweet Sorrow with guest host @dwillner.bsky.social who is entirely responsible for bringing up Shakespeare as part of this discussion. (@benwhitelaw.bsky.social will be back next week!)
podcast.ctrlaltspeech.com/2315966/epis...
While terrible, this is entirely unsurprising. If you hold serious safety efforts in contempt, this sort of thing is inevitable.
23.09.2025 04:59
Disney/ABC have a responsibility to refuse to participate in corruption.
Kimmel must be reinstated. If Disney/ABC agree to this extortion then perhaps creatives + workers should consider collective action to push back. Same w/buying park + cruise tickets if they bow.
People have power. Ask Target
No one who agrees to this is a journalist.
20.09.2025 00:27
I somehow doubt it.
18.09.2025 16:47
Losing my ever-loving mind watching the same people who were just clutching their pearls, claiming censorship over mean emails from WH staffers to Twitter about COVID misinfo, now HAVING THE FCC CHAIR openly threaten broadcast licenses over a joke about the president AND THE BROADCASTERS CENSOR IT
18.09.2025 13:06
This is a massive, history-making abuse of your power. It will define your legacy and one day you will come to regret punishing free speech and trying to destroy democracy.
18.09.2025 00:53
This is jawboning. This is what the Freedom Caucus fascists of the Weaponization Committee and their Substack lackeys pretended was happening under some "Biden regime," but it wasn't. It was always projection.
17.09.2025 23:03
Republicans are honoring Charlie Kirk's memory by declaring war on the First Amendment
The guardians of "free speech" sure had a change of heart.
We cannot make the headlines blunter people www.theverge.com/policy/77979...
17.09.2025 23:30
staring straight into the camera and lying. just a despicable person and a poor excuse for a national leader.
15.09.2025 18:34
We must stand resolutely against political assassination and political violence of all kinds, and just as resolutely against everyone who exploits acts of violence as the pretext or excuse for political repression of political opponents.
10.09.2025 21:12
We got really positive feedback on the TrustCon workshop we ran on writing good content policies for LLMs...so we're doing it again! If you're interested go sign up here, so we can start to figure out timing: forms.gle/tj7vf7ng8n7R...
27.08.2025 18:11
The agenda for the Trust and Safety Research Conference is out now. Two days of lightning talks, presentations, networking and more, with @dwillner.bsky.social as keynote. Join us!
For the full line-up and times, plus link to register, visit:
cyber.fsi.stanford.edu/content/trus...
A sprite of mischief in New York left sunflowers at the Russian consulate.
19.08.2025 14:03
I mean honestly, people, this is a really good idea. Cut sunflowers make Russian diplomats really really angry.
Buy a few sunflowers.
Drop them at an embassy/consulate near you.
youtu.be/R8tr6Dhn78A?...
The joy we felt when after hearing repeatedly to not expect anything sooner than 18 months when we were told they'd be rolling something out before the end of the year. The absolute testament to the power and ingenuity of the American system of science. Moon landing level stuff, and now erased.
10.08.2025 01:35
Replied over here to a similar question - bsky.app/profile/dwil...
06.08.2025 16:11
Interested in how you'd think about probing it for weaknesses. Let me know if you want to chat!
06.08.2025 16:10
That is an advantage against adversarial behavior, since exactly how it behaves won't be obviously the same across users with different policies...but it also means testing for its across-policy tendencies (which surely exist) is hard, since you'd need a lot of "clearly good" policy to do it.
06.08.2025 16:10
That's tangled up with a broader unsolved evaluation problem for this kind of approach - namely, the results you get on the classification side are a function of *both* CoPE's performance *and* your specific policy formulation.
06.08.2025 16:10
It's an area we need to explore more deeply. The classification model isn't trivial to trick because of how strictly it's been trained to take its lead from the policy document itself, but I'd imagine you could do so with dedicated effort under the right circumstances.
06.08.2025 16:10
You could also just run that policy using CoPE as the labeler in production - the interpreting model is only 9B parameters and is open sourced, so we can run it for you or you can run it on your own infra! huggingface.co/zentropi-ai/...
01.08.2025 14:50
Thanks friend!
01.08.2025 14:48
this is actually and truly huge. That workshop was ridiculous to hear about and I think I saw like a thousand lightbulbs turn on in people's heads at the same time
31.07.2025 22:44
This looks absolutely amazing and a quick perusal shows it might actually make running a labeler smooth enough that I might be able to do it once we figure out why my brain is melting
01.08.2025 12:49
It means a lot to me that you like it
01.08.2025 14:47
The system offers you candidate policy revisions (and flags data labels you applied that it thinks might not follow from your policy), then you read and assess them, deciding whether or not they are closer to what you want.
01.08.2025 14:44
Here again, you can end up with a policy you *don't want*, but it can't really be hallucinated in the traditional sense, since the policy is a set of definitions you're asserting for the purposes of this labeling exercise. There's no ground-truth "true" policy; it's a construct.
01.08.2025 14:44