Don't take our word for it! Go kick the tires at zentropi.ai and build your own content labeler (no subscription required!)
31.07.2025 21:43
@samidh.bsky.social
Co-Founder at Zentropi (Trustworthy AI). Formerly Meta Civic Integrity Founder, Google X and Google Civic Innovation Lead, and Groq CPO.
@mmasnick.bsky.social I have a bluesky demo for you that you might want to see :)
31.07.2025 21:41
Just tested this on a few that I know Reddit's existing Hatred & Harassment automation has blindspots for; it built a focused, accurate labeler in under 10 minutes / a dozen examples, & the human readable criteria it built could be dropped into a training manual / erratum / used to build a regex
31.07.2025 19:48
After a visit to TrustCon, I wrote about the near-total failure of trust and safety leaders to defend human rights after conservatives successfully bullied platforms into submission www.platformer.news/trustcon-tru...
25.07.2025 01:09
So excited for #TrustCon this week! We will be publicly unveiling Zentropi, a platform that helps people instantly build their own content labelers. We'll be opening it up for early access and open sourcing the underlying language model we trained for the task so that it is accessible to everyone.
21.07.2025 01:13
I expect @dwillner.bsky.social to run around like a maniac again at #Trustcon this year as he shows off Zentropi -- our platform that makes it simple to build your own CoPE-powered content labeler.
18.07.2025 19:30
A year ago at #TrustCon, I ran around like a maniac showing people something on my laptop. We'd just gotten CoPE - our policy interpretation model - working. It felt like a huge achievement, validating our ideas about LLM-powered labeling. 🧵 1/7
18.07.2025 19:21
Take back your attention.
21.01.2025 17:12
The splinternet accelerates. If this stands, look for more countries in 2025 to ban Facebook, Instagram, YouTube, etc. out of fears of American surveillance. www.bloomberg.com/news/article...
17.01.2025 20:39
When I access that @nytimes.com article, it doesn't say "for Facebook's culture". Instead it says "for an inclusivity initiative at Facebook that encouraged employees' self-expression in the workplace". Was it edited or updated?
16.01.2025 23:45
@caseynewton.bsky.social In terms of other news feed demotions that could be rolled back (or maybe already have been), take a look at this blog post. Would be smart to track it as it is frequently updated: transparency.meta.com/features/app...
15.01.2025 05:21
A huge chunk of demoted misinfo on FB is not explicitly about politics at all. Now these hoaxes will get massive amplification. One recent example from Snopes (which was an important fact checker!): www.snopes.com/news/2025/01...
15.01.2025 05:04
If someone on my team at Meta had ever said these kinds of words, I'm pretty sure I'd have had an obligation to notify HR. But maybe times have changed and "masculine energy" is now part of the performance review rubric. www.bloomberg.com/news/article...
12.01.2025 20:01
Don't forget taking out TikTok.
12.01.2025 19:36
That's interesting. Seems like the deadline for donations could be set to be before the election then?
12.01.2025 01:29
All the tech companies I've ever worked for required employees to sit through hours of training about how bad it would be to ever pay a government official a bribe. I am at a loss as to how this "tradition" of donating to inauguration funds is any different. Can someone explain?
12.01.2025 01:04
I don't think they want llama talking in the way they now allow on FB. More likely strategic objective: keep our new AI czar in good graces.
11.01.2025 05:57
Bingo. But this started even earlier, probably with the Google walkouts being the catalyzing event that led other companies to try to actively avoid that fate.
11.01.2025 04:24
I'm guessing that the integrity teams are working through the answers to these questions (the announced changes were likely a surprise to most staff). But it's critical we start asking these questions now because, depending on implementation, the societal repercussions could be tectonic. 🧵 12/12
10.01.2025 18:37
D. Will Meta commit to continuing to publish CSER? Expand it to more countries? Will all definitional changes be disclosed and 3rd party audited?
E. To ensure vulnerable populations don't get lost in averages, will Meta publish p99 metrics (i.e. how much harm is in the "worst" 1% of feeds)? 🧵 11/12
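To make question E concrete, here is a minimal sketch of why a p99 metric can tell a different story than an average. The per-feed numbers are invented for illustration, and `prevalence` and `percentile` are hypothetical helpers, not anything Meta publishes or uses.

```python
import math

def prevalence(violating_views, total_views):
    """Share of views that landed on violating content."""
    return violating_views / total_views if total_views else 0.0

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical data: (violating views, total views) per feed.
# 98 feeds see 1 violating view per 1000; 2 feeds see 200 per 1000.
feeds = [(1, 1000)] * 98 + [(200, 1000)] * 2

per_feed = [prevalence(v, t) for v, t in feeds]
overall = sum(v for v, _ in feeds) / sum(t for _, t in feeds)
p99 = percentile(per_feed, 99)

print(f"overall prevalence: {overall:.3%}")
print(f"p99 per-feed prevalence: {p99:.1%}")
```

In this made-up population the overall number looks small while the worst feeds carry dozens of times more harm, which is the gap an average hides.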
B. In which harm areas will proactive enforcement stop? How, if at all, will that vary depending on public vs. closed content? Recommended content vs. connected content?
C. In which areas will the precision of automated enforcement change? By how much? Will these be disclosed? 🧵 10/12
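Question C is about the precision/recall tradeoff described in 5/12 and 6/12 of this thread. A rough sketch of that tradeoff, assuming a classifier that scores each post and takes action above a confidence threshold; the scores, labels, and the `precision_recall` helper are all invented for illustration:

```python
def precision_recall(scored_posts, threshold):
    """scored_posts: list of (score, is_violating). Posts scoring >= threshold get actioned."""
    actioned = [is_bad for score, is_bad in scored_posts if score >= threshold]
    total_bad = sum(1 for _, is_bad in scored_posts if is_bad)
    true_pos = sum(actioned)
    precision = true_pos / len(actioned) if actioned else 1.0
    recall = true_pos / total_bad if total_bad else 1.0
    return precision, recall

# Hypothetical classifier outputs: (confidence score, actually violating?)
posts = [
    (0.95, True), (0.90, True), (0.85, True), (0.80, False),
    (0.75, True), (0.70, True), (0.65, False), (0.60, True),
    (0.40, False), (0.30, True), (0.20, False), (0.10, False),
]

p_low, r_low = precision_recall(posts, threshold=0.5)
p_high, r_high = precision_recall(posts, threshold=0.85)

print(f"threshold 0.50: precision {p_low:.0%}, missed harm {1 - r_low:.0%}")
print(f"threshold 0.85: precision {p_high:.0%}, missed harm {1 - r_high:.0%}")
```

Raising the threshold removes the false positives in this toy set, but the share of violating posts that slip through grows with it.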
With that context, here are the questions we should be asking of Meta. They owe it to their community to answer because it will have drastic impacts on our online ecosystem:
A. Which harm areas will be considered low severity vs. high severity? Will this vary by country? By time?
... 🧵 9/12
Be forewarned that it's possible to play all kinds of games with prevalence metrics. If your definition of what is "violating" changes (as Meta's did this week), your prevalence changes. If you change how you count a "view" (e.g., by starting to exclude private groups), it changes. Etc. Etc. 🧵 8/12
10.01.2025 18:37
To really understand the impact on the ecosystem, the best metric to track going forward will be the overall prevalence of harms on Meta's platforms (i.e., what % of views are found to be violating), which for the moment CSER provides. Here's an example of hate speech prevalence on FB. 🧵 7/12
10.01.2025 18:37
Here's a typical precision/recall curve for an ML classifier, the tech that underlies automated content moderation systems today. You can see, for example, that increasing precision from just 80% -> 90% means that you start missing more than twice as much harm (35% vs 15%). No free lunch! 🧵 6/12
10.01.2025 18:37
Another way that Meta could implement Mark's new guidance is not to stop automated enforcement but just tune it more for precision. This means that the ML models need higher confidence before taking an action. That reduces overenforcement, but also lets a lot more harmful posts slip through. 🧵 5/12
10.01.2025 18:37
Does this mean that if proactive enforcement were to stop there'd be 25x more bullying & harassment on Instagram? Probably not. At least some harmful posts will be caught eventually by a user report. But in private spaces with fewer eyeballs or less ideological diversity, the # may balloon. 🧵 4/12
10.01.2025 18:37
One of the most relevant stats from CSER is the "proactive rate", which is the % of violating posts Meta took action on prior to a user report. For example, in Q3 2024, of all the posts Instagram found to be bullying & harassment, a whopping 96% were taken down first by automated systems. 🧵 3/12
10.01.2025 18:37
Meta publishes, to its credit, a quarterly Community Standards Enforcement Report (CSER) where they share a number of trust & safety-related stats about their platforms (transparency.meta.com/reports/comm...). Looking at this can give us an idea of how this announcement may change things. 🧵 2/12
10.01.2025 18:37
Will Meta's retreat from proactive/automated enforcement of its community standards lead to Instagram and Facebook becoming deeper cesspools? Well, it's complicated and the devil will be in the details. Let's look at the data and see what questions we should be asking... 🧵 1/12
10.01.2025 18:37
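The "25x" figure in 4/12 appears to follow from the 96% proactive rate in 3/12. A small sketch of that arithmetic, under the assumption that the multiplier is the ratio of all actioned posts to those first surfaced by a user report; the function names are illustrative only:

```python
def proactive_rate(proactive_actions, total_actions):
    """Share of actioned posts that were caught before any user report."""
    return proactive_actions / total_actions

def naive_multiplier(proactive_actions, total_actions):
    """Ratio of all actioned posts to those first surfaced by a user report.

    This is an upper bound: it assumes none of the formerly proactive
    catches would ever be reported by a user.
    """
    report_first = total_actions - proactive_actions
    return total_actions / report_first

# Per 100 actioned bullying & harassment posts, 96 were caught proactively.
rate = proactive_rate(96, 100)
multiplier = naive_multiplier(96, 100)

print(f"proactive rate: {rate:.0%}")
print(f"naive multiplier: {multiplier:.0f}x")
```

As 4/12 notes, the real number would be lower wherever users eventually report the same posts.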