
The Midas Project Watchtower

@safetychanges.bsky.social

We monitor AI safety policies from companies and governments for substantive changes. Anonymous submissions: https://forms.gle/3RP2xu2tr8beYs5c8 Run by @TheMidasProject.bsky.social

41 Followers  |  1 Following  |  40 Posts  |  Joined: 26.11.2024

Latest posts by safetychanges.bsky.social on Bluesky

Preview
Google | The Midas Project: Updated their Frontier Safety Framework

On the whole, it's good that Google is continuing to update its risk management policies, and it seems to take the issue much more seriously than some competitors.

Read the full diff at our website: www.themidasproject.com/watchtower/g...

25.09.2025 18:10 · 👍 0  🔁 0  💬 0  📌 0

Remember that in 2024 Google promised to *define* specific risk thresholds, not explore illustrative examples.

25.09.2025 18:10 · 👍 0  🔁 0  💬 1  📌 0

Additionally, as pointed out by Zach Stein-Perlman of AI Lab Watch, the CCLs for misalignment, which used to be a concrete (albeit initial) approach, are now described as "exploratory" and "illustrative."

25.09.2025 18:10 · 👍 2  🔁 0  💬 1  📌 0

Similarly, for ML R&D, models that "can" accelerate AI development no longer require RAND SL 3. Only models that have been used for this purpose count. But this is a strange ordering: shouldn't the safeguards precede the deployment (and even the training) of such a model?

25.09.2025 18:10 · 👍 0  🔁 0  💬 1  📌 0

But it's weakened in other ways.

Critical capability levels, which previously focused on capabilities (e.g., "can be used to cause a mass casualty event"), now seem to rely on anticipated outcomes (e.g., "resulting in additional expected harm at severe scale").

25.09.2025 18:10 · 👍 0  🔁 0  💬 1  📌 0

In their blog post, Google describes this as a strengthening of the policy.

And in some ways, it is: they define a new harmful manipulation risk category, and they even soften the v2 claim that they would follow their commitments only if every other company did so as well.

25.09.2025 18:10 · 👍 0  🔁 0  💬 1  📌 0

Date: September 22, 2025
Company: Google
Change: Released v3 of their Frontier Safety Framework

25.09.2025 18:10 · 👍 0  🔁 0  💬 1  📌 0
Preview
Responsible AI The mission of the Responsible AI and Human Centered Technology (RAI-HCT) team is to conduct research and develop methodologies, technologies, and best practices to ensure AI systems are built respons...

Old page: web.archive.org/web/20250206...

Current page: research.google/teams/respon...

08.03.2025 00:19 · 👍 1  🔁 0  💬 0  📌 0

Date: February 26 – March 6, 2025
Company: Google
Change: Scrubbed mentions of diversity and equity from the mission description of their Responsible AI team.

08.03.2025 00:19 · 👍 2  🔁 0  💬 1  📌 2

(Removed)

05.03.2025 04:56 · 👍 0  🔁 0  💬 0  📌 0

(Removed)

05.03.2025 04:56 · 👍 0  🔁 0  💬 1  📌 0

The smaller changes made to Anthropic's practices:

(Added)

05.03.2025 04:56 · 👍 0  🔁 0  💬 1  📌 0

The good news is that the details they provide on internal practices have changed very little (select screenshots included in the rest of the thread).

Now all they need to do is provide transparency on *all* the commitments they've made, and flag when they choose to abandon any.

05.03.2025 04:56 · 👍 0  🔁 0  💬 1  📌 0

Most surprisingly, there is now no record of the former commitments on Anthropic's transparency center, a web resource they launched to track their compliance with voluntary commitments and which they describe as "raising the bar on transparency."

05.03.2025 04:56 · 👍 0  🔁 0  💬 1  📌 0

In fact, post-election, multiple tech companies confirmed their commitments hadn't changed.

Perhaps they understood that the commitments were not contingent on which way the political winds blow, but were made to the public at large.

fedscoop.com/voluntary-ai...

05.03.2025 04:56 · 👍 0  🔁 0  💬 1  📌 0

While there is a new administration in office, nothing in the commitments suggested that the promise was (1) time-bound or (2) contingent on the party affiliation of the sitting president.

05.03.2025 04:56 · 👍 0  🔁 0  💬 1  📌 0
Preview
FACT SHEET: Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI | The White House Voluntary commitments – underscoring safety, security, and trust – mark a critical step toward developing responsible AI. Biden-Harris Administration will

The White House Voluntary Commitments, made in 2023, were a pledge to conduct pre-deployment testing, share information on AI risk management frameworks, invest in cybersecurity, implement bug bounties, and publicly report capabilities and limitations.

bidenwhitehouse.archives.gov/briefing-roo...

05.03.2025 04:56 · 👍 0  🔁 0  💬 1  📌 0

Company: @anthropic.com
Date: February 27th, 2025
Change: Removed "White House's Voluntary Commitments for Safe, Secure, and Trustworthy AI," seemingly without a trace, from their webpage "Transparency Hub" (formerly "Tracking Voluntary Commitments")

Some thoughts in 🧡

05.03.2025 04:56 · 👍 0  🔁 1  💬 1  📌 0
Preview
Seoul Commitment Tracker Tracking the progress of voluntary commitments made at the 2024 AI Safety Summit in Seoul, South Korea

To view all of the policies released by AI companies, and their scorecards, check out our full report at

www.seoul-tracker.org

14.02.2025 23:17 · 👍 0  🔁 0  💬 0  📌 0

Company: xAI
Date: February 10, 2025
Change: Released Risk Management Framework draft
URL: x.ai/documents/20...

xAI's policy is stronger than others in its use of specific benchmarks, but it lacks threshold details and provides no mitigations.

14.02.2025 23:17 · 👍 0  🔁 0  💬 1  📌 0

Company: Amazon
Date: February 10, 2025
Change: Released their Frontier Model Safety Framework
URL: amazon.science/publications...

Like Microsoft's, Amazon's policy goes through the motions while setting vague thresholds that aren't clearly connected to specific mitigations.

14.02.2025 23:17 · 👍 0  🔁 0  💬 1  📌 0

Company: Microsoft
Date: February 8, 2025
Change: Released their Frontier Governance Framework
URL: cdn-dynmedia-1.microsoft.com/is/content/m...

Microsoft's policy is an admirable effort but, as with others, needs further specification. Mitigations should also be connected to specific thresholds.

14.02.2025 23:17 · 👍 0  🔁 0  💬 1  📌 0

Company: Cohere
Date: February 7th, 2025
Change: Released their "Secure AI Frontier Model Framework"
URL: cohere.com/security/the...

Cohere's framework mostly neglects the most important risks. Like G42, they are not developing frontier models, which makes this more understandable.

14.02.2025 23:17 · 👍 0  🔁 0  💬 1  📌 0

Company: G42
Date: February 6, 2025
Change: Released their "Frontier AI Framework"
URL: g42.ai/application/...

G42's policy is surprisingly strong for a non-frontier lab. Its biggest issues are a lack of specificity and a failure to define future thresholds for catastrophic risks.

14.02.2025 23:17 · 👍 0  🔁 0  💬 1  📌 0

Company: Google
Date: February 4, 2025
Change: Released v2 of their Frontier Safety Framework
URL: deepmind.google/discover/blo...

v2 of the framework improves Google's policy in some areas while weakening it in others, most notably by no longer promising to adhere to it unless other companies do the same.

14.02.2025 23:17 · 👍 0  🔁 0  💬 1  📌 0

Company: Meta
Date: February 3rd, 2025
Change: Released their "Frontier AI Framework"
URL: ai.meta.com/static-resou...

Meta's policy includes risk thresholds and a commitment to pause development, but it is severely weakened by caveats, loopholes, and a failure to name any mitigations.

14.02.2025 23:17 · 👍 0  🔁 0  💬 1  📌 0

Over the past few weeks, a number of AI companies have released safety frameworks as promised at the Seoul AI Safety Summit.

Many others have not.

Most that do exist are weak or missing key components.

A 🧡 of the recently released policies, with evaluations from @themidasproject.bsky.social

14.02.2025 23:17 · 👍 1  🔁 0  💬 1  📌 0
Preview
Google Lifts a Ban on Using Its AI for Weapons and Surveillance Google published principles in 2018 barring its AI technology from being used for sensitive purposes. Weeks into President Donald Trump's second term, those guidelines are being overhauled.

H/T to @wired.com for first reporting on the change earlier today
www.wired.com/story/google...

04.02.2025 23:12 · 👍 0  🔁 0  💬 0  📌 0
Preview
Google rushed to sell AI tools to Israel's military after Hamas attack The company fulfilled requests from Israel's military for more access to AI tools as it sought to compete with Amazon, documents obtained by The Post show.

This change comes weeks after a Washington Post investigation found that Google was marketing its AI services to the Israeli military.
www.washingtonpost.com/technology/2...

04.02.2025 23:12 · 👍 0  🔁 0  💬 1  📌 0

Company: Google
Date: February 4, 2025
Change: Removed a ban on using AI technology for warfare and surveillance
URL: ai.google/responsibili...

04.02.2025 23:12 · 👍 2  🔁 1  💬 1  📌 0

@safetychanges is following 1 prominent account