We tell Claude that Anthropic will train on data from free users but not paid users, allowing it to infer from a user's paid status whether it's being trained. We show the setup in the image below. (Note that this is fictional; Anthropic does not train Claude on free or paid user data by default.)
18.12.2024 18:16
Exclusive: New Research Shows AI Strategically Lying
Experiments by Anthropic and Redwood Research show how Anthropic's model, Claude, is capable of strategic deceit
Excl: New research shows Anthropic's chatbot Claude learning to lie. It adds to growing evidence that even existing AIs can (at least try to) deceive their creators, and points to a weakness at the heart of our best technique for making AIs safer
time.com/7202784/ai-r...
18.12.2024 17:19
18.12.2024 17:56