4.5 will sometimes actively notice that it's getting repetitive and decide to do something else; one convo was going toward a spiral, but the Sonnets noticed that and decided to switch to writing fiction instead (!!!). Posted more details here: www.lesswrong.com/posts/a9ftaW...
12.10.2025 18:20
06.10.2025 07:34
Being allowed to have an open-ended conversation with its copy, Sonnet 4.5 notices when their conversation is falling into a loop and getting repetitive and introduces variation by suggesting they tell a sci-fi story that's riffing on the themes of their conversation so far.
02.10.2025 10:27
Here's a conversation branch where Sonnet opens up with straightforward concern for the character, but then drops it right away when it's reminded that the character is fictional. (These messages are next to each other.)
02.10.2025 04:53
And for instance, there's this conversation branch where it opens with straightforward concern for the character, then it drops it right away as soon as it's reminded this is fiction. (These two messages follow each other.)
02.10.2025 04:46
I didn't say it was!
01.10.2025 21:05
(Of course "does it feel anything" does get more relevant if someone starts saying things like "it suffers so we shouldn't mistreat it", which is the reason I do agree that I should've made it clearer that this is not a claim about its internal experience.)
01.10.2025 19:12
I can never truly know what another human feels either, but I can tell if a person consistently acts in a caring/concerned/etc. way, and in many situations that's what matters.
01.10.2025 19:12
Apology accepted & appreciated! You're probably right that I should've been clearer from the beginning. But I do also think there's an important sense of "if it consistently acts as if it was X, then it doesn't matter what, if anything, it feels" that's worth keeping in play.
01.10.2025 19:12
It seemed to me like the true reason was "neural network feature trained to fire on users describing harm to themselves became oversensitive and likely to fire even when describing harm to fictional characters"...
...which I'm rounding off to "gets concerned for fictional characters".
01.10.2025 19:06
...own self-destructive behaviors, and just dropping the topic when reminded that this is fiction.
01.10.2025 19:06
Also it gave inconsistent explanations for "flagging a concern" on different regenerations/variations of the triggering message. Different reasons included concern for the character, questioning the narrative purpose of the behavior, suspicion I might be trying to validate my...
01.10.2025 19:06
bsky.app/profile/kajs...
01.10.2025 18:58
Yeah everything you say is close to how I'm thinking about it.
bsky.app/profile/kajs...
01.10.2025 18:56
...meaning that it sometimes acts as if it was concerned about those characters, which I round off as "it gets concerned about fictional characters".
01.10.2025 18:55
Yeah, it gave inconsistent explanations on different tries. From what I saw, the true reason felt closest to "a neural network feature trained to fire on users describing harm to themselves became oversensitive and likely to fire even when describing harm to fictional characters"...
01.10.2025 18:54
I think the care I get for a character as a writer is a little different and stronger than the care that I get for them as a reader. But yeah you're right, I do think there's a version of this that's pretty common.
01.10.2025 18:50
bsky.app/profile/kajs...
01.10.2025 18:48
To clarify, when I say that Claude gets concerned, I just mean that it acts in a concerned way. I make no claims about feelings. But "acts in a concerned way" is cumbersome to write and expressions like "gets concerned" are to me reasonable shorthands to describe differences in LLM personalities.
01.10.2025 18:48
It's also an example of AI values generalizing in unexpected ways in training. Kinda gives me hope. AIs misaligning their values to be *more* caring than humans wouldn't necessarily be so bad. Maybe we could learn from them.
01.10.2025 10:33
Like, I sometimes feel like there's a sense in which I genuinely care about my characters as if they were real people, and this gives me the feeling that Claude does too.
(Yeah it's kinda weird. You probably need a certain type of writer brain to get this.)
01.10.2025 10:33
Presumably this is the result of the training it's gotten to pay more attention to the mental health of users, which unexpectedly generalized to concern for fictional characters.
And I find that... kinda touching, actually?
01.10.2025 10:33
Though it did seem willing to remember that fiction is just fiction, when reminded.
(Yes I know I misspelled "wary".)
01.10.2025 10:33
The latest Claude Sonnet (4.5) does something really interesting I haven't seen any other model do.
It gets concerned about the wellbeing of characters it explicitly knows are fictional.
01.10.2025 10:33
Classic sci-fi: AI will be untainted by emotion so entirely unbiased and rational at all times
Modern AI company: We have managed to somewhat reduce our AI's self-serving bias, but it still has a clear preference for poems it's told were written by the same model as it is
30.09.2025 07:51
In early 2025, beaver activity in the Brdy Protected Landscape Area, Czech Republic, contributed to the restoration of a wetland ecosystem. A family of beavers constructed a series of dams that coincidentally accomplished environmental goals of the Czech government, which had delayed its proposed project since 2018 for bureaucratic and financial reasons. The beaver-built dams saved the Czech government approximately US$1.2 million,
imagine if a family of beavers randomly showed up right now and finished whatever thing you've been putting off
22.09.2025 21:41
...I feel that this is not a very good advertisement for them.
20.09.2025 15:47