On the Biology of a Large Language Model
We investigate the internal mechanisms used by Claude 3.5 Haiku, Anthropic's lightweight production model, in a variety of contexts, using our circuit tracing methodology.
Somehow every BlueSky poster knows how LLMs work, meanwhile Anthropic researchers are releasing 35k-word papers meticulously analyzing the internals and still concluding that they don't really know how they work.
transformer-circuits.pub/2025/attribu...
08.08.2025 14:34
Tricking LLMs with the "counting letters" prompt is like showing humans an optical illusion and then, when the human perceives it incorrectly, using it as evidence that humans aren't intelligent. It targets a specific blind spot in how we operate but isn't really representative of anything else.
08.08.2025 12:42
This is a failure of the new GPT-5 router more than anything. We know reasoning models can correctly answer this question 100% of the time, the router just isn't sophisticated enough to understand that this question, while superficially simple, actually requires reasoning. It's a fixable problem.
08.08.2025 12:36
Ok, apparently they considered that and decided (correctly) that wasting effort on this narrow and manufactured problem wasn't worth it. Gotta just accept the online dunks from people that know enough to trick the LLM but not enough to understand why the trick works. bsky.app/profile/schm...
08.08.2025 12:16
OpenAI should just automatically enable thinking for this dumb question that only exists to trick LLMs.
08.08.2025 12:12
I like ChatGPT.
25.07.2025 15:19
I love that this hypothetical guy immediately made a terrible financial decision on his rent payments. I agree with you, this guy's gonna have a hard time.
01.07.2025 16:51
I dislike Scott Adams as a person, his opinions, etc. but I did watch his announcement (the first thing of his I've ever seen) and he was pretty clear that he tried it in the course of leaving no stone unturned. He said he & his dr didn't think it would work, but there were no downsides, so why not.
20.05.2025 01:39
This is awesome.
18.05.2025 21:10
By "inside" I mean the billions (trillions?) of parameters, activations, attention patterns etc. that are poked and prodded in interpretability studies. No one fully understands how those things work together to produce the model outputs. transformer-circuits.pub/2025/attribu...
14.05.2025 02:40
I feel like we don't have a perfect understanding of what happens inside LLMs and we also don't have a perfect definition of what thinking means, so I guess I am less confident about this than you are.
14.05.2025 00:34
I get the argument that this ruling is potentially beneficial, but I think the idea of every app now having an Apple price and a non-Apple price, with different payment flows for each, is ultimately going to end up as net-negative for everyone (users, devs, Apple).
01.05.2025 02:00
This is like if you had a human assistant named Steve, and you said, "Steve, can you write an email in my voice," and then Steve did it, and you got furious at Steve for impersonating you.
23.04.2025 05:50
Does it still count as impersonation when she asked ChatGPT to impersonate her?
23.04.2025 05:25
A lot of angry and upset people in this thread, but almost no one seems to understand the specifics of what they're angry about. This reporter asked ChatGPT to write about something in her own voice, and it did (privately, just to her). WaPo has absolutely nothing to do with this.
23.04.2025 05:15
I just want to note that you can give ChatGPT a prompt with literally any name -- real names, fake names, silly names, serious names -- and it will do the exact same thing. Here's an excerpt in the style of extremely not-real WaPo reporter Barnabas Flimflamington. This outrage over this is silly.
23.04.2025 05:02
I asked ChatGPT to write a WaPo story in the style of Mike Hearn, and it did, with my name as the byline. I have never written for WaPo (or anywhere). This is what ChatGPT does, because it's essentially what I asked it to do. This whole thread and the various reactions are wild and kind of insane.
23.04.2025 04:48
I feel like people are misunderstanding what this is. Sora.com already has a homepage feed with "likes"; once they add following and comments, it's a social app.
16.04.2025 13:02
Here are other screenshots that are closer in tone to today's. It's a thing that he does. bsky.app/profile/adis...
15.04.2025 03:16
It's awkward to find his third-person tweets because searching "Yglesias" brings up, you know, all his tweets. But if you search "yglesias third person" you can get a litany of people dragging him for using the third-person.
15.04.2025 02:07
The timeline of replies to the 3rd-person post is fascinating. It was made 24 hrs ago, so there are a handful of normal replies also made 24 hrs ago from people who understood the post in context, then the screenshot went viral about 6 hours ago, and the rest are just insane from that point on.
15.04.2025 01:55
A good trick here is that, on the iPhone, you can hit the power button 5 times in quick succession. It brings up the "Slide to Power Off" screen, disables FaceID/TouchID and requires your full passcode to unlock the phone again.
08.04.2025 01:47
Ah didn't realize that was a thing. Makes sense.
06.04.2025 19:17
The lead photo of that post is almost certainly AI, for what it's worth. I can't speak to the details of the story itself.
06.04.2025 19:08
Isn't the assumed reason they're capitulating because they believe they will lose money (clients) if they are in a fight with the admin?
03.04.2025 13:58
I can't recognize it either, because I personally don't see the villain in this. The "or whatever" is the acknowledgement of the open-ended possibilities of this theoretical super intelligence. Curing cancer is the headliner of "problems it might solve" and then there's an infinitely long tail.
26.03.2025 20:28
I'm as cynical as the next guy but the idea that he's running some kind of long con to make money using his insider influence is way more far-fetched than just assuming he bought Microsoft because it's Microsoft. He's not trading penny stocks over here, MSFT is top 3 in market cap.
11.03.2025 12:56
That's... a weird hallucination. What was your prompt? Did it think you were referring to someone else?
18.02.2025 01:30
I think the, uh, political logic, such as it is, is that his vote wouldn't change the result and maybe it gives him some conservative bona fides in a purple state? Manchin obviously did this a lot and it was deeply annoying but it got us a dem senator in WV. Now, PA isn't WV, so, yeah, I dunno.
05.02.2025 02:10
Hey there Mr. Blue
We're so pleased to be with you
House guy. DCoS @beyer.house.gov
Priors: Kamala Harris, Jennifer Wexton, et al.
Hendersonville, NC native
SciFi, Rock n Roll, he/him
Posts mine/dumb
Host of Crooked Media's "What A Day." Contributing writer at the New York Times. Previously: host of The Argument. Go Blue means Go the University of Michigan.
Just asking questions.
Senior Editor, Lawfaremedia.org.
Send story ideas and tips to anna.bower@lawfaremedia.org
Signal username: annabower.24
Journalist - @StreamingSmarter.com Previously: Curiosity.com / NBC Chicago / Softonic / RedEye Chicago
Law Professor @AkronLaw | Computer Scientist | 1A and tech scholar | meme docent
Priors: Google, Twitter (no, not X), Chamber of Progress
jmiers@uakron.edu
Senior Fellow at the American Immigration Council. Commenting generally on immigration law and policy. Retweets =/= endorsements, views are my own.
economist, policy research. this is for display purposes only. contact: robertmarchini1993 at gmail. they/them. new jersey resident. https://skymarchini.net/
@ksvesq.bsky.social's husband; father of daughters; professor @georgetownlaw.bsky.social; #SCOTUS nerd @CNN.com
Bio: www.law.georgetown.edu/faculty/stephen-i-vladeck
"One First" Supreme Court newsletter: stevevladeck.com
Book: tinyurl.com/shadowdocketpb
SF/F Writer - The Books of the Raksura, The Murderbot Diaries, Witch King, and more. Nebula and Hugo Award winner. NYT and Sunday Times Bestseller. (She/her) Agent: Jennifer Jackson
Dramatist, author, apostate newspaperman specializing in artisanal contempt and discerning maledicta. The claim that I am the angriest man in television is faint praise indeed; the second angriest is yelling at an agent because residual checks are late.
Obama appointee at DOJ/DHS. Litigator for Speaker Pelosi and Jan. 6 Committee. NYC born and bred. Yankee Stadium vendor. Wannabe historian. Dad joker.
Staff writer for the Atlantic. Messages with links are intended as prompts to read the linked story, not self-contained arguments substituting for the linked story.
All views my own, links posted w/o endorsement
AI, philosophy of science, philosophy of mind, animal cognition
Http://cameronbuckner.net
Minnesota guy.
"This particular activist will not stop." Sen. Chris Murphy
Mostly lurking, in accordance with the robustness principle. In this house we believe distortionary and redistributive issues should be separated, the notion of a "fiduciary" is essential to the future of democracy, and fascism must be destroyed.
http://TomTheFinder.com substack and Yahoo Sports | Portland Trail Blazers analytics insider | Pack Your Knives and Count the Dings | Hoops4ALS