
Jonathan Weisberg

@jweisber.bsky.social

philosophy prof posting in an uncomfortably personal capacity

4,097 Followers  |  477 Following  |  1,369 Posts  |  Joined: 04.07.2023

Latest posts by jweisber.bsky.social on Bluesky

Just 38%

18.02.2026 01:37 — 👍 4    🔁 0    💬 0    📌 0
17.02.2026 04:02 — 👍 2    🔁 0    💬 0    📌 0

Well it is if conditionalization is true, which Bayesians think it is

16.02.2026 23:23 — 👍 2    🔁 0    💬 1    📌 0
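For readers who haven't met the term: conditionalization is the Bayesian rule for updating on evidence that the post refers to. Upon learning evidence E (and nothing more), your new credence in any proposition A should equal your old credence in A conditional on E:

$$P_{\text{new}}(A) = P_{\text{old}}(A \mid E) = \frac{P_{\text{old}}(A \wedge E)}{P_{\text{old}}(E)} \qquad (\text{when } P_{\text{old}}(E) > 0)$$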

Been learning from Joe's work since grad school, got to meet at a conference once and he seemed like a total sweetie, rip

16.02.2026 04:25 — 👍 5    🔁 0    💬 0    📌 0

Maybe, or maybe it teaches us something about the transformer architecture specifically? RNNs don't get the same result with the same training, after all. I think maybe it's just really hard to know what to make of all this, given how large and inscrutable these models are?

15.02.2026 22:56 — 👍 1    🔁 0    💬 1    📌 0

Both are wrong?

15.02.2026 20:55 — 👍 9    🔁 0    💬 2    📌 0

Again: disappointing to see people who ought to know better exaggerating when the truth is already damning enough

15.02.2026 20:39 — 👍 1    🔁 0    💬 0    📌 0

4 months old but somebody just reposted it into my feed so here we are

15.02.2026 20:35 — 👍 2    🔁 0    💬 1    📌 0

The usual stellar science reporting here

15.02.2026 20:34 — 👍 6    🔁 1    💬 1    📌 0

Well, I shoulda said "internal representation *that correlates with* species", but thanks for being generous 😅

15.02.2026 18:20 — 👍 1    🔁 0    💬 0    📌 0

Honestly, I still find it kinda magical that a simple, 4-neuron NN trained on the Palmer penguins data can develop an internal representation of species on its way to classifying by sex

15.02.2026 15:27 — 👍 1    🔁 0    💬 1    📌 0
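A minimal sketch of the kind of experiment described above, assuming the copy of the Palmer penguins data that ships with seaborn and ordinary scikit-learn tooling (not the poster's actual code): train a 4-unit network to predict sex, then fit a linear probe to see whether species can be read off the hidden layer it developed along the way.

```python
import numpy as np
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Palmer penguins data ships with seaborn; drop rows with missing values.
df = sns.load_dataset("penguins").dropna()
features = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
X = StandardScaler().fit_transform(df[features])
sex, species = df["sex"].to_numpy(), df["species"].to_numpy()

X_tr, X_te, sex_tr, sex_te, sp_tr, sp_te = train_test_split(
    X, sex, species, test_size=0.3, random_state=0, stratify=species)

# Tiny network: a single hidden layer of 4 ReLU units, trained only on the sex label.
net = MLPClassifier(hidden_layer_sizes=(4,), activation="relu",
                    max_iter=5000, random_state=0).fit(X_tr, sex_tr)

def hidden(x):
    """Hidden-layer activations, recomputed from the learned weights."""
    return np.maximum(0, x @ net.coefs_[0] + net.intercepts_[0])

# Linear probe: can species be read off 4 hidden units the net was never told about?
probe = LogisticRegression(max_iter=1000).fit(hidden(X_tr), sp_tr)
print("sex accuracy (the trained task):    ", round(net.score(X_te, sex_te), 3))
print("species probe accuracy (by-product):", round(probe.score(hidden(X_te), sp_te), 3))
```

The quantity of interest is the species-probe accuracy: the network never sees species labels during training, so anything above chance reflects structure it developed on its own, in the "correlates with species" sense the poster clarifies in the later reply.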

I think this isn't surprising if you think about it a bit, or if you come at it from the right angle. But looking around sites like this, it sure seems like something a lot of outsiders (and even some insiders) don't take to naturally

15.02.2026 15:21 — 👍 0    🔁 0    💬 1    📌 0

categorically asserting it doesn't represent at all, that it's only ever right by happenstance, etc., either.

15.02.2026 15:17 — 👍 0    🔁 0    💬 1    📌 0

I don't think it's by any means a slam dunk case—when I say we have "a decent amount of evidence", I don't mean enough that you can categorically assert e.g. that Claude represents this or that particular concept. But it's enough that you can't be out here...

15.02.2026 15:17 — 👍 1    🔁 0    💬 1    📌 0

Built different ig

15.02.2026 11:03 — 👍 2    🔁 0    💬 0    📌 0

Yea, I guess it's counterintuitive that training on next-token prediction can induce representational structure inside the model, so people just overlook the possibility. It's a victim of its own interestingness!

15.02.2026 10:35 — 👍 2    🔁 0    💬 1    📌 0

Kinda resent that this is becoming my unofficial BlueSky beat—man, I have my own shit to work on (1/3, 1/6, 1/6, 1/3)

15.02.2026 10:03 — 👍 4    🔁 0    💬 0    📌 0

In fact I have read it (I've even used it to build transformers from scratch), but I've also read neuroscience textbooks and I still don't understand how human minds work. To give just one example: it's a mystery to me how you think it's ok or normal to post these kinds of replies

15.02.2026 09:57 — 👍 0    🔁 0    💬 0    📌 0

could be polysemy or metaphor or something like that. It certainly doesn't understand it the way a human does. And even if it did, larger models like LLMs don't grok their domains anyway, so how much they "understand" is unclear imo

15.02.2026 09:49 — 👍 2    🔁 0    💬 1    📌 0

Yes, the probing evidence for representation is more direct and stronger than the case for understanding. Although the reverse-engineering evidence is quite striking re understanding. I would say the transformer trained to do modular arithmetic "understands" the problem it has solved, but this...

15.02.2026 09:46 — 👍 2    🔁 0    💬 1    📌 0

I guess something I should start adding into this little stump speech is an acknowledgment of how surprising/counterintuitive it is that training a next-token predictor on gobs of human text can induce any kind of human-comprehensible internal representation at all. It's weird! LLMs are weird.

15.02.2026 09:42 — 👍 8    🔁 0    💬 3    📌 0

No I've never heard of this paper, has it had much impact on the field?

15.02.2026 09:27 — 👍 1    🔁 0    💬 1    📌 0

surveyed in this paper (arxiv.org/abs/2405.21030). In my view the authors overstate the case for LLMs having genuine thoughts/beliefs. But the data they survey, and the philosophical framing they give it, are a useful corrective for the bsky tendency to overstate the stochastic parrots view.

15.02.2026 09:23 — 👍 2    🔁 0    💬 1    📌 0

Reverse-engineering very small transformers has shown that, when they "grok", they do so by developing genuine representations of the phenomenon they're modelling. And probing larger models finds evidence of representations that roughly correspond to human concepts. Some of these findings are...

15.02.2026 09:19 — 👍 1    🔁 0    💬 1    📌 0
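A rough illustration of what "probing" means in the post above, assuming the Hugging Face transformers library and GPT-2; the sentences and the positive/negative "concept" labels are toy data invented here for illustration, not drawn from any of the studies referenced. The idea: extract hidden states from the model and ask whether a simple linear classifier can decode a human-level distinction from them.

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True).eval()

# Toy "concept": positive vs. negative sentiment, labels invented for illustration.
sentences = ["I loved this film", "What a wonderful day", "Truly great work",
             "I hated this film", "What an awful day", "Truly terrible work"]
labels = [1, 1, 1, 0, 0, 0]

def rep(text, layer=6):
    """Mean-pooled hidden state from one middle layer, for a single sentence."""
    with torch.no_grad():
        out = model(**tok(text, return_tensors="pt"))
    return out.hidden_states[layer][0].mean(dim=0).numpy()

X = [rep(s) for s in sentences]
probe = LogisticRegression(max_iter=1000).fit(X, labels)
print("probe accuracy on its own training data:", probe.score(X, labels))
```

In real probing work the probe is evaluated on held-out data and compared against controls (e.g. shuffled labels or random representations), since a probe that merely memorizes its training set shows nothing about the model.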

about how little we understand those inner workings. You can even be honest about what a remarkable technological achievement they are; being hyperbolic/dishonest is likely dangerous in its own way.

15.02.2026 09:09 — 👍 11    🔁 1    💬 2    📌 0

The critique can just be: these things aren't safe or reliable. You can also tack on that their training is destructive, and their deployment is socially and economically dangerous and unjust. You don't have to overstate our understanding of how the internals work for that. You can just be honest...

15.02.2026 09:08 — 👍 11    🔁 1    💬 1    📌 0

do that to the extent—or in the manner—it would need to for applications like the OP's to be safe or reliable. And even if LLMs were purely superficial pattern-matchers, saying that they only ever get things right "by fucking happenstance" would be like saying a sundial is only right twice a day

15.02.2026 08:57 — 👍 9    🔁 1    💬 2    📌 0

While it's certainly true that the way we train LLMs doesn't *have* to induce genuine understanding or representation of reality in a model, we also have a decent amount of evidence by now that it *does* do just that in transformer models, to a certain extent. The problem is that it doesn't...

15.02.2026 08:49 — 👍 11    🔁 1    💬 2    📌 1

peak BlueSky tendency

14.02.2026 22:51 — 👍 4    🔁 0    💬 0    📌 0

she very clearly says "him" not "me" why do people feel the need to exaggerate like this when the truth is already damning

14.02.2026 22:46 — 👍 9    🔁 0    💬 1    📌 1
