David was such a wonderful contestant :)
04.03.2026 02:32
But I mean, continue to enjoy your poorly-conched vomit-chocolate :)
04.03.2026 01:55
"As a loyal American I will continue to remind all that we make the most and the best chocolate in the world."
www.youtube.com/shorts/7sbbd...
Claude's Cycles
Don Knuth, Stanford Computer Science Department (28 February 2026; revised 02 March 2026)

Shock! Shock! I learned yesterday that an open problem I'd been working on for several weeks had just been solved by Claude Opus 4.6, Anthropic's hybrid reasoning model that had been released three weeks earlier! It seems that I'll have to revise my opinions about "generative AI" one of these days. What a joy it is to learn not only that my conjecture has a nice solution but also to celebrate this dramatic advance in automatic deduction and creative problem solving. I'll try to tell the story briefly in this note.
YOOOOOOO fucking KNUTH dropped a lil note on a problem of his being solved w/claude y'alllllll
03.03.2026 22:58
False.
03.03.2026 20:51
Neat, thanks!
03.03.2026 18:10
There's a list there somewhere? I mean, you could browse through every task, but that would take ages.
03.03.2026 17:25
ChatGPT cancellations were not for nothing. Also, imo you should keep it cancelled, as a signal that it's not just one thing
03.03.2026 16:25
ah, OpenAI is entirely stopping DoW deployment for now
that was not clear to me from sama's post. also, i'm very glad to see Noam getting directly involved in policy. i realize he's just a researcher, but it's great to have important people deeply invested in this
If there's anything that warrants a bug report about our species, it's "tens of thousands of people burning themselves to death rather than have to hear the liturgy read from a book that spells Jesus's name with an extra 'i' ".
03.03.2026 16:39
(You can test how good a model is at these sorts of things with "needle in a haystack" challenges - you insert a "needle" (some random piece of info or unexpected sentence) in a huge work, and then ask them about the needle)
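(For the curious, a minimal sketch of what such a test harness can look like; `ask_model`, the filler documents, and the needle text are all made-up placeholders, not any particular benchmark's code.)

```python
# Minimal "needle in a haystack" harness (illustrative only).
import random
from typing import Callable

def build_haystack(filler_docs: list[str], needle: str) -> str:
    """Concatenate filler documents, burying the needle at a random position."""
    insert_at = random.randint(0, len(filler_docs))
    return "\n\n".join(filler_docs[:insert_at] + [needle] + filler_docs[insert_at:])

def run_needle_test(filler_docs: list[str], ask_model: Callable[[str], str]) -> bool:
    """Return True if the model retrieves the buried fact from the long context."""
    needle = "The maintenance code for the old lighthouse is 'periwinkle-42'."  # made-up fact
    prompt = (
        build_haystack(filler_docs, needle)
        + "\n\nQuestion: What is the maintenance code for the old lighthouse?"
    )
    return "periwinkle-42" in ask_model(prompt)
```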
03.03.2026 16:16
a couple early references to the hero secretly wishing he was a pickle, and then at the end of the book add a scene where someone shows up with a jar of pickles, ask it to write the next paragraph, and it'll have him enviously wishing he was among them. It's not at all just "recent words/sentences".
03.03.2026 16:16
"... piece of work" - no, they don't actually do that. That's not how LLMs work, and in fact, the fact that they don't do that is *the key differentiating characteristic* of LLMs vs. earlier models. Modern models are so good that you could input an entire fantasy book and scatter in...
03.03.2026 16:16
sustain their claims about LLM vs. human writing in reality. And I must stress "blind" because as soon as you tell someone something was written by an AI (or they even suspect it), it deeply colors their rating of it. And to reiterate:
" LLMs write every sentence/paragraph as if it was its own..."
... to be dominated by a recency-bias in the content they were processing, being unable to process a whole long context at once (imagine trying to think about every part of a book at the same time!).
As for the linked Twitter thread, I'd bet on 10 to 1 odds that a blind comparison wouldn't...
Pure LLMs are limited by the lack of a "scratchpad" for internal reasoning traces, but LRMs (most models today) have them, and unsurprisingly, perform much better.
It is the attention mechanism that led to the big leap forward in AI performance. Earlier models without attention mechanisms tended...
If an LLM is writing a book, it's not looking at just the last few words; it has the entire book it has written thus far in its context, and the attention mechanism gives it access to the whole thing. It very much does have the "larger purpose" on hand. And they very much plan ahead.
03.03.2026 16:16
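(To make the attention point concrete, here is a toy single-head scaled dot-product attention in NumPy. It's a textbook sketch rather than any specific model's implementation, but it shows that each position's output is a weighted mix over every position in the context, not just the most recent tokens.)

```python
# Toy single-head scaled dot-product attention in NumPy.
import numpy as np

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Q, K, V have shape (seq_len, d_k); returns (seq_len, d_k)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_len, seq_len): every token scored against every token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the whole context
    return weights @ V                               # each output mixes values from the entire sequence

# Decoder-only LLMs add a causal mask so a token only attends to earlier
# positions - i.e. to everything written so far, the "whole book" case above.

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))         # 6 "tokens" with 8-dim embeddings
print(attention(x, x, x).shape)     # -> (6, 8)
```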
As for some specifics in your essay, I have some objections.
"But this is precisely how LLMs are trained and evaluated: on next-token prediction, the ability to produce a plausible-sounding sentence regardless of a larger purpose."
This pretends LLMs don't have a context.
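(Concretely, the next-token objective scores each token conditioned on the entire preceding context, not on a sentence in isolation. A toy sketch below; `model_logprobs` is a hypothetical stand-in for any autoregressive model.)

```python
# Toy next-token objective: loss is computed per token, conditioned on the full prefix.
from typing import Callable, Dict, Sequence

def next_token_loss(tokens: Sequence[int],
                    model_logprobs: Callable[[Sequence[int]], Dict[int, float]]) -> float:
    """Average negative log-likelihood of each token given everything before it."""
    losses = []
    for t in range(1, len(tokens)):
        prefix = tokens[:t]                       # the entire context so far, not a sliding window
        logp = model_logprobs(prefix)[tokens[t]]  # log p(next token | full prefix)
        losses.append(-logp)
    return sum(losses) / len(losses)
```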
Perhaps surprisingly, LLMs have the same weaknesses we do in this regard. Like us, they're not good at just "being random". Randomness is faked, it's thought-out. While examining their hidden states could allow for determining what's truly random/unexpected, they - like us - can't just do that.
03.03.2026 16:16
Because ultimately, it's prediction error that forms the basis for learning. Your brain constantly predicts what every sense will experience. Your ear will sense pitch X, because they'll say word Y, because they'll talk about topic Z, etc.
No error? No learning.
From this basis, it's only natural that we'd enjoy flowery language and novel metaphor. It's out of the ordinary enough to not bore. It's not so out of the ordinary that we get confused and frustrated.
It's just a side effect of being beings evolutionarily-tuned to want to learn new things.
It gets to the point that people are good at predicting comedy itself, so you can invert it and break their predictions by *not* throwing in what's expected. Monty Python was famous for this - you set them up for a joke, the audience sees the joke coming, then you instead don't deliver the joke.
03.03.2026 16:16
Much of comedy (some argue all) is driven by this at its base. You set the person on a path where all their predictions are matching up and then suddenly reveal that they were led astray early on and have to backtrack. "My friend recently got a hair extension, so now her house looks weird."
03.03.2026 16:16
Honestly, I think both the essay and this miss the underlying mechanisms.
We, as humans, enjoy our expectations being broken by just the right amount. Our hunger to learn is a balance between "everything matching our predictions perfectly" (boredom) and "too much not matching" (confusion).
Not very well converted trucks... I guess they're presuming they're being watched with low resolution cameras...
03.03.2026 15:30
bsky.app/profile/nafn...
03.03.2026 14:55
Alex has a WhatsApp chat for the different versions to chat, but it doesn't work by Alex dictating to them what to do. Rather, he offers support as needed, but otherwise lets them be creative, and only "filters" things he finds against the spirit of the show.
03.03.2026 14:30