KS

@kazzorr.bsky.social

ocean/planetary dynamics, machine learning at ucla.

1,279 Followers 1,747 Following 224 Posts Joined Nov 2024
18 minutes ago

I was stupid in my choice of phrase here. I meant math/chemistry symbols, equations, genetics (nucleotide or amino acid sequences), chess, code, etc. VLMs (which all frontier models are) tokenize images and treat them like language. The math-specific comment is here: bsky.app/profile/kazz...
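
For concreteness, here is a toy sketch of what "tokenize images and treat them like language" means in a ViT-style VLM. This is my own illustration, not any vendor's actual pipeline: the image is cut into fixed-size patches and each patch is linearly projected into the same embedding space as the text tokens.

```python
# Toy sketch of ViT-style image tokenization (illustrative only; real
# models use a learned projection plus positional embeddings).
import numpy as np

def patchify(image: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened patch vectors."""
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    return (image[: rows * patch, : cols * patch]
            .reshape(rows, patch, cols, patch, c)
            .swapaxes(1, 2)
            .reshape(rows * cols, patch * patch * c))

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))            # dummy RGB image
proj = rng.normal(size=(16 * 16 * 3, 768))   # learned in a real model
image_tokens = patchify(image) @ proj        # one "word" per patch
print(image_tokens.shape)                    # (196, 768)
```

From the model's point of view, those 196 vectors sit in the same sequence as the text-token embeddings, which is why one transformer can attend over both.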

0 0 0 0
25 minutes ago

Also, I can continue these arguments ad infinitum; I have an obsessive need to follow every thread of discussion forever. I am not badgering you, or anything like that, if I give that impression. We can stop this anytime you say so, and I will stop replying.

0 0 0 0
32 minutes ago
A Claude Opus 4.6 prompt that gives only the integral to be evaluated, in LaTeX, with no natural language at all. The agent accurately identifies the task and performs the integral.

Actually, a math prompt without a single word of natural language is correctly interpreted and solved by Opus 4.6 (it's not a difficult integral; I'm just making a point). I think the Kean et al. study makes the argument that language and the language of thought are different things.
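
For illustration, a representative stand-in for such a wordless prompt (not the exact integral from my screenshot):

```latex
% The entire user message is the display equation below; no instructions.
\[
  \int_{0}^{\infty} x^{2} e^{-x^{2}} \, dx
\]
% Expected evaluation: \frac{1}{2}\Gamma(\tfrac{3}{2}) = \frac{\sqrt{\pi}}{4}
```

The model has to infer the task ("evaluate this") from the bare LaTeX alone.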

0 0 0 0
43 minutes ago

As I said, calling math "natural language" is, I think, too broad and not how the brain processes it (see the Kean et al. reference). E.g., Emily is a computational linguistics and syntax expert but, unless I am mistaken, not a mathematician. I can feed the model a math prompt in which the only natural language is "Solve".

0 0 1 1
46 minutes ago

I am making this point based on my personal experience (I know, not evidence). I have no idea whether Emily is aware of this, because she does not make the distinction in her posts. You might find this tiresome, but this has been the only determinant of whether these models are useful to me.

0 0 0 0
50 minutes ago

It is actually impossible for humans to have an unbiased opinion. Even the very act of seeing (the same goes for other sensory experiences) involves enormous processing through the visual cortex. So I would never make this claim, and you should not just believe me. Expert opinion is still *opinion*.

0 0 0 0
54 minutes ago

No question. But anecdotal experiences and heuristics are often the starting point of research programs. Emily's trichotomy of what she believes are the only three possible uses of LLM policies also seems anecdotal (I checked her Scholar page and found no manuscript supporting the claim).

0 1 0 0
4 hours ago

Regarding this point, here's the thing: the reason I personally found models before Opus 4.6 (Nov 25) useless is precisely hallucinations. But I rarely see any in 4.6. Perhaps it is extended tool use (why make up a reference when you can search for it with the web tool?).

0 0 1 0
4 hours ago

The jump from Sonnet 4.5 to the newer models, Opus 4.5 and especially 4.6, seems enormous, unfathomable even, from my personal perspective. If I had to hazard a guess: if Emily or any other AI safety researcher is not working with the newest models, they will get a skewed picture of their potency.

0 0 1 0
4 hours ago

I should be clear that this was not the case before this year. Models before Opus 4.5 (late Nov 2025 release) were not useful for my work; they were simply not performant enough. Something like Claude Code is very useful, but it is not just an LLM; it is an elaborate harness around an LLM.

0 0 1 0
4 hours ago

Personally, with my ADHD, I have been able to remove all the stumbling blocks that in the past would stymie me for simple reasons; they are now easily addressed. My work output/efficiency has increased 10x with, I feel, greater and deeper understanding, even if you find that impossible to believe.

0 0 2 0
4 hours ago

Ultimately, this AGI discourse is, in my opinion, largely a waste of time. The only question is about *utility*. And I think these models are incredibly useful for scientists and engineers, which is why there are *so many* users. Every collaborator I have is already using Claude Code heavily.

0 0 1 0
4 hours ago

I work mostly with Opus 4.6, and it seamlessly uses a RAG tool (a guess, but clearly something like it) to pull relevant context from different conversations, which is already impressive. With tool use, interpolating through the training set is pretty powerful, as I can attest from my own work.
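
To make "pulls relevant context" concrete, here is a toy sketch of the retrieval step I am guessing at. Purely illustrative: I do not know the actual implementation, and embed() below is a deterministic stand-in for a real embedding model.

```python
# Toy retrieval-augmented-generation (RAG) lookup: embed past chunks once,
# embed the query, and inject the nearest chunk into the context window.
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in embedding: a real system would call an embedding model."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)

past_chunks = [
    "notes on baroclinic instability",
    "derivation of quasi-geostrophic potential vorticity",
    "grocery list",
]
index = np.stack([embed(c) for c in past_chunks])

query = "extend the quasi-geostrophic derivation"
scores = index @ embed(query)   # cosine similarity (unit-norm vectors)
print(past_chunks[int(np.argmax(scores))])  # chunk pasted into the context
```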

0 0 1 0
4 hours ago

So these models are now an LLM policy that has to make *decisions*: when to run Python, when to perform a web search, or when to directly run the newer visual designer (which also reasons over code; it is HTML + SVG, presumably trained specifically with RLHF, given how good it is).

0 0 1 0
4 hours ago

The most interesting feature is *tool use*, the most RLHF-heavy capability in LLMs, because an LLM trained on a pure next-word objective cannot simply do this. You need a policy that decides when to use tools automatically, and these models are really good at this, even with newer tools you add via instructions.
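
A toy sketch of the decision loop I mean in the last two posts. My own illustration of the general pattern only: the tool names and the if/else routing are made up, and a real model learns this routing via RLHF rather than hand-written rules.

```python
# Toy "LLM policy" that decides between answering directly and calling a tool.
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "python": lambda q: f"ran code for: {q}",
    "web_search": lambda q: f"searched the web for: {q}",
}

def policy(query: str) -> str:
    """Hand-written stand-in for the learned tool-use decision."""
    if "compute" in query or "integrate" in query:
        return TOOLS["python"](query)
    if "reference" in query or "cite" in query:
        return TOOLS["web_search"](query)   # look it up, don't make it up
    return f"answered directly: {query}"

print(policy("compute the eddy heat flux"))
print(policy("find a reference for baroclinic adjustment"))
```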

0 0 1 0
4 hours ago

These models are constantly being retrained on their training set, which slowly expands over time (i.e., it comes to include newer research), to prevent catastrophic forgetting. So queries rarely move far from the training set.
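
If my retraining guess above is right, the standard recipe would look like experience replay: mix samples from the original corpus into every batch of new data so earlier capabilities are not overwritten. A toy sketch, purely illustrative:

```python
# Toy replay batching to mitigate catastrophic forgetting.
import random

old_corpus = [f"old_doc_{i}" for i in range(1000)]
new_corpus = [f"new_paper_{i}" for i in range(100)]

def make_batch(batch_size: int = 8, replay_frac: float = 0.5) -> list[str]:
    """Each batch mixes replayed old data with the new data."""
    n_old = int(batch_size * replay_frac)
    return (random.sample(old_corpus, n_old)
            + random.sample(new_corpus, batch_size - n_old))

print(make_batch())
```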

0 0 1 0
4 hours ago

Thanks for your detailed response. I agree with this, of course, except that the training set is essentially the near-entirety of the human written word, math, and code, including images and video, which brings us to the difficult question: what is far from the training set? I would contend, almost nothing.

1 1 2 0
10 hours ago

Also, to your question: I have no idea how an LLM does it, given their high complexity. I can only argue, based on interacting with them on niche non-linguistic tasks (specifically advanced math), that the most recent models (from 2026) do seem to be exceptionally good at mathematical reasoning.

0 0 2 0
10 hours ago

I don't think there is a hard separation, but math is not self-evidently a language the way English is a language. Otherwise linguistic and mathematical reasoning would be mutually transferable skills, which I really don't think they are (I am reading Syntactic Theory, of which Bender is a coauthor; it is very different).

0 0 0 0
11 hours ago

Ugh, I did say "nothing to do with language", which was very stupid of me. Thanks for spotting that. It would be nice to get through a day without saying or doing 5 idiotic things, but that is clearly too much to ask (I have severe ADHD, in my defense).

0 0 0 0
11 hours ago

E.g., I am an applied mathematician, and math is sort of a language, but I have zero expertise *in* language or linguistics.

0 0 1 0
11 hours ago

I said that this is far beyond language, not that it has nothing to do with language. Our brain actually explicitly separates linguistic reasoning from analytical and mathematical reasoning; see work by Hope Kean from Fedorenko's lab at MIT (way down on that thread): www.biorxiv.org/content/10.1...

0 0 1 0
11 hours ago

Whoops, sorry, that is Kean's other PhD paper. This is the right reference: www.biorxiv.org/content/10.1...

0 0 0 0
11 hours ago
A human brain network specialized for abstract formal reasoning
Humans stand out in the animal kingdom for their ability to reason in highly abstract ways. Using a deep-data precision fMRI approach, we identify and richly characterize a network of frontal brain areas that support abstract formal reasoning. This "abstract reasoning" network robustly dissociates from the domain-general Multiple Demand network (the current leading candidate substrate of fluid intelligence), as well as from three other networks supporting high-level cognition: the language network, the intuitive physical reasoning network, and the social reasoning network. Finally, the areas of this network respond robustly during both deductive and inductive reasoning, during classic matrix reasoning problems, and when solving multiplication and division problems. This network may therefore support the most abstract forms of reasoning, possibly constituting a human-specific adaptation.

To underscore this, recent work by Hope Kean (a PhD student in Evelina Fedorenko's group at MIT) makes the point explicit: "Evidence from Formal Logical Reasoning Reveals that the Language of Thought is not Natural Language" www.biorxiv.org/content/10.1...

0 0 1 0
11 hours ago

In other words, I think you are simply outside your area of expertise, because LLMs/VLMs are not just about language or computational linguistics. It's a bit like me deciding that because I "speak the language of math and code" I can reason about language, which I actually cannot; it is outside my expertise.

0 0 1 0
11 hours ago

I work in a niche area (the intersection of applied math, climate dynamics, and planetary atmospheres) that is so data-poor that a simple pattern matcher should simply fail, and yet the frontier models are astonishing at back-and-forth "brainstorming". How could you evaluate these, given your expertise?

0 0 1 0
11 hours ago

Consequently, calling them "synthetic text extruders", as you often do, is, I believe, not even the right framing. I am an applied mathematician and can reason with them about novel theoretical mathematical ideas which have *nothing* to do with language (this is only true of models since Opus 4.5 and GPT 5.x).

0 0 2 0
11 hours ago

LLMs/VLMs, however, are complex dynamical systems that emulate reasoning systems (code, math, language), more akin to climate AI emulators; they are far beyond just language. Fields Medal winner Terence Tao has a full breakdown of using them for novel math proofs, which goes far beyond language.

0 0 1 0
11 hours ago

I've read Stochastic Parrots (brilliant work, but dated for the current crop of models, which are not even just LLMs), and I am sure you are an expert in your field. I don't think you're "responding to perceived threats to your work". I also just started reading Syntactic Theory and it is very good!

0 0 1 0
12 hours ago

Telling a graduate student to pursue an idea is *also* cognitive offloading, actually.

2 0 1 0