@sir-deenicus.bsky.social
tinkering on intelligence amplification. there are only memories of stories; formed into the right shape, the stories can talk back.
Moravec paradox: The robot can do a wall flip but will completely fall apart trying to do the basic beginner "six-step" move.
11.10.2025 14:12

An LLM's transformer is a Markov kernel that drives an order-m Markov chain whose memory is bounded by the model's context window.
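A minimal sketch of that framing, with an invented toy kernel standing in for the learned network (in a real LLM the kernel is a neural function of the windowed context, not a lookup):

```python
import random

def sample_chain(kernel, prompt, m, n_steps, rng=random.Random(0)):
    """kernel(context) -> dict mapping next token to probability."""
    tokens = list(prompt)
    for _ in range(n_steps):
        context = tuple(tokens[-m:])   # memory bounded by the last m tokens
        dist = kernel(context)         # the Markov kernel: state -> distribution
        toks, probs = zip(*dist.items())
        tokens.append(rng.choices(toks, weights=probs)[0])
    return tokens

# Hand-written stand-in kernel; an LLM replaces this with a learned
# neural function of the (windowed) context.
def toy_kernel(context):
    return {"a": 0.7, "b": 0.3} if context[-1] == "a" else {"a": 0.4, "b": 0.6}

print("".join(sample_chain(toy_kernel, "a", m=4, n_steps=12)))
```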
11.10.2025 12:58

One minor note, as observed by @maxine.science in the comments: LLMs are glorified Markov chains (or, more precisely, they try to approximate one), and the interesting thing is that a glorified Markov chain is quite powerful.
11.10.2025 12:58

We generally don't want to do this because sampling near the mode of the distribution is sampling from slop central. We're limited by hardware and the ability to score samples. LLMs get huge jumps in performance when used correctly, as the generators of probability distributions that they are.
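A toy illustration with made-up logits (no real model involved): greedy decoding always collapses to the mode, while temperature sampling draws from the distribution the model actually defines:

```python
import numpy as np

logits = np.array([3.0, 2.5, 2.2, 0.1])   # hypothetical next-token scores

def sample(logits, temperature=1.0, rng=np.random.default_rng(0)):
    p = np.exp(logits / temperature)
    p /= p.sum()                           # softmax over token scores
    return int(rng.choice(len(logits), p=p))

greedy = int(np.argmax(logits))            # always the mode: "slop central"
diverse = [sample(logits, temperature=0.8) for _ in range(5)]
print(greedy, diverse)
```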
11.10.2025 12:35

> deterministic as users expect from machines

Machines don't have to be deterministic; they can also be non-deterministic or probabilistic. Many important industries (e.g. in epidemiology) rely on some version of Monte Carlo (particularly of the Markov chain kind), which is probabilistic computation.
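A minimal Metropolis-Hastings sketch, the textbook Markov chain Monte Carlo recipe; the standard-normal target here is just an example:

```python
import math, random

def mh_samples(log_target, x0=0.0, n=10_000, step=1.0, rng=random.Random(0)):
    x, out = x0, []
    for _ in range(n):
        proposal = x + rng.gauss(0, step)
        # accept with probability min(1, target(proposal) / target(x))
        if math.log(rng.random()) < log_target(proposal) - log_target(x):
            x = proposal
        out.append(x)
    return out

samples = mh_samples(lambda x: -0.5 * x * x)   # log-density of N(0,1), up to a constant
print(sum(samples) / len(samples))             # mean should land near 0
```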
Automation: The more money I put in, the larger the volume of code it can put out in a given time
Augmentation: The correlation between money and code volume is broken because everything runs through an interactive or symbiotic loop between me and the machine
Two very, very different ways of doing things.
I doubt that you can reconcile them, because @emollick.bsky.social is wrong. There is a central difference: true augmentation has a dependency on the human to proceed and cannot be parallelized; automation has no such dependency, so parallelization is an option.
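A schematic of that difference, with hypothetical `automate` and task stand-ins: the automation path is a parallel map, while the augmentation path blocks on a human at every step:

```python
from concurrent.futures import ThreadPoolExecutor

def automate(task):
    return f"machine result for {task}"

def automation(tasks):
    with ThreadPoolExecutor() as pool:     # parallelizable: no human dependency
        return list(pool.map(automate, tasks))

def augmentation(tasks):
    results = []
    for task in tasks:                     # inherently serial: each step waits on a person
        draft = automate(task)
        verdict = input(f"accept '{draft}'? [y/n] ")   # the human dependency
        results.append(draft if verdict == "y" else None)
    return results

print(automation(["t1", "t2", "t3"]))
```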
21.09.2025 14:12

At least, that's the simplified view of the optimization process invoked by the replicator dynamics maths. The key thing is that evolution isn't trying to anticipate future possibilities; it's becoming ever more robust over time given niche-environment interactions.
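For reference, a minimal statement of the replicator equation being alluded to (x_i is the population share of strategy i, f_i its fitness); note that only current relative fitness appears, nothing about future environments:

```latex
% Replicator dynamics: strategy shares grow in proportion to how much
% their current fitness exceeds the population average.
\[
\dot{x}_i = x_i \bigl( f_i(x) - \bar{f}(x) \bigr),
\qquad
\bar{f}(x) = \sum_j x_j\, f_j(x)
\]
```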
15.09.2025 08:02

It's productive to think about what the system is: the organism/biology, reproductive strategies, local environment, and successful reproduction rates.
Evolutionary processes aren't trying to be optimal for all possible environments; they're instead evolving strategies that minimize long-term regret.
What are called "evolutionary" algorithms bear little resemblance to the real thing. They're random search with a (hopefully) constraining grammar. Algorithms closer to the mathematical models employed by population dynamics are regret minimization (which ~solved 2-player poker) and particle filters (used widely).
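A toy regret-matching sketch, the building block of the CFR family behind the poker result; the rock-paper-scissors payoffs and the opponent's mix are invented values:

```python
import random

PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # rock/paper/scissors, row vs column

def regret_matching(opponent=(0.5, 0.3, 0.2), iters=10_000, rng=random.Random(0)):
    regret = [0.0, 0.0, 0.0]
    strategy_sum = [0.0, 0.0, 0.0]
    for _ in range(iters):
        pos = [max(r, 0.0) for r in regret]
        total = sum(pos)
        strategy = [p / total for p in pos] if total > 0 else [1 / 3] * 3
        strategy_sum = [s + p for s, p in zip(strategy_sum, strategy)]
        opp = rng.choices(range(3), weights=opponent)[0]
        utils = [PAYOFF[a][opp] for a in range(3)]
        played = rng.choices(range(3), weights=strategy)[0]
        # regret: what each action would have earned minus what we actually earned
        regret = [r + u - utils[played] for r, u in zip(regret, utils)]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]   # the average strategy is what minimizes regret

print(regret_matching())   # leans toward paper, exploiting the opponent's rock bias
```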
15.09.2025 04:57

That's quite heavy just for a graph. What's going on that JS and Plotly are insufficient?
09.09.2025 07:47

As a matter of the pre-training objective. A lot of people wrongly hold an implicit mental image that there is some perspective in there with meta-cognition, but the only meta there is, is the meta-distribution induced by the learning problem whose sampled distributions the LM must learn to predict well.
07.09.2025 14:47

OpenAI's formulation makes sense, and they're hinting at a perspective that unlocked progress for them. How we post-train models worsens the issue of hallucination, but correcting that issue won't completely eliminate hallucinations, since so-called hallucinations are the system functioning as intended.
07.09.2025 14:12

That's not the case in any meaningful sense, because the LLM process results in a flawed approximation of the training distribution. It will not always be intuitively addressable (a major cause of this is shallow heuristics that dominate token probabilities), so blaming user error is pointless/unfair.
07.09.2025 12:42

Algorithmic probability is a useful frame. So while we can see the Transformer as defining a transition matrix (i.e. autocomplete), it's not a Markov chain operating by trivial lookups but one also generatively conditioning on programs given its prefix strings, thus approximating the data distribution.
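A toy of that frame, with invented "programs" and a uniform prior: prediction here is a Bayesian mixture over candidate generators conditioned on the prefix, rather than a table lookup:

```python
# Each "program" assigns probability to the next symbol given a prefix.
PROGRAMS = {
    "repeat_last": lambda prefix, sym: 0.9 if sym == prefix[-1] else 0.1,
    "alternate":   lambda prefix, sym: 0.9 if sym != prefix[-1] else 0.1,
    "uniform":     lambda prefix, sym: 0.5,
}

def predict(prefix, alphabet="ab"):
    # posterior weight of each program = uniform prior x likelihood of the prefix
    weights = {}
    for name, prog in PROGRAMS.items():
        like = 1.0
        for i in range(1, len(prefix)):
            like *= prog(prefix[:i], prefix[i])
        weights[name] = like / len(PROGRAMS)
    z = sum(weights.values())
    # predictive distribution: posterior-weighted mixture of program outputs
    return {s: sum(w * PROGRAMS[n](prefix, s) for n, w in weights.items()) / z
            for s in alphabet}

print(predict("ababab"))   # mass shifts to "alternate", so "a" is favored next
```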
05.09.2025 02:10

You're actually right that there's nothing surprising about this, but like OP, you failed to identify that next-token prediction is just an unflattering way to say a program computing conditional probability distributions over text. Prefixes needn't be continued from tables and can derive from programs.
05.09.2025 01:40

This doesn't affect the correctness of your broader argument, but I am confident Mastodon is a better claimant to that title of most anti-AI site(s) on the net. Hacker News is also quite anti-AI, but it's split. I can't tell if it's more or less than here because of how the dynamics of both sites differ.
04.09.2025 18:58

FWIW, terrible as the overviews are, people are working backwards from an already-arrived-at position. I've seen "information wants to be free" folks suddenly turn pro-IP; rather than point out corporate double standards, people implicitly assign agency to "autocomplete" to enable blame; overhyped yet job-threatening...
02.09.2025 00:52

Hmm... my experience has been that AI overviews are really bad/useless on balance. At least, that's the only explanation I can give for why my brain no longer allows me to see them unless I concentrate.
02.09.2025 00:42

With AND, what we usually want is intersection. So either directly use a DSL, or a small LLM parses it out. From these we can seek intersections: simple and easy is to matrix-multiply on unit vectors and filter, or use SVD (more complex but much more flexible). Geometric mean of Hadamard as a fallback.
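A sketch of the simple path (matrix-multiply on unit vectors, filter for the intersection, geometric-mean fallback for ranking); the embeddings are random stand-ins for a real encoder:

```python
import numpy as np

rng = np.random.default_rng(0)

def unit(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

terms = unit(rng.normal(size=(2, 64)))     # e.g. the two terms parsed out of an AND query
docs = unit(rng.normal(size=(1000, 64)))

sims = docs @ terms.T                      # (n_docs, n_terms) cosine similarities
hits = np.where((sims > 0.15).all(axis=1))[0]   # intersection: above threshold for every term

# fallback ranking: geometric mean of the (shifted, strictly positive) per-term scores
scores = np.exp(np.log(np.clip(sims + 1.0, 1e-9, None)).mean(axis=1))
print(hits[:5], np.argsort(-scores)[:5])
```

The shift by 1 before the geometric mean is one way to keep the per-term scores positive; a real system would tune both that and the threshold.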
31.08.2025 14:52

...to compute a distribution. They don't understand, nor do they have causal or world models, for the same reason that metacognition is a term that's not really applicable.
And yet, somehow, they so often act such that it's difficult/pointless to say that they don't. That's what is most fascinating!
In attention, they're expectation values of mixture models over directions on approximate hyperspheres that live in very high dimensions. A few layers of this yields highly abstracted, complex functions on inputs & outputs. With FFNs we get programs operating on distributions to compute a distribution.
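A sketch of one attention head in those terms, with toy shapes and random inputs: each softmax row is a mixture distribution over positions, and the head's output is the expectation of the value vectors under that mixture:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 16                                    # sequence length, head dimension
Q = rng.normal(size=(n, d))
K = rng.normal(size=(n, d))
V = rng.normal(size=(n, d))

scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)   # each row is a mixture over positions

out = weights @ V                               # expectation of V under each row's mixture
print(out.shape, weights[0].sum())              # (8, 16) 1.0
```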
24.08.2025 07:53

Their pathological confabulations are not consistent with meta-cognition. However, they do reason in the sense of searching over a space of input-dependent, JIT-constructed programs derived from functions/heuristics.
They're not merely operating on structured language patterns after multiple layers.
You're correct on most of that. They're operating probabilistically; they aren't reasoning in the sense humans do, nor that of following a deductive program in a combinatorial search space, nor with consistent constraint propagation. And,
24.08.2025 07:53

Here's some Gemini gaslighting:
• giving a wrong answer to a puzzle
• giving Python code that could test its claim
• asserting it obtained results from the code supporting its claim
• but when I ran the code myself, it showed the claim was false.
g.co/gemini/share...
Even traditional NLP still has a place as a crude first pass when volume gets large enough. In IR, there's still a place for BM25: careful design with semantic retrieval can yield a combined result better than either alone.
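One way to realize that combination is reciprocal rank fusion (a common choice, not something the post prescribes); the two rankings below are hypothetical doc-id lists, best first:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several rankings; each list contributes 1/(k + rank) per doc."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["d3", "d1", "d7", "d2"]        # lexical first pass
semantic_ranking = ["d1", "d7", "d5", "d3"]    # embedding retrieval
print(reciprocal_rank_fusion([bm25_ranking, semantic_ranking]))
```

Rank fusion sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.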
22.08.2025 21:54

Indeed, and beyond that, the counter-intuitive thing about LLMs is that they're bad at general computation; they merely run on computers. Computing strategy profiles for imperfect-information games with combinatorial state spaces, Bayesian inference, and convex optimization solvers all still need improving/writing.
22.08.2025 21:54

Interestingly enough, the ULMFiT approach was closer to GPT; the original GPT paper directly acknowledges this influence. ELMo was also foundational and cited for experiments in the original GPT paper too. ELMo's hard to place, but I guess kinda BERT, but also an early contextual embedding approach.
22.08.2025 21:20

Accounting for cost, it looks like OpenAI models completely dominate the Pareto frontier.
22.08.2025 08:55

---
Thoughts: Continual learning is a lot more challenging than previously thought (and it was thought very challenging). Animals are able to do it, though. Furthermore, animals likely use something more powerful than SGD, to be able to avoid whatever their equivalent of an ill-conditioned Hessian is.
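A sketch of the conditioning problem being gestured at, on an invented quadratic: with a condition number of 1000, a step size safe for the steep direction leaves the shallow direction almost untouched:

```python
import numpy as np

H = np.diag([1000.0, 1.0])            # Hessian with condition number 1000
x = np.array([1.0, 1.0])
lr = 1.0 / 1000.0                     # largest safe step for the steep direction

for _ in range(200):
    x = x - lr * (H @ x)              # gradient descent on 0.5 * x^T H x

print(x)   # steep coordinate is gone immediately; shallow one is still ~0.82
```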