@sir-deenicus.bsky.social
tinkering on intelligence amplification. there are only memories of stories; formed into the right shape, the stories can talk back.
Moravec paradox: The robot can do a wall flip but will completely fall apart trying to do the basic beginner "six-step" move.
11.10.2025 14:12

An LLM's transformer is a Markov kernel that drives an order-m Markov chain whose memory is bounded by the model's context window.
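A minimal sketch of that framing, with an invented toy kernel standing in for the learned network (in a real LLM the kernel is a neural function of the windowed context, not a lookup):

```python
import random

def sample_chain(kernel, prompt, m, n_steps, rng=random.Random(0)):
    """kernel(context) -> dict mapping next token to probability."""
    tokens = list(prompt)
    for _ in range(n_steps):
        context = tuple(tokens[-m:])   # memory bounded by the last m tokens
        dist = kernel(context)         # the Markov kernel: state -> distribution
        toks, probs = zip(*dist.items())
        tokens.append(rng.choices(toks, weights=probs)[0])
    return tokens

# Hand-written stand-in kernel; an LLM replaces this with a learned
# neural function of the (windowed) context.
def toy_kernel(context):
    return {"a": 0.7, "b": 0.3} if context[-1] == "a" else {"a": 0.4, "b": 0.6}

print("".join(sample_chain(toy_kernel, "a", m=4, n_steps=12)))
```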
11.10.2025 12:58

One minor note, as observed by @maxine.science in the comments: LLMs are glorified Markov chains (or, more precisely, they try to approximate one), and the interesting thing is that a glorified Markov chain is quite powerful.
11.10.2025 12:58

We generally don't want to do this because sampling near the mode of the distribution is sampling from slop central. We're limited by hardware and the ability to score samples. LLMs get huge jumps in performance when used correctly, as the generators of probability distributions that they are.
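A toy illustration with made-up logits (no real model involved): greedy decoding always collapses to the mode, while temperature sampling draws from the distribution the model actually defines:

```python
import numpy as np

logits = np.array([3.0, 2.5, 2.2, 0.1])   # hypothetical next-token scores

def sample(logits, temperature=1.0, rng=np.random.default_rng(0)):
    p = np.exp(logits / temperature)
    p /= p.sum()                           # softmax over token scores
    return int(rng.choice(len(logits), p=p))

greedy = int(np.argmax(logits))            # always the mode: "slop central"
diverse = [sample(logits, temperature=0.8) for _ in range(5)]
print(greedy, diverse)
```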
11.10.2025 12:35

> deterministic as users expect from machines

Machines don't have to be deterministic; they can also be non-deterministic or probabilistic. Many important industries (e.g. in epidemiology) rely on some version of Monte Carlo (particularly of the Markov chain kind), which is probabilistic computation.
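A minimal Metropolis-Hastings sketch, the textbook Markov chain Monte Carlo recipe; the standard-normal target here is just an example:

```python
import math, random

def mh_samples(log_target, x0=0.0, n=10_000, step=1.0, rng=random.Random(0)):
    x, out = x0, []
    for _ in range(n):
        proposal = x + rng.gauss(0, step)
        # accept with probability min(1, target(proposal) / target(x))
        if math.log(rng.random()) < log_target(proposal) - log_target(x):
            x = proposal
        out.append(x)
    return out

samples = mh_samples(lambda x: -0.5 * x * x)   # log-density of N(0,1), up to a constant
print(sum(samples) / len(samples))             # mean should land near 0
```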
Automation: The more money I put in, the larger the volume of code it can put out in a given time
Augmentation: The correlation between money and code volume is broken because everything runs through an interactive or symbiotic loop between me and the machine
Two very, very different ways of doing things.
I doubt that you can reconcile them, because @emollick.bsky.social is wrong. There is a central difference: true augmentation has a dependency on the human to proceed and cannot be parallelized; automation has no such dependency, so parallelization is an option.
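A schematic of that difference, with hypothetical `automate` and task stand-ins: the automation path is a parallel map, while the augmentation path blocks on a human at every step:

```python
from concurrent.futures import ThreadPoolExecutor

def automate(task):
    return f"machine result for {task}"

def automation(tasks):
    with ThreadPoolExecutor() as pool:     # parallelizable: no human dependency
        return list(pool.map(automate, tasks))

def augmentation(tasks):
    results = []
    for task in tasks:                     # inherently serial: each step waits on a person
        draft = automate(task)
        verdict = input(f"accept '{draft}'? [y/n] ")   # the human dependency
        results.append(draft if verdict == "y" else None)
    return results

print(automation(["t1", "t2", "t3"]))
```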
21.09.2025 14:12

At least, that's the simplified view of the optimization process invoked by the replicator dynamics maths. The key thing is that evolution isn't trying to anticipate future possibilities; it's becoming ever more robust over time given niche-environment interactions.
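For reference, a minimal statement of the replicator equation being alluded to (x_i is the population share of strategy i, f_i its fitness); note that only current relative fitness appears, nothing about future environments:

```latex
% Replicator dynamics: strategy shares grow in proportion to how much
% their current fitness exceeds the population average.
\[
\dot{x}_i = x_i \bigl( f_i(x) - \bar{f}(x) \bigr),
\qquad
\bar{f}(x) = \sum_j x_j\, f_j(x)
\]
```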
15.09.2025 08:02

It's productive to think about what the system is: the organism/biology, reproductive strategies, local environment, and successful reproduction rates.
Evolutionary processes aren't trying to be optimal for all possible environments; they're instead evolving strategies that minimize long-term regret.
What are called "evolutionary" algorithms bear little resemblance to the real thing. They're random search with a (hopefully) constraining grammar. Algorithms closer to the mathematical models employed by population dynamics are regret minimization (which ~solved 2-player poker) and particle filters (used widely).
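A toy regret-matching sketch, the building block of the CFR family behind the poker result; the rock-paper-scissors payoffs and the opponent's mix are invented values:

```python
import random

PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # rock/paper/scissors, row vs column

def regret_matching(opponent=(0.5, 0.3, 0.2), iters=10_000, rng=random.Random(0)):
    regret = [0.0, 0.0, 0.0]
    strategy_sum = [0.0, 0.0, 0.0]
    for _ in range(iters):
        pos = [max(r, 0.0) for r in regret]
        total = sum(pos)
        strategy = [p / total for p in pos] if total > 0 else [1 / 3] * 3
        strategy_sum = [s + p for s, p in zip(strategy_sum, strategy)]
        opp = rng.choices(range(3), weights=opponent)[0]
        utils = [PAYOFF[a][opp] for a in range(3)]
        played = rng.choices(range(3), weights=strategy)[0]
        # regret: what each action would have earned minus what we actually earned
        regret = [r + u - utils[played] for r, u in zip(regret, utils)]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]   # the average strategy is what minimizes regret

print(regret_matching())   # leans toward paper, exploiting the opponent's rock bias
```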
15.09.2025 04:57

That's quite heavy just for a graph. What's going on that JS and Plotly are insufficient?
09.09.2025 07:47

As a matter of the pre-training objective. A lot of people wrongly hold an implicit mental image that there is some perspective in there with meta-cognition, but the only meta there is, is the meta-distribution induced by the learning problem whose sampled distributions the LM must learn to predict well.
07.09.2025 14:47

OpenAI's formulation makes sense, and they're hinting at a perspective that unlocked progress for them. How we post-train models worsens the issue of hallucination, but correcting that issue won't completely eliminate hallucinations, since so-called hallucinations are the system functioning as intended.
07.09.2025 14:12

That's not the case in any meaningful sense, because the LLM process results in a flawed approximation of the training distribution. It will not always be intuitively addressable (a major cause of this is shallow heuristics that dominate token probabilities), so blaming user error is pointless/unfair.
07.09.2025 12:42

Algorithmic probability is a useful frame. So while we can see the Transformer as defining a transition matrix (i.e. autocomplete), it's not a Markov chain operating by trivial lookups but one also generatively conditioning on programs given its prefix strings, thus approximating the data distribution.
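A toy of that frame, with invented "programs" and a uniform prior: prediction here is a Bayesian mixture over candidate generators conditioned on the prefix, rather than a table lookup:

```python
# Each "program" assigns probability to the next symbol given a prefix.
PROGRAMS = {
    "repeat_last": lambda prefix, sym: 0.9 if sym == prefix[-1] else 0.1,
    "alternate":   lambda prefix, sym: 0.9 if sym != prefix[-1] else 0.1,
    "uniform":     lambda prefix, sym: 0.5,
}

def predict(prefix, alphabet="ab"):
    # posterior weight of each program = uniform prior x likelihood of the prefix
    weights = {}
    for name, prog in PROGRAMS.items():
        like = 1.0
        for i in range(1, len(prefix)):
            like *= prog(prefix[:i], prefix[i])
        weights[name] = like / len(PROGRAMS)
    z = sum(weights.values())
    # predictive distribution: posterior-weighted mixture of program outputs
    return {s: sum(w * PROGRAMS[n](prefix, s) for n, w in weights.items()) / z
            for s in alphabet}

print(predict("ababab"))   # mass shifts to "alternate", so "a" is favored next
```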
05.09.2025 02:10

You're actually right that there's nothing surprising about this, but like OP, you failed to identify that next-token prediction is just an unflattering way to say a program computing conditional probability distributions over text. Prefixes needn't be continued from tables and can derive from programs.
05.09.2025 01:40

This doesn't affect the correctness of your broader argument, but I am confident Mastodon is a better claimant to that title of most anti-AI site(s) on the net. Hacker News is also quite anti-AI, but it's split. I can't tell if it's more or less than here because of how the dynamics of both sites differ.
04.09.2025 18:58

FWIW, terrible as the overviews are, people are working backwards from an already-arrived-at position. I've seen "information wants to be free" folks suddenly turn pro-IP; rather than point out corporate double standards, people implicitly assign agency to "autocomplete" to enable blame; overhyped yet job-threatening...
02.09.2025 00:52

Hmm... my experience has been that AI overviews are really bad/useless on balance. At least, that's the only explanation I can give for why my brain no longer allows me to see them unless I concentrate.
02.09.2025 00:42

With AND, what we usually want is intersection. So either directly use a DSL, or a small LLM parses it out. From these we can seek intersections: simple and easy is to matrix-multiply on unit vectors and filter, or use SVD (more complex but much more flexible). Geometric mean of Hadamard as a fallback.
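A sketch of the simple path (matrix-multiply on unit vectors, filter for the intersection, geometric-mean fallback for ranking); the embeddings are random stand-ins for a real encoder:

```python
import numpy as np

rng = np.random.default_rng(0)

def unit(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

terms = unit(rng.normal(size=(2, 64)))     # e.g. the two terms parsed out of an AND query
docs = unit(rng.normal(size=(1000, 64)))

sims = docs @ terms.T                      # (n_docs, n_terms) cosine similarities
hits = np.where((sims > 0.15).all(axis=1))[0]   # intersection: above threshold for every term

# fallback ranking: geometric mean of the (shifted, strictly positive) per-term scores
scores = np.exp(np.log(np.clip(sims + 1.0, 1e-9, None)).mean(axis=1))
print(hits[:5], np.argsort(-scores)[:5])
```

The shift by 1 before the geometric mean is one way to keep the per-term scores positive; a real system would tune both that and the threshold.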
31.08.2025 14:52

...to compute a distribution. They don't understand, nor do they have causal or world models, for the same reason that metacognition is a term that's not really applicable.
And yet, somehow, they so often act such that it's difficult/pointless to say that they don't. That's what is most fascinating!
In attention, they're expectation values of mixture models over directions on approximate hyperspheres that live in very high dimensions. A few layers of this yields highly abstracted, complex functions on inputs & outputs. With FFNs we get programs operating on distributions to compute a distribution.
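A sketch of one attention head in those terms, with toy shapes and random inputs: each softmax row is a mixture distribution over positions, and the head's output is the expectation of the value vectors under that mixture:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 16                                    # sequence length, head dimension
Q = rng.normal(size=(n, d))
K = rng.normal(size=(n, d))
V = rng.normal(size=(n, d))

scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)   # each row is a mixture over positions

out = weights @ V                               # expectation of V under each row's mixture
print(out.shape, weights[0].sum())              # (8, 16) 1.0
```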
24.08.2025 07:53

Their pathological confabulations are not consistent with meta-cognition. However, they do reason in the sense of searching over a space of input-dependent, JIT-constructed programs derived from functions/heuristics.
They're not merely operating on structured language patterns after multiple layers.
You're correct on most of that. They're operating probabilistically; they aren't reasoning in the sense humans do, nor that of following a deductive program in a combinatorial search space, nor with consistent constraint propagation. And,
24.08.2025 07:53

Here's some Gemini gaslighting:
• giving a wrong answer to a puzzle
• giving Python code that could test its claim
• asserting it obtained results from the code supporting its claim
• but when I ran the code myself, it showed the claim was false.
g.co/gemini/share...
Even traditional NLP still has a place as a crude first pass when volume gets large enough. In IR, there's still a place for BM25: careful design with semantic retrieval can yield a combined result better than either alone.
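One way to realize that combination is reciprocal rank fusion (a common choice, not something the post prescribes); the two rankings below are hypothetical doc-id lists, best first:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several rankings; each list contributes 1/(k + rank) per doc."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["d3", "d1", "d7", "d2"]        # lexical first pass
semantic_ranking = ["d1", "d7", "d5", "d3"]    # embedding retrieval
print(reciprocal_rank_fusion([bm25_ranking, semantic_ranking]))
```

Rank fusion sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.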
22.08.2025 21:54

Indeed, and beyond that, the counter-intuitive thing about LLMs is that they're bad at general computation; they merely run on computers. Computing strategy profiles for imperfect-information games with combinatorial state spaces, Bayesian inference, and convex optimization solvers all still need improving/writing.
22.08.2025 21:54

Interestingly enough, the ULMFiT approach was closer to GPT; the original GPT paper directly acknowledges this influence. ELMo was also foundational and cited for experiments in the original GPT paper too. ELMo's hard to place, but I guess kinda BERT, but also an early contextual embedding approach.
22.08.2025 21:20

Accounting for cost, it looks like OpenAI models completely dominate the Pareto frontier.
22.08.2025 08:55

---
Thoughts: Continual learning is a lot more challenging than previously thought (and it was thought very challenging). Animals are able to do it, though. Furthermore, animals likely use something more powerful than SGD, to be able to avoid whatever their equivalent of an ill-conditioned Hessian is.
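A sketch of the conditioning problem being gestured at, on an invented quadratic: with a condition number of 1000, a step size safe for the steep direction leaves the shallow direction almost untouched:

```python
import numpy as np

H = np.diag([1000.0, 1.0])            # Hessian with condition number 1000
x = np.array([1.0, 1.0])
lr = 1.0 / 1000.0                     # largest safe step for the steep direction

for _ in range(200):
    x = x - lr * (H @ x)              # gradient descent on 0.5 * x^T H x

print(x)   # steep coordinate is gone immediately; shallow one is still ~0.82
```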