Sky

@skyvelleity.bsky.social

California Grill’n

13 Followers  |  42 Following  |  50 Posts  |  Joined: 21.06.2023

Latest posts by skyvelleity.bsky.social on Bluesky

Offer’s open to literally anyone on earth.
Don’t see many takers who don’t already have jobs in the field going “we barely understand how these fucking things work”

07.10.2025 00:15 — 👍 1    🔁 0    💬 1    📌 0

“We know exactly how LLMs work”
“Could you explain how it generates a sentence after being fine-tuned on that sentence, given the weight delta from that fine-tuning, for ten million dollars?”
“That would be so torturously difficult that 10 million wouldn’t make it worth it”
Great chat LLM Understander

06.10.2025 23:15 — 👍 1    🔁 0    💬 1    📌 0

Arguing it cannot be explained is very silly indeed.
That does not make it trivial to explain.
It does not make it explainable under any reasonable timeframe, or even an unreasonable timeframe with $10 million of compensation

06.10.2025 23:05 — 👍 1    🔁 0    💬 1    📌 0

I am not arguing it cannot be explained, I am arguing it is extremely difficult, to the point where you wouldn’t be willing to take the time to understand the generation of a single sentence for tens of millions of dollars

06.10.2025 23:03 — 👍 1    🔁 0    💬 1    📌 0

If you can do this, write a short paper about the delta of a model after it was fine-tuned, showing some experiments you’ve run to confirm your hypothesis and understanding, and AI labs will pay you $10+ million a year

06.10.2025 22:43 — 👍 1    🔁 0    💬 1    📌 0

I am trying to lower the bar as low as it can possibly go: not understanding a whole model, just understanding how it embeds a pair of inputs when you have the delta between the weights from before and after that pair was embedded

06.10.2025 22:41 — 👍 0    🔁 0    💬 1    📌 0

As an example, fine-tune on a pair of sentences and reason about the delta or a LoRA.
Feed the model the first sentence, and it will output the second.
You should be able to reason about how changes to the model will affect the output of the second sentence in a predictable manner

06.10.2025 22:41 — 👍 1    🔁 0    💬 1    📌 0
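
To make the challenge concrete, here is a minimal sketch of the setup described above, assuming PyTorch and a small Hugging Face causal LM; the model name, learning rate, and step count are illustrative only, not the poster’s setup.

```python
# Hedged sketch: fine-tune a small causal LM on one sentence pair, then
# collect the weight delta the fine-tune produced. Model choice and
# hyperparameters are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
before = {n: p.detach().clone() for n, p in model.named_parameters()}

# The "pair": the first sentence as prompt, the second as the continuation to learn.
text = "The first sentence. The second sentence."
batch = tok(text, return_tensors="pt")
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

model.train()
for _ in range(20):  # a handful of steps is enough to overfit one example
    out = model(**batch, labels=batch["input_ids"])
    out.loss.backward()
    opt.step()
    opt.zero_grad()

# The delta: everything the fine-tune "embedded", spread across every tensor.
delta = {n: (p.detach() - before[n]) for n, p in model.named_parameters()}
total = sum(d.abs().sum().item() for d in delta.values())
print(f"sum of absolute weight changes: {total:.2f}")
```

Explaining why those particular numbers encode “the second sentence follows the first” is the part the thread is arguing about.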

Being able to attribute the behaviours of the output to the characteristics of the model in a way which is predictably modifiable, or at the very least can be reasoned about in a way that allows experiments to confirm hypotheses about the model and its outputs.

06.10.2025 22:38 — 👍 1    🔁 0    💬 1    📌 0

You’re the one that seems to think random number generation is required, it is not.
I’m asking how long you think it would take to comprehend the weight changes that occur from fine-tuning on a single sentence.
Like, one whole sentence worth of changes, truly, deeply understood

06.10.2025 22:32 — 👍 1    🔁 0    💬 1    📌 0

Saying they can be boiled down to random numbers, when no random numbers are required, is certainly interesting.
A random number generator is not a fundamental component of an LLM

06.10.2025 22:31 — 👍 0    🔁 0    💬 0    📌 0

I’m not sure how deeply you understand training and inference, but, at the very least, do you understand that random numbers are not necessary in any part of the process? They help speed up training and generalisation, but, if you wanted, you could train and run an LLM deterministically

06.10.2025 22:30 — 👍 0    🔁 0    💬 1    📌 0
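
A small sketch of that claim, assuming PyTorch and a Hugging Face-style model that returns `.logits`: fix the seeds, force deterministic kernels, and decode greedily, and no random number is drawn anywhere in training or inference.

```python
# Hedged sketch: pinning down every source of randomness. Seeds fix
# initialisation and data shuffling; deterministic kernels fix the ops;
# greedy (argmax) decoding removes sampling at inference time.
import random
import numpy as np
import torch

random.seed(0)
np.random.seed(0)
torch.manual_seed(0)
torch.use_deterministic_algorithms(True)  # error out if a non-deterministic op is used

def greedy_decode(model, input_ids: torch.Tensor, steps: int) -> torch.Tensor:
    """Generate tokens with no sampling: always take the argmax."""
    for _ in range(steps):
        logits = model(input_ids).logits[:, -1, :]
        next_id = logits.argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)
    return input_ids
```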

“based on determining factors whose weights were derived from those random numbers.”
Incepted, not derived.
Just because random numbers were part of the process doesn’t mean that’s all we are dealing with, and it is not even necessary to use random numbers in training or inference, just optimal.

06.10.2025 22:29 — 👍 0    🔁 0    💬 1    📌 0

You could look at the weight changes that occur after fine-tuning on this very post, or look at a LoRA, and spend a decade trying to decide exactly how that was embedded

06.10.2025 22:25 — 👍 1    🔁 0    💬 1    📌 0

For an LLM, although every step along the way can be understood, the end result is not.

Even discretely, if told “we are now training/fine-tuning on a given sentence” and shown the weights that changed, making sense of those weights is currently beyond us

06.10.2025 22:23 — 👍 1    🔁 0    💬 1    📌 0

I would claim you are one step removed. In this example we are talking about five lines of code, millions of years of stepping over those five lines of code, then a result.
That is very much within comprehension; it’s just five lines, and the logic is understood.

06.10.2025 22:20 — 👍 2    🔁 0    💬 1    📌 0

We may also disagree: I claimed that the weights embed logic; you may think they only embed patterns, and that any appearance of logic is simply pulling from the latent space within the bounds of training data.

06.10.2025 22:15 — 👍 0    🔁 0    💬 0    📌 0

I would argue that the random number generation is a characteristic necessary for the virtual machine to function, but that it would be simplistic to claim the system can be reduced down to simple random number generation.

06.10.2025 22:13 — 👍 1    🔁 0    💬 2    📌 0

I googled it, looked for what seems to be a reliable source, then copied the number.
I could write a short algorithm which would do this manually, keeping track of each prime’s index in a 64-bit int, but that may take some time to execute.

06.10.2025 22:08 — 👍 1    🔁 0    💬 1    📌 0
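
A minimal sketch of the kind of “short algorithm” described above: enumerate primes one by one while keeping a running index in a single integer. Whether the original question asked for a count or for the n-th prime, the structure is the same, and at the scale of the number quoted it would indeed take an impractically long time.

```python
# Hedged sketch: trial-division prime walker with one integer counter.
# Fine for small n; hopeless at indices in the trillions, which is the point.
def nth_prime(n: int) -> int:
    """Return the n-th prime (1-indexed) by simple trial division."""
    count = 0        # the index kept in a single counter
    candidate = 1
    while count < n:
        candidate += 1
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return candidate

print(nth_prime(6))  # 13
```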

The only thing I am arguing against here is this claim:
“We know exactly how LLMs work”
We know how the virtual machine that runs and creates them works.
With limitless time, we could understand how their weights embed logic, but we currently don’t.
Do you disagree with any of these statements?

06.10.2025 22:06 — 👍 0    🔁 0    💬 1    📌 0

Last time I checked, 29,996,224,275,833

06.10.2025 22:03 — 👍 0    🔁 0    💬 1    📌 0

Mate, I have spent years banging my head against walls to find ways to get these things to reliably perform the role of a junior software engineer, you don’t need to tell me that these things are dumb as rocks.
That doesn’t mean we understand their weights. Simple as.

06.10.2025 22:02 — 👍 0    🔁 0    💬 0    📌 0

I have at no point claimed it cannot be explained, simply that your assertion that we already fully understand these systems is false.
They can be fully understood, the same way a modern CPU die can be, albeit with orders of magnitude more complexity than a billion-transistor die

06.10.2025 22:00 — 👍 1    🔁 0    💬 0    📌 0

“CAN be” is different than “is”
I agree, it can be.
I disagree that it is.

06.10.2025 21:58 — 👍 0    🔁 0    💬 0    📌 0

I am not claiming it is magic, I’m claiming our understanding of their internal functioning is so poor that, even when we can observe every byte of their weights, industry leaders admit they may as well be looking at a black box

06.10.2025 21:57 — 👍 2    🔁 0    💬 1    📌 0

If we want to drop credentials: I’ve built deep learning systems for NASA to use on the ISS. LLMs are not necessarily beyond comprehension, but we do not currently comprehend them.
We do not comprehend how they are so effectively able to compress the corpus of human knowledge into gigabytes

06.10.2025 21:55 — 👍 0    🔁 0    💬 1    📌 0

Likewise, I can write a matrix multiplier, I can write a loss function, I can perform training runs, it doesn’t mean I understand the logic of the resulting weights, only how to execute them

06.10.2025 21:53 — 👍 0    🔁 0    💬 1    📌 0
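
The point being made is how little code the execution machinery is. A toy sketch, using a plain least-squares model rather than a transformer: a matrix multiply, a loss, a gradient step. Writing these says nothing about what any particular set of trained weights means.

```python
# Hedged sketch: the whole "virtual machine" in a few lines -- forward pass,
# loss, and single-example SGD. The trained W "works" without anyone having
# explained what its entries mean.
import numpy as np

def forward(W, x):
    return W @ x                                   # the matrix multiplier

def mse_loss(y_pred, y_true):
    return float(np.mean((y_pred - y_true) ** 2))  # the loss function

def sgd_step(W, x, y_true, lr=0.1):
    err = W @ x - y_true
    grad = 2 * np.outer(err, x) / y_true.size      # dL/dW for mean squared error
    return W - lr * grad                           # the training run, one step at a time

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
x, y = rng.normal(size=3), rng.normal(size=4)
for _ in range(200):
    W = sgd_step(W, x, y)
print(mse_loss(forward(W, x), y))                  # small: the weights fit, unexplained
```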

The logic written by humans in this case, matrix multiplication, is the equivalent of a virtual machine, executing the logic embedded in the weights.
I can write a 6502 emulator, it doesn’t mean I understand all of the programs executing on it

06.10.2025 21:52 — 👍 0    🔁 0    💬 2    📌 0
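
To ground the analogy, here is a toy fetch-decode-execute loop for a tiny subset of the 6502 (immediate-mode LDA and ADC, plus BRK); a real emulator handles the full instruction set, but the shape is the same. Knowing these rules tells you nothing about what any particular program running on them is doing.

```python
# Hedged sketch: three-opcode 6502 fragment interpreter.
# 0xA9 = LDA #imm, 0x69 = ADC #imm (carry and flags ignored here), 0x00 = BRK.
def run_6502_fragment(program: bytes) -> int:
    a, pc = 0, 0                       # accumulator, program counter
    while pc < len(program):
        opcode = program[pc]
        if opcode == 0xA9:             # LDA #imm: load immediate into A
            a = program[pc + 1]
            pc += 2
        elif opcode == 0x69:           # ADC #imm: add immediate to A
            a = (a + program[pc + 1]) & 0xFF
            pc += 2
        elif opcode == 0x00:           # BRK: stop
            break
        else:
            raise NotImplementedError(f"opcode {opcode:#04x}")
    return a

print(run_6502_fragment(bytes([0xA9, 0x02, 0x69, 0x03, 0x00])))  # 5
```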

Why the hell do you think I’m arguing the systems are alive when I have never made that claim? I’m just addressing that it seems you think the logic for LLMs is discrete and hand-written, like, for example, an MD5 hashing function.

06.10.2025 21:51 — 👍 0    🔁 0    💬 2    📌 0

Again, the logic for the LinkedIn post isn’t in the algorithm running matrix multiplication; it is in the weights. There are two parallel systems of logic, and we do not understand how that logic works internally, only how to create it

06.10.2025 21:48 — 👍 1    🔁 0    💬 1    📌 0

I’m not arguing anything about sentience, I’m just seeing a whole bunch of people acting like we program LLMs by hand, rather than them being an accidental result of extreme overfitting producing outputs no one anticipated or even theorised

06.10.2025 21:46 — 👍 2    🔁 0    💬 1    📌 0
