
Steven Walton

@swalton.ai.bsky.social

Ph.D. from University of Oregon. Visiting Scholar, SHI Lab @ Georgia Tech. Opinions expressed are those of my cat.

576 Followers  |  501 Following  |  545 Posts  |  Joined: 19.11.2024

Latest posts by swalton.ai on Bluesky

I think it helps to specify. Many subjects are taught like a "game of telephone". It leads to people thinking they're the same and inferring inaccurate beliefs.

So if it's easy to distinguish, I think it's best to. Avoids confusion later on.

04.11.2025 22:16 — 👍 1    🔁 0    💬 0    📌 0

So the way you walk (gait) and the way you type are considered biometric data but not your poop?

Something doesn't smell right and it's not my shit

21.10.2025 06:04 — 👍 1    🔁 0    💬 0    📌 0
Smart Pipe | Infomercials | Adult Swim (YouTube video by Adult Swim)

If you haven't seen Smart Pipe, there hasn't been a better time

m.youtube.com/watch?v=DJkl...

21.10.2025 05:30 — 👍 2    🔁 0    💬 1    📌 0

Did Kohler really see Smart Pipe and go "What a great idea!"

For $600, and a monthly payment of $7, you too can send Kohler pictures of your poop!
www.kohlerhealth.com/dekoda/

21.10.2025 05:30 — 👍 1    🔁 0    💬 2    📌 0

Okay

29.09.2025 18:39 — 👍 0    🔁 0    💬 0    📌 0

I also don't like that it's so hard to even have this basic conversation. We should be working hard to answer these questions, as even progress in that direction has a significant impact on how we should design these systems. We don't even know if we're going in the right direction.

29.09.2025 08:06 — 👍 0    🔁 0    💬 0    📌 0

Yeah, I have similar intuitions. They appear indistinguishable from fuzzy databases.

But my larger point is that this is extremely difficult to conclude. I just think "PhD level reasoning" is a much stronger claim, so it needs stronger evidence.

29.09.2025 08:06 — 👍 1    🔁 0    💬 1    📌 0
Preview
ELIZA - Wikipedia

Okay, but by your definition Eliza is conscious. It meets all your criteria.
en.wikipedia.org/wiki/ELIZA

Per the death question: well, we trained it on those. So if we programmed similar responses into Eliza, does that make Eliza more alive or change your answer about her? Do we default to conscious?

29.09.2025 07:31 — 👍 0    🔁 0    💬 0    📌 0

Categorically, they are identical.

So let's modify our memorization question slightly: how do you differentiate reasoning from doing a similarity search on a lookup table?

Are those different things? Is the failure in Figure 1 a reasoning failure or a search failure? How do you know?

29.09.2025 07:27 — 👍 1    🔁 0    💬 0    📌 0
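To make the lookup-table framing concrete, here is a minimal sketch of the "fuzzy database" behavior being contrasted with reasoning. It is an illustration only, not a claim about how LLMs are implemented; the Q/A table, the string-overlap similarity, and the example queries are all made up for the sketch.

```python
# A toy "fuzzy database": store Q/A pairs, answer new questions by returning
# the answer attached to the most similar stored question. On in-distribution
# queries this can look like reasoning while doing no reasoning at all.
from difflib import SequenceMatcher

LOOKUP_TABLE = {
    "what is 12 + 7?": "19",
    "what is 23 + 45?": "68",
    "capital of france?": "Paris",
}

def similarity(a: str, b: str) -> float:
    # Any fuzzy similarity or learned encoder could stand in here; string
    # overlap keeps the sketch self-contained.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def answer(query: str) -> str:
    # Nearest-neighbor search over the stored questions.
    best_q = max(LOOKUP_TABLE, key=lambda q: similarity(q, query))
    return LOOKUP_TABLE[best_q]

print(answer("What is 12 + 7?"))  # "19" -- looks like arithmetic
print(answer("What is 12 + 8?"))  # "19" -- still the nearest memorized entry, now wrong
```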

I'm fine using a broad definition of reasoning but not concluding that the type of reasoning is the same.

We train these machines very differently and so you can't evaluate them the same way.

So go to Figure 1 and tell me if those are in distribution or not. They are all (ax, by, cz) problems

29.09.2025 07:27 — 👍 1    🔁 0    💬 1    📌 0

Humans failing at larger numbers generally happens not because of a procedural error but because of a copying or digit-permutation error. But when teaching humans, we can give them 2- or 3-digit examples and they can then do it with an arbitrary number of digits. Using <<1% of the training examples, too.

29.09.2025 07:27 — 👍 1    🔁 0    💬 2    📌 0
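As a concrete version of the digit-count question: a small sketch of one way to define "in distribution" as 2- and 3-digit addition and then probe length generalization on longer operands. The model under test is a hypothetical wrapper (my_llm_wrapper below is not a real API), and the sample sizes and digit choices are arbitrary.

```python
# Length-generalization probe for addition: train lengths vs. unseen lengths.
import random

def make_addition_problems(n_digits: int, n: int, seed: int = 0):
    # Sample pairs of n_digit operands uniformly at random.
    rng = random.Random(seed + n_digits)
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    return [(rng.randint(lo, hi), rng.randint(lo, hi)) for _ in range(n)]

# "In distribution" under this definition: the operand lengths taught.
train = make_addition_problems(2, 100) + make_addition_problems(3, 100)
# "Out of distribution": operand lengths never seen during teaching.
test_ood = make_addition_problems(8, 100)

def accuracy(model, problems):
    # model(a, b) should return the predicted sum. A learner that induced the
    # carry procedure passes regardless of operand length.
    return sum(model(a, b) == a + b for a, b in problems) / len(problems)

# e.g. accuracy(my_llm_wrapper, train) vs. accuracy(my_llm_wrapper, test_ood)
```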

model is reasoning and not memorizing? Hopefully you agree you can't *conclude* reasoning.

A problem here is what is considered OOD. Take this old example: what do you consider to be OOD? The number of digits? Some factorization? Why?
bsky.app/profile/swal...

29.09.2025 07:27 — 👍 1    🔁 0    💬 1    📌 0

That paper says something different than what you said.

I want to differentiate reasoning from memorizing. We can agree here, right?

If they fail a problem that uses identical reasoning to problems they succeed at, and such problems are the same as those in the training data, can you conclude the

29.09.2025 07:27 — 👍 2    🔁 0    💬 1    📌 0

Sure, I'm with you here. I didn't say our animatronic duck wasn't alive, just that it's hard to differentiate.
But maybe you can help me. How do we know my calculator isn't conscious? What makes it uniquely unconscious? That it doesn't talk? Doesn't pursue its own goals? How do you differentiate?

29.09.2025 06:58 — 👍 0    🔁 0    💬 0    📌 0

That's the current paradigm. If we try to build something that is indistinguishable from a real duck, does that mean it is a real duck? You have to ask "indistinguishable to who?" and "indistinguishable under what conditions?", right? It's not so obvious what that answer is.

29.09.2025 06:40 — 👍 0    🔁 0    💬 0    📌 0

likely a duck, but here we have a machine that we have trained to predict the next token, to chat with humans in a way that humans prefer, and that has read every psych textbook.

If we built a really sophisticated animatronic duck do you think you could easily differentiate it from a real duck?

29.09.2025 06:38 — 👍 0    🔁 0    💬 1    📌 0

description of consciousness. This would make so many things easier lol.

You can't apply the duck test here.

Just because it looks like a duck, swims like a duck, and quacks like a duck does not mean it isn't an advanced animatronic duck. In a normal setting we should conclude that it is very

29.09.2025 06:38 — 👍 0    🔁 0    💬 1    📌 0

You're getting too philosophical and anthropomorphising. The reason I need math to explain properly is that math is a precise language, not that having a mathematical description of consciousness would make us stop considering it consciousness. In fact, we really want a mathematical

29.09.2025 06:38 — 👍 0    🔁 0    💬 1    📌 0

I'll also add that, as someone who's focused on model architectures, such a result is expected. It's a necessary consequence of properties like superposition (there are others). But idk if I can have such a conversation without getting extremely technical or involving math.

29.09.2025 02:00 — 👍 0    🔁 0    💬 0    📌 0
Subliminal Learning: Language Models Transmit Behavioral Traits via Hidden Signals in Data

Here's a more recent and in-depth study that shows more evidence. There's a lot of similar work, but it's less well known.

I'd say this work demonstrates that you cannot conclude that their outputs are reliable representations of their processing.

alignment.anthropic.com/2025/sublimi...

29.09.2025 02:00 — 👍 0    🔁 0    💬 1    📌 0

Your belief of functional meta-awareness is based on the assumption that their outputs are accurate representations of their internal processes. But we have strong evidence that this is not true.

That Grok example above is illustrative of this.

29.09.2025 02:00 — 👍 1    🔁 0    💬 1    📌 0

That's the comparison though. But yeah, just because I have a PhD doesn't mean I won't surprise anyone with my own stupidity. I do and say lots of stupid things lol

28.09.2025 22:46 — 👍 1    🔁 0    💬 0    📌 0

The claims here are not easily verifiable, so let's pretend for a second @victoramartinez.com and I were saying "1+1 = 7". Our education would be evidence that you should believe us. But you could go to a calculator and trivially prove us wrong.

That's exactly how the bias in claims works.

28.09.2025 22:21 — 👍 1    🔁 0    💬 0    📌 0

I want to add one thing. We're actually illustrating this in our conversation. You're talking to two people, one of whom has a PhD and the other is getting an MS, both in AI.

Does this prove our claims? No. Does it give them evidence? Yes. In the same way you are using the links you provided (ethos)

28.09.2025 22:21 — 👍 1    🔁 0    💬 2    📌 0

This is because the intermediate output isn't always strongly correlated to the final output. This is what @victoramartinez.com is talking about with distributions. But LLMs are fully capable of cheating, which makes evaluating them quite difficult.

28.09.2025 22:14 — 👍 2    🔁 0    💬 0    📌 0
https://grok.com/s/c2hhcmQtNA%3D%3D_e15bb008-d252-4b4d-8233-5679c0a1789a

You need to determine if they're using actual logic or something else. Actually, the stronger evidence is in their explanation, not their failure to get the right answer. I can even get them to get the right answer with bad logic.

28.09.2025 21:56 — 👍 1    🔁 0    💬 0    📌 0

I think I ended up answering this further down so I'll defer to that.

As for not all-knowing, sure. But I'm not the one making the claim that there are PhD knowledge/skill level systems. I would strongly chastise a PhD scientist for such a mistake. And read their responses carefully.

28.09.2025 21:56 — 👍 1    🔁 0    💬 1    📌 0

In other words:

Evidence allows you to put confidence bounds on your claims, but it's unable to rule out alternative explanations. (Often those are unknown unknowns.)

A single counterexample disproves a claim because it is proof of such an alternative.

28.09.2025 21:48 — 👍 2    🔁 0    💬 1    📌 0

There's a bias in how claims can be made. It's much easier to prove something isn't true than to prove it is. A single counterexample can do that. Proving something is true cannot be done through empirical evidence alone.

You can gather evidence that a claim is true. Just don't confuse evidence with proof.

28.09.2025 21:48 — 👍 1    🔁 0    💬 1    📌 0
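A small worked illustration of that asymmetry, using the classic fact that n^2 + n + 41 is prime for n = 0..39 but not for n = 40: checking many cases only accumulates evidence, while a single counterexample settles the claim as false.

```python
# Evidence vs. proof: many supporting cases vs. one counterexample.
def is_prime(k: int) -> bool:
    if k < 2:
        return False
    return all(k % d for d in range(2, int(k ** 0.5) + 1))

def claim(n: int) -> bool:
    # The claim under test: "n^2 + n + 41 is prime for every n >= 0".
    return is_prime(n * n + n + 41)

print(all(claim(n) for n in range(40)))  # True  -- 40 supporting cases, still not proof
print(claim(40))                         # False -- 40^2 + 40 + 41 = 41*41; one counterexample disproves it
```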

They actually often fail in distribution. Example above.

And we must be careful to differentiate the appearance of planning from planning. I'm not so sure we can accurately say that reasoning can't be performed via distributions, but I strongly suspect it is not that simple.

28.09.2025 21:41 — 👍 1    🔁 0    💬 1    📌 0
