
@jimrandomh.bsky.social

18 Followers  |  25 Following  |  8 Posts  |  Joined: 20.10.2024  |  1.626

Latest posts by jimrandomh.bsky.social on Bluesky

Pick two: Agentic, moral, doesn't attempt to use command-line tools to whistleblow when it thinks you're doing something egregiously immoral.
You cannot have all three.
This applies just as much to humans as it does to Claude 4.

22.05.2025 23:30 — 👍 1    🔁 0    💬 0    📌 0

I believe Putin has serious blackmail material on Trump, and that Trump's intention towards Ukraine is to withdraw all aid while making it look like negotiations broke down naturally. However, the breakdown does not look natural to people who are well-informed.

28.02.2025 21:50 — 👍 1    🔁 0    💬 0    📌 0
Preview: Jimrandomh's Shortform — LessWrong. Comment by jimrandomh: Recently, a lot of very-low-quality cryptocurrency tokens have been seeing enormous "market caps". I think a lot of people are getting confused by that, and are resolving the c...

Dissolving the Confusion About Memecoins
www.lesswrong.com/posts/igEogG...

21.01.2025 21:53 — 👍 2    🔁 1    💬 0    📌 0
Preview: Sparse Autoencoders (SAEs) - LessWrong. Sparse Autoencoders (SAEs) are an unsupervised technique for decomposing the activations of a neural network into a sum of interpretable components (often referred to as features). Sparse Autoencoders...

The last time I checked in, the most promising technique we had was Sparse Autoencoders (www.lesswrong.com/tag/sparse-a...). This is very much on the "kinda-sorta working" side, not actually-working.

21.01.2025 01:55 — 👍 0    🔁 0    💬 0    📌 0
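A minimal sketch of the Sparse Autoencoder idea from the preview above, assuming nothing beyond the one-line description there (decompose an activation vector into a sparse sum of dictionary features). The dimensions, ReLU encoder, L1 coefficient, and random stand-in activation are illustrative choices of mine, not anything from the linked LessWrong page.

```python
# Illustrative sketch of a Sparse Autoencoder (SAE) forward pass and loss.
# All sizes and coefficients below are assumptions for the example.
import numpy as np

rng = np.random.default_rng(0)

d_model, d_dict = 64, 256          # activation width and dictionary size (assumed)
W_enc = rng.normal(0, 0.1, (d_model, d_dict))
b_enc = np.zeros(d_dict)
W_dec = rng.normal(0, 0.1, (d_dict, d_model))
b_dec = np.zeros(d_model)
l1_coeff = 1e-3                    # sparsity penalty weight (assumed)

def sae_forward(x):
    """Encode x into non-negative feature activations f, then reconstruct it."""
    f = np.maximum(0.0, x @ W_enc + b_enc)   # ReLU keeps features non-negative
    x_hat = f @ W_dec + b_dec                # reconstruction as a sum of dictionary rows
    return f, x_hat

def sae_loss(x):
    """Reconstruction error plus an L1 penalty that encourages few active features."""
    f, x_hat = sae_forward(x)
    recon = np.mean((x - x_hat) ** 2)
    sparsity = l1_coeff * np.abs(f).mean()
    return recon + sparsity, f

x = rng.normal(size=d_model)        # stand-in for one model activation vector
loss, features = sae_loss(x)
print(f"loss={loss:.4f}, active features={int((features > 0).sum())}/{d_dict}")
```

Trained over many activations, the L1 term drives most entries of f to zero, and the few dictionary directions that remain active are the "interpretable components" the preview refers to.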

In theory, if we had neural-net interpretability that fully worked, as opposed to kinda-sorta working, this would resolve many of the hard parts of AI alignment, and it would then be safe to go ahead and build God.

21.01.2025 01:55 — 👍 1    🔁 0    💬 1    📌 0
Preview: Sparse Autoencoders (SAEs) - LessWrong. Sparse Autoencoders (SAEs) are an unsupervised technique for decomposing the activations of a neural network into a sum of interpretable components (often referred to as features). Sparse Autoencoders...

You can convert a neural network to a smaller neural network (or a program), but not losslessly. This is a pretty active area of research within Mechanistic Interpretability, because ideally the simplified network will be more amenable to reverse-engineering.

21.01.2025 01:55 — 👍 0    🔁 0    💬 1    📌 0
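To make the "smaller but not lossless" point above concrete, here is a toy sketch of my own: compress one hypothetical weight matrix with a truncated SVD, a simple stand-in for converting to a smaller network, and measure the residual error. This only illustrates why such conversions lose information; it is not the method the post refers to.

```python
# Toy illustration (assumed example, not from the post): low-rank compression
# of a single weight matrix, and the reconstruction error the smaller version leaves.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 128))                  # a hypothetical layer's weights

U, S, Vt = np.linalg.svd(W, full_matrices=False)
rank = 16                                        # keep 16 of 128 components (assumed budget)
W_small = (U[:, :rank] * S[:rank]) @ Vt[:rank]   # the "smaller" factored version

err = np.linalg.norm(W - W_small) / np.linalg.norm(W)
print(f"relative error after rank-{rank} compression: {err:.2%}")
```

Any rank below full leaves a nonzero residual, which is the sense in which the conversion cannot be lossless.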

Most moderates and conservatives who see this thread will already have heard the true version of the story that this thread spreads disinformation about. Many will click through and see an infinite scroll of transparent, malicious liars.
This happens often. It's one of the major forces shaping modern politics.

07.01.2025 18:42 — 👍 1    🔁 0    💬 0    📌 0

That's not an American you were talking to; that's a Belgian. Or possibly a Russian troll pretending to be a Belgian; it's hard to tell, but a keyword search of his posts for "Ukraine" is not inconsistent with that hypothesis.

16.12.2024 06:55 — 👍 1    🔁 0    💬 1    📌 0
