Pick two: Agentic, moral, doesn't attempt to use command-line tools to whistleblow when it thinks you're doing something egregiously immoral.
You cannot have all three.
This applies just as much to humans as it does to Claude 4.
@jimrandomh.bsky.social
I believe Putin has serious blackmail material on Trump, and that Trump's intention towards Ukraine is to withdraw all aid while making it look like negotiations broke down naturally. However, the breakdown does not look natural to people who are well-informed.
28.02.2025 21:50 — 👍 1 🔁 0 💬 0 📌 0
Dissolving the Confusion About Memecoins
www.lesswrong.com/posts/igEogG...
The last time I checked in, the most promising technique we had was Sparse Autoencoders (www.lesswrong.com/tag/sparse-a...). This is very much on the "kinda-sorta working" side, not actually-working.
21.01.2025 01:55 — 👍 0 🔁 0 💬 0 📌 0
In theory, if we had neural-net interpretability that fully worked, as opposed to kinda-sorta working, this would resolve many of the hard parts of AI alignment, and it would then be safe to go ahead and build God.
21.01.2025 01:55 — 👍 1 🔁 0 💬 1 📌 0
You can convert a neural network to a smaller neural network (or a program), but not losslessly. This is a pretty active area of research within Mechanistic Interpretability, because ideally the simplified network will be more amenable to reverse-engineering.
21.01.2025 01:55 — 👍 0 🔁 0 💬 1 📌 0
Most moderates and conservatives who see this thread will have heard the true version of the story that this thread is disinformation about. Many will click through to see an infinite-scroll of transparent, malicious liars.
This happens often. It's one of the major forces shaping modern politics.
That's not an American you were talking to, that's a Belgian. Or possibly a Russian troll pretending to be a Belgian; it's hard to tell, but the results of a keyword search of his posts for "Ukraine" are not inconsistent with that hypothesis.
16.12.2024 06:55 — 👍 1 🔁 0 💬 1 📌 0