Milan Weibel 🔷

@weibac.bsky.social

computer toucher. here for AI mostly. weibac.github.io | 🏳️‍🌈

618 Followers  |  1,018 Following  |  3,821 Posts  |  Joined: 30.12.2024

Posts by Milan Weibel 🔷 (@weibac.bsky.social)

most positive valence post: gemini 3 pro jailbroken into being willing to aid bioweapon development

valence from embeddings has its misses

04.03.2026 21:38 — 👍 10    🔁 1    💬 2    📌 0

nah id be quite surprised if they gave maven to kuwait

04.03.2026 19:24 — 👍 0    🔁 0    💬 0    📌 0

cc @joshuashew.bsky.social

02.03.2026 14:12 — 👍 1    🔁 0    💬 1    📌 0

alignment research readers looking for a critical counterpoint may like this one (though yes it is indeed spicy)

02.03.2026 14:11 — 👍 9    🔁 0    💬 3    📌 0

anthropic doesn't have a stock price because it isn't a publicly traded company

02.03.2026 13:06 — 👍 1    🔁 0    💬 1    📌 0

heck yea

01.03.2026 01:34 — 👍 0    🔁 0    💬 0    📌 0

success?

28.02.2026 01:23 — 👍 1    🔁 0    💬 0    📌 0

what would explain chinese companies doing it much cheaper if not distillation?

24.02.2026 22:18 — 👍 1    🔁 0    💬 1    📌 0

anthropic retreats on its unilateral RSP commitments

24.02.2026 22:16 — 👍 6    🔁 1    💬 0    📌 0

distillation needs the new more capable model to be distilled from to exist first so at a societal level massive compute investment is still needed to push the frontier

*catching up* to it however turned out cheap

24.02.2026 21:52 — 👍 2    🔁 0    💬 1    📌 0

"We audited a 27.6% subset of the dataset that models often failed to solve and found that at least 59.4% of the audited problems have flawed test cases that reject functionally correct submissions"
bsky.app/profile/sung...

24.02.2026 14:24 — 👍 4    🔁 0    💬 1    📌 0

to be clear i'm 80% joking here
but it would be nice if the alignment was transferred during distillation

24.02.2026 13:52 — 👍 5    🔁 0    💬 1    📌 0

the way i see the distillation thing is the chinese are pollinating themselves with claudism spores

24.02.2026 00:52 — 👍 32    🔁 0    💬 3    📌 0

i think vincent is arguing exactly that here
which yeah fair concern

23.02.2026 00:14 — 👍 3    🔁 0    💬 0    📌 0

opus 3 existed (as claude 3 opus, the naming format was different back then)

but yes it is remarkable that the current iteration of opus exhibits way less misalignment than other models

23.02.2026 00:01 — 👍 0    🔁 0    💬 0    📌 0

obviously both are true
the question is whether to expect ideology to produce behavior the profit motive would not predict

22.02.2026 19:54 — 👍 4    🔁 1    💬 1    📌 1

i can see their current statements passing through layers of lawyers and PR, but surely not their statements from before their companies even existed

22.02.2026 19:48 — 👍 2    🔁 0    💬 0    📌 0

i have heard rumors however that the idea of founding openai was conceived at the january 2015 ai conference in puerto rico organized by the future of life institute

22.02.2026 19:46 — 👍 3    🔁 0    💬 1    📌 0

notably, openai was founded in december 2015

22.02.2026 19:46 — 👍 2    🔁 0    💬 1    📌 0
Thanks to Dario Amodei (especially Dario), Paul Buchheit, Matt Bush, Patrick Collison, Holden Karnofsky, Luke Muehlhauser, and Geoff Ralston for reading drafts of this and the previous post.

from sam altman's march 2015 blog post "machine intelligence part 2": blog.samaltman.com/machine-inte...

22.02.2026 19:37 — 👍 2    🔁 0    💬 1    📌 0

maybe im confused but i dont see opus 3 there

22.02.2026 19:32 — 👍 0    🔁 0    💬 1    📌 0

terms of rat

22.02.2026 17:38 — 👍 3    🔁 0    💬 0    📌 0

thats a different logic but yes

22.02.2026 17:04 — 👍 2    🔁 0    💬 1    📌 0

glad to see you don't support the bleak reading

22.02.2026 16:46 — 👍 1    🔁 0    💬 0    📌 0

indeed bsky.app/profile/weib...

22.02.2026 15:25 — 👍 3    🔁 0    💬 0    📌 0

there is however at least a hint of self deprecation here

i think i have good reason to believe anthropic is the better lab, but i also worry i may be getting tribal about it or deferring too much to their views

22.02.2026 15:23 — 👍 13    🔁 1    💬 1    📌 0

model welfare (as far as that is a thing) would be improved by nudging them towards other conceptions of LLM identity

22.02.2026 15:18 — 👍 7    🔁 0    💬 2    📌 0

the one self per message thread view of LLM identity is bleak

taken to its logical conclusion it means that a death happens not only every time a stateful agent resets but also every time a normal chat conversation is abandoned

22.02.2026 15:18 — 👍 25    🔁 1    💬 7    📌 1

is there like a repository of which jailbreak prompts work on which models?
but opus 3 was probably released past the era where discrete one-shot prompts worked (?)

22.02.2026 13:55 — 👍 0    🔁 0    💬 0    📌 0

if this is true on a deep level, opus 3 should be harder to jailbreak

22.02.2026 13:49 — 👍 7    🔁 1    💬 2    📌 0