This is Claude Sonnet 4.6: our most capable Sonnet model yet.
It’s a full upgrade across coding, computer use, long-context reasoning, agent planning, knowledge work, and design.
It also features a 1M token context window in beta.
@slckl.bsky.social
Trying to make Rust x AI a reality. Python survivor, book lover and weird music enjoyer.
four stick figures on sleds, one is sliding face first, face down, marked "skeleton", diagonally opposite to it is one sliding feet first, face up, marked "luge". The other two combinations are drawn and labeled with question marks
given the existence of skeleton and luge, i postulate the existence of two other, yet to be discovered, winter olympic sports
16.02.2026 19:08 — 👍 1043 🔁 240 💬 45 📌 18
felt dirty liking this one.
14.02.2026 10:38 — 👍 6 🔁 0 💬 0 📌 0
Current crop of LLMs are very scruffy coders. No shortcut is beneath them, no grind beyond them. Both of these result in an impressive and functional mess.
But the best human code artifacts are beautiful and neat.
The moment a model cracks beautiful code is the moment we put Rubicon behind us.
I feel like vibe-coding and then painstakingly retyping everything with minor changes to make sure it's correct kind of defeats the purpose. But I can't let go...
14.02.2026 08:31 — 👍 5 🔁 0 💬 3 📌 0
MiniMax M2.5 is now available on @hf.co
huggingface.co/MiniMaxAI/Mi...
✨ 229B - Modified MIT license
✨ 37% faster than M2.1
✨ ~$1/hour at 100 TPS
Couldn't help but wonder about the architecture/param count of this poor lad stuck... in space?
13.02.2026 07:09 — 👍 6 🔁 0 💬 2 📌 0
Updated Gemini 3 Deep Think looks pretty sweet.
Also somewhat sour for all of us who aren't Ultra subscribers.
Progressive AI regulation could also be a good tool for improving the situation, but it's easy to shoot yourself in the foot with it, so maybe it's good that they haven't gotten that far...
11.02.2026 13:13 — 👍 0 🔁 0 💬 0 📌 0
Yeah, attracting funding/compute resources and making them available to local AI hopefuls is basically the best thing such a center could do. Maybe bringing in some external partner/investment, but so far that hasn't worked out; all the big players give the Baltics a wide berth.
11.02.2026 13:13 — 👍 0 🔁 0 💬 1 📌 0
is that a haiku tramp stamp...
10.02.2026 07:29 — 👍 1 🔁 0 💬 0 📌 0
poor pig...
07.02.2026 18:56 — 👍 2 🔁 1 💬 0 📌 0
rust is a good language period :)
06.02.2026 22:50 — 👍 0 🔁 0 💬 0 📌 0
iiuc, the harness, compiler test suites etc. were continuously updated (by the human author) to unblock claude as it ran into new problems. this sounds more like a "grind" to me than "oneshot". oneshot to me implies that no extra human labor was needed to get the result.
06.02.2026 06:19 — 👍 1 🔁 0 💬 0 📌 0
while supercool, I'm not sure we share the same definition of "oneshot".
05.02.2026 20:34 — 👍 2 🔁 0 💬 1 📌 0
A large comparison table showing benchmark performance across five model families, with columns labeled at the top: “Opus 4.6,” “Opus 4.5,” “Sonnet 4.5,” “Gemini 3 Pro,” and “GPT-5.2 (all models).” The Opus 4.6 column is visually highlighted with a light shaded background and rounded border. Rows list tasks and benchmarks on the left, with percentages or scores across models:
“Agentic terminal coding (Terminal-Bench 2.0)”: Opus 4.6: 65.4% Opus 4.5: 59.8% Sonnet 4.5: 51.0% Gemini 3 Pro: 56.2% (54.2% self-reported) GPT-5.2: 64.7% (64% self-reported, Codex CLI)
“Agentic coding (SWE-bench Verified)”: Opus 4.6: 80.8% Opus 4.5: 80.9% Sonnet 4.5: 77.2% Gemini 3 Pro: 76.2% GPT-5.2: 80.0%
“Agentic computer use (OSWorld)”: Opus 4.6: 72.7% Opus 4.5: 66.3% Sonnet 4.5: 61.4% Gemini 3 Pro: — GPT-5.2: —
“Agentic tool use (t2-bench)”:
Retail: Opus 4.6 91.9%, Opus 4.5 88.9%, Sonnet 4.5 86.2%, Gemini 3 Pro 85.3%, GPT-5.2 82.0%
Telecom: Opus 4.6 99.3%, Opus 4.5 98.2%, Sonnet 4.5 98.0%, Gemini 3 Pro 98.0%, GPT-5.2 98.7%
“Scaled tool use (MCP Atlas)”: Opus 4.6: 59.5% Opus 4.5: 62.3% Sonnet 4.5: 43.8% Gemini 3 Pro: 54.1% GPT-5.2: 60.6%
“Agentic search (BrowseComp)”: Opus 4.6: 84.0% Opus 4.5: 67.8% Sonnet 4.5: 43.9% Gemini 3 Pro: 59.2% (Deep Research) GPT-5.2: 77.9% (Pro)
“Multidisciplinary reasoning (Humanity’s Last Exam)”:
Without tools: Opus 4.6 40.0%, Opus 4.5 30.8%, Sonnet 4.5 17.7%, Gemini 3 Pro 37.5%, GPT-5.2 36.6%
With tools: Opus 4.6 53.1%, Opus 4.5 43.4%, Sonnet 4.5 33.6%, Gemini 3 Pro 45.8%, GPT-5.2 50.0%
“Agentic financial analysis (Finance Agent)”: Opus 4.6: 60.7% Opus 4.5: 55.9% Sonnet 4.5: 54.2% Gemini 3 Pro: 44.1% GPT-5.2: 56.6% (5.1)
“Office tasks (GDPVal-AA Elo)”: Opus 4.6: 1606 Opus 4.5: 1416 Sonnet 4.5: 1277 Gemini 3 Pro: 1195 GPT-5.2: 1462
“Novel problem-solving (ARC AGI 2)”: Opus 4.6: 68.8% Opus 4.5: 37.6% Sonnet 4.5: 13.6% Gemini 3 Pro: 45.1% (Deep Thinking) GPT-5.2: 54.2% (Pro)
“Graduate-level reasoning (GPQA Diamond)”: Opus 4.6: 91.3% Opus 4.5: 87.0% S…
Opus 4.6 is here!
biggest wins on agentic search, HLE & ARC AGI 2
claude.com/blog/opus-4-...
Here’s one that’s not going to happen
04.02.2026 20:40 — 👍 121 🔁 4 💬 29 📌 2
New CATL sodium ion batteries:
- perform better in cold temps
- are cheaper to make than lithium ion batteries
- are significantly more stable and safer from fires.
Results of a VN/personality test about AI and the future. Your archetype: The Immortal Revel. Carefully curious about strange minds, machines roaming free.
This was moderately fun, indeed.
24.01.2026 20:17 — 👍 3 🔁 0 💬 0 📌 0
And, of course, policy can be amended as the future arrives some more.
23.01.2026 19:03 — 👍 0 🔁 0 💬 0 📌 0
I think this mostly just demands proof from the contributor that the thing actually works and is useful. If you have actually tried it yourself and deem it useful, then the maintainers can also start paying attention. I think this is a sane social baseline at present capabilities.
23.01.2026 19:03 — 👍 0 🔁 0 💬 2 📌 0
A very sane AI usage policy for any open source project that still cares about quality.
23.01.2026 18:10 — 👍 0 🔁 1 💬 0 📌 0
Democracy basically means electing a president. But the president,
21.01.2026 15:06 — 👍 29 🔁 4 💬 1 📌 1
A more efficient and more interpretable alternative to fat FFNs in Transformers. Sounds interesting...
21.01.2026 15:15 — 👍 0 🔁 0 💬 0 📌 0
6th street also is an NUSA front, that probably helps.
animals are just brutes for hire...
New king of 30B released?
A model size that remains largely feasible for local deployments.
that's my understanding, yes.
chromium wrapper for using it: chromium.googlesource.com/chromium/src...
That is, indeed, what they're now using.
13.01.2026 16:07 — 👍 5 🔁 0 💬 1 📌 0
The cyberpunk comedy we deserve.
13.01.2026 15:03 — 👍 7 🔁 0 💬 0 📌 0
thanks!
it's a nice little library...