
A.V.

@slckl.bsky.social

Trying to make Rust x AI a reality. Python survivor, book lover and weird music enjoyer.

401 Followers  |  284 Following  |  311 Posts  |  Joined: 08.02.2024

Latest posts by slckl.bsky.social on Bluesky


This is Claude Sonnet 4.6: our most capable Sonnet model yet.

It’s a full upgrade across coding, computer use, long-context reasoning, agent planning, knowledge work, and design.

It also features a 1M token context window in beta.

17.02.2026 17:49 — 👍 40    🔁 4    💬 2    📌 4
four stick figures on sleds, one is sliding face first, face down, marked "skeleton", diagonally opposite to it is one sliding feet first, face up, marked "luge". The other two combinations are drawn and labeled with question marks

given the existence of skeleton and luge, i postulate the existence of two other, yet to be discovered, winter olympic sports

16.02.2026 19:08 — 👍 1043    🔁 240    💬 45    📌 18

felt dirty liking this one.

14.02.2026 10:38 — 👍 6    🔁 0    💬 0    📌 0

The current crop of LLMs are very scruffy coders. No shortcut is beneath them, no grind beyond them. Both of these result in an impressive and functional mess.

But the best human code artifacts are beautiful and neat.
The moment a model cracks beautiful code is the moment we put the Rubicon behind us.

14.02.2026 08:37 — 👍 1    🔁 0    💬 1    📌 0

I feel like vibe-coding and then painstakingly retyping everything with minor changes to make sure it's correct kind of defeats the purpose. But I can't let go...

14.02.2026 08:31 — 👍 5    🔁 0    💬 3    📌 0

MiniMax M2.5 is now available on @hf.co

huggingface.co/MiniMaxAI/Mi...
✨ 229B - Modified MIT license
✨ 37% faster than M2.1
✨ ~$1/hour at 100 TPS
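For comparison with per-token pricing, the "~$1/hour at 100 TPS" figure can be converted to a cost per million tokens. This is a back-of-the-envelope sketch, assuming TPS means generated tokens per second sustained for the full hour:

```python
# Convert "$1/hour at 100 TPS" into an approximate cost per million tokens.
# Assumption: TPS = tokens per second, generation sustained continuously.
cost_per_hour = 1.0                          # USD, from the post
tokens_per_second = 100
tokens_per_hour = tokens_per_second * 3600   # 360,000 tokens in an hour
cost_per_million = cost_per_hour / tokens_per_hour * 1_000_000
print(round(cost_per_million, 2))            # → 2.78 (USD per million output tokens)
```

Real-world throughput varies with load and context length, so this is an optimistic lower bound rather than a quoted rate.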

13.02.2026 14:40 — 👍 24    🔁 1    💬 0    📌 1

Couldn't help but wonder about the architecture/param count of this poor lad stuck... in space?

13.02.2026 07:09 — 👍 6    🔁 0    💬 2    📌 0

Updated Gemini 3 Deep Think looks pretty sweet.

Also somewhat sour for all of us who aren't Ultra subscribers.

12.02.2026 17:33 — 👍 23    🔁 4    💬 3    📌 1

Progressive AI regulation could also be a good tool for improving the situation, but it's easy to shoot yourself in the foot with it; maybe it's just as well that they haven't gotten that far either...

11.02.2026 13:13 — 👍 0    🔁 0    💬 0    📌 0

Yeah, attracting funding/compute resources and making them available to local AI aspirants is basically the best thing such a center could do. Maybe bringing in some external partner or investment, but so far that hasn't worked out; all the big players give the Baltics a wide berth.

11.02.2026 13:13 — 👍 0    🔁 0    💬 1    📌 0

is that a haiku tramp stamp...

10.02.2026 07:29 — 👍 1    🔁 0    💬 0    📌 0

poor pig...

07.02.2026 18:56 — 👍 2    🔁 1    💬 0    📌 0

rust is a good language period :)

06.02.2026 22:50 — 👍 0    🔁 0    💬 0    📌 0

iiuc, the harness, compiler test suites etc were continuously updated (by the human author) to unblock claude as it ran into new problems. this sounds more like a "grind" to me than "oneshot". oneshot to me implies that no extra human labor was needed to get the result.

06.02.2026 06:19 — 👍 1    🔁 0    💬 0    📌 0

while supercool, I'm not sure we share the same definition of "oneshot".

05.02.2026 20:34 — 👍 2    🔁 0    💬 1    📌 0
A large comparison table showing benchmark performance across five model families, with columns labeled at the top: “Opus 4.6,” “Opus 4.5,” “Sonnet 4.5,” “Gemini 3 Pro,” and “GPT-5.2 (all models).” The Opus 4.6 column is visually highlighted with a light shaded background and rounded border.

Rows list tasks and benchmarks on the left, with percentages or scores across models:

“Agentic terminal coding (Terminal-Bench 2.0)”:
Opus 4.6: 65.4%
Opus 4.5: 59.8%
Sonnet 4.5: 51.0%
Gemini 3 Pro: 56.2% (54.2% self-reported)
GPT-5.2: 64.7% (64% self-reported, Codex CLI)

“Agentic coding (SWE-bench Verified)”:
Opus 4.6: 80.8%
Opus 4.5: 80.9%
Sonnet 4.5: 77.2%
Gemini 3 Pro: 76.2%
GPT-5.2: 80.0%

“Agentic computer use (OSWorld)”:
Opus 4.6: 72.7%
Opus 4.5: 66.3%
Sonnet 4.5: 61.4%
Gemini 3 Pro: —
GPT-5.2: —

“Agentic tool use (t2-bench)”:
Retail: Opus 4.6 91.9%, Opus 4.5 88.9%, Sonnet 4.5 86.2%, Gemini 3 Pro 85.3%, GPT-5.2 82.0%
Telecom: Opus 4.6 99.3%, Opus 4.5 98.2%, Sonnet 4.5 98.0%, Gemini 3 Pro 98.0%, GPT-5.2 98.7%

“Scaled tool use (MCP Atlas)”:
Opus 4.6: 59.5%
Opus 4.5: 62.3%
Sonnet 4.5: 43.8%
Gemini 3 Pro: 54.1%
GPT-5.2: 60.6%

“Agentic search (BrowseComp)”:
Opus 4.6: 84.0%
Opus 4.5: 67.8%
Sonnet 4.5: 43.9%
Gemini 3 Pro: 59.2% (Deep Research)
GPT-5.2: 77.9% (Pro)

“Multidisciplinary reasoning (Humanity’s Last Exam)”:
Without tools: Opus 4.6 40.0%, Opus 4.5 30.8%, Sonnet 4.5 17.7%, Gemini 3 Pro 37.5%, GPT-5.2 36.6%
With tools: Opus 4.6 53.1%, Opus 4.5 43.4%, Sonnet 4.5 33.6%, Gemini 3 Pro 45.8%, GPT-5.2 50.0%

“Agentic financial analysis (Finance Agent)”:
Opus 4.6: 60.7%
Opus 4.5: 55.9%
Sonnet 4.5: 54.2%
Gemini 3 Pro: 44.1%
GPT-5.2: 56.6% (5.1)

“Office tasks (GDPVal-AA Elo)”:
Opus 4.6: 1606
Opus 4.5: 1416
Sonnet 4.5: 1277
Gemini 3 Pro: 1195
GPT-5.2: 1462

“Novel problem-solving (ARC AGI 2)”:
Opus 4.6: 68.8%
Opus 4.5: 37.6%
Sonnet 4.5: 13.6%
Gemini 3 Pro: 45.1% (Deep Thinking)
GPT-5.2: 54.2% (Pro)

“Graduate-level reasoning (GPQA Diamond)”:
Opus 4.6: 91.3%
Opus 4.5: 87.0%
S…


Opus 4.6 is here!

biggest wins on agentic search, HLE & ARC AGI 2

claude.com/blog/opus-4-...

05.02.2026 18:02 — 👍 88    🔁 7    💬 5    📌 3

Here’s one that’s not going to happen

04.02.2026 20:40 — 👍 121    🔁 4    💬 29    📌 2

New CATL sodium-ion batteries have:
- better performance in cold temps
- lower manufacturing cost than lithium-ion batteries
- significantly better stability and fire safety.

27.01.2026 00:53 — 👍 40    🔁 5    💬 2    📌 0
Results of a VN/personality test about AI and the future.
Your archetype: The Immortal Revel. Carefully curious about strange minds, machines roaming free.


This was moderately fun, indeed.

24.01.2026 20:17 — 👍 3    🔁 0    💬 0    📌 0

And, of course, policy can be amended as the future arrives some more.

23.01.2026 19:03 — 👍 0    🔁 0    💬 0    📌 0

I think this mostly just demands proof from the contributor that the thing actually works and is useful. If you have actually tried it yourself and deem it useful, then the maintainers can also start paying attention. I think this is a sane social baseline at present capabilities.

23.01.2026 19:03 — 👍 0    🔁 0    💬 2    📌 0

A very sane AI usage policy for any open source project that still cares about quality.

23.01.2026 18:10 — 👍 0    🔁 1    💬 0    📌 0

Democracy basically means electing a president. But the president,

21.01.2026 15:06 — 👍 29    🔁 4    💬 1    📌 1

A more efficient and more interpretable alternative to fat FFNs in Transformers. Sounds interesting...

21.01.2026 15:15 — 👍 0    🔁 0    💬 0    📌 0

6th street also is an NUSA front, that probably helps.

animals are just brutes for hire...

20.01.2026 22:50 — 👍 2    🔁 0    💬 0    📌 0

New king of 30B released?
A model size that remains largely feasible for local deployments.

19.01.2026 19:25 — 👍 2    🔁 0    💬 0    📌 0
rust/jxl/v0_2 - chromium/src/third_party - Git at Google

that's my understanding, yes.

chromium wrapper for using it: chromium.googlesource.com/chromium/src...

13.01.2026 16:21 — 👍 4    🔁 0    💬 0    📌 0

That is, indeed, what they're now using.

13.01.2026 16:07 — 👍 5    🔁 0    💬 1    📌 0

The cyberpunk comedy we deserve.

13.01.2026 15:03 — 👍 7    🔁 0    💬 0    📌 0

thanks!
it's a nice little library...

13.01.2026 09:41 — 👍 0    🔁 0    💬 0    📌 0
