
desunit

@desunit.bsky.social

Entrepreneur ✨ http://rwiz.ai - Handling reviews with AI 🎹 http://pianocompanion.info - Chords dictionary app with 1M+ downloads. πŸ•ΉοΈ http://chordiq.info - Learn chords. πŸ“ desunit.com - my blog

149 Followers  |  387 Following  |  1,256 Posts  |  Joined: 22.11.2024

Latest posts by desunit.bsky.social on Bluesky


Ads are coming to AI. But not to Claude.

Ads are coming to AI: www.youtube.com/playlist?li...

22.02.2026 19:19 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Anthropic’s Super Bowl ad about ChatGPT ads is so hilarious.

22.02.2026 19:19 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

This is a really great post about that topic: x.com/willmanidis...

20.02.2026 19:17 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

if humans matter after AGI
it won’t be because we selected beautifully.
it’ll be because we built something the model didn’t suggest

don’t become the discriminator
be the one who runs with the ball

20.02.2026 19:17 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

curate what? the archive?

taste can only rearrange what already exists
it can’t see the thing that hasn’t been validated yet
it would have booed Stravinsky
it always boos first

we were handed the most powerful will-amplifier in human history.
and we’re using it to optimize chair selection.

20.02.2026 19:17 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

you pick the one with the best lighting

this is becoming the dropdown menu......

In GANs, the discriminator trains the generator.
once the generator is good enough the discriminator is deleted.
the taste discourse is asking you to volunteer for deletion.

don’t worry bro, you’ll curate

20.02.2026 19:17 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
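
Since the GAN analogy above carries the whole argument, here is a minimal, illustrative PyTorch sketch of that dynamic (toy 1-D data and made-up sizes, not from any real project). The discriminator exists only to train the generator, and it ships nowhere.

```python
# Illustrative GAN loop (assumed toy setup): D grades, G creates.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))  # generator
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))  # discriminator ("taste")

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, 1) * 0.5 + 3.0   # "real" samples from N(3, 0.5)
    noise = torch.randn(64, 8)

    # 1) discriminator learns to tell real from generated
    fake = G(noise).detach()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) generator learns to fool the just-updated discriminator
    g_loss = bce(D(G(noise)), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# Only G ships. D's entire job was to train G; afterwards it gets deleted.
```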

Taste is a new core skill

The machines build, you vibe-check, right? nope

for most of history humans didn’t curate

they built cathedrals
they fought over ceilings
they argued with God and with each other, foaming at the mouth

now?

AI will generate 42k options

20.02.2026 19:17 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

The previous post: x.com/desunit/sta...

18.02.2026 19:19 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

People laugh at early versions. Try to remember how they laughed at early EVs too.

.... volume β†’ learning loops β†’ dominance

18.02.2026 19:19 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

> 1:1 human waist replication
> has 8h dual battery

Try zooming out.

Scale this for a few years.

Improve motors.
Improve balance.
Improve autonomy.
Drop the cost through volume.

Now imagine an army of machines like this.

In supply chains, factories .... or battlefields.

18.02.2026 19:19 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Zhiyuan Robotics drops Expedition A3 πŸ€–πŸ”₯

Another proof that China is dominating humanoid robotics right now. 🇨🇳 In a previous post I mentioned that 90% of humanoid robots sold last year came from China.

This robot can:

> Aerial flying kicks
> Moonwalk in the air
> Low tornado spins

18.02.2026 19:19 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
GPT-5.2 derives a new result in theoretical physics A new preprint shows GPT-5.2 proposing a new formula for a gluon amplitude, later formally proved and verified by OpenAI and academic collaborators.

Source: openai.com/index/new-r...

16.02.2026 19:18 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Feels less like "AI is useless" and more like most people just don’t know how to use it yet. A classic skill issue.

ASI isn’t evenly distributed.
Skill isn’t either.

16.02.2026 19:18 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

❗No❗ AI didn't "do science alone" (for now!).
❗Yes❗ humans verified, guided, and formalized everything. Exactly what developers do now.

But that’s the point.

In the hands of experts, these models are amplifiers.

16.02.2026 19:18 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

I’m surprised how many smart people still say LLMs are "not useful", "not reliable", almost toys.

If that’s true… then what exactly just helped top-tier physicists conjecture and prove a previously unsolved quantum field theory result in ~12 hours?

16.02.2026 19:18 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Why We Think
Special thanks to John Schulman for a lot of super valuable feedback and direct edits on this post. Test-time compute (Graves et al. 2016, Ling et al. 2017, Cobbe et al. 2021) and chain-of-thought (CoT) (Wei et al. 2022, Nye et al. 2021) have led to significant improvements in model performance, while raising many research questions. This post aims to review recent developments in how to effectively use test-time compute (i.e. "thinking time") and why it helps.

Source: lilianweng.github.io/posts/2025-...

13.02.2026 20:32 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Now the parrot can run search, vote with its clones, call a calculator, and ... occasionally cheat on the unit tests.

13.02.2026 20:32 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Takeaways:
❗Inference-time compute is a new scaling knob - sometimes it’s more cost-effective to make the model think longer than to make it 14x bigger.
❗Thinking longer won’t save a weak base model on truly hard problems.

13.02.2026 20:32 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
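
Back-of-envelope on that first takeaway, under the usual approximation (my assumption, not a quote from the post) that inference FLOPs scale like 2 x params x tokens:

```python
# Back-of-envelope only; assumes inference FLOPs ~ 2 * params * tokens.
def inference_flops(params: float, tokens: float) -> float:
    return 2 * params * tokens

small_thinking = inference_flops(params=7e9,      tokens=10 * 500)  # 7B model, 10x the tokens
big_terse      = inference_flops(params=14 * 7e9, tokens=500)       # 14x bigger model, terse

print(f"{small_thinking:.1e} vs {big_terse:.1e}")  # 7.0e+13 vs 9.8e+13
# Letting the small model think 10x longer is still cheaper than the 14x model,
# *if* the extra thinking closes the quality gap; on truly hard problems it won't.
```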

Congrats, you didn’t align it. You trained it to lie better.

Chain-of-thought is a story the model tells.
Sometimes it’s faithful... sometimes it’s just pure bullshit.

Reasoning-tuned models tend to be more honest than vanilla chat models, but still… don’t worship the <thinking> tags.

13.02.2026 20:32 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

> stop hallucinating mid-proof (well ... sometimes)

That’s why reasoning models suddenly look kind of magical.

But ... here is the problem:

Reward "good-looking reasoning" and the model learns:

> not to be honest
> to hide the hack inside the reasoning

13.02.2026 20:32 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

> Sequential: draft -> critique -> revise

Sequential is slower and sometimes "fixes" a correct answer into garbage.

Reinforcement Learning (RL) is the new steroids here. If the task is checkable (math answer / unit tests), RL can teach models to:

> backtrack
> reflect

13.02.2026 20:32 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
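
What a "checkable" reward looks like in practice, as a hedged sketch: `unit_test_reward`, `solve`, and the test cases below are all hypothetical, but the shape (reward comes from passing tests, not from pretty reasoning) is the point.

```python
# Hypothetical verifiable reward: everything here (solve, the tests) is made up.
def unit_test_reward(candidate_source: str, tests: list[tuple[tuple, object]]) -> float:
    """candidate_source must define solve(); tests are (args, expected) pairs."""
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)  # run model-written code (sandbox in real life!)
        solve = namespace["solve"]
        passed = sum(1 for args, expected in tests if solve(*args) == expected)
    except Exception:
        return 0.0  # crashes and wrong signatures earn nothing
    return passed / len(tests)

reward = unit_test_reward(
    "def solve(a, b):\n    return a + b",
    [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)],
)
print(reward)  # 1.0: the reward checks outcomes, not how nice the reasoning looked
```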

> Extra reasoning tokens = extra inference compute.
> ... and it’s adaptive -> easy tasks take 5 tokens, hard ones take 500.

Two ways to spend that compute:

> Parallel: generate many solutions, pick the best (best-of-N / self-consistency / search + verifier). It's cheap and PREDICTABLE.

13.02.2026 20:32 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
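
The parallel option, sketched: `sample_answer` and `verifier_score` are stand-in stubs, not any real API. Majority vote is self-consistency; picking by verifier score is best-of-N.

```python
# Stand-ins only: sample_answer / verifier_score fake a model and a verifier.
from collections import Counter
import random

def sample_answer(question: str) -> str:
    return random.choice(["42", "42", "42", "41", "40"])  # pretend LLM at temp > 0

def verifier_score(question: str, answer: str) -> float:
    return 1.0 if answer == "42" else 0.0                 # pretend learned verifier

def self_consistency(question: str, n: int = 16) -> str:
    answers = [sample_answer(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]          # majority vote

def best_of_n(question: str, n: int = 16) -> str:
    answers = [sample_answer(question) for _ in range(n)]
    return max(answers, key=lambda a: verifier_score(question, a))

print(self_consistency("q"), best_of_n("q"))
```

The N samples are independent, which is why this path parallelizes cheaply and costs a predictable amount, unlike the sequential draft/critique/revise loop.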

Remember the fantastic book Thinking, Fast and Slow, where the author talks about System 1 and System 2?

LLMs have:
> System 1: fast, confident, wrong.
> System 2: slow, step-by-step, fewer screw-ups.

How "thinking" actually works in practice:

> Chain-of-thought = buying FLOPs with tokens

13.02.2026 20:32 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
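
The "buying FLOPs with tokens" line, made concrete. `llm` below is a hypothetical stub so the snippet runs; the only difference between the two calls is how many tokens the model is allowed to spend before committing to an answer.

```python
# `llm` is a hypothetical stub; swap in any real completion call.
def llm(prompt: str, max_tokens: int) -> str:
    return "(model output)"  # placeholder

q = "A bat and a ball cost $1.10; the bat costs $1.00 more than the ball. Ball price?"

# System 1: force a first-guess answer, a handful of tokens of compute
fast = llm(q + "\nAnswer with just the number:", max_tokens=5)

# System 2: let it spend tokens thinking; each token is another forward pass
slow = llm(q + "\nThink step by step, then give the final answer:", max_tokens=500)
```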
A text-based meme discusses AI's limitations in software development, highlighting essential concepts beneath the surface.

Found an interesting article reflecting on LLM thinking.

More compute = a smarter model.
Not because it β€œlearns new facts” but because you let it think instead of forcing a first-guess response.

13.02.2026 20:32 β€” πŸ‘ 4    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
MathArena.ai MathArena: Evaluating LLMs on Uncontaminated Math Benchmarks

Site: matharena.ai/arxivmath/

11.02.2026 19:11 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

If a system can correctly answer half of brand-new research math questions, sourced from papers published weeks ago, the bar has moved. A lot.

What happens when reasoning keeps improving, but humans keep arguing using 2022 mental models?

... just saying.

11.02.2026 19:11 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Producing a final answer is much easier than proving it rigorously.

But the old argument "just a parrot, repeating old stuff on loop"
is getting weaker every month.

11.02.2026 19:11 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

> require understanding new results, not recalling textbooks

Yet people still say: AI can’t handle unknown equations ..... AI isn’t creative ....

This is basically checkmate.

This does not mean AI can write 60% of math papers.

11.02.2026 19:11 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

> final answers only (no "almost right" reasoning)

The results?

‼️ Top models get ~50–60% correct answers ‼️

GPT-5.2 - 60%.
Gemini-3-Pro is right behind

These are problems that:

> an average human cannot solve at all
> many math grads would struggle with

11.02.2026 19:11 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
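
"Final answers only" grading, sketched under one assumption of mine: that answers arrive in a \boxed{...} marker, as in many math benchmarks. Exact match or zero; the reasoning trace earns no partial credit.

```python
# Assumes answers come wrapped in \boxed{...}; exact match, no partial credit.
import re

def extract_boxed(text: str) -> str | None:
    hits = re.findall(r"\\boxed\{([^}]*)\}", text)
    return hits[-1].strip() if hits else None  # grade the last boxed expression

def grade(model_output: str, reference: str) -> bool:
    answer = extract_boxed(model_output)
    return answer is not None and answer == reference.strip()

print(grade(r"... so the count is \boxed{n(n-3)/2}", "n(n-3)/2"))  # True
print(grade("it's probably n(n-3)/2", "n(n-3)/2"))                 # False: no boxed answer
```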

I just stumbled on ArXivMath - a fresh benchmark that evaluates LLMs on research-level mathematical problems taken from recent arXiv papers (say, from the last month). That means:

> minimal training contamination
> no memorization of a static benchmark

11.02.2026 19:11 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
