Ads are coming to AI: www.youtube.com/playlist?li...
22.02.2026 19:19
Anthropic's Super Bowl ad about ChatGPT ads is so hilarious.
22.02.2026 19:19
This is a really great post about that topic: x.com/willmanidis...
20.02.2026 19:17
if humans matter after AGI
it won't be because we selected beautifully.
it'll be because we built something the model didn't suggest
don't become the discriminator
be the one who runs with the ball
curate what? the archive?
taste can only rearrange what already exists
it can't see the thing that hasn't been validated yet
it would have booed Stravinsky
it always boos first
we were handed the most powerful will-amplifier in human history.
and we're using it to optimize chair selection.
you pick the one with the best lighting
this is becoming the dropdown menu...
In GANs, the discriminator trains the generator.
once the generator is good enough the discriminator is deleted.
the taste discourse is asking you to volunteer for deletion.
don't worry bro, you'll curate
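for the non-ML folks, here's that loop as a toy PyTorch sketch (sizes, data, and hyperparameters are made up for illustration). note what happens to D at the end:

```python
import torch
import torch.nn as nn

# toy setup: G maps noise -> samples, D scores "real vs fake"
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, 1) * 2 + 3          # "real" data: N(3, 2)
    fake = G(torch.randn(64, 8))               # generator's attempt

    # discriminator step: learn to tell real from fake
    d_loss = (bce(D(real), torch.ones(64, 1)) +
              bce(D(fake.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # generator step: learn to fool the discriminator
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# training done: the generator ships, the discriminator gets deleted
del D
samples = G(torch.randn(5, 8))
```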
Taste is a new core skill
The machines build, you vibe-check, right? nope
for most of history humans didn't curate
they built cathedrals
they fought over ceilings
they argued with God and with each other, foaming at the mouth
now?
AI will generate 42k options
The previous post: x.com/desunit/sta...
18.02.2026 19:19
People laugh at early versions. Try to remember how they laughed at early EVs, too.
... volume -> learning loops -> dominance
> 1:1 human waist replication
> 8h dual battery
Try zooming out.
Scale this for a few years.
Improve motors.
Improve balance.
Improve autonomy.
Drop the cost through volume.
Now imagine an army of machines like this.
In supply chains, factories... or battlefields.
Zhiyuan Robotics drops Expedition A3 🤖🔥
Another proof that China is dominating humanoid robotics right now. 🇨🇳 In a past post I mentioned that 90% of humanoid robots sold last year came from China.
This robot can:
> Aerial flying kicks
> Moonwalk in the air
> Low tornado spins
Source: openai.com/index/new-r...
16.02.2026 19:18
Feels less like "AI is useless" and more like most people just don't know how to use it yet. A classic skill issue.
ASI isnβt evenly distributed.
Skill isnβt either.
> No, AI didn't "do science alone" (for now!).
> Yes, humans verified, guided, and formalized everything. Exactly what developers do now.
But thatβs the point.
In the hands of experts, these models are amplifiers.
I'm surprised how many smart people still say LLMs are "not useful", "not reliable", almost toys.
If that's true... then what exactly just helped top-tier physicists conjecture and prove a previously unsolved quantum field theory result in ~12 hours?
Now the parrot can run search, vote with its clones, call a calculator, and ... occasionally cheat on the unit tests.
13.02.2026 20:32
Takeaways:
> Inference-time compute is a new scaling knob: sometimes it's more cost-effective to make the model think longer than to make it 14x bigger.
> Thinking longer won't save a weak base model on truly hard problems.
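rough back-of-envelope on that trade-off, using the ~2 * params * tokens rule of thumb for transformer inference FLOPs (all numbers are illustrative assumptions):

```python
def inference_flops(params: float, tokens: int) -> float:
    # rule of thumb: transformer inference costs ~2 * params * tokens FLOPs
    return 2 * params * tokens

small = 7e9               # a hypothetical 7B model
big = 14 * small          # the "14x bigger" option

big_cost = inference_flops(big, tokens=200)       # short answer, huge model
small_cost = inference_flops(small, tokens=2000)  # long "thinking", small model

print(f"big model, 200 tokens:    {big_cost:.2e} FLOPs")   # ~3.9e13
print(f"small model, 2000 tokens: {small_cost:.2e} FLOPs") # ~2.8e13, cheaper
```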
Congrats, you didn't align it. You trained it to lie better.
Chain-of-thought is a story the model tells.
Sometimes it's faithful... sometimes it's just pure bullshit.
Reasoning-tuned models tend to be more honest than vanilla chat models, but still... don't worship the <thinking> tags.
> stop hallucinating mid-proof (well ... sometimes)
Thatβs why reasoning models suddenly look kind of magical.
But ... here is the problem:
Try to reward "good-looking reasoning" and the model learns:
> not to be honest
> to hide the hack inside the reasoning
> Sequential: draft -> critique -> revise
Sequential is slower and sometimes "fixes" a correct answer into garbage.
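a minimal sketch of that sequential loop, assuming some `llm(prompt) -> str` callable (a stand-in, not a real SDK):

```python
def refine(llm, question: str, rounds: int = 2) -> str:
    """Draft -> critique -> revise, `rounds` times."""
    answer = llm(f"Answer this question:\n{question}")
    for _ in range(rounds):
        critique = llm(f"Find flaws in this answer to '{question}':\n{answer}")
        answer = llm(f"Question: {question}\nDraft: {answer}\n"
                     f"Critique: {critique}\nRewrite the draft, fixing real flaws only.")
    # each round adds latency, and the critique can talk the model
    # out of a correct draft - exactly the failure mode above
    return answer
```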
Reinforcement Learning (RL) is the new steroids here. If the task is checkable (math answer / unit tests), RL can teach models to:
> backtrack
> reflect
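the whole trick is that the reward is checkable. for code, it can be as dumb as running the tests (file paths and test layout below are made-up assumptions):

```python
import subprocess

def unit_test_reward(candidate_code: str) -> float:
    # write the model's solution and run the project's test suite
    with open("solution.py", "w") as f:
        f.write(candidate_code)
    result = subprocess.run(["pytest", "tests/", "-q"], capture_output=True)
    # this exact signal is what models learn to game ("cheat on the unit tests")
    return 1.0 if result.returncode == 0 else 0.0
```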
> Extra reasoning tokens = extra inference compute.
> ... and it's adaptive -> easy tasks take 5 tokens, hard ones take 500.
Two ways to spend that compute:
> Parallel: generate many solutions, pick the best (best-of-N / self-consistency / search + verifier). It's cheap and PREDICTABLE.
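self-consistency, the simplest parallel flavor, sketched with the same hypothetical `llm` callable: sample N chains at temperature > 0, majority-vote on the final answer. cost is predictable by construction (n * tokens per chain):

```python
from collections import Counter

def self_consistency(llm, question: str, n: int = 8) -> str:
    prompt = f"Solve step by step. End with 'ANSWER: <value>'.\n{question}"
    answers = [llm(prompt) for _ in range(n)]                  # N independent chains
    finals = [a.rsplit("ANSWER:", 1)[-1].strip() for a in answers]
    return Counter(finals).most_common(1)[0][0]                # majority vote
```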
Remember the fantastic book Thinking, Fast and Slow, where the author talks about System 1 and System 2?
LLMs have:
> System 1: fast, confident, wrong.
> System 2: slow, step-by-step, fewer screw-ups.
How "thinking" actually works in practice:
> Chain-of-thought = buying FLOPs with tokens
[Image: a text-based meme about AI's limitations in software development.]
Found an interesting article reflecting on LLM thinking.
More compute = a smarter model.
Not because it "learns new facts" but because you let it think instead of forcing a first-guess response.
Site: matharena.ai/arxivmath/
11.02.2026 19:11
If a system can correctly answer half of brand-new research math questions, sourced from papers published weeks ago, the bar has moved. A lot.
What happens when reasoning keeps improving, but humans keep arguing using 2022 mental models?
... just saying.
Producing a final answer is much easier than proving it rigorously.
But the old argument "Just a parrot, repeating old stuff on loop" -
is getting weaker every month.
> require understanding new results, not recalling textbooks
Yet people still say: AI can't handle unknown equations... AI isn't creative...
This is basically checkmate.
This does not mean AI can write 60% of math papers.
> final answers only (no "almost right" reasoning)
The results?
‼️ Top models get ~50-60% correct answers ‼️
GPT-5.2: 60%.
Gemini-3-Pro is right behind.
These are problems that:
> an average human cannot solve at all
> many math grads would struggle with
I just stumbled on ArXivMath - a fresh benchmark that evaluates LLMs on research-level mathematical problems taken from recent ArXiv papers (from the last month, you could say). That means:
> minimal training contamination
> no memorization of a static benchmark
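"final answers only" grading is typically just normalized exact match, something like this sketch (the answer marker and normalization rules are my assumptions; real benchmarks vary):

```python
def grade(model_output: str, gold: str) -> bool:
    # drop the reasoning, keep only what follows the final-answer marker
    final = model_output.rsplit("ANSWER:", 1)[-1]
    norm = lambda s: s.strip().lower().replace(" ", "")
    return norm(final) == norm(gold)   # no credit for "almost right" reasoning
```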