just had one of those days, where every time I started to get into something I realized I was late to something else
26.02.2026 03:18
the question when planning an AI event is not whether to invite a Jason or even which Jason to invite, it's how many Jasons to invite
25.02.2026 17:32

For instance, try freezing everything except component X, finetune across different Xs (early MLP, late MLP, attention O matrix, etc.), then compare the residual stream, e.g. via PCA right before prediction. H/t to Zihao and Victor arxiv.org/html/2502.11...
24.02.2026 21:51

Turns out tiny, arbitrary-seeming parameter subsets can learn tasks. Has anyone compared what the same task looks like when learned through different components? Can we map how LMs encode information by seeing what stays the same in task representations over different components?
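a minimal numpy sketch of the freeze-one-component experiment above, with a toy two-layer MLP standing in for the transformer (the task, the sizes, and the PCA comparison are all made up for illustration, not from any real setup):

```python
import numpy as np

def finetune_component(tuned, steps=400, lr=0.05, seed=0):
    """Freeze everything except `tuned` ("W1" early, "W2" late) on a fixed toy task."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(256, 8))
    y = np.sin(X @ rng.normal(size=8))            # same task every run (fixed seed)
    W1 = 0.3 * rng.normal(size=(8, 16))           # "early MLP" stand-in
    W2 = 0.3 * rng.normal(size=(16, 1))           # "late MLP" stand-in

    def mse():
        return float(np.mean(((np.tanh(X @ W1) @ W2).ravel() - y) ** 2))

    before = mse()
    for _ in range(steps):
        H = np.tanh(X @ W1)                       # toy "residual stream"
        err = ((H @ W2).ravel() - y)[:, None]
        if tuned == "W2":
            W2 -= lr * H.T @ err / len(X)         # update only the late component
        else:
            W1 -= lr * X.T @ ((err @ W2.T) * (1 - H**2)) / len(X)  # only the early one
    H = np.tanh(X @ W1)
    Hc = H - H.mean(axis=0)
    top_pc = np.linalg.svd(Hc, full_matrices=False)[2][0]  # top PCA direction of hidden states
    return before, mse(), top_pc

b1, a1, pc1 = finetune_component("W1")
b2, a2, pc2 = finetune_component("W2")
print(a1 < b1, a2 < b2, abs(float(pc1 @ pc2)))    # does the task look similar either way?
```

with a real LM you'd freeze via requires_grad and read the residual stream at a fixed layer; `top_pc` here is just the stand-in for that comparison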
24.02.2026 21:51

idea: a tea bag that screams when it's done steeping
21.02.2026 22:50

If a company decides to build a better Siri-like app than Siri, I will donate footage of myself trying to use Siri for simple tasks and becoming increasingly annoyed, for use in your advertisements
21.02.2026 18:45

I would pay so much money for a non-chummy LLM
21.02.2026 18:41
free idea for @perplexity_ai, @Google, etc.:
1. ask an LLM to "describe this page" for each page in your index, teacher force "it's giving", and store whatever it generates next
2. embed the outputs
3. make an interface that lets me browse pages by 'close vibe but far by clicks'
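a toy sketch of steps 1-3 above; `vibes` stands in for the step-2 embeddings (hand-made vectors here, not real LLM outputs) and `links` is a made-up click graph:

```python
from collections import deque
import numpy as np

# Hypothetical data: per-page "vibe" vectors (step 2) and the click graph (step 3).
vibes = {
    "a": np.array([1.0, 0.0]),
    "b": np.array([0.9, 0.1]),
    "c": np.array([0.0, 1.0]),
    "d": np.array([0.95, 0.05]),
}
links = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

def click_distance(src):
    """BFS over the link graph: how many clicks away is every page?"""
    dist, q = {src: 0}, deque([src])
    while q:
        u = q.popleft()
        for v in links[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def close_vibe_far_clicks(page, min_clicks=2):
    """Among pages at least `min_clicks` away, return the one with the closest vibe."""
    d = click_distance(page)
    v = vibes[page] / np.linalg.norm(vibes[page])
    scores = {
        p: float(v @ (u / np.linalg.norm(u)))     # cosine similarity of vibes
        for p, u in vibes.items()
        if p != page and d.get(p, 10**9) >= min_clicks
    }
    return max(scores, key=scores.get)

print(close_vibe_far_clicks("a"))  # → "d": same vibe as "a" but 3 clicks away
```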
when I asked Claude for candidates, it suggested "vehement" and its suggested mispronunciation IS THE WAY I PRONOUNCE IT
13.02.2026 23:54
some words I was confused by as a kid because I never heard someone say them and only saw them in books:
- colonel
- awry
- buoy
- facade
- hors d'oeuvres
- indict
- genuine
- genre
- yacht
- plaid
- San Jose
🧪 The Story is Not the Science.
Code is submitted but rarely executed during peer review, an issue likely to worsen with research agents. 🧑‍🔬
We introduce ๐๐๐๐ก๐๐ฏ๐๐ฅ๐๐ ๐๐ง๐ญ, an execution-grounded evaluation of narrative + execution. Verify the science, not just the story.
1/n
Thinking about this more: I feel like 'the machine that infodumps' is now 'the machine that describes everything in terms of high-level, opaque tradeoffs'
08.02.2026 16:31

why is the 'aisles' feature on Amazon Fresh so bad? is this just a genuinely hard HCI problem or a lack of investment? (sure, it's both, but if you reply 'both' without adding anything you lose)
08.02.2026 02:38

Everyone's complaining about slop...and listen...I'm The Enemy: I want to make AI that will tell story after story that breaks your heart. But where is the celebration of the human mind as a wellspring of delights, given that no AI has produced good media on its own yet?
07.02.2026 22:04

I find it really sad that there was this golden window where talking to ChatGPT while I did the dishes was useful and I could mull over concepts and chew on ideas in dialogue, but the overwhelmingly annoying and hedging persona has made it too vapid for this to be useful
06.02.2026 23:43

New paper on Why Slop Matters w/ a great group of co-authors (@hoytlong.bsky.social @eduede.bsky.social @ari-holtzman.bsky.social + others not on Bluesky) from ACM AI Letters. We try to move the debate re: AI Slop past normative, neg claims & towards parsing its social uses. dl.acm.org/doi/10.1145/...
04.02.2026 18:03

open.substack.com/pub/theholtz...
31.01.2026 17:00

Yesterday, I started to read Factory Girls by Leslie T. Chang. I had to stop for the day after a couple pages.
31.01.2026 17:00

If you scramble token-surface mappings and feed the result to an LM, does it eventually crack the code? If so, does the residual stream look normal (mod the shuffle)? Or is it permanently broken because the model can never fully remap embeddings?
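the scramble itself is information-preserving, which a toy bigram model makes concrete (nothing here involves a real LM; it just shows the statistics survive the relabeling, so the open question is whether gradient descent can recover them):

```python
import math, random
from collections import Counter

def bigram_entropy(tokens):
    """Average next-token conditional entropy (bits) of an empirical bigram model."""
    pairs = Counter(zip(tokens, tokens[1:]))
    ctx = Counter(tokens[:-1])
    return -sum(n * math.log2(n / ctx[a]) for (a, b), n in pairs.items()) / (len(tokens) - 1)

random.seed(0)
vocab = list(range(50))
tokens = [random.choice(vocab) for _ in range(5000)]   # toy corpus

perm = vocab[:]
random.shuffle(perm)                                   # a fixed, secret relabeling
scrambled = [perm[t] for t in tokens]

# Identical entropy: the code is crackable in principle; only the model's
# frozen embedding priors stand in the way.
print(bigram_entropy(tokens), bigram_entropy(scrambled))
```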
30.01.2026 23:44
it's interesting that because of modern media consumption habits, I don't hear about movies that 'got good in the second half' (people stopped watching) but I hear about a lot of sports games that have this structure.
I wonder how much 'starts slow' media I'm missing out on
Confession: I've been in denial for years about how powerful a technique nostalgebraist's Logit Lens is. That said, it's clear there's information it misses. Can we 'delete' the information LL captures from a hidden state/the residual stream and see what's left?
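one cheap version of the 'delete' above, sketched with random matrices: project the hidden state off the unembedding directions of its top-k predicted tokens (W_U, the sizes, and k are all illustrative stand-ins, not from a real model):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab, k = 64, 1000, 5          # illustrative sizes
W_U = rng.normal(size=(vocab, d_model))  # stand-in unembedding matrix
h = rng.normal(size=d_model)             # a residual-stream vector

logits = W_U @ h
topk = np.argsort(logits)[-k:]           # what the Logit Lens "sees" here
B = W_U[topk]                            # unembedding directions of those tokens
P = np.linalg.pinv(B) @ B                # projector onto the span of those directions
h_clean = h - P @ h                      # delete the lens-visible component

print(np.abs(W_U[topk] @ h_clean).max())  # ~0: those logits are wiped out
```

then the question is what the rest of the network can still do with `h_clean`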
29.01.2026 19:48

open LLM releases give us a genuine generalization test, since checking whether models can predict post-release information accurately is more or less the only eval that's difficult to contaminate
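the eval itself is basically a date filter, something like this sketch (every item, date, and the cutoff are invented for illustration):

```python
from datetime import date

RELEASE = date(2026, 1, 1)   # hypothetical model release / training cutoff

# Hypothetical eval items: each question is about a fact with a known date,
# and `correct` records whether the model answered it right.
eval_items = [
    {"q": "Who won X?", "fact_date": date(2025, 6, 1), "correct": True},
    {"q": "Who won Y?", "fact_date": date(2026, 2, 1), "correct": True},
    {"q": "Who won Z?", "fact_date": date(2026, 3, 1), "correct": False},
]

# Only facts dated after release can't have leaked into training data.
post_release = [it for it in eval_items if it["fact_date"] > RELEASE]
accuracy = sum(it["correct"] for it in post_release) / len(post_release)
print(f"{len(post_release)} uncontaminated items, accuracy {accuracy:.2f}")
```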
28.01.2026 16:46

obsessed with this news site that claims to be up to date and cut through the noise but has no articles when you click and is completely empty in every way
28.01.2026 03:45

yeah, I thought this paper was cool! but I think it's not obvious how good the ability to predict the future here is. you'd need way more granular study to see
28.01.2026 03:17
How conditional is concept space in LLMs?
When an LLM wants to emit 'washing machine' with high probability, is there a direction/encoding for that in the residual stream, or is 'washing' promoted first and 'machine' then made likely by the conditional information?
LLMs are bad at 'automate X'. But if I had grown up with LLMs, I think my ability to navigate information and figure out where and how to learn what I wanted would have been easily an order of magnitude more expansive. That's something.
24.01.2026 01:15
we immediately got hatemail about how "we're trapped [sic] and turning to entertainment...makes things far worse."
I genuinely feel sad if you don't see how many folks are building challenging entertainment that expands the human spirit. We should use AI to make new kinds of 'thick entertainment'!
I love this new preprint from Cody Kommers + @ari-holtzman.bsky.social so much. arxiv.org/abs/2601.08768
Super contrarian & generative argument that we need to start better evaluating AI systems for their capacity to delight/entertain, not just perform intelligence/cognition - as cultural machines.
@richardjeanso.bsky.social fight me
16.01.2026 03:35

if you don't let yourself be a little cliché alone at home, just for the joy of it, you may have forgotten how to love life in an honest and unmediated way
16.01.2026 03:35