Stuart Gray @sgray - Bluesky Profile

Harness engineering: leveraging Codex in an agent-first world By Ryan Lopopolo, Member of the Technical Staff

openai.com/index/harnes...

i have been mildly obsessed with the ideas in here for the past month or so.

incredibly validating, total victory: my suspicions are correct, actually. "This functions like garbage collection" is a sentence I have said out loud. craziness

13.02.2026 16:17 — 👍 114 🔁 14 💬 12 📌 3

Sure, I didn’t mean on a technical level, more that none of the frontier model providers would go near it (except maybe grok), because it would kill off any alignment & safety features.

The models that have done this tend to be smaller, for local use.

13.02.2026 20:34 — 👍 1 🔁 0 💬 0 📌 0

Artificial Humanities is a bestseller on Amazon's kindle! Snatch it while it's free :)

11.02.2026 19:32 — 👍 5 🔁 2 💬 0 📌 0

LLMs can use similes and make allusions; they can be vivid and concrete, &c.

But they cannot spend 100 pages making you think Wickham is the charming love interest while inserting deniable clues that will—only in retrospect!—reveal you should have known he’s a cad.

They’re not trained to mislead.+

13.02.2026 12:02 — 👍 47 🔁 7 💬 6 📌 1

Not only that, it’s hard to see how you could train an LLM to embody these qualities *and* not fall foul of alignment issues.

I mean, if you encouraged all of these things, an LLM or user could use them to freely work around any restrictions or safety guards without any complex jailbreaks.

13.02.2026 14:47 — 👍 3 🔁 0 💬 2 📌 0

In some respects this should be entirely unsurprising.

All of the qualities that combine to make an interesting story like intrigue, mystery, deception, misdirection, etc… would be absolutely *terrible* qualities in a helpful personal assistant!

LLMs are actively trained against them.

13.02.2026 14:44 — 👍 5 🔁 1 💬 1 📌 0

Having seen what agentic frameworks can do for coding, i suspect that a specialised, fiction-focused writing equivalent, combined with the right set of agentic skill definitions, *might* just be able to pull it off.

13.02.2026 14:11 — 👍 2 🔁 0 💬 1 📌 0

that roleplay tools generally struggle with - developing, and advancing a plot, drawing a scene to a close, etc…

My biggest takeaway so far has been that you likely need a combination of modes: planning for plot & constraints, short roleplays for character work, but something’s still missing.

13.02.2026 14:09 — 👍 1 🔁 0 💬 1 📌 0

Character’s definition in the prompt - bad for token caching, but better for believability as they don’t get exposed to “private” character backstory.

Once you start to see things like this, you can see varying degrees of it everywhere, not just chars.

And that’s before you get to the main weaknesses

13.02.2026 14:09 — 👍 1 🔁 0 💬 1 📌 0
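
A rough sketch of the trade-off described above: including only the current speaker's definition keeps other characters' backstory out of the prompt, but the prompt prefix changes whenever the speaker changes, which is what hurts caching. Everything here (`Character`, `build_prompt`, the prompt layout) is hypothetical and made up for illustration; it is not SillyTavern's actual code or format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    name: str
    definition: str  # may contain "private" backstory


def build_prompt(speaker: str, cast: list[Character], history: list[str],
                 only_current: bool = True) -> str:
    """Assemble a prompt for the character who speaks next.

    only_current=True  -> only the speaker's definition is included.
    only_current=False -> every definition is included (stable prefix).
    """
    cards = [c for c in cast if c.name == speaker] if only_current else cast
    header = "\n".join(f"[{c.name}] {c.definition}" for c in cards)
    return f"{header}\n---\n" + "\n".join(history) + f"\n{speaker}:"


cast = [
    Character("Ann", "Secretly the thief."),
    Character("Bob", "A detective."),
]
history = ["Ann: Hello.", "Bob: Hi."]

# Bob's prompt never sees Ann's secret, but its prefix differs from Ann's,
# so a prefix cache built on Ann's turn cannot be reused on Bob's turn.
bob_prompt = build_prompt("Bob", cast, history)
ann_prompt = build_prompt("Ann", cast, history)
```

With `only_current=False` the header is identical for every speaker, which should be friendlier to prefix caching, at the cost of every character "knowing" everyone's backstory.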

If you haven’t already, you should probably try role playing a bit first.

It’ll give you a good insight into some of the problems LLMs have with story writing and ways to try and work around them, with varying degrees of success.

E.g. SillyTavern’s group chat defaults to only including the current

13.02.2026 14:09 — 👍 1 🔁 0 💬 1 📌 0

Meta Plans to Add Facial Recognition Technology to Its Smart Glasses

Facebook plans to put facial recognition in its glasses and they think we’re too stupid to fight back.

Their internal memo: “We will launch during a dynamic political environment where many civil society groups that we would expect to attack us would have their resources focused on other concerns.”

13.02.2026 12:16 — 👍 1613 🔁 921 💬 91 📌 294

AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise

AUTODISCOVERY leverages Bayesian surprise for hypothesis generation, achieving 5-29% more unexpected findings than previous methods, promising to accelerate scientific exploration and transform AI-driven research. https://arxiv.org/abs/2507.00310

13.02.2026 10:11 — 👍 1 🔁 1 💬 0 📌 1

i am linking this to the next person who tells me that llms do not make money remotely comparable to their investment. anthropic's annual revenue is 1/2 the size of their funding round

also lol a billion dollars of that is claude code subs they've sold in the last month

13.02.2026 06:28 — 👍 149 🔁 16 💬 13 📌 2

UK ban on Palestine Action unlawful, high court judges rule Protest group’s co-founder wins legal challenge against decision to proscribe it under anti-terrorism laws

Wow!

Palestine Action win judicial review - Guardian report.

This is a *big* legal win.

www.theguardian.com/uk-news/2026...

13.02.2026 10:11 — 👍 1777 🔁 754 💬 24 📌 0

Introducing FragCoord: My ultimate shader editing tool!

13.02.2026 02:20 — 👍 235 🔁 69 💬 10 📌 6

A new AI-powered weather model could be key to the future of your forecast – but there’s a catch | CNN Accurately predicting the weather is hard — really hard, but a new AI-powered forecast model just hit a milestone that has experts saying your forecast could soon get more accurate, and further out, t...

Significantly more accurate up to 15 days ahead, there’s plenty of coverage available including a paper in Nature:

edition.cnn.com/2024/12/13/w...

12.02.2026 23:21 — 👍 0 🔁 0 💬 1 📌 0

WeatherNext 2 WeatherNext 2 is our most accurate AI weather forecasting technology.

That’s not true in the slightest. There’s a wide range of AI systems, some of which are specifically designed for predicting things and can be quite good at it.

Take Google’s recent AI weather forecaster: 8x faster than conventional models, 5-10 times longer range

deepmind.google/science/weat...

12.02.2026 23:01 — 👍 1 🔁 0 💬 2 📌 0

The $285 Billion 'SaaSpocalypse' Is the Wrong Panic AI labs aren’t winning by intelligence alone, SaaS isn’t dying, and the real battle is over who becomes the system of action in the agentic enterprise. That battle is an asymmetric one.

tl;dr - be a system of record

www.decodingdiscontinuity.com/p/285-billio...

12.02.2026 16:15 — 👍 5 🔁 1 💬 0 📌 0

There isn’t one and they know it, which is why they’re trying to move up the stack and integrate with apps, services, and enterprise suites before they run out of runway.

12.02.2026 20:10 — 👍 0 🔁 0 💬 0 📌 0

Why the fuck do companies pull this shit?

It’s already hard enough to pick a frontier AI provider based on their principles & values, and Anthropic was about the only reasonable contender for those who cared.

Time to re-evaluate Mistral & the Chinese I guess 🤬

12.02.2026 16:03 — 👍 5 🔁 2 💬 0 📌 0

I love this.

But also, the crazy small size makes it seem like one of those old school “code golf” challenges to see how small it can be made & still function 😂

12.02.2026 14:10 — 👍 0 🔁 0 💬 0 📌 0

microgpt microgpt. GitHub Gist: instantly share code, notes, and snippets.

@karpathy.bsky.social 's microgpt.py

Train and inference GPT in 243 lines of pure, dependency-free Python.

gist.github.com/karpathy/862...

11.02.2026 23:56 — 👍 83 🔁 14 💬 1 📌 3

Rates dogs guy is all out of fucks

12.02.2026 12:28 — 👍 7 🔁 3 💬 0 📌 0

How I used Claude Code in a real data journalism project This morning three colleagues and I published a story outlining how the federal government is using AI. Here’s how I used Claude Code to help. Agencies are required to publish a spreadsheet of AI use ...

New blog post: A case study on using Claude Code in data journalism ->
kschaul.com/post/2026/02...

09.02.2026 21:36 — 👍 17 🔁 5 💬 2 📌 0

You can turn on “require alt text” for images in the Bluesky accessibility settings to prevent that

12.02.2026 09:52 — 👍 2 🔁 0 💬 1 📌 0

It’s also very weirdly blinkered in ignoring Anthropic’s own well regarded research into how its own models work?

It’s a bizarre level of corporate self-ignorance.

I get, and applaud to a degree, Anthropic’s free rein, but surely she should be a *bit* curious about what her colleagues *do* know?

11.02.2026 16:52 — 👍 2 🔁 0 💬 0 📌 0

Embedding Links in Bluesky Posts Learn how to embed website links in your Bluesky posts with rich previews while saving character space.

E.g. www.bskyinfo.com/guides/getti...

11.02.2026 16:46 — 👍 1 🔁 0 💬 0 📌 0

Actually, you might be able to squeeze a little more data out by using a custom link embed?

Link to a service that generates an embed with the extra data, before deleting the URL from the post.

I’m not sure if/how Bluesky persists embed info though? I presume it does because the url gets deleted?

11.02.2026 16:45 — 👍 2 🔁 0 💬 1 📌 0

Ahh, for a minute I thought they were getting all fancy and suggesting that maybe a data URL would take advantage of some hidden Bluesky rules to save precious chars.

But yeah, offsite is kinda cheating. You could abuse image alt text too, but that’s another bad smell :/

11.02.2026 16:35 — 👍 2 🔁 0 💬 1 📌 0

I know you’re joking, but now you’ve got me wondering what kind of post size a custom client with compression support could reach & still stick within the underlying ATProto constraints to remain compatible with Bluesky 🤔

11.02.2026 14:17 — 👍 2 🔁 1 💬 2 📌 0
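
One way to poke at that question is a small experiment like the one below. It is only a sketch under assumptions: the 300-grapheme and 3000-byte limits are the post text limits as I understand the `app.bsky.feed.post` lexicon, and `pack`, `unpack` and `fits` are hypothetical helpers, not part of any Bluesky or ATProto client. It compresses with zlib and encodes as Base85 so the result is plain ASCII, where one character is one grapheme and one byte.

```python
import base64
import zlib

MAX_GRAPHEMES = 300  # assumed post text limit, in graphemes
MAX_BYTES = 3000     # assumed post text limit, in UTF-8 bytes


def pack(text: str) -> str:
    """Compress text with zlib, then encode it as ASCII Base85."""
    raw = zlib.compress(text.encode("utf-8"), 9)
    return base64.b85encode(raw).decode("ascii")


def unpack(packed: str) -> str:
    """Reverse pack(): Base85-decode, then decompress."""
    return zlib.decompress(base64.b85decode(packed)).decode("utf-8")


def fits(packed: str) -> bool:
    """Check the packed string against both assumed limits.

    The Base85 alphabet is printable ASCII only, so len() counts
    graphemes and bytes alike.
    """
    return (len(packed) <= MAX_GRAPHEMES
            and len(packed.encode("utf-8")) <= MAX_BYTES)


message = "the quick brown fox " * 40  # 800 characters, highly repetitive
packed = pack(message)
```

Base85 spends five characters on every four bytes, and short, ordinary prose compresses poorly, so the gain here comes almost entirely from how repetitive the input is. Because the grapheme limit is much tighter than the byte limit, an encoding that packs more bits into each grapheme might do better; that is not tested here.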
