Harness engineering: leveraging Codex in an agent-first world
By Ryan Lopopolo, Member of the Technical Staff
openai.com/index/harnes...
i have been mildly obsessed with the ideas in here for the past month or so.
incredibly validating, total victory: my suspicions are correct, actually. "This functions like garbage collection" is a sentence I have said out loud. craziness
13.02.2026 16:17
Sure, I didn't mean on a technical level, more that none of the frontier model providers would go near it (except maybe Grok), because it would kill off any alignment & safety features.
The models that have done this tend to be smaller, for local use.
13.02.2026 20:34
Artificial Humanities is a bestseller on Amazon's kindle! Snatch it while it's free :)
11.02.2026 19:32
LLMs can use similes and make allusions; they can be vivid and concrete, &c.
But they cannot spend 100 pages making you think Wickham is the charming love interest while inserting deniable clues that will (only in retrospect!) reveal you should have known he's a cad.
They're not trained to mislead.+
13.02.2026 12:02
Not only that, it's hard to see how you could train an LLM to embody these qualities *and* not fall foul of alignment issues.
I mean, if you encouraged all of these things, an LLM or user could use them to freely work around any restrictions or safety guards without any complex jailbreaks.
13.02.2026 14:47
In some respects this should be entirely unsurprising.
All of the qualities that combine to make an interesting story, like intrigue, mystery, deception, misdirection, etc… would be absolutely *terrible* qualities in a helpful personal assistant!
LLMs are actively trained against them.
13.02.2026 14:44
Having seen what agentic frameworks can do for coding, i suspect that a specialised, fiction-focused writing equivalent, combined with the right set of agentic skill definitions, *might* just be able to pull it off.
13.02.2026 14:11
that roleplay tools generally struggle with - developing and advancing a plot, drawing a scene to a close, etc…
My biggest takeaway so far has been that you likely need a combination of modes: planning for plot & constraints, short roleplays for character work, but something's still missing.
13.02.2026 14:09
Character's definition in the prompt - bad for token caching, but better for believability as they don't get exposed to "private" character backstory.
Once you start to see things like this, you can see varying degrees of it everywhere, not just chars.
And that's before you get to the main weaknesses
13.02.2026 14:09
If you haven't already, you should probably try role playing a bit first.
It'll give you a good insight into some of the problems LLMs have with story writing and ways to try and work around them, with varying degrees of success.
E.g. SillyTavern's group chat defaults to only including the current
13.02.2026 14:09
Meta Plans to Add Facial Recognition Technology to Its Smart Glasses
Facebook plans to put facial recognition in its glasses and they think we're too stupid to fight back.
Their internal memo: "We will launch during a dynamic political environment where many civil society groups that we would expect to attack us would have their resources focused on other concerns."
13.02.2026 12:16
AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise
ArXiv link for AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise
AUTODISCOVERY leverages Bayesian surprise for hypothesis generation, achieving 5-29% more unexpected findings than previous methods, promising to accelerate scientific exploration and transform AI-driven research. https://arxiv.org/abs/2507.00310
13.02.2026 10:11
i am linking this to the next person who tells me that llms do not make money remotely comparable to their investment. anthropic's annual revenue is 1/2 the size of their funding round
also lol a billion dollars of that is claude code subs they've sold in the last month
13.02.2026 06:28
Introducing FragCoord: My ultimate shader editing tool!
13.02.2026 02:20
WeatherNext 2
WeatherNext 2 is our most accurate AI weather forecasting technology.
That's not true in the slightest. There's a wide range of AI systems, some of which are specifically designed for predicting things and can be quite good at it.
Take Google's recent AI weather forecaster: 8x faster than conventional models, 5-10 times longer range
deepmind.google/science/weat...
12.02.2026 23:01
There isn't one and they know it, which is why they're trying to move up the stack and integrate with apps, services, and enterprise suites before they run out of runway.
12.02.2026 20:10
Why the fuck do companies pull this shit?
It's already hard enough to pick a frontier AI provider based on their principles & values, and Anthropic was about the only reasonable contender for those who cared.
Time to re-evaluate Mistral & the Chinese I guess 🤬
12.02.2026 16:03
I love this.
But also, the crazy small size makes it seem like one of those old school "code golf" challenges to see how small it can be made & still function
12.02.2026 14:10
microgpt
microgpt. GitHub Gist: instantly share code, notes, and snippets.
@karpathy.bsky.social 's microgpt.py
Train and inference GPT in 243 lines of pure, dependency-free Python.
gist.github.com/karpathy/862...
11.02.2026 23:56
Rates dogs guy is all out of fucks
12.02.2026 12:28
You can turn on "require alt text" for images in the Bluesky accessibility settings to prevent that
12.02.2026 09:52
It's also very weirdly blinkered in ignoring Anthropic's own well regarded research into how its own models work?
It's a bizarre level of corporate self-ignorance.
I get, and applaud to a degree, Anthropic's free rein, but surely she should be a *bit* curious about what her colleagues *do* know?
11.02.2026 16:52
Actually, you might be able to squeeze a little more data out by using a custom link embed?
Link to a service that generates an embed with the extra data, before deleting the URL from the post.
I'm not sure if/how Bluesky persists embed info though? I presume it does because the URL gets deleted?
11.02.2026 16:45
Ahh, for a minute I thought they were getting all fancy and suggesting that maybe a data URL would take advantage of some hidden Bluesky rules to save precious chars.
But yeah, offsite is kinda cheating. You could abuse image alt text too, but that's another bad smell :/
11.02.2026 16:35
I know you're joking, but now you've got me wondering what kind of post size a custom client with compression support could reach & still stick within the underlying ATProto constraints to remain compatible with Bluesky 🤔
11.02.2026 14:17
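A rough way to poke at that question is sketched below in Python. It is only an illustration under stated assumptions, not how any real client works: the limits (300 graphemes and 3000 UTF-8 bytes for post text) are assumed from the `app.bsky.feed.post` lexicon, code points stand in for graphemes, and `pack`, `unpack` and `fits_in_post` are hypothetical helper names. The payload is zlib-compressed and Base85-encoded so that it stays plain ASCII text.

```python
import base64
import zlib

# Assumed limits for the text field of a Bluesky post (app.bsky.feed.post).
MAX_GRAPHEMES = 300
MAX_BYTES = 3000


def fits_in_post(text: str) -> bool:
    """Check the assumed limits, using code points as a stand-in for graphemes."""
    return len(text) <= MAX_GRAPHEMES and len(text.encode("utf-8")) <= MAX_BYTES


def pack(message: str) -> str:
    """Compress a message with zlib and encode it as ASCII Base85 text."""
    compressed = zlib.compress(message.encode("utf-8"), level=9)
    return base64.b85encode(compressed).decode("ascii")


def unpack(payload: str) -> str:
    """Reverse pack(): decode Base85, then decompress back to the message."""
    return zlib.decompress(base64.b85decode(payload)).decode("utf-8")


if __name__ == "__main__":
    # Highly repetitive text compresses well; ordinary prose gains far less.
    message = "the quick brown fox jumps over the lazy dog. " * 40
    payload = pack(message)
    print("original chars:", len(message))
    print("packed chars:  ", len(payload))
    print("fits in a post:", fits_in_post(payload))
```

Base85 spends five characters on every four compressed bytes, so short or low-redundancy text can come out longer than it went in. A denser scheme would pack more bits into each grapheme, at which point the byte limit rather than the grapheme limit becomes the ceiling.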
AI Agents, Social Computing, Agent Swarm, GraphRAG, Data Fabric
Reviewing the UK's Sticky Toffee Puddings
Lawyer, author, EFF's first hire, Godwin's Law creator (he/him). Retweeting!=endorsing. I tell jokes here, mostly. My opinions here don't necessarily represent any employer or any client. You may have known me as @sfmnemonic on Twitter.
Official Firefox account for people who build on the web. Learn about the things we're working on to grow and improve the web platform.
Privacy first, open-source browser extensions from @seth.bsky.social.
Learn more: https://tinyextensions.com
Cloudflare is the world's leading connectivity cloud, and we have our eyes set on an ambitious goal: to help build a better Internet.
A weekly, slop-free, human-written digest about the ATProto ecosystem.
Subscribe here: https://atprotodigest.substack.com
computer toucher. here for AI mostly.
weibac.github.io | 🏳️‍🌈
Data enthusiast, Father, consultant
Director of Technology @ Solita.fi | Tech management & leadership, Dev, AI, agents, SWE, low-code, innovation and emerging tech.
Offline: Runner, angler, RES/SRA, Tibetan Terrier owner.
this endless blue sky is driving me insane
Insipid but well-meaning software developing art gremlin.
"Incoherent; mayhem." - 4/5 stars.
"Couldn't make any sense of any of it, it's like he's talking to himself." - 2/5 stars.
https://mastodon.esmevane.com/ironchamber
https://esmevane.com
Cleverly push monsters around a grid and blow them up.
Demo: https://store.steampowered.com/app/2994110/
haskell/rust/art/shitposting/assyrian
she/her (like a ship, not like a person)
The lifting leftist. I love my wife. Filthy Java / Kotlin casual. Class unites almost all of us. Ethnic & Racial pluralism is the only way. Trans & Queer ally, or the closest thing. Please, don't ever feel obligated to follow back.
ML Professor at École Polytechnique. Python open source developer. Co-creator/maintainer of POT, SKADA. https://remi.flamary.com/
AI for Science, deep generative models, inverse problems. Professor of AI and deep learning @universitedeliege.bsky.social. Previously @CERN, @nyuniversity. https://glouppe.github.io
Senior Data Scientist at BuzzFeed in San Francisco // AI content generation ethics and R&D // plotter of pretty charts
https://minimaxir.com
Game Dev • Generative Artist • Programmer • Size Coder • Musician • Zen Buddhist • Wizard • Friend to Cats linktr.ee/frankforce
An experimental AI videogame studio.