If you ever want to see a really interesting AI thinking trace, push it really hard on literature or poetry suggestions.
Here is Claude 4.6 Opus working through poetry in its reasoning when I asked it to find something that captures the feeling of AI while avoiding its usual favorites (e.g., Rilke)
Don't tell anyone, but such courses are the one place where I find AI browsers like Atlas from OpenAI very useful; it may take them for you ;)
Benchmarks from historians show that AI transcription from handwriting is now better than human transcription, and even a very cheap model matches human performance.
There are now massive troves of documents that could be made available for research that would have been impossible or prohibitive to transcribe before.
I complain a lot about RL lately, and here we go again.
The CS view of RL is wrong in how it thinks about rewards, already at the setup level. Briefly, the reward computation should be part of the agent, not part of the environment.
More at length here:
gist.github.com/yoavg/3eb3e7...
How can we use neural networks to bolster mathematical discovery? Geordie Williamson's @simonsfoundation.org Presidential Lecture is online, catch up now:
www.youtube.com/watch?v=Uxr_...
Fresh on the arXiv: @booleananalysis.bsky.social, Kewen Wu, and I present new classical algorithms for the Short Integer Solution problem (under the infinity norm) that outperform the elegant Chen-Liu-Zhandry quantum algorithm, showing that there is no longer an exponential quantum speed-up.
We are starting to see some nuanced discussions of what it means to work with advanced AI in its current state
In this case, GPT-5 Pro was able to do novel math, but only when guided by a math professor (though the paper also noted the speed of advance since GPT-4)
The reflection is worth reading.
A fully autonomous robot which, every morning, sets plates on the table, fetches ingredients in the kitchen, and prepares avocado toast.
"Move things and breakfast."
(In case you hadn't been following, the environmental impact of current AI models is now much lower: generating 100,000 words with AI uses less power than watching Netflix for 45 minutes on your TV)
If you haven't done this with o3, you haven't really seen what these models can do.
This is one of the most-shared posts on Bluesky in the past day and it's just completely false. You might think ChatGPT is a *bad* search engine, or prefer another search engine. But it has had integrated web search since last year.
"o3, you are a consultant hired by the Dark Lord. Analyze the org chart of Mordor. How would you improve it for today's changing Middle Earth?"
o3 does some actual satire, ending with: “One Org to rule them all, One Org to find them, One Org to bring them all, And in the darkness, align them.”
For years I've been throwing the same puzzle challenge at new GPT models. Every one has failed, until now.
matthodges.com/posts/2025-0...
Oh, I see!! Yes, the id is totally unnecessary in this place. Probably a leftover or compatibility for the other API where you don't repeat everything on each call. Sorry for the confusion!!
Which API exactly is this? Is it function calling in OpenAI Responses API? Do you really need to send the whole history? The weather example doesn't seem to do it? platform.openai.com/docs/guides/function-calling?api-mode=responses
Now that's a good reason to ask why...
Isn't it because it's happening asynchronously over the network on different machines, possibly for many tool invocations and chats in parallel, and the id makes sure you find the path to the right place?
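To make the point in the reply above concrete, here is a minimal, hypothetical sketch in plain Python (not the actual OpenAI Responses API; the `ToolDispatcher` class and its methods are invented for illustration). It shows why a call id matters: when many tool invocations run in parallel, their results can arrive in any order, and the id is the only reliable way to match each result back to the call that requested it.

```python
import uuid

class ToolDispatcher:
    """Illustrative only: tracks in-flight tool calls by id."""

    def __init__(self):
        self.pending = {}  # call_id -> description of the requested call

    def request(self, tool_name, arguments):
        """Register a tool call and return its id (as the model would)."""
        call_id = str(uuid.uuid4())
        self.pending[call_id] = {"tool": tool_name, "args": arguments}
        return call_id

    def resolve(self, call_id, output):
        """Match a result that arrived (possibly out of order) to its call."""
        call = self.pending.pop(call_id)
        return {"call": call, "output": output}

dispatcher = ToolDispatcher()
a = dispatcher.request("get_weather", {"city": "Paris"})
b = dispatcher.request("get_weather", {"city": "Tokyo"})

# Results arrive in the opposite order; the ids still route them correctly.
r_b = dispatcher.resolve(b, "22C")
r_a = dispatcher.resolve(a, "15C")
```

Without the id, two concurrent `get_weather` calls would be indistinguishable once their outputs came back over the network.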
Exciting news: @waymo.bsky.social is beginning public service on the Peninsula, starting with Palo Alto, Mountain View, and Los Altos! Initial service area below.
This was fun: "o1, build a simulator of a D&D guild hall. Persistent characters come in, get quests, interact with each other, leave & return, make it procedurally generated"
I kept asking it to add other ideas (relationships, etc.), eight times in a row; got no errors, it just worked each time. Desire-based coding!
The NIH overhead cut doesn't just hurt universities.
It's deadly to the US economy.
The US is a world leader in tech due to the ecosystem that NIH and NSF propel. It drives innovation for tech transfer, creates a highly skilled sci/tech workforce, and fosters academic/industry cross-fertilization.
Saturday ice update - #Arctic sea ice extent is currently the *lowest* on record (JAXA data)
• about 790,000 km² below the 2010s mean
• about 1,450,000 km² below the 2000s mean
• about 2,040,000 km² below the 1990s mean
• about 2,430,000 km² below the 1980s mean
Plots: zacklabe.com/arctic-sea-i...
Neither read nor wrote, no illegal access at all!!
OpenAI’s deep research is very good. Unlike Google’s version, which is mostly a good summarizer of many sources, OpenAI’s is more like engaging an opinionated (often almost PhD-level!) researcher who follows leads.
Look at how it hunts down a concept in the literature (& works around problems)
In all seriousness how batshit is it that a Chinese AI bot is censoring a book THAT HASN'T EVEN BEEN PUBLISHED YET. What dystopia are we all living in.
this post is trending in my feed but it does not make sense. i don't see any reasonable interpretation by which DeepSeek demonstrates that model scaling is not the best way to develop AI. their model is very large, and their training corpus is very large. they were just scaling more efficiently.
This post mostly argues about variants of training on the test set: maybe only a verifier, maybe only validation on test. None of that happened. The other point is more general, that hiding funding is a bad idea, and I personally agree very much; I'm unsure why it happened, as it's an especially bad idea here.
It's certainly a weird one - but I only learned about it from the press, as I did about that dataset, I didn't realize OpenAI was involved until after they published their first paper. As I said - researchers may not agree (or even know) about many things, but that doesn't mean we train on test.
Also, as far as I can tell (I'm not a lawyer), there's nothing very non-standard in OpenAI work contracts. I have one and certainly have never agreed to lie or deceive. Not only that, but I actually find the culture internally very open to debate and criticism, and very opposed to cheating of any kind.