I didn't know where this post was going when I started and I'm not sure where it went now that it ended, but that felt correct in some way.
www.alexirpan.com/2025/11/16/a...
@alexirpan.bsky.social
Research Scientist @ Google DeepMind. Formerly Robotics, now AI Safety. Has a blog. Views are my own.
First paper since switching into AI safety team
We looked at problems that could be solved if the model behaved consistently over a set of prompts, and tried training for that consistency in both output space and internal activations. Both were effective. See thread or paper for details.
Today is my 10 year blogging anniversary.
www.alexirpan.com/2025/08/18/t...
For the past month I have been working on a blog post about niche MLP fandom drama. Well here it is.
www.alexirpan.com/2025/07/21/b...
"I don't play gacha games because they're a scam"
vs
"Let me do one more hyperparam sweep before giving up. One more prompt tuning run. I swear we'll beat baseline. I know it's gonna beat the baseline this time. It's gonna win. This time for sure."
My MIT Mystery Hunt post for the year
www.alexirpan.com/2025/01/28/m...
I am now back from #MITMysteryHunt with no memory of anything besides Hunt from MLK weekend. Really this is probably for the best.
21.01.2025 16:38
It is time for more posts about Neopets
www.alexirpan.com/2025/01/09/d...
The ship has sailed, but I wish the ML reporting default was % incorrect rather than % correct. It better matches loss curves and magnifies the capture of edge cases.
95% accuracy -> 97.5% accuracy = meh
5% error -> 2.5% error = omg we've halved the error rate
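The arithmetic behind that comparison, as a rough sketch. The helper names here are made up for illustration, and the numbers are just the ones from the post above:

```python
def error_rate(accuracy: float) -> float:
    """Convert an accuracy in [0, 1] to an error rate in [0, 1]."""
    return 1.0 - accuracy


def relative_change(before: float, after: float) -> float:
    """Signed change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Same model improvement, reported two ways.
acc_before, acc_after = 0.95, 0.975

# Framed as accuracy: a small relative gain.
acc_gain = relative_change(acc_before, acc_after)

# Framed as error: the error rate drops by about half.
err_drop = -relative_change(error_rate(acc_before), error_rate(acc_after))

print(f"accuracy {acc_before:.1%} -> {acc_after:.1%}: relative gain {acc_gain:.1%}")
print(f"error {error_rate(acc_before):.1%} -> {error_rate(acc_after):.1%}: relative drop {err_drop:.1%}")
```

Same underlying result either way; the error framing just puts the remaining mistakes in the denominator, so a change near the top of the accuracy scale shows up as a large relative change.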
The question of "how's o1 using its test compute" is better asked to someone who worked on it, since AFAIK that hasn't been disclosed. But yes, language models having really dynamic / freeform actions makes them hard to think about.
05.12.2024 04:56
I wrote some stuff on OpenAI o1
www.alexirpan.com/2024/12/04/l...