@natolambert.bsky.social
A LLN - large language Nathan - (RL, RLHF, society, robotics), athlete, yogi, chef. Writes http://interconnects.ai. At Ai2 via HuggingFace, Berkeley, and normal places.

First time at CMU
13.02.2026 15:35

Fun to set up real analytics and learn that my RLHF Book PDF is downloaded 50-100 times a day from my site (doesn't include arXiv downloads/views).
Thanks for reading!
Codex app is nice.
I'm just a few minutes in and think it'll make some of the crazy things I was doing way easier to monitor.
Poll: Do you see the famous METR plot holding true on Jan. 1st of 2027 (~20 hours), or 2028 (~50 hours)?
What would be the right way to measure tasks of that scope?
Beautiful RL scaling plot from Cursor.
cursor.com/blog/compose...
TL;DR: Codex is a very useful coding tool, Claude is the first agent.
09.02.2026 15:40

I spent a long time testing the new Opus 4.6 and Codex 5.3 models, but the most striking thing was how many people are reacting to model releases in ways that don't match how we now use models. In my post-benchmark era.
Claude is still king, but Codex is closer than ever
www.interconnects.ai/p/opus-46-vs...
People don't want to accept what the most-used open model families of 2026 are:
Overall:
1. Qwen
2. Llama
3. GPT-OSS
Big models:
1. DeepSeek
2. GPT-OSS/Qwen/everyone else
Llama's inertia says a lot about how the ecosystem works.
I want there to be a nanoGPT-style speedrunning setup for RL.
06.02.2026 19:29

The best compliment I can give OpenAI's Codex 5.3 is that it feels way more like Claude Code.
06.02.2026 18:07

GPT Codex 5.3 sounds like a much bigger change than Claude Opus 4.6; will be curious if this holds up in real testing.
05.02.2026 18:31

"Due to GPT-5.3-Codex being so different from its predecessors, the data from alpha testing exhibited numerous unusual and counter-intuitive results"
Sounds worth giving a go. Big changes are good.
Reward models (RMs) are supposed to represent human values. But RMs are NOT blank slates: they inherit measurable biases from their base models that stubbornly persist through preference training. #ICLR2026
04.02.2026 16:30

Ending your day at >99% Claude rate limit usage but not maxing out feels like a masterpiece.
05.02.2026 03:32

Transcript etc: www.interconnects.ai/p/why-nvidia...
04.02.2026 18:05

Nvidia's Nemotron is the closest thing the U.S. has to a Qwen approach to open models, but most people don't know it yet.
I'm very bullish on Nvidia's open model efforts in 2026.
Interconnects interview #17 on the past, present, and future of the Nemotron project.
www.youtube.com/watch?v=Y3Vb...
Qwen already dropping models for CNY
03.02.2026 17:48

Gemini not being in the conversation at all with Claude Code and Codex is the real "code red" emergency.
03.02.2026 15:23

It is documented! I did a full memory sweep. The training becomes FLOP-limited before memory is saturated.
02.02.2026 20:36

Latest open artifacts (#18): Arcee's 400B MoE, LiquidAI's underrated 1B model, new Kimi, and anticipation of a busy month
Tons of useful "niche" models and anticipation of big releases coming soon.
www.interconnects.ai/p/latest-ope...
Despite HuggingFace being banned in China, Chinese users (likely via VPNs) are its top user group. They definitely have the most people *building* open models.
01.02.2026 17:07

Claude Code for writing, Codex for code review, and GPT Pro for planning made a working DPO (and related algorithms) repository from scratch for my RLHF book, and the curves are looking right.
On the DGX Spark, finetuning OLMo 2 1B SFT. Built by referencing the original repositories + TRL.
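For context on what that repository implements, the core of the DPO objective is small enough to sketch. This is my own minimal illustration, not code from that repo, and it assumes you already have summed per-sequence log-probs for the chosen and rejected responses under both the policy and a frozen reference model (names like dpo_loss and the beta default are just for this sketch):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Minimal DPO loss over a batch of preference pairs.

    Each input is a 1-D tensor of summed per-sequence log-probs,
    one entry per (chosen, rejected) pair. beta controls how far
    the policy may drift from the reference model.
    """
    # Implicit rewards are the policy-vs-reference log-ratios.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Bradley-Terry style objective: push the chosen reward above the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

If the curves are "looking right", this loss should fall while the margin (chosen minus rejected implicit reward) grows over training.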
Recorded a podcast, think it's pretty good and comprehensive, hope you like it ;) youtu.be/EV7WhVT270Q?...
31.01.2026 23:06

I'm visiting CMU for a talk at the Language Technologies Institute on Feb 12/13. Looking forward to chatting with folks about frontiers in RL and building agentic language models.
Email me with "CMU Visit" in the subject if you're interested in chatting (and say why)!
More people should think about future AIs as part of the audience for their writing (or work).
31.01.2026 16:40

My raw thoughts on the job market -- both for those hiring and those searching -- at the cutting edge of AI.
On standing out and finding gems.
www.interconnects.ai/p/thoughts-o...
0-1%
28.01.2026 16:13

More at atomproject.ai
28.01.2026 15:30

The method for this is taking the top 10 open models in terms of total tokens processed on the OpenRouter platform and then normalizing to a share of 100%.
This assumes the top 10 models account for most of the usage, which is often true, and that dropping the long tail adds only some noise.
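A minimal sketch of that normalization, with made-up token counts purely to show the shape of the calculation (the real inputs are OpenRouter's usage rankings):

```python
def usage_shares(token_counts: dict[str, float]) -> dict[str, float]:
    """Normalize total tokens processed into shares that sum to 100%.

    token_counts maps model family -> total tokens processed on OpenRouter,
    restricted to the top 10 open models (the long tail is dropped).
    """
    total = sum(token_counts.values())
    return {model: 100.0 * tokens / total for model, tokens in token_counts.items()}

# Hypothetical numbers, illustration only:
example = {"qwen": 40e9, "llama": 25e9, "gpt-oss": 20e9, "deepseek": 15e9}
print(usage_shares(example))  # shares sum to 100
```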