i still find it hilarious that there's no number after it. there's not going to be a number 2
06.08.2025 23:32
@timkellogg.me.bsky.social
AI Architect | North Carolina | AI/ML, IoT, science WARNING: I talk about kids sometimes
one way it might help: being ClosedAI might be hurting the research. By opening up a model that shares common critical aspects, researchers can collaborate outside of OpenAI on ideas
06.08.2025 23:32
what benefits are there to open source AI though? it costs a ton, if you get roasted your entire lab loses credibility, and nobody can really give back without a crap ton of capital
06.08.2025 23:28
grand ~= sounding somewhat sarcastic
he's a SV CEO, he doesn't do things out of the goodness of his heart. he makes product. what product play was this?
flowchart TD
    subgraph Home
        user
        gpt-oss
    end
    user --> gpt5 --> web
    gpt5 --> gpt-oss --> MCPtools
one criticism i haven't heard: gpt-oss is a bad agent
they say lots of bad stuff about it, but its tool calling and long-horizon reasoning seems on point
what if this is the plan: gpt-oss on the edge, only to agentify local resources
yes i don't believe for a second that sama didn't have some grand strategy with the open weights
i'm still thinking it's going to be an edge piece somehow. maybe on a phone/laptop or his new affair with Jony Ive
lol, they're super smart guys though
06.08.2025 23:05
yeah, simply basing on GPT-4.1 is going to be a big deal. i'm curious to what extent it'll actually route
06.08.2025 23:04
that's my boi
06.08.2025 18:04
OpenAI @OpenAI: LIVETREAM THURSDAY 10AM PT
GPT-5
06.08.2025 17:33
Bar chart comparing performance of four Qwen3-4B model variants, Thinking-2507 (red), Thinking (gray), Instruct-2507 (blue), and Non-Thinking (beige), across five benchmarks:
1. GPQA • Thinking-2507: 65.8 • Thinking: 62.0 • Instruct-2507: 55.9 • Non-Thinking: 41.7
2. AIME25 • Thinking-2507: 81.3 • Thinking: 65.6 • Instruct-2507: 47.4 • Non-Thinking: 19.1
3. LiveCodeBench v6 (25.02-25.05) • Thinking-2507: 55.2 • Thinking: 48.4 • Instruct-2507: 35.1 • Non-Thinking: 26.4
4. Arena-Hard v2 • Thinking-2507: 34.9 • Instruct-2507: 43.4 • Thinking: 13.7 • Non-Thinking: 9.5
5. BFCL-v3 • Thinking-2507: 71.2 • Thinking: 65.9 • Instruct-2507: 61.9 • Non-Thinking: 57.6
Across all benchmarks, the Thinking-2507 variant (red bars) consistently leads in performance except for Arena-Hard v2, where Instruct-2507 scores highest. A watermark-style logo is faintly visible in the background behind the bars.
Qwen3-4B Instruct & Thinking
uuuh, guys this isn't a boring model
This crushes all the agentic benchmarks, even beating out the already-impressive qwen3-30b-a4b
It's hanging with some already impressive mid-sized models at only 4B
Trump says it'll be 3000% cheaper
06.08.2025 16:31
imo that's a failure mode: i've been fairly successful at staying high level, checking in every 20 min, and using the main time to do meetings, docs, etc.
if i have to step in and actually write code it jams up my rhythm, and my day basically halts. so i end up doing a lot of up-front planning
yes!
06.08.2025 16:27
word is GPT-5 will have a 13.5% increase in version number over GPT-4.5
06.08.2025 16:18
Junyang Lin @JustinLin610: drinking still. will announce later. 4b is fabulous. i do love it. 11:53 AM • 8/6/25 • 4.2K Views
qwen coming in hot in their usual style
06.08.2025 16:16
just sent a meeting request with "..unless GPT-5 comes out" in the title
06.08.2025 15:10
i'm not sure you're right about "that ship has sailed"
it's just that there's tiers of openness, more is better than less, but the tiers aren't equidistant
yes, that's what i'm talking about!
06.08.2025 14:22
Okay, I'll chime in, as a person with less mental health than average.
Working with agents can be really easy or really trying. It's really easy when you remember that the agent won't get mad at you or walk out of the meeting. It's easy when you remember that agents aren't scary like people.
you should
06.08.2025 14:21
interestingly, i got so much flak that i wrote a retraction, and the retraction was far more popular than the actual piece. idk...
06.08.2025 13:53
signal 1: i'm working with a marketing person at work who's fucking killing it with claude code
signal 2: some people are discovering that frameworks are often not worth the time anymore
i think both are still panning out
i'm getting some early signals that i may have been right about this, which i wrote back in January
tl;dr is software won't be boxed & shipped unless it solves a truly hard problem
timkellogg.me/blog/2025/01...
honestly i just need this but for myself
bsky.app/profile/timk...
they overdial on speed. @cameron.pfiffer.org noticed that they can't tool call for shit
the crazy part is they use their own special quantization. while it's also 4 bits, it's a different scheme than what openai trained with, so it's braindead in subtle ways and they don't care bc it's fast
alright, let's see if my magic dice work today
06.08.2025 13:31
there will be releases today but not the cool stuff
06.08.2025 13:31