Tim Kellogg @timkellogg.me - Bluesky Profile

i still find it hilarious that there’s no number after it. there’s not going to be a number 2

06.08.2025 23:32 — 👍 3 🔁 0 💬 1 📌 0

one way it might help — being ClosedAI might be hurting the research. By opening up a model that shares common critical aspects, researchers can collaborate outside of OpenAI on ideas

06.08.2025 23:32 — 👍 5 🔁 0 💬 2 📌 0

what benefits are there to open source AI though? it costs a ton, if you get roasted your entire lab loses credibility, and nobody can really give back without a crap ton of capital

06.08.2025 23:28 — 👍 2 🔁 0 💬 1 📌 0

grand ~= sounding somewhat sarcastic

he’s a SV CEO, he doesn’t do things out of the goodness of his heart. he makes product. what product play was this?

06.08.2025 23:22 — 👍 4 🔁 0 💬 2 📌 0

flowchart TD
  subgraph Home
    user
    gpt-oss
  end
  user --> gpt5 --> web
  gpt5 --> gpt-oss --> MCPtools

one criticism i haven’t heard: gpt-oss is a bad agent

they say lots of bad stuff about it, but its tool calling and long-horizon reasoning seem on point

what if this is the plan — gpt-oss on the edge, only to agentify local resources

06.08.2025 23:13 — 👍 6 🔁 0 💬 2 📌 0

yes i don’t believe for a second that sama didn’t have some grand strategy with the open weights

i’m still thinking it’s going to be an edge piece somehow. maybe on a phone/laptop or his new affair with Jony Ive

06.08.2025 23:07 — 👍 11 🔁 0 💬 4 📌 0

lol, they’re super smart guys though

06.08.2025 23:05 — 👍 2 🔁 0 💬 0 📌 0

yeah, simply basing on GPT-4.1 is going to be a big deal. i’m curious to what extent it’ll actually route

06.08.2025 23:04 — 👍 3 🔁 0 💬 0 📌 0

that’s my boi

06.08.2025 18:04 — 👍 7 🔁 0 💬 1 📌 0

Qwen/Qwen3-4B-Thinking-2507 · Hugging Face We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co/Qwen/Qwen3-4...

06.08.2025 17:43 — 👍 3 🔁 1 💬 0 📌 0

OpenAI @OpenAI: LIVE5TREAM THURSDAY 10AM PT

GPT-5

06.08.2025 17:33 — 👍 13 🔁 1 💬 2 📌 0

Bar chart comparing performance of four Qwen3-4B model variants (Thinking-2507 in red, Thinking in gray, Instruct-2507 in blue, Non-Thinking in beige) across five benchmarks:
1. GPQA • Thinking-2507: 65.8 • Thinking: 62.0 • Instruct-2507: 55.9 • Non-Thinking: 41.7
2. AIME25 • Thinking-2507: 81.3 • Thinking: 65.6 • Instruct-2507: 47.4 • Non-Thinking: 19.1
3. LiveCodeBench v6 (25.02–25.05) • Thinking-2507: 55.2 • Thinking: 48.4 • Instruct-2507: 35.1 • Non-Thinking: 26.4
4. Arena-Hard v2 • Thinking-2507: 34.9 • Instruct-2507: 43.4 • Thinking: 13.7 • Non-Thinking: 9.5
5. BFCL-v3 • Thinking-2507: 71.2 • Thinking: 65.9 • Instruct-2507: 61.9 • Non-Thinking: 57.6
The Thinking-2507 variant (red bars) leads on every benchmark except Arena-Hard v2, where Instruct-2507 scores highest. A watermark-style logo is faintly visible in the background behind the bars.

Qwen3-4B Instruct & Thinking

uuuh, guys this isn’t a boring model

This crushes all the agentic benchmarks, even beating out the already-impressive qwen3-30b-a4b

It’s hanging with some already impressive mid-sized models at only 4B

06.08.2025 17:24 — 👍 54 🔁 4 💬 6 📌 2

a close up of a man's face with a blurred background

some of their marketing videos are eye opening

06.08.2025 16:51 — 👍 1 🔁 0 💬 0 📌 0

Trump says it’ll be 3000% cheaper

06.08.2025 16:31 — 👍 3 🔁 0 💬 2 📌 0

imo that’s a failure mode — i’ve been fairly successful at staying high level, checking in every 20 min, and using the main time to do meetings, docs, etc.

if i have to step in and actually write code it jams up my rhythm, and my day basically halts. so i end up doing a lot of up-front planning

06.08.2025 16:30 — 👍 4 🔁 0 💬 1 📌 0

yes!

06.08.2025 16:27 — 👍 0 🔁 0 💬 0 📌 0

word is GPT-5 will have a 13.5% increase in version number over GPT-4.5

06.08.2025 16:18 — 👍 49 🔁 5 💬 4 📌 0

Junyang Lin @JustinLin610: drinking still. will announce later. 4b is fabulous. i do love it. (11:53 AM • 8/6/25 • 4.2K Views)

qwen coming in hot in their usual style

06.08.2025 16:16 — 👍 6 🔁 0 💬 1 📌 0

just sent a meeting request with “..unless GPT-5 comes out” in the title

06.08.2025 15:10 — 👍 16 🔁 0 💬 0 📌 0

i’m not sure you’re right about “that ship has sailed”

it’s just that there’s tiers of openness, more is better than less, but the tiers aren’t equidistant

06.08.2025 15:09 — 👍 1 🔁 0 💬 0 📌 0

yes, that’s what i’m talking about!

06.08.2025 14:22 — 👍 1 🔁 0 💬 0 📌 0

Okay, I'll chime in, as a person with less mental health than average.

Working with agents can be really easy or really trying. It's really easy when you remember that the agent won't get mad at you or walk out of the meeting. It's easy when you remember that agents aren't scary like people.

06.08.2025 14:02 — 👍 8 🔁 1 💬 2 📌 0

you should

06.08.2025 14:21 — 👍 1 🔁 0 💬 0 📌 0

interestingly, i got so much flak that i wrote a retraction, and the retraction was far more popular than the actual piece. idk…

06.08.2025 13:53 — 👍 4 🔁 0 💬 2 📌 0

signal 1: i’m working with a marketing person at work who’s fucking killing it with claude code

signal 2: some people are discovering that frameworks are often not worth the time anymore

i think both are still panning out

06.08.2025 13:53 — 👍 5 🔁 0 💬 1 📌 0

Normware: The Decline of Software Engineering

i’m getting some early signals that i may have been right about this piece i wrote back in January

tl;dr is software won’t be boxed & shipped unless it solves a truly hard problem

timkellogg.me/blog/2025/01...

06.08.2025 13:53 — 👍 18 🔁 0 💬 2 📌 2

honestly i just need this but for myself

bsky.app/profile/timk...

06.08.2025 13:43 — 👍 4 🔁 0 💬 0 📌 0

they overdial on speed. @cameron.pfiffer.org noticed that they can’t tool call for shit

the crazy part is they use their own special quantization. while it’s also 4 bits, it’s a different scheme than what openai trained with, so it’s braindead in subtle ways and they don’t care bc it’s fast

06.08.2025 13:39 — 👍 2 🔁 0 💬 1 📌 0

alright, let’s see if my magic dice work today

06.08.2025 13:31 — 👍 0 🔁 0 💬 0 📌 0

there will be releases today but not the cool stuff

06.08.2025 13:31 — 👍 4 🔁 0 💬 3 📌 0
