Tim Kellogg

@timkellogg.me.bsky.social

AI Architect | North Carolina | AI/ML, IoT, science WARNING: I talk about kids sometimes

6,346 Followers  |  673 Following  |  8,381 Posts  |  Joined: 13.08.2024  |  2.0279

Latest posts by timkellogg.me on Bluesky

i still find it hilarious that there’s no number after it. there’s not going to be a number 2

06.08.2025 23:32 - 👍 3    🔁 0    💬 1    📌 0

one way it might help: being ClosedAI might be hurting the research. By opening up a model that shares common critical aspects, researchers can collaborate outside of OpenAI on ideas

06.08.2025 23:32 - 👍 5    🔁 0    💬 2    📌 0

what benefits are there to open source AI though? it costs a ton, if you get roasted your entire lab loses credibility, and nobody can really give back without a crap ton of capital

06.08.2025 23:28 - 👍 2    🔁 0    💬 1    📌 0

grand ~= sounding somewhat sarcastic

he’s a SV CEO, he doesn’t do things out of the goodness of his heart. he makes product. what product play was this?

06.08.2025 23:22 - 👍 4    🔁 0    💬 2    📌 0
flowchart TD
subgraph Home
  user
  gpt-oss
end
    user --> gpt5 --> web
    gpt5 --> gpt-oss --> MCPtools

one criticism i haven’t heard: gpt-oss is a bad agent

they say lots of bad stuff about it, but its tool calling and long-horizon reasoning seem on point

what if this is the plan: gpt-oss on the edge, only to agentify local resources

06.08.2025 23:13 - 👍 6    🔁 0    💬 2    📌 0

yes i don’t believe for a second that sama didn’t have some grand strategy with the open weights

i’m still thinking it’s going to be an edge piece somehow. maybe on a phone/laptop or his new affair with Jony Ive

06.08.2025 23:07 - 👍 11    🔁 0    💬 4    📌 0

lol, they’re super smart guys though

06.08.2025 23:05 - 👍 2    🔁 0    💬 0    📌 0

yeah, simply basing on GPT-4.1 is going to be a big deal. i’m curious to what extent it’ll actually route

06.08.2025 23:04 - 👍 3    🔁 0    💬 0    📌 0

that’s my boi

06.08.2025 18:04 - 👍 7    🔁 0    💬 1    📌 0
Qwen/Qwen3-4B-Thinking-2507 · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co/Qwen/Qwen3-4...

06.08.2025 17:43 - 👍 3    🔁 1    💬 0    📌 0
OpenAI
@OpenAI
Follow
LIVETREAM THURSDAY 10AM PT

GPT-5

06.08.2025 17:33 - 👍 13    🔁 1    💬 2    📌 0
Bar chart comparing performance of four Qwen3-4B model variants (Thinking-2507 in red, Thinking in gray, Instruct-2507 in blue, and Non-Thinking in beige) across five benchmarks:
	1.	GPQA
	•	Thinking-2507: 65.8
	•	Thinking: 62.0
	•	Instruct-2507: 55.9
	•	Non-Thinking: 41.7
	2.	AIME25
	•	Thinking-2507: 81.3
	•	Thinking: 65.6
	•	Instruct-2507: 47.4
	•	Non-Thinking: 19.1
	3.	LiveCodeBench v6 (25.02–25.05)
	•	Thinking-2507: 55.2
	•	Thinking: 48.4
	•	Instruct-2507: 35.1
	•	Non-Thinking: 26.4
	4.	Arena-Hard v2
	•	Thinking-2507: 34.9
	•	Instruct-2507: 43.4
	•	Thinking: 13.7
	•	Non-Thinking: 9.5
	5.	BFCL-v3
	•	Thinking-2507: 71.2
	•	Thinking: 65.9
	•	Instruct-2507: 61.9
	•	Non-Thinking: 57.6

Across all benchmarks, the Thinking-2507 variant (red bars) consistently leads in performance except for Arena-Hard v2, where Instruct-2507 scores highest. A watermark-style logo is faintly visible in the background behind the bars.

Qwen3-4B Instruct & Thinking

uuuh, guys this isn’t a boring model

This crushes all the agentic benchmarks, even beating out the already-impressive qwen3-30b-a4b

It’s hanging with some already impressive mid-sized models at only 4B

06.08.2025 17:24 - 👍 54    🔁 4    💬 6    📌 2
ALT: a close-up of a man's face with a blurred background

some of their marketing videos are eye opening

06.08.2025 16:51 - 👍 1    🔁 0    💬 0    📌 0

Trump says it’ll be 3000% cheaper

06.08.2025 16:31 - 👍 3    🔁 0    💬 2    📌 0

imo that’s a failure mode. i’ve been fairly successful at staying high level, checking in every 20 min, and using the main time to do meetings, docs, etc.

if i have to step in and actually write code it jams up my rhythm, and my day basically halts. so i end up doing a lot of up-front planning

06.08.2025 16:30 - 👍 4    🔁 0    💬 1    📌 0

yes!

06.08.2025 16:27 - 👍 0    🔁 0    💬 0    📌 0

word is GPT-5 will have a 13.5% increase in version number over GPT-4.5

06.08.2025 16:18 - 👍 49    🔁 5    💬 4    📌 0
Junyang Lin
@JustinLin610
drinking still. will announce later. 4b is fabulous. i do love it.
11:53 AM • 8/6/25 • 4.2K Views

qwen coming in hot in their usual style

06.08.2025 16:16 - 👍 6    🔁 0    💬 1    📌 0

just sent a meeting request with “..unless GPT-5 comes out” in the title

06.08.2025 15:10 - 👍 16    🔁 0    💬 0    📌 0

i’m not sure you’re right about “that ship has sailed”

it’s just that there’s tiers of openness, more is better than less, but the tiers aren’t equidistant

06.08.2025 15:09 - 👍 1    🔁 0    💬 0    📌 0

yes, that’s what i’m talking about!

06.08.2025 14:22 - 👍 1    🔁 0    💬 0    📌 0

Okay, I'll chime in, as a person with less mental health than average.

Working with agents can be really easy or really trying. It's really easy when you remember that the agent won't get mad at you or walk out of the meeting. It's easy when you remember that agents aren't scary like people.

06.08.2025 14:02 - 👍 8    🔁 1    💬 2    📌 0

you should

06.08.2025 14:21 - 👍 1    🔁 0    💬 0    📌 0

interestingly, i got so much flak that i wrote a retraction, and the retraction was far more popular than the actual piece. idk…

06.08.2025 13:53 - 👍 4    🔁 0    💬 2    📌 0

signal 1: i’m working with a marketing person at work who’s fucking killing it with claude code

signal 2: some people are discovering that frameworks are often not worth the time anymore

i think both are still panning out

06.08.2025 13:53 - 👍 5    🔁 0    💬 1    📌 0
Normware: The Decline of Software Engineering

i’m getting some early signals that i may have been right about this piece i wrote back in January

tl;dr is software won’t be boxed & shipped unless it solves a truly hard problem

timkellogg.me/blog/2025/01...

06.08.2025 13:53 - 👍 18    🔁 0    💬 2    📌 2

honestly i just need this but for myself

bsky.app/profile/timk...

06.08.2025 13:43 - 👍 4    🔁 0    💬 0    📌 0

they overdial on speed. @cameron.pfiffer.org noticed that they can’t tool call for shit

the crazy part is they use their own special quantization. while it’s also 4 bits, it’s a different scheme than what openai trained with, so it’s braindead in subtle ways and they don’t care bc it’s fast

06.08.2025 13:39 - 👍 2    🔁 0    💬 1    📌 0

alright, let’s see if my magic dice work today

06.08.2025 13:31 - 👍 0    🔁 0    💬 0    📌 0

there will be releases today but not the cool stuff

06.08.2025 13:31 - 👍 4    🔁 0    💬 3    📌 0

@timkellogg.me is following 20 prominent accounts