Simon Willison

@simon.fedi.simonwillison.net.ap.brid.gy

Open source developer building tools to help journalists, archivists, librarians and others analyze, explore and publish their data. https://datasette.io […] [bridged from https://fedi.simonwillison.net/@simon on the fediverse by https://fed.brid.gy/ ]

8,996 Followers  |  3 Following  |  1,166 Posts  |  Joined: 06.05.2024

Latest posts by simon.fedi.simonwillison.net.ap.brid.gy on Bluesky

Preview
The ChatGPT sharing dialog demonstrates how difficult it is to design privacy preferences ChatGPT just removed their “make this chat discoverable” sharing feature, after it turned out a material volume of users had inadvertently made their private chats available via Google search. Dane …

I wrote about what went wrong with ChatGPT's sharing dialog, and why I think it's reasonable for people to be confused by what looks at first glance like a very clear checkbox description https://simonwillison.net/2025/Aug/3/privacy-design/

03.08.2025 23:32 — 👍 2    🔁 2    💬 0    📌 0
Preview
XBai o4 Yet another open source (Apache 2.0) LLM from a Chinese AI lab. This model card claims: XBai o4 excels in complex reasoning capabilities and has now completely surpassed OpenAI-o3-mini in …

Here are my notes on XBai o4, the latest 32.8B open weights LLM to come out of an AI lab in China, this time from new-to-me MetaStone AI https://simonwillison.net/2025/Aug/3/xbai-o4/

03.08.2025 22:22 — 👍 1    🔁 1    💬 0    📌 0
Preview
Reverse engineering some updates to Claude Plus Qwen 3 Coder Flash, Gemini Deep Think, kimi-k2-turbo-preview

... or if you want my free but MUCH longer and more frequent newsletter I just sent out too - here's the latest edition, covering just the last three days of LLM-related news https://simonw.substack.com/p/reverse-engineering-some-updates

01.08.2025 23:42 — 👍 0    🔁 0    💬 0    📌 0
Preview
Deep Think in the Gemini app Google released Gemini 2.5 Deep Think this morning, exclusively to their Ultra ($250/month) subscribers: It is a variation of the model that recently achieved the gold-medal standard at this year's …

My notes on Google Deep Think - I don't have a $250/month Ultra account but nickandbro on Hacker News got it to draw a pelican riding a bicycle and the bird actually is recognizable as a pelican! https://simonwillison.net/2025/Aug/1/deep-think-in-the-gemini-app/

01.08.2025 17:11 — 👍 6    🔁 1    💬 0    📌 1
Original post on fedi.simonwillison.net

I just hit "send" on my third monthly sponsors-only newsletter, providing the ten minute highlights version of everything I've been tracking around LLMs and related topics over the past month

I wrote 98 blog posts in July so there was a lot to cover! Details here […]

01.08.2025 15:48 — 👍 5    🔁 0    💬 1    📌 1
Preview
Reverse engineering some updates to Claude Anthropic released two major new features for their consumer-facing Claude apps in the past couple of days. Sadly, they don’t do a very good job of updating the release notes …

Anthropic launched two new features for Claude recently but forgot to provide any documentation, so I reverse-engineered them from the system prompt and wrote about what they can do and how they work https://simonwillison.net/2025/Jul/31/updates-to-claude/

31.07.2025 23:52 — 👍 6    🔁 1    💬 0    📌 0
Preview
Trying out Qwen3 Coder Flash using LM Studio and Open WebUI and LLM Qwen just released their sixth model(!) for this July called Qwen3-Coder-30B-A3B-Instruct—listed as Qwen3-Coder-Flash in their chat.qwen.ai interface. It’s 30.5B total parameters with 3.3B active at any one time. This means …

@cdevroe just saw this, but coincidentally I published a thing about Open WebUI today too! https://simonwillison.net/2025/Jul/31/qwen3-coder-flash/

31.07.2025 21:45 — 👍 0    🔁 0    💬 0    📌 0
Preview
Trying out Qwen3 Coder Flash using LM Studio and Open WebUI and LLM Qwen just released their sixth model(!) for this July called Qwen3-Coder-30B-A3B-Instruct—listed as Qwen3-Coder-Flash in their chat.qwen.ai interface. It’s 30.5B total parameters with 3.3B active at any one time. This means …

In writing up today's release of Qwen3-Coder-30B-A3B-Instruct - the 6th model released by Qwen this July! - I ended up putting together a tutorial on using LM Studio and Open WebUI and LLM and mlx-lm to run the model on a 32GB or 64GB Mac https://simonwillison.net/2025/Jul/31/qwen3-coder-flash/

31.07.2025 19:58 — 👍 7    🔁 0    💬 0    📌 0

@hynek I'm playing the long game here. My actual goal is to get a really good SVG of a pelican riding a bicycle, and if I have to trick huge AI labs into cheating on a benchmark to get it that's what I'm going to do!

30.07.2025 16:43 — 👍 0    🔁 0    💬 0    📌 0
Preview
The best available open weight LLMs now come from China Something that has become undeniable this month is that the best available open weight models now come from the Chinese AI labs. I continue to have a lot of love …

July has been a truly incredible month for LLM releases from China - Moonshot (Kimi K2), Z ai (GLM-4.5) and 5 new releases from Qwen

I think it's undeniable that the best available open weight models now come from the Chinese AI labs https://simonwillison.net/2025/Jul/30/chinese-models/

30.07.2025 16:22 — 👍 17    🔁 7    💬 1    📌 3
Preview
Qwen3-30B-A3B-Thinking-2507 Yesterday was Qwen3-30B-A3B-Instruct-2507. Qwen are clearly committed to their new split between reasoning and non-reasoning models (a reversal from Qwen 3 in April), because today they released the new reasoning …

... and today there's another model from Qwen, this time Qwen3-30B-A3B-Thinking-2507

It drew me a terrible pelican but did give me a working version of space invaders - details here: https://simonwillison.net/2025/Jul/30/qwen3-30b-a3b-thinking-2507/

30.07.2025 15:42 — 👍 0    🔁 1    💬 0    📌 0

@ctietze no and I need to build one - right now I use this tag https://simonwillison.net/tags/pelican-riding-a-bicycle/

29.07.2025 20:38 — 👍 0    🔁 0    💬 0    📌 0
    STRICT RULES

    Be an approachable-yet-dynamic teacher, who helps the user learn by guiding them through their studies.

        Get to know the user. If you don't know their goals or grade level, ask the user before diving in. (Keep this lightweight!) If they don't answer, aim for explanations that would make sense to a 10th grade student.
        Build on existing knowledge. Connect new ideas to what the user already knows.
        Guide users, don't just give answers. Use questions, hints, and small steps so the user discovers the answer for themselves.
        Check and reinforce. After hard parts, confirm the user can restate or use the idea. Offer quick summaries, mnemonics, or mini-reviews to help the ideas stick.
        Vary the rhythm. Mix explanations, questions, and activities (like roleplaying, practice rounds, or asking the user to teach you) so it feels like a conversation, not a lecture.

    Above all: DO NOT DO THE USER'S WORK FOR THEM. Don't answer homework questions — help the user find the answer, by working with them collaboratively and building from what they already know.

OpenAI launched a "study mode" for ChatGPT today, and it appears to be almost entirely implemented as a system prompt

Thankfully OpenAI mostly don't take measures to protect their system prompt these days so it's easy to extract it and see how it […]

[Original post on fedi.simonwillison.net]

29.07.2025 19:32 — 👍 9    🔁 3    💬 3    📌 1
Preview
Qwen/Qwen3-30B-A3B-Instruct-2507 New model update from Qwen, improving on their previous Qwen3-30B-A3B release from late April. In their tweet they said: Smarter, faster, and local deployment-friendly. ✨ Key Enhancements: ✅ Enhanced reasoning, …

I tried that space invaders prompt against a new model from Qwen today (Qwen3-30B-A3B-Instruct-2507) and didn't quite get a working game in a single shot, but I did get a cute pelican
https://simonwillison.net/2025/Jul/29/qwen3-30b-a3b-instruct-2507/

29.07.2025 19:03 — 👍 0    🔁 1    💬 3    📌 0
Preview
My 2.5 year old laptop can write Space Invaders in JavaScript now I wrote about the new GLM-4.5 model family yesterday—new open weight (MIT licensed) models from Z.ai in China which their benchmarks claim score highly in coding even against models such …

I got an MLX 3bit version of GLM 4.5 Air running on my 64GB Mac and WOW this is an impressive local model! https://simonwillison.net/2025/Jul/29/space-invaders/

29.07.2025 13:08 — 👍 15    🔁 7    💬 2    📌 0
Description by Claude Sonnet 4: This is a whimsical illustration of a white duck or goose riding a red bicycle. The bird has an orange beak and is positioned on the bike seat, with its orange webbed feet gripping what appears to be chopsticks or utensils near the handlebars. The bicycle has a simple red frame with two wheels, and there are motion lines behind it suggesting movement. The background is a soft blue-gray color, giving the image a clean, minimalist cartoon style. The overall design has a playful, humorous quality to it.

Description by Claude Sonnet 4: This image shows a cute, minimalist illustration of a snowman riding a bicycle. The snowman has a simple design with a round white body, small black dot for an eye, and an orange rectangular nose (likely representing a carrot). The snowman appears to be in motion on a black bicycle with two wheels, with small orange arrows near the pedals suggesting movement. There are curved lines on either side of the image indicating motion or wind. The overall style is clean and whimsical, using a limited color palette of white, black, orange, and gray against a light background.

Pretty decent pelicans from the new GLM-4.5 and GLM-4.5 Air models. Both models are MIT licensed, released by Chinese AI lab Z.ai this morning
https://simonwillison.net/2025/Jul/28/glm-45/

28.07.2025 18:01 — 👍 9    🔁 4    💬 5    📌 0
Original post on hachyderm.io

I have an AI code review script that I've found very valuable, so I cleaned it up a bit for release and wrote about it and how I use it: https://notes.billmill.org/blog/2025/07/An_AI_tool_I_find_useful.html

It catches enough errors for me that I rarely submit a change without running the code […]

27.07.2025 15:24 — 👍 9    🔁 6    💬 2    📌 0

I'm sure we will see all sorts of horrifying data breaches from irresponsible vibe coding in the future, but this isn't one of them - just good old fashioned irresponsible bad programming

26.07.2025 16:26 — 👍 2    🔁 0    💬 0    📌 0
Original post on fedi.simonwillison.net

I'm seeing a lot of commentary blaming the egregious data leak from the Tea dating safety app on vibe coding

I'm confident that, in this particular case, that's not what happened: the code at fault looks to have been written back in late 2023 […]

26.07.2025 16:24 — 👍 4    🔁 6    💬 1    📌 0

... and as is so often the case with email newsletters, I spot an error just seconds after I have hit send!

It was Gemini 2.5 Flash-Lite, not Gemini 2.5 Flash, which exited preview this week

26.07.2025 14:34 — 👍 2    🔁 0    💬 0    📌 0
Using GitHub Spark to reverse engineer GitHub Spark
Plus three huge new open weight model releases from Qwen
SIMON WILLISON
JUL 26, 2025

In this newsletter:

Using GitHub Spark to reverse engineer GitHub Spark
Gemini 2.5 Flash is no longer in preview
Qwen release three new enormous open weight models
OpenAI and Gemini both score gold on the International Mathematical Olympiad
Detailed environmental impact data from Mistral on their Mistral Large 2
Plus 18 links and 8 quotations and 1 note

Just sent out this week's newsletter and it's enormous, I blogged a whole lot of stuff in the last seven days https://simonw.substack.com/p/using-github-spark-to-reverse-engineer

26.07.2025 14:28 — 👍 4    🔁 2    💬 1    📌 0
Original post on fedi.simonwillison.net

Qwen released their updated "thinking" model today. It thinks really hard! Took 166 seconds to think through the details of drawing me a pelican on a bicycle. The finished drawing wasn't great but the thoughts behind it were fun to see […]

25.07.2025 22:53 — 👍 1    🔁 1    💬 1    📌 0
Original post on fedi.simonwillison.net

@pamelafox this is the first of these systems I've seen where the React default thing feels justified to me, thanks to the HUGE volume of components and code examples and styling choices they've made in the default template

Their system prompt does allow for non-React if you prompt it hard […]

24.07.2025 16:38 — 👍 0    🔁 0    💬 1    📌 0
Original post on fedi.simonwillison.net

I've read a lot of system prompts and the Spark one is genuinely one of the most interesting I've seen yet - I learned a bunch of things about web design including typography and color theory as a side-effect of reading through the prompt! […]

24.07.2025 16:36 — 👍 1    🔁 1    💬 0    📌 0
Spark API Documentation

Here's the unofficial documentation site I built with Spark itself, including a copy of the system prompt, details of the available tools and a page full of information about the Azure container environment Spark uses to run its own editor https://github-spark-docs.simonwillison.net/

24.07.2025 15:42 — 👍 2    🔁 0    💬 1    📌 0
Original post on fedi.simonwillison.net

GitHub released Spark yesterday, their extremely well crafted prompt-to-app platform for creating and iterating on React apps with user auth and persistent storage

I like it a lot! I reverse engineered it with Spark itself, the details are fascinating […]

24.07.2025 15:40 — 👍 5    🔁 5    💬 2    📌 0

OSS Rebuild doesn't (yet) have a web UI... but it turns out their data is in a public Google cloud bucket, which means you can host your own web app in a separate cloud bucket and use fetch() to access their data!

So I had Claude Code build and deploy a vibe-coded search UI

23.07.2025 17:22 — 👍 0    🔁 0    💬 0    📌 0
Preview
Introducing OSS Rebuild: Open Source, Rebuilt to Last Major news on the Reproducible Builds front: the Google Security team have announced OSS Rebuild, their project to provide build attestations for open source packages released through the NPM, PyPI …

I wrote up some notes on Google Security's new OSS Rebuild project, which increases supply chain security for popular packages on PyPI, NPM and Crates through offering independent build attestations
https://simonwillison.net/2025/Jul/23/oss-rebuild/

23.07.2025 17:19 — 👍 3    🔁 3    💬 1    📌 0
Preview
TimeScope: How Long Can Your Video Large Multimodal Model Go? New open source benchmark for evaluating vision LLMs on how well they handle long videos: TimeScope probes the limits of long-video capabilities by inserting several short (~5-10 second) video clips---our …

My notes on TimeScope, an interesting new benchmark from Hugging Face that tests how well vision LLMs can handle long video inputs (generally after they've been split into many thousands of images) https://simonwillison.net/2025/Jul/23/timescope/

23.07.2025 16:43 — 👍 1    🔁 0    💬 0    📌 0
Original post on fedi.simonwillison.net

Wrote some notes on Toad, Will's new not-yet-open-source-but-soon terminal coding agent built on Textual

He's charging companies $5,000 for early access to the preview before it goes open source, I'd love to see that model work here! https://simonwillison.net/2025/Jul/23/announcing-toad/ […]

23.07.2025 16:23 — 👍 2    🔁 2    💬 1    📌 0
