Ethan Mollick @emollick - Bluesky Profile

It is getting harder and harder to test AIs as they get "smarter" at a wide variety of tasks. The average task in GDPval took an hour for experts to assess, and even those tasks did not push current AIs to their limits.

25.11.2025 01:59 — 👍 36 🔁 1 💬 3 📌 0

No

24.11.2025 21:54 — 👍 1 🔁 0 💬 0 📌 0

Weber's Iron Cage Strategy Game - Interactive Sociology Simulation
Explore Max Weber's authority types through an immersive 3D strategy game. Build societies, manage resources, and experience rationalization's effects in this educational simulation.

Play them:
Sociology: claude.ai/public/artif...
Space: claude.ai/public/artif...
Opera: claude.ai/public/artif...
Grill: claude.ai/public/artif...

24.11.2025 21:27 — 👍 15 🔁 2 💬 1 📌 0

Weber's Iron Cage Strategy Game - Interactive Sociology Simulation
Explore Max Weber's authority types through an immersive 3D strategy game. Build societies, manage resources, and experience rationalization's effects in this educational simulation.

You can play it: claude.ai/public/artif...

24.11.2025 21:25 — 👍 1 🔁 0 💬 1 📌 0

All the discussions were pretty charming

24.11.2025 20:49 — 👍 14 🔁 1 💬 2 📌 0

Me: Claude 4.5 Opus, I need a strategy game based on the work of Weber

Claude: Here's one based on David Weber's space operas

Me: Not that Weber

C: Here's a game based on sociologist Max Weber

Me: Not that one

C: The operas of Carl Maria von Weber?

Me: No

C: Here is one using Weber grills!

24.11.2025 20:29 — 👍 62 🔁 4 💬 4 📌 3

I had early access to Opus 4.5 & it is a very impressive model that seems to be right at the frontier

Big gains in ability to do practical work (like make a PowerPoint from an Excel) and the best results ever (& in one shot) in my Lem poetry test, plus good results in Claude Code

24.11.2025 18:59 — 👍 57 🔁 6 💬 2 📌 0

The recent history of AI in 32 otters
Three years of progress as shown by marine mammals

History of the benchmark: www.oneusefulthing.org/p/the-recent...

21.11.2025 14:56 — 👍 20 🔁 2 💬 1 📌 0

I think my “otters on a plane using WiFi” benchmark may be saturated now that nano banana pro can do this.

21.11.2025 14:55 — 👍 143 🔁 9 💬 8 📌 2

Ruining great art with the nano banana pro command “Make this much more cheerful with as few changes as possible”

21.11.2025 13:19 — 👍 179 🔁 26 💬 12 📌 8

Tell all the truth but tell it slant—
Success in Circuit lies
Too bright for our infirm Delight
The Truth's superb surprise

This paper finds poetry is a universal single shot jailbreak for LLMs. Systems built to stop prosaic attacks fail when the request is phrased in verse arxiv.org/abs/2511.15304

20.11.2025 21:47 — 👍 40 🔁 12 💬 1 📌 4

Nano banana Pro: “i need a flowchart for how to toast bread, make it as wacky and over the top and complicated as possible.“

Not absolutely perfect, but I can’t believe how much there is a coherent through-line, how clear the text is, and also parts of it are actually funny?

20.11.2025 19:19 — 👍 98 🔁 16 💬 9 📌 1

I estimate I used around 10,000 tokens (likely fewer), so that would translate to about 2-5 Wh (a standard query is 0.3 Wh), which would be about as much power as 4 minutes of watching Netflix on a TV.

I suspect that viewing and uploading the video uses more power than generating the code for it.

19.11.2025 22:26 — 👍 15 🔁 0 💬 1 📌 0
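
A rough check of the arithmetic in the energy estimate above, using only the figures the post quotes (10,000 tokens, 2-5 Wh, 0.3 Wh per standard query, 4 minutes of Netflix on a TV). The function and constant names are illustrative, and the TV wattage is only what the post's comparison implies, not a measured value.

```python
# Figures quoted in the post; everything below is derived from them.
TOKENS = 10_000              # upper estimate of tokens used
ENERGY_WH_RANGE = (2.0, 5.0) # estimated energy for the whole session, in Wh
STANDARD_QUERY_WH = 0.3      # energy of a "standard query", in Wh
NETFLIX_MINUTES = 4          # streaming time the post equates the energy to


def energy_summary(low_wh, high_wh):
    """Return (low, high) pairs derived from the post's own numbers."""
    def both(f):
        return (f(low_wh), f(high_wh))

    return {
        # Energy per 1,000 tokens implied by the estimate.
        "wh_per_1k_tokens": both(lambda wh: wh * 1000 / TOKENS),
        # How many standard queries the session is equivalent to.
        "standard_queries": both(lambda wh: wh / STANDARD_QUERY_WH),
        # TV power draw implied by "4 minutes of Netflix" (assumption:
        # the comparison is meant to be an exact equivalence).
        "implied_tv_watts": both(lambda wh: wh * 60 / NETFLIX_MINUTES),
    }


if __name__ == "__main__":
    for name, (low, high) in energy_summary(*ENERGY_WH_RANGE).items():
        print(f"{name}: {low:.2f} to {high:.2f}")
```

On these numbers the session works out to roughly 0.2-0.5 Wh per 1,000 tokens, about 7 to 17 standard queries, and an implied TV draw of 30-75 W.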

"Hey, Gemini 3, So I need DOOM, but more root vegetables, also no guns or demons or mars. And more of a focus on different flooring styles. but otherwise EXACTLY the same as DOOM."

Gemini: "Here is F.L.O.O.R. (First-person Lino Observation & Ornamental Review)."

Pretty good!

19.11.2025 21:08 — 👍 106 🔁 6 💬 2 📌 3

How well can Gemini 3 make a Henry James simulator?
Finally, a benchmark for LLMs with real-world value

As a fan of weird but revealing benchmarks, I enjoyed this historian’s attempts to have different frontier AIs build “a full featured RPG game where you play as Henry James wandering as a flâneur at the 1889 Universal Exposition in Paris.” HenryBench? open.substack.com/pub/resobscu...

19.11.2025 04:13 — 👍 67 🔁 14 💬 0 📌 2

Fun little Gemini 3 experiment where I asked it "build me a time machine simulator, make it very very good" and then "make it better" a few times. I like that it added calls to Gemini within the application, including adding speech & nano banana images. Play it: gemini.google.com/share/02e4e8...

18.11.2025 22:28 — 👍 49 🔁 7 💬 5 📌 0

Three Years from GPT-3 to Gemini 3
From chatbots to agents

I had access to Gemini 3. It is a very good, very fast model. It also demonstrates the change from chatbot to agent. www.oneusefulthing.org/p/three-year...

18.11.2025 18:57 — 👍 90 🔁 15 💬 4 📌 4

Interesting changes from Grok 4 to Grok 4.1. Decreases in harmful responses but also increases in sycophancy and deception.

It isn’t clear how to interpret the sycophancy score, but the MASK score for deception is quite high compared to big models.

Sycophancy leads to higher LMArena scores…

18.11.2025 02:55 — 👍 61 🔁 6 💬 7 📌 6

We are now seeing the first long-anticipated use of AI for semi-autonomous cyberattacks.

"This approach allowed the threat actor to achieve operational scale typically associated with nation-state campaigns while maintaining minimal direct involvement" www.anthropic.com/news/disrupt...

13.11.2025 19:12 — 👍 53 🔁 11 💬 0 📌 9

Some pretty eye-opening data on the effect of AI coding.

When Cursor added agentic coding in 2024, adopters produced 39% more code merges, with no sign of a decrease in quality (revert rates were the same, bugs dropped) and no sign that the scope of the work shrank. papers.ssrn.com/sol3/papers....

13.11.2025 05:18 — 👍 90 🔁 10 💬 2 📌 3

Giving your AI a Job Interview
As AI advice becomes more important, we are going to need to get better at assessing it

As AIs get smarter & more useful, our benchmarks become less useful. Measuring general knowledge or coding ability gives us only a glimpse into what an AI model can do.

Anyone who wants to use AI seriously for real work will need to assess it themselves. www.oneusefulthing.org/p/giving-you...

12.11.2025 02:55 — 👍 58 🔁 10 💬 4 📌 3

I keep warning that so many of our systems are still built around the assumption that quality writing and analysis are costly and therefore meaningful signals.

Our systems are very much not ready for the revelation that this is no longer true, as this planning objection AI shows

09.11.2025 23:39 — 👍 87 🔁 13 💬 3 📌 2

This is a cool paper showing that first-gen college students don't realize a lot of unwritten rules that lead to success (the value of internships, student clubs, letters from professors).

But giving them access to an LLM for guidance significantly closes the gap. mgcuna.github.io/website/JMP_...

09.11.2025 14:55 — 👍 95 🔁 12 💬 5 📌 7

Sora: "that infamous dramatic Oscar winning scene where the lead keeps getting hit by the boom mic but nobody notices"

05.11.2025 04:32 — 👍 56 🔁 1 💬 2 📌 0

I have been writing for years about the fact that we are not ready for the destruction of costly signalling mechanisms. Writing used to be a way of measuring effort, ability and diligence. We still have no easy substitute.

Now this paper confirms that cover letters have lost their value as a predictor.

05.11.2025 01:48 — 👍 101 🔁 12 💬 4 📌 5

Inside the Data Centers That Train A.I. and Drain the Electrical Grid
A data center, which can use as much electricity as Philadelphia, is the new American factory, creating the future and propping up the economy. How long can this last?

www.newyorker.com/magazine/202...

03.11.2025 06:24 — 👍 32 🔁 2 💬 0 📌 0

The big article on data centers in the New Yorker is pretty good, which I wasn’t expecting given the reaction on X. Lots of talk of the good and bad of AI, and it covers both bubble & non-bubble arguments.

It also featured the best version of “I spoke to a local farmer about a data center”

03.11.2025 06:23 — 👍 237 🔁 43 💬 7 📌 4

I don’t think people are tracking how quickly this is happening, for better or worse.

02.11.2025 23:59 — 👍 145 🔁 25 💬 12 📌 7

Describing

02.11.2025 01:11 — 👍 9 🔁 1 💬 1 📌 0

The other option, from Pater

02.11.2025 01:10 — 👍 7 🔁 2 💬 1 📌 0

Latest posts by emollick.bsky.social on Bluesky