Ethan Mollick

@emollick.bsky.social

Professor at Wharton, studying AI and its implications for education, entrepreneurship, and work. Author of Co-Intelligence. Book: https://a.co/d/bC2kSj1 Substack: https://www.oneusefulthing.org/ Web: https://mgmt.wharton.upenn.edu/profile/emollick

31,635 Followers  |  145 Following  |  1,842 Posts  |  Joined: 07.09.2024  |  1.9899

Latest posts by emollick.bsky.social on Bluesky

It is getting harder and harder to test AIs as they get "smarter" at a wide variety of tasks. The average task in GDPval took an hour for experts to assess, and even those tasks did not push current AIs to their limits.

25.11.2025 01:59 - 👍 36    🔁 1    💬 3    📌 0

No

24.11.2025 21:54 - 👍 1    🔁 0    💬 0    📌 0
Weber's Iron Cage Strategy Game - Interactive Sociology Simulation Explore Max Weber's authority types through an immersive 3D strategy game. Build societies, manage resources, and experience rationalization's effects in this educational simulation.

Play them:
Sociology: claude.ai/public/artif...

Space: claude.ai/public/artif...

Opera: claude.ai/public/artif...

Grill: claude.ai/public/artif...

24.11.2025 21:27 - 👍 15    🔁 2    💬 1    📌 0

You can play it: claude.ai/public/artif...

24.11.2025 21:25 - 👍 1    🔁 0    💬 1    📌 0
Post image Post image Post image Post image

All the discussions were pretty charming

24.11.2025 20:49 - 👍 14    🔁 1    💬 2    📌 0
Video thumbnail

Me: Claude 4.5 Opus, I need a strategy game based on the work of Weber

Claude: Here's one based on David Weber's space operas

Me: Not that Weber

C: Here's a game based on sociologist Max Weber

Me: Not that one

C: The operas of Carl Maria von Weber?

Me: No

C: Here is one using Weber grills!

24.11.2025 20:29 - 👍 62    🔁 4    💬 4    📌 3
Post image Post image

I had early access to Opus 4.5 & it is a very impressive model that seems to be right at the frontier

Big gains in ability to do practical work (like make a PowerPoint from an Excel) and the best results ever (& in one shot) in my Lem poetry test, plus good results in Claude Code

24.11.2025 18:59 - 👍 57    🔁 6    💬 2    📌 0
The recent history of AI in 32 otters Three years of progress as shown by marine mammals

History of the benchmark: www.oneusefulthing.org/p/the-recent...

21.11.2025 14:56 - 👍 20    🔁 2    💬 1    📌 0
Post image

I think my “otters on a plane using WiFi” benchmark may be saturated now that nano banana pro can do this.

21.11.2025 14:55 - 👍 143    🔁 9    💬 8    📌 2
Post image Post image Post image Post image

Ruining great art with the nano banana pro command “Make this much more cheerful with as few changes as possible”

21.11.2025 13:19 - 👍 179    🔁 26    💬 12    📌 8
Post image Post image

Tell all the truth but tell it slant -
Success in Circuit lies
Too bright for our infirm Delight
The Truth's superb surprise

This paper finds poetry is a universal single shot jailbreak for LLMs. Systems built to stop prosaic attacks fail when the request is phrased in verse arxiv.org/abs/2511.15304

20.11.2025 21:47 - 👍 40    🔁 12    💬 1    📌 4
Post image

Nano banana Pro: “i need a flowchart for how to toast bread, make it as wacky and over the top and complicated as possible.”

Not absolutely perfect, but I can’t believe how much there is a coherent through-line, how clear the text is, and also parts of it are actually funny?

20.11.2025 19:19 - 👍 98    🔁 16    💬 9    📌 1

I estimate I used around 10,000 tokens (likely less), so that would translate to about 2-5 Wh (a standard query is .3 Wh), which would be about as much power as 4 minutes of watching Netflix on a TV.

I suspect that viewing and uploading the video uses more power than generating the code for it.

19.11.2025 22:26 - 👍 15    🔁 0    💬 1    📌 0
Video thumbnail

"Hey, Gemini 3, So I need DOOM, but more root vegetables, also no guns or demons or mars. And more of a focus on different flooring styles. but otherwise EXACTLY the same as DOOM."

Gemini: "Here is F.L.O.O.R. (First-person Lino Observation & Ornamental Review)."

Pretty good!

19.11.2025 21:08 - 👍 106    🔁 6    💬 2    📌 3
How well can Gemini 3 make a Henry James simulator? Finally, a benchmark for LLMs with real-world value

As a fan of weird but revealing benchmarks, I enjoyed this historian’s attempts to have different frontier AIs build “a full featured RPG game where you play as Henry James wandering as a flâneur at the 1889 Universal Exposition in Paris.” HenryBench? open.substack.com/pub/resobscu...

19.11.2025 04:13 - 👍 67    🔁 14    💬 0    📌 2
Video thumbnail

Fun little Gemini 3 experiment where I asked it "build me a time machine simulator, make it very very good" and then "make it better" a few times. I like that it added calls to Gemini within the application, including adding speech & nano banana images. Play it: gemini.google.com/share/02e4e8...

18.11.2025 22:28 - 👍 49    🔁 7    💬 5    📌 0
Three Years from GPT-3 to Gemini 3 From chatbots to agents

I had access to Gemini 3. It is a very good, very fast model. It also demonstrates the change from chatbot to agent. www.oneusefulthing.org/p/three-year...

18.11.2025 18:57 - 👍 90    🔁 15    💬 4    📌 4
Post image Post image

Interesting changes from Grok 4 to Grok 4.1. Decreases in harmful responses but also increases in sycophancy and deception.

It isn’t clear how to interpret the sycophancy score, but the MASK score for deception is quite high compared to big models.

Sycophancy leads to higher LMArena scores…

18.11.2025 02:55 - 👍 61    🔁 6    💬 7    📌 6
Post image

We are now seeing the first long-anticipated use of AI for semi-autonomous cyberattacks.

"This approach allowed the threat actor to achieve operational scale typically associated with nation-state campaigns while maintaining minimal direct involvement" www.anthropic.com/news/disrupt...

13.11.2025 19:12 - 👍 53    🔁 11    💬 0    📌 9
Post image Post image

Some pretty eye-opening data on the effect of AI coding.

When Cursor added agentic coding in 2024, adopters produced 39% more code merges, with no sign of a decrease in quality (revert rates were the same, bugs dropped) and no sign that the scope of the work shrank. papers.ssrn.com/sol3/papers....

13.11.2025 05:18 - 👍 90    🔁 10    💬 2    📌 3
Giving your AI a Job Interview As AI advice becomes more important, we are going to need to get better at assessing it

As AIs get smarter & more useful, our benchmarks become less useful. Measuring general knowledge or coding ability gives us only a glimpse into what an AI model can do.

Anyone who wants to use AI seriously for real work will need to assess it themselves. www.oneusefulthing.org/p/giving-you...

12.11.2025 02:55 - 👍 58    🔁 10    💬 4    📌 3
Post image Post image

I keep warning that so many of our systems are still built around the assumption that quality writing and analysis are costly and therefore meaningful signals.

Our systems are very much not ready for the revelation that this is no longer true, as this planning objection AI shows

09.11.2025 23:39 - 👍 87    🔁 13    💬 3    📌 2
Post image

This is a cool paper showing that first-gen college students don't realize a lot of unwritten rules that lead to success (the value of internships, student clubs, letters from professors).

But giving them access to an LLM for guidance significantly closes the gap. mgcuna.github.io/website/JMP_...

09.11.2025 14:55 - 👍 95    🔁 12    💬 5    📌 7
Video thumbnail

Sora: "that infamous dramatic Oscar winning scene where the lead keeps getting hit by the boom mic but nobody notices"

05.11.2025 04:32 - 👍 56    🔁 1    💬 2    📌 0
Post image Post image

I have been writing for years about the fact that we are not ready for the destruction of costly signalling mechanisms. Writing used to be a way of measuring effort, ability and diligence. We still have no easy substitute

Now this paper confirms that cover letters have lost their value as a predictor

05.11.2025 01:48 - 👍 101    🔁 12    💬 4    📌 5
Inside the Data Centers That Train A.I. and Drain the Electrical Grid A data center, which can use as much electricity as Philadelphia, is the new American factory, creating the future and propping up the economy. How long can this last?

www.newyorker.com/magazine/202...

03.11.2025 06:24 - 👍 32    🔁 2    💬 0    📌 0
Post image

The big article on data centers in the New Yorker is pretty good, which I wasn’t expecting given the reaction on X. Lots of talk of the good and bad of AI, and it covers both bubble & non-bubble arguments.

It also featured the best version of “I spoke to a local farmer about a data center”

03.11.2025 06:23 - 👍 237    🔁 43    💬 7    📌 4
Post image

I don’t think people are tracking how quickly this is happening, for better or worse.

02.11.2025 23:59 - 👍 145    🔁 25    💬 12    📌 7
Post image

Describing

02.11.2025 01:11 - 👍 9    🔁 1    💬 1    📌 0
Post image

The other option, from Pater

02.11.2025 01:10 - 👍 7    🔁 2    💬 1    📌 0
