@utherwayn.bsky.social

Just a bunny-loving game developer

17 Followers  |  53 Following  |  2 Posts  |  Joined: 18.11.2024

Latest posts by utherwayn.bsky.social on Bluesky

Potemkin Understanding in Large Language Models

Large language models (LLMs) are regularly evaluated using benchmark datasets. But what justifies making inferences about an LLM's capabilities based on its answers to a curated set of questions? This...

@simonwillison.net I'm not trying to be an LLM denier here, but man, this paper hit home for me as someone who's not an ML person, and I'd love to see your take on it.

[2506.21521] Potemkin Understanding in Large Language Models share.google/W9cKIwYoWI5W...

Coherence seems like an important metric.

28.06.2025 19:20 — 👍 2    🔁 0    💬 1    📌 0

These are great!

18.11.2024 19:37 — 👍 2    🔁 0    💬 0    📌 0
