This work couldn’t be more urgent. We need better measurement practices in AI evaluation, ASAP. Here, we aim to clarify and inform, and to show what better looks like for accuracy metrics and confidence estimates, with the added bonus of a deeper understanding of evaluation. Excellent work, team!
19.02.2026 16:00