@dpeskoff.bsky.social

8 Followers  |  28 Following  |  1 Post  |  Joined: 05.04.2025

Posts by @dpeskoff.bsky.social

🚨 New Position Paper 🚨

Multiple-choice evals for LLMs are simple and popular, but we know they are awful 😬

We complain they're full of errors, saturated, and test nothing meaningful, so why do we still use them? 🫠

Here's why MCQA evals are broken, and how to fix them 🧵

24.02.2025 21:03

Hello, World!

05.04.2025 03:21