🚨 NeurIPS 2024 Spotlight
Did you know we lack standards for AI benchmarks, despite their role in tracking progress, comparing models, and shaping policy? 🤯 Enter BetterBench, our framework with 46 criteria to assess benchmark quality: betterbench.stanford.edu 1/x