Colin White

@crwhite-ml.bsky.social

LLM evaluation. Head of Research at Abacus.AI. PhD from CMU. https://crwhite.ml

15 Followers 16 Following 1 Posts Joined Nov 2024
1 year ago
The LiveBench leaderboard showing llama-3.3-70b-instruct-turbo in the leading position by average instruction following performance

Shiny! The newly released Llama 3.3 LLM leads the LiveBench ranking for instruction following¹, beating Claude 3.5, GPT-4o, and OpenAI o1, and you can run it on your local² machine.

> ollama run llama3.3

livebench.ai#/?IF=as

1 year ago

It might have been because they implemented this rule for the first time this year: "All authors who are on 3 or more papers must serve as a reviewer for at least 6 papers."
I'm a fan of that rule!
