Shiny! The newly released Llama 3.3 LLM leads the LiveBench ranking for instruction following¹, beating Claude 3.5, GPT-4o, and OpenAI o1. And you can run it on your local² machine.
> ollama run llama3.3
¹ livebench.ai#/?IF=as
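If you'd rather script against the local model than type into the interactive prompt, here is a minimal sketch. It assumes an Ollama server is already running on its default port (11434) and that `llama3.3` has been pulled. The helper names `build_request` and `generate` are made up for illustration and are not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint; change it if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3.3") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.3") -> str:
    """Send the prompt and return the model's full reply as text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        # With "stream": False the server returns one JSON object
        # whose "response" field holds the generated text.
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    print(generate("Reply with exactly three words."))
```

Setting `"stream": False` keeps the sketch short: the reply arrives as a single JSON object rather than a stream of chunks you would have to stitch together.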
It might have been because they implemented this rule for the first time this year: "All authors who are on 3 or more papers must serve as a reviewer for at least 6 papers."
I'm a fan of that rule!