Great to see BALROG on @bsky.app as well!
25.11.2024 15:00
@epignatelli.com.bsky.social
Assistant professor (UK Lecturer) at @UCL. PhD at @UCL. Past architect. Previously ML Lead at @burohappold. RL, credit assignment, reward-genesis.
Tired of saturated benchmarks? Want scope for a significant leap in capabilities?
🔥 Introducing BALROG: a Benchmark for Agentic LLM and VLM Reasoning On Games!
BALROG is a challenging benchmark for LLM agentic capabilities, designed to stay relevant for years to come.
1/🧵