What's wrong with evaluating #LLMs after a single interaction? Come find out at @iclr-conf.bsky.social and learn how cultural attraction theory can help us do better. Poster #288, 10 am.
23.04.2025 22:11 — 👍 7 🔁 2 💬 1 📌 2
@kovacgrgur.bsky.social
PhD student at INRIA in the Flowers team. https://grgkovac.github.io Twitter: @KovacGrgur
🚀 Introducing 🧭MAGELLAN, our new metacognitive framework for LLM agents! It predicts its own learning progress (LP) in vast natural language goal spaces, enabling efficient exploration of complex domains. 🌍✨ Learn more: 🔗 arxiv.org/abs/2502.07709 #OpenEndedLearning #LLM #RL
24.03.2025 15:09 — 👍 9 🔁 3 💬 1 📌 4
The leaderboard is explained in our previous tweet (haven't transferred it to Bluesky yet) 😐:
x.com/KovacGrgur/s...
Llama 3.3 is great, but Nemotron is still the leader on our StickToYourRole Leaderboard!
Nemotron 🥇
Llama 3.3 🥈
huggingface.co/spaces/flowe...
I'm excited to announce that this work has been accepted at @blog.neurips.cc.web.brid.gy 🧠🤖 We hope to spark conversations on goal selection in biological and artificial agents.
Check it out at openreview.net/forum?id=Gbq...
With Cédric Colas, Pierre-Yves Oudeyer, & Anne Collins
🚨New preprint🚨
When testing LLMs with questions, how can we know they did not see the answers during training? In this new paper with @stepalminteri.bsky.social and Pierre-Yves Oudeyer, we propose a simple, fast, out-of-the-box method to spot contamination in short texts!