Join our MAGELLAN talk on July 2!
We'll explore how LLM agents can monitor their own learning progress and choose what to learn next, like curious humans 🤖
1h presentation + 1h Q&A on autotelic agents & more!
📅 July 2, 4:30 PM CEST
🎟️ forms.gle/1PC2fxJx1PZYfqFr7
25.06.2025 15:14 · 👍 1 🔁 1 💬 0 📌 1
🚨 New preprint 🚨
When testing LLMs with questions, how can we know they did not see the answers during training? In this new paper with @stepalminteri.bsky.social and Pierre-Yves Oudeyer, we propose a simple, fast, out-of-the-box method to spot contamination on short texts!
15.11.2024 13:47 · 👍 9 🔁 4 💬 1 📌 0
🧭MAGELLAN builds on many works that use LP to drive automatic curriculum learning, e.g. by @rockt.ai @egrefen.bsky.social @jeffclune.com @tomssilver.bsky.social @tambetm.bsky.social @jrsrichmond.bsky.social @ryanpsullivan.bsky.social
24.03.2025 15:09 · 👍 3 🔁 0 💬 0 📌 0
Thanks to @lorisgaven.bsky.social and @clementromac.bsky.social for the fun time doing research on this topic, and huge thanks also to @ccolas.bsky.social, Sylvain Lamprier, Olivier Sigaud and @pyoudeyer.bsky.social for their supervision!!
24.03.2025 15:09 · 👍 1 🔁 0 💬 1 📌 0
Q4: Adaptation to Evolving Goal Spaces
We replaced the entire goal space with unseen goals from the same categories. 🧭MAGELLAN generalized LP and retained exceptional performance, matching baselines that rely on human expertise! ✨
24.03.2025 15:09 · 👍 1 🔁 0 💬 1 📌 0
Q3: Generalization
At the end of training, 🧭MAGELLAN has structured the goal embedding space, consistently predicting success probability for unseen goals, a key step toward scalable open-ended learning!
24.03.2025 15:09 · 👍 1 🔁 0 💬 1 📌 0
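To make the generalization claim concrete, here is a minimal sketch of how one could check it: compare the predictor's success-probability estimates against empirically observed success rates on goals held out from training. The helper names (`predict_success`, `rollout`) are assumptions for illustration, not functions from the paper.

```python
# Hypothetical check of generalization to unseen goals: compare the
# predicted success probability with the success rate observed over
# fresh rollouts. `predict_success` and `rollout` are assumed stubs.
from statistics import mean

def evaluate_generalization(predict_success, rollout, unseen_goals, n_trials=32):
    """Mean absolute error between predicted and observed success rates."""
    errors = []
    for goal in unseen_goals:
        p_hat = predict_success(goal)  # predictor's estimate in [0, 1]
        observed = mean(rollout(goal) for _ in range(n_trials))  # 1.0 on success else 0.0
        errors.append(abs(p_hat - observed))
    return mean(errors)
```

A low error on goals the predictor never trained on is exactly the "structured goal embedding space" effect the post describes.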
Evolution of the observed competence (SR) when evaluating policies on 64 training goals per category every 5000 episodes. We report the average SR over evaluated goals along with standard deviation (8 seeds). Icons indicate the average time step at which a method mastered a goal (i.e. SR > 90%). We add stars to MAGELLAN, denoting significantly earlier mastery of a category compared to the method with the star's color (p-value < 8×10⁻⁴). The dotted line (EK-Online-ALP) indicates that the method relies on expert knowledge.
Q2: Curriculum Learning
🧭MAGELLAN autonomously discovers goal families across tens of thousands of goals, performing on par with expert-knowledge-augmented baselines, but without requiring predefined goal clusters!
24.03.2025 15:09 · 👍 1 🔁 0 💬 1 📌 0
🎯 Q1: Competence Estimation
🧭MAGELLAN matches expert baselines in estimating competence over tens of thousands of goals, but with minimal cost & error! Unlike other methods, it efficiently tracks competence transfer across large goal spaces.
24.03.2025 15:09 · 👍 1 🔁 0 💬 1 📌 0
We studied 4 scientific questions:
Q1 How does 🧭MAGELLAN's competence estimation compare to classical approaches?
Q2 Can it be used to build an efficient curriculum?
Q3 Can it generalize to unseen goals?
Q4 Can it adapt to an evolving goal space?
Let's dive in! 👇
24.03.2025 15:09 · 👍 1 🔁 0 💬 1 📌 0
By capturing semantic relationships between goals, 🧭MAGELLAN enables efficient LP estimation & adaptive goal prioritization, all without relying on expert-defined groupings! 🔥 #CurriculumLearning
24.03.2025 15:09 · 👍 1 🔁 0 💬 1 📌 0
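For the curious, a minimal sketch of the general idea, not the authors' exact architecture: a small trainable head on top of frozen LLM goal embeddings predicts success probability per goal, so semantically similar goals share estimates. The `embed` helper (goal text to tensor) is an assumption.

```python
# Sketch: predict per-goal success probability from a frozen LLM goal
# embedding, trained online from binary outcomes. Because the prediction
# is a smooth function of the embedding, estimates transfer between
# semantically similar goals.
import torch
import torch.nn as nn

class CompetenceHead(nn.Module):
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, goal_emb: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(goal_emb)).squeeze(-1)  # P(success)

def update(head, opt, goal_emb, success):
    """One online update from a single (goal, outcome) pair."""
    loss = nn.functional.binary_cross_entropy(head(goal_emb), success)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Usage (hypothetical):
# head = CompetenceHead(dim=768)
# opt = torch.optim.Adam(head.parameters(), lr=1e-3)
# update(head, opt, embed("open the door"), torch.tensor(1.0))
```

Estimating LP then only requires comparing this head's current predictions with its past ones, which is the mechanism sketched after the next post.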
Our LLM agent uses 🧭MAGELLAN to estimate past & current competence, computing absolute learning progress (ALP) for each goal. The agent then selects goals that maximize ALP, learning efficiently via online RL. #ReinforcementLearning #LLM
24.03.2025 15:09 · 👍 0 🔁 0 💬 1 📌 0
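The selection rule from the post above, as a hedged sketch: ALP taken as the absolute difference between current and past competence estimates, with proportional sampling rather than a strict argmax and an ε-uniform exploration term. Both of those softenings are assumptions of this sketch, not something stated in the thread.

```python
# Sketch of ALP-based goal selection. `c_now` and `c_past` map each goal
# to a success-probability estimate from the current and an earlier
# competence predictor, respectively (both assumed given).
import random

def alp(c_now: dict, c_past: dict) -> dict:
    # |current - past|: prioritizes both goals being learned and goals
    # being forgotten (re-learning counts as progress too)
    return {g: abs(c_now[g] - c_past[g]) for g in c_now}

def select_goal(c_now: dict, c_past: dict, eps: float = 0.1):
    """Sample a goal proportionally to ALP, with eps-uniform exploration."""
    scores = alp(c_now, c_past)
    if random.random() < eps or sum(scores.values()) == 0:
        return random.choice(list(scores))  # uniform fallback
    goals, weights = zip(*scores.items())
    return random.choices(goals, weights=weights, k=1)[0]
```

The chosen goal is then practiced with online RL, its outcome updates the competence predictor, and the loop repeats.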
Humans thrive in open-ended exploration, but AI struggles with infinite goal spaces. Learning progress (LP) helps, but scaling it is tough! 🧭MAGELLAN tackles this by efficiently generalising LP to goals not yet practiced, allowing the agent to navigate large and complex domains 🚢
24.03.2025 15:09 · 👍 1 🔁 0 💬 1 📌 0
Introducing 🧭MAGELLAN, our new metacognitive framework that lets LLM agents predict their own learning progress (LP) in vast natural language goal spaces, enabling efficient exploration of complex domains. ✨ Learn more: arxiv.org/abs/2502.07709 #OpenEndedLearning #LLM #RL
24.03.2025 15:09 · 👍 9 🔁 3 💬 1 📌 4
Researcher in Machine Learning and Genetics. Here to explore projects in ALife and machine learning - particularly interested in self organising systems and interpretability! (he/him)
PhD Candidate at the University of Maryland researching reinforcement learning and autocurricula in complex, open-ended environments.
Previously RL intern @ SonyAI, RLHF intern @ Google Research, and RL intern @ Amazon Science
Research Scientist at 🤗 @huggingface, PhD student at @FlowersINRIA.
Studying how autonomous Deep RL agents 🤖 can leverage LLMs
Also playing bass 🎸
Reinforcement learning researcher, dabbled in robotics, and generative techniques that were later made out of date by diffusion. Currently at Sony AI, working on game AI
Computational cognitive scientist interested in learning and decision-making in humans and machines
Research director of the Human Reinforcement Learning team
Ecole Normale Supรฉrieure (ENS)
Institut National de la Santรฉ et Recherche Mรฉdicale (INSERM)
PhD student working on the cognition of LLMs | HRL team - ENS Ulm | FLOWERS - Inria Bordeaux
Building autotelic agents from socio-cultural interactions
https://ccolas.github.io/
A LLN - large language Nathan - (RL, RLHF, society, robotics), athlete, yogi, chef
Writes http://interconnects.ai
At Ai2 via HuggingFace, Berkeley, and normal places
22
Bolton
Singer/Songwriter heard on BBC Radio, Radio XS Manchester and Radio X
Research director @Inria, Head of @flowersInria lab, prev. @MSFTResearch @SonyCSLParis
Artificial intelligence, cognitive sciences, sciences of curiosity, language, self-organization, autotelic agents, education, AI and society
http://www.pyoudeyer.com
Professor at NYU; Chief AI Scientist at Meta.
Researcher in AI, Machine Learning, Robotics, etc.
ACM Turing Award Laureate.
http://yann.lecun.com
I work at Sakana AI 🐟🐠🐡 · @sakanaai.bsky.social
https://sakana.ai/careers
Professor of Computer Science at Oxford. Senior Staff Research Scientist at Waymo.
Staff research scientist at Google DeepMind. AI and neuro.
Former physicist, current human.
Find more at www.janexwang.com
Staff Research Scientist at Google DeepMind. Artificial and biological brains 🤖 🧠
FR/US/GB AI/ML Person, Director of Research at Google DeepMind, Honorary Professor at UCL DARK, ELLIS Fellow. Ex Oxford CS, Meta AI, Cohere.