
@cartathomas.bsky.social

24 Followers  |  67 Following  |  12 Posts  |  Joined: 22.03.2025  |  1.9473

Latest posts by cartathomas.bsky.social on Bluesky

🔔 Join our MAGELLAN talk on July 2!

We'll explore how LLM agents can monitor their own learning progress and choose what to learn next, like curious humans 🤔

1h presentation + 1h Q&A on autotelic agents & more!

📅 July 2, 4:30 PM CEST
🎟️ forms.gle/1PC2fxJx1PZYfqFr7

25.06.2025 15:14 | 👍 1    🔁 1    💬 0    📌 1

🚨New preprint🚨
When testing LLMs with questions, how can we know they did not see the answer in their training? In this new paper, written with @stepalminteri.bsky.social and Pierre-Yves Oudeyer, we propose a simple, fast, out-of-the-box method to spot contamination on short texts!

15.11.2024 13:47 | 👍 9    🔁 4    💬 1    📌 0

🧭MAGELLAN builds on many works that use LP to drive automatic curriculum learning, e.g. by @rockt.ai @egrefen.bsky.social @jeffclune.com @tomssilver.bsky.social @tambetm.bsky.social @jrsrichmond.bsky.social @ryanpsullivan.bsky.social

24.03.2025 15:09 | 👍 3    🔁 0    💬 0    📌 0

Thanks to @lorisgaven.bsky.social and @clementromac.bsky.social for the fun time doing research on this topic, and huge thanks also to @ccolas.bsky.social, Sylvain Lamprier, Olivier Sigaud and @pyoudeyer.bsky.social for their supervision!!

24.03.2025 15:09 | 👍 1    🔁 0    💬 1    📌 0
GitHub - flowersteam/MAGELLAN: MAGELLAN: Metacognitive predictions of learning progress guide autotelic LLM agents in large goal spaces

github.com/flowersteam...

24.03.2025 15:09 | 👍 1    🔁 0    💬 1    📌 0

Q4: Adaptation to Evolving Goal Spaces
We replaced the entire goal space with unseen goals from the same categories. 🧭MAGELLAN generalized LP and retained exceptional performance, matching baselines that rely on human expertise! 🚀✨

24.03.2025 15:09 | 👍 1    🔁 0    💬 1    📌 0

Q3: Generalization
At the end of training, 🧭MAGELLAN has structured the goal embedding space, consistently predicting success probability for unseen goals, a key step toward scalable open-ended learning!

24.03.2025 15:09 | 👍 1    🔁 0    💬 1    📌 0
Evolution of the observed competence (SR) when evaluating policies on 64 training goals per category every 5000 episodes. We report the average SR over evaluated goals along with standard deviation (8 seeds). Icons indicate the average time step at which a method mastered a goal (i.e. SR $> 90\%$). We add stars to MAGELLAN, denoting significantly earlier mastery of a category compared to the method with the star's color (p-value $<8\times10^{-4}$). The dotted line (EK-Online-ALP) indicates that the method relies on expert knowledge.
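
The quantities named in this caption (mean SR with standard deviation across seeds, and the step at which SR first exceeds 90%) can be illustrated with a small sketch. The SR curves below are made-up numbers for 3 seeds rather than the 8 used in the figure, the function names are hypothetical, and the assumption that the i-th evaluation happens after `(i + 1) * 5000` episodes is ours, not taken from the paper.

```python
from statistics import mean, stdev

def summarize(sr_by_seed):
    """Mean and standard deviation of SR across seeds at each evaluation."""
    per_eval = list(zip(*sr_by_seed))
    return [mean(v) for v in per_eval], [stdev(v) for v in per_eval]

def mastery_episode(sr_curve, eval_every=5000, threshold=0.9):
    """Episode of the first evaluation where SR exceeds the threshold, else None."""
    for i, sr in enumerate(sr_curve):
        if sr > threshold:
            # Assumption: evaluation i takes place after (i + 1) * eval_every episodes.
            return (i + 1) * eval_every
    return None

# Hypothetical SR curves, one list per seed, one value per evaluation.
sr_by_seed = [
    [0.10, 0.50, 0.88, 0.95],
    [0.20, 0.60, 0.92, 0.97],
    [0.15, 0.55, 0.87, 0.96],
]

mean_sr, std_sr = summarize(sr_by_seed)

# Mastery step per seed, then averaged over the seeds that reached mastery.
mastery = [mastery_episode(curve) for curve in sr_by_seed]
reached = [m for m in mastery if m is not None]
avg_mastery = mean(reached) if reached else None
```

Averaging per-seed mastery steps is one reading of "average time step"; the authors' exact procedure, including the significance test behind the stars, is not reproduced here.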

Q2: Curriculum Learning
🧭MAGELLAN autonomously discovers goal families (✊🌿🐮🦁) across 25k goals, performing on par with expert knowledge-augmented baselines, but without requiring predefined goal clusters! 🚀

24.03.2025 15:09 | 👍 1    🔁 0    💬 1    📌 0

🎯 Q1: Competence Estimation
🧭MAGELLAN matches expert baselines in estimating competence over tens of thousands of goals, but with minimal cost & error! Unlike other methods, it efficiently tracks competence transfer across large goal spaces.

24.03.2025 15:09 | 👍 1    🔁 0    💬 1    📌 0

We studied 4 scientific questions:
Q1 How does 🧭MAGELLAN's competence estimation compare to classical approaches?
Q2 Can it be used to build an efficient curriculum?
Q3 Can it generalize on unseen goals?
Q4 Can it adapt to an evolving goal space?
Let's dive in! 👇

24.03.2025 15:09 | 👍 1    🔁 0    💬 1    📌 0

By capturing semantic relationships between goals, 🧭MAGELLAN enables efficient LP estimation & adaptive goal prioritization, all without relying on expert-defined groupings! 🔥 #CurriculumLearning

24.03.2025 15:09 | 👍 1    🔁 0    💬 1    📌 0
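
To illustrate why a competence estimator that works on goal representations can share what it learns across related goals, here is a toy sketch. It swaps in a tiny logistic model over hand-made 2-D embeddings; this is not MAGELLAN's architecture. The goal names, the embeddings and the helper names `predict` and `update` are all hypothetical.

```python
import math

def predict(weights, embedding):
    """Estimated success probability for a goal, given its embedding."""
    z = sum(w * x for w, x in zip(weights, embedding))
    return 1.0 / (1.0 + math.exp(-z))

def update(weights, embedding, success, lr=0.5):
    """One gradient step on the log loss for a single observed outcome."""
    error = predict(weights, embedding) - success
    return [w - lr * error * x for w, x in zip(weights, embedding)]

# Hypothetical 2-D embeddings; a real system would use learned representations.
embeddings = {
    "grow carrot": [1.0, 0.1],
    "grow potato": [0.9, 0.2],  # semantically close to "grow carrot"
    "grasp water": [0.0, 1.0],  # unrelated goal
}

weights = [0.0, 0.0]
before = predict(weights, embeddings["grow potato"])

# The agent only practices "grow carrot" and succeeds every time.
for _ in range(20):
    weights = update(weights, embeddings["grow carrot"], 1.0)

after = predict(weights, embeddings["grow potato"])
unrelated = predict(weights, embeddings["grasp water"])
```

After these updates the estimate for the never-practiced "grow potato" rises well above its initial 0.5, while the estimate for "grasp water" moves much less. That is the kind of competence transfer the post describes, obtained here without any predefined grouping of goals.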

Our LLM agent uses 🧭MAGELLAN to estimate past & current competence, computing absolute learning progress (ALP) for each goal. The agent then selects goals that maximize ALP, learning efficiently via online RL. 🚀 #ReinforcementLearning #LLM

24.03.2025 15:09 | 👍 0    🔁 0    💬 1    📌 0
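
As a minimal sketch of the ALP step described in the post above: ALP for a goal is the absolute difference between its current and past competence estimates, and the agent prefers goals where that difference is largest. The goal names and competence values are invented, and a plain argmax stands in for whatever selection rule the authors actually use.

```python
def absolute_learning_progress(past, current):
    """ALP per goal: |current competence - past competence|."""
    return {goal: abs(current[goal] - past[goal]) for goal in current}

def select_goal(alp):
    """Pick the goal with the highest ALP (a simple argmax, assumed here)."""
    return max(alp, key=alp.get)

# Hypothetical competence estimates (success probabilities in [0, 1]).
past = {"grasp water": 0.90, "grow carrot": 0.20, "grow lion": 0.00}
current = {"grasp water": 0.92, "grow carrot": 0.55, "grow lion": 0.00}

alp = absolute_learning_progress(past, current)
chosen = select_goal(alp)
```

Here `chosen` is "grow carrot": the nearly mastered goal and the goal with no progress at all both have ALP close to zero, so neither is prioritized.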

🌍 Humans thrive in open-ended exploration, but AI struggles with infinite goal spaces. Learning progress (LP) helps, but scaling it is tough! 🧭MAGELLAN tackles this by efficiently generalising LP to goals not practiced, allowing the agent to navigate large and complex domains 🚢

24.03.2025 15:09 | 👍 1    🔁 0    💬 1    📌 0
MAGELLAN: Metacognitive predictions of learning progress guide... Open-ended learning agents must efficiently prioritize goals in vast possibility spaces, focusing on those that maximize learning progress (LP). When such autotelic exploration is achieved by LLM...

🚀 Introducing 🧭MAGELLAN, our new metacognitive framework for LLM agents! It predicts its own learning progress (LP) in vast natural language goal spaces, enabling efficient exploration of complex domains. 🌍✨ Learn more: 🔗 arxiv.org/abs/2502.07709 #OpenEndedLearning #LLM #RL

24.03.2025 15:09 | 👍 9    🔁 3    💬 1    📌 4
