
Gautier Hamon

@hamongautier.bsky.social

PhD student in the INRIA Flowers team. MVA master's. reytuag.github.io/gautier-hamon/

52 Followers  |  80 Following  |  11 Posts  |  Joined: 21.11.2024

Latest posts by hamongautier.bsky.social on Bluesky



Complex cell-like structures in Flow Lenia

02.04.2025 11:52 β€” πŸ‘ 8    πŸ” 2    πŸ’¬ 0    πŸ“Œ 0
MAGELLAN: Metacognitive predictions of learning progress guide... Open-ended learning agents must efficiently prioritize goals in vast possibility spaces, focusing on those that maximize learning progress (LP). When such autotelic exploration is achieved by LLM...

πŸš€ Introducing 🧭MAGELLANβ€”our new metacognitive framework for LLM agents! It predicts its own learning progress (LP) in vast natural language goal spaces, enabling efficient exploration of complex domains.🌍✨Learn more: πŸ”— arxiv.org/abs/2502.07709 #OpenEndedLearning #LLM #RL

24.03.2025 15:09 β€” πŸ‘ 9    πŸ” 3    πŸ’¬ 1    πŸ“Œ 4

We are recruiting interns for a few projects with @pyoudeyer in Bordeaux:
> studying LLM-mediated cultural evolution with @nisioti_eleni and @Jeremy__Perez

> balancing exploration and exploitation with autotelic RL with @ClementRomac

Details and links in 🧵
Please share!

27.11.2024 17:43 β€” πŸ‘ 6    πŸ” 6    πŸ’¬ 1    πŸ“Œ 0
1e9 steps on craftax with transformerXL PPO

4e9 steps on craftax with transformerXL PPO

8/ For the curious, here are the achievement success rates on Craftax across training, for 1e9 steps (left) and 4e9 steps (right).

22.11.2024 10:15 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
GitHub - Reytuag/transformerXL_PPO_JAX

7/ The JAX ecosystem in RL is currently blooming with wonderful open-source projects from others, which I linked at the bottom of the repository. github.com/Reytuag/tran...
This work was done at @FlowersINRIA.
Also feel free to reach out if you have questions or suggestions!

22.11.2024 10:15 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

6/ Potential next steps: test it on XLand-MiniGrid, an open-ended meta-RL environment: github.com/dunnolab/xla...
I'm also curious to implement Muesli (arxiv.org/abs/2104.06159) with TransformerXL, as in arxiv.org/abs/2301.07608

22.11.2024 10:15 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

5/ Here is the training curve from 1e9 steps of training, alongside the PPO and PPO-RNN scores reported in the Craftax repo.
Note that PPO-RNN was already beating other baselines that use Unsupervised Environment Design and intrinsic motivation. arxiv.org/pdf/2402.16801

22.11.2024 10:15 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
GitHub - MichaelTMatthews/Craftax: (Crafter + NetHack) in JAX. ICML 2024 Spotlight.

4/ Testing it on the challenging Craftax from github.com/MichaelTMatt... (with little hyperparameter tuning), it obtained higher returns in 1e9 steps than PPO-RNN.
Training it for longer led it to reach the 3rd floor in Craftax, making it the first method to obtain advanced achievements.

22.11.2024 10:15 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

3/ Training a 3M-parameter Transformer for 1e6 steps in MemoryChain-bsuite (from gymnax) takes ~10 s on an A100 (with 512 envs).
Training a 5M-parameter Transformer for 1e9 steps in Craftax takes ~6 h on a single A100 (with 1024 envs).
We also support multi-GPU training.
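As a back-of-the-envelope check on those throughput figures (assuming the 6 h / 1e9-step / 1024-env numbers above):

```python
total_steps = 1e9   # total environment steps in the Craftax run
hours = 6           # reported wall-clock time on one A100
envs = 1024         # parallel environments

steps_per_sec = total_steps / (hours * 3600)  # aggregate env-steps per second
steps_per_env = total_steps / envs            # sequential steps each env runs

print(f"{steps_per_sec:,.0f} env-steps/s")   # ~46,296 env-steps/s
print(f"{steps_per_env:,.0f} steps per env")  # ~976,562 steps per env
```

So the 1e9-step figure corresponds to roughly 46k aggregate environment steps per second, which is the kind of throughput that makes these runs practical on a single GPU.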

22.11.2024 10:15 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Stabilizing Transformers for Reinforcement Learning Owing to their ability to both effectively integrate information over long time horizons and scale to massive amounts of data, self-attention architectures have recently shown breakthrough success in ...

2/ We implement TransformerXL-PPO following "Stabilizing Transformers for Reinforcement Learning": arxiv.org/abs/1910.06764
The code follows the template from PureJaxRL: github.com/luchris429/p...
⚡️Training is fast thanks to JAX
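The key trick in "Stabilizing Transformers for RL" (GTrXL) is replacing the residual connections around each sublayer with a GRU-style gate whose bias keeps the layer close to an identity map at initialization. An elementwise sketch in plain Python (the scalar weight `w` stands in for the learned matrices a real JAX implementation would use):

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_gate(x, y, w=0.1, bg=2.0):
    """GRU-type gating from GTrXL, applied elementwise.
    x: stream input (residual branch), y: sublayer output.
    bg > 0 biases the update gate z toward 0, so the layer
    starts out passing x through almost unchanged."""
    out = []
    for xi, yi in zip(x, y):
        r = sigmoid(w * yi + w * xi)               # reset gate
        z = sigmoid(w * yi + w * xi - bg)          # update gate, biased low
        h = math.tanh(w * yi + w * (r * xi))       # candidate state
        out.append((1 - z) * xi + z * h)           # mostly xi when z ~ 0
    return out

x = [1.0, -0.5, 2.0]
y = [0.3, 0.7, -1.2]
print(gru_gate(x, y))  # stays close to x because of the gate bias
```

Starting near the identity is what stabilizes early training: gradients flow through the stream unimpeded until the gates learn to let the attention sublayers contribute.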

22.11.2024 10:15 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

1/ ⚡️Looking for a fast and simple Transformer baseline for your RL environment in JAX?
Sharing my implementation of TransformerXL-PPO: github.com/Reytuag/tran...
The implementation is the first to reach the 3rd floor and obtain advanced achievements in the challenging Craftax.
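What makes TransformerXL (rather than a vanilla Transformer) suited to long RL episodes is its segment-level recurrence: hidden states from earlier segments are cached without gradients and prepended to the attention keys/values of the current segment. A toy sketch of just the cache bookkeeping, with string placeholders instead of real hidden-state arrays and no actual attention math:

```python
def update_memory(memory, hidden, mem_len):
    """Append the current segment's hidden states to the cache and keep
    only the last `mem_len` timesteps. In a real implementation the cache
    is treated as a constant (gradients stopped)."""
    combined = memory + hidden
    return combined[-mem_len:]

def attention_context(memory, hidden):
    # Queries come from the current segment only;
    # keys/values span the cache plus the current segment.
    return memory + hidden

memory = []  # starts empty
for segment in [["h0", "h1"], ["h2", "h3"], ["h4", "h5"]]:
    context = attention_context(memory, segment)
    memory = update_memory(memory, segment, mem_len=3)

print(context)  # ['h1', 'h2', 'h3', 'h4', 'h5']: last segment sees the cache too
print(memory)   # ['h3', 'h4', 'h5']: rolling window of recent states
```

The effective context thus grows linearly with depth while each forward pass only processes one short segment, which is what keeps rollouts cheap.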

22.11.2024 10:15 β€” πŸ‘ 3    πŸ” 1    πŸ’¬ 1    πŸ“Œ 0

The video encoding might not do it full justice.
Paper: direct.mit.edu/isal/proceed...

22.11.2024 10:04 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
22.11.2024 10:00 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Putting some Flow Lenia here too

22.11.2024 09:51 β€” πŸ‘ 4    πŸ” 1    πŸ’¬ 1    πŸ“Œ 0

Now that @jeffclune.bsky.social and @joelbot3000.bsky.social are here, time for an Open-Endedness starter pack.

go.bsky.app/MdVxrtD

20.11.2024 07:08 β€” πŸ‘ 105    πŸ” 32    πŸ’¬ 16    πŸ“Œ 5

🚨New preprint🚨
When testing LLMs with questions, how can we know they did not see the answer during training? In this new paper, we propose a simple, fast, out-of-the-box method to spot contamination on short texts, with @stepalminteri.bsky.social and Pierre-Yves Oudeyer!
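One common family of contamination checks, not necessarily the paper's method, flags text that the tested model likes abnormally much relative to a reference model, since memorized passages get unusually high per-token log-likelihoods. A toy sketch with hand-picked stand-in values:

```python
def contamination_score(logprobs_target, logprobs_reference):
    """Generic memorization signal: mean per-token log-likelihood gap
    between the tested model and a reference model on the same text.
    A strongly positive gap suggests the tested model saw the text."""
    n = len(logprobs_target)
    return sum(t - r for t, r in zip(logprobs_target, logprobs_reference)) / n

# Toy values: a memorized ("seen") text vs. a fresh one.
seen = contamination_score([-0.1, -0.2, -0.1], [-2.0, -1.8, -2.2])
fresh = contamination_score([-1.9, -2.1, -2.0], [-2.0, -1.8, -2.2])
print(seen > fresh)  # True: the memorized text scores much higher
```

Using a gap against a reference model rather than raw likelihood controls for text that is simply easy to predict.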

15.11.2024 13:47 β€” πŸ‘ 9    πŸ” 4    πŸ’¬ 1    πŸ“Œ 0
