@hamongautier.bsky.social (PhD student at INRIA Flowers team, MVA master) reytuag.github.io/gautier-hamon/

Complex cell-like structures in Flow Lenia
02.04.2025 11:52

Introducing MAGELLAN, our new metacognitive framework for LLM agents! It predicts its own learning progress (LP) in vast natural language goal spaces, enabling efficient exploration of complex domains. ✨ Learn more: arxiv.org/abs/2502.07709 #OpenEndedLearning #LLM #RL
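For intuition only (MAGELLAN itself learns to *predict* LP over a language goal space; see the paper), here is a minimal JAX sketch of the generic learning-progress signal: the change in empirical competence on a goal between two successive windows. The function and history below are illustrative, not from the paper.

```python
# Generic learning-progress (LP) sketch, NOT MAGELLAN's learned predictor:
# LP is estimated as the change in success rate between two windows.
import jax.numpy as jnp

def absolute_learning_progress(successes, window=100):
    recent = jnp.mean(successes[-window:])              # competence now
    earlier = jnp.mean(successes[-2 * window:-window])  # competence before
    return jnp.abs(recent - earlier)                    # |ALP| for this goal

# Hypothetical success history (1 = success): the agent improves over time
history = jnp.concatenate([jnp.zeros(100), jnp.ones(100)])
print(absolute_learning_progress(history))  # -> 1.0, a high-LP goal
```

Goals with high LP would then be sampled more often, which is the kind of exploration signal MAGELLAN scales to vast goal spaces.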
24.03.2025 15:09

We are recruiting interns for a few projects with @pyoudeyer in Bordeaux:
> studying LLM-mediated cultural evolution with @nisioti_eleni and @Jeremy__Perez
> balancing exploration and exploitation with autotelic RL with @ClementRomac
Details and links in 🧵. Please share!
8/ For the curious, here are the achievement success rates on Craftax across training, after 1e9 steps (left) and 4e9 steps (right).
[Figures: achievement success rates on Craftax with TransformerXL-PPO, after 1e9 steps (left) and 4e9 steps (right)]
22.11.2024 10:15

7/ The JAX ecosystem in RL is currently blooming with wonderful open-source projects from others, which I have linked at the bottom of the repository: github.com/Reytuag/tran...
This work was done at @FlowersINRIA.
Also, feel free to reach out if you have questions or suggestions!
6/ Potential next steps could be to test it on XLand-MiniGrid, an open-ended meta-RL environment: github.com/dunnolab/xla...
I'm also curious to implement Muesli (arxiv.org/abs/2104.06159) with TransformerXL, as in arxiv.org/abs/2301.07608.
5/ Here is the training curve obtained from training for 1e9 steps, alongside the PPO and PPO-RNN scores provided in the Craftax repo.
Note that PPO-RNN was already beating other baselines that use Unsupervised Environment Design and intrinsic motivation: arxiv.org/pdf/2402.16801
4/ Testing it on the challenging Craftax from github.com/MichaelTMatt... (with little hyperparameter tuning), it obtained higher returns in 1e9 steps than PPO-RNN.
Training it for longer led to reaching the 3rd floor in Craftax, making it the first method to obtain advanced achievements.
3/ Training a 3M-parameter Transformer for 1e6 steps on MemoryChain-bsuite (from gymnax) takes 10 s on an A100 (with 512 environments).
Training a 5M-parameter Transformer for 1e9 steps on Craftax takes ~6 h on a single A100 (with 1024 environments).
We also support multi-GPU training.
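For readers wondering where the speed comes from: the whole rollout stays on-device as one jit-compiled program. A toy sketch of the pattern (illustrative environment and names, not the repo's actual code):

```python
# Toy sketch (NOT the repo's code) of the pattern that makes JAX RL fast:
# a pure env step, vmapped over parallel envs, scanned over the horizon,
# all inside one jit-compiled call.
import jax
import jax.numpy as jnp

def toy_env_step(state, action):
    """Hypothetical env: reward 1.0 when the action matches the state's parity."""
    reward = jnp.where(action == state % 2, 1.0, 0.0)
    return state + 1, reward

@jax.jit
def rollout(init_states, actions):
    # vmap vectorizes the step across envs; scan unrolls the time dimension
    return jax.lax.scan(jax.vmap(toy_env_step), init_states, actions)

n_envs, horizon = 1024, 128
states = jnp.zeros(n_envs, dtype=jnp.int32)
actions = jnp.zeros((horizon, n_envs), dtype=jnp.int32)
_, rewards = rollout(states, actions)
print(rewards.shape)  # (128, 1024): one reward per step per env
```

Replicating this over devices (e.g. with jax.pmap) would then give the multi-GPU version with essentially the same code.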
2/ We implement TransformerXL-PPO following "Stabilizing Transformers for Reinforcement Learning": arxiv.org/abs/1910.06764
The code follows the template from PureJaxRL: github.com/luchris429/p...
⚡️ Training is fast thanks to JAX.
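The paper's key stabilizer (GTrXL) replaces each residual connection with a GRU-style gate biased toward the identity at initialization. A minimal sketch of that gate; the parameter names and shapes are illustrative, not this repo's modules:

```python
# GRU-style gating from "Stabilizing Transformers for RL" (GTrXL), replacing
# the usual residual connection. Illustrative parameters, not the repo's code.
import jax
import jax.numpy as jnp

def gru_gate(params, x, y):
    """x: sublayer input (skip stream); y: sublayer output (e.g. attention)."""
    r = jax.nn.sigmoid(x @ params["Ur"] + y @ params["Wr"])                 # reset
    z = jax.nn.sigmoid(x @ params["Uz"] + y @ params["Wz"] - params["bg"])  # update
    h = jnp.tanh((r * x) @ params["Ug"] + y @ params["Wg"])                 # candidate
    # With bias bg > 0, z starts near 0, so the gate initially acts like an
    # identity skip connection, which stabilizes early RL training.
    return (1.0 - z) * x + z * h

d = 64
keys = jax.random.split(jax.random.PRNGKey(0), 8)
params = {n: 0.02 * jax.random.normal(k, (d, d))
          for n, k in zip(["Ur", "Wr", "Uz", "Wz", "Ug", "Wg"], keys[:6])}
params["bg"] = 2.0 * jnp.ones(d)     # bias toward identity at init
x = jax.random.normal(keys[6], (d,))  # block input
y = jax.random.normal(keys[7], (d,))  # attention/MLP sublayer output
print(gru_gate(params, x, y).shape)   # (64,)
```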
1/ ⚡️ Looking for a fast and simple Transformer baseline for your RL environment in JAX?
Sharing my implementation of TransformerXL-PPO: github.com/Reytuag/tran...
The implementation is the first to attain the 3rd floor and obtain advanced achievements in the challenging Craftax.
The video encoding might not do it full justice.
Paper: direct.mit.edu/isal/proceed...
Putting some Flow Lenia here too
22.11.2024 09:51

Now that @jeffclune.bsky.social and @joelbot3000.bsky.social are here, it's time for an Open-Endedness starter pack.
go.bsky.app/MdVxrtD
🚨 New preprint 🚨
When testing LLMs with questions, how can we know they did not see the answer in their training data? In this new paper, we propose a simple, fast, out-of-the-box method to spot contamination on short texts, with @stepalminteri.bsky.social and Pierre-Yves Oudeyer!