
Julien Pourcel

@jul-p.bsky.social

PhD student at INRIA (FLOWERS team) working on LLM4code | Prev. (MVA) ENS Paris-Saclay

13 Followers  |  218 Following  |  11 Posts  |  Joined: 21.11.2024  |  1.664

Latest posts by jul-p.bsky.social on Bluesky

🤗 This project wouldn't have been possible without my incredible co-author team, @ccolas.bsky.social & @pyoudeyer.bsky.social
#LLM #AI #ProgramSynthesis #ICML2025

10.07.2025 16:04 — 👍 1    🔁 0    💬 0    📌 0

I’ll be at ICML next week. Let’s chat if you’re interested in self-improving LLMs, program synthesis, ARC, or related topics.

10.07.2025 16:04 — 👍 0    🔁 0    💬 1    📌 0
Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI

Want to learn more? We've made everything public:

📗 Blog Post: julienp.netlify.app/posts/soar/
🤗 Models (7/14/32/72/123b) & Data: huggingface.co/collections/...
💻 Code: github.com/flowersteam/...
📄 Paper: icml.cc/virtual/2025...

10.07.2025 16:04 — 👍 0    🔁 0    💬 1    📌 0

🚀 **Broader Impact**: This isn't just about ARC puzzles. SOAR's framework could enhance program synthesis tasks where search-based LLM methods are limited by static model capabilities (FunSearch, AlphaEvolve, …).

10.07.2025 16:04 — 👍 0    🔁 0    💬 1    📌 0

🌟 **Test-Time Learning**: Even on new problems, SOAR continues improving by focusing on solutions that work well on the given examples. This enables real-time adaptation to novel challenges.
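A minimal sketch of that test-time selection, assuming candidates are ranked by how many of a task's demonstration pairs they reproduce (the `train_accuracy` helper and the toy candidates are hypothetical, not SOAR's actual code):

```python
def train_accuracy(program, train_pairs):
    """Fraction of a task's demonstration pairs a candidate program solves."""
    hits = 0
    for x, y in train_pairs:
        try:
            hits += program(x) == y
        except Exception:
            pass  # a crashing candidate simply scores 0 on that pair
    return hits / len(train_pairs)

# At test time, rank candidates by fit to the given examples and keep the best.
pairs = [(1, 2), (2, 4), (3, 6)]                 # toy demonstration pairs
candidates = [lambda x: x + 1, lambda x: 2 * x]  # toy candidate "programs"
best = max(candidates, key=lambda p: train_accuracy(p, pairs))
```

Programs that fit the given examples well are the ones kept and learned from, which is what lets adaptation happen on novel problems.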

10.07.2025 16:04 — 👍 0    🔁 0    💬 1    📌 0

📈 **Results**:
- Qwen-7B model: 6% → 36% accuracy
- Qwen-32B model: 13% → 45% accuracy
- Mistral-Large-2: 20% → 46% accuracy
- Combined ensemble: 52% on the ARC-AGI test set
- Outperforms much larger models like o3-mini and Claude-4-Sonnet

10.07.2025 16:04 — 👍 1    🔁 0    💬 1    📌 0

🎯 Key Insight: Failed programs aren't useless! Through "hindsight relabeling," SOAR treats each failed program as the *correct* solution to a different (synthetic) problem. This massively expands the training data diversity.
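In code, the relabeling trick can be sketched like this (a minimal illustration; `hindsight_relabel` and the rotation example are hypothetical stand-ins, not SOAR's actual implementation):

```python
def hindsight_relabel(program, task_inputs):
    """Turn a failed program into a training example for a synthetic task.

    The program did not solve the original task, but it *does* correctly map
    task_inputs to whatever outputs it produces. Relabeling treats those
    (input, program_output) pairs as a new task for which this program is,
    by construction, the correct solution.
    """
    synthetic_outputs = [program(x) for x in task_inputs]
    synthetic_task = list(zip(task_inputs, synthetic_outputs))
    return synthetic_task, program  # (new task spec, its ground-truth solution)

# Toy usage: a grid-rotation program attempted on the wrong puzzle becomes
# the correct answer to a synthetic "rotate the grid" puzzle.
rotate = lambda grid: [list(row) for row in zip(*grid[::-1])]
task, solution = hindsight_relabel(rotate, [[[1, 2], [3, 4]]])
```

Every search attempt thus yields a valid (task, solution) training pair, whether or not it solved the problem it was aimed at.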

10.07.2025 16:04 — 👍 0    🔁 0    💬 1    📌 0

🧠 **The Learning Process**: The system learns TWO skills simultaneously:
- **Sampling**: Generate better initial solutions
- **Refinement**: Enhance initial solutions
We also find that learning both together works better than specializing!

10.07.2025 16:04 — 👍 1    🔁 0    💬 1    📌 0

🔄 SOAR doesn't just search harder — it gets SMARTER. It alternates between:
- Evolutionary search: the LLM samples and refines candidate programs.
- Hindsight learning: the model learns from all its search attempts, successes and failures, to fine-tune its skills for the next round.
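A toy sketch of that alternation (all names here are illustrative stand-ins for real LLM calls, not SOAR's training code):

```python
from dataclasses import dataclass

@dataclass
class ToyModel:
    """Stand-in for an LLM; tracks how many attempts it has trained on."""
    seen: int = 0

    def sample(self, task):
        return [f"guess({task})"]

    def refine(self, task, candidate):
        return f"refined({candidate})"

    def fine_tune(self, attempts):
        return ToyModel(seen=self.seen + len(attempts))

def soar_round(model, tasks):
    """One iteration: evolutionary search, then hindsight fine-tuning."""
    attempts = []
    for task in tasks:
        candidates = model.sample(task)                            # 1. sample programs
        candidates += [model.refine(task, c) for c in candidates]  # 2. refine them
        attempts += [(task, c) for c in candidates]
    return model.fine_tune(attempts)  # 3. learn from every attempt, pass or fail

model = soar_round(ToyModel(), ["t1", "t2"])  # 2 tasks x 2 attempts each
```

The key design point the thread describes: fine-tuning consumes *all* attempts, so the next round searches with a stronger model rather than a fixed one.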

10.07.2025 16:04 — 👍 0    🔁 0    💬 1    📌 0

🔬 Why does this matter? Most coding tasks are too hard for even the best language models to solve in one shot. Traditional search methods help, but they hit a wall because the model’s abilities are fixed. SOAR breaks through this barrier by letting the model improve itself over time.

10.07.2025 16:04 — 👍 0    🔁 0    💬 1    📌 0

Introducing SOAR 🚀, a self-improving framework for program synthesis that alternates between search and learning (accepted to #ICML!)

It brings LLMs from just a few percent on ARC-AGI-1 up to 52%

We’re releasing the fine-tuned LLMs, a dataset of 5M generated programs, and the code.

🧡

10.07.2025 16:04 — 👍 1    🔁 0    💬 2    📌 1
