
Ruiyi Wang

@ruiyiwang.bsky.social

2nd year PhD at UCSD w/ @rajammanabrolu.bsky.social Prev: @ltiatcmu.bsky.social @umich.edu Research: Agents🤖, Reasoning🧠, Games👾

911 Followers  |  1,991 Following  |  10 Posts  |  Joined: 18.11.2024

Latest posts by ruiyiwang.bsky.social on Bluesky

A Practitioner's Guide to Multi-turn Agentic Reinforcement Learning We study what actually works and what doesn't for training large language models as agents via multi-turn reinforcement learning. Despite rapid progress, existing frameworks and definitions are fragme...

(🧵6/6)
🌟Huge thanks to my advisor @rajammanabrolu.bsky.social for his invaluable guidance throughout this work! 🙏

Questions/feedback welcome below 👇

🖇️Paper: arxiv.org/abs/2510.01132
💻Code: github.com/pearls-lab/m...

26.10.2025 21:36 — 👍 1    🔁 0    💬 0    📌 0
GitHub - pearls-lab/meow-tea-taro: A Practitioner's Guide to M(eow)ti Turn Agentic ReinfOrcement learning

(🧵5/6)
To help the community, we're releasing 🐈🍵Meow-Tea-Taro 💜: a modular framework where you can configure 🌎environments, 🤖policies, and ⭐rewards.

We also provide our recipes, analyses, and tutorials on building agentic multi-turn RL pipelines in the codebase.

Code: github.com/pearls-lab/m...

26.10.2025 21:36 — 👍 1    🔁 0    💬 1    📌 0

(🧵4/6)
⭐Reward:

Dense rewards significantly improve multi-turn RL performance, with optimal density varying by RL algorithm.
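As a rough illustration of what reward density means in a multi-turn episode, the sketch below compares per-turn returns-to-go for a sparse reward (signal only on the final turn) and a dense one (signal on every turn). The reward values, the `gamma` default, and the function name are hypothetical and are not taken from the paper or the Meow-Tea-Taro codebase.

```python
def discounted_returns(turn_rewards, gamma=0.99):
    """Return-to-go for each turn: G_t = r_t + gamma * G_{t+1}."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each turn sees the discounted future reward.
    for reward in reversed(turn_rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Hypothetical 4-turn episode with the same total reward in both settings.
sparse_rewards = [0.0, 0.0, 0.0, 1.0]      # only the final turn is rewarded
dense_rewards = [0.25, 0.25, 0.25, 0.25]   # every turn is rewarded

print(discounted_returns(sparse_rewards, gamma=1.0))
print(discounted_returns(dense_rewards, gamma=1.0))
```

With a sparse reward every turn receives the same return, so the learner cannot tell from the return alone which turn contributed; with a dense reward the return differs per turn.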

26.10.2025 21:36 — 👍 1    🔁 0    💬 1    📌 0

(🧵3/6)
🤖Policy:

1. Good SFT priors achieve the same performance with fewer RL episodes; however, RL is needed for generalization.
2. Given a fixed compute budget, there's an optimal SFT:RL data ratio.
3. Both PPO/GRPO (biased) and RLOO (unbiased) methods achieve improvements over base models.
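For readers unfamiliar with the methods named in point 3, here is a minimal sketch of how RLOO and GRPO-style advantages are commonly computed from a group of sampled episodes for the same task. It assumes one scalar reward per episode and uses the population standard deviation for the GRPO-style normalization; both choices, and the function names, are assumptions for illustration and may differ from the paper's implementation.

```python
from statistics import mean, pstdev


def rloo_advantages(rewards):
    """Leave-one-out: each sample's baseline is the mean reward of the other samples."""
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per task")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


def group_normalized_advantages(rewards, eps=1e-8):
    """GRPO-style: subtract the group mean and divide by the group standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some code uses sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four episodes sampled on the same task.
episode_rewards = [1.0, 0.0, 0.0, 1.0]
print(rloo_advantages(episode_rewards))
print(group_normalized_advantages(episode_rewards))
```

In the leave-one-out version, a sample's own reward never enters its baseline, which is the usual argument for why the resulting gradient estimate stays unbiased.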

26.10.2025 21:36 — 👍 1    🔁 0    💬 1    📌 0

(🧵2/6) Here are some key takeaways:
🌎Environment:

1. Agents trained on simpler environments can generalize to more complex environments.
2. Agents trained on a subset of tasks can generalize to unseen tasks.

26.10.2025 21:36 — 👍 1    🔁 0    💬 1    📌 0

🔥Excited to share our new work: "A Practitioner's Guide to Multi-turn Agentic Reinforcement Learning"!

We study what actually works for agentic multi-turn RL with varying 🌎Environment, 🤖Policy, and ⭐Reward.

We conduct various ablations and empirical analysis on 🧩TextWorld, 🧙ALFWorld, and 🧑‍💻SWE-Gym.

26.10.2025 21:36 — 👍 9    🔁 2    💬 1    📌 0

Would love to be added! I’m a PhD student at UCSD. Thank you!

12.12.2024 11:59 — 👍 3    🔁 0    💬 0    📌 0

Would love to be added! I’m a PhD student at UCSD working on LLM agents. Thank you!

12.12.2024 11:54 — 👍 1    🔁 0    💬 0    📌 0

Could you add me please? I’m a PhD student working on NLP at UCSD. Thank you so much!

11.12.2024 14:36 — 👍 1    🔁 0    💬 1    📌 0

Could you add me? Thanks!

21.11.2024 18:37 — 👍 1    🔁 0    💬 0    📌 0

Started a SoCal AI/ML/NLP researchers starter pack! It's a bit sparse right now, and perhaps more NLP heavy, but hey, nominate yourself and others! go.bsky.app/6QckPj9

19.11.2024 15:28 — 👍 43    🔁 8    💬 17    📌 1
