This was a collaborative work with Siddhant Agarwal, @pranayajajoo.bsky.social, Samyak Parajuli, Caleb Chuck, Max Rudolph, Peter Stone, Amy Zhang, and @scottniekum.bsky.social. It was also a joint effort across universities: @texasrobotics.bsky.social, UMass Amherst, and U Alberta.
11.12.2024 07:12
RL Zero: Zero-Shot Language to Behaviors without any Supervision
Rewards remain an uninterpretable way to specify tasks for Reinforcement Learning, as humans are often unable to predict the optimal behavior of any given reward function, leading to poor reward design...
(10/n) To the best of our knowledge, this is the first zero-shot, end-to-end, unsupervised algorithm that provides a pathway from language to low-level control.
Check out the work here for more details:
Paper: arxiv.org/abs/2412.05718
Website: hari-sikchi.github.io/rlzero/
11.12.2024 07:11
(9/n) For instance,
a) future approaches can instantly initialize a behavior from a prompt and then finetune it,
b) or plan in language space and translate each instruction into low-level control,
c) and as generative video models improve (e.g., Sora), RL Zero will only get better.
11.12.2024 07:11
(8/n) Zero-shot = no inference-time training (no costly or unsafe RL training during inference)
+
Unsupervised = no costly dataset labeling (a big issue for robotics!)
Together, these make a promising recipe for scaling up robot learning.
11.12.2024 07:11
(7/n) This project is close to my heart: it realizes a dream I shared with @scottniekum.bsky.social when I started my PhD, to go beyond imitation learning that merely matches observations and instead capture a semantic understanding of what doing a task means.
11.12.2024 07:11
(6/n) With RL Zero, you can just pass in a YouTube video and ask an agent to mimic the behavior instantly. This brings us closer to true zero-shot cross-embodiment transfer.
11.12.2024 07:11
(5/n) RL Zero's Prompt to Policy: asking a humanoid agent to perform a headstand.
11.12.2024 07:11
(4/n) Reward is an inconvenient and easily hackable form of task specification. Now, we can prompt and obtain behaviors zero-shot with language. Example: Asking a walker agent to perform a cartwheel.
11.12.2024 07:11
(3/n) Given a text prompt, RL Zero imagines the expected behavior of the agent using generative video models. The imaginations are projected and grounded to observations the agent has encountered in the past. Finally, zero-shot imitation learning converts the grounded observations into a policy.
11.12.2024 07:11
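A minimal sketch of this imagine-ground-imitate flow, purely for illustration: every name below (`video_model`, `encoder`, `imitation_learner`, `rl_zero`) is a hypothetical placeholder rather than the paper's actual component, and grounding is shown as a simple nearest-neighbor lookup for concreteness.

```python
# Hypothetical sketch of the imagine -> ground -> imitate pipeline described above.
# All names are placeholders; see the paper for the actual method.
import numpy as np

def rl_zero(prompt, video_model, encoder, replay_observations, imitation_learner):
    """Turn a language prompt into a policy, zero-shot."""
    # 1) Imagine: generate a video of the expected behavior from the text prompt.
    imagined_frames = video_model.generate(prompt)

    # 2) Project & ground: map each imagined frame to the closest observation the
    #    agent has actually encountered during its unsupervised interaction.
    imagined_embs = np.stack([encoder.embed(f) for f in imagined_frames])    # (T, d)
    replay_embs = np.stack([encoder.embed(o) for o in replay_observations])  # (N, d)
    grounded_obs = []
    for emb in imagined_embs:
        idx = int(np.argmin(np.linalg.norm(replay_embs - emb, axis=1)))
        grounded_obs.append(replay_observations[idx])

    # 3) Imitate: a zero-shot imitation learner turns the grounded observation
    #    sequence into a policy, with no further RL training at inference time.
    return imitation_learner.imitate(grounded_obs)
```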
(2/n) RL Zero enables prompt-to-policy generation. We believe this unlocks new capabilities for scaling up language-conditioned RL, provides an interpretable link between RL agents and humans, and moves us toward true cross-embodiment transfer.
11.12.2024 07:11
Introducing RL Zero: a new approach to transform language into behavior zero-shot for embodied agents, without labeled datasets!
11.12.2024 07:11
(3/5) We give an efficient algorithm to learn such a basis, and once it is learned as part of pretraining, inference amounts to solving a simple linear program. This allows PSM to do zero-shot RL in a way that is more performant and stable than baselines.
03.12.2024 00:33
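To make the "inference amounts to a linear program" step concrete, here is a minimal sketch under assumed pretrained quantities: `d0` and `Phi` are placeholders for a learned offset and basis of the affine set (flattened over state-action pairs), not the paper's actual code or API.

```python
# Hypothetical sketch: zero-shot inference with a pretrained PSM-style basis.
# Assumed inputs (all placeholders):
#   d0  : (S*A,)   one valid visitation measure (offset of the affine set)
#   Phi : (S*A, k) learned basis spanning the affine set of valid measures
#   r   : (S*A,)   reward vector for the task given at inference time
import numpy as np
from scipy.optimize import linprog

def infer_measure(d0, Phi, r):
    """Find the valid measure d = d0 + Phi @ w that maximizes expected reward."""
    k = Phi.shape[1]
    c = -(r @ Phi)  # linprog minimizes, so negate the reward-weighted objective
    # The only remaining constraint inside the affine set is nonnegativity:
    #   d0 + Phi @ w >= 0   <=>   -Phi @ w <= d0
    res = linprog(c, A_ub=-Phi, b_ub=d0, bounds=[(None, None)] * k, method="highs")
    return d0 + Phi @ res.x

# A policy can then be read off from the recovered measure,
# e.g. pi(a | s) proportional to d(s, a).
```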
(2/5) Our work, Proto-Successor Measures (PSM), shows that valid successor measures form an affine set. PSM learns a basis of this affine set, where the dimensionality of the basis controls how much the MDP is compressed (or how much information is lost). After all, learning is compression.
03.12.2024 00:33
What if I told you that all solutions for RL lie on a (hyper)plane? We can use that fact to learn a compressed representation of an MDP that unlocks efficient policy inference for any reward function. On this plane, solving RL is equivalent to solving a linearly constrained optimization problem!
03.12.2024 00:33
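A short worked version of the claim, written in standard notation (visitation measure d, dynamics P, discount gamma, initial distribution mu_0); this is a paraphrase of the argument, not the paper's exact derivation.

```latex
% Why valid visitation measures lie on an affine set: the Bellman flow
% constraints are linear in d.
\[
  \sum_{a} d(s,a)
  \;=\; (1-\gamma)\,\mu_0(s)
  \;+\; \gamma \sum_{s',a'} P(s \mid s',a')\, d(s',a'),
  \qquad d(s,a) \ge 0 .
\]
% Since the equality constraints are affine in d, every valid measure decomposes
% into a fixed offset plus a learned basis,
\[
  d \;=\; d_0 + \sum_{i=1}^{k} w_i\,\phi_i ,
\]
% and, for any reward vector r, solving RL reduces to the linear program
\[
  \max_{w}\; r^{\top}\!\bigl(d_0 + \Phi w\bigr)
  \quad\text{s.t.}\quad d_0 + \Phi w \ge 0 .
\]
```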
Harshit Sikchi
I will be attending @neuripsconf.bsky.social and am on the job market. Hit me up to chat about topics in RL (Zero-shot RL, Imitation Learning, Offline RL, Deep RL) or Alignment!
Learn more about my research interests: hari-sikchi.github.io/research/
02.12.2024 00:39
We should catch up if you are available!
01.12.2024 00:42
This is just a bad year for ICLR authors and reviewers.
25.11.2024 16:35
Can you add me too?
17.11.2024 01:18
Seems relevant to me too!
17.11.2024 01:15