UW News put out a Q&A about our recent work on Variational Preference Learning, a technique for personalizing Reinforcement Learning from Human Feedback (RLHF) washington.edu/news/2024/12...
18.12.2024 21:51 — 👍 30 🔁 8 💬 1 📌 0

@abhishekunique7.bsky.social
Assistant Professor, Paul G. Allen School of Computer Science and Engineering, University of Washington. Visiting Faculty, NVIDIA. Ph.D. from Berkeley, Postdoc at MIT. https://homes.cs.washington.edu/~abhgupta I like robots and reinforcement learning :)
The key insight is: if you're going to transfer from sim-to-real, take care to transfer exploration behavior rather than just policies. Take a look at our paper: arxiv.org/abs/2410.20254. Fun collaboration with Andrew Wagenmaker, Kevin Huang, Kay Ke, Byron Boots, @kjamieson.bsky.social (6/6)
06.12.2024 00:46 — 👍 1 🔁 0 💬 0 📌 0

Given these insights, we develop a practical instantiation using a diversity-driven policy learning algorithm in sim. This learns diverse exploration in the neighborhood of an optimal policy. Exploring with this policy learned in sim can then enable efficient policy learning in the real world. (5/6)
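To make "diversity-driven policy learning" a bit more concrete, here is a minimal PyTorch sketch of one common way such an objective can be set up: condition the policy on a latent skill and add a DIAYN-style discriminator bonus to the task reward, so the skills stay near-optimal but spread out. The network sizes, names, and the weight `beta` below are illustrative assumptions, not the paper's exact algorithm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_SKILLS = 8   # number of distinct exploration behaviors (illustrative)
OBS_DIM = 16     # placeholder observation dimensionality

# Discriminator q(z | s): guesses which latent "skill" produced a visited state.
discriminator = nn.Sequential(
    nn.Linear(OBS_DIM, 128), nn.ReLU(),
    nn.Linear(128, NUM_SKILLS),
)

def diversity_augmented_reward(task_reward: torch.Tensor,   # shape (B,)
                               states: torch.Tensor,         # shape (B, OBS_DIM)
                               skills: torch.Tensor,         # shape (B,), long skill index
                               beta: float = 0.1) -> torch.Tensor:
    """Task reward plus a DIAYN-style diversity bonus: log q(z|s) - log p(z).

    Keeping the task reward anchors every skill near optimal behavior, while
    the bonus rewards skills for visiting states that identify them, so the
    set of policies spreads out around the optimum instead of collapsing
    onto a single policy.
    """
    log_q = F.log_softmax(discriminator(states), dim=-1)
    log_q_z = log_q.gather(1, skills.unsqueeze(1)).squeeze(1)
    log_p_z = torch.log(torch.tensor(1.0 / NUM_SKILLS))  # uniform skill prior
    return task_reward + beta * (log_q_z - log_p_z)
```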
06.12.2024 00:46 — 👍 1 🔁 0 💬 1 📌 0

We show that exploration policies learned in sim can be played in the real world to collect data, performing policy improvement with RL. These policies cannot be naively played but must be combined with some degree of random exploration. However, this exploration is now provably efficient! (4/6)
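As a rough illustration of "cannot be naively played but must be combined with some degree of random exploration", here is a minimal data-collection sketch that mixes a sim-learned exploration policy with random actions during real-world rollouts. `env`, `explore_policy`, `sample_random_action`, and the mixing rate `eps` are hypothetical placeholders, not the paper's code.

```python
import random
from typing import Any, Callable, List, Tuple

def collect_real_rollout(
    env: Any,                                   # real-robot env, gym-style reset()/step()
    explore_policy: Callable[[Any], Any],       # exploration policy transferred from sim
    sample_random_action: Callable[[], Any],    # uniform sample from the action space
    eps: float = 0.2,                           # fraction of purely random actions mixed in
    horizon: int = 200,
) -> List[Tuple[Any, Any, float, Any]]:
    """Roll out the sim-learned exploration policy in the real world, mixing in
    random actions so coverage does not collapse onto the (possibly wrong)
    behaviors the simulator favored."""
    transitions = []
    obs = env.reset()
    for _ in range(horizon):
        # With probability eps act randomly, otherwise follow the sim exploration policy.
        if random.random() < eps:
            action = sample_random_action()
        else:
            action = explore_policy(obs)
        # Gym-style step returning (obs, reward, done, info) is assumed here.
        next_obs, reward, done, info = env.step(action)
        transitions.append((obs, action, reward, next_obs))
        obs = env.reset() if done else next_obs
    return transitions
```

The collected transitions would then feed a standard off-policy RL update to improve the task policy in the real world.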
06.12.2024 00:46 — 👍 0 🔁 0 💬 1 📌 0

We propose a simple fix - instead of learning *optimal* policies in sim, learn exploratory policies in sim. Since data collection in sim is cheap, we learn exploratory policies that have broad coverage. Even with a domain gap, exploratory policies learned in sim can still explore effectively in the real world. (3/6)
06.12.2024 00:46 — 👍 0 🔁 0 💬 1 📌 0

The typical paradigm for sim2real transfers a policy and hopes for few-shot finetuning with naive exploration. First, we show that there exist envs where this can be exponentially inefficient. An overconfident, wrong policy from an incorrect sim can lead to poor real-world exploration. (2/6)
06.12.2024 00:46 — 👍 0 🔁 0 💬 1 📌 0

We've been investigating how sim, while wrong, can be useful for real-world robotic RL! In our #NeurIPS2024 work, we theoretically showed how naive sim2real transfer can be inefficient, but if you *learn to explore* in sim, this transfers to the real world! We show this works on real robots! 🧵 (1/6)
06.12.2024 00:46 — 👍 13 🔁 5 💬 2 📌 0

I'm excited about the doors this opens for generalizable robot pre-training!
Paper: arxiv.org/abs/2412.01770
Website: casher-robot-learning.github.io/CASHER/
Fun project w/ @marcelto.bsky.social, @arhanjain.bsky.social, Carrie Yuan, Macha V, @ankile.bsky.social, Anthony S, Pulkit Agrawal :)
I'm also a sucker for a fun website. Check out our interactive demo where you can see some of the environments and learned behaviors. We've also open-sourced USDZ assets of the sourced environments. (8/N)
05.12.2024 02:12 — 👍 0 🔁 0 💬 1 📌 0

Why do I care? We're going to have to consider off-domain data for robotics, and realistic simulation constructed cheaply from video provides a scalable way to source this data. Building methods that scale sublinearly with human effort makes this practical and generalizable. (7/N)
05.12.2024 02:12 — 👍 1 🔁 0 💬 1 📌 0

Step 5: One neat feature is that in a test environment, human demos aren't even required. Scan a video of the environment to build a test-time simulation, and let the generalist model provide itself demos and improve with RL in sim. This results in over 50% improvement with zero human effort. (6/N)
05.12.2024 02:12 — 👍 0 🔁 0 💬 1 📌 0

Step 4: Transfer over to the real world, either zero-shot or with some co-training. Performance shows scaling laws as more experience is encountered, and is robust across distractors, object positions, visual conditions, and disturbances. (5/N)
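A minimal sketch of what "some co-training" could look like in practice: interleave gradient steps on the large stream of sim-generated batches with a smaller stream of real-robot batches at a fixed ratio. The 1-in-4 ratio and all names below are assumptions for illustration, not the paper's recipe.

```python
import itertools
from typing import Any, Callable, Iterable

def cotrain(
    policy_update: Callable[[Any], None],   # one gradient step on a batch
    sim_batches: Iterable[Any],             # large pool of batches from sim RL data
    real_batches: Iterable[Any],            # small pool of real-robot batches
    steps: int = 10_000,
    real_every: int = 4,                    # 1-in-4 batches from real data (assumed ratio)
) -> None:
    """Co-train on simulation and real-world data by interleaving batches, so the
    cheap sim data does most of the work while a little real data keeps the
    policy grounded on the physical robot."""
    sim_iter = itertools.cycle(sim_batches)
    real_iter = itertools.cycle(real_batches)
    for step in range(steps):
        batch = next(real_iter) if step % real_every == 0 else next(sim_iter)
        policy_update(batch)
```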
05.12.2024 02:12 — 👍 0 🔁 0 💬 1 📌 0

Step 3: Providing even 10 demos per environment is still expensive. By training generalists on RL data, we get cross-environment generalization that allows the model to provide *itself* demos and only use human effort when necessary. The better the generalist gets, the less human effort is required. (4/N)
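Here is a rough sketch of how a generalist could "provide itself demos" in a newly scanned environment: roll the current generalist out in sim, keep the successful trajectories as bootstrap demonstrations, and fall back to human demos only if too few attempts succeed. `rollout`, `generalist_policy`, and the success check are placeholders I'm assuming, not the paper's code.

```python
from typing import Any, Callable, List

def self_provided_demos(
    sim_env: Any,
    generalist_policy: Callable[[Any], Any],
    rollout: Callable[[Any, Callable[[Any], Any]], dict],  # returns {"traj": ..., "success": bool}
    num_attempts: int = 50,
    min_demos: int = 5,
) -> List[Any]:
    """Let the generalist attempt the task in the scanned sim and keep successful
    trajectories as bootstrap demos for RL in that environment."""
    demos = []
    for _ in range(num_attempts):
        result = rollout(sim_env, generalist_policy)
        if result["success"]:
            demos.append(result["traj"])
        if len(demos) >= min_demos:
            break
    if len(demos) < min_demos:
        # Human effort is only requested when the generalist cannot self-demo,
        # which is how the required human effort shrinks as the generalist improves.
        print(f"Only {len(demos)} self-demos collected; request human demos for this env.")
    return demos
```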
05.12.2024 02:12 — 👍 0 🔁 0 💬 1 📌 0

Step 2: Train policies on these environments with demo-bootstrapped RL. A couple of demos are needed to guide exploration, but the heavy lifting is done with large-scale RL in simulation. This takes success rates from 2-3% to >90% with <10 human demos. (3/N)
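To make "demo-bootstrapped RL" concrete, here is a minimal sketch of one standard recipe for it (e.g. RLPD-style sampling), where every training batch is drawn half from a handful of human demos and half from online RL experience. The 50/50 split and the buffer layout are my assumptions, not necessarily what is used in the paper.

```python
import random
from typing import Any, List, Tuple

Transition = Tuple[Any, Any, float, Any, bool]  # (obs, action, reward, next_obs, done)

class DemoBootstrappedBuffer:
    """Replay buffer that always mixes a handful of human demos into each batch,
    so the demos keep guiding exploration even as online RL data grows much larger."""

    def __init__(self, demos: List[Transition]):
        self.demo_data: List[Transition] = list(demos)   # <10 human demos per env
        self.online_data: List[Transition] = []          # filled by large-scale sim RL

    def add(self, transition: Transition) -> None:
        self.online_data.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Half the batch from demos, half from online experience (assumed split).
        half = batch_size // 2
        demo_batch = random.choices(self.demo_data, k=half)
        online_pool = self.online_data if self.online_data else self.demo_data
        online_batch = random.choices(online_pool, k=batch_size - half)
        return demo_batch + online_batch
```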
05.12.2024 02:12 — 👍 0 🔁 0 💬 1 📌 0

Step 1: Collect lots of environments with video scans - anyone can do it with their phone. I even had my parents scan in a bunch :) We use 3D reconstruction methods like Gaussian splatting to make diverse, visually and geometrically realistic sim environments for training policies. (2/N)
05.12.2024 02:12 — 👍 0 🔁 0 💬 1 📌 0

I'm excited about scaling up robot learning! We've been scaling up data generation with RL in realistic sims generated from crowdsourced videos. This enables data collection far more cheaply than real-world teleop. Importantly, data becomes *cheaper* with more environments and transfers to real robots! 🧵 (1/N)
05.12.2024 02:12 — 👍 21 🔁 11 💬 3 📌 0