
Kaustubh Sridhar

@kaustubhsridhar.bsky.social

Building generalist agents. Final-year PhD student @UPenn, Prev: @Amazon @IITBombay http://kaustubhsridhar.github.io/

949 Followers  |  37 Following  |  37 Posts  |  Joined: 14.11.2024

Latest posts by kaustubhsridhar.bsky.social on Bluesky

REGENT will be presented as an Oral at ICLR 2025 in Singapore 🇸🇬, an honor given to the top 1.8% of 11,672 submissions! More details on our website: bit.ly/regent-research

22.02.2025 18:55 — 👍 2    🔁 0    💬 0    📌 0
Post image

Is scaling current agent architectures the most effective way to build generalist agents that can rapidly adapt?

Introducing 👑REGENT👑, a generalist agent that can generalize to unseen robotics tasks and games via retrieval-augmentation and in-context learning.

14.12.2024 21:49 — 👍 2    🔁 1    💬 1    📌 1

Bsky doesn’t want you to see the awesome gifs! Find them on our website: bit.ly/regent-research

14.12.2024 22:26 — 👍 0    🔁 0    💬 0    📌 0

This whole project would not have been possible without Souradeep Dutta, Dinesh Jayaraman, and Insup Lee.

We have many more results, ablations, code, dataset, model, and the paper at our website: bit.ly/regent-research

The arxiv link: arxiv.org/abs/2412.04759

14.12.2024 21:49 — 👍 0    🔁 0    💬 1    📌 0
Post image

REGENT is far from perfect.

It cannot generalize to new embodiments (unseen MuJoCo envs) or long-horizon envs (like SpaceInvaders & StarGunner). It also cannot generalize to completely new suites (i.e., it requires similarities between the pre-training and unseen envs).

A few failed rollouts:

14.12.2024 21:49 — 👍 0    🔁 0    💬 1    📌 0
Post image

Here is a qualitative visualization of deploying REGENT in the unseen atari-pong environment.

14.12.2024 21:49 — 👍 0    🔁 0    💬 1    📌 0
Post image

While REGENT’s design choices are aimed at generalization, its gains are not limited to unseen environments: it even performs better than current generalist agents when deployed within the pre-training environments.

14.12.2024 21:49 — 👍 0    🔁 0    💬 1    📌 0
Post image

In the four unseen ProcGen environments, REGENT also outperforms MTT, the only other generalist agent that can generalize to unseen environments via in-context learning. REGENT does so with an order of magnitude less pretraining data and a third as many parameters.

14.12.2024 21:49 — 👍 0    🔁 0    💬 1    📌 0

REGENT also outperforms the 'All Data' variants of JAT/Gato, which were pre-trained on 5-10x the amount of data.

For context, the Multi-Game DT uses 1M states to finetune on new Atari envs. REGENT generalizes via RAG from ~10k states. REGENT Finetuned improves further over REGENT.

14.12.2024 21:49 — 👍 0    🔁 0    💬 1    📌 0
Post image

In the unseen MetaWorld & Atari envs in the Gato setting, REGENT and R&P outperform SOTA generalist agents like JAT/Gato (the open-source reproduction of Gato). REGENT outperforms JAT/Gato even after JAT/Gato is finetuned on data from the unseen envs.

14.12.2024 21:49 — 👍 0    🔁 0    💬 1    📌 0
Post image

We also evaluate on unseen levels and unseen environments in the ProcGen setting.

14.12.2024 21:49 — 👍 0    🔁 0    💬 1    📌 0
Post image

We evaluate REGENT on unseen robotics and game environments in the Gato setting.

14.12.2024 21:49 — 👍 0    🔁 0    💬 1    📌 0
Post image

REGENT has a few key ingredients, including an interpolation between R&P and the transformer. This allows the transformer to more readily generalize to unseen envs, since it is given the easier task of predicting the residual to the R&P action rather than the complete action.

14.12.2024 21:49 — 👍 0    🔁 0    💬 1    📌 0
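The thread doesn't spell out the exact interpolation scheme, so here is a minimal sketch of the idea, assuming action distributions and a hypothetical fixed mixing weight `lam` (the paper's actual weighting may differ, e.g. depend on retrieval distance):

```python
import numpy as np

def regent_action(rnp_probs, transformer_probs, lam=0.5):
    """Interpolate the R&P (1-nearest-neighbor) action distribution with
    the transformer's prediction. Because the transformer only supplies a
    correction on top of the R&P action, its learning task is easier than
    predicting the complete action from scratch."""
    return (1.0 - lam) * np.asarray(rnp_probs) + lam * np.asarray(transformer_probs)

# toy example: R&P is confident in action 0, the transformer in action 1
mixed = regent_action([1.0, 0.0], [0.0, 1.0], lam=0.5)  # → [0.5, 0.5]
```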

R&P simply picks the retrieved state s' nearest to the query state s_t and plays the corresponding action a'.

REGENT retrieves the 19 closest states, places the corresponding (s, r, a) tuples in the context alongside the query (s_t, r_{t-1}), and acts via in-context learning in unseen envs.

14.12.2024 21:49 — 👍 0    🔁 0    💬 1    📌 0
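The R&P baseline described above is just a 1-nearest-neighbor lookup; a toy sketch, assuming vector-valued states and discrete actions:

```python
import numpy as np

def retrieve_and_play(query_state, demo_states, demo_actions):
    """R&P baseline: pick the demo state s' nearest to the query state
    s_t and play its corresponding action a'."""
    dists = np.linalg.norm(np.asarray(demo_states) - np.asarray(query_state), axis=1)
    return demo_actions[int(np.argmin(dists))]

# hypothetical demo data: three states with their recorded actions
demo_states = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
demo_actions = [0, 1, 2]
action = retrieve_and_play([0.9, 1.1], demo_states, demo_actions)  # nearest is [1, 1] → action 1
```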
Post image

Inspired by RAG and the success of a simple retrieval-based 1-nearest neighbor baseline that we call Retrieve-and-Play (R&P),

REGENT pretrains a transformer policy whose inputs are not just the query state s_t and previous reward r_{t-1}, but also retrieved tuples of (state, previous reward, action).

14.12.2024 21:49 — 👍 0    🔁 0    💬 1    📌 0
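The input construction described above can be sketched as follows; the retrieval of the k closest states follows the thread's description, while `build_context` and the exact tuple layout are illustrative assumptions:

```python
import numpy as np

def build_context(query_state, prev_reward, demo, k=19):
    """Assemble the transformer's input: the k retrieved
    (state, previous reward, action) tuples, followed by the query
    (s_t, r_{t-1}) whose action the policy must predict.
    `demo` is a list of (state, prev_reward, action) tuples."""
    states = np.array([s for s, _, _ in demo])
    dists = np.linalg.norm(states - np.asarray(query_state), axis=1)
    nearest = np.argsort(dists)[:k]
    context = [demo[i] for i in nearest]              # retrieved tuples, nearest first
    context.append((query_state, prev_reward, None))  # query; action is unknown
    return context

# toy usage with 3 demo tuples and k=2
demo = [([0.0, 0.0], 0.0, 0), ([1.0, 1.0], 0.5, 1), ([2.0, 2.0], 1.0, 2)]
ctx = build_context([0.9, 1.1], 0.2, demo, k=2)
```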
Post image

REGENT is pretrained on data from many training envs (left). REGENT is then deployed on the held-out envs (right) with a few demos from which it can retrieve states, rewards, and actions to use for in-context learning. **It never finetunes on the demos in the held-out envs.**

14.12.2024 21:49 — 👍 0    🔁 0    💬 1    📌 0
Post image

Is scaling current agent architectures the most effective way to build generalist agents that can rapidly adapt?

Introducing 👑REGENT👑, a generalist agent that can generalize to unseen robotics tasks and games via retrieval-augmentation and in-context learning.

14.12.2024 21:49 — 👍 2    🔁 1    💬 1    📌 1
REGENT: A Retrieval-Augmented Generalist Agent That Can Act In-Context In New Environments.

Bluesky doesn't want you to see these gifs! :) Please see the rollouts in unseen environments in our website: bit.ly/regent-research

14.12.2024 19:43 — 👍 0    🔁 0    💬 0    📌 0

We are also presenting REGENT at the Adaptive Foundation Models (this afternoon, Saturday Dec 14) and Open World Agents (tomorrow afternoon, Sunday Dec 15) workshops at NeurIPS. Please come by if you'd like to hear more!

14.12.2024 19:39 — 👍 0    🔁 0    💬 1    📌 0
REGENT: A Retrieval-Augmented Generalist Agent That Can Act In-Context In New Environments.

This whole project would not have been possible without Souradeep Dutta, Dinesh Jayaraman, and Insup Lee.

We have many more results, ablations, code, dataset, model, and the paper at our website: bit.ly/regent-research

The arxiv link: arxiv.org/abs/2412.04759

14.12.2024 19:39 — 👍 0    🔁 0    💬 1    📌 0
Post image

REGENT is far from perfect.

It cannot generalize to new embodiments (unseen MuJoCo envs) or long-horizon envs (like SpaceInvaders & StarGunner). It also cannot generalize to completely new suites (i.e., it requires similarities between the pre-training and unseen envs).

A few failed rollouts:

14.12.2024 19:39 — 👍 0    🔁 0    💬 1    📌 0
Gemini 2.0 Flash multi-modal streaming demo
YouTube video by Simon Willison

Cool demo of Gemini 2.0 Flash's new streaming API, by @simonwillison.net.
www.youtube.com/watch?v=mpgW...

12.12.2024 21:29 — 👍 2    🔁 1    💬 1    📌 0
Post image

Vancouver is so beautiful!

10.12.2024 19:28 — 👍 1    🔁 0    💬 0    📌 0

What would deep thought cost for the ultimate question? bsky.app/profile/nato...

05.12.2024 17:27 — 👍 0    🔁 0    💬 0    📌 0

What's missing to get to deep thought? :D

05.12.2024 17:25 — 👍 0    🔁 0    💬 1    📌 0

In The Hitchhiker's Guide to the Galaxy, they built a huge computer (Deep Thought) to answer the ultimate question (of Life, the Universe and Everything), and it took 7.5 million years. It seems like they clearly did both train-time and test-time scaling.

05.12.2024 17:24 — 👍 0    🔁 0    💬 1    📌 0

Can no longer tell if LLMs are sounding like humans or some humans have always sounded like LLMs

04.12.2024 01:45 — 👍 31    🔁 1    💬 2    📌 0
Video thumbnail

I'd like to introduce what I've been working on at @hellorobot.bsky.social: Stretch AI, a set of open-source tools for language-guided autonomy, exploration, navigation, and learning from demonstration.

Check it out: github.com/hello-robot/...

Thread ->

03.12.2024 16:51 — 👍 132    🔁 23    💬 6    📌 4

I'm still waiting for the "react/respond to the author rebuttal" from a couple of reviewers :_(

30.11.2024 17:15 — 👍 0    🔁 0    💬 0    📌 0

Oh damn haha. Thank you for the info

29.11.2024 00:38 — 👍 0    🔁 0    💬 0    📌 0
