
Sarvesh Patil

@nagababa.bsky.social

Your friendly neighborhood roboticist! PhD student @cmurobotics.bsky.social Interested in Dexterous Manipulation, Democratization of Robots and Sensors, Sample Efficient RL, Soft Robotics, Causality, Multi-Agent Systems. servo97.github.io

237 Followers  |  290 Following  |  12 Posts  |  Joined: 23.11.2024

Latest posts by nagababa.bsky.social on Bluesky

If you are not familiar with the robot-learning/RL papers and have the bandwidth to go through them, I'm more than happy to share my source papers! Please let me know either way!

28.06.2025 14:01 | 👍 0    🔁 0    💬 1    📌 0

And does that mean some intervention is still needed for robot-learning pipelines to demonstrate respectable ICL?

The graph in the 8th post of the thread makes me very curious, and I'd love to hear your thoughts on this phenomenon!

28.06.2025 14:01 | 👍 0    🔁 0    💬 0    📌 0

i.e., converting robot joint trajectories into discrete cosine transform coefficients seems to significantly improve sample efficiency and generalizability (see the sketch after this post).

Why do we see this happen? Or is it merely an ephemeral local minimum that we're stuck in?

28.06.2025 14:01 | 👍 0    🔁 0    💬 0    📌 0
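
Rough sketch of the kind of frequency-space action tokenization described above, assuming a DCT over a chunk of joint positions with the high frequencies dropped; the chunk length, number of kept coefficients, and quantization step are illustrative values, not taken from any specific paper.

```python
# Illustrative sketch: DCT-based frequency tokenization of a joint-trajectory chunk.
import numpy as np
from scipy.fft import dct, idct

def tokenize_chunk(joints, keep=8, step=0.05):
    """joints: (T, D) joint positions for one action chunk -> (keep, D) integer tokens."""
    coeffs = dct(joints, axis=0, norm="ortho")        # frequency-domain view of the trajectory
    coeffs = coeffs[:keep]                            # low frequencies carry most of the motion
    return np.round(coeffs / step).astype(np.int32)   # quantize into discrete tokens

def detokenize_chunk(tokens, horizon, step=0.05):
    """Invert: dequantize, zero-pad the dropped high frequencies, inverse DCT."""
    coeffs = np.zeros((horizon, tokens.shape[1]))
    coeffs[: tokens.shape[0]] = tokens * step
    return idct(coeffs, axis=0, norm="ortho")

# Fake 7-DoF trajectory (random walk), 32 timesteps:
chunk = np.cumsum(0.01 * np.random.randn(32, 7), axis=0)
recon = detokenize_chunk(tokenize_chunk(chunk), horizon=32)
print(np.abs(chunk - recon).max())  # error comes only from truncation + quantization
```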

However, when SOTA papers use transformers for learning policies and Q-functions in RL, the observation seems to be that "distributional RL works better for offline RL tasks", or that frequency action-space tokenization helps.

28.06.2025 14:01 | 👍 0    🔁 0    💬 0    📌 0

Hi Daniel, great post!
I'm curious what you think about continuous-control use cases. From my very quick read of the post (not the paper itself), it seems that ICL emerges as a property of the model being able to handle a diversity of continuous-variable regression tasks.

1/n

28.06.2025 14:00 | 👍 0    🔁 0    💬 4    📌 0

It was a dream come true to teach the course I wish existed at the start of my PhD. We built up the algorithmic foundations of modern-day RL, imitation learning, and RLHF, going deeper than the usual "grab bag of tricks". All 25 lectures + 150 pages of notes are now public!

20.06.2025 03:53 | 👍 47    🔁 10    💬 3    📌 2
Scholar Inbox: a personal paper recommender which enables researchers to stay up-to-date with the most relevant progress in their field based on their personal research interests.

I like scholar-inbox
www.scholar-inbox.com

28.04.2025 00:07 | 👍 0    🔁 0    💬 0    📌 0
A First Introduction to Cooperative Multi-Agent Reinforcement Learning: Multi-agent reinforcement learning (MARL) has exploded in popularity in recent years. While numerous approaches have been developed, they can be broadly categorized into three main types: centralized ...

I have a draft of my introduction to cooperative multi-agent reinforcement learning on arxiv. Check it out and let me know any feedback you have. The plan is to polish and extend the material into a more comprehensive text with Frans Oliehoek.

arxiv.org/abs/2405.06161

07.01.2025 16:25 | 👍 78    🔁 19    💬 3    📌 3

Store it on an old school magnetic HDD!
You only gotta power it on every decade or so to maintain the hardware.

05.01.2025 12:49 | 👍 3    🔁 0    💬 0    📌 0

RL as a refinement tool has been used in dexterous manipulation for some time!
It used to be quite hard to do tabula rasa learning for dexterous manipulation. And still is, for the most part!

28.12.2024 19:51 | 👍 4    🔁 0    💬 0    📌 0
We Didn't Win A Nobel (Billy Joel Parody)
YouTube video by MUSICODE

There are lots of people who've influenced AI but haven't won Nobel prizes.
I discuss a tiny sliver of them in this parody of @billyjoelofficial.bsky.social's "We Didn't Start the Fire"...
Enjoy!

youtube.com/shorts/qDSYA...

23.12.2024 13:43 | 👍 24    🔁 7    💬 0    📌 1

Hard agree. Although how do you reconcile that with writing a paper?

Like, what do you think is a reasonable process for picking "baselines"?

I've seen some students get jaded because reviewers ask them to include "baselines" that only have shitty GitHub implementations :(

20.12.2024 13:40 | 👍 2    🔁 0    💬 0    📌 0

If you are into ML theory (RL or not) with a proven track record, and you are interested in an industry research position, PM me. Feel free to spread the word.

19.12.2024 00:55 | 👍 74    🔁 31    💬 3    📌 0

Hi! Can you please add me to the list?
Thanks for making it!

16.12.2024 14:14 | 👍 0    🔁 0    💬 1    📌 0

A common question nowadays: Which is better, diffusion or flow matching? 🤔

Our answer: They're two sides of the same coin. We wrote a blog post to show how diffusion models and Gaussian flow matching are equivalent. That's great: It means you can use them interchangeably.

02.12.2024 18:45 | 👍 255    🔁 58    💬 7    📌 7
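
For reference, the usual correspondence behind that equivalence claim, written here in generic Gaussian-path notation (the blog post's own conventions may differ): a trained noise predictor can be converted into a flow-matching velocity, and vice versa.

```latex
% Sketch of the diffusion <-> Gaussian flow matching correspondence (generic notation).
% Gaussian path between noise \epsilon and data x_1, with schedule (\alpha_t, \sigma_t):
\[
  x_t = \alpha_t x_1 + \sigma_t \epsilon,
  \qquad
  u_t(x_t \mid x_1) = \dot\alpha_t x_1 + \dot\sigma_t \epsilon .
\]
% Given a trained noise predictor \hat\epsilon_\theta(x_t, t), recover the data estimate
% and plug it back in to obtain a flow-matching velocity (the reverse direction is analogous):
\[
  \hat x_1 = \frac{x_t - \sigma_t \,\hat\epsilon_\theta(x_t, t)}{\alpha_t},
  \qquad
  \hat u_t(x_t) = \dot\alpha_t \hat x_1 + \dot\sigma_t \,\hat\epsilon_\theta(x_t, t).
\]
```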

🙋‍♂️👋

30.11.2024 14:59 | 👍 1    🔁 0    💬 0    📌 0

Anne Gagneux, Ségolène Martin, @quentinbertrand.bsky.social Remi Emonet and I wrote a tutorial blog post on flow matching: dl.heeere.com/conditional-... with lots of illustrations and intuition!

We got this idea after their cool work on improving Plug and Play with FM: arxiv.org/abs/2410.02423

27.11.2024 09:00 | 👍 356    🔁 102    💬 12    📌 11
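
As a companion to the tutorial linked above, a generic conditional flow matching training loss under a straight-line interpolation path; this is a common textbook form, not code from the linked post, and `model` here is a placeholder velocity network.

```python
# Minimal conditional flow matching loss (straight-line path between noise and data).
import torch

def cfm_loss(model, x1):
    """x1: data batch of shape (B, D); model(x_t, t) is assumed to predict a velocity."""
    x0 = torch.randn_like(x1)                              # noise endpoint of the path
    t = torch.rand(x1.shape[0], 1, device=x1.device)       # one time per sample, broadcast over D
    xt = (1 - t) * x0 + t * x1                              # point on the linear interpolation path
    target = x1 - x0                                        # constant velocity of that path
    return ((model(xt, t) - target) ** 2).mean()            # regress the model onto the target velocity
```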

Intro Post
Hello World!
I'm a 2nd year Robotics PhD student at CMU, working on distributed dexterous manipulation, accessible soft robots and sensors, sample efficient robot learning, and causal inference.

Here are my cute robots:
PS: The videos are old and sped up. They move slower in the real world :3

23.11.2024 18:49 | 👍 15    🔁 3    💬 0    📌 0
