
Ankur Handa

@ankurhandos.bsky.social

Training robots in simulation.

1,140 Followers  |  78 Following  |  28 Posts  |  Joined: 17.11.2024

Latest posts by ankurhandos.bsky.social on Bluesky

This work was mainly a collaboration for GTC so it all came together quickly in 2 months and we didn't want to change much :)

We are working on improving the system and will release a tech report in a few months.

29.04.2025 03:09 — 👍 3    🔁 0    💬 1    📌 0

*us being lazy not "using" :)

29.04.2025 01:33 — 👍 2    🔁 0    💬 1    📌 0
Post image

The robot is rewarded for lifting the object beyond a certain height, to ensure that the grasp is stable. So it lifts it first, takes it to a certain height, and then does the dropping. This was using being lazy and not changing the reward - vestigial stuff. The lift reward is shown here.

29.04.2025 01:33 — 👍 3    🔁 0    💬 1    📌 0
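For illustration, a minimal sketch of how a lift-reward term like this might look (the height threshold, scaling, and tensor shapes are assumptions for illustration, not the values used in the actual training):

```python
import torch

def lift_reward(object_height: torch.Tensor,
                lift_threshold: float = 0.3,
                scale: float = 1.0) -> torch.Tensor:
    """Hypothetical lift-reward term: reward grows as the object is raised, and a
    bonus is added once it clears the target height (a proxy for a stable grasp)."""
    progress = torch.clamp(object_height / lift_threshold, max=1.0)
    bonus = (object_height > lift_threshold).float()
    return scale * (progress + bonus)

# Example: object heights from a batch of parallel simulated environments.
heights = torch.tensor([0.05, 0.15, 0.35])
print(lift_reward(heights))  # tensor([0.1667, 0.5000, 2.0000])
```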
Video thumbnail

Stereo camera images that the network uses as input. They go directly into the network without any pre-processing, and out comes an action that is sent to the robot as a target.

29.04.2025 00:54 — 👍 3    🔁 0    💬 1    📌 0
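A rough sketch of that kind of loop with stand-in interfaces (the `StubCamera`/`StubRobot` classes and the tiny policy below are placeholders, not the real drivers or the actual network):

```python
import torch

class StubCamera:
    """Placeholder stereo camera returning random RGB frames (illustration only)."""
    def get_stereo_pair(self):
        return torch.rand(3, 64, 64), torch.rand(3, 64, 64)

class StubRobot:
    """Placeholder robot interface that just stores the last joint target."""
    def set_joint_targets(self, targets):
        self.last_targets = targets

# Stand-in policy: raw pixels in, 23-DoF joint targets out.
policy = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(2 * 3 * 64 * 64, 23))
camera, robot = StubCamera(), StubRobot()

for _ in range(3):  # a few control ticks instead of an endless loop
    left, right = camera.get_stereo_pair()       # raw RGB, no pre-processing
    obs = torch.cat([left, right]).unsqueeze(0)  # stack the two views, add batch dim
    with torch.no_grad():
        action = policy(obs)[0]
    robot.set_joint_targets(action.numpy())      # network output used directly as the target
```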
Arm you glad to see me, Atlas? | Boston Dynamics (YouTube video by Boston Dynamics)

Collaboration with Boston Dynamics on extending our DextrAH-RGB work on their robot.

The head movement is deliberate and learned through training.

Each arm has its own independent controller. Which one to invoke, and when, is determined by each arm's proximity to the object (a toy sketch of this switching follows below).

www.youtube.com/watch?v=dFOb...

29.04.2025 00:52 — 👍 11    🔁 2    💬 1    📌 2
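A toy sketch of that proximity-based switching (the positions and the simple Euclidean rule are illustrative assumptions, not the actual controller logic):

```python
import numpy as np

def select_arm(object_pos: np.ndarray,
               left_ee_pos: np.ndarray,
               right_ee_pos: np.ndarray) -> str:
    """Pick which arm's controller to invoke based on which end-effector
    is currently closer to the object (illustrative only)."""
    d_left = np.linalg.norm(object_pos - left_ee_pos)
    d_right = np.linalg.norm(object_pos - right_ee_pos)
    return "left" if d_left <= d_right else "right"

# Example: object slightly to the robot's right, so the right arm is invoked.
obj = np.array([0.1, -0.2, 0.4])
print(select_arm(obj,
                 left_ee_pos=np.array([0.3, 0.4, 0.5]),
                 right_ee_pos=np.array([0.3, -0.4, 0.5])))  # "right"
```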
Post image

Nice Substack post with a funny legend about Brunelleschi and his challenge to make an egg stand on its end: whoever could make it stand would get to build the dome of Santa Maria del Fiore.

www.james-lucas.com/p/it-always-...

07.04.2025 02:32 — 👍 2    🔁 0    💬 0    📌 0

*cupola

07.04.2025 02:31 — 👍 0    🔁 0    💬 0    📌 0

I have been using Cursor and it's great. It's a VS Code fork, so I like it, as I have been using VS Code for years now.

13.02.2025 04:37 — 👍 1    🔁 0    💬 1    📌 0

Just remembered this session, "Discussion for Direct versus Features Session" link.springer.com/chapter/10.1..., that I cited in my PhD thesis but can't download now as it is behind a paywall. But I remember it being very interesting and fun to read.

12.02.2025 05:35 — 👍 0    🔁 0    💬 0    📌 0
Post image

The message here is that you should try to stay as close to raw pixels as possible - it just works out much better in the long run.

I love this Hacker News comment that I saw on Twitter a few years ago.

11.02.2025 03:57 — 👍 5    🔁 0    💬 1    📌 0

Michal Irani, Michael Black, Padmanabhan Anandan, Rick Szeliski, and Harpreet Sawhney were all looking at recovering the camera pose transformation directly from image pixels. And the tracking in ARIA glasses uses the "direct" approach.

11.02.2025 03:55 — 👍 2    🔁 0    💬 1    📌 0

We have been using "direct" image-to-action mapping for a while now, and the word "direct" took me back to tracking in SLAM. Back in the 90s many people were looking at recovering the camera pose directly from the images rather than doing feature tracking first and then recovering a homography afterwards.

11.02.2025 03:52 — 👍 5    🔁 0    💬 1    📌 0
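To make the contrast concrete, a toy example of a "direct" method: recover a translation by minimizing photometric error over raw pixels, with no feature detection or matching (a brute-force illustration using circular shifts, not any of the systems mentioned above):

```python
import numpy as np

def direct_translation(img_ref: np.ndarray, img_cur: np.ndarray, search: int = 5):
    """Toy 'direct' alignment: brute-force the integer (dy, dx) shift that
    minimizes the photometric (sum-of-squared-differences) error."""
    best, best_err = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            shifted = np.roll(img_ref, shift=(dy, dx), axis=(0, 1))
            err = np.mean((shifted - img_cur) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

# Example: a synthetic image shifted by (2, -3) pixels is recovered directly.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
cur = np.roll(ref, shift=(2, -3), axis=(0, 1))
print(direct_translation(ref, cur))  # (2, -3)
```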

Great work by my colleagues Ritvik Singh, Karl Van Wyk, Arthur Allshire and Nathan Ratliff.

10.02.2025 05:04 — 👍 0    🔁 0    💬 0    📌 0
Post image

This is the next in line in our dex series of work, where we started off with pose estimation as the representation of the object and gradually moved towards a more general, end-to-end, direct pixels-to-action mapping.

10.02.2025 05:03 — 👍 0    🔁 0    💬 1    📌 0

When doing distillation, we also regress the location of the object, which serves as a diagnostic tool to see what the network is predicting and can be used by any state machine on top.

10.02.2025 05:03 — 👍 0    🔁 0    💬 1    📌 0
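A hedged sketch of what such an auxiliary head could look like during distillation (the module sizes, loss weighting, and shapes are illustrative assumptions, not the actual DextrAH-RGB architecture):

```python
import torch
import torch.nn as nn

class StudentPolicy(nn.Module):
    """Toy student: image encoder with an action head plus an auxiliary head
    that regresses the object's 3D position for diagnostics."""
    def __init__(self, feat_dim: int = 128, action_dim: int = 23):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, feat_dim), nn.ReLU(),
        )
        self.action_head = nn.Linear(feat_dim, action_dim)
        self.object_pos_head = nn.Linear(feat_dim, 3)  # diagnostic / state-machine output

    def forward(self, image: torch.Tensor):
        feat = self.encoder(image)
        return self.action_head(feat), self.object_pos_head(feat)

# One distillation step: imitate the teacher's action and regress the object position.
student = StudentPolicy()
image = torch.randn(8, 3, 64, 64)      # rendered images from simulation
teacher_action = torch.randn(8, 23)    # actions from the state-based RL teacher
object_pos_gt = torch.randn(8, 3)      # ground-truth object position from the simulator
pred_action, pred_pos = student(image)
loss = nn.functional.mse_loss(pred_action, teacher_action) \
     + 0.1 * nn.functional.mse_loss(pred_pos, object_pos_gt)
loss.backward()
```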
Post image

Our (stereo) vision network takes inspiration from the DUSt3R/MASt3R work (with no explicit epipolar geometry imposed), where image embeddings are passed to a transformer with cross-attention.

10.02.2025 05:02 — 👍 0    🔁 0    💬 1    📌 0
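A minimal sketch of that kind of cross-attention fusion between the two views (token counts, dimensions, and the single-layer depth are illustrative assumptions, not the actual network):

```python
import torch
import torch.nn as nn

class StereoCrossAttention(nn.Module):
    """Tokens from each view attend to tokens from the other view with
    cross-attention; no explicit epipolar constraint is imposed."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.cross_l = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_r = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_l = nn.LayerNorm(dim)
        self.norm_r = nn.LayerNorm(dim)

    def forward(self, left_tokens: torch.Tensor, right_tokens: torch.Tensor):
        # Left tokens query the right view and vice versa, then residual + norm.
        l_out, _ = self.cross_l(left_tokens, right_tokens, right_tokens)
        r_out, _ = self.cross_r(right_tokens, left_tokens, left_tokens)
        return self.norm_l(left_tokens + l_out), self.norm_r(right_tokens + r_out)

# Example: 8x8 patch grids from each camera, embedded into 256-d tokens.
left = torch.randn(2, 64, 256)
right = torch.randn(2, 64, 256)
fused_left, fused_right = StereoCrossAttention()(left, right)
print(fused_left.shape)  # torch.Size([2, 64, 256])
```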

This approach differs from the common two-stage pipeline, where a grasp or pick location is first regressed and then followed by motion planning. Instead, it integrates both stages into a single process and trains the entire system end-to-end using RL.

10.02.2025 05:02 — 👍 0    🔁 0    💬 1    📌 0
Installation β€” Scene Synthesizer 1.12.6 documentation

Scene Synthesizer: scene-synthesizer.github.io/getting_star...

ControlNets: github.com/lllyasviel/C...

10.02.2025 05:02 — 👍 1    🔁 0    💬 1    📌 0

The benefits of training with parallel tiled rendering in simulation are still underappreciated. With modern tools like Scene Synthesizer, and ControlNets that transform synthetic images into photorealistic ones, the value of simulation-based training will only continue to grow.

10.02.2025 05:01 — 👍 0    🔁 0    💬 1    📌 0
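For intuition, a tiny sketch of the tiling idea itself: many per-environment renders packed into one big frame so they can be produced and consumed as a single batch (the layout and sizes are illustrative; this is not the Isaac Lab API):

```python
import numpy as np

def tile_env_images(images: np.ndarray, grid_cols: int) -> np.ndarray:
    """Pack per-environment renders (N, H, W, C) into one tiled canvas, the way
    tiled rendering lays out many cameras in a single render target."""
    n, h, w, c = images.shape
    grid_rows = int(np.ceil(n / grid_cols))
    canvas = np.zeros((grid_rows * h, grid_cols * w, c), dtype=images.dtype)
    for i in range(n):
        row, col = divmod(i, grid_cols)
        canvas[row * h:(row + 1) * h, col * w:(col + 1) * w] = images[i]
    return canvas

# Example: 16 simulated environments, each rendering a 64x64 RGB image.
renders = np.random.randint(0, 255, size=(16, 64, 64, 3), dtype=np.uint8)
print(tile_env_images(renders, grid_cols=4).shape)  # (256, 256, 3)
```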

Depth sensors are noisy and haven't seen major improvements in a while, pure vision-based systems have caught up, and most frontier models today use raw RGB pixels. We always wanted to move towards direct RGB-based control, and this work is our first attempt at doing so.

10.02.2025 05:01 — 👍 0    🔁 0    💬 1    📌 0

We train a teacher via RL on state vectors and a student via distillation on images to learn control for a 23-DoF multi-fingered hand-and-arm system. Doing end-to-end with real data, as in BC-like systems, is already hard, but doing end-to-end with simulation is even harder due to the sim-to-real gap.

10.02.2025 05:01 — 👍 0    🔁 0    💬 1    📌 0
Video thumbnail

Our new work has made a big leap, moving away from depth-based end-to-end control to end-to-end control from raw RGB pixels. We have two versions, mono and stereo, both trained entirely in simulation (IsaacLab).

10.02.2025 04:59 — 👍 21    🔁 2    💬 1    📌 1
Preview
Artificial Intelligence Cartoons Please feel free to use these cartoons in your presentations, university courses, books etc, for free, as long as you drop me a line to let me know what you have planned (telliott at timoelliott do…

Love the AI cartoons here:

timoelliott.com/blog/cartoon...

29.12.2024 19:06 — 👍 0    🔁 0    💬 0    📌 0

Can you share the other ones that you liked?

20.11.2024 19:25 — 👍 0    🔁 0    💬 0    📌 0

Why were you not convinced before? Is it because it uses images and did a lot more scaling than the previous work?

20.11.2024 18:42 — 👍 0    🔁 0    💬 1    📌 0

Is this the PoliFormer work?

20.11.2024 18:31 — 👍 1    🔁 0    💬 1    📌 0
Post image

My workshop talk at CoRL on our work on dexterity that we've been doing for the past 5 years is here: docs.google.com/presentation...

We have only started to scratch the surface of what we can do with simulations & I hope we can leverage ideas of self-play and AlphaZero going forward for robotics.

19.11.2024 17:47 — 👍 19    🔁 1    💬 0    📌 0

Yeah, I've almost stopped using Google search :)

18.11.2024 16:46 — 👍 3    🔁 0    💬 1    📌 0
