We believe that leveraging EC-Diffuser's state generation capability for planning is a promising avenue for future work.
This is joint work with an amazing team: Carl Qi, Dan Haramati, Tal Daniel, Aviv Tamar, and Amy Zhang.
We also trained EC-Diffuser on the real-world Language-Table dataset and showed that it produces high-quality real-world rollouts. This demonstrates that the model implicitly matches objects and enforces object consistency over time, aiding the prediction of multi-object dynamics.
The result? EC-Diffuser outperforms baselines and achieves zero-shot generalization to novel object configurations, even scaling to more objects than seen during training. See more of the rollouts: sites.google.com/view/ec-diff...
It also enables the Transformer to denoise unordered object-centric particles and actions jointly, capturing multi-modal behavior distributions and complex inter-object dynamics.
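To make the joint token design concrete, here is a minimal sketch; the tensor shapes and the helper name are assumptions for illustration, not the paper's actual code:

```python
import torch

def build_joint_tokens(particle_tokens, action_tokens):
    """Append the action token to each timestep's particle set so that
    self-attention can model inter-object and object-action dynamics,
    and both token types are denoised together."""
    # particle_tokens: (B, T, N, D) unordered particles per timestep
    # action_tokens:   (B, T, 1, D) one embedded action per timestep
    return torch.cat([particle_tokens, action_tokens], dim=2)  # (B, T, N+1, D)
```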
Why diffusion? Since noise is added independently to each particle, a simple L1 loss is effective for particle-wise denoising, eliminating the need for complex set-based metrics.
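A minimal sketch of what such a particle-wise L1 diffusion objective can look like, assuming clean-sample prediction and the tensor shapes noted below (both are my assumptions, not the released code):

```python
import torch
import torch.nn.functional as F

def l1_denoising_loss(denoiser, particles, actions, t, alphas_cumprod):
    """Hypothetical particle-wise L1 objective.
    particles: (B, T, N, D) unordered latent particles over a trajectory
    actions:   (B, T, A)    action sequence
    t:         (B,)         sampled diffusion timesteps
    """
    # Gaussian noise is drawn independently for every particle and action
    noise_p, noise_a = torch.randn_like(particles), torch.randn_like(actions)
    ab = alphas_cumprod[t]
    noisy_p = ab.view(-1, 1, 1, 1).sqrt() * particles \
        + (1 - ab.view(-1, 1, 1, 1)).sqrt() * noise_p
    noisy_a = ab.view(-1, 1, 1).sqrt() * actions \
        + (1 - ab.view(-1, 1, 1)).sqrt() * noise_a
    # Because the noise is i.i.d. per particle, an element-wise L1 between
    # prediction and target suffices; no set-matching metric is needed.
    pred_p, pred_a = denoiser(noisy_p, noisy_a, t)
    return F.l1_loss(pred_p, particles) + F.l1_loss(pred_a, actions)
```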
Our model takes in a sequence of unordered state particles (from multi-view images) and actions. Conditioned on the current state and goal, it generates a denoised sequence of future states and actions that can be used for MPC-style control by executing the first action.
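A rough sketch of the receding-horizon loop, assuming a gym-style environment and a `sample_plan` function that runs the full reverse-diffusion process (both hypothetical names):

```python
import torch

@torch.no_grad()
def mpc_control(env, sample_plan, encode_views, goal_particles, max_steps=200):
    """Hypothetical MPC-style loop: replan at every step, execute only
    the first generated action from the denoised trajectory."""
    obs = env.reset()
    for _ in range(max_steps):
        state_particles = encode_views(obs)  # unordered particles from multi-view images
        # Sample a denoised sequence of future states and actions,
        # conditioned on the current particle state and the goal.
        _, future_actions = sample_plan(state_particles, goal_particles)
        obs, _, done, _ = env.step(future_actions[0])  # first action only
        if done:
            break
    return obs
```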
We encode actions as a separate particle. This design allows our Transformer to treat actions and state particles in the same embedding space. We further condition the Transformer on the diffusion timestep and the action tokens via adaptive layer normalization (AdaLN).
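For readers unfamiliar with AdaLN, here is a minimal sketch of the mechanism (module and argument names are illustrative): a conditioning vector regresses a per-feature scale and shift applied after a parameter-free LayerNorm.

```python
import torch
import torch.nn as nn

class AdaLN(nn.Module):
    """Minimal adaptive layer normalization sketch. `cond` would carry,
    e.g., the diffusion-timestep embedding (plus action information)."""
    def __init__(self, dim, cond_dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.to_scale_shift = nn.Linear(cond_dim, 2 * dim)

    def forward(self, x, cond):
        # x: (B, N, dim) particle tokens; cond: (B, cond_dim)
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        return self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
```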
Our entity-centric Transformer is designed to process these unordered particle inputs with a permutation-equivariant architecture, computing self-attention over object-level features without positional embeddings.
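A minimal PyTorch sketch of such a permutation-equivariant backbone; omitting positional embeddings is the key design choice (hyperparameters are placeholders):

```python
import torch
import torch.nn as nn

class EntityTransformer(nn.Module):
    """Sketch of a permutation-equivariant backbone: plain self-attention
    over particle tokens with NO positional embeddings, so permuting the
    input particles permutes the outputs identically."""
    def __init__(self, dim=128, heads=4, depth=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True, norm_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, particles):
        # particles: (B, N, dim); since no positional encoding is added,
        # attention treats the N particles as an unordered set.
        return self.encoder(particles)
```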
We begin by converting high-dimensional pixels into unsupervised object-centric representations using Deep Latent Particles (DLP). Each image is decomposed into an unordered set of latent "particles" from multiple views, capturing key object properties.
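A hypothetical wrapper showing how a pretrained DLP encoder might be applied per view and the resulting particle sets pooled into one unordered set (the encoder interface here is assumed, not DLP's actual API):

```python
import torch

def encode_views_to_particles(dlp_encoder, views):
    """views: (V, C, H, W) multi-view images of the same scene.
    Returns one unordered set of latent particles, each a vector of
    object properties (position, scale, depth, visual features, ...)."""
    particle_sets = []
    for view in views:
        # dlp_encoder is assumed to map a (1, C, H, W) image to (1, N, D)
        particles = dlp_encoder(view.unsqueeze(0)).squeeze(0)
        particle_sets.append(particles)
    # Concatenate across views with no ordering or cross-view matching;
    # the downstream Transformer is permutation-equivariant anyway.
    return torch.cat(particle_sets, dim=0)  # (V * N, D)
```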
This work was led by Carl Qi!
Object manipulation from pixels is challenging: high-dimensional, unstructured data creates a combinatorial explosion in states & goals, making multi-object control hard. Traditional behavioral cloning (BC) methods need massive data and compute and still miss the diverse behaviors required.
Check out our new #ICLR2025 paper: EC-Diffuser leverages a novel Transformer-based diffusion denoiser to learn a goal-conditioned multi-object manipulation policy from pixels!
Paper: www.arxiv.org/abs/2412.18907
Project page: sites.google.com/view/ec-diff...
Code: github.com/carl-qi/EC-D...
If you're interested in our take on addressing inverse RL in large state spaces, come meet @filippo_lazzati and @alberto_metelli in poster session 5 at #NeurIPS2024 today (paper: arxiv.org/abs/2406.03812)
Want to learn / teach RL?
Check out new book draft:
Reinforcement Learning - Foundations
sites.google.com/view/rlfound...
W/ Shie Mannor & Yishay Mansour
This is a rigorous first course in RL, based on our teaching at TAU CS and Technion ECE.