Supervised Policy Learning for Real Robots


RSS 2024 Tutorial

Friday, July 19, Afternoon (2 PM - 6 PM Central European Time, 8 AM - 12 PM Eastern Time).

Tutorial recording

The tutorial took place at RSS 2024 on Friday, July 19, 2024. The recording of the tutorial is available on YouTube and posted below.

Tutorial preview

(Video previews: Diffusion Policy, Robotic Transformer, Dobb-E.)
Overview


Creating robots that can perform a wide variety of complex tasks, with generalization to unstructured real-world settings, has long been a north star for the field. Recently, advances in machine learning and data collection frameworks have allowed supervised learning-from-demonstration approaches to take significant strides in this direction (e.g., Diffusion Policy, RT-1/2/X, or Dobb-E). In this paradigm, policy learning is cast as a supervised learning problem: learning a mapping from raw observations to demonstrated actions.

So what matters for Supervised Policy Learning (SPL)? Unlike image processing or language modeling, SPL has some extra challenges: the environment dynamics are stochastic and hard to model, and the target "action" variable in a human demonstration dataset is continuous, often noisy, high-dimensional, and multi-modal. Because these decision-making problems are sequential and real-world dynamics are chaotic, even a small imprecision in modeling the demonstrated behavior at a single step can cascade into failures many steps later. Thus, modern policies for robot learning try to achieve the following objectives: to model long- and short-term correlations in actions, to capture and generate from diverse modes of behavior, and to improvise in unseen situations while remaining precise in seen ones. Crucially, these properties need to be achieved with a relatively small amount of embodied data, which makes design choices important for successful policies.
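To make the multi-modality challenge concrete, here is a toy sketch (not taken from the tutorial code) of why a plain mean-squared-error regressor struggles with multi-modal demonstrations: when the same observation is paired with two distinct demonstrated actions, least-squares fitting predicts the conditional mean, an action that no demonstrator ever took.

```python
import numpy as np

# Hypothetical toy dataset: at the same observation, half the demonstrators
# steer left (action -1.0) and half steer right (action +1.0).
rng = np.random.default_rng(0)
obs = np.ones((100, 1))                             # identical observations
acts = np.where(rng.random(100) < 0.5, -1.0, 1.0)  # two action modes

# An MSE-trained regressor (here, ordinary least squares) converges to the
# conditional mean of the action distribution...
w, *_ = np.linalg.lstsq(obs, acts, rcond=None)
pred = float(obs[0] @ w)

# ...which lands between the modes: an action no demonstrator ever took.
print(f"predicted action: {pred:.2f}")  # near 0.0, not -1.0 or +1.0
```

This mode-averaging failure is what tokenization-based (Behavior Transformer) and denoising-based (Diffusion Policy) approaches are designed to avoid.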

Topics covered


In this tutorial, we provide a brief introduction to supervised behavior policy learning, including an overview of current state-of-the-art methods for learning real-world robot policies. In particular, we will focus on how to implement or adapt state-of-the-art behavior cloning algorithms for a new robotic task. The intended audience of this tutorial is robotics researchers, both in academia and industry, who are interested in applying supervised learning at scale to their robot learning problems.

This tutorial will teach attendees how to apply SPL algorithms to their own datasets and robots, starting from simple multi-layer perceptron or nearest-neighbor baselines, all the way to transformer- or diffusion-based policies. We will also discuss how to select an appropriate algorithm given problem complexity, dataset size and diversity, and compute requirements. Finally, attendees will hear from current practitioners about problems that often arise in real-world deployment of such policies, and will learn how to start debugging their own behavior-cloning-based systems when faced with similar problems.
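As a flavor of the simplest end of that spectrum, here is a hedged sketch of a non-parametric nearest-neighbor policy: store all demonstration (observation, action) pairs, then at test time replay the action of the closest stored observation. (VINN additionally learns a visual representation to embed observations first; here, raw feature vectors stand in for those embeddings, and all names are illustrative.)

```python
import numpy as np

class NearestNeighborPolicy:
    """Toy non-parametric policy: replay the action of the closest demo."""

    def __init__(self, observations, actions):
        # Store the demonstration dataset as parallel arrays.
        self.observations = np.asarray(observations, dtype=float)
        self.actions = np.asarray(actions, dtype=float)

    def __call__(self, obs):
        # Euclidean distance from the query to every stored observation.
        dists = np.linalg.norm(self.observations - np.asarray(obs, dtype=float), axis=1)
        # Return the demonstrated action at the nearest observation.
        return self.actions[np.argmin(dists)]

# Hypothetical 2-D observations paired with 1-D actions.
demo_obs = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
demo_act = [[-1.0], [0.0], [1.0]]
policy = NearestNeighborPolicy(demo_obs, demo_act)
print(policy([1.9, 0.1]))  # nearest demo is [2.0, 0.0], so this prints [1.]
```

A baseline like this needs no training loop at all, which makes it a useful sanity check before moving to parametric models.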

The list of topics covered in this tutorial includes:
  1. SPL algorithms and their applications in real-world robot learning.
  2. What problems are currently solvable by SPL and what classes of problems have fundamental bottlenecks?
  3. Considerations on robot dataset collection and nuances in selecting appropriate algorithms for the task.
  4. Details that matter when training policies from real world data and real-world robot deployment.

Event Schedule

Friday, July 19, 2024 (Central European Time)

2:00 PM to 2:50 PM: Introduction (Russ Tedrake)
  2:00 PM to 2:05 PM: Supervised policy learning: background and history
  2:05 PM to 2:10 PM: Recent applications from across the field
  2:10 PM to 2:25 PM: Why (or why not) supervised policy learning?
  2:25 PM to 2:30 PM: Core ideas and challenges
  2:30 PM to 2:40 PM: High-level taxonomy of architectures and algorithms
  2:40 PM to 2:50 PM: What we are yet to understand well

2:50 PM to 3:35 PM: Hands-on Supervised Policy Learning (Part I) (Mahi Shafiullah)
  2:50 PM to 2:55 PM: Setting up an environment
  2:55 PM to 3:05 PM: Dissecting a robot demonstration dataset: open-loop replays
  3:05 PM to 3:15 PM: Setting up a base policy: BC-MLP
  3:15 PM to 3:25 PM: Going non-parametric: Nearest Neighbors and VINN
  3:25 PM to 3:35 PM: Failure cases: noise, multi-modality

3:35 PM to 4:00 PM: Coffee break

4:00 PM to 5:00 PM: Hands-on Supervised Policy Learning (Part II) (Mahi Shafiullah & Russ Tedrake)
  4:00 PM to 4:25 PM: Multi-modality through tokenization: Behavior Transformer
  4:25 PM to 4:30 PM: Q&A: Behavior Transformer
  4:30 PM to 4:55 PM: Multi-modality through denoising: Diffusion Policy
  4:55 PM to 5:00 PM: Q&A: Diffusion Policy

5:00 PM to 5:30 PM: What matters in supervised policy learning: words from the practitioners (Lerrel Pinto)
  5:00 PM to 5:20 PM: Interview excerpts from practitioners
  5:20 PM to 5:25 PM: Q&A from the audience
  5:25 PM to 5:30 PM: Conclusions

Materials

Links to the materials for individual sessions are paired with the schedule entries above. For the full set of recordings and slides, follow the links below.

Full set of slides
Video recording

Reading list

While we do not require any prior reading for the tutorial, the papers listed in the References section below may be of interest to attendees, and may help elucidate some of the topics we will cover.

Organizers

Nur Muhammad "Mahi" Shafiullah

New York University

Siyuan Feng

Toyota Research Institute

Lerrel Pinto

New York University

Russ Tedrake

Massachusetts Institute of Technology, Toyota Research Institute

Citation

This RSS 2024 version of the tutorial may be cited as:

N. Shafiullah, S. Feng, L. Pinto & R. Tedrake. (2024, July). Supervised Policy Learning for Real Robots. Tutorial presented at the Robotics: Science and Systems (RSS), Delft. https://supervised-robot-learning.github.io/.

            @misc{shafiullah2024supervised,
                author = {Shafiullah, Nur Muhammad Mahi and Feng, Siyuan and Pinto, Lerrel and Tedrake, Russ},
                title = {Supervised Policy Learning for Real Robots},
                year = {2024},
                month = {July},
                note = {Tutorial presented at the Robotics: Science and Systems (RSS), Delft},
                url = {https://supervised-robot-learning.github.io}
            }
        

Thanks

Special thanks go to Ben Burchfiel for help and advice; to Dale McConachie, Rick Cory, Hongkai Dai, Ian Mcmahon, Grant Gould, Maya Angeles, Matthew Ferreira, Owen Pfannenstiehl, Matthew Tran, and Allison Henry for their contributions to developing some of the tutorial components; to the LeRobot team, including Alexander Soare and Remi Cadene, for the base repository of our code; and to the Yochan Lab for the website template.

References

  1. Chi, C., Xu, Z., Feng, S., Cousineau, E., Du, Y., Burchfiel, B., Tedrake, R., & Song, S. (2024). Diffusion Policy: Visuomotor Policy Learning via Action Diffusion. The International Journal of Robotics Research.
  2. Chi, C., Xu, Z., Pan, C., Cousineau, E., Burchfiel, B., Feng, S., Tedrake, R., & Song, S. (2024). Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots. Proceedings of Robotics: Science and Systems (RSS).
  3. Lee, S., Wang, Y., Etukuru, H., Kim, H. J., Shafiullah, N. M. M., & Pinto, L. (2024). Behavior generation with latent actions. ArXiv Preprint ArXiv:2403.03181.
  4. Pari*, J., Shafiullah*, N. M., Arunachalam, S. P., & Pinto, L. (2022). The Surprising Effectiveness of Representation Learning for Visual Imitation. Proceedings of Robotics: Science and Systems, XVIII, 10–15607.
  5. Shafiullah, N. M. M., Cui, Z. J., Altanzaya, A., & Pinto, L. (2022). Behavior Transformers: Cloning k modes with one stone. Thirty-Sixth Conference on Neural Information Processing Systems. https://openreview.net/forum?id=agTr-vRQsa
  6. Shafiullah, N. M. M., Rai, A., Etukuru, H., Liu, Y., Misra, I., Chintala, S., & Pinto, L. (2023). On bringing robots home. ArXiv Preprint ArXiv:2311.16098.
  7. Zhao, T. Z., Kumar, V., Levine, S., & Finn, C. (2023). Learning fine-grained bimanual manipulation with low-cost hardware. ArXiv Preprint ArXiv:2304.13705.