[Paper Notes] DayDreamer: World Models for Physical Robot Learning - CoRL 2022
Published:
Key information
- The method learns two components: a world model trained on off-policy sequences via supervised learning, and an actor-critic that learns behaviors from trajectories predicted (imagined) by the world model.
- Data collection and learning updates are decoupled, enabling fast training without waiting for the environment: a learner thread continuously trains the world model and the actor-critic behavior, while a parallel actor thread computes actions for environment interaction (see the sketch after this list).
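A minimal sketch of the decoupled actor/learner setup, assuming a generic `env`, `policy`, and update callables (hypothetical names; the actual DayDreamer implementation differs in detail):

```python
# Decoupled data collection (actor thread) and training (learner thread)
# sharing a replay buffer; a simplified illustration, not the paper's code.
import threading
import random
import time
from collections import deque

replay_buffer = deque(maxlen=10_000)   # shared storage of environment transitions
buffer_lock = threading.Lock()
stop_flag = threading.Event()

def actor_thread(env, policy):
    """Collects experience at the robot's control rate, never waiting for the learner."""
    obs = env.reset()
    while not stop_flag.is_set():
        action = policy(obs)                       # latest policy snapshot
        next_obs, reward, done = env.step(action)
        with buffer_lock:
            replay_buffer.append((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs

def learner_thread(update_world_model, update_actor_critic, batch_size=32):
    """Continuously trains the world model and behavior from replayed experience."""
    while not stop_flag.is_set():
        with buffer_lock:
            batch = (random.sample(replay_buffer, batch_size)
                     if len(replay_buffer) >= batch_size else None)
        if batch is None:
            time.sleep(0.01)                       # wait until enough data is collected
            continue
        update_world_model(batch)                  # supervised learning on off-policy data
        update_actor_critic(batch)                 # behavior learned from imagined rollouts
```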
World model learning
- The world model can be thought of as a fast simulator of the environment that the robot learns autonomously, even though the physical robot operates in the real environment.
- The world model is based on the Recurrent State-Space Model (RSSM) which consists of encoder, decoder, dynamics and reward networks.
- The encoder network fuses all sensory inputs $x_t$ into stochastic representations $z_t$. The dynamics model learns to predict the sequence of stochastic representations using its recurrent state $h_t$. The reward network predicts task rewards from the model state, trained on rewards observed while the robot interacts with the real world. (The decoder reconstructs the inputs to provide a learning signal for the representations, but is not needed when imagining trajectories.)
- All components of the world model are jointly optimized by stochastic backpropagation (a simplified sketch follows this list).
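A rough PyTorch sketch of an RSSM-style world model with encoder, dynamics, decoder, and reward heads trained jointly. The layer sizes, Gaussian latents, and loss weighting are assumptions for illustration, not the paper's exact design:

```python
# Simplified RSSM: deterministic recurrent state h_t, stochastic latent z_t,
# trained end-to-end with reconstruction, reward, and KL losses.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.distributions as D

class RSSM(nn.Module):
    def __init__(self, obs_dim, act_dim, hid=200, stoch=30):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hid)             # x_t -> embedding
        self.rnn = nn.GRUCell(stoch + act_dim, hid)        # recurrent state h_t
        self.prior = nn.Linear(hid, 2 * stoch)             # p(z_t | h_t)
        self.posterior = nn.Linear(2 * hid, 2 * stoch)     # q(z_t | h_t, x_t)
        self.decoder = nn.Linear(hid + stoch, obs_dim)     # reconstruct x_t
        self.reward = nn.Linear(hid + stoch, 1)            # predict r_t

    @staticmethod
    def _dist(params):
        mean, std = params.chunk(2, -1)
        return D.Independent(D.Normal(mean, F.softplus(std) + 0.1), 1)

    def loss(self, obs, actions, rewards):
        # obs: (T, B, obs_dim), actions: (T, B, act_dim), rewards: (T, B, 1)
        T, B = obs.shape[:2]
        h = torch.zeros(B, self.rnn.hidden_size)
        z = torch.zeros(B, self.prior.out_features // 2)
        total = 0.0
        for t in range(T):
            h = self.rnn(torch.cat([z, actions[t]], -1), h)
            prior = self._dist(self.prior(h))
            embed = self.encoder(obs[t])
            post = self._dist(self.posterior(torch.cat([h, embed], -1)))
            z = post.rsample()                              # stochastic backpropagation
            feat = torch.cat([h, z], -1)
            recon = F.mse_loss(self.decoder(feat), obs[t])
            rew = F.mse_loss(self.reward(feat), rewards[t])
            kl = D.kl_divergence(post, prior).mean()
            total = total + recon + rew + kl
        return total / T
```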
Actor-critic learning
The actor-critic algorithm learns a behavior specific to the task at hand: the actor network decides which action to take in a given state so as to maximize predicted returns, while the critic network evaluates states by regressing the returns. A rough sketch of learning from imagined trajectories follows.
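The sketch below assumes a hypothetical `world_model.imagine_step` rollout API and uses plain discounted returns; the paper's actual objective uses lambda-returns and additional details:

```python
# Actor-critic update on trajectories imagined by the learned world model.
import torch

def imagine_and_update(world_model, actor, critic, actor_opt, critic_opt,
                       start_feat, horizon=15, gamma=0.99):
    """start_feat: batch of model states (h_t, z_t concatenated) from replayed data."""
    feats, rewards = [], []
    feat = start_feat
    for _ in range(horizon):
        action = actor(feat)                                   # act in latent space
        feat, reward = world_model.imagine_step(feat, action)  # assumed rollout API
        feats.append(feat)
        rewards.append(reward)

    # Discounted return of the imagined trajectory (simplified; no bootstrapping).
    ret = torch.zeros_like(rewards[-1])
    returns = []
    for r in reversed(rewards):
        ret = r + gamma * ret
        returns.insert(0, ret)
    returns = torch.stack(returns)
    feats = torch.stack(feats)

    # Actor maximizes imagined returns; gradients flow through the learned dynamics.
    actor_loss = -returns.mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Critic regresses the imagined returns (targets detached from the model graph).
    critic_loss = ((critic(feats.detach()) - returns.detach()) ** 2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
```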
Experiments
Experiments are carried out on four different robots with different tasks.
- Unitree A1 Quadruped Walking
- UR5 Multi-Object Visual Pick and Place
- XArm Visual Pick and Place
- Sphero Navigation
My takeaways
The paper presents a method for training RL directly in the real world rather than in a simulator. A world model is trained and used for quick policy updates, and data collection and learning updates are decoupled. These techniques offer useful ideas for future reinforcement learning architecture design.
Reproduction of this paper
- Physical robots (may require MuJoCo or Isaac to reproduce, given the lack of hardware)
- Games (easier to reproduce)