[Paper Notes] DayDreamer: World Models for Physical Robot Learning - CoRL 2022
Published:
Key information
- This paper learns two models: a world model trained on off-policy sequences through supervised learning, and an actor-critic model to learn behaviors from trajectories predicted by the learned model.
- The data collection and learning updates are decoupled, enabling fast training without waiting for the environment. A learner thread continuously trains the world model and actor-critic behavior, while an actor thread in parallel computes actions for environment interaction.