Tag: muzero

Planning in Stochastic Environments with a Learned Model

After extending to arbitrary action spaces and offline RL, we recently published our next step in generalizing MuZero: Stochastic MuZero, which learns and plans with a stochastic model of the environment - a model that captures not just a single deterministic outcome of our actions, but a full set of possibilities.

Previous approaches such as AlphaZero and MuZero have achieved very good results in fully observed, deterministic environments, but this type of environment is insufficiently general to model many problems and real-world tasks. Beyond the stochasticity inherent in many domains (think roll of the dice or draw of the cards …
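
To make the chance-node idea concrete, here is a rough sketch of a single transition in such a stochastic model: a deterministic afterstate from the action, then a sampled chance outcome. Everything below is illustrative - the function names mirror the afterstate/chance-outcome structure, but the toy numpy "networks" are stand-ins, not the actual learned model.

```python
# Sketch of one stochastic transition: state + action -> afterstate,
# then afterstate + sampled chance code -> next state.
import numpy as np

rng = np.random.default_rng(0)
NUM_CHANCE_CODES = 4   # size of the learned codebook of chance outcomes

def afterstate_dynamics(state, action):
    """Deterministic part: the effect of our own action."""
    return np.tanh(state + 0.1 * action)

def chance_prior(afterstate):
    """Distribution over discrete chance codes, predicted from the afterstate."""
    logits = afterstate[:NUM_CHANCE_CODES]
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def dynamics(afterstate, chance_code):
    """Stochastic part: how the environment resolves (dice roll, card draw)."""
    offset = np.zeros_like(afterstate)
    offset[chance_code % len(afterstate)] = 1.0
    return np.tanh(afterstate + offset)

state = rng.standard_normal(8)
afterstate = afterstate_dynamics(state, action=1.0)
sigma = chance_prior(afterstate)
code = rng.choice(NUM_CHANCE_CODES, p=sigma)
next_state = dynamics(afterstate, code)
print("chance distribution:", np.round(sigma, 3))
```

During search, the tree can branch over all chance codes weighted by this distribution instead of committing to a single deterministic successor.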


MuZero does YouTube

If you've been watching YouTube lately, the encoding settings for the video you watched might have been selected by MuZero - using less of your bandwidth for the same quality. The task here is rate control: selecting the quantization parameters within the VP9 codec to maximize the quality at a specified bitrate.

This is a constrained RL problem, requiring us to optimize two conflicting objectives of variable difficulty at the same time. To deal with this challenge we introduce a self-competition-based reward mechanism, where the reward depends on how successful other recent episodes were at maximizing the quality while staying …
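
To illustrate the mechanism, here is a minimal sketch of a self-competition reward: the agent is rewarded for beating its own recent episodes while respecting the constraint. The windowed-average baseline and the helper class are my own illustrative choices, not the exact statistics used in the actual system.

```python
# Sketch: reward an episode for beating the agent's own recent history,
# with the bitrate constraint acting as a hard gate.
from collections import deque

class SelfCompetitionReward:
    def __init__(self, window=100):
        self.history = deque(maxlen=window)  # qualities of recent episodes

    def reward(self, quality, bitrate, target_bitrate):
        if bitrate > target_bitrate:
            r = -1.0  # constraint violated: no credit regardless of quality
        else:
            # Compare against the recent episodes we are competing with.
            baseline = (sum(self.history) / len(self.history)
                        if self.history else quality)
            r = 1.0 if quality > baseline else -1.0
        self.history.append(quality)
        return r

rc = SelfCompetitionReward()
print(rc.reward(quality=38.2, bitrate=0.95, target_bitrate=1.0))
```

Because the baseline moves with the agent's own performance, the effective difficulty of the task stays roughly constant as the agent improves.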


Mastering Atari Games with Limited Data

Another interesting paper based on MuZero was published at NeurIPS 2021: Mastering Atari Games with Limited Data, aka EfficientZero. This paper by Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel and Yang Gao focuses on the application of MuZero to very low-data tasks, such as Atari 100k (only two hours of gameplay!) or DMControl 100k.

To tackle these tasks, the authors propose three main techniques:

First, they introduce a Self-Supervised Consistency Loss to ensure that the embeddings produced by MuZero's dynamics function are consistent with the embeddings from the representation function. This loss is inspired by SimSiam-style …
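
As a rough illustration of such a consistency loss: embed the actual next observation with the representation function (treated as a stop-gradient target, SimSiam-style) and pull the dynamics-predicted embedding towards it via negative cosine similarity. The toy networks below are placeholders; numpy has no autograd, so the stop-gradient is only noted in a comment.

```python
# Sketch of a SimSiam-style consistency loss between the dynamics-predicted
# next embedding and the embedding of the actual next observation.
import numpy as np

def representation(observation):
    return np.tanh(observation)               # h: observation -> embedding

def dynamics(embedding, action):
    return np.tanh(embedding + 0.1 * action)  # g: predicts next embedding

def consistency_loss(obs_t, action_t, obs_t1):
    pred = dynamics(representation(obs_t), action_t)
    # Target branch: the *actual* next observation's embedding. In training
    # this branch is wrapped in a stop-gradient so it only acts as a target.
    target = representation(obs_t1)
    pred = pred / np.linalg.norm(pred)
    target = target / np.linalg.norm(target)
    return -float(pred @ target)              # negative cosine similarity

obs_t, obs_t1 = np.random.randn(16), np.random.randn(16)
print(consistency_loss(obs_t, action_t=1.0, obs_t1=obs_t1))
```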


Online and Offline Reinforcement Learning by Planning with a Learned Model

After extending to arbitrary action spaces, our next step in generalizing MuZero was to work on data efficiency, and to extend it all the way to the offline RL case. We've now published this work as MuZero Unplugged at NeurIPS; below I'll give a brief summary of the main ideas.

Environment interactions are often expensive or have safety considerations, while existing datasets frequently already demonstrate good behaviour. We want to learn efficiently from any such data source, without being restricted by off-policy issues or limited to the performance of the policy that generated the data: as always, we want …


Learning and Planning in Complex Action Spaces

We've been working on making MuZero as easy to use as possible, allowing it to be applied to any RL problem of interest. First and foremost, this required us to make some algorithmic extensions.

The original MuZero algorithm achieved good results in discrete action environments with a small action space - classical board games and Atari, with up to a few thousand different actions. However, many interesting problems have action spaces that cannot be easily enumerated, or are not discrete at all - just think of controlling a robot arm with many degrees of freedom, smoothly moving each joint. So far, MuZero …
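
As a rough sketch of the core idea of the paper's approach, Sampled MuZero: rather than enumerating an intractable action space, draw a small set of candidate actions from the current policy and restrict the search to those samples. The Gaussian policy head and the sample count below are illustrative stand-ins for the network's actual policy output.

```python
# Sketch: sample a tractable subset of a continuous action space to search over.
import numpy as np

rng = np.random.default_rng(0)

def sample_candidate_actions(policy_mean, policy_std, num_samples=20):
    """Sample K continuous actions (e.g. joint torques) from the policy."""
    return rng.normal(policy_mean, policy_std,
                      size=(num_samples, len(policy_mean)))

# e.g. a 7-DoF robot arm: the search tree branches only over these 20
# sampled actions, with the policy targets corrected for the sampling.
candidates = sample_candidate_actions(np.zeros(7), np.ones(7))
print(candidates.shape)  # (20, 7)
```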


MuZero Intuition

To celebrate the publication of our MuZero paper in Nature (full-text), I've written a high-level description of the MuZero algorithm. My focus here is to give you an intuitive understanding and general overview of the algorithm; for the full details please read the paper. Please also see our official DeepMind blog post, which has great animated versions of the figures!

MuZero is a very exciting step forward - it requires no special knowledge of game rules or environment dynamics, instead learning a model of the environment for itself and using this model to plan. Even though it …
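
For a concrete feel of what "learning a model and planning with it" means, here is a toy sketch of MuZero's three learned functions (representation, dynamics, prediction) unrolled in latent space. The tiny numpy "networks" are of course stand-ins for the real learned networks.

```python
# Sketch: plan by imagining action sequences entirely in latent space -
# after the first embedding, the real environment is never consulted.
import numpy as np

def representation(observation):        # h: observation -> latent state
    return np.tanh(observation)

def dynamics(state, action):            # g: (state, action) -> (reward, state')
    next_state = np.tanh(state + 0.1 * action)
    reward = float(next_state.sum())
    return reward, next_state

def prediction(state):                  # f: state -> (policy, value)
    logits = state[:3]
    policy = np.exp(logits) / np.exp(logits).sum()
    value = float(state.mean())
    return policy, value

state = representation(np.random.randn(8))
for action in [0, 1, 2]:                # one imagined trajectory
    reward, state = dynamics(state, action)
    policy, value = prediction(state)
```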


MuZero talk - ICAPS 2020

I gave a detailed talk about MuZero at ICAPS 2020, at the workshop "Bridging the Gap Between AI Planning and Reinforcement Learning".

In addition to giving an overview of the algorithm in general, I also went into more detail about reanalyse - the technique that allows MuZero to use its model-based search to repeatedly learn more from the same episode data.
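
As a rough sketch of the reanalyse loop (with illustrative names - run_mcts and the buffer layout are stand-ins, not the real implementation): stored episodes are revisited, and the latest network re-runs its search on the old observations to produce fresh, improved training targets.

```python
# Sketch: revisit old episodes and regenerate targets with the current model.
import random

def run_mcts(network, observation):
    """Stand-in for MuZero's search; returns (policy_target, value_target)."""
    return [1.0, 0.0], 0.0

def reanalyse(replay_buffer, network):
    episode = random.choice(replay_buffer)   # previously played episode
    for step in episode:
        # Same observation as when the episode was played, but searched with
        # the *current* network: fresher targets from the same data.
        step["policy_target"], step["value_target"] = run_mcts(
            network, step["observation"])
    return episode

buffer = [[{"observation": [0.0, 1.0]}]]
print(reanalyse(buffer, network=None))
```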

I hope you find the talk useful! I've also uploaded my slides for easy reference.
