To celebrate the publication of our MuZero paper in Nature (full-text), I've written a high-level description of the MuZero algorithm. My focus here is to give you an intuitive understanding and general overview of the algorithm; for the full details please read the paper. Please also see our official DeepMind blog post, which has great animated versions of the figures!
MuZero is a very exciting step forward - it requires no special knowledge of game rules or environment dynamics, instead learning a model of the environment for itself and using this model to plan. Even though it uses such a …
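To make the idea concrete: the paper factors the learned model into a representation function h, a dynamics function g, and a prediction function f, and plans by unrolling them in hidden-state space. Below is a minimal sketch of that interface - the three function names follow the paper, but the random linear maps, sizes, and toy usage are illustrative stand-ins of my own, not the real networks.

```python
import numpy as np

# Illustrative sizes and random linear maps standing in for real networks.
NUM_ACTIONS, OBS_DIM, HIDDEN_DIM = 2, 4, 8
rng = np.random.default_rng(0)
W_h = rng.normal(size=(HIDDEN_DIM, OBS_DIM))
W_g = rng.normal(size=(HIDDEN_DIM, HIDDEN_DIM + NUM_ACTIONS))
W_f = rng.normal(size=(NUM_ACTIONS + 1, HIDDEN_DIM))

def representation(observation):
    """h: encode a raw observation into a hidden state."""
    return np.tanh(W_h @ observation)

def dynamics(state, action):
    """g: predict the next hidden state after an action (reward omitted here)."""
    one_hot = np.eye(NUM_ACTIONS)[action]
    return np.tanh(W_g @ np.concatenate([state, one_hot]))

def prediction(state):
    """f: predict policy logits and a value for a hidden state."""
    out = W_f @ state
    return out[:NUM_ACTIONS], out[NUM_ACTIONS]

# Planning unrolls the learned model; the real environment is never queried.
state = representation(np.ones(OBS_DIM))
for action in [0, 1, 0]:
    policy_logits, value = prediction(state)
    state = dynamics(state, action)
```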
I gave a detailed talk about MuZero at ICAPS 2020, at the workshop "Bridging the Gap Between AI Planning and Reinforcement Learning".
In addition to giving an overview of the algorithm in general, I also went into more detail about reanalyse - the technique that allows MuZero to use its model-based search to repeatedly learn more from the same episode data.
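In a nutshell, reanalyse revisits stored episodes and re-runs the search over them with the latest network, turning the same data into fresher training targets. Here's a minimal sketch of that data flow - all the names are mine, and the one-call `run_search` placeholder stands in for the full MCTS:

```python
def run_search(network, observation):
    """Placeholder for MuZero's model-based MCTS; here it just queries the network."""
    return network(observation)

def reanalyse(episode, latest_network):
    """Recompute policy/value targets for an old episode with the newest network."""
    new_targets = []
    for observation in episode:
        policy, value = run_search(latest_network, observation)
        new_targets.append((observation, policy, value))
    return new_targets  # consumed by the learner just like freshly generated data

# Toy usage: a "network" that maps any observation to (policy, value).
toy_network = lambda obs: ([0.5, 0.5], 0.0)
print(reanalyse(episode=[obs for obs in range(3)], latest_network=toy_network))
```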
I hope you find the talk useful! I've also uploaded my slides for easy reference.
My previous Getting into Machine Learning post is one of my most popular; much has changed since then.
There's a new kid on the block: JAX. A thin but powerful layer over Autograd and XLA, it makes it easy to concisely express algorithms with the same syntax as numpy while getting the full performance of TPUs and GPUs.
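As a taste of what that looks like in practice (a toy example of my own, not taken from any of the linked resources): you write plain numpy-style code, and `jax.grad` and `jax.jit` give you derivatives and XLA compilation for free.

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # Ordinary numpy-style code: linear predictions and mean squared error.
    pred = x @ w
    return jnp.mean((pred - y) ** 2)

# One line each for autodiff and XLA compilation.
grad_loss = jax.jit(jax.grad(loss))

w = jnp.zeros(3)
x = jnp.ones((5, 3))
y = jnp.ones(5)
print(grad_loss(w, x, y))  # gradient of the loss with respect to w
```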
The resources I recommended in my previous …
From Musk's "Potentially more dangerous than nukes" tweet and increased funding for the Machine Intelligence Research Institute (MIRI) to the founding of cross-industry groups like the Partnership on AI, AI is being taken more seriously.
One worry that is sometimes cited, as in the book Superintelligence by Nick Bostrom, is that once we reach human-level AI, it might rapidly improve itself past anything humans can envision, becoming impossible to control. This is called the "Singularity", because anything after such a point is unforeseeable.
The argument for a Singularity rests on the idea that a hypothetical AI could devote all its resources to …
I'm excited to finally share some more details on what we've been working on since AlphaZero.
Recently, we made our latest paper - Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model, aka MuZero - available on arXiv:
Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. In this work we present the MuZero …
Update: I've published a newer version of this post.
It's a great documentary and really captures the history of AlphaGo very well - every time I watch it, it takes me right back to the excitement of those months! If you are interested in AI, Go, or just like documentaries in general, I really recommend you give it a try.
Usually in software, version numbers tend to go up, not down. With AlphaGo Zero, we did the opposite - by taking out handcrafted human knowledge, we ended up with both a simpler and more beautiful algorithm and a stronger Go program.
At the core is a self-improvement loop based on self-play and Monte Carlo Tree Search (MCTS): we start with a randomly initialized network, then use this network in MCTS to play the first games. The network is …
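To show the shape of that loop, here is a deliberately tiny, self-contained sketch - a one-move toy game with a single-parameter "network" and one-line placeholders where MCTS and gradient descent would go. Only the structure of the loop matches the real algorithm; everything else is invented for illustration.

```python
import random

def search(network, state):
    """Placeholder for MCTS: a move distribution from the current network."""
    return [network["p"], 1 - network["p"]]

def play_game(network):
    """Self-play one trivial, one-move game; pretend move 0 always wins."""
    policy = search(network, state=0)
    move = 0 if random.random() < policy[0] else 1
    result = 1 if move == 0 else -1
    return [(move, result)]

def train(network, data):
    """Placeholder for gradient descent: shift the policy toward moves that
    won and away from moves that lost."""
    for move, result in data:
        sign = 1 if move == 0 else -1
        network["p"] = min(max(network["p"] + 0.05 * sign * result, 0.05), 0.95)
    return network

network = {"p": 0.5}              # "randomly initialized network"
for iteration in range(10):       # the self-improvement loop
    data = []
    for _ in range(20):
        data += play_game(network)
    network = train(network, data)
print(network)                    # the policy has drifted toward the winning move
```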
You might have heard about our recent games with AlphaGo in China, at the Future of Go Summit. Not only did we play the legendary Ke Jie, but there were also two new and exciting formats: Team Go and Pair Go.
This match was also very exciting on the technical side because we had improved AlphaGo to the point where we ran it on a single machine in the Google Cloud - that's one tenth of the computation power compared to the distributed version we used in the last match!
Personally, I also really enjoyed the Pair Go …