Mastering the Game of Go with Deep Neural Networks and Tree Search

Today we published our [cached]paper on beating the human state of the art in Go, the only major board game where humans (or at least top professionals) could still beat computers. No more. Our program AlphaGo achieved a 99% winning rate against the strongest existing Go programs, and defeated the human European champion by 5 games to 0.

(That's me playing at 0:10)

The first major breakthrough in computer Go - after remaining at weak amateur level for decades - came with the advent of [cached]Monte Carlo Tree Search (MCTS) around 2007, massively improving playing strength. Still, Go programs remained much weaker than strong human players and started to plateau again fairly soon - Crazy Stone's feat of defeating a professional player with 4 (!) handicap stones in 2013 remained unsurpassed since 2013.

There are many reasons Go is difficult for computers - a huge state space (10^171 vs 10^47 in chess), many available moves (on average 250 vs 35 in chess) and many more moves per game (150 vs 80) - rendering straightforward search all but infeasible. As we have seen, MCTS helps somewhat (focusing on promising parts of the search tree due to it's sampling mechanism), but not nearly enough.

So how did we do it?

For the full details obviously I recommend just reading the [cached]paper, but here's the gist of it.

We still use MCTS for search, like previous programs, but we made heavy use of convolution neural networks in two critical areas.
A policy network trained to predict the next move for a given board state. This is used as a prior to guide the tree search, focusing the search on more promising nodes.
A value network trained to evaluate a board state into a winning percentage. This is complementary to the normal random playouts used in MCTS and similar to the evaluation function used by alpha-beta search in chess.

These neural networks are responsible for the huge increase in playing strength (hence the title) and training them is also the main difficulty.

The Games

To validate our findings, we played the European Go Champion in an even match consisting of 5 games, of which our Go program AlphaGo won all 5. Below you can find the SGF records of all 5 games:

Monday