I was asked some questions about the development of AlphaGo and figured my answers might be of wider interest, so here we go:
How low-level does the code need to be in order to get maximum performance, both in training and, possibly, in live games?
To maximize playing strength it is necessary to balance maximizing utilization of the accelerators (GPUs or TPUs) against focusing the search on exploring and evaluating the most promising lines of play. Unfortunately, several tendencies conflict:
- The more board positions we can evaluate, the better our estimates of how good each move is, and the higher the chance that we'll consider the best move.
- Evaluating board positions that are too far from the optimal line is at best useless and at worst actively harmful, as we can end up with incorrect value estimates.
- Ideally we would evaluate one position at a time, so that we can use the evaluation results when deciding which position to evaluate next.
- Accelerators perform best when evaluating a large batch of positions all at once (see the sketch below).
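To make the last point concrete, here is a toy JAX example (the tiny network and all the names are made up for illustration; this is not the AlphaGo code): one jitted call evaluates a whole batch of boards, so the per-position cost is far lower than issuing many single-position calls.

```python
import jax
import jax.numpy as jnp

# Toy value network; the real architectures were much larger.
def value_net(params, board):  # board: (19, 19) array of features
    hidden = jnp.tanh(board.reshape(-1) @ params["w1"])
    return jnp.tanh(hidden @ params["w2"])  # scalar value estimate

key = jax.random.PRNGKey(0)
params = {
    "w1": jax.random.normal(key, (361, 64)) * 0.05,
    "w2": jax.random.normal(key, (64,)) * 0.05,
}

# vmap + jit turn the single-position function into one fused batched
# call; on an accelerator, evaluating 64 positions this way costs far
# less than 64 separate calls.
batched_eval = jax.jit(jax.vmap(value_net, in_axes=(None, 0)))
boards = jax.random.normal(key, (64, 19, 19))
values = batched_eval(params, boards)  # shape (64,)
```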
These conflicting requirements led us to Monte Carlo Tree Search; see below. To utilize our TPUs effectively we used virtual loss to evaluate the N most promising positions in parallel; see the Lessons From Alpha Zero series for a good discussion of how this works.
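Here is a minimal sketch of the virtual loss idea, assuming a simple tree node with visit counts and value sums (the `VIRTUAL_LOSS` constant and all names are illustrative, not taken from the AlphaGo code). While a path through the tree is waiting for its network evaluation, it is temporarily treated as visited and lost, so concurrent selection passes spread out over other promising lines instead of piling onto the same one.

```python
from dataclasses import dataclass, field

VIRTUAL_LOSS = 3  # hypothetical constant; tuned in practice

@dataclass
class Node:
    prior: float                 # policy network probability for this move
    visit_count: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)  # move -> Node

def add_virtual_loss(path):
    # Pretend the pending evaluation was a loss, making this path look
    # worse to other selection passes running in parallel.
    for node in path:
        node.visit_count += VIRTUAL_LOSS
        node.value_sum -= VIRTUAL_LOSS

def revert_virtual_loss(path, value):
    # Once the real evaluation arrives, replace the pessimistic
    # placeholder with one true visit and the network's value.
    # (Sign handling across alternating plies is omitted for brevity.)
    for node in path:
        node.visit_count += 1 - VIRTUAL_LOSS
        node.value_sum += value + VIRTUAL_LOSS
```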
In terms of code, the majority of the system was written in C++, with the machine learning part in Python + JAX. Before the availability of JAX we used TensorFlow, and before that Lua with Torch.
Nowadays I would use Rust instead of C++ as well; debugging distributed concurrency issues in C++ can be painful.
How do you represent a board such as the one in Go? Did you try to implement some sort of new type that can contain as many bits as possible in order to use a pseudo-bitboard?
We did not find it necessary to minimize the number of bits used to represent the board, but there are common techniques used in Go engines, such as pseudo-liberties, to speed up the playing of moves. This was mostly relevant for AlphaGo, where we still used traditional rollouts; in AlphaGo Zero and AlphaZero, which rely solely on (comparatively very slow) neural networks, the speed of the Go board implementation matters much less.
You can see our open-sourced GoBoard implementation in OpenSpiel. This is an improved and cleaned up implementation of what we used in AlphaGo and AlphaZero.
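For intuition, here is a rough Python sketch of the pseudo-liberty idea (chain merging and capture removal are omitted; the OpenSpiel code linked above is the real thing). Each chain tracks the sum, over its stones, of adjacent empty points, so a shared empty point is counted once per adjacent stone. That sum can over-count true liberties, but it is zero exactly when the true liberty count is zero, which is all capture detection needs, and it updates in constant time per neighbor when a stone is played.

```python
BOARD_SIZE = 9
EMPTY, BLACK, WHITE = 0, 1, 2

def neighbors(p):
    x, y = p
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < BOARD_SIZE and 0 <= ny < BOARD_SIZE:
            yield nx, ny

class Board:
    def __init__(self):
        self.color = {}        # point -> BLACK or WHITE
        self.chain_id = {}     # point -> chain id (union-find in practice)
        self.pseudo_libs = {}  # chain id -> pseudo-liberty count

    def play(self, p, color):
        self.color[p] = color
        self.chain_id[p] = p  # new single-stone chain; merging omitted
        self.pseudo_libs[p] = sum(1 for n in neighbors(p)
                                  if n not in self.color)
        # The new stone removes one pseudo-liberty from every chain
        # adjacent to p, once per neighboring stone.
        for n in neighbors(p):
            if n in self.color:
                self.pseudo_libs[self.chain_id[n]] -= 1

    def is_captured(self, p):
        # Zero pseudo-liberties implies zero real liberties.
        return self.pseudo_libs[self.chain_id[p]] == 0
```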
What were the hardest challenges of the development, and what were the epiphany moments in which you understood something of crucial importance? (I hope you still remember after all this time)
One of the biggest difficulties of working with neural-network-based systems is that bugs and mistakes are often very hard to find: neural networks have a tendency to produce reasonable predictions even when given corrupted inputs, or when trained on slightly incorrect data. Our most effective strategy for dealing with this was to visualize as much as possible; by staring at the data we discovered many issues that we never suspected!
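As a hypothetical example of the kind of cheap visualization that catches such silent bugs (all names here are made up), one can plot the value network's prediction across the moves of a finished game: the curve should drift toward the actual result, and flat lines or late sign flips often point at corrupted inputs or mislabeled training targets.

```python
import matplotlib.pyplot as plt

def plot_value_trace(positions, black_won, value_net):
    """positions: encoded boards of one game in move order;
    black_won: True if Black won the game; value_net: returns the
    predicted value from Black's perspective in [-1, 1]."""
    values = [value_net(p) for p in positions]
    plt.plot(values, label="predicted value (Black)")
    plt.axhline(1.0 if black_won else -1.0, linestyle="--",
                label="actual result")
    plt.xlabel("move number")
    plt.ylabel("value")
    plt.legend()
    plt.show()
```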
In terms of epiphanies, it was definitely realizing how powerful these networks can be, and how the system became better and better the more hand-crafted components we stripped away and the more we relied on end-to-end learning.
Did AlphaGo require rethinking the minimax algorithm? Did it use some smarter way of doing pruning than the most commonly used ones?
Yes: classical minimax search does not work well with neural networks that provide only approximate value estimates instead of exact bounds. Instead we used Monte Carlo Tree Search; see my MuZero Intuition post for details.
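As a rough illustration of how MCTS sidesteps the need for exact bounds, here is a sketch of the PUCT-style selection rule used by the AlphaZero family (the code and the `c_puct` value are illustrative, not the production implementation). Each simulation descends the tree by picking the child that maximizes the network's averaged value estimate plus an exploration bonus driven by the policy prior and visit counts; approximate values are simply averaged over simulations rather than propagated as minimax cutoffs.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    prior: float                 # policy network probability for this move
    visit_count: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)  # move -> Node

def puct_score(parent, child, c_puct=1.5):
    # Exploitation: average value of simulations through this child.
    q = child.value_sum / child.visit_count if child.visit_count else 0.0
    # Exploration: favors moves with a high prior and few visits so far.
    u = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return q + u

def select_child(node):
    # Returns the (move, child) pair with the highest PUCT score.
    return max(node.children.items(), key=lambda kv: puct_score(node, kv[1]))
```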
Tags: go, alphazero, programming, ai