Monte Carlo Tree Search (MCTS) in AlphaGo Zero

MCTS Basics

In this example, the current board position is s₃.

The search is guided by two outputs of a deep network f:

  • the policy p, a probability distribution scoring each possible action, and
  • the value v, an estimate of how likely the current player is to win from that position.

During the search, MCTS balances two competing goals:

  • exploitation: spend more simulations on moves that already look promising (i.e. high Q value), and
  • exploration: try moves we know less about (i.e. low visit count N).
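These two pressures are combined into a single per-edge score: Q plus an exploration bonus that shrinks as the edge's visit count grows (the PUCT rule used by AlphaGo Zero). A minimal sketch in Python; the function name and the example numbers are illustrative, not taken from the paper:

```python
import math

def puct_score(q, p, n_edge, n_parent, c_puct=1.5):
    """Score an edge as Q plus an exploration bonus.

    q        - mean action value Q = W / N for this edge
    p        - prior probability from the policy network
    n_edge   - visit count N of this edge
    n_parent - total visit count of the parent node
    c_puct   - exploration constant (a tunable hyperparameter)
    """
    u = c_puct * p * math.sqrt(n_parent) / (1 + n_edge)
    return q + u

# A rarely visited edge with a high prior can outscore a
# well-explored edge that has a better Q value so far.
explored = puct_score(q=0.6, p=0.2, n_edge=50, n_parent=100)
fresh = puct_score(q=0.0, p=0.5, n_edge=0, n_parent=100)
```

As the fresh edge accumulates visits, its bonus decays and the score is dominated by Q again, so the search gradually shifts from exploration to exploitation.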


Before going into the details, here are the 4 major steps in MCTS. As shown, each simulation starts from the root board position s.

  • Step (a) selects a path (a sequence of moves) to search further. Starting from the root node, it repeatedly picks the action with the highest score, combining the Q-value (Q = W/N) with an exploration bonus that is inversely related to the visit count N. The search continues until it reaches an action that has no node attached.
  • Step (b) expands the search tree by adding the state (node) reached by the last action in the path. For each action available from the new node, we add an edge and record its prior p, computed by the network f.
  • Step (c) updates W and N backward along the current path: every traversed edge has its visit count N incremented and the value v (estimated by f at the new node) added to W.
  • Step (d), once the simulations are done, plays an actual move from the root, chosen according to the visit counts N.
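The select/expand/backup loop (steps a–c) can be sketched in Python. This is a minimal illustration, not the AlphaGo Zero implementation: `Node`, `select_child`, `simulate`, `apply_move`, and `evaluate` are hypothetical names, and `evaluate` stands in for the network f:

```python
import math

class Node:
    """One board state; each node stores its edge statistics N, W, and prior p."""
    def __init__(self, prior):
        self.p = prior      # prior probability from the network f
        self.n = 0          # visit count N
        self.w = 0.0        # total action value W
        self.children = {}  # action -> Node

    @property
    def q(self):
        """Mean action value Q = W / N (0 before the first visit)."""
        return self.w / self.n if self.n else 0.0

def select_child(node, c_puct=1.5):
    """Step (a): pick the action maximizing Q plus the exploration bonus."""
    total_n = sum(c.n for c in node.children.values())
    def score(child):
        return child.q + c_puct * child.p * math.sqrt(total_n) / (1 + child.n)
    return max(node.children.items(), key=lambda kv: score(kv[1]))

def simulate(root, root_state, apply_move, evaluate, c_puct=1.5):
    """One MCTS simulation: select a path, expand a leaf, back up v."""
    node, state, path = root, root_state, [root]
    # (a) walk down until reaching a node with no children attached
    while node.children:
        action, node = select_child(node, c_puct)
        state = apply_move(state, action)
        path.append(node)
    # (b) expand: query f for the priors p and the value v, add new edges
    priors, v = evaluate(state)
    for action, p in priors.items():
        node.children[action] = Node(prior=p)
    # (c) back up: update N and W along the traversed path
    for n in path:
        n.n += 1
        n.w += v
    return v

# Toy demo: states are integers, and `evaluate` is a stub standing in for f
# that returns fixed priors and a value of +1 for every position.
root = Node(prior=1.0)
for _ in range(10):
    simulate(root, 0,
             apply_move=lambda s, a: s + 1,
             evaluate=lambda s: ({0: 0.6, 1: 0.4}, 1.0))
```

After the simulations, step (d) would pick the actual move at the root from the visit counts `root.children[a].n`, which is where the search effort has concentrated.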


