How to build an AI that wins: the basics of minimax search

In this post, we're going to look at how to build a smart agent to play fully observable turn based games. By "fully observable," I mean that each player can see the state of the entire game (chess, tic tac toe, ...). This excludes games like poker or go fish. There are ways to build agents for these games, but they are beyond the scope of this post. We'll go through some theory and then build a tic-tac-toe AI in Javascript. Here's a demo of what we'll build:

Let's start by formalizing the states of a game. Each $State$ is defined by its $Player$, $Board$ and $NextStates$. $Board$ is a description of the current state of the game (where the pieces are for example). $NextStates$ is a list of states can be reached by one move from $Player$. We'll also loosely define $Benefit$ as "how good a certain move is for the agent." A more formal definition is presented later.

We can look at this system of $States$ as a tree where each node is a $Board$ that points to all of its $NextStates$. A partial example with tic-tac-toe is shown below

In the complete tree, every node has at least one child unless the game is finished at that node (win, lose, or tie).

We will now formally define $Benefit$ for leaf nodes (nodes without children):

$Benefit = \begin{cases}1 & \text{if AI wins}\\ 0 & \text{if tie} \\ -1 & \text{if AI loses} \end{cases}$

The goal of the AI is to win the game. This means that it wants to get to a leaf node with the maximum possible $Benefit$ (one in this example). The goal of the opponent is to get to a leaf node with a minimum possible $Benefit$ (negative one in this example). Therefore we define $Benefit$ for $State$ $s$ (which is a non-leaf node) as follows:

$Benefit = \begin{cases}max_{t \in s.NextStates}(t.Benefit) & \text{if s.Player is AI}\\ min_{t \in s.NextStates}(t.Benefit) & \text{if s.Player is not AI}\end{cases}$

This assumes that the AI is playing against an optimal player. If it is playing against an optimal player (or a suboptimal player), we can show that the AI will win or tie*. Note that this definition of $Benefit$ allows the AI to focus on the end goal instead of just the current move. This lets the AI do interesting things like make sacrifices if it will ultimately lead to winning. Sounds great right? Not completely...

Exponential Growth

In tic-tac-toe, the number of nodes in the tree isn't that large; it's around 1 million. This code to iterate through all of them takes around 3 seconds:

def tictactoe(board):
    for i in range(3):
        for j in range(3):
            if board[i][j] is None:
                board[i][j] = 1
                tictactoe(board)
                board[i][j] = None

board = [
    [None, None, None],
    [None, None, None],
    [None, None, None]
]

tictactoe(board)

Howerver, when we get to a game like chess, where each state points to many different states and games can go on for hundreds of moves, it isn't feasible to look at all possible paths to a leaf node.

This means that we can't calculate $Benefit$ the same way that we were doing above. In order to fix this, we'll set a max number of moves that we'll look into the future. We'll treat the nodes at this limit as "psuedo-leaf nodes." Unfortunately, we can't use the $Benefit$ formula for leaf nodes here. We need to come up with another definition that can handle this limit. An example complete definition for a minimax search based AI follows:

For a $State$ $s$,

$Benefit = \begin{cases}1000 & \text{if AI wins}\\ 0 & \text{if tie} \\ -1000 & \text{if AI loses} \\ \text{Count(AI's pieces) - Count(opponent's pieces)} & \text{if recursion limit reached}\\ max_{t \in s.NextStates}(t.Benefit) & \text{if s.Player is AI}\\ min_{t \in s.NextStates}(t.Benefit) & \text{if s.Player is not AI} \end{cases}$

Note that the first three cases are only executed if $s$ is a leaf node, the fourth case is executed only when we hit the limit on the number of moves we can look into the future, and the last cases are executed if none of the others are. Also notice that the values for the first and third case are much larger than the values that can be generated by the fourth case. This will help the AI "prioritize" nodes where we win over nodes where we are doing better than the opponent.

Code

We're going to make a tic tac toe AI that will always win or tie. We will be building this in JavaScript.

Here's what we'll end up with:

Play a few games before moving on! The AI is guaranteed to either win or tie.

Let's start by building the HTML table that we're interacting with:

After that, some basic CSS:

Let's start writing some Javascript! We'll store our board as a 2d array and create a variable to keep track of whose turn it is:

var board = [
    [null, null, null],
    [null, null, null],
    [null, null, null]
]

var myMove = false;

Let's create some helper functions. This one will check if someone won (and if so, return who won):

function getWinner(board) {

    // Check if someone won
    vals = [true, false];
    var allNotNull = true;
    for (var k = 0; k < vals.length; k++) {
        var value = vals[k];

        // Check rows, columns, and diagonals
        var diagonalComplete1 = true;
        var diagonalComplete2 = true;
        for (var i = 0; i < 3; i++) {
            if (board[i][i] != value) {
                diagonalComplete1 = false;
            }
            if (board[2 - i][i] != value) {
                diagonalComplete2 = false;
            }
            var rowComplete = true;
            var colComplete = true;
            for (var j = 0; j < 3; j++) {
                if (board[i][j] != value) {
                    rowComplete = false;
                }
                if (board[j][i] != value) {
                    colComplete = false;
                }
                if (board[i][j] == null) {
                    allNotNull = false;
                }
            }
            if (rowComplete || colComplete) {
                return value ? 1 : 0;
            }
        }
        if (diagonalComplete1 || diagonalComplete2) {
            return value ? 1 : 0;
        }
    }
    if (allNotNull) {
        return -1;
    }
    return null;
}

Here are a couple more self-explanatory helper functions:

function restartGame() {
    board = [
        [null, null, null],
        [null, null, null],
        [null, null, null]
    ];
    myMove = false;
    updateMove();
}

function updateMove() {
    updateButtons();
    var winner = getWinner(board);
    $("#winner").text(winner == 1 ? "AI Won!" : winner == 0 ? "You Won!" : winner == -1 ? "Tie!" : "");
    $("#move").text(myMove ? "AI's Move" : "Your move");
}

function updateButtons() {
    for (var i = 0; i < 3; i++) {
        for (var j = 0; j < 3; j++) {
            $("#c" + i + "" + j).text(board[i][j] == false ? "x" : board[i][j] == true ? "o" : "");
        }
    }
}

Now on to the meat of the code. This is just an implementation of the algorithm described above:

var numNodes = 0;

function recurseMinimax(board, player) {
    numNodes++;
    var winner = getWinner(board);
    if (winner != null) {
        switch(winner) {
            case 1:
                // AI wins
                return [1, board]
            case 0:
                // opponent wins
                return [-1, board]
            case -1:
                // Tie
                return [0, board];
        }
    } else {
        // Next states
        var nextVal = null;
        var nextBoard = null;

        for (var i = 0; i < 3; i++) {
            for (var j = 0; j < 3; j++) {
                if (board[i][j] == null) {
                    board[i][j] = player;
                    var value = recurseMinimax(board, !player)[0];
                    if ((player && (nextVal == null || value > nextVal)) || (!player && (nextVal == null || value < nextVal))) {
                        nextBoard = board.map(function(arr) {
                            return arr.slice();
                        });
                        nextVal = value;
                    }
                    board[i][j] = null;
                }
            }
        }
        return [nextVal, nextBoard];
    }
}

function makeMove() {
    board = minimaxMove(board);
    console.log(numNodes);
    myMove = false;
    updateMove();
}

function minimaxMove(board) {
    numNodes = 0;
    return recurseMinimax(board, true)[1];
}

Note that we can make this code more efficient using memoization, but this is not included in the post for the sake of simplicity. Basically, we'd cache the results of recurseMinimax so we'd only need to do the computation at most once for every (board, player) pair. Finally, let's set up all the listeners and start the game:

if (myMove) {
    makeMove();
}

$(document).ready(function() {
    $("button").click(function() {
        var cell = $(this).attr("id")
        var row = parseInt(cell[1])
        var col = parseInt(cell[2])
        if (!myMove) {
            board[row][col] = false;
            myMove = true;
            updateMove();
            makeMove();
        }
    });
    $("#restart").click(restartGame);
});

updateMove();

* this is only for cases where we can enumerate the entire tree (like in tic tac toe). In games like chess, there is no guarantee that the AI will not lose.

Check out the discussion on Hacker News!