Arnav Paruthi

Ultimate Tic-Tac-Toe

Ultimate Tic-Tac-Toe is very different from the original game. It keeps the goal of getting three in a row but adds another layer: each square of the main board contains its own smaller tic-tac-toe board.

Watch the video to see how the game works!

The algorithm

AlphaZero

I used a reinforcement learning algorithm called AlphaZero to train the bot to play Ultimate Tic-Tac-Toe. The algorithm can learn any two-player game that is deterministic and fully observable. In these games, a given move from a given position always yields the same result, and the bot can see everything relevant to the state of the game.

The algorithm combines a Monte Carlo tree search (MCTS) with a neural network. The tree search looks ahead through possible moves to estimate which one gives the greatest chance of a win. It’s similar to how humans think ahead by imagining the sequences of moves that could follow from a given position.
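To make the look-ahead idea concrete, here is a minimal sketch of a plain Monte Carlo tree search on ordinary 3x3 tic-tac-toe. It is not the code from the repo: the board encoding (a 9-character string), the names `Node`, `ucb`, `rollout`, and `mcts`, and the exploration constant are all assumptions made for illustration. It also estimates positions with random playouts, whereas AlphaZero replaces the playouts with a neural network evaluation and uses the network's move probabilities in the selection rule.

```python
import math
import random

# All eight winning lines on a 3x3 board, cells indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell == ' ']


def play(board, move, player):
    """Deterministic transition: the same move always gives the same board."""
    return board[:move] + player + board[move + 1:]


def other(player):
    return 'O' if player == 'X' else 'X'


class Node:
    def __init__(self, board, to_move, parent=None):
        self.board = board
        self.to_move = to_move
        self.parent = parent
        self.children = {}   # move -> Node
        self.visits = 0
        self.wins = 0.0      # scored for the player who moved INTO this node

    def is_terminal(self):
        return winner(self.board) is not None or not legal_moves(self.board)


def ucb(child, parent_visits, c=1.4):
    """Upper confidence bound: average result plus an exploration bonus."""
    if child.visits == 0:
        return math.inf
    return (child.wins / child.visits
            + c * math.sqrt(math.log(parent_visits) / child.visits))


def rollout(board, to_move, rng):
    """Play random moves to the end; return 'X', 'O', or None for a draw."""
    while winner(board) is None and legal_moves(board):
        board = play(board, rng.choice(legal_moves(board)), to_move)
        to_move = other(to_move)
    return winner(board)


def mcts(board, to_move, simulations=2000, seed=0):
    """Return the most-visited move. Assumes the game is not already over."""
    rng = random.Random(seed)
    root = Node(board, to_move)
    for _ in range(simulations):
        node = root
        # 1. Selection: descend while every legal move has a child node.
        while (not node.is_terminal()
               and len(node.children) == len(legal_moves(node.board))):
            node = max(node.children.values(),
                       key=lambda ch: ucb(ch, node.visits))
        # 2. Expansion: add one untried move as a new child.
        if not node.is_terminal():
            untried = [m for m in legal_moves(node.board)
                       if m not in node.children]
            move = rng.choice(untried)
            child = Node(play(node.board, move, node.to_move),
                         other(node.to_move), node)
            node.children[move] = child
            node = child
        # 3. Simulation: random playout from the new node.
        result = rollout(node.board, node.to_move, rng)
        # 4. Backpropagation: credit each node for the player who moved into it.
        while node is not None:
            node.visits += 1
            if result is None:
                node.wins += 0.5
            elif result == other(node.to_move):
                node.wins += 1.0
            node = node.parent
    return max(root.children, key=lambda m: root.children[m].visits)
```

For example, calling `mcts("XX OO    ", "X")` searches a position where X can complete the top row, and the visit counts concentrate on that move. Returning the most-visited move rather than the one with the highest average is a common choice because visit counts are less noisy.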

The neural network is then trained to reproduce the Monte Carlo tree search’s predictions, and the improved network in turn guides later searches.
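As a rough illustration of what "training towards the search" can look like, the sketch below turns the search's visit counts into a target distribution over moves and measures how far a network's predicted move probabilities are from it. The function names, the dictionary representation, and the `temperature` parameter are assumptions for this example, not taken from the repo; a real implementation would compute these losses on tensors inside a deep learning framework.

```python
import math


def policy_target(visit_counts, temperature=1.0):
    """Turn per-move MCTS visit counts into a probability distribution.

    Assumes at least one move has a non-zero visit count.
    """
    scaled = {m: n ** (1.0 / temperature) for m, n in visit_counts.items()}
    total = sum(scaled.values())
    return {m: s / total for m, s in scaled.items()}


def cross_entropy(target, predicted, eps=1e-12):
    """Policy loss: smallest when the predicted probabilities match the target."""
    return -sum(p * math.log(predicted.get(m, 0.0) + eps)
                for m, p in target.items())


def value_loss(predicted_value, outcome):
    """Squared error between the predicted result and the actual game result."""
    return (predicted_value - outcome) ** 2


# Hypothetical visit counts for three candidate moves after a search.
visits = {0: 10, 4: 70, 8: 20}
target = policy_target(visits)

uniform_guess = {0: 1 / 3, 4: 1 / 3, 8: 1 / 3}
loss_before = cross_entropy(target, uniform_guess)
loss_matched = cross_entropy(target, target)
```

Here `loss_matched` is lower than `loss_before`, which is the direction training pushes the network: towards assigning most of its probability to the moves the search visited most.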

Read the article to learn more about how it works!

You can also download the GitHub repo and run "comp_vs_human.py" to play against the bot.

Article Github Code

Training an AI to play Ultimate Tic-Tac-Toe
