
CatchemAlphaZero: AI techniques for solving games



Installation

pip install catchem-alpha-zero

Project Overview

CatchemAlphaZero (CAZ) showcases techniques in artificial intelligence for solving two-player games without any human knowledge or input beyond the rules of the game. Any two-player deterministic game can, in principle, leverage this framework. Little is needed beyond the ability to do three things (a minimal interface sketch follows the list below):

  1. get the game's current legal moves
  2. play a move
  3. determine whether the game has terminated
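
For illustration, the kind of interface a game needs to expose might look like the minimal sketch below. The class and method names here are hypothetical, not CAZ's actual API:

```python
from typing import Protocol, Sequence


class TwoPlayerGame(Protocol):
    """Hypothetical interface: the three capabilities a game must expose."""

    def legal_moves(self) -> Sequence[int]:
        """Return the moves currently available to the player to act."""
        ...

    def play_move(self, move: int) -> "TwoPlayerGame":
        """Return the state reached by playing the given move."""
        ...

    def is_terminal(self) -> bool:
        """Return True if the game has ended in a win, loss or draw."""
        ...
```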

The project includes:

  • Alpha-Beta Minimax, which is well suited to games with a small-to-medium-sized decision tree (or games for which a position's likelihood of winning is easy to evaluate); a bare-bones sketch follows this list
  • Monte Carlo Tree Search (MCTS), which uses random simulation to estimate the value of a state and biases the search towards the paths that seem most promising
  • AlphaZero: an attempt to replicate the brilliant work of DeepMind, who used deep convolutional neural networks in conjunction with Monte Carlo Tree Search to significantly improve upon the MCTS algorithm. The neural network learns the optimal policy entirely through self-play reinforcement learning.
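
As a flavour of the first technique, a bare-bones negamax formulation of alpha-beta over an interface like the one above might look as follows. This is a sketch only, assuming a user-supplied `evaluate` heuristic that scores a position from the perspective of the player to move; it is not CAZ's implementation:

```python
def alpha_beta(state, depth: int, alpha: float, beta: float, evaluate) -> float:
    """Negamax alpha-beta: value of `state` from the perspective of the player to move."""
    if depth == 0 or state.is_terminal():
        return evaluate(state)

    best = float("-inf")
    for move in state.legal_moves():
        # Child values are from the opponent's perspective, so negate them.
        value = -alpha_beta(state.play_move(move), depth - 1, -beta, -alpha, evaluate)
        best = max(best, value)
        alpha = max(alpha, value)
        if alpha >= beta:
            break  # Prune: the opponent will never allow this line.
    return best
```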

Supported Games

  • Chess
  • ConnectX (Connect 4 for boards of arbitrary sizes)
  • Tic Tac Toe

Play Chess Against CatchemAlphaZero

The crowning glory of this project is chess. CatchemAlphaZero bundles a GUI application, alpha, with the wheel; it is installed into your virtual environment's Scripts folder. To launch the GUI, either click the application within the Scripts folder or type alpha into the command line (having activated your virtual environment). The application launches with a splash screen presenting the following options:

  • Play as White
  • Play as Black
  • Play as Both
  • Spectate Mode (watch CAZ play against itself)

Key features

  • MCTS Num Sims: Increase the size of the Monte Carlo Tree Search to increase the difficulty (1,000 sims corresponds to a chess Elo of about 1,800).
  • Show Eval Bar: See how CAZ evaluates the current position.
  • Asynchronous Ponder: CAZ is fully asynchronous and will ponder during your thinking time. CAZ efficiently recycles the tree search between moves and explores whilst its opponent is thinking (see the sketch after this list). Take time to think, but remember that CAZ is thinking too!
  • Watch: Click here to watch a demo of the application.
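
The tree recycling mentioned above boils down to promoting the child node of the move that was actually played to be the new search root, keeping whatever statistics were gathered while pondering. A minimal sketch, with an illustrative node structure rather than CAZ's internals (it reuses the `play_move` method from the interface sketch earlier):

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Illustrative MCTS node: accumulated statistics plus children keyed by move."""
    state: object
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)


def advance_root(root: Node, move_played) -> Node:
    """Promote the played move's subtree to be the new root, preserving its statistics."""
    child = root.children.get(move_played)
    if child is None:
        # Nothing was explored down this line while pondering; start a fresh tree.
        return Node(state=root.state.play_move(move_played))
    return child
```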

How strong is CAZ?

The current Elo of CAZ is not precisely known (and naturally varies with the number of MCTS simulations). The ultimate plan is to integrate CAZ as a Lichess bot. For now, the strength of CAZ has been estimated by playing against friends and colleagues who (being far better at chess than me) have generously offered to play the bot. To date, CAZ has not lost when running 1,000 simulations per move, and Chess.com consistently estimates its Elo to be in the range 1,900 to 2,300. My suspicion is that it sits at the lower end of this range: like AlphaZero, CAZ is strong at strategic play and closed positions but struggles with tactical play and endgames (where technique matters more).

At the time of writing, the state of the art is to use an efficiently updatable neural network (NNUE), as used by Stockfish. This is a smaller, faster, CPU-bound network that allows for deeper searches and therefore shines in tactical as well as strategic play. This project instead uses a large convolutional ResNet architecture exactly as per DeepMind's specification. It works best on a GPU (although CPU-only runs are supported as well).


A friend (White) vs CAZ (Black), 0-1 under classical conditions. CAZ wins via resignation. Chess.com estimated the Elo scores as 1,800 (White) and 2,350 (Black).

It's possible to train CAZ directly, but if you'd like to use the weights that I have produced, you can download them from here. CAZ assumes that the weights are saved in a weights folder within the current working directory.

How it works

CatchemAlphaZero is a project that explores artificial intelligence techniques for two-player games. The project started with minimax, which was then extended to alpha-beta minimax. MCTS was added to support games where leaf nodes could not be reached via brute-force search. AlphaZero extends MCTS by using neural networks both to guide the search and to provide an evaluation of each position rather than entering the rollout phase.
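
Concretely, AlphaZero-style MCTS selects moves using the PUCT rule: the mean value of a child from previous simulations plus an exploration bonus scaled by the network's prior probability for that move. A small sketch of the selection score (variable names and the value of c_puct are illustrative):

```python
import math


def puct_score(child_value_sum: float, child_visits: int,
               parent_visits: int, prior: float, c_puct: float = 1.5) -> float:
    """AlphaZero-style PUCT: Q(s, a) + c_puct * P(s, a) * sqrt(N(s)) / (1 + N(s, a))."""
    q = child_value_sum / child_visits if child_visits > 0 else 0.0      # exploitation term
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)  # exploration term
    return q + u
```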

CAZ produces beautiful visualisations for each game. In particular, it is possible to render the full tree search as well as the policy & evaluation associated with each state. The image below shows the output of a tree search with 10 simulations (note that CAZ assumes you already have the GraphViz application installed).


Rendered with ❤️ using CAZ. See the tutorial for details on how.

CAZ also renders each game state as HTML and will show the policy associated with each state as an HTML grid with a heatmap overlay. The darker squares correspond to squares favoured by the policy. The evaluation shows that CAZ believes it has a 100% chance of winning from here and correctly identifies the three moves that can lead to a win.


Heatmap policy for Tic-Tac-Toe. States and policy-annotated states are understood natively by IPython.
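
The heatmap idea itself is simple: normalise the policy over the board and map each move's probability to the shading of its cell. The stand-alone sketch below renders a 3x3 Tic-Tac-Toe policy; it is illustrative only and not CAZ's actual rendering code:

```python
import numpy as np


def policy_to_html(policy: np.ndarray) -> str:
    """Render a 3x3 policy as an HTML table whose cell shading tracks the policy."""
    probs = policy / policy.sum()  # normalise so the shading spans the full range
    rows = []
    for r in range(3):
        cells = "".join(
            f'<td style="width:40px;height:40px;'
            f'background:rgba(0,0,139,{probs[r, c]:.2f})"></td>'  # darker = more favoured
            for c in range(3)
        )
        rows.append(f"<tr>{cells}</tr>")
    return '<table style="border-collapse:collapse">' + "".join(rows) + "</table>"
```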

Learning through self-play reinforcement learning

CAZ is able to learn games entirely through self-play reinforcement learning. The idea is to use MCTS to explore the sample of the search space that seems most promising. At each step, the MCTS is guided by a policy retrieved from the neural network, and positions are evaluated using the neural network's value output. The neural network is a two-headed (policy head and value head) ResNet architecture, described in detail in DeepMind's Nature paper. Below is an illustration of the architecture used for TicTacToe; for chess, the architecture uses 19 residual network blocks:


Visualisation of the network architecture for TicTacToe using Tensorboard.
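
To make the two-headed structure concrete, below is a heavily simplified PyTorch sketch of such a network for TicTacToe. Layer sizes and block counts are illustrative and do not match CAZ's exact configuration:

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1, self.bn2 = nn.BatchNorm2d(channels), nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # skip connection


class PolicyValueNet(nn.Module):
    """Shared convolutional trunk feeding separate policy and value heads."""

    def __init__(self, in_planes: int = 3, channels: int = 64, actions: int = 9):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_planes, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
        )
        self.trunk = nn.Sequential(*[ResidualBlock(channels) for _ in range(4)])
        self.policy_head = nn.Sequential(nn.Flatten(), nn.Linear(channels * 9, actions))
        self.value_head = nn.Sequential(nn.Flatten(), nn.Linear(channels * 9, 1), nn.Tanh())

    def forward(self, x):
        x = self.trunk(self.stem(x))
        return self.policy_head(x), self.value_head(x)  # policy logits, value in [-1, 1]
```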

To improve the neural network, CAZ learns through self-play. The steps, sketched in code after the list, are:

  1. Play a game against itself relying on MCTS and the neural network
  2. At each step, improve the current policy via MCTS - MCTS is a policy improvement operator
  3. Record the outcome of the match after the game is played to completion
  4. Play 100s of matches and batch the results together
  5. Train the neural network against the match results. In particular, learn an improved policy based on the MCTS. Additionally, improve the network's evaluation function by assigning a score {-1, 0, +1} to each state depending upon the final outcome.
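
In code, one generation of that loop might look roughly like the sketch below. Here `game` follows the interface sketched earlier (extended with self-explanatory `initial_state`, `player_to_move` and `result` helpers), while `run_mcts` and `train` stand in for CAZ's search and training routines; every name is a placeholder rather than CAZ's API:

```python
import random


def self_play_generation(game, run_mcts, train, num_games: int = 200):
    """One self-play generation: gather games with MCTS, then fit the network.

    run_mcts(state) is assumed to return {move: probability} derived from visit
    counts; train(examples) fits the policy and value heads to the gathered data.
    """
    examples = []  # (state, mcts_policy, outcome from that player's view)
    for _ in range(num_games):
        state, history = game.initial_state(), []
        while not state.is_terminal():
            policy = run_mcts(state)  # step 2: MCTS acts as a policy improvement operator
            history.append((state, policy, state.player_to_move()))
            moves, probs = zip(*policy.items())
            state = state.play_move(random.choices(moves, weights=probs)[0])
        outcome = state.result()  # step 3: +1 / 0 / -1 from player 1's point of view
        for s, pi, player in history:
            z = outcome if player == 1 else -outcome
            examples.append((s, pi, z))  # step 4: batch the results together
    train(examples)  # step 5: cross-entropy on policies, MSE on outcomes
```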

If you'd like to train the network yourself, check out the command line interface below.

Command Line Interface Features

CAZ exposes five entry points via the command line: run, learn, kaggle, super, and hyper.

  1. run plays a game in the terminal so that you can watch CAZ in action
  2. learn trains a network through self-play
  3. kaggle runs an agent that complies with the Kaggle Simulation Competitions API
  4. super trains a neural network for chess through supervised learning. Whilst it is entirely possible to learn solely through self-play reinforcement learning, it would take a very long time to do this exclusively for chess (DeepMind used 5,000 TPUs for training, which is out of my budget!). Supervised learning was used to accelerate the learning process specifically for chess but not for any other game.
  5. hyper performs a search over CAZ's various hyperparameters to ensure that it is learning optimally as training progresses

Examples of how to run the various entry points are shown below:

  • caz run --game=tic --player1=mcts --player2=az # MCTS vs AlphaZero algorithm.
  • caz run --game=connectx --player1=mcts --player2=mcts
  • caz run --game=chess --player1=az --player2=az
  • caz learn --game=connectx
  • caz kaggle --game=connectx --player1=mcts --player2=mcts
  • caz super
  • caz hyper --game=connectx --gen=53

Chess Representation

CAZ follows DeepMind's proposal for representing the input features and the action space, as described in the follow-up paper. In particular, a state is represented by an 8x8x20 input tensor whose 20 planes correspond to the following (a sketch of this encoding follows the list below):

  1. Current player's pawn positions
  2. Current player's rook positions
  3. Current player's knight positions
  4. Current player's bishop positions
  5. Current player's queen positions
  6. Current player's king positions
  7. Opposing player's pawn positions
  8. Opposing player's rook positions
  9. Opposing player's knight positions
  10. Opposing player's bishop positions
  11. Opposing player's queen positions
  12. Opposing player's king positions
  13. 8x8 matrix of 0's or 1's denoting the player's turn (all zeroes or all ones)
  14. 8x8 matrix of 0's or 1's denoting current player's kingside castling rights
  15. 8x8 matrix of 0's or 1's denoting current player's queenside castling rights
  16. 8x8 matrix of 0's or 1's denoting opposing player's kingside castling rights
  17. 8x8 matrix of 0's or 1's denoting opposing player's queenside castling rights
  18. 8x8 matrix showing the halfmove clock
  19. Indicators for the position of any en passant square
  20. 8x8 matrix denoting whether the state is a two-fold repetition
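
Below is a sketch of how a position might be packed into such a tensor, assuming the python-chess library and the plane ordering listed above. It is illustrative rather than CAZ's exact encoder; in particular, the scaling of the halfmove clock is an assumption:

```python
import chess
import numpy as np

PIECES = [chess.PAWN, chess.ROOK, chess.KNIGHT, chess.BISHOP, chess.QUEEN, chess.KING]


def encode_board(board: chess.Board) -> np.ndarray:
    """Pack a python-chess board into an 8x8x20 tensor of feature planes."""
    planes = np.zeros((20, 8, 8), dtype=np.float32)
    us, them = board.turn, not board.turn
    for i, piece in enumerate(PIECES):
        for sq in board.pieces(piece, us):      # planes 1-6: current player's pieces
            planes[i, sq // 8, sq % 8] = 1.0
        for sq in board.pieces(piece, them):    # planes 7-12: opposing player's pieces
            planes[6 + i, sq // 8, sq % 8] = 1.0
    planes[12] = float(board.turn == chess.WHITE)               # plane 13: side to move
    planes[13] = float(board.has_kingside_castling_rights(us))  # planes 14-17: castling rights
    planes[14] = float(board.has_queenside_castling_rights(us))
    planes[15] = float(board.has_kingside_castling_rights(them))
    planes[16] = float(board.has_queenside_castling_rights(them))
    planes[17] = board.halfmove_clock / 100.0                   # plane 18: scaled halfmove clock
    if board.ep_square is not None:                             # plane 19: en passant square
        planes[18, board.ep_square // 8, board.ep_square % 8] = 1.0
    planes[19] = float(board.is_repetition(2))                  # plane 20: two-fold repetition
    return planes.transpose(1, 2, 0)                            # reorder to 8x8x20
```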

The action space is an 8x8x73 tensor whose 73 planes correspond to the move types below (a sketch of one possible plane indexing is given after the figure):

  • x7 N moves (the maximum number of squares you can move north is 7)
  • x7 NE moves (again, moving 1-7 squares in a north-easterly direction)
  • x7 E moves
  • x7 SE moves
  • x7 S moves
  • x7 SW moves
  • x7 W moves
  • x7 NW moves
  • Knight move NNE (i.e. one square east and two north)
  • Knight move ENE (i.e. two squares east and one north)
  • Knight move ESE
  • Knight move SSE
  • Knight move SSW
  • Knight move WSW
  • Knight move WNW
  • Knight move NNW
  • Promote NW Knight (underpromote to a knight by capturing north-west)
  • Promote N Knight (underpromote to a knight by advancing north)
  • Promote NE Knight
  • Promote NW Rook
  • Promote N Rook
  • Promote NE Rook
  • Promote NW Bishop
  • Promote N Bishop
  • Promote NE Bishop

Visualisation of the observation space (left) and action space (right). Here, CAZ is set to show the top two moves which align with common theory for the Fried Liver Attack. You can tweak the slider to show more/fewer moves or interrogate top moves by action plane.
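
One possible indexing of those 73 planes, assuming they are ordered exactly as in the list above (56 queen-style move planes, then 8 knight planes, then 9 underpromotion planes). This is a sketch of the scheme rather than CAZ's exact mapping:

```python
# Directions in the order listed above: N, NE, E, SE, S, SW, W, NW (rank delta, file delta).
QUEEN_DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
# Knight jumps in the order listed above: NNE, ENE, ESE, SSE, SSW, WSW, WNW, NNW.
KNIGHT_OFFSETS = [(2, 1), (1, 2), (-1, 2), (-2, 1), (-2, -1), (-1, -2), (1, -2), (2, -1)]


def queen_move_plane(direction_index: int, distance: int) -> int:
    """Planes 0-55: sliding/king/pawn moves, seven distances per direction."""
    return direction_index * 7 + (distance - 1)


def knight_move_plane(offset_index: int) -> int:
    """Planes 56-63: the eight knight jumps."""
    return 56 + offset_index


def underpromotion_plane(direction_index: int, piece_index: int) -> int:
    """Planes 64-72: underpromotions, piece in {knight: 0, rook: 1, bishop: 2},
    direction in {NW: 0, N: 1, NE: 2}."""
    return 64 + piece_index * 3 + direction_index
```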

Tutorial

CAZ is not, nor does it ever intend to be, a top chess engine*. Rather, it is a showcase of AI techniques and my attempt to reproduce DeepMind's work. If you'd like to know more about the inner workings of self-play reinforcement learning, check out the tutorials. They contain beautifully rendered visualisations to help you understand how the project works. The tutorials are organised into four sections:

  1. MCTS Demo, which showcases MCTS including the UCB formula. You can render search trees to see it in action
  2. Convolutional Layers, which shows how convolutional layers work and can combine to transform the 'images' of the game
  3. Alpha Zero, where you can see CatchemAlphaZero's policies and play against it within an IPython setting
  4. Chess, which illustrates the exact details of the chess setup. You can interrogate the network and play against it here (although the GUI app is preferred)

* However, I am a glutton for punishment and will probably rewrite this project in C++ to make it stronger at some point...

Thanks for making it this far!

CatchemAL
