
Play the Squadro Board Game against Someone Else or an AI

Project description

Squadro

License: MIT

Documentation

Go to my website for a visual and qualitative description.

Other games?

The code is modular enough to be applied to other games. To do so, implement the new game's state in state.py and make whatever other changes your needs require elsewhere in the code base. Please open an issue if discussion is needed.
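As an illustration only, the kind of interface a new game's state has to provide might look like the sketch below. The class and method names (GameState, legal_actions, apply, is_over, winner) are hypothetical and are not taken from squadro's state.py; NimState is a toy stand-in game in which players alternately take 1 to 3 stones and whoever takes the last stone wins.

```python
from abc import ABC, abstractmethod
from typing import Optional


class GameState(ABC):
    """Hypothetical minimal interface for a game state (not squadro's API)."""

    @abstractmethod
    def legal_actions(self) -> list[int]:
        """Return the actions available to the player to move."""

    @abstractmethod
    def apply(self, action: int) -> "GameState":
        """Return the new state reached by playing `action`."""

    @abstractmethod
    def is_over(self) -> bool:
        """Return True when the game has ended."""

    @abstractmethod
    def winner(self) -> Optional[int]:
        """Return the index of the winning player, or None if not over."""


class NimState(GameState):
    """Toy game: players alternately take 1-3 stones; taking the last one wins."""

    def __init__(self, stones: int, player: int = 0):
        self.stones = stones
        self.player = player  # index (0 or 1) of the player to move

    def legal_actions(self) -> list[int]:
        return [k for k in (1, 2, 3) if k <= self.stones]

    def apply(self, action: int) -> "NimState":
        if action not in self.legal_actions():
            raise ValueError(f"illegal action {action}")
        # States are immutable: return a new state with the turn passed on.
        return NimState(self.stones - action, 1 - self.player)

    def is_over(self) -> bool:
        return self.stones == 0

    def winner(self) -> Optional[int]:
        # The player who took the last stone is the one who just moved.
        return 1 - self.player if self.is_over() else None
```

For example, NimState(5).apply(2).apply(3) is a finished game won by player 1, who took the last three stones. Returning a new object from apply keeps states immutable, which makes it safe for a search algorithm to branch from the same state several times.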

Demo


Installation

[!TIP] If you are running on a Linux machine and do not intend to use a GPU, run this beforehand to install only the CPU version of the PyTorch library:

pip install torch --index-url https://download.pytorch.org/whl/cpu

The most straightforward way is to simply install it from PyPI via:

pip install squadro

If you want to install it from source, which is necessary for development, follow the instructions here.

If a dependency releases changes that break the code, you can install the project from its lock file, which pins the dependency versions to ensure reproducibility:

pip install -r requirements.txt

Usage

This package can be used in the following ways:

Play

You can play against someone else or many different types of computer algorithms. See the Agents section below for more details.

[!TIP] If you run into the following error on a Linux machine when launching the game:

libGL error: failed to load driver

Then try setting the following environment variable beforehand:

export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6

Play against another human

To play the game with someone else, run the following code:

import squadro

squadro.GamePlay(n_pawns=5, first=None).run()

To see all the available parameters, check the docstring:

help(squadro.GamePlay.__init__)  # for the arguments to RealTimeAnimatedGame

Play against the computer

To play against the computer, set agent_1 to one of the squadro.AVAILABLE_AGENTS.

For instance:

squadro.GamePlay(agent_1='random').run()

[!TIP] To play against our best algorithm, run:

squadro.GamePlay(agent_1='best').run()

Let us know if you ever beat it!

Play against your trained AI

After training your AI as described in the Training section, you can play against her using:

Play against a benchmarked AI

If you do not want to train a model, as described in the Training section, you can still play against a benchmarked model available online. After passing init_from='online', you can set model_path to any of the currently supported models:

model_path | # layers | # heads | embedding dims | # params | size
... | 12 | 12 | 768 | 124M | 500 MB

Note that the first time you use a model, it must be downloaded from the internet, which can take a few minutes.

Example:

...

Agents

Most computer algorithms discretize the game into states and actions. Here, the state is the position of the pawns and the available actions are the possible moves of the pawns.

Squadro can be viewed as a finite state machine: the next state of the game is completely determined by the current state and the action played. With this definition, the game can be modeled as a Markov Decision Process (MDP) with deterministic transitions. At each state, the current player can choose among several actions, each leading to a different state; from each of those new states, the next player again has several actions, and so on. The future of the game can thus be represented as a tree whose nodes are states and whose branches are the actions leading from one state to the next.
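To make the tree picture concrete, here is a sketch on a toy take-away game (an assumption standing in for Squadro, whose real move rules are more involved): players take 1 to 3 stones and whoever takes the last stone wins. It shows a deterministic step function and a breadth-first expansion that counts the nodes at each depth. The function names are hypothetical. Note that it counts tree nodes, so a position reachable along several paths is counted once per path.

```python
from collections import deque


def legal_actions(stones: int) -> list[int]:
    # Toy game (assumption): take 1-3 stones; taking the last stone wins.
    return [k for k in (1, 2, 3) if k <= stones]


def step(state: tuple[int, int], action: int) -> tuple[int, int]:
    """Deterministic transition: the next state depends only on (state, action)."""
    stones, player = state
    return stones - action, 1 - player


def count_nodes_by_depth(root: tuple[int, int], max_depth: int) -> list[int]:
    """Expand the game tree breadth-first and count the nodes at each depth."""
    counts = [0] * (max_depth + 1)
    frontier = deque([(root, 0)])
    while frontier:
        state, depth = frontier.popleft()
        counts[depth] += 1
        if depth == max_depth or state[0] == 0:
            continue  # depth limit reached, or terminal state (no children)
        for action in legal_actions(state[0]):
            frontier.append((step(state, action), depth + 1))
    return counts


print(count_nodes_by_depth((10, 0), 3))  # [1, 3, 9, 27]
```

With three actions per state, the tree triples at every level, which is why exploring every path to the end of the game quickly becomes impractical.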

An algorithm can explore that space of possibilities to infer the best move to play now. As the tree is huge, it is not possible to explore every path until the end of the game. Instead, algorithms typically explore only a small fraction of the tree and then use the information gathered from those states to make a decision. More precisely, this involves two phases:

  • State exploration: exploring the space of states by a careful choice of actions. The most common exploration methods are Minimax and Monte Carlo Tree Search (MCTS). Minimax explores all the states up to a specific depth, while MCTS navigates until it finds a state that has not been visited yet. Minimax can be sped up by skipping the search in the parts of the tree that won't affect the final decision; this method is called alpha-beta pruning.
  • State evaluation: estimating how good a state is. With a basic understanding of the game and how to win, one can design a heuristic (state evaluation function) that estimates how good it is to be in that state / position. Otherwise, it is often better to let a computer algorithm evaluate the state.
    • The simplest algorithm to estimate the state is to randomly let the game play until it is over (i.e., pick random actions for both players). When played enough times, it can give the probability to win in that state.
    • More complex, and often more accurate, algorithms use reinforcement learning (AI). They learn from experience by storing information about each state/action in one of:
      • Q value function, a lookup table for each state and action;
      • deep Q network (DQN), a neural network that approximates the Q value function, which is necessary when the state space is huge (i.e., cannot be stored in memory).
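The two phases can be combined in a few lines. The sketch below runs a depth-limited minimax with alpha-beta pruning (state exploration) and, at the depth limit, estimates the win probability with random playouts (state evaluation). It uses the toy take-away game (take 1 to 3 stones; taking the last one wins) instead of Squadro, and all function names are hypothetical; it is not the implementation behind squadro's agents.

```python
import math
import random


def legal_actions(stones):
    # Toy game (assumption): take 1-3 stones; taking the last stone wins.
    return [k for k in (1, 2, 3) if k <= stones]


def rollout_value(stones, player, n_playouts, rng):
    """State evaluation: fraction of random playouts won by player 0."""
    wins = 0
    for _ in range(n_playouts):
        s, p = stones, player
        while s > 0:
            s -= rng.choice(legal_actions(s))
            p = 1 - p
        if p == 1:  # player 1 is to move, so player 0 took the last stone
            wins += 1
    return wins / n_playouts


def alphabeta(stones, player, depth, alpha, beta, rng, n_playouts=50):
    """State exploration: depth-limited minimax with alpha-beta pruning.

    Values estimate P(player 0 wins); player 0 maximizes, player 1 minimizes.
    """
    if stones == 0:
        return 1.0 if player == 1 else 0.0  # the previous mover won
    if depth == 0:
        return rollout_value(stones, player, n_playouts, rng)
    maximizing = player == 0
    value = -math.inf if maximizing else math.inf
    for a in legal_actions(stones):
        child = alphabeta(stones - a, 1 - player, depth - 1, alpha, beta, rng, n_playouts)
        if maximizing:
            value = max(value, child)
            alpha = max(alpha, value)
        else:
            value = min(value, child)
            beta = min(beta, value)
        if alpha >= beta:
            break  # prune: the parent already has a better option elsewhere
    return value


def best_action(stones, depth=4, seed=0):
    """Return player 0's move with the highest alpha-beta value."""
    rng = random.Random(seed)
    return max(
        legal_actions(stones),
        key=lambda a: alphabeta(stones - a, 1, depth - 1, -math.inf, math.inf, rng),
    )
```

In this game, the winning strategy is to leave the opponent a multiple of 4 stones; accordingly, best_action(5) returns 1 and best_action(6) returns 2. Replacing rollout_value with a hand-written heuristic would give the heuristic flavor of state evaluation described above.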

List of available agents:

  • human: another local human player (i.e., both playing on the same computer)
  • random: a computer that plays randomly among all available moves
  • ab_relative_advancement: a computer that lists the possible moves from the current position and evaluates them directly (i.e., it "thinks" only one move ahead), where the evaluation function is the player's advancement
  • relative_advancement: a computer that lists the possible moves from the current position and evaluates them directly (i.e., it "thinks" only one move ahead), where the evaluation function is the player's advancement compared to the other player
  • ab_relative_advancement: a computer that plays minimax with alpha-beta pruning (depth ~4), where the evaluation function is the player's advancement compared to the other player
  • mcts_advancement: Monte Carlo tree search, where the evaluation function is the player's advancement compared to the other player
  • mcts_rollout: Monte Carlo tree search, where the evaluation function is determined by a random playout until the end of the game
  • mcts_q_learning: Monte Carlo tree search, where the evaluation function is determined by a lookup table
  • mcts_deep_q_learning: Monte Carlo tree search, where the evaluation function is determined by a convolutional neural network
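For intuition on the mcts_* agents, here is a generic Monte Carlo tree search using the UCB1 selection rule (UCT) and random playouts, in the spirit of mcts_rollout. It is a sketch on the toy take-away game (take 1 to 3 stones; taking the last one wins), not squadro's code; the names, the exploration constant c=1.4, and the iteration count are arbitrary assumptions.

```python
import math
import random


def legal_actions(stones):
    # Toy game (assumption): take 1-3 stones; taking the last stone wins.
    return [k for k in (1, 2, 3) if k <= stones]


class Node:
    def __init__(self, stones, player, parent=None, action=None):
        self.stones = stones      # stones left
        self.player = player      # player to move in this node
        self.parent = parent
        self.action = action      # action that led here from the parent
        self.children = []
        self.untried = legal_actions(stones)
        self.visits = 0
        self.wins = 0.0           # wins for the player who moved INTO this node

    def uct_child(self, c=1.4):
        # UCB1: mean reward (exploitation) plus a bonus for rarely visited children.
        return max(
            self.children,
            key=lambda ch: ch.wins / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )


def rollout(stones, player, rng):
    """Random playout; returns the index of the winning player."""
    while stones > 0:
        stones -= rng.choice(legal_actions(stones))
        player = 1 - player
    return 1 - player  # the last mover took the last stone


def mcts(stones, player, n_iter=2000, seed=0):
    rng = random.Random(seed)
    root = Node(stones, player)
    for _ in range(n_iter):
        node = root
        # 1. Selection: descend while the node is fully expanded and non-terminal.
        while not node.untried and node.children:
            node = node.uct_child()
        # 2. Expansion: add one not-yet-visited child, if any.
        if node.untried:
            a = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - a, 1 - node.player, node, a)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new node.
        winner = rollout(node.stones, node.player, rng)
        # 4. Backpropagation: credit the player who moved into each node.
        while node is not None:
            node.visits += 1
            if winner == 1 - node.player:
                node.wins += 1
            node = node.parent
    # Recommend the most visited action.
    return max(root.children, key=lambda ch: ch.visits).action
```

Recommending the most visited action, rather than the one with the best mean, is a common and robust choice. Replacing rollout with a heuristic or a learned value estimate would roughly correspond to the other MCTS variants listed above.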

You can also access the most updated list of available agents with:

import squadro

print(squadro.AVAILABLE_AGENTS)

Training

One can train a model using reinforcement learning (RL) algorithms. Currently, Squadro supports two such algorithms:

Q-Learning

One needs to train a lookup table mapping each state to its value.

import squadro

squadro.logger.setup(section='training')

trainer = squadro.QLearningTrainer(
    n_pawns=3,
    lr=.3,
    eval_steps=100,
    eval_interval=300,
    n_steps=100_000,
    parallel=8,
    model_path='path/to/model'
)
trainer.run()

It should take a few hours to train on a typical CPU (8-16 cores).

Note that there are many more parameters to tweak, if desired. See all of them in the doc:

help(squadro.QLearningTrainer)
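The core of tabular Q-learning fits in a few lines. The sketch below trains a Q table by self-play on the toy take-away game (take 1 to 3 stones; taking the last one wins). Since the players alternate, the target for a non-final move is the negated best value of the opponent's next position (a negamax-style update). The lr=0.3 default mirrors the value in the example above, but the function names and the other parameters (epsilon, n_episodes) are hypothetical and unrelated to QLearningTrainer's internals.

```python
import random
from collections import defaultdict


def legal_actions(stones):
    # Toy game (assumption): take 1-3 stones; taking the last stone wins.
    return [k for k in (1, 2, 3) if k <= stones]


def train_q_table(start_stones=10, n_episodes=20_000, lr=0.3, epsilon=0.2, seed=0):
    """Self-play tabular Q-learning.

    q[(stones, action)] is the value, in [-1, 1], of `action` for the player to
    move. The opponent moves next, so its best value is negated in the target.
    """
    rng = random.Random(seed)
    q = defaultdict(float)  # lookup table, 0.0 for unseen (state, action) pairs
    for _ in range(n_episodes):
        stones = start_stones
        while stones > 0:
            actions = legal_actions(stones)
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(stones, a)])  # exploit
            nxt = stones - action
            if nxt == 0:
                target = 1.0  # taking the last stone wins
            else:
                target = -max(q[(nxt, a)] for a in legal_actions(nxt))
            q[(stones, action)] += lr * (target - q[(stones, action)])
            stones = nxt
    return q


def greedy_action(q, stones):
    """Play the action with the highest learned value."""
    return max(legal_actions(stones), key=lambda a: q[(stones, a)])
```

After training, greedy_action(q, 5) should return 1 and greedy_action(q, 6) should return 2, i.e., the table learns to leave a multiple of 4 stones. The table has one entry per (state, action) pair, which is exactly what stops scaling when the state space is huge.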

Deep Q-Learning

Here the state-action value is approximated by a neural network.

It should take a few hours to train on a typical CPU (8-16 cores), and it is much faster on a GPU.

Training stops when the evaluation loss stops improving. Once it is done, one can use the model; see the Play against your trained AI section above (setting the appropriate value for model_path, e.g., '...').
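Stopping when the evaluation loss stops improving is commonly implemented as early stopping with a patience counter. The sketch below is generic: train_step and evaluate are hypothetical callables standing in for one training epoch and the evaluation-loss computation, and the patience value is an arbitrary assumption, since the exact criterion squadro uses is not specified here.

```python
def train_with_early_stopping(train_step, evaluate, patience=3, max_epochs=100):
    """Train until the evaluation loss has not improved for `patience` epochs.

    Returns (best_epoch, best_loss).
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step()
        loss = evaluate()
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
            # A real trainer would checkpoint the model weights here.
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_loss


# Simulated evaluation losses: the last value is never reached.
losses = iter([1.0, 0.6, 0.5, 0.52, 0.51, 0.55, 0.4])
print(train_with_early_stopping(lambda: None, lambda: next(losses)))  # (2, 0.5)
```

The patience counter tolerates a few noisy evaluations before stopping; keeping the best checkpoint rather than the last one means the extra epochs do not degrade the final model.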

Simulations

You can simulate a game between two computer algorithms. Set agent_0 and agent_1 to any of the AVAILABLE_AGENTS above and run:

import squadro

game = squadro.Game(agent_0='random', agent_1='random')
game.run()
print(game)
game.to_file('game_results.json')

Animations

You can render an animation of a game between two computer algorithms. Press the left and right keys to navigate through the game.

import squadro

game = squadro.Game(agent_0='random', agent_1='random')
squadro.GameAnimation(game).show()

Tests

pytest squadro

Feedback

For any issue / bug report / feature request, open an issue.

Contributions

To provide upgrades or fixes, open a pull request.

Contributors




Download files


Source Distribution

squadro-1.0.1.tar.gz (88.3 kB)

Uploaded Source

Built Distribution


squadro-1.0.1-py3-none-any.whl (113.8 kB)

Uploaded Python 3

File details

Details for the file squadro-1.0.1.tar.gz.

File metadata

  • Download URL: squadro-1.0.1.tar.gz
  • Upload date:
  • Size: 88.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for squadro-1.0.1.tar.gz
Algorithm Hash digest
SHA256 40493127598325f339260b5efde263b514e6e2233ca543d31ed81efd23b181ae
MD5 878c2ffceb6ad373e489dc3a4b038817
BLAKE2b-256 f2477fa6f70c18d5492ca5e93c4ae1e4b39eefd9c8340856bf124bb1bd79f2c4


File details

Details for the file squadro-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: squadro-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 113.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for squadro-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 3fc4483bb78ab718eb8521b10b255a9855c2eb66ccb371247f9572afa8e9e38f
MD5 80baa5c6b21305b86732dd7f9916d2d7
BLAKE2b-256 609c70c1b02f278fb14d46171014f6e918a5f298ed28922234ed3af733c1bfae

