Play the Squadro Board Game against Someone Else or an AI
Squadro
Documentation
Go to my website for a visual and qualitative description.
Other games?
The code is modular enough to be easily applied to other games. To do so, implement the new game's state in state.py and make a few other changes in the code base, depending on your needs. Please raise an issue if discussion is needed.
Demo
Installation
[!TIP] If running on a Linux machine without intent to use a GPU, run this beforehand to install only the CPU version of the pytorch library:
pip install torch --index-url https://download.pytorch.org/whl/cpu
The most straightforward way is to simply install it from PyPI via:
pip install squadro
If you want to install it from source, which is necessary for development, follow the instructions here.
If a dependency releases changes that break the code, you can install the project from its lock file, which pins the dependency versions to ensure reproducibility:
pip install -r requirements.txt
Usage
This package can be used in the following ways:
Play
You can play against someone else or many different types of computer algorithms. See the Agents section below for more details.
[!TIP] If you run into the following error on a Linux machine when launching the game:
libGL error: failed to load driver
Then try setting the following environment variable beforehand:
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6
Play against another human
To play the game with someone else, run the following command:
import squadro
squadro.GamePlay(n_pawns=5, first=None).run()
To access all the parameters to play, see the doc:
help(squadro.GamePlay.__init__)  # lists the arguments accepted by GamePlay
Play against the computer
To play against the computer, set agent_1 to one of the squadro.AVAILABLE_AGENTS.
For instance:
squadro.GamePlay(agent_1='random').run()
[!TIP] To play against our best algorithm, run:
squadro.GamePlay(agent_1='best').run()
Let us know if you ever beat it!
Play against your trained AI
After training your AI as described in the Training section, you can play against it using:
...
Play against a benchmarked AI
If you do not want to train a model, as described in the Training section, you can still play against a benchmarked model available online. After passing init_from='online', you can set model_path to any of the currently supported models:
| model_path | # layers | # heads | embed dims | # params | size |
|---|---|---|---|---|---|
| ... | 12 | 12 | 768 | 124M | 500 MB |
Note that the first time you use a model, it needs to be downloaded from the internet, so it can take a few minutes.
Example:
...
Agents
Most computer algorithms discretize the game into states and actions. Here, the state is the position of the pawns and the available actions are the possible moves of the pawns.
Squadro is a finite state machine, meaning that the next state of the game is completely determined by the current state and the action played. With this definition, one can see that the game is a Markov Decision Process (MDP). At each state, the current player can play different actions, which lead to different states. Then the next player can play different actions from any of those new states, etc. The future of the game can be represented as a tree, whose branches are the actions that lead to different states.
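To make the tree picture concrete, here is a minimal sketch on a toy game rather than Squadro itself: two players alternately remove 1, 2, or 3 stones from a pile, and the game ends when the pile is empty. The names State, legal_actions, step, and count_leaves are hypothetical and are not part of the squadro package; they only illustrate that the next state depends solely on the current state and the chosen action.

```python
from typing import NamedTuple


class State(NamedTuple):
    stones: int  # stones left in the pile
    player: int  # index (0 or 1) of the player to move


def legal_actions(state: State) -> list[int]:
    """A player may remove 1, 2, or 3 stones, but no more than remain."""
    return [n for n in (1, 2, 3) if n <= state.stones]


def step(state: State, action: int) -> State:
    """Deterministic transition: the same (state, action) always gives the same next state."""
    return State(state.stones - action, 1 - state.player)


def count_leaves(state: State) -> int:
    """Count the distinct ways the game can unfold from `state` (leaves of the game tree)."""
    actions = legal_actions(state)
    if not actions:  # no stones left: the game is over
        return 1
    return sum(count_leaves(step(state, a)) for a in actions)


root = State(stones=5, player=0)
children = [step(root, a) for a in legal_actions(root)]  # one branch per action
n_paths = count_leaves(root)
print(children, n_paths)
```

Even for this tiny game the number of paths grows quickly with the pile size, which is why a real game tree cannot be enumerated exhaustively.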
An algorithm can explore that space of possibilities to infer the best move to play now. As the tree is huge, it is not possible to explore all the possible paths until the end of the game. Typically, algorithms explore only a small fraction of the tree and then use the information gathered from those states to make a decision. More precisely, those two phases are:
- State exploration: exploring the space of states by a careful choice of actions. The most common exploration methods are Minimax and Monte Carlo Tree Search (MCTS). Minimax explores all the states up to a specific depth, while MCTS navigates until it finds a state that has not been visited yet. Minimax can be sped up by skipping the search in the parts of the tree that won't affect the final decision; this method is called alpha-beta pruning.
- State evaluation: evaluating a state. With a basic understanding of the game and how to win, one can design a heuristic (state evaluation function) that estimates how good it is to be in that state / position. Otherwise, it is often better to let a computer algorithm evaluate the state.
  - The simplest algorithm to evaluate a state is to let the game play out randomly until it is over (i.e., pick random actions for both players). Repeated enough times, this gives an estimate of the probability of winning from that state.
  - More complex, and typically more accurate, algorithms use reinforcement learning (AI). They learn from experience by storing information about each state/action in one of:
    - Q value function, a lookup table for each state and action;
    - deep Q network (DQN), a neural network that approximates the Q value function, which is necessary when the state space is huge (i.e., cannot be stored in memory).
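The combination of depth-limited exploration and heuristic evaluation can be sketched with minimax and alpha-beta pruning. The example below is an illustration on a toy game (remove 1 to 3 stones from a pile; whoever takes the last stone wins), not the code used by Squadro's agents. The functions heuristic, alphabeta, and best_action are hypothetical names, and the heuristic is an assumption that only makes sense for this toy game.

```python
import math


def legal_actions(stones):
    return [n for n in (1, 2, 3) if n <= stones]


def heuristic(stones, maximizing):
    """Toy evaluation at the depth limit, from the maximizing player's viewpoint.

    In this game, the player to move on a multiple of 4 is in a losing position.
    """
    mover_is_losing = stones % 4 == 0
    score_for_mover = -0.5 if mover_is_losing else 0.5
    return score_for_mover if maximizing else -score_for_mover


def alphabeta(stones, depth, alpha, beta, maximizing):
    if stones == 0:
        # The previous player took the last stone and won.
        return -1.0 if maximizing else 1.0
    if depth == 0:
        return heuristic(stones, maximizing)
    if maximizing:
        value = -math.inf
        for a in legal_actions(stones):
            value = max(value, alphabeta(stones - a, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizing player will never allow this branch
        return value
    value = math.inf
    for a in legal_actions(stones):
        value = min(value, alphabeta(stones - a, depth - 1, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizing player already has a better option
    return value


def best_action(stones, depth=4):
    """Return the root action with the highest alpha-beta value."""
    best, best_value = None, -math.inf
    for a in legal_actions(stones):
        value = alphabeta(stones - a, depth - 1, best_value, math.inf, False)
        if value > best_value:
            best, best_value = a, value
    return best


print(best_action(5), best_action(7))
```

Pruning never changes the chosen move; it only skips branches that cannot affect the decision, so the same search budget reaches a greater depth.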
List of available agents:
- human: another local human player (i.e., both playing on the same computer)
- random: a computer that plays randomly among all available moves
- advancement: a computer that lists the possible moves from the current position and evaluates them directly (i.e., it "thinks" only one move ahead), where the evaluation function is the player's advancement
- relative_advancement: a computer that lists the possible moves from the current position and evaluates them directly (i.e., it "thinks" only one move ahead), where the evaluation function is the player's advancement compared to the other player
- ab_relative_advancement: a computer that plays minimax with alpha-beta pruning (depth ~4), where the evaluation function is the player's advancement compared to the other player
- mcts_advancement: Monte Carlo tree search, where the evaluation function is the player's advancement compared to the other player
- mcts_rollout: Monte Carlo tree search, where the evaluation function is determined by a random playout until the end of the game
- mcts_q_learning: Monte Carlo tree search, where the evaluation function is determined by a lookup table
- mcts_deep_q_learning: Monte Carlo tree search, where the evaluation function is determined by a convolutional neural network
You can also access the most updated list of available agents with:
import squadro
print(squadro.AVAILABLE_AGENTS)
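The random-playout evaluation mentioned for agents such as mcts_rollout can be illustrated with a short sketch. It again uses a toy game (remove 1 to 3 stones; whoever takes the last stone wins) instead of Squadro, and rollout and estimate_win_probability are hypothetical helpers, not functions of the squadro package.

```python
import random


def rollout(stones, player, rng):
    """Play uniformly random moves until the pile is empty; return the winner (0 or 1)."""
    while True:
        take = rng.choice([n for n in (1, 2, 3) if n <= stones])
        stones -= take
        if stones == 0:
            return player  # this player took the last stone
        player = 1 - player


def estimate_win_probability(stones, player, n_rollouts=2000, seed=0):
    """Monte Carlo estimate of the chance that `player` (to move) wins under random play."""
    rng = random.Random(seed)
    wins = sum(rollout(stones, player, rng) == player for _ in range(n_rollouts))
    return wins / n_rollouts


print(estimate_win_probability(stones=3, player=0))
```

The estimate reflects random play by both sides, so it is a noisy proxy for the true value of a position; tree search compensates by spending more playouts on promising branches.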
Training
One can train a model using reinforcement learning (RL) algorithms. Currently, Squadro supports two such algorithms:
Q-Learning
One needs to train a lookup table mapping each state to its value.
import squadro
squadro.logger.setup(section='training')
trainer = squadro.QLearningTrainer(
    n_pawns=3,
    lr=.3,
    eval_steps=100,
    eval_interval=300,
    n_steps=100_000,
    parallel=8,
    model_path='path/to/model'
)
trainer.run()
It should take a few hours to train on a typical CPU (8-16 cores).
Note that there are many more parameters to tweak, if desired. See all of them in the doc:
help(squadro.QLearningTrainer)
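For readers unfamiliar with the method, the following is a minimal sketch of the generic tabular Q-learning update on a toy single-agent problem (walking along a line of five cells to reach the last one). It does not reproduce the internals of QLearningTrainer; the environment, the function names, and the gamma and epsilon values are assumptions chosen for illustration, while lr=0.3 simply mirrors the example above.

```python
import random
from collections import defaultdict

N_STATES = 5        # cells 0..4 on a line; cell 4 is terminal
ACTIONS = (-1, +1)  # move left or right


def step(state, action):
    """Toy environment: reward 1 when the terminal cell is reached, 0 otherwise."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(n_episodes=500, lr=0.3, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # lookup table: (state, action) -> estimated return
    for _ in range(n_episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            nxt, reward, done = step(state, action)
            future = 0.0 if done else gamma * max(q[(nxt, a)] for a in ACTIONS)
            # Move the stored value a fraction `lr` toward the observed target.
            q[(state, action)] += lr * (reward + future - q[(state, action)])
            state = nxt
    return q


q = train()
print({s: round(q[(s, +1)], 3) for s in range(N_STATES - 1)})
```

A table like this is only practical when the number of states is small, which is consistent with the small n_pawns value used in the training example.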
Deep Q-Learning
Here the state-action value is approximated by a neural network.
It should take a few hours to train on a typical CPU (8-16 cores), and it is much faster on a GPU.
It will stop training when the evaluation loss stops improving. Once done, one can use the model; see the next section below (setting the appropriate value for model_path, e.g., '...').
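The stopping rule described above is commonly implemented as patience-based early stopping. The sketch below shows one such rule in plain Python; the exact criterion used by Squadro is not stated here, so should_stop, patience, and min_delta are hypothetical names and values, and the loss sequence is made up for the demonstration.

```python
def should_stop(eval_losses, patience=3, min_delta=0.0):
    """Return True once the last `patience` evaluations brought no improvement.

    An improvement means beating the earlier best loss by more than `min_delta`.
    """
    if len(eval_losses) <= patience:
        return False  # not enough history yet
    best_before = min(eval_losses[:-patience])
    recent_best = min(eval_losses[-patience:])
    return recent_best >= best_before - min_delta


# Simulated evaluation losses, one per evaluation round (illustrative values only).
losses = []
for loss in [1.0, 0.8, 0.7, 0.71, 0.72, 0.70, 0.69]:
    losses.append(loss)
    if should_stop(losses):
        break
stopped_after = len(losses)
print(stopped_after)
```

A larger patience tolerates noisy evaluations at the cost of extra training time.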
Simulations
You can simulate a game between two computer algorithms. Set agent_0 and agent_1 to any of the AVAILABLE_AGENTS above and run:
game = squadro.Game(agent_0='random', agent_1='random')
game.run()
print(game)
game.to_file('game_results.json')
Animations
You can render an animation of a game between two computer algorithms. Press the left and right keys to navigate through the game.
game = squadro.Game(agent_0='random', agent_1='random')
squadro.GameAnimation(game).show()
Tests
pytest squadro
Feedback
For any issue / bug report / feature request, open an issue.
Contributions
To provide upgrades or fixes, open a pull request.
Contributors