
Play the Squadro Board Game against Someone Else or an AI

Project description

Squadro


Documentation

Squadro is a two-player board game on a 5x5 board. The goal is to have four of your pawns perform a return trip before the opponent. Each pawn has its own speed, given by the number of dots (1–3) at its starting position. If an opponent's pawn crosses one of your pawns, your pawn returns to the side of the board.

Visit my website for a visual and qualitative description.

Demo


Other games?

The code is modular enough to be easily applied to other games. To do so, you must implement its state in state.py, and make a few other changes in the code base depending on your needs. Please raise an issue if discussion is needed.

Installation

[!TIP] If you are running on a Linux machine and do not intend to use a GPU, run this beforehand to install only the CPU version of the pytorch library:

pip install torch --index-url https://download.pytorch.org/whl/cpu

The most straightforward way is to simply install it from PyPI via:

pip install squadro

If you want to install it from source, which is necessary for development, follow the instructions here.

If a dependency releases a change that breaks the code, you can install the project from its lock file, which pins the dependency versions to ensure reproducibility:

pip install -r requirements.txt

Usage

This package can be used in the following ways:

Play

You can play against someone else or many different types of computer algorithms. See the Agents section below for more details.

[!TIP] If you run into the following error on a Linux machine when launching the game:

libGL error: failed to load driver

Then try setting the following environment variable beforehand:

export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6

Play against another human

To play the game with someone else, run the following command:

import squadro

squadro.GamePlay(n_pawns=5, first=None).run()

To access all the parameters to play, see the doc:

help(squadro.GamePlay.__init__)  # lists all the arguments to GamePlay

Play against the computer

To play against the computer, set agent_1 to one of the squadro.AVAILABLE_AGENTS.

For instance:

squadro.GamePlay(agent_1='random').run()

[!TIP] To play against our best algorithm, run:

squadro.GamePlay(agent_1='best').run()

Let us know if you ever beat it!

Play against your trained AI

After training your AI as described in the Training section, you can play against it using:

import squadro

agent = squadro.MonteCarloDeepQLearningAgent(model_path='path/to/model')
squadro.GamePlay(agent_1=agent).run()

Agents

Most computer algorithms discretize the game into states and actions. Here, the state is the position of the pawns and the available actions are the possible moves of the pawns.

Squadro is a finite state machine, meaning that the next state of the game is completely determined by the current state and the action played. With this definition, one can see that the game is a Markov Decision Process (MDP). At each state, the current player can play different actions, which lead to different states. Then the next player can play different actions from any of those new states, etc. The future of the game can be represented as a tree, whose branches are the actions that lead to different states.

An algorithm can explore that space of possibilities to infer the best move to play now. Since the tree is huge, it is not possible to explore every path until the end of the game. Instead, algorithms typically explore only a small fraction of the tree and then use the information gathered from those states to make a decision. More precisely, those two phases are:

  • State exploration: exploring the space of states by a careful choice of actions. The most common exploration methods are Minimax and Monte Carlo Tree Search (MCTS). Minimax explores all the states up to a specific depth, while MCTS navigates until it finds a state that has not been visited yet. Minimax can be sped up by skipping the search in the parts of the tree that won't affect the final decision; this method is called alpha-beta pruning.
  • State evaluation: evaluating a state. If we have a basic understanding of the game and how to win, one can design a heuristic (state evaluation function) that gives an estimate of how good it is to be in that state / position. Otherwise, it can often be better to use a computer algorithm to evaluate the state.
    • The simplest algorithm to estimate the state is to randomly let the game play until it is over (i.e., pick random actions for both players). When played enough times, it can give the probability to win in that state.
    • More complex, and hence more accurate, algorithms use reinforcement learning (AI). They learn from experience by storing information about each state/action in one of:
      • Q value function, a lookup table for each state and action;
      • deep Q network (DQN), a neural network that approximates the Q value function, which is necessary when the state space is huge (i.e., cannot be stored in memory).
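The exploration phase above can be sketched generically. Below is a minimal minimax with alpha-beta pruning over a hand-written toy game tree; the dictionary-based tree and the leaf values are illustrative stand-ins, not part of the squadro API:

```python
# Minimal minimax with alpha-beta pruning over an explicit toy game tree.
# The tree layout and leaf values are illustrative, not squadro internals.

def alphabeta(node, depth, alpha, beta, maximizing):
    """Return the minimax value of `node`, skipping (pruning) branches
    that cannot affect the final decision."""
    children = node.get("children", [])
    if depth == 0 or not children:
        return node["value"]  # leaf: use the heuristic evaluation
    if maximizing:
        best = float("-inf")
        for child in children:
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if beta <= alpha:
                break  # the opponent will never let the game reach here
        return best
    best = float("inf")
    for child in children:
        best = min(best, alphabeta(child, depth - 1, alpha, beta, True))
        beta = min(beta, best)
        if beta <= alpha:
            break
    return best

# Toy two-ply tree: the maximizing player picks the branch whose
# worst-case (minimized) leaf value is best.
tree = {"children": [
    {"children": [{"value": 3}, {"value": 5}]},  # opponent's best reply: 3
    {"children": [{"value": 2}, {"value": 9}]},  # opponent's best reply: 2
]}
print(alphabeta(tree, 2, float("-inf"), float("inf"), True))  # -> 3
```

Note how the second branch is cut short: once the opponent can reach a leaf worth 2, which is already worse than the 3 guaranteed by the first branch, the leaf worth 9 is never examined.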

List of available agents:

  • human: another local human player (i.e., both playing on the same computer)
  • random: a computer that plays randomly among all available moves
  • advancement: a computer that lists the possible moves from the current position and evaluates them directly (i.e., it "thinks" only one move ahead), where the evaluation function is the player's advancement
  • relative_advancement: a computer that lists the possible moves from the current position and evaluates them directly (i.e., it "thinks" only one move ahead), where the evaluation function is the player's advancement compared to the other player
  • ab_relative_advancement: a computer that plays minimax with alpha-beta pruning (depth ~4), where the evaluation function is the player's advancement compared to the other player
  • mcts_advancement: Monte Carlo tree search, where the evaluation function is the player's advancement compared to the other player
  • mcts_rollout: Monte Carlo tree search, where the evaluation function is determined by a random playout until the end of the game
  • mcts_q_learning: Monte Carlo tree search, where the evaluation function is determined by a lookup table
  • mcts_deep_q_learning: Monte Carlo tree search, where the evaluation function is determined by a convolutional neural network

You can also access the most updated list of available agents with:

import squadro

print(squadro.AVAILABLE_AGENTS)

Training

One can train a model using reinforcement learning (RL) algorithms. Currently, Squadro supports two such algorithms:

Q-Learning

One needs to train a lookup table mapping each state-action pair to its value.

import squadro

squadro.logger.setup(section='training')

trainer = squadro.QLearningTrainer(
    n_pawns=3,
    lr=0.3,
    eval_steps=100,
    eval_interval=300,
    n_steps=100_000,
    parallel=8,
    model_path='path/to/model'
)
trainer.run()

It should take a few hours to train on a typical CPU (8–16 cores).

Note that there are many more parameters to tweak, if desired. See all of them in the doc:

help(squadro.QLearningTrainer)
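The update performed by a tabular trainer can be sketched with the textbook Q-learning rule; this is the standard update, not squadro's internal code, and the state/action names below are hypothetical:

```python
from collections import defaultdict

# Textbook tabular Q-learning update; illustrative, not squadro internals.
Q = defaultdict(float)  # lookup table: (state, action) -> estimated value
lr, gamma = 0.3, 1.0    # learning rate and discount factor

def q_update(state, action, reward, next_state, next_actions):
    """Move Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a')."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    Q[(state, action)] += lr * (target - Q[(state, action)])

# One update from a terminal winning move (reward 1, no successor actions):
q_update("s0", "move_pawn_2", 1.0, "terminal", [])
print(Q[("s0", "move_pawn_2")])  # -> 0.3
```

Repeating such updates over many self-play games is what fills the lookup table that the mcts_q_learning agent later queries.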

Deep Q-Learning

Here the state-action value is approximated by a neural network.

import squadro

squadro.logger.setup(section=['training', 'benchmark'])

trainer = squadro.DeepQLearningTrainer(
    eval_games=50,
    eval_interval=300,
    backprop_interval=20,
    model_path='path/to/model',
    model_config=squadro.ModelConfig(),
    init_from=None,
    n_pawns=5,
)
trainer.run()

For three pawns, it should take a few hours to train on a typical CPU (8–16 cores), and it is much faster on a GPU. For five pawns, it may take a few days.

Once done, one can use the model; see the Play against your trained AI section above (setting the appropriate value for model_path, e.g., '...').
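To make the "neural network approximation" concrete, here is a tiny convolutional network over a board encoding. The architecture, the 2-channel 5x5 input layout, and the value/policy heads are all assumptions for illustration, not squadro's ModelConfig:

```python
import torch
from torch import nn

# Illustrative Q-network over a board encoding; the architecture and the
# 2-channel 5x5 input layout are assumptions, not squadro's ModelConfig.
class TinyQNet(nn.Module):
    def __init__(self, n_actions=5):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1),  # one channel per player
            nn.ReLU(),
            nn.Flatten(),
        )
        self.value = nn.Linear(16 * 5 * 5, 1)           # how good the position is
        self.policy = nn.Linear(16 * 5 * 5, n_actions)  # one logit per pawn move

    def forward(self, board):
        h = self.body(board)
        return self.value(h), self.policy(h)

net = TinyQNet()
board = torch.zeros(1, 2, 5, 5)  # batch of one (empty) board encoding
value, logits = net(board)
print(value.shape, logits.shape)  # torch.Size([1, 1]) torch.Size([1, 5])
```

A network like this replaces the lookup table when the state space is too large to store in memory: instead of retrieving Q(s, a), the agent runs a forward pass on the encoded position.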

Simulations

You can simulate a game between two computer algorithms. Set agent_0 and agent_1 to any of the AVAILABLE_AGENTS above and run:

game = squadro.Game(agent_0='random', agent_1='random')
game.run()
print(game)
game.to_file('game_results.json')

Animations

You can render an animation of a game between two computer algorithms. Press the left and right keys to navigate through the game.

game = squadro.Game(agent_0='random', agent_1='random')
squadro.GameAnimation(game).show()

Tests

pytest squadro

Feedback

For any issue / bug report / feature request, open an issue.

Contributions

To provide upgrades or fixes, open a pull request.

Contributors

