OpenAI Gym environments for adversarial games, built for the Operation Beat Ourselves organisation.

adversarial-gym

Adversarial gym hosts a range of adversarial, turn-based games within the OpenAI Gym framework. The games currently supported are:

  1. Chess
  2. TicTacToe

Installation

Depending on your use case, you can install the package from PyPI or from source in developer mode.

Use the package manager pip to install adversarial-gym:

pip install adversarial-gym

Install from Source

Installing from source lets you edit the environments. This is useful when developing or if your use case requires changes to the current API.

cd Dir/To/Install/In
git clone git@github.com:OperationBeatMeChess/adversarial-gym.git
cd adversarial-gym
pip install -e .

Usage

import gym
import adversarial_gym  # importing the package registers its environments with gym

# env = gym.make("Chess-v0", render_mode='human')
env = gym.make("TicTacToe-v0", render_mode='human')
env.reset()
terminated = False
while not terminated:
    # Sample a random legal move for the player whose turn it is
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
env.close()

Adversarial Environment API

Each adversarial environment follows the structure of the base class defined below. This API makes a few small additions to the standard OpenAI Gym environment to support the turn-based structure of adversarial games. The base adversarial API is:

from abc import abstractmethod, abstractproperty

import gym
import pygame


class AdversarialEnv(gym.Env):
    """Abstract Adversarial Environment"""

    @abstractproperty
    def current_player(self):
        """
        Returns:
            current_player: Returns identifier for which player currently has their turn.
        """
        pass

    @abstractproperty
    def previous_player(self):
        """
        Returns:
            previous_player: Returns identifier for which player previously had their turn.
        """
        pass

    @abstractproperty
    def starting_player(self):
        """
        Returns:
            starting_player: Returns identifier for which player started the game.
        """
        pass

    @abstractmethod
    def get_string_representation(self):
        """
        Returns:
            board_string: Returns string representation of current game state.
        """
        pass
    
    @abstractmethod
    def set_string_representation(self, board_string):
        """
        Input:
            board_string: sets game state to match the string representation of board_string.
        """
        pass

    @abstractmethod
    def _get_canonical_observation(self):
        """
        Returns:
            canonical_state: returns canonical form of board. The canonical form
                            should be independent of the player's turn. E.g. in chess,
                            the canonical form can be chosen to be from the pov
                            of white. When the player is white, we can return
                            board as is. When the player is black, we can invert
                            the colors and return the board.
            current_player: returns identifier of which player is the current player in the canonical state.
                            This is used to decode the invariant canonical form.
        """
        pass  

    @abstractmethod
    def _game_result(self):
        """
        Returns:
            winner: returns None when game is not finished else returns int value 
                    for the winning player or draw.
            reward: Reward value given the game result. Should not consider the player who won.
               
        """
        pass

    @abstractmethod
    def _do_action(self, action):
        """
        Input:
            action: Execute action from current game state.
        """
        pass
    
    @abstractmethod
    def _reset_game(self):
        """
        Reset the state of the game to the initial state.
        This includes resetting the current player to the starting player.
        """
        pass

    @abstractmethod
    def _get_frame(self):
        """
        Returns:
            frame: returns pygame frame for the current state of the game.
            This will be used by render to draw the frame for human visualization.
               
        """
        pass

    @abstractmethod
    def _get_img(self):
        """
        Returns:
            img: returns rgb_array of the image for the current state of the game. 
               
        """
        pass

    def game_result(self):
        return self._game_result()[0]

    def skip_next_human_render(self):
        """
        Skips the next automatic human render in step or reset. 
        Used for rollouts or similar non visualized moves.
        """
        self.skip_next_render = True

    def step(self, action):
        self._do_action(action)
        observation = self._get_canonical_observation()
        info = self._get_info()
        result, reward = self._game_result()
        terminated = result is not None

        if self.render_mode == "human":
            self.render()

        return observation, reward, terminated, False, info

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)

        self._reset_game()
        observation = self._get_canonical_observation()
        info = self._get_info()

        if self.render_mode == "human":
            self.render()
        
        return observation, info
        
    def render(self):
        if self.render_mode == "human":
            
            if self.clock is None:
                self.clock = pygame.time.Clock()
            if self.window is None:
                pygame.init()
                pygame.display.init()
                self.window = pygame.display.set_mode((self.render_size, self.render_size))

            canvas = self._get_frame()
            # The following line copies our drawings from `canvas` to the visible window
            self.window.blit(canvas, canvas.get_rect())
            pygame.display.update()
            # We need to ensure that human-rendering occurs at the predefined framerate.
            # The following line will automatically add a delay to keep the framerate stable.
            self.clock.tick(self.metadata["render_fps"])

        elif self.render_mode == "rgb_array":
            return self._get_img()
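
The `_game_result` contract described in the class above (a winner that is None while the game is running, plus a reward that does not depend on who won) can be sketched for TicTacToe as follows. This is an illustration only, not the package's implementation: the cell encoding (1 and -1 for the two players, 0 for empty), the use of 0 to signal a draw, and the names `WIN_LINES` and `game_result` are assumptions made for this example.

```python
# All eight winning lines on a 3x3 board, as flat cell indices.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def game_result(cells):
    """Return (winner, reward) for a flat list of 9 cells.

    Assumed encoding (hypothetical): 1 and -1 are the two players, 0 is empty.
    winner is None while the game is running, 0 for a draw,
    otherwise the identifier of the winning player.
    """
    for a, b, c in WIN_LINES:
        if cells[a] != 0 and cells[a] == cells[b] == cells[c]:
            return cells[a], 1  # same reward whichever player won
    if 0 not in cells:
        return 0, 0  # board is full and no line is complete: draw
    return None, 0  # game still in progress


in_progress = game_result([0] * 9)
first_player_win = game_result([1, 1, 1, -1, -1, 0, 0, 0, 0])
draw = game_result([1, -1, 1, 1, -1, -1, -1, 1, 1])
```

With this shape, `step` can derive `terminated` simply by checking whether the winner is None, as the base class does.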

The major difference between a standard gym environment and the adversarial environment is that the adversarial environment keeps track of both the game state and each player's state. In other words, we must know which player is currently making a move and the state that corresponds to this player. This must also be reflected in the result of the game.
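
As a rough illustration of tracking the player alongside the state, the sketch below shows one way a canonical observation could be built for TicTacToe. It is not taken from the package: the board encoding (1 for the first player, -1 for the second, 0 for empty) and the helper name `canonical_observation` are assumptions made for this example.

```python
def canonical_observation(board, current_player):
    """Return the board from the current player's point of view.

    Assumed encoding (hypothetical, for illustration only):
    1 = first player's mark, -1 = second player's mark, 0 = empty.
    """
    # Multiplying by the player identifier flips the marks when the
    # second player (-1) is to move, so the mover's pieces are always 1.
    canonical = [[cell * current_player for cell in row] for row in board]
    return canonical, current_player


board = [
    [1, -1, 0],
    [0, 1, 0],
    [0, 0, -1],
]

as_first, _ = canonical_observation(board, 1)
as_second, _ = canonical_observation(board, -1)
```

Returning the player identifier together with the canonical board lets the caller undo the flip when the original orientation is needed.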

Additional features added for convenience include the ability to hash the environment state with a string representation (useful for representing the game as an action tree, where each hashed state can be used to look up a position). There are also a few private member functions required by step and reset.
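
To make the hashing idea concrete, here is a minimal sketch of using the string representation as a dictionary key, as a tree search might. `TinyGame` is a hypothetical stand-in written for this example and is not part of adversarial_gym; only the two method names `get_string_representation` and `set_string_representation` come from the API above, and the string layout is an assumption.

```python
class TinyGame:
    """Hypothetical stand-in for an environment; not part of adversarial_gym."""

    def __init__(self):
        self.cells = ["-"] * 9
        self.player = "X"

    def get_string_representation(self):
        # Board contents plus side to move identify the state.
        return "".join(self.cells) + ":" + self.player

    def set_string_representation(self, board_string):
        cells, player = board_string.split(":")
        self.cells = list(cells)
        self.player = player

    def play(self, index):
        self.cells[index] = self.player
        self.player = "O" if self.player == "X" else "X"


visit_counts = {}  # state string -> number of times the state was reached

game = TinyGame()
root = game.get_string_representation()

for first_move in (0, 4, 0):
    game.set_string_representation(root)  # rewind to the root position
    game.play(first_move)
    key = game.get_string_representation()
    visit_counts[key] = visit_counts.get(key, 0) + 1
```

Because the same position always produces the same string, statistics gathered on different visits accumulate under one key.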

Finally, there are two functions used for rendering: one returns the frame drawn in the pygame window and the other returns the rgb_array of the state.

The adversarial environment is paired with a corresponding adversarial action space. This is required because in most games only a subset of all moves is legal in a given state, which makes it non-trivial to represent the move space with the vanilla gym spaces. To work around this while staying compliant with the OpenAI Gym API, we created the following action space.

import numpy as np


class AdversarialActionSpace(gym.spaces.Space):

    def sample(self):
        actions = self.legal_actions
        return actions[np.random.randint(len(actions))]

    def contains(self, action, is_legal=True):
        is_contained = action in range(self.action_space_size)  # property, so no call
        and_legal = action in self.legal_actions if is_legal else True
        return is_contained and and_legal

    @abstractproperty
    def legal_actions(self):
        """
        Returns:
            legal_actions: Returns a list of all the legal moves in the current position.
        """
        pass
    
    @abstractproperty
    def action_space_size(self):
        """
        Returns:
            action_space_size: returns the number of all possible actions.
        """
        pass

Each action is assumed to be an integer in the set

{0, 1, 2, ..., total_number_actions - 1}

which matches the range check in contains. This means the action space is linear; each action must still be decoded into its corresponding move in the given game. The legal actions are then just a mask over the total set of actions, indicating which of them can be played in the current position. The action space size is total_number_actions.
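
The decoding and masking steps can be sketched for a 3x3 TicTacToe board as below. This is a hedged example rather than the package's code: it assumes zero-based action indices (as implied by the `range(...)` check in `contains`), a row-major cell layout, and 0 marking an empty cell. The names `decode_action`, `legal_actions`, and `legal_mask` are hypothetical helpers.

```python
BOARD_SIZE = 3  # assumed 3x3 TicTacToe board for this example


def decode_action(action):
    """Map a flat action index to a (row, column) pair."""
    return divmod(action, BOARD_SIZE)


def legal_actions(board):
    """Return the flat indices of the empty cells (0 marks an empty cell)."""
    return [
        row * BOARD_SIZE + col
        for row in range(BOARD_SIZE)
        for col in range(BOARD_SIZE)
        if board[row][col] == 0
    ]


def legal_mask(board):
    """Return a 0/1 list over the whole action space."""
    legal = set(legal_actions(board))
    return [1 if a in legal else 0 for a in range(BOARD_SIZE * BOARD_SIZE)]


board = [
    [1, -1, 0],
    [0, 1, 0],
    [0, 0, -1],
]

moves = legal_actions(board)
mask = legal_mask(board)
```

A policy that outputs one score per action can multiply its scores by such a mask before choosing a move, so illegal moves are never selected.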

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT
