
Tic-Tac-Toe environment in OpenAI Gym

Project description

Tic Tac Toe Game in OpenAI Gym

The 3D version of Tic Tac Toe is implemented as an OpenAI Gym environment. The learning folder includes several Jupyter notebooks for the deep neural network models used to implement a computer-based player.

Complexity

The traditional (2D) Tic Tac Toe has a very small game space: at most 3^9 (roughly 2 × 10^4) board configurations, since each of the 9 cells is empty, x, or o. In comparison, the 3D version in this repo has a much larger space, on the order of 3^27 (roughly 7.6 × 10^12) configurations for its 27 cells. This makes computer-based players that rely on searching and pruning the game space prohibitively expensive.
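These upper bounds follow directly from counting cell assignments; a quick sanity check:

```python
# Upper bound on board configurations: each cell is empty, x, or o.
cells_2d = 3 * 3          # classic 3x3 board
cells_3d = 3 * 3 * 3      # the 3x3x3 board in this repo

states_2d = 3 ** cells_2d
states_3d = 3 ** cells_3d

print(states_2d)   # 19683
print(states_3d)   # 7625597484987
```

The true number of reachable positions is smaller (turn order and early wins rule many assignments out), but the bound is enough to show the gap between the 2D and 3D games.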

Instead, the current learning models are based on policy gradients and deep Q-learning. The DQN model has produced very promising results. Feel free to experiment on your own and contribute if interested. The PG-based model needs more work :)

Contributions

The repo is also open to pull requests and collaboration, both on game development and on the learning models.

Dependencies

  • Base dependency: gym.
  • Plot-rendering dependencies: numpy, matplotlib.
  • DQN learning dependencies: tensorflow, numpy.

Installation

To install run:

# In your virtual environment
pip install gym-tictactoe

Usage

Currently, two environment types with different rendering modes are supported.

Textual rendering

To use textual rendering, create the environment as tictactoe-v0 like so:

import gym
import gym_tictactoe

def play_game(actions, step_fn=input):
  env = gym.make('tictactoe-v0')
  env.reset()

  # Play actions in action profile
  for action in actions:
    print(env.step(action))
    env.render()
    if step_fn:
      step_fn()
  return env

actions = ['1021', '2111', '1221', '2222', '1121']
_ = play_game(actions, None)
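The four-digit action strings are not documented above, but from the sample gameplay they appear to encode (player, grid, column, row), one digit each. A small helper for building them, under that inferred assumption (check gym_tictactoe's source before relying on it):

```python
def make_action(player, grid, col, row):
    """Build a 4-digit action string as (player, grid, col, row).

    This encoding is inferred from the sample gameplay above and may
    not match the library exactly -- verify against gym_tictactoe.
    """
    return f"{player}{grid}{col}{row}"

# Reproduces the first move of the sample game:
# player 1 marks grid 0, column 2, row 1.
print(make_action(1, 0, 2, 1))   # 1021
```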

The output produced is:

Step 1:
- - -    - - -    - - -    
- - x    - - -    - - -    
- - -    - - -    - - -    

Step 2:
- - -    - - -    - - -    
- - x    - o -    - - -    
- - -    - - -    - - -    

Step 3:
- - -    - - -    - - -    
- - x    - o -    - - x    
- - -    - - -    - - -    

Step 4:
- - -    - - -    - - -    
- - x    - o -    - - x    
- - -    - - -    - - o    

Step 5:
- - -    - - -    - - -    
- - X    - o X    - - X    
- - -    - - -    - - o   

The winning sequence after gameplay: (0,2,1), (1,2,1), (2,2,1).
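Detecting such a win amounts to scanning every straight line of three cells in the 3x3x3 cube (there are 49 of them, counting rows, columns, pillars, and all diagonals). A minimal sketch, independent of the gym environment; the board here is a plain nested list indexed as board[grid][row][col], an assumed layout for illustration only:

```python
from itertools import product

def winner(board):
    """Return 'x', 'o', or None for a 3x3x3 board.

    board[g][r][c] holds 'x', 'o', or '-'. This layout is an
    assumption for illustration; the env's internal state may differ.
    """
    # 13 lexicographically positive direction vectors cover every
    # line exactly once (the other 13 nonzero vectors are mirrors).
    dirs = [d for d in product((-1, 0, 1), repeat=3) if d > (0, 0, 0)]
    for start in product(range(3), repeat=3):
        for d in dirs:
            # Only walk a line from its first cell: the cell one step
            # backwards must fall outside the cube.
            prev = tuple(s - x for s, x in zip(start, d))
            if all(0 <= p <= 2 for p in prev):
                continue
            line = [tuple(s + i * x for s, x in zip(start, d)) for i in range(3)]
            if not all(all(0 <= v <= 2 for v in cell) for cell in line):
                continue
            vals = {board[g][r][c] for g, r, c in line}
            if len(vals) == 1 and vals != {'-'}:
                return vals.pop()
    return None
```

Applied to the final position above (x in row 1, column 2 of all three grids), this returns 'x'.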

Plotted rendering

To use plotted rendering, create the environment as tictactoe-plt-v0 like so:

import gym
import gym_tictactoe

def play_game(actions, step_fn=input):
  env = gym.make('tictactoe-plt-v0')
  env.reset()

  # Play actions in action profile
  for action in actions:
    print(env.step(action))
    env.render()
    if step_fn:
      step_fn()
  return env

actions = ['1021', '2111', '1221', '2222', '1121']
_ = play_game(actions, None)

This produces the same five-step gameplay rendered as matplotlib plots, one per step (Step 1 through Step 5; plot images not reproduced here).

DQN Learning

The current models are under the learning folder. See the Jupyter notebook for DQN learning with a 2-layer neural network and an actor-critic technique.
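For readers unfamiliar with the core idea, here is a minimal illustration of the temporal-difference update a DQN performs, written as a 2-layer network in plain numpy. This is a sketch of the general technique, not the repo's notebook code; the state and action sizes are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, N_ACTIONS, HIDDEN = 27, 27, 64   # placeholder sizes

# 2-layer network: state -> hidden (ReLU) -> one Q-value per action.
W1 = rng.normal(0, 0.1, (STATE_DIM, HIDDEN))
W2 = rng.normal(0, 0.1, (HIDDEN, N_ACTIONS))

def q_values(state):
    h = np.maximum(state @ W1, 0.0)
    return h, h @ W2

def dqn_update(state, action, reward, next_state, done, gamma=0.99, lr=1e-2):
    """One gradient step on the squared TD error for a single transition."""
    global W1, W2
    h, q = q_values(state)
    _, q_next = q_values(next_state)
    target = reward + (0.0 if done else gamma * q_next.max())
    td_error = q[action] - target
    # Backpropagate through both layers for the chosen action only.
    grad_W2 = np.outer(h, np.eye(N_ACTIONS)[action]) * td_error
    grad_h = W2[:, action] * td_error
    grad_W1 = np.outer(state, grad_h * (h > 0))
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1
    return td_error
```

Repeatedly applying dqn_update to transitions drawn from gameplay drives the predicted Q-value toward the reward-plus-discounted-future target; a real agent adds experience replay and a target network on top of this loop.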

Sample game plays produced by the trained model follow the winning sequence (0,0,0), (1,0,0), (2,0,0). (Plot images are not reproduced here.)

Project details


Release history

This version

0.30

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release.

Built Distribution

gym_tictactoe-0.30-py3-none-any.whl (5.8 kB)

Uploaded Python 3

File details

Details for the file gym_tictactoe-0.30-py3-none-any.whl.

File metadata

  • Download URL: gym_tictactoe-0.30-py3-none-any.whl
  • Upload date:
  • Size: 5.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.5

File hashes

Hashes for gym_tictactoe-0.30-py3-none-any.whl
Algorithm Hash digest
SHA256 389e9078990a1f1419dfe09f3881e75d0d8405cbc79f34ab7e470d3bdca91c0c
MD5 7ef844626c8d534fc9ab349744a8f527
BLAKE2b-256 6e29368a5dc8abc95ced695c458fa5bf8175f5941ed8404b2f0337bc331506d2

