A no limit hold'em environment for training RL agents.

Pokerenv

Pokerenv is an OpenAI Gym (https://gym.openai.com/docs/) compliant reinforcement learning environment for No Limit Texas Hold'em. It supports tables with 2-6 players.

The environment can be configured to output hand history files, which can be viewed with any PokerStars-compatible tracking software (Hold'em Manager, PokerTracker, etc.), allowing you to easily track the learning process.
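
Hand history output is toggled per hand through the table's hand_history_enabled attribute (used in the learning loop below), and the files are written to the directory passed to the Table constructor. A minimal sketch:

# 'table' is a Table instance, created as in the toy example below
table.hand_history_enabled = True    # hands played from now on are written to the hand history directory
# ... play one or more hands ...
table.hand_history_enabled = False   # stop writing hand history files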

Installation and dependencies

pip install numpy
pip install treys
pip install pokerenv
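
After installing, the imports used in the toy example below should resolve; a quick sanity check (using only names that appear in the example):

import pokerenv.obs_indices as indices
from pokerenv.table import Table
from pokerenv.common import PlayerAction, Action

print(PlayerAction.CHECK)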

Toy example

Define an agent

import numpy as np
import pokerenv.obs_indices as indices
from pokerenv.table import Table
from pokerenv.common import PlayerAction, Action, action_list


class ExampleRandomAgent:
    def __init__(self):
        self.actions = []
        self.observations = []
        self.rewards = []

    def get_action(self, observation):
        # Only save the observation if it corresponds to an actual choice, 
        # not if the action to be taken is "don't care"
        if not observation[indices.ACTION_DONT_CARE]:
            self.observations.append(observation)
            valid_actions = np.argwhere(observation[indices.VALID_ACTIONS] == 1).flatten()
            valid_bet_low = observation[indices.VALID_BET_LOW]
            valid_bet_high = observation[indices.VALID_BET_HIGH]
            chosen_action = PlayerAction(np.random.choice(valid_actions))
            bet_size = 0
            if chosen_action is PlayerAction.BET:
                bet_size = np.random.uniform(valid_bet_low, valid_bet_high)
            table_action = Action(chosen_action, bet_size)
            self.actions.append(table_action)
        else:
            # If the action is a "don't care", we are only feeding a dummy action
            # to collect the delayed end-of-hand reward
            table_action = Action(PlayerAction.CHECK, 0)
        return table_action
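
The same observation fields can drive a trained policy instead of random play. Below is a minimal sketch, not part of pokerenv, of masking out invalid actions before sampling from a policy's action probabilities (assumed to contain one entry per PlayerAction), reusing the imports from the agent above:

def sample_masked_action(observation, action_probs):
    # Hypothetical helper: zero out probabilities of currently invalid actions and renormalize
    mask = observation[indices.VALID_ACTIONS] == 1
    probs = np.where(mask, action_probs, 0.0)
    probs = probs / probs.sum()
    chosen_action = PlayerAction(np.random.choice(len(probs), p=probs))
    bet_size = 0
    if chosen_action is PlayerAction.BET:
        # A learned agent would predict the size; here we simply take the midpoint of the allowed range
        bet_size = 0.5 * (observation[indices.VALID_BET_LOW] + observation[indices.VALID_BET_HIGH])
    return Action(chosen_action, bet_size)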

Create an environment

active_players = 6
agents = [ExampleRandomAgent() for _ in range(active_players)]
random_seed = 1
low_stack_bbs = 50       # lowest possible starting stack, in big blinds
high_stack_bbs = 200     # highest possible starting stack, in big blinds
hh_location = 'hands/'   # directory where hand history files are written
invalid_penalty = 0      # penalty applied to the reward when an invalid action is chosen
table = Table(active_players, random_seed, low_stack_bbs, high_stack_bbs, hh_location, invalid_penalty)

Implement learning loop

iteration = 1
while True:
    if iteration == 50:
        # Enable hand history output for one hand every 50 iterations
        table.hand_history_enabled = True
        iteration = 0
    # Set a random number of players for each hand
    table.n_players = np.random.randint(2, 7)
    obs = table.reset()
    next_acting_player = int(obs[indices.ACTING_PLAYER])
    while True:
        action = agents[next_acting_player].get_action(obs)
        action_dont_care = obs[indices.ACTION_DONT_CARE]
        obs, reward, finished, _ = table.step(action)

        if not action_dont_care:
            # The action was a real choice, so the reward corresponds to the action we just took
            agents[next_acting_player].rewards.append(reward)
        else:
            # The action was a "don't care", so the reward is a delayed end-of-hand reward
            # that belongs to the last real action this player took
            agents[next_acting_player].rewards[-1] += reward
        if finished:
            break
        next_acting_player = int(obs[indices.ACTING_PLAYER])
    iteration += 1
    table.hand_history_enabled = False
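
Since each agent stores its observations, actions and per-action rewards, you can sanity-check the bookkeeping after stopping the loop, for example:

for position, agent in enumerate(agents):
    print(f"player {position}: {len(agent.actions)} decisions, total reward {sum(agent.rewards):.2f}")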
