Pokerenv

A No Limit Hold'em environment for training RL agents.

Pokerenv is an OpenAI Gym (https://gym.openai.com/docs/) compliant reinforcement learning environment for No Limit Texas Hold'em. It supports 2-6 player tables.

The environment can be configured to output hand history files, which can be viewed with any PokerStars-compatible tracking software (Hold'em Manager, PokerTracker, etc.), allowing you to easily track the learning process.
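Once hand history output is enabled, the files land in whatever directory you configure and can be imported into a tracker. A minimal sketch for inspecting them (the `hands/` directory name matches the toy example below; the `.txt` extension is an assumption, not something the library documents here):

```python
from pathlib import Path

# List generated hand history files for import into a tracker.
# Assumes histories were written as text files under "hands/",
# the directory passed to Table in the toy example.
hh_dir = Path("hands")
hh_files = sorted(hh_dir.glob("*.txt")) if hh_dir.exists() else []
for f in hh_files:
    print(f.name, f.stat().st_size, "bytes")
```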

The observation space and other details are described in the wiki (WIP): https://github.com/trouverun/pokerenv/wiki/

Installation

pip install pokerenv

Toy example

Define an agent

import numpy as np
from pokerenv.table import Table
from pokerenv.common import PlayerAction, Action


class ExampleRandomAgent:
    def __init__(self):
        self.actions = []
        self.observations = []
        self.rewards = []

    def get_action(self, observation):
        # Only save valid observations
        if not observation['info']['hand_is_over']:
            self.observations.append(observation)
            action_list = observation['info']['valid_actions']['actions_list']
            bet_range = observation['info']['valid_actions']['bet_range']
            chosen = PlayerAction(np.random.choice(action_list))
            betsize = 0
            if chosen is PlayerAction.BET:
                betsize = np.random.uniform(bet_range[0], bet_range[1])
            action = Action(chosen, betsize)
            self.actions.append(action)
        else:
            # Hand is over and we are only collecting final rewards, actions are ignored,
            # so send a dummy action without recording it
            action = Action(0, 0)
        return action
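The sampling pattern inside `get_action` can be exercised in isolation with a hand-built `valid_actions` dict. A self-contained sketch using only the standard library (the action names and bet range below are illustrative stand-ins for pokerenv's `PlayerAction`/`Action` types, not the library's actual values):

```python
import random

# Stand-ins for pokerenv's PlayerAction / Action types; the action
# names and bet range here are illustrative, not the library's values.
BET = "bet"
valid_actions = {"actions_list": ["fold", "check", BET], "bet_range": (2.0, 100.0)}

# Mirror the agent above: pick a random valid action, and if it is a
# bet, draw a size uniformly from the allowed range.
chosen = random.choice(valid_actions["actions_list"])
betsize = 0.0
if chosen == BET:
    betsize = random.uniform(*valid_actions["bet_range"])
print(chosen, betsize)
```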

Create an environment

active_players = 6
agents = [ExampleRandomAgent() for _ in range(active_players)]
random_seed = 1
low_stack_bbs = 50
high_stack_bbs = 200
hh_location = 'hands/'
invalid_penalty = -15

table = Table(active_players, random_seed, low_stack_bbs, high_stack_bbs, hh_location, invalid_penalty, obs_format='dict')

Implement learning loop

iteration = 1
while True:
    # Write a hand history file for every 50th hand played
    if iteration == 50:
        table.hand_history_enabled = True
        iteration = 0
    # Set a random number of players each hand
    table.n_players = np.random.randint(2, 7)
    obs = table.reset()
    next_acting_player = obs['info']['next_player_to_act']
    while True:
        action = agents[next_acting_player].get_action(obs)
        obs, reward, finished = table.step(action)
        
        # Check whether the reward corresponds to the action just taken,
        # or is a delayed reward paid out at the end of the hand
        # (in which case it is added to the latest recorded reward)
        if not obs['info']['delayed_reward']:
            agents[next_acting_player].rewards.append(reward)
        else:
            agents[next_acting_player].rewards[-1] += reward
        
        if finished:
            break
        next_acting_player = obs['info']['next_player_to_act']
    iteration += 1
    
    table.hand_history_enabled = False
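The delayed-reward bookkeeping in the loop above is easy to get wrong, and its effect can be checked in isolation. A self-contained sketch replaying the same logic on made-up reward values:

```python
# Replay the loop's reward bookkeeping on made-up data: immediate rewards
# start a new entry, while delayed rewards (paid out when the hand ends)
# are folded into the agent's most recent entry.
events = [(-1.0, False), (-2.0, False), (5.0, True)]  # (reward, delayed?)

rewards = []
for reward, delayed in events:
    if not delayed:
        rewards.append(reward)
    else:
        rewards[-1] += reward

print(rewards)  # the last action's net result becomes -2.0 + 5.0 = 3.0
```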
