RLCard: A Toolkit for Reinforcement Learning in Card Games

RLCard is a toolkit for Reinforcement Learning (RL) in card games. It supports multiple card environments with easy-to-use interfaces. The goal of RLCard is to bridge reinforcement learning and imperfect information games, and to push forward research on reinforcement learning in domains with multiple agents, large state and action spaces, and sparse rewards. RLCard is developed by DATA Lab at Texas A&M University.

News:

  • New game Gin Rummy available. Thanks for the contribution of @billh0420.
  • PyTorch implementation available. Thanks for the contribution of @mjudell.
  • We have just initialized a list of Awesome-Game-AI resources. Check it out!

Cite this work

If you find this repo useful, you may cite:

@article{zha2019rlcard,
  title={RLCard: A Toolkit for Reinforcement Learning in Card Games},
  author={Zha, Daochen and Lai, Kwei-Herng and Cao, Yuanpu and Huang, Songyi and Wei, Ruzhe and Guo, Junyu and Hu, Xia},
  journal={arXiv preprint arXiv:1910.04376},
  year={2019}
}

Installation

Make sure that you have Python 3.5+ and pip installed. We recommend installing rlcard with pip as follows:

git clone https://github.com/datamllab/rlcard.git
cd rlcard
pip install -e .

or use PyPI with:

pip install rlcard

To use the TensorFlow implementation, run:

pip install rlcard[tensorflow]

To try out the PyTorch implementation of DQN and NFSP, run:

pip install rlcard[torch]

If you run into problems installing PyTorch with the command above, follow the instructions on the official PyTorch website to install it manually.

Examples

Please refer to examples/. A short example is shown below.

import rlcard
from rlcard.agents.random_agent import RandomAgent

# Create the Blackjack environment and attach a single random agent
env = rlcard.make('blackjack')
env.set_agents([RandomAgent(action_num=env.action_num)])

# Play one complete game
trajectories, payoffs = env.run()

We also recommend the following toy examples.

Demo

Run examples/leduc_holdem_human.py to play with the pre-trained Leduc Hold'em model. Leduc Hold'em is a simplified version of Texas Hold'em. Rules can be found here.

>> Leduc Hold'em pre-trained model

>> Start a new game!
>> Agent 1 chooses raise

=============== Community Card ===============
┌─────────┐
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
└─────────┘
===============   Your Hand    ===============
┌─────────┐
│J        │
│         │
│         │
│    ♥    │
│         │
│         │
│        J│
└─────────┘
===============     Chips      ===============
Yours:   +
Agent 1: +++
=========== Actions You Can Choose ===========
0: call, 1: raise, 2: fold

>> You choose action (integer):

Documents

Please refer to the Documents for general introductions. API documents are available at our website.

Available Environments

We provide complexity estimates for the games along several dimensions. InfoSet Number: the number of information sets. InfoSet Size: the average number of states in a single information set. Action Size: the size of the action space. Name: the name that should be passed to rlcard.make to create the game environment. We also provide links to the documentation and a random-agent example for each game.

| Game | InfoSet Number | InfoSet Size | Action Size | Name | Usage |
|---|---|---|---|---|---|
| Blackjack (wiki, baike) | 10^3 | 10^1 | 10^0 | blackjack | doc, example |
| Leduc Hold'em (paper) | 10^2 | 10^2 | 10^0 | leduc-holdem | doc, example |
| Limit Texas Hold'em (wiki, baike) | 10^14 | 10^3 | 10^0 | limit-holdem | doc, example |
| Dou Dizhu (wiki, baike) | 10^53 ~ 10^83 | 10^23 | 10^4 | doudizhu | doc, example |
| Simple Dou Dizhu (wiki, baike) | - | - | - | simple-doudizhu | doc, example |
| Mahjong (wiki, baike) | 10^121 | 10^48 | 10^2 | mahjong | doc, example |
| No-limit Texas Hold'em (wiki, baike) | 10^162 | 10^3 | 10^4 | no-limit-holdem | doc, example |
| UNO (wiki, baike) | 10^163 | 10^10 | 10^1 | uno | doc, example |
| Gin Rummy (wiki, baike) | - | - | - | gin-rummy | doc, example |
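
To use these estimates programmatically, the sketch below stores them as base-10 exponents and orders the environments by number of information sets. The `COMPLEXITY` dictionary and the `names_by_infoset_number` helper are hypothetical names introduced for illustration and are not part of RLCard. Games with no estimates in the table are omitted, and Dou Dizhu's range is represented by its lower bound.

```python
# Illustrative only: exponents (powers of ten) copied from the table above.
# Keys are the names passed to rlcard.make; values are the exponents of
# (InfoSet Number, InfoSet Size, Action Size).
COMPLEXITY = {
    "blackjack": (3, 1, 0),
    "leduc-holdem": (2, 2, 0),
    "limit-holdem": (14, 3, 0),
    "doudizhu": (53, 23, 4),  # InfoSet Number is 10^53 ~ 10^83; lower bound used
    "mahjong": (121, 48, 2),
    "no-limit-holdem": (162, 3, 4),
    "uno": (163, 10, 1),
}


def names_by_infoset_number(table=COMPLEXITY):
    """Return environment names ordered from fewest to most information sets."""
    return sorted(table, key=lambda name: table[name][0])


if __name__ == "__main__":
    print(names_by_infoset_number())
```

Starting experiments with the environments at the front of this ordering is one reasonable way to check a training setup before moving to the larger games.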

Evaluation

Performance is measured by winning rate in tournaments. Example outputs are as follows:

[Figure: Learning Curves]

Library Structure

The purposes of the main modules are listed below:

API Cheat Sheet

  • rlcard.make(env_id, config={}): Make an environment. env_id is the string name of an environment; config is a dictionary specifying environment configurations, as follows.
    • allow_step_back: Default False. Set to True to enable the step_back function for traversing backward in the game tree.
    • allow_raw_data: Default False. Set to True to include raw data in the state.
    • single_agent_mode: Default False. Set to True to use single-agent mode, i.e., a Gym-style interface with the other players controlled by pretrained/rule models.
    • active_player: Default 0. If single_agent_mode is True, active_player specifies which player is controlled in single-agent mode.
    • record_action: Default False. If True, the state includes an action_record field that records the historical actions. This may be used for human-agent play.
  • env.init_game(): Initialize a game. Returns the state and the first player ID.
  • env.step(action, raw_action=False): Take one step in the environment. action can be a raw action (string) or an integer; set raw_action to True if the action is raw.
  • env.step_back(): Available only when allow_step_back is True. Take one step backward. This can be used for algorithms that operate on the game tree, such as CFR.
  • env.get_payoffs(): At the end of the game, return a list of payoffs for all the players.
  • env.get_perfect_information(): (Currently supported only for some games.) Obtain the perfect information at the current state.
  • env.set_agents(agents): agents is a list of Agent objects. The length of the list should equal the number of players in the game.
  • env.run(is_training=False): Run a complete game and return trajectories and payoffs. This function can be used after set_agents has been called. If is_training is True, it calls each agent's step function to play the game; if is_training is False, it calls eval_step instead.
  • State Definition: The state always contains the observation state['obs'] and the legal actions state['legal_actions']. If allow_raw_data is True, the state also contains the raw observation state['raw_obs'] and the raw legal actions state['raw_legal_actions'].
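
The agent interface implied by `env.run` and the state definition can be sketched as follows. `FirstLegalAgent` is a hypothetical agent written for illustration. It assumes that `state['legal_actions']` is a non-empty collection of integer action IDs. The `use_raw` attribute and the `(action, probs)` return shape of `eval_step` are assumptions about what the library expects and may differ between versions, so check them against the API documents.

```python
class FirstLegalAgent:
    """Hypothetical agent that always plays the lowest-numbered legal action."""

    def __init__(self, action_num):
        self.action_num = action_num  # size of the action space
        self.use_raw = False          # assumption: agent emits integer actions

    def step(self, state):
        """Choose an action during training (used when is_training=True)."""
        return min(state['legal_actions'])

    def eval_step(self, state):
        """Choose an action during evaluation (used when is_training=False).

        Assumption: returns (action, probs), with probs a list of length
        action_num. The policy is deterministic, so all mass is on one action.
        """
        action = self.step(state)
        probs = [0.0] * self.action_num
        probs[action] = 1.0
        return action, probs


if __name__ == "__main__":
    fake_state = {'obs': [0, 1, 0], 'legal_actions': [2, 1]}  # made-up state
    agent = FirstLegalAgent(action_num=3)
    print(agent.step(fake_state))
    print(agent.eval_step(fake_state))
```

An agent like this would be registered in the same way as `RandomAgent` in the short example, by passing a list of agents to `env.set_agents`.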

For basic usage, env.set_agents and env.run() are a good choice. For advanced usage, you may also play the game step by step with env.init_game() and env.step().
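
A minimal sketch of the step-by-step style follows. It assumes that `env.step(action)` returns the next state and the next player ID, mirroring `env.init_game()`, and that the environment exposes an `is_over()` check. Neither detail is stated in the cheat sheet, so verify both against the API documents. `play_one_game` and `CountdownEnv` are hypothetical; the latter is a toy stand-in so that the loop can run without RLCard installed.

```python
def play_one_game(env, policies):
    """Drive one game manually and return the final payoffs.

    policies is a list of callables, one per player, each mapping a state
    dict to one of the integer IDs in state['legal_actions'].
    """
    state, player_id = env.init_game()
    while not env.is_over():  # assumption: the env provides is_over()
        action = policies[player_id](state)
        # Assumption: step returns (next_state, next_player_id).
        state, player_id = env.step(action)
    return env.get_payoffs()


class CountdownEnv:
    """Toy two-player game (not an RLCard environment).

    Players alternately remove 1 or 2 tokens; whoever takes the last
    token wins (+1) and the other player loses (-1).
    """

    def __init__(self, tokens=4):
        self._start = tokens

    def _state(self):
        legal = [a for a in (1, 2) if a <= self.tokens]
        return {'obs': [self.tokens], 'legal_actions': legal}

    def init_game(self):
        self.tokens, self.player, self.winner = self._start, 0, None
        return self._state(), self.player

    def step(self, action):
        self.tokens -= action
        if self.tokens == 0:
            self.winner = self.player
        self.player = 1 - self.player
        return self._state(), self.player

    def is_over(self):
        return self.tokens == 0

    def get_payoffs(self):
        return [1 if p == self.winner else -1 for p in (0, 1)]


if __name__ == "__main__":
    take_one = lambda state: state['legal_actions'][0]
    print(play_one_game(CountdownEnv(tokens=4), [take_one, take_one]))
```

With RLCard itself, the environment would instead come from `rlcard.make`, optionally with a config such as `{'allow_step_back': True}` when an algorithm needs to move backward with `env.step_back()`.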

Contributing

Contributions to this project are greatly appreciated! Please create an issue for feedback or bug reports. If you want to contribute code, please refer to the Contributing Guide.

Acknowledgements

We would like to thank JJ World Network Technology Co.,LTD for the generous support.
