Baseline implementation of MuZero agent

License: MIT

MuZero General

A commented and documented implementation of MuZero based on the Google DeepMind paper (Nov 2019) and the associated pseudocode. It is designed to be easily adaptable to any game or reinforcement learning environment (such as Gym). You only need to add a game file with the hyperparameters and the game class. Please refer to the documentation and the example.

MuZero is a state-of-the-art RL algorithm for board games (Chess, Go, ...) and Atari games. It is the successor to AlphaZero, but it has no knowledge of the environment's underlying dynamics. MuZero learns a model of the environment and uses an internal representation that contains only the information useful for predicting the reward, value, policy and transitions. MuZero is also close to Value Prediction Networks. See How it works.
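The learned model described above is commonly split into three functions: a representation function that encodes an observation into a hidden state, a dynamics function that predicts the reward and next hidden state for an action, and a prediction function that outputs a policy and a value. The sketch below is a toy illustration in pure Python with fixed random linear maps. It is not the network code of this package, and every name in it (`ToyModel`, `represent`, `dynamics`, `predict`, `unroll`) is hypothetical.

```python
import math
import random


class ToyModel:
    """Toy stand-in for MuZero's three learned functions (random linear maps)."""

    def __init__(self, obs_size, hidden_size, num_actions, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.num_actions = num_actions
        self.w_repr = matrix(hidden_size, obs_size)
        # Dynamics input is the hidden state concatenated with a one-hot action.
        self.w_dyn = matrix(hidden_size, hidden_size + num_actions)
        self.w_reward = matrix(1, hidden_size + num_actions)
        self.w_policy = matrix(num_actions, hidden_size)
        self.w_value = matrix(1, hidden_size)

    @staticmethod
    def _linear(weights, vector):
        return [sum(w * x for w, x in zip(row, vector)) for row in weights]

    def represent(self, observation):
        """Representation: observation -> hidden state."""
        return [math.tanh(x) for x in self._linear(self.w_repr, observation)]

    def dynamics(self, state, action):
        """Dynamics: (hidden state, action) -> (reward, next hidden state)."""
        one_hot = [1.0 if a == action else 0.0 for a in range(self.num_actions)]
        inputs = state + one_hot
        reward = self._linear(self.w_reward, inputs)[0]
        next_state = [math.tanh(x) for x in self._linear(self.w_dyn, inputs)]
        return reward, next_state

    def predict(self, state):
        """Prediction: hidden state -> (policy probabilities, value)."""
        logits = self._linear(self.w_policy, state)
        peak = max(logits)
        exps = [math.exp(logit - peak) for logit in logits]
        total = sum(exps)
        policy = [e / total for e in exps]
        value = self._linear(self.w_value, state)[0]
        return policy, value


def unroll(model, observation, actions):
    """Roll the model forward over a sequence of actions.

    Only the first step uses a real observation; the environment is never
    queried afterwards, which is the point of learning a model.
    """
    state = model.represent(observation)
    steps = []
    for action in actions:
        policy, value = model.predict(state)
        reward, state = model.dynamics(state, action)
        steps.append({"policy": policy, "value": value, "reward": reward})
    return steps


model = ToyModel(obs_size=4, hidden_size=8, num_actions=2)
trajectory = unroll(model, [0.1, -0.2, 0.05, 0.0], actions=[0, 1, 1])
```

In the real algorithm these functions are neural networks trained jointly, and the unrolled predictions are consumed by a tree search rather than a fixed action list.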

Disclaimer

This repository is a fork of the base MuZero implementation. The main goal of the fork is to allow more customization and simple usage as a library, more like OpenAI stable-baselines.

Getting started

Installation

pip install muzero-baseline

Prepare game and configuration

import gym

from muzero_baseline.games.abstract_game import AbstractGame

# Create config for agent and network

class MuZeroConfig:
    def __init__(self):
        self.seed = 0  # Seed for numpy, torch and the game
        self.max_num_gpus = None  # Maximum number of GPUs to use. A single GPU (set it to 1) is usually faster if it has enough memory. None uses every available GPU

        ### Game
        self.observation_shape = (1, 1, 4)  # Dimensions of the game observation, must be 3D (channel, height, width). For a 1D array, please reshape it to (1, 1, length of array)
        self.action_space = list(range(2))  # Fixed list of all possible actions. You should only edit the length
        self.players = list(range(1))  # List of players. You should only edit the length
        self.stacked_observations = 0  # Number of previous observations and previous actions to add to the current observation

        # ...

class Game(AbstractGame):
    """
    Game wrapper.
    """

    def __init__(self, seed=None):
        self.env = gym.make("CartPole-v1")

        if seed is not None:
            # Older Gym API; Gym 0.26 and later take the seed via env.reset(seed=seed)
            self.env.seed(seed)

    # ...
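The comments in the configuration above imply a few constraints: `observation_shape` is three-dimensional, and `action_space` and `players` are contiguous integer ranges. A minimal sketch of checking them, and of wrapping a flat observation into shape `(1, 1, length)`, could look like this. `check_config` and `to_3d` are hypothetical helpers, not part of `muzero_baseline`, and `SimpleNamespace` stands in for `MuZeroConfig` so the example runs on its own.

```python
from types import SimpleNamespace


def check_config(config):
    """Raise ValueError if the game fields break the documented constraints."""
    shape = tuple(config.observation_shape)
    if len(shape) != 3 or any(dim < 1 for dim in shape):
        raise ValueError(f"observation_shape must be (channel, height, width), got {shape}")
    for name in ("action_space", "players"):
        values = list(getattr(config, name))
        if not values or values != list(range(len(values))):
            raise ValueError(f"{name} must equal list(range(n)) with n >= 1, got {values}")
    # Assumption: a count of previous observations cannot be negative.
    if config.stacked_observations < 0:
        raise ValueError("stacked_observations must be non-negative")


def to_3d(flat_observation):
    """Wrap a 1D observation in nested lists so its shape is (1, 1, length)."""
    return [[list(flat_observation)]]


# Stand-in for MuZeroConfig, using the values from the example above.
config = SimpleNamespace(
    observation_shape=(1, 1, 4),
    action_space=list(range(2)),
    players=list(range(1)),
    stacked_observations=0,
)
check_config(config)

observation = to_3d([0.02, -0.01, 0.03, 0.04])
```

With NumPy arrays, the same reshape can be written as `numpy.asarray(obs).reshape(1, 1, -1)`.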

More examples of configs and games can be found in the games folder; you can adapt them to your needs.

More information is also available in the wiki.
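The body of the `Game` class is elided above. In the upstream MuZero General project, a game wrapper exposes methods such as `reset`, `step`, `legal_actions` and `to_play`; whether `AbstractGame` in this package requires exactly these is an assumption. The sketch below uses a toy environment instead of Gym so that it runs without dependencies. `ToyWalkGame` is hypothetical and deliberately does not inherit from `AbstractGame`.

```python
class ToyWalkGame:
    """Toy single-player game: walk along a line from position 0 to position 3.

    Method names follow the upstream MuZero General wrapper, which is an
    assumption about this package's AbstractGame interface.
    """

    GOAL = 3
    MAX_STEPS = 10

    def __init__(self, seed=None):
        self.seed = seed  # kept to mirror the seeded constructor; the toy is deterministic
        self.position = 0
        self.steps = 0

    def _observation(self):
        # 3D observation of shape (1, 1, 1): (channel, height, width).
        return [[[float(self.position)]]]

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.position = 0
        self.steps = 0
        return self._observation()

    def legal_actions(self):
        """Actions available in the current state: 0 = step left, 1 = step right."""
        return [0, 1]

    def to_play(self):
        """Index of the player to move; always 0 in a single-player game."""
        return 0

    def step(self, action):
        """Apply an action and return (observation, reward, done)."""
        self.position += 1 if action == 1 else -1
        self.steps += 1
        reached_goal = self.position == self.GOAL
        done = reached_goal or self.steps >= self.MAX_STEPS
        reward = 1.0 if reached_goal else 0.0
        return self._observation(), reward, done


game = ToyWalkGame(seed=0)
observation = game.reset()
total_reward, done = 0.0, False
while not done:
    observation, reward, done = game.step(1)  # always step right
    total_reward += reward
```

A matching configuration for this toy would use `observation_shape = (1, 1, 1)`, `action_space = list(range(2))` and `players = list(range(1))`.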

Initialize MuZero instance

from muzero_baseline.muzero import MuZero

# Initialize config
config = MuZeroConfig()
# Pass the game class itself; a Game object is initialized separately in each thread
mz = MuZero(Game, config)

Train agent

mz.train()

During training, the agent saves metrics and checkpoints of the network and the replay buffer in the results folder.

Metrics can be viewed in TensorBoard, for example from a notebook:

%load_ext tensorboard
%tensorboard --logdir ./results

Test agent

mz.test()

To run the test in the same thread:

mz.test_direct()

Load existing model

mz.load_model(
    checkpoint_path = 'results/2021-07-15--16-06-15/model.checkpoint', 
    replay_buffer_path = 'results/2021-07-15--16-06-15/replay_buffer.pkl'
)
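The paths above suggest that each run is stored under `results/` in a directory named by its start time (`YYYY-MM-DD--HH-MM-SS`). Assuming that layout, this sketch finds the newest run and builds the two paths expected by `load_model`. `latest_run_paths` is a hypothetical helper, and the file names `model.checkpoint` and `replay_buffer.pkl` are copied from the example above.

```python
from datetime import datetime
from pathlib import Path

# Inferred from the example path above; an assumption about the directory naming.
RUN_NAME_FORMAT = "%Y-%m-%d--%H-%M-%S"


def latest_run_paths(results_dir="results"):
    """Return (checkpoint_path, replay_buffer_path) for the newest run, or None."""
    root = Path(results_dir)
    if not root.is_dir():
        return None
    runs = []
    for entry in root.iterdir():
        if not entry.is_dir():
            continue
        try:
            started = datetime.strptime(entry.name, RUN_NAME_FORMAT)
        except ValueError:
            continue  # skip directories that are not timestamped runs
        if (entry / "model.checkpoint").is_file():
            runs.append((started, entry))
    if not runs:
        return None
    _, newest = max(runs, key=lambda run: run[0])
    return newest / "model.checkpoint", newest / "replay_buffer.pkl"
```

The returned paths can be converted with `str()` and passed as `checkpoint_path` and `replay_buffer_path`. The helper only checks that the checkpoint file exists, so the replay buffer path may still point to a missing file.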

Features

Residual Network and Fully connected network in PyTorch
Multi-Threaded/Asynchronous/Cluster with Ray
Multi GPU support for the training and the selfplay
TensorBoard real-time monitoring
Model weights automatically saved at checkpoints
Single and two player mode
Commented and documented
Easily adaptable for new games
Examples of board games, Gym and Atari games (See list of implemented games)
Pretrained weights available
Windows support (Experimental / Workaround: Use the notebook in Google Colab)

Further improvements

These improvements are active research. They are personal ideas that go beyond the MuZero paper. We are open to contributions and other ideas.

Hyperparameter search
Continuous action space
Tool to understand the learned model
Support of stochastic environments
Support of more than two player games
RL tricks (Never Give Up, Adaptive Exploration, ...)

Demo

All performance metrics are tracked and displayed in real time in TensorBoard:

[Image: CartPole training summary]

Testing Lunar Lander:

[Image: Lunar Lander training preview]

Games already implemented

Cartpole (Tested with the fully connected network)
Lunar Lander (Tested in deterministic mode with the fully connected network)
Gridworld (Tested with the fully connected network)
Tic-tac-toe (Tested with the fully connected network and the residual network)
Connect4 (Slightly tested with the residual network)
Gomoku
Twenty-One / Blackjack (Tested with the residual network)
Atari Breakout

Tests are run on Ubuntu with 16 GB RAM, an Intel i7 and a GTX 1050Ti Max-Q. For each game, we check that training progresses and reaches a level that shows the agent has learned, but we do not systematically reach human level. In some environments, we notice a regression after a certain time. The proposed configurations are certainly not optimal, and for now we do not focus on hyperparameter optimization. Any help is welcome.

Code structure

[Image: code structure diagram]

Network summary:

Authors

Werner Duvaud
Aurèle Hainaut
Paul Lenoir
Contributors

Please use this bibtex if you want to cite this repository (master branch) in your publications:

@misc{muzero-general,
  author       = {Werner Duvaud and Aurèle Hainaut},
  title        = {MuZero General: Open Reimplementation of MuZero},
  year         = {2019},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/werner-duvaud/muzero-general}},
}

Getting involved

GitHub Issues: For reporting bugs.
Pull Requests: For submitting code contributions.
Discord server: For discussions about development or any general questions.

muzero-baseline 0.2.0
