nanoPPO

A flexible and efficient implementation of the Proximal Policy Optimization (PPO) algorithm for reinforcement learning.

nanoPPO is a Python package that provides a simple and efficient implementation of the Proximal Policy Optimization (PPO) algorithm for reinforcement learning. It is designed to support both continuous and discrete action spaces, making it suitable for a wide range of applications.
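For orientation, the core idea behind PPO is its clipped surrogate objective, which limits how far each policy update can move. The sketch below illustrates that objective in plain Python; it is a hand-written reference, not nanoPPO's internal implementation, and the function name and signature are illustrative only:

```python
def ppo_clip_loss(ratios, advantages, clip_eps=0.2):
    """Clipped surrogate loss from the PPO paper (to be minimized).

    ratios: pi_new(a|s) / pi_old(a|s), one per sample
    advantages: advantage estimates, one per sample
    """
    total = 0.0
    for r, adv in zip(ratios, advantages):
        unclipped = r * adv
        clipped = max(min(r, 1 + clip_eps), 1 - clip_eps) * adv
        total += min(unclipped, clipped)  # pessimistic (lower) bound
    return -total / len(ratios)  # negate: ascent objective -> descent loss

# A ratio outside [1 - eps, 1 + eps] is clipped, so its contribution
# to the objective stops growing:
print(ppo_clip_loss([1.0], [1.0]))  # -1.0
print(ppo_clip_loss([2.0], [1.0]))  # -1.2 (clipped at 1 + 0.2)
```

With a positive advantage, pushing the ratio past 1 + eps buys no further improvement, which is what keeps updates conservative without a KL penalty term.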

Installation

You can install nanoPPO directly from PyPI using pip:

pip install nanoPPO

Alternatively, you can clone the repository and install from source:

git clone https://github.com/jamesliu/nanoPPO.git
cd nanoPPO
pip install .

Usage

Here are examples of how to use nanoPPO to train an agent.

On the MountainCarContinuous-v0 environment:

    import pickle

    from nanoppo.train_ppo_agent import train_agent

    env_name = 'MountainCarContinuous-v0'
    # ... set max_episodes, policy_lr, value_lr, vl_coef, checkpoint_dir,
    # checkpoint_interval, log_interval, and wandb_log here ...
    ppo, model_file, metrics_file = train_agent(
        env_name=env_name,
        max_episodes=max_episodes,
        policy_lr=policy_lr,
        value_lr=value_lr,
        vl_coef=vl_coef,
        checkpoint_dir=checkpoint_dir,
        checkpoint_interval=checkpoint_interval,
        log_interval=log_interval,
        wandb_log=wandb_log,
    )
    ppo.load(model_file)
    print("Loaded best weights from", model_file)
    with open(metrics_file, 'rb') as f:
        metrics = pickle.load(f)
    print("Loaded metrics from", metrics_file)
    best_reward = metrics['best_reward']
    episode = metrics['episode']
    print("best_reward", best_reward, 'episode', episode)

On the CartPole-v1 environment:

from nanoppo.discrete_action_ppo import PPO
import gym

env = gym.make('CartPole-v1')
# state dimension and number of discrete actions
ppo = PPO(env.observation_space.shape[0], env.action_space.n)

# Training code here...
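The elided training loop typically alternates rollout collection with PPO updates, and a common ingredient is generalized advantage estimation (GAE). The following is a self-contained sketch of GAE itself, written from the standard formulation; it is not nanoPPO's actual code, and the function name and defaults are assumptions:

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95, last_value=0.0):
    """Generalized Advantage Estimation over one trajectory.

    rewards: r_0..r_{T-1}; values: V(s_0)..V(s_{T-1});
    last_value: V(s_T) bootstrap (0.0 if the episode terminated).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD error
        gae = delta + gamma * lam * gae  # discounted sum of TD errors
        advantages[t] = gae
        next_value = values[t]
    return advantages

# With gamma = lam = 1 and a zero value baseline, GAE reduces to
# Monte Carlo returns:
print(gae_advantages([1.0, 1.0], [0.0, 0.0], gamma=1.0, lam=1.0))  # [2.0, 1.0]
```

Setting `lam=0` recovers one-step TD errors, while `lam=1` recovers full Monte Carlo returns minus the baseline; intermediate values trade bias against variance.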

Examples

See the examples directory for more comprehensive usage examples.

examples/train_mountaincar.sh

python nanoppo/train_ppo_agent.py --env_name=MountainCarContinuous-v0 --policy_lr=0.0005 --value_lr=0.0005 --max_episodes=50 --vl_coef=0.5 --wandb_log

[figure: mountaincar training result]

examples/train_pointmass1d.sh

python nanoppo/train_ppo_agent.py --env_name=PointMass1D-v0 --policy_lr=0.0005 --value_lr=0.0005 --max_episodes=50 --vl_coef=0.5 --wandb_log

examples/train_pointmass2d.sh

python nanoppo/train_ppo_agent.py --env_name=PointMass2D-v0 --policy_lr=0.0005 --value_lr=0.0005 --max_episodes=50 --vl_coef=0.5 --wandb_log

Documentation

Full documentation is available here.

Contributing

We welcome contributions to nanoPPO! If you're interested in contributing, please see our contribution guidelines and code of conduct.

License

nanoPPO is licensed under the Apache License 2.0. See the LICENSE file for more details.

Support

For support, questions, or feature requests, please open an issue on our GitHub repository or contact the maintainers.

Changelog

See the releases page for a detailed changelog of each version.

Download files

Source Distribution: nanoppo-0.13.post2.tar.gz (10.9 MB)

Built Distribution: nanoppo-0.13.post2-py2.py3-none-any.whl (11.2 MB)
