A flexible and efficient implementation of the Proximal Policy Optimization (PPO) algorithm for reinforcement learning.
nanoPPO
nanoPPO is a Python package that provides a simple and efficient implementation of the Proximal Policy Optimization (PPO) algorithm for reinforcement learning. It is designed to support both continuous and discrete action spaces, making it suitable for a wide range of applications.
Installation
You can install nanoPPO directly from PyPI using pip:
pip install nanoPPO
Alternatively, you can clone the repository and install from source:
git clone https://github.com/jamesliu/nanoPPO.git
cd nanoPPO
pip install .
Usage
Here are examples of how to use nanoPPO to train an agent.
On the MountainCarContinuous-v0 environment:
import pickle

from nanoppo.train_ppo_agent import train_agent

env_name = 'MountainCarContinuous-v0'
...
ppo, model_file, metrics_file = train_agent(
    env_name=env_name,
    max_episodes=max_episodes,
    policy_lr=policy_lr,
    value_lr=value_lr,
    vl_coef=vl_coef,
    checkpoint_dir=checkpoint_dir,
    checkpoint_interval=checkpoint_interval,
    log_interval=log_interval,
    wandb_log=wandb_log,
)
ppo.load(model_file)
print("Loaded best weights from", model_file)
with open(metrics_file, 'rb') as f:
    metrics = pickle.load(f)
print("Loaded metrics from", metrics_file)
best_reward = metrics['best_reward']
episode = metrics['episode']
print("best_reward", best_reward, "episode", episode)
On the CartPole-v1 environment:
from nanoppo.discrete_action_ppo import PPO
import gym
env = gym.make('CartPole-v1')
ppo = PPO(env.observation_space.shape[0], env.action_space.n)
# Training code here...
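PPO updates are driven by advantage estimates, commonly computed with generalized advantage estimation (GAE). The sketch below is a self-contained illustration of GAE in pure Python, not nanoPPO's internal code; the function name and signature are assumptions for this example.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one episode (illustrative sketch).

    rewards: per-step rewards, length T
    values:  value estimates, length T + 1 (last entry bootstraps the final state)
    gamma:   discount factor
    lam:     GAE smoothing parameter
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards, accumulating discounted TD errors
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting `lam=0` reduces this to one-step TD errors, while `lam=1` recovers full Monte Carlo returns minus the baseline.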
Examples
See the examples directory for more comprehensive usage examples.
examples/train_mountaincar.sh
python nanoppo/train_ppo_agent.py --env_name=MountainCarContinuous-v0 --policy_lr=0.0005 --value_lr=0.0005 --max_episodes=50 --vl_coef=0.5 --wandb_log
examples/train_pointmass1d.sh
examples/train_pointmass2d.sh
Documentation
Full documentation is available in the nanoPPO GitHub repository.
Contributing
We welcome contributions to nanoPPO! If you're interested in contributing, please see our contribution guidelines and code of conduct.
License
nanoPPO is licensed under the Apache License 2.0. See the LICENSE file for more details.
Support
For support, questions, or feature requests, please open an issue on our GitHub repository or contact the maintainers.
Changelog
See the releases page for a detailed changelog of each version.