Python Implementation of Parameter-exploring Policy Gradients Evolution Strategy

License
- OSI Approved :: MIT License
Operating System
- POSIX :: Linux
Programming Language
- Python :: 3

Parameter-exploring Policy Gradients

Python Implementation of Parameter-exploring Policy Gradients [3] Evolution Strategy

Requirements

Python >= 3.6
NumPy

Optional

gym
mpi4py

Install

From PyPI

pip3 install pepg-es

From Source

git clone https://github.com/goktug97/PEPG-ES
cd PEPG-ES
python3 setup.py install --user

About Implementation

This implementation differs from the original paper [3] in several ways:

- Rank transformation [1] is applied to the fitness scores.
- The Adam [2] optimizer is used to update the mean.
- Weight decay is applied to the mean, similar to [4].
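
As a rough illustration of the first and third points, the sketch below shows one common rank transformation (centered ranks scaled to [-0.5, 0.5]) and an L2-style weight decay term subtracted from the gradient of the mean. The function names centered_ranks and apply_weight_decay are hypothetical, and the library's actual transformation and decay formula may differ.

```python
import numpy as np


def centered_ranks(fitness):
    """Map raw fitness scores to evenly spaced ranks in [-0.5, 0.5].

    Assumes at least two scores. Ties are broken by position.
    """
    fitness = np.asarray(fitness, dtype=float)
    ranks = np.empty(fitness.size, dtype=float)
    # The lowest score gets rank 0, the highest gets rank size - 1.
    ranks[np.argsort(fitness, kind="stable")] = np.arange(fitness.size)
    return ranks / (fitness.size - 1) - 0.5


def apply_weight_decay(gradient, mu, l2_coeff=0.005):
    """Shrink the ascent direction in proportion to the current mean (assumed form)."""
    gradient = np.asarray(gradient, dtype=float)
    return gradient - l2_coeff * np.asarray(mu, dtype=float)


print(centered_ranks([10.0, -3.0, 500.0, 0.0]))
```

Because only the ordering of the scores matters after this transformation, a single outlier such as 500.0 cannot dominate the update.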

Usage

Refer to the PEPG-ES/examples folder for more complete examples.

XOR Example

Find neural network parameters for an XOR gate.
Black-box optimization algorithms such as PEPG are competitive in reinforcement learning because they do not require backpropagation to compute gradients. In supervised learning, backpropagation is faster and more reliable, so it would solve the XOR problem more quickly. XOR is used here only because it is a small, easy-to-follow demonstration of the library.

from pepg import PEPG, NeuralNetwork, Adam, sigmoid

import numpy as np


network = NeuralNetwork(input_size = 2, output_size = 1, hidden_sizes = [2],
                        hidden_activation = sigmoid,
                        output_activation = sigmoid)

# Adam is the default optimizer; it is passed explicitly here for illustration.
optimizer_kwargs = {'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08} # Adam parameters

es = PEPG(population_size = 100, theta_size = network.number_of_parameters,
          mu_init = 0, sigma_init = 2.0,
          mu_lr = 0.3, sigma_lr = 0.2, optimizer = Adam,
          optimizer_kwargs = optimizer_kwargs)

# truth_table[input_1][input_2] is the expected XOR output
truth_table = [[0, 1],[1, 0]]

while True:
    print(f'Step: {es.step}')
    solutions = es.get_parameters()
    rewards = []
    for solution in solutions:
        network.weights = solution
        error = 0
        for input_1 in range(len(truth_table)):
            for input_2 in range(len(truth_table[0])):
                output = int(round(network([input_1, input_2])[0]))
                error += abs(truth_table[input_1][input_2] - output)
        reward = (4 - error) ** 2
        rewards.append(reward)
    es.update(rewards)
    if es.best_fitness == 16:
        print('Solution Found')
        print(f'Parameters: {es.best_theta}')
        break

Output (final lines of a sample run):

Step: 233
Step: 234
Step: 235
Step: 236
Step: 237
Solution Found
Parameters: [ 1.25863047 -0.73151503 -2.53377723  1.01802355  3.02723507  1.23112726
 -2.00288859 -3.66789242  4.56593794]
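
The reward in the loop above is (4 - error) ** 2, where error counts the wrong answers over the four input pairs. It therefore ranges from 0 to 16, and the stopping condition es.best_fitness == 16 means all four cases are correct. A small standalone sketch of that scoring, where predict is a hypothetical stand-in for the rounded network output:

```python
def xor_reward(predict):
    """Score a two-input binary predictor the same way as the XOR loop."""
    truth_table = [[0, 1], [1, 0]]
    error = 0
    for input_1 in range(2):
        for input_2 in range(2):
            error += abs(truth_table[input_1][input_2] - predict(input_1, input_2))
    return (4 - error) ** 2


print(xor_reward(lambda a, b: a ^ b))  # perfect predictor: 16
print(xor_reward(lambda a, b: 0))      # always answers 0, two mistakes: 4
```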

Documentation

PEPG Class

es = PEPG(population_size, theta_size,
          mu_init, sigma_init, mu_lr,
          sigma_lr, l2_coeff = 0.005,
          optimizer = Adam, optimizer_kwargs = {})

Parameters:
- population_size: int: Population size of the evolution strategy.
- theta_size: int: Number of parameters that will be optimized.
- mu_init: float: Initial mean.
- sigma_init: float: Initial sigma.
- mu_lr: float: Learning rate for the mean.
- sigma_lr: float: Learning rate for the sigma.
- l2_coeff: float: Weight decay coefficient.
- optimizer: Optimizer: Optimizer to use.
- optimizer_kwargs: Dict[str, Any]: Parameters for the optimizer, except the learning rate.
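
To show what the Adam parameters beta_1, beta_2 and epsilon in optimizer_kwargs control, here is a minimal sketch of the Adam update rule from [2]. AdamSketch is a hypothetical name and is not the library's own Adam class. The sign convention (returning a negative step) is an assumption that mirrors the custom optimizer example in this document.

```python
import numpy as np


class AdamSketch:
    """Minimal Adam update; not the library's own Adam class."""

    def __init__(self, alpha, beta_1=0.9, beta_2=0.999, epsilon=1e-08):
        self.alpha = alpha      # learning rate
        self.beta_1 = beta_1    # decay rate of the gradient average
        self.beta_2 = beta_2    # decay rate of the squared-gradient average
        self.epsilon = epsilon  # avoids division by zero
        self.m = 0.0
        self.v = 0.0
        self.t = 0              # number of calls so far

    def __call__(self, gradients):
        gradients = np.asarray(gradients, dtype=float)
        self.t += 1
        self.m = self.beta_1 * self.m + (1 - self.beta_1) * gradients
        self.v = self.beta_2 * self.v + (1 - self.beta_2) * gradients ** 2
        # Bias correction compensates for the zero initialisation of m and v.
        m_hat = self.m / (1 - self.beta_1 ** self.t)
        v_hat = self.v / (1 - self.beta_2 ** self.t)
        return -self.alpha * m_hat / (np.sqrt(v_hat) + self.epsilon)


optimizer = AdamSketch(alpha=0.3)
step = optimizer([2.0, -5.0])
print(step)
```

On the first call the step has magnitude close to alpha for every parameter, regardless of the gradient scale, which is one reason Adam is convenient when reward scales vary.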

solutions = es.get_parameters()

Creates symmetric samples around the mean and returns a NumPy array of shape [population_size, theta_size].
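
Symmetric sampling can be sketched as follows: each perturbation is drawn once and used twice, as mean + perturbation and mean - perturbation. The function name symmetric_samples, the row ordering (all positive samples first) and the even-population requirement are assumptions for illustration, not the library's documented behavior.

```python
import numpy as np


def symmetric_samples(mu, sigma, population_size, rng):
    """Return (solutions, epsilon); solutions has shape [population_size, theta_size]."""
    if population_size % 2 != 0:
        raise ValueError("symmetric sampling needs an even population size")
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # One perturbation per pair, scaled per parameter by sigma.
    epsilon = rng.standard_normal((population_size // 2, mu.size)) * sigma
    # First half is mu + epsilon, second half is the mirrored mu - epsilon.
    solutions = np.concatenate([mu + epsilon, mu - epsilon])
    return solutions, epsilon


rng = np.random.default_rng(0)
solutions, epsilon = symmetric_samples(np.zeros(9), np.full(9, 2.0), 100, rng)
print(solutions.shape)  # (100, 9)
```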

es.update(rewards)

Parameters:
- rewards: List[float]: Rewards for the given solutions, in the same order as the solutions.

Updates the mean and the sigma.
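
For orientation, the symmetric-sampling gradient estimates from the PEPG paper [3] can be sketched in a simplified form. This sketch leaves out the rank transformation, Adam and weight decay used by this library, takes the mean of all rewards as the baseline, and averages over sample pairs. The name pepg_gradients and these simplifications are assumptions, not the library's code.

```python
import numpy as np


def pepg_gradients(epsilon, sigma, rewards_plus, rewards_minus):
    """Estimate ascent directions for mu and sigma from mirrored sample pairs.

    epsilon:       perturbations, shape (pairs, theta_size)
    rewards_plus:  rewards of mu + epsilon, shape (pairs,)
    rewards_minus: rewards of mu - epsilon, shape (pairs,)
    """
    epsilon = np.asarray(epsilon, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    r_plus = np.asarray(rewards_plus, dtype=float)
    r_minus = np.asarray(rewards_minus, dtype=float)

    baseline = np.mean(np.concatenate([r_plus, r_minus]))
    r_t = (r_plus - r_minus) / 2.0             # which side of the pair was better
    r_s = (r_plus + r_minus) / 2.0 - baseline  # was the pair better than average

    grad_mu = (epsilon * r_t[:, None]).mean(axis=0)
    grad_sigma = ((epsilon ** 2 - sigma ** 2) / sigma * r_s[:, None]).mean(axis=0)
    return grad_mu, grad_sigma


grad_mu, grad_sigma = pepg_gradients(
    epsilon=[[2.0], [0.5]], sigma=[1.0],
    rewards_plus=[4.0, 1.0], rewards_minus=[2.0, 1.0])
print(grad_mu, grad_sigma)
```

In this toy call the large perturbation paid off, so both the mean and the sigma are pushed upward.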

es.save_checkpoint()

Creates a checkpoint and saves it to a newly created file named <time.time()>.checkpoint.

es = PEPG.load_checkpoint(filename)

Creates a new PEPG instance and loads the checkpoint.
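
The save-and-restore pattern can be illustrated with a toy class. ToyStrategy is a hypothetical stand-in, and the use of pickle and of a directory argument are assumptions. The library's actual file format is not described here.

```python
import os
import pickle
import tempfile
import time


class ToyStrategy:
    """Hypothetical stand-in for an object with checkpointable state."""

    def __init__(self, mu, sigma, step=0):
        self.mu = mu
        self.sigma = sigma
        self.step = step

    def save_checkpoint(self, directory="."):
        # The file name is derived from the current time.time() value.
        filename = os.path.join(directory, f"{time.time()}.checkpoint")
        with open(filename, "wb") as output:
            pickle.dump(self.__dict__, output)
        return filename

    @classmethod
    def load_checkpoint(cls, filename):
        # Rebuild a new instance from the stored attributes.
        with open(filename, "rb") as checkpoint:
            state = pickle.load(checkpoint)
        return cls(**state)


with tempfile.TemporaryDirectory() as directory:
    path = ToyStrategy(mu=[0.0], sigma=[2.0], step=7).save_checkpoint(directory)
    restored = ToyStrategy.load_checkpoint(path)
print(restored.step)  # 7
```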

es.save_best(filename)

Saves the best theta, along with the mu and the sigma that were used to create it.

theta, mu, sigma = PEPG.load_best(filename)

Loads the theta, the mu, and the sigma arrays from the given file.

NeuralNetwork Class

network = NeuralNetwork(input_size, output_size, hidden_sizes = [],
                        hidden_activation = lambda x: x,
                        output_activation = lambda x: x,
                        bias = True)

Parameters:
- input_size: int: Input size of the network.
- output_size: int: Output size of the network.
- hidden_sizes: List[int]: Sizes for the hidden layers.
- hidden_activation: Callable[[float], float]: Activation function used in hidden layers.
- output_activation: Callable[[float], float]: Activation function used at the output.
- bias: bool: Whether to add a bias node.
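
The value of network.number_of_parameters, which is passed to PEPG as theta_size, can be estimated by hand. The sketch below assumes a fully connected network with one bias input per layer. That assumption is consistent with the nine parameters printed in the XOR example (2 inputs, one hidden layer of 2, 1 output), but the helper count_parameters itself is hypothetical.

```python
def count_parameters(input_size, output_size, hidden_sizes=(), bias=True):
    """Count the weights of a fully connected network.

    Each layer has (inputs + 1) * outputs weights when bias is enabled.
    """
    layer_sizes = [input_size, *hidden_sizes, output_size]
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += (fan_in + int(bias)) * fan_out
    return total


print(count_parameters(2, 1, hidden_sizes=[2]))  # 9, as in the XOR example
```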

network.save_network(filename)

Saves the network to a file.

network = NeuralNetwork.load_network(filename)

Creates a new NeuralNetwork instance and loads the given network file.

Custom Optimizer Example

from pepg import PEPG, Optimizer, NeuralNetwork

class CustomOptimizer(Optimizer):
    # alpha is the learning rate; the other arguments come from optimizer_kwargs
    def __init__(self, alpha, parameter, another_parameter):
        self.alpha = alpha
        self.parameter = parameter
        self.another_parameter = another_parameter

    def __call__(self, gradients):
        # Transform the gradients and return the resulting step
        gradients = (gradients + self.parameter) * self.another_parameter
        return -self.alpha * gradients

network = NeuralNetwork(input_size = 2, output_size = 1)

optimizer_kwargs = {'parameter': 0.3, 'another_parameter': 0.2}
es = PEPG(population_size = 100, theta_size = network.number_of_parameters,
          mu_init = 0.0, sigma_init = 2.0,
          mu_lr = 0.3, sigma_lr = 0.2, optimizer = CustomOptimizer,
          optimizer_kwargs = optimizer_kwargs)
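
A slightly more realistic optimizer following the same pattern is gradient descent with momentum. The sketch below defines its own stand-in Optimizer base class so that it runs without the pepg package. With the library, the base class would be imported from pepg as in the example above. The interface (alpha first, remaining arguments from optimizer_kwargs, a negative step returned from __call__) is assumed from that example.

```python
import numpy as np


class Optimizer:
    """Stand-in base class so the sketch runs without the pepg package."""


class MomentumOptimizer(Optimizer):
    def __init__(self, alpha, momentum=0.9):
        self.alpha = alpha        # learning rate
        self.momentum = momentum  # fraction of the previous step to keep
        self.velocity = 0.0

    def __call__(self, gradients):
        gradients = np.asarray(gradients, dtype=float)
        # Blend the previous step with the new gradient step.
        self.velocity = self.momentum * self.velocity - self.alpha * gradients
        return self.velocity


optimizer = MomentumOptimizer(alpha=0.5, momentum=0.5)
first = optimizer([1.0, -2.0])
second = optimizer([1.0, -2.0])
print(first, second)
```

Under the same assumption, it would be passed to PEPG with optimizer = MomentumOptimizer and optimizer_kwargs = {'momentum': 0.9}.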

References

[1] Daan Wierstra, Tom Schaul, Tobias Glasmachers, Yi Sun, Jan Peters and Jurgen Schmidhuber. Natural Evolution Strategies. 2014
[2] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. 2014
[3] F. Sehnke, C. Osendorfer, T. Ruckstiess, A. Graves, J. Peters and J. Schmidhuber. Parameter-exploring policy gradients. 2010
[4] Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor and Ilya Sutskever. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. 2017

Release history

- 0.0.5 (this version): Mar 31, 2020
- 0.0.4: Mar 31, 2020
- 0.0.3: Mar 31, 2020
- 0.0.2: Mar 31, 2020
- 0.0.1: Mar 30, 2020