
# Parameter-exploring Policy Gradients

A Python implementation of the Parameter-exploring Policy Gradients [3] evolution strategy.

## Requirements

• Python >= 3.6
• NumPy
• gym

### Install

• From PyPI

    pip3 install pepg-es

• From source

    git clone https://github.com/goktug97/PEPG-ES
    cd PEPG-ES
    python3 setup.py install --user


### About Implementation

I implemented several things differently from the original paper:

• Applied a rank transformation [1] to the fitness scores (see the sketch below).
• Used the Adam [2] optimizer to update the mean.
• Applied weight decay to the mean, similar to [4].
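
The rank transformation replaces raw fitness values with their ranks before the update, so only the ordering of the rewards matters, not their scale. A minimal sketch of a centered rank transformation in the spirit of [1] (the library's internal shaping may differ in details):

    import numpy as np

    def centered_ranks(rewards):
        # Rank the rewards from 0 to n - 1, then rescale to [-0.5, 0.5].
        ranks = np.empty(len(rewards), dtype = np.float64)
        ranks[np.argsort(rewards)] = np.arange(len(rewards))
        return ranks / (len(rewards) - 1) - 0.5

    # Only the order matters: an outlier reward no longer dominates the update.
    print(centered_ranks([10.0, -5.0, 1000.0, 0.3]))  # approx. [0.167, -0.5, 0.5, -0.167]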

### Usage

Refer to the PEPG-ES/examples folder for more complete examples.

#### XOR Example

• Find neural network parameters for an XOR gate.
• Black-box optimization algorithms like PEPG are competitive in reinforcement learning because they don't require backpropagation to calculate gradients. In supervised learning, backpropagation is faster and more reliable, so it would also solve the XOR problem faster; I demonstrate the library on XOR because it is easy and understandable.
    import numpy as np

    from pepg import PEPG, NeuralNetwork, Adam

    # The original snippet uses sigmoid without defining it; a NumPy version:
    sigmoid = lambda x: 1 / (1 + np.exp(-x))

    network = NeuralNetwork(input_size = 2, output_size = 1, hidden_sizes = [2],
                            hidden_activation = sigmoid,
                            output_activation = sigmoid)

    # Adam is the default optimizer; it is passed explicitly here for the example.
    # mu_lr is passed to the optimizer as the learning rate.
    optimizer_kwargs = {'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08}  # Adam parameters

    es = PEPG(population_size = 100, theta_size = network.number_of_parameters,
              mu_init = 0.0, sigma_init = 2.0,
              mu_lr = 0.3, sigma_lr = 0.2, optimizer = Adam,
              optimizer_kwargs = optimizer_kwargs)

    truth_table = [[0, 1], [1, 0]]

    while True:
        print(f'Step: {es.step}')
        solutions = es.get_parameters()
        rewards = []
        for solution in solutions:
            network.weights = solution
            error = 0
            for input_1 in range(len(truth_table)):
                for input_2 in range(len(truth_table[0])):
                    output = int(round(network([input_1, input_2])[0]))
                    error += abs(truth_table[input_1][input_2] - output)
            # Zero error over the four cases gives the maximum reward of 16.
            reward = (4 - error) ** 2
            rewards.append(reward)
        es.update(rewards)
        if es.best_fitness == 16:
            print('Solution Found')
            print(f'Parameters: {es.best_theta}')
            break

• Output:

    Step: 233
    Step: 234
    Step: 235
    Step: 236
    Step: 237
    Solution Found
    Parameters: [ 1.25863047 -0.73151503 -2.53377723  1.01802355  3.02723507  1.23112726
     -2.00288859 -3.66789242  4.56593794]


## Documentation

### PEPG Class

    es = PEPG(population_size, theta_size,
              mu_init, sigma_init, mu_lr,
              sigma_lr, l2_coeff = 0.005,
              optimizer = Adam, optimizer_kwargs = {})

• Parameters:
• population_size: int: Population size of the evolution strategy.
• theta_size: int: Number of parameters that will be optimized.
• mu_init: float: Initial mean.
• sigma_init: float: Initial sigma.
• mu_lr: float: Learning rate for the mean.
• sigma_lr: float: Learning rate for the sigma.
• l2_coeff: float: Weight decay coefficient.
• optimizer: Optimizer: Optimizer to use.
• optimizer_kwargs: Dict[str, Any]: Parameters for the optimizer, excluding the learning rate.

    solutions = es.get_parameters()

• Creates symmetric samples around the mean and returns a NumPy array of shape [population_size, theta_size].
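
A minimal sketch of the symmetric sampling idea (not the library's exact code): each perturbation is drawn once and used with both signs, which lowers the variance of the gradient estimate.

    import numpy as np

    def symmetric_samples(mu, sigma, population_size):
        # Draw population_size // 2 perturbations and mirror each one around the mean.
        epsilon = np.random.randn(population_size // 2, len(mu)) * sigma
        return np.concatenate([mu + epsilon, mu - epsilon]), epsilon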

    es.update(rewards)

• Parameters:
• rewards: List[float]: Rewards for the given solutions.
• Updates the mean and the sigma (sketched below).
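
For reference, the symmetric-sampling gradient estimates from [3] look roughly as follows. This is a sketch assuming the first half of the population is mu + epsilon and the second half mu - epsilon, as in the symmetric_samples sketch above; the actual update additionally applies the rank transformation, the Adam step, and the weight decay described earlier.

    import numpy as np

    def pepg_gradients(epsilon, sigma, r_plus, r_minus, baseline):
        # r_plus[i] and r_minus[i] are the rewards of mu + epsilon[i] and mu - epsilon[i].
        r_t = (r_plus - r_minus) / 2.0             # directional reward, drives the mean
        r_s = (r_plus + r_minus) / 2.0 - baseline  # reward level vs. a baseline, drives sigma
        grad_mu = epsilon.T @ r_t / len(r_t)
        grad_sigma = ((epsilon ** 2 - sigma ** 2) / sigma).T @ r_s / len(r_s)
        return grad_mu, grad_sigma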

    es.save_checkpoint()

• Creates a checkpoint and saves it to a file named after the creation time: time.time().checkpoint.

    es = PEPG.load_checkpoint(filename)

• Creates a new PEPG instance and loads the checkpoint.
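
A possible checkpointing pattern inside the XOR training loop (the timestamped filename below is made up for illustration):

    # Save a checkpoint every 100 steps so a long run can be resumed.
    if not es.step % 100:
        es.save_checkpoint()

    # Later, resume training from the saved file:
    es = PEPG.load_checkpoint('1585000000.0.checkpoint')  # hypothetical filename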

    es.save_best(filename)

• Saves the best theta, along with the mu and the sigma that were used to create it.

    theta, mu, sigma = PEPG.load_best(filename)

• Loads the theta, the mu, and the sigma arrays from the given file.
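
For example, to persist and reuse the best solution found so far ('xor.best' is an arbitrary filename):

    es.save_best('xor.best')                       # arbitrary filename
    theta, mu, sigma = PEPG.load_best('xor.best')
    network.weights = theta                        # evaluate the best parameters again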

### NeuralNetwork Class

    network = NeuralNetwork(input_size, output_size, hidden_sizes = [],
                            hidden_activation = lambda x: x,
                            output_activation = lambda x: x,
                            bias = True)

• Parameters:
• input_size: int: Input size of the network.
• output_size: int: Output size of the network.
• hidden_sizes: List[int]: Sizes of the hidden layers.
• hidden_activation: Callable[[float], float]: Activation function used in the hidden layers.
• output_activation: Callable[[float], float]: Activation function used at the output.
• bias: bool: Whether to add a bias node.
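
With bias = True, each dense layer appears to contribute (inputs + 1) × outputs weights, so the XOR network above has (2 + 1) × 2 + (2 + 1) × 1 = 9 parameters, matching the nine values printed in the XOR output. A quick check:

    from pepg import NeuralNetwork

    network = NeuralNetwork(input_size = 2, output_size = 1, hidden_sizes = [2])
    print(network.number_of_parameters)  # 9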

### Custom Optimizer Example

    from pepg import PEPG, Optimizer, NeuralNetwork

    class CustomOptimizer(Optimizer):
        def __init__(self, alpha, parameter, another_parameter):
            # alpha receives mu_lr as the learning rate.
            self.alpha = alpha
            self.parameter = parameter
            self.another_parameter = another_parameter

        def __call__(self, gradients):
            # Return the step to apply, computed from the raw gradients.
            gradients = (gradients + self.parameter) * self.another_parameter
            return -self.alpha * gradients

    network = NeuralNetwork(input_size = 2, output_size = 1)

    optimizer_kwargs = {'parameter': 0.3, 'another_parameter': 0.2}

    es = PEPG(population_size = 100, theta_size = network.number_of_parameters,
              mu_init = 0.0, sigma_init = 2.0,
              mu_lr = 0.3, sigma_lr = 0.2, optimizer = CustomOptimizer,
              optimizer_kwargs = optimizer_kwargs)


## References

1. Daan Wierstra, Tom Schaul, Tobias Glasmachers, Yi Sun, Jan Peters and Jürgen Schmidhuber. Natural Evolution Strategies. 2014.
2. Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. 2014.
3. Frank Sehnke, Christian Osendorfer, Thomas Rückstieß, Alex Graves, Jan Peters and Jürgen Schmidhuber. Parameter-exploring Policy Gradients. 2010.
4. Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor and Ilya Sutskever. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. 2017.
