
An all-purpose pythonic genetic algorithm

Project description


An all-purpose pythonic genetic algorithm, which also has built-in hyperparameter tuning support for neural networks.

Installation

$ pip install naturalselection

Usage

Here is a toy example optimising a pair of numbers with respect to division.

>>> import naturalselection as ns
>>>
>>> Pair = ns.Genus(x = range(1, 10000), y = range(1, 10000))
>>> def division(number):
...   return number.x / number.y
...
>>> pairs = ns.Population(genus = Pair, size = 100, fitness_fn = division)
>>> history = pairs.evolve(generations = 50, progress_bars = 1)
Evolving population: 100%|█████████████████████| 50/50 [00:09<00:00,  5.28it/s]
>>>
>>> history.fittest
{'genome': {'x': 9974, 'y': 4}, 'fitness': 2493.5}
>>>
>>> history.plot()

Plot showing the fitness value over 50 generations. The mean rises from 0 to 1500 in the first five generations, after which it slowly increases to roughly 2200. The maximum value converges to around 2500 after seven generations, and the standard deviation stays at around 700 throughout.

We can also easily solve the classical OneMax problem, which asks the algorithm to find the bit string of a given length consisting of all 1's. Here we set the goal parameter in the evolve function to 100, which allows early stopping if we reach the goal before the maximum number of generations, here set to 1000.

>>> import naturalselection as ns
>>>
>>> # Length of the bit strings
>>> N = 100
>>> BitString = ns.Genus(**{f'x{n}' : (0,1) for n in range(N)})
>>>
>>> def sum_bits(bitstring):
...   return sum(bitstring.get_genome().values())
>>>
>>> bitstrings = ns.Population(
...   genus = BitString,
...   size = 300,
...   fitness_fn = sum_bits
...   )
>>> 
>>> history = bitstrings.evolve(
...   generations = 1000,
...   goal = 100, 
...   progress_bars = 1
...   )
Evolving population: 46%|███████       | 460/1000 [2:09:49<2:16:59,  15.22s/it]
>>> 
>>> history.plot()

Plot showing the fitness value over the generations, converging steadily to the optimal all-ones bit string.

Lastly, here is an example of finding a vanilla feedforward neural network to model MNIST.

>>> import naturalselection as ns
>>> from tensorflow.keras.utils import to_categorical
>>> import mnist
>>>
>>> # Standard train and test sets for MNIST
>>> X_train = ((mnist.train_images() / 255) - 0.5).reshape((-1, 784))
>>> Y_train = to_categorical(mnist.train_labels())
>>> X_val = ((mnist.test_images() / 255) - 0.5).reshape((-1, 784))
>>> Y_val = to_categorical(mnist.test_labels())
>>>
>>> fitness_fn = ns.get_nn_fitness_fn(
...   train_val_sets = (X_train, Y_train, X_val, Y_val),
...   loss_fn = 'binary_crossentropy',
...   score = 'accuracy',
...   output_activation = 'softmax',
...   max_training_time = 60
...   )
>>>
>>> # The above fitness function actually outputs 1 / (1 - accuracy), giving
>>> # an unbounded range, which the algorithm handles better. Below we set
>>> # `post_fn` to the inverse of this, to recover the accuracy
>>> fnns = ns.Population(
...   genus = ns.FNN(),
...   size = 50,
...   fitness_fn = fitness_fn,
...   post_fn = lambda x: 1 - (1 / x)
...   )
>>> history = fnns.evolve(generations = 20)
Evolving population: 100%|██████████████████| 20/20 [3:05:08<00:00, 403.65s/it]
Computing fitness for gen 20: 100%|█████████████| 24/24 [6:01<00:00, 15.06s/it]
>>> 
>>> history.fittest
{'genome': {'optimizer': 'adam', 'hidden_activation': 'relu',
'batch_size': 1024, 'initializer': 'lecun_normal', 'input_dropout': 0.0,
'layer0': 0, 'dropout0': 0.1, 'layer1': 512, 'dropout1': 0.0, 'layer2': 128,
'dropout2': 0.0, 'layer3': 256, 'dropout3': 0.3, 'layer4': 128,
'dropout4': 0.0}, 'fitness': 0.9808}
>>> 
>>> history.plot(
...   title = "Average accuracy by generation",
...   ylabel = "Average accuracy"
...   )

Plot showing fitness value (which is accuracy in this case) over 20 generations. It converges to roughly 98% after 8 generations, and the maximum reaches that already from the first generation. The standard deviation also converges to almost zero.

As layer0 = 0, this means that the architecture here is [512, 128, 256, 128] with 10% input dropout and 30% dropout after the layer with 256 neurons, along with the adam optimiser, the lecun_normal initialiser, relu activation for the hidden layers and a batch size of 1024. Note that this large batch size is encouraged by the fact that we set max_training_time = 60, as larger batch sizes tend to perform better in the short term.
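The library builds and trains the network itself from this genome, but purely for illustration, a hand-built Keras model corresponding to it might look roughly as follows (a sketch, assuming each non-zero layerN becomes a Dense layer with dropoutN applied after it):

>>> from tensorflow.keras.models import Sequential
>>> from tensorflow.keras.layers import Dense, Dropout
>>>
>>> model = Sequential([
...   Dropout(0.1, input_shape = (784,)),  # 10% input dropout
...   Dense(512, activation = 'relu', kernel_initializer = 'lecun_normal'),
...   Dense(128, activation = 'relu', kernel_initializer = 'lecun_normal'),
...   Dense(256, activation = 'relu', kernel_initializer = 'lecun_normal'),
...   Dropout(0.3),                        # 30% dropout after the 256-unit layer
...   Dense(128, activation = 'relu', kernel_initializer = 'lecun_normal'),
...   Dense(10, activation = 'softmax')    # output layer for the 10 MNIST classes
...   ])
>>> model.compile(optimizer = 'adam', loss = 'binary_crossentropy',
...   metrics = ['accuracy'])
>>> model.fit(X_train, Y_train, batch_size = 1024,
...   validation_data = (X_val, Y_val))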

Algorithmic details

The algorithm follows the standard blueprint for a genetic algorithm as e.g. described on this Wikipedia page, which roughly goes like this:

  1. An initial population is constructed
  2. Fitness values for all organisms in the population are computed
  3. A subset of the population (the elite pool) is selected
  4. A subset of the population (the breeding pool) is selected
  5. Pairs from the breeding pool are chosen, which breed to create new "child" organisms whose genomes are combinations of the parents' genomes. Breeding continues until the children and the elites together constitute a population of the same size as the original
  6. A subset of the children (the mutation pool) is selected
  7. Every child in the mutation pool is mutated, meaning that they will have their genome altered in some way
  8. Go back to step 2

We now describe the individual steps in this particular implementation in more detail. Note that step 3 is sometimes left out completely, but since that just corresponds to an empty elite pool I decided to keep it in, for generality.
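To make the blueprint concrete, here is a minimal plain-Python sketch of the loop (not the library's actual code), with genomes represented as plain dicts; the helper functions it calls are sketched under the corresponding steps below.

import random

def evolve_sketch(population, fitness_fn, genus_ranges, generations,
                  elitism_rate = 0.05, breeding_rate = 0.80, mutation_rate = 0.20):
    ''' Plain-Python sketch of the blueprint above, one generation per loop.

    Here `population` is a list of genomes (plain dicts) and `genus_ranges`
    maps each gene name to its allowed values. The helpers select_pool, breed,
    select_mutation_pool and mutate are sketched under the matching steps below. '''
    for _ in range(generations):
        fitnesses = [fitness_fn(org) for org in population]            # step 2
        elites = select_pool(population, fitnesses, elitism_rate)      # step 3
        breeders = select_pool(population, fitnesses, breeding_rate)   # step 4
        children = []
        while len(elites) + len(children) < len(population):           # step 5
            mum, dad = random.sample(breeders, 2)
            children.append(breed(mum, dad))
        for child in select_mutation_pool(children, mutation_rate):    # step 6
            child.update(mutate(child, genus_ranges))                  # step 7
        population = elites + children                                 # step 8
    return population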

Step 1: Constructing the initial population

The population is a uniformly random sample of the possible genome values dictated by the genus, drawn when a new Population object is created. Alternatively, you may set initial_genome to whatever genome you would like, which will create a completely homogeneous population consisting only of organisms with this genome (mutations will create some diversity in each generation).

>>> pairs = ns.Population(
...   genus = Pair,
...   size = 100,
...   fitness_fn = division,
...   initial_genome = {'x' : 9750, 'y' : 15}
...   )
>>> history = pairs.evolve(generations = 100, progress_bars = 1)
Evolving population: 100%|███████████████████| 100/100 [00:09<00:00,  5.28it/s]
>>> 
>>> history.fittest
{'genome' : {'x' : 9846, 'y' : 1}, 'fitness' : 9846.0}

Step 2: Compute fitness values

This happens in the get_fitness function, which is called by the evolve function. By default these computations are performed in parallel with one process per CPU core, so in the MNIST example above this will require 4-5 GB of RAM. Alternatively, the number of parallel computations can be set explicitly by setting workers to a smaller value, or parallel computation can be disabled completely by setting multiprocessing = False.
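As an illustration of what this parallelism amounts to (this is not the library's get_fitness implementation), fitness values could be computed with a standard multiprocessing pool along these lines:

from multiprocessing import Pool

def compute_fitnesses(organisms, fitness_fn, workers = None):
    ''' Sketch of parallel fitness evaluation with the standard library.

    With workers = None one process is started per CPU core, mirroring the
    default behaviour described above; pass a small integer to limit the
    number of simultaneous evaluations (and hence the memory footprint). '''
    with Pool(processes = workers) as pool:
        return pool.map(fitness_fn, organisms)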

Steps 3 & 4: Selecting elite pool and breeding pool

These two pools are selected in exactly the same way, differing only in the number of organisms in each pool: the default elitism_rate is 5% and the default breeding_rate is 80%. Each pool is sampled from the population according to the distribution whose density is an organism's fitness value divided by the sum of all fitness values in the population, so the higher an organism's fitness score, the more likely it is to be chosen for the pool. The precise implementation of this follows the algorithm specified on this Wikipedia page.
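A minimal sketch of such fitness-proportionate ("roulette wheel") selection, sampling with replacement for simplicity (the library's implementation may differ on that point):

import random

def select_pool(population, fitnesses, rate):
    ''' Sketch: draw a pool of size rate * len(population), where each
    organism's probability is its fitness divided by the sum of all fitnesses. '''
    total = sum(fitnesses)
    weights = [fitness / total for fitness in fitnesses]
    return random.choices(population, weights = weights,
                          k = round(rate * len(population)))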

Step 5: Breeding

In this implementation the parent organisms are chosen uniformly at random, and when determining the value of the child's genome, every gene is a uniformly random choice between its parents' values for that particular gene.
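Sketched in plain Python, with genomes represented as dicts, the breeding step looks roughly like this:

import random

def breed(mum, dad):
    ''' Sketch: for every gene the child gets either the mother's or the
    father's value for that gene, chosen uniformly at random. '''
    return {gene : random.choice([mum[gene], dad[gene]]) for gene in mum}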

Step 6: Selection of mutation pool

The mutation pool is chosen uniformly at random in contrast with the other two pools, as otherwise we would suddenly be more likely to "mutate away" many of the good genes of our fittest organisms. The default mutation_rate is 20%.
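A one-function sketch of this uniform selection:

import random

def select_mutation_pool(children, mutation_rate = 0.20):
    ''' Sketch: the mutation pool is a uniformly random subset of the
    children, with no weighting by fitness. '''
    return random.sample(children, k = round(mutation_rate * len(children)))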

Step 7: Mutation

This implementation is roughly the bit string mutation, where every gene of the organism has a 1/n chance of being uniformly randomly replaced by another gene, with n being the number of genes in the organism's genome. This means that, on average, mutation causes one gene to be altered.
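A sketch of this mutation scheme, where genus_ranges is a hypothetical stand-in for the gene ranges stored in the Genus:

import random

def mutate(genome, genus_ranges):
    ''' Sketch: every gene has a 1/n chance of being replaced by a uniformly
    random value from its allowed range, with n the number of genes, so on
    average one gene is altered per mutation. '''
    n = len(genome)
    return {gene : (random.choice(list(genus_ranges[gene]))
                    if random.random() < 1 / n else value)
            for gene, value in genome.items()}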

Possible future extensions

These are the ideas that I have thought of implementing in the future. Check the ongoing progress on the dev branch.

  • Enable support for CNNs
  • Enable support for RNNs and LSTMs
  • Include an option to have dependency relations between genes. In a neural network setting this could include the topology as a gene on which all the layer-specific genes depend, which would be similar to the approach taken in this paper.

License

This project is licensed under the MIT License.
