Evolving Directed Acyclic Graph

EvoDAG

Evolving Directed Acyclic Graph (EvoDAG) is a steady-state Genetic Programming system with tournament selection. The main characteristic of EvoDAG is that the genetic operation is performed at the root. EvoDAG was inspired by the geometric semantic crossover proposed by Alberto Moraglio et al. and the implementation performed by Leonardo Vanneschi et al.

Example using command line

Let us assume one wants to create a classifier for the iris dataset. The first step is to download the dataset from the UCI Machine Learning Repository:

curl -O https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
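
If curl is not available, the same file can be fetched with the Python standard library; this is just a convenience sketch equivalent to the command above.

# Download iris.data with the standard library instead of curl.
from urllib.request import urlretrieve

URL = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
urlretrieve(URL, 'iris.data')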

Random search on EvoDAG’s parameter space

The recommended first step is to optimize the parameters used by EvoDAG. To free the user from this task, EvoDAG can perform a random search over the parameter space and select the best configuration found. This can be done as follows:

EvoDAG-params -P params.evodag -r 8 -u 4 iris.data

where -P indicates the file name where the sampled parameters are stored, -r specifies the number of samples, -u indicates the number of CPU cores, and iris.data is the dataset.

params.evodag looks like:

[
    {
        "Add": 30,
        "Cos": false,
        "Div": true,
        "Exp": true,
        "Fabs": true,
        "If": true,
        "Ln": true,
        "Max": 5,
        "Min": 30,
        "Mul": 0,
        "Sigmoid": true,
        "Sin": false,
        "Sq": true,
        "Sqrt": true,
        "classifier": true,
        "early_stopping_rounds": 2000,
        "fitness": [
            0.0,
            0.0,
            0.0
        ],
        "popsize": 500,
        "seed": 0,
        "unique_individuals": true
    },
...

where fitness is the balanced error rate on a validation set randomly taken from the training set, in this case 20% of iris.data.
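
Since params.evodag is plain JSON, the sampled configurations can be inspected with the standard library. The sketch below lists how many configurations were sampled and prints the one with the lowest mean fitness, relying on the convention just described (fitness is the balanced error rate on the validation set, so lower is better); adjust the sort direction if your EvoDAG version stores fitness differently.

import json
from statistics import mean

# Load the configurations sampled by EvoDAG-params.
with open('params.evodag') as fpt:
    configs = json.load(fpt)

# Rank by mean fitness; as described above, fitness is the balanced error
# rate on the validation set, so lower is better.
scored = sorted(configs, key=lambda c: mean(c['fitness']))
print('sampled configurations:', len(configs))
print('best mean fitness:', mean(scored[0]['fitness']))
print(json.dumps(scored[0], indent=4, sort_keys=True))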

Training EvoDAG

At this point we are in a position to train a model. Let us assume one would like to create an ensemble of 10 classifiers on iris.data. The following command does the job:

EvoDAG-train -P params.evodag -m model.evodag -n 10 -u 4 iris.data

where -m specifies the file name used to store the model, -n is the size of the ensemble, -P receives EvoDAG’s parameters, -u is the number of CPU cores, and iris.data is the dataset.

Predict using EvoDAG model

At this point, EvoDAG has been trained and the model is stored in model.evodag; the next step is to use this model to predict unknown data. Given that we did not split iris.data into a training and a test set, let us assume that iris.data contains the unknown data. This can be achieved as follows:

EvoDAG-predict -m model.evodag -o iris.predicted iris.data

where -o indicates the file name used to store the predictions, -m contains the model, and iris.data has the test set.

iris.predicted looks like:

Iris-setosa
Iris-setosa
Iris-setosa
Iris-setosa
Iris-setosa
...
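
Because iris.data keeps the class label in its last column, the predictions can be checked against it with a few lines of Python. Note that, since the model was trained on the same file, this only sanity-checks the pipeline rather than measuring generalization; the snippet is a standard-library sketch.

# Compare iris.predicted against the last column of iris.data.
with open('iris.data') as fpt:
    y = [line.strip().split(',')[-1] for line in fpt if line.strip()]
with open('iris.predicted') as fpt:
    hy = [line.strip() for line in fpt if line.strip()]

accuracy = sum(a == b for a, b in zip(y, hy)) / len(y)
print('accuracy on the training data:', accuracy)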

Performance

| dataset | auto-sklearn | SVC | EvoDAG |
|---------|--------------|-----|--------|
| banana | \(28.00 \pm 3.69^*\) | \(\mathbf{11.27 \pm 0.18}\) | \(12.20 \pm 0.19^*\) |
| titanic | \(37.18 \pm 1.64^*\) | \(30.27 \pm 0.36^*\) | \(\mathbf{30.04 \pm 0.26}\) |
| thyroid | \(23.38 \pm 3.99^*\) | \(\mathbf{6.13 \pm 0.76}\) | \(8.06 \pm 0.83^*\) |
| diabetis | \(37.65 \pm 2.01^*\) | \(26.65 \pm 0.44^*\) | \(\mathbf{24.85 \pm 0.42}\) |
| breast-cancer | \(42.36 \pm 1.38^*\) | \(36.25 \pm 1.04^*\) | \(\mathbf{34.67 \pm 1.08}\) |
| flare-solar | \(39.05 \pm 1.49^*\) | \(33.41 \pm 0.38^*\) | \(\mathbf{32.88 \pm 0.32}\) |
| heart | \(27.69 \pm 2.85^*\) | \(18.12 \pm 0.63^*\) | \(\mathbf{16.58 \pm 0.73}\) |
| ringnorm | \(15.49 \pm 4.24^*\) | \(\mathbf{1.96 \pm 0.10}\) | \(2.56 \pm 0.08^*\) |
| twonorm | \(20.87 \pm 4.49^*\) | \(2.90 \pm 0.09^*\) | \(\mathbf{2.70 \pm 0.04}\) |
| german | \(39.45 \pm 1.62^*\) | \(29.00 \pm 0.50^*\) | \(\mathbf{28.77 \pm 0.53}\) |
| image | \(21.29 \pm 10.54^*\) | \(\mathbf{3.32 \pm 0.29}\) | \(3.88 \pm 0.43^*\) |
| waveform | \(22.67 \pm 3.53^*\) | \(10.62 \pm 0.21^*\) | \(\mathbf{10.45 \pm 0.11}\) |
| splice | \(10.79 \pm 7.43^*\) | \(11.23 \pm 0.37^*\) | \(\mathbf{9.33 \pm 0.56}\) |

The best result in each row is shown in bold.

| dataset | auto-sklearn | SVC | EvoDAG |
|---------|--------------|-----|--------|
| banana | \(13.39\) | \(\mathbf{11.02}\) | \(12.05\) |
| titanic | \(33.10\) | \(\mathbf{29.63}\) | \(29.75\) |
| thyroid | \(11.96\) | \(\mathbf{5.95}\) | \(8.27\) |
| diabetis | \(31.17\) | \(26.58\) | \(\mathbf{24.51}\) |
| breast-cancer | \(41.64\) | \(35.38\) | \(\mathbf{34.83}\) |
| flare-solar | \(34.90\) | \(33.31\) | \(\mathbf{32.99}\) |
| heart | \(20.68\) | \(18.28\) | \(\mathbf{15.96}\) |
| ringnorm | \(2.07\) | \(\mathbf{1.83}\) | \(2.51\) |
| twonorm | \(3.29\) | \(2.83\) | \(\mathbf{2.69}\) |
| german | \(36.25\) | \(28.88\) | \(\mathbf{28.57}\) |
| image | \(\mathbf{3.06}\) | \(3.41\) | \(3.66\) |
| waveform | \(11.41\) | \(10.45\) | \(\mathbf{10.36}\) |
| splice | \(\mathbf{3.39}\) | \(11.07\) | \(9.26\) |

The best result in each row is shown in bold.

Dataset

The benchmark datasets store the features and the class labels in separate .asc files; the script below appends the label as the last column of each training file and rewrites both training and test files as comma-separated values, the format used above for iris.data.

import glob

# Append the class labels (stored in the *train_labels* files) as the last
# column of each *train_data* file and rewrite it as comma-separated values.
for train in glob.glob('data/*train_data*.asc'):
    directory, name = train.split('/')
    label = directory + '/' + name.replace('data', 'labels')
    with open(train) as fpt:
        t = fpt.readlines()
    with open(label) as fpt:
        l = fpt.readlines()
    with open(train, 'w') as fpt:
        for a, b in zip(t, l):
            d = [x.strip() for x in a.split(" ")]
            d = [x for x in d if len(x)]
            d.append(b.strip())
            fpt.write(",".join(d) + '\n')

# Rewrite the *test_data* files as comma-separated values (no labels).
for test in glob.glob('data/*test_data*.asc'):
    with open(test) as fpt:
        l = fpt.readlines()
    with open(test, 'w') as fpt:
        for a in l:
            d = [x.strip() for x in a.split(" ")]
            d = [x for x in d if len(x)]
            fpt.write(",".join(d) + '\n')

Execute EvoDAG on the datasets (using a cluster with SGE)

#!/bin/sh
# Submit one SGE job per training partition: the seed is taken from the file
# name, the matching *test_data* file is used as the test set, and a job is
# submitted only if the cache file does not exist yet.
for i in data/*train_data*.asc;
do
    seed=`python -c "import sys; print(sys.argv[1].split('.asc')[0].split('_')[-1])" $i`
    test=`python -c "import sys; print(sys.argv[1].replace('train', 'test'))" $i`
    output=data/`basename $test .asc`.evodag
    cache=$i.rs.evodag
    cpu=2
    if [ ! -f $cache ]
    then
        echo \#!/bin/sh > job.sh
        echo \#$ -N `basename $i .asc` >> job.sh
        echo \#$ -S /bin/sh >> job.sh
        echo \#$ -cwd >> job.sh
        echo \#$ -j y >> job.sh
        echo ~/miniconda3/bin/EvoDAG -s $seed -u $cpu -o $output -m $output -t $test --cache-file $cache -r 734 $i >> job.sh
        qsub job.sh
    fi
done
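
Once the jobs above finish, each prediction file (written by -o as data/<name>_test_data_<seed>.evodag) can be scored against the corresponding test labels. The sketch below computes a balanced error rate per partition; it assumes the test labels follow the same naming convention as the training labels used earlier (e.g. data/<name>_test_labels_<seed>.asc) and that predictions and labels can be matched after normalizing whitespace and numeric formatting, so adjust the paths to your copy of the benchmark.

import glob
from collections import defaultdict

def normalize(label):
    # Strip whitespace and unify the textual form of numeric labels
    # (e.g. '1' and '1.0' compare equal).
    label = label.strip()
    try:
        return str(float(label))
    except ValueError:
        return label

def balanced_error_rate(y, hy):
    # Average, over the classes, of the per-class error rate.
    errors = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y, hy):
        totals[t] += 1
        if t != p:
            errors[t] += 1
    return sum(errors[c] / totals[c] for c in totals) / len(totals)

for pred in sorted(glob.glob('data/*test_data*.evodag')):
    # Assumed naming convention for the test labels, mirroring the training files.
    labels = pred.replace('_data_', '_labels_').replace('.evodag', '.asc')
    with open(labels) as fpt:
        y = [normalize(x) for x in fpt if x.strip()]
    with open(pred) as fpt:
        hy = [normalize(x) for x in fpt if x.strip()]
    # Expressed as a percentage, for comparison with the tables above.
    print(pred, '%0.2f' % (100 * balanced_error_rate(y, hy)))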

Install EvoDAG

  • Install using pip

    pip install EvoDAG

Using source code

  • Clone the repository: git clone https://github.com/mgraffg/EvoDAG.git

  • Install the package as usual: python setup.py install

  • To install only for the current user: python setup.py install --user
