
MECO: Multi-objective Evolutionary Compression.


The MECO (Multi-objective Evolutionary COmpression) algorithm is a tool to perform:

  • dataset compression,

  • feature selection, and

  • coreset discovery.

This Python package provides a scikit-learn-style transformer implementation of the MECO algorithm.

Quick start

You can install the meco package along with all its dependencies from PyPI:

$ pip install meco

Example

For this simple experiment, let’s use the digits dataset from sklearn. We first need to import the dataset, a simple sklearn classifier (e.g. RidgeClassifier), and the MECO transformer. We can then load the dataset, create a MECO model wrapping the classifier, and fit the model on the digits dataset:

from sklearn.datasets import load_digits
from sklearn.linear_model import RidgeClassifier

from meco import MECO

X, y = load_digits(return_X_y=True)

model = MECO(RidgeClassifier(random_state=42))
model.fit(X, y)

Once training is over, we get a view of the compressed input data X containing the most relevant samples (i.e. a subset of the rows in X, a.k.a. the coreset), and the most relevant features (i.e. a subset of the columns in X):

x_reduced = model.transform(X)

Once trained, the model.best_set_ dictionary contains:

  • the indices of the most relevant samples,

  • the indices of the most relevant features, and

  • the validation accuracy of the compressed dataset x_reduced, e.g.:

>>> model.best_set_
{
    'samples': [0, 2, 4, ...],
    'features': [3, 7, 8, ...],
    'accuracy': 0.9219,
}
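As a rough sketch of how these indices relate to the arrays, the snippet below slices a toy matrix with a hand-written dictionary shaped like model.best_set_. The dictionary, the toy data, and the names best_set, x_sub, y_sub, and kept are illustrative assumptions, not output of the package. It also assumes that the compressed view is exactly the listed rows crossed with the listed columns.

```python
import numpy as np

# Hypothetical stand-in for model.best_set_; real indices come from a fitted MECO model.
best_set = {"samples": [0, 2, 4], "features": [3, 7, 8], "accuracy": 0.9219}

# Toy data following the same convention as X, y: rows are samples, columns are features.
X = np.arange(60).reshape(6, 10)
y = np.array([0, 1, 0, 1, 0, 1])

rows = np.asarray(best_set["samples"])
cols = np.asarray(best_set["features"])

# np.ix_ builds an open mesh so that rows and columns are selected jointly.
x_sub = X[np.ix_(rows, cols)]
# Labels are indexed by rows only, since features do not affect y.
y_sub = y[rows]

# Fraction of the original matrix entries that survive the selection.
kept = x_sub.size / X.size
print(x_sub.shape, y_sub.shape, f"{kept:.0%} of the original entries kept")
```

Indexing with X[rows][:, cols] gives the same result; np.ix_ just does it in one step.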

The compressed dataset (x_reduced, y_reduced) can be used instead of the original dataset (X, y) to train machine learning models more efficiently:

from sklearn.ensemble import RandomForestClassifier

y_reduced = y[model.best_set_['samples']]

classifier = RandomForestClassifier(random_state=42)
classifier.fit(x_reduced, y_reduced)
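A classifier trained this way expects inputs with only the selected columns, so held-out data has to be restricted to the same features before prediction. The following is a minimal sketch of that evaluation step. To stay self-contained it does not call MECO: the samples and features arrays are random placeholders standing in for model.best_set_, and the split sizes are arbitrary choices, so the printed accuracy says nothing about MECO itself.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Placeholder indices (random, NOT produced by MECO). With a fitted model you
# would read them from model.best_set_['samples'] and model.best_set_['features'].
rng = np.random.default_rng(42)
samples = np.sort(rng.choice(X_train.shape[0], size=300, replace=False))
features = np.sort(rng.choice(X_train.shape[1], size=32, replace=False))

# Compressed training set: selected rows and selected columns.
x_reduced = X_train[np.ix_(samples, features)]
y_reduced = y_train[samples]

classifier = RandomForestClassifier(random_state=42)
classifier.fit(x_reduced, y_reduced)

# Held-out data keeps all of its rows; only the columns are restricted.
X_test_selected = X_test[:, features]
test_accuracy = classifier.score(X_test_selected, y_test)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

The held-out set is sliced by column index rather than passed through the transformer, because the row selection only makes sense for the data the model was fitted on.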

Tasks

Dataset compression

Should you need to compress the whole dataset X, i.e. both its rows and its columns (dataset compression), you can set the parameter compression to 'both' (this is the default behaviour):

model = MECO(RidgeClassifier(), compression='both')

Coreset discovery

Should you need to compress the rows of X only (i.e. for coreset discovery), you can set the parameter compression to 'samples':

model = MECO(RidgeClassifier(), compression='samples')

Feature selection

Should you need to compress the columns of X only (i.e. for feature selection), you can set the parameter compression to 'features':

model = MECO(RidgeClassifier(), compression='features')

Citing

If you find MECO useful in your research, please consider citing the following papers:

@inproceedings{barbiero2019novel,
  title={A Novel Outlook on Feature Selection as a Multi-objective Problem},
  author={Barbiero, Pietro and Lutton, Evelyne and Squillero, Giovanni and Tonda, Alberto},
  booktitle={International Conference on Artificial Evolution (Evolution Artificielle)},
  pages={68--81},
  year={2019},
  organization={Springer}
}

@article{barbiero2020uncovering,
  title={Uncovering Coresets for Classification With Multi-Objective Evolutionary Algorithms},
  author={Barbiero, Pietro and Squillero, Giovanni and Tonda, Alberto},
  journal={arXiv preprint arXiv:2002.08645},
  year={2020}
}

Source

The source code and minimal working examples can be found on GitHub.

Authors

Pietro Barbiero, Giovanni Squillero, and Alberto Tonda.

Licence

Copyright 2020 Pietro Barbiero, Giovanni Squillero, and Alberto Tonda.

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

See the License for the specific language governing permissions and limitations under the License.
