Framework for data analysis and machine learning experiments

Experiment Runner (exp-runner)

exp-runner is a simple and extensible framework for data analysis and machine learning experiments in Python.

Structure

The framework includes the following steps:

  1. Data loading
  2. Data transformation
  3. Model training and testing
  4. Performance evaluation
  5. Results saving
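
The five steps above can be sketched as one plain Python function. This is only an illustration of the flow, not exp-runner's actual implementation: `run_experiment` and all of the toy stand-ins below are hypothetical names, and the sample keys (`filename`, `X`, `y`, `perf`) simply mirror the ones used later in this README.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional


def run_experiment(
    load: Callable[[], Iterable[Dict[str, Any]]],
    transforms: List[Callable[[Any], Any]],
    fit_predict: Callable[[Any], Any],
    metric: Optional[Callable[[Any, Any], float]] = None,
    save: Optional[Callable[[List[Dict[str, Any]]], None]] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical driver that walks through the five steps in order."""
    report = []
    for sample in load():                                 # 1. data loading
        X, y = sample["X"], sample["y"]
        for transform in transforms:                      # 2. data transformation
            X = transform(X)
        y_pred = fit_predict(X)                           # 3. model training and testing
        perf = metric(y, y_pred) if metric else None      # 4. performance evaluation
        report.append({"filename": sample["filename"], "perf": perf})
    if save is not None:                                  # 5. results saving
        save(report)
    return report


# Toy stand-ins, used only to make the sketch runnable.
def toy_loader():
    yield {"filename": "data_00.npy", "X": [1.0, 2.0, 3.0, 4.0], "y": [0, 0, 1, 1]}


def scale(X):
    top = max(X)
    return [x / top for x in X]


def threshold_model(X):
    return [int(x > 0.5) for x in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


saved = []
report = run_experiment(toy_loader, [scale], threshold_model, accuracy, saved.extend)
```

Because the metric and the saver are optional arguments here, leaving either one out skips that step, which is the kind of flexibility the feature list below describes.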

Main features

  • Generality: A variety of models and methods is supported, so the framework can be used for many tasks (such as preprocessing, dimensionality reduction, classification, regression, clustering, and statistical tests)
  • Flexibility: Steps can be easily skipped and/or included
  • Dynamic loading: Modules are imported automatically at runtime, so no additional import lines are needed
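
Dynamic loading of this kind is commonly built on the standard `importlib` module. The sketch below shows one way a dotted path such as `sklearn.decomposition.PCA` could be resolved to a class and instantiated; it is an assumption about the general technique, not exp-runner's own code, and `load_object` and `instantiate` are hypothetical helper names.

```python
import importlib
from typing import Any, Dict


def load_object(dotted_path: str) -> Any:
    """Resolve 'package.module.Name' to the object it names."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.Name', got {dotted_path!r}")
    module = importlib.import_module(module_name)  # import happens at runtime
    return getattr(module, attr_name)


def instantiate(spec: Dict[str, Any]) -> Any:
    """Build an object from a {'class': ..., 'args': {...}} entry."""
    cls = load_object(spec["class"])
    return cls(**spec.get("args", {}))


# A standard-library class stands in for a model or transform here.
counter = instantiate({"class": "collections.Counter", "args": {"a": 2, "b": 1}})
```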

Installation

pip install exp-runner

Usage

Let's say your project has the following structure:

MyAwesomeProject/
        main.py
        my_custom_module.py

        data/
                data_00.npy
                data_01.npy
                ...
                data_NN.npy

        protocols/
                experiment_config.json

        results/

Just give me the code!

You just need to describe your experiment in a JSON configuration file:

experiment_config.json
{
  "Setup": {
    "description": "You can add detailed description of the experiment",
    "random_seed": 42
  },
  "Dataset": {
    "class": "my_custom_module.MyAwesomeDataLoader",
    "args": {"path_to_data": "data/*.npy"}
  },
  "Transforms": [
    {
      "class": "sklearn.decomposition.PCA",
      "args": {"n_components": 3, "whiten": true}
    }
  ],
  "Model": {
    "class": "sklearn.cluster.KMeans",
    "args": {"n_clusters": 3, "verbose": 0}
  },
  "Metric": {
    "class": "my_custom_module.SklearnMetricWrapper",
    "args": {"metric": "normalized_mutual_info_score"}
  },
  "Saver": {
    "class": "my_custom_module.CSVReport",
    "args": {"path_to_output": "results/evaluation_results.csv", "sep": ";"}
  }
}
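
Each section of the configuration is either a single `{"class", "args"}` entry or, as with `Transforms`, a list of such entries. As a rough sketch of how a file like this could be inspected before running anything, the snippet below lists every component it references. `iter_components` is a hypothetical helper and the embedded configuration is an abbreviated copy of the one above.

```python
import json
from typing import Any, Dict, List, Tuple

# Abbreviated copy of experiment_config.json, inlined to keep the sketch self-contained.
CONFIG_TEXT = """
{
  "Setup": {"description": "demo", "random_seed": 42},
  "Dataset": {
    "class": "my_custom_module.MyAwesomeDataLoader",
    "args": {"path_to_data": "data/*.npy"}
  },
  "Transforms": [
    {"class": "sklearn.decomposition.PCA", "args": {"n_components": 3, "whiten": true}}
  ],
  "Metric": {
    "class": "my_custom_module.SklearnMetricWrapper",
    "args": {"metric": "normalized_mutual_info_score"}
  }
}
"""


def iter_components(config: Dict[str, Any]) -> List[Tuple[str, str, Dict[str, Any]]]:
    """Return (section, class path, args) for every entry that names a class."""
    components = []
    for section, body in config.items():
        entries = body if isinstance(body, list) else [body]
        for entry in entries:
            # Sections without a "class" key (such as "Setup") are skipped.
            if isinstance(entry, dict) and "class" in entry:
                components.append((section, entry["class"], entry.get("args", {})))
    return components


components = iter_components(json.loads(CONFIG_TEXT))
for section, class_path, args in components:
    print(f"{section}: {class_path} {args}")
```
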
Here are the aforementioned classes:

my_custom_module.py
import os
import glob
import numpy as np
import sklearn.metrics

from exp_runner import Dataset, Metric, Saver

from collections import defaultdict
from typing import Any, Dict, List, Union, NoReturn, Iterable, Callable

from sklearn.model_selection import StratifiedShuffleSplit


class MyAwesomeDataLoader(Dataset):

    def __init__(self, path_to_data: str, test_size: float = 0.1, training: bool = True):

        super(MyAwesomeDataLoader, self).__init__()

        self._samples = dict()
        self._labels = dict()
        self._splits = defaultdict(dict)

        paths_to_data = glob.glob(path_to_data)

        for path in paths_to_data:
            fname = os.path.basename(path)

            data = np.load(path)
            # Every column except the last holds features; the last column holds the label.
            X = data[:, :-1]
            y = data[:, -1]

            indices_train, indices_test = next(StratifiedShuffleSplit(
                test_size=test_size
            ).split(X, y))

            self._samples[fname] = X
            self._labels[fname] = y
            self._splits[fname]['train'] = indices_train
            self._splits[fname]['test'] = indices_test

        self._indices = list(self._samples.keys())

        self._training = training

    def __getitem__(self, index: int) -> Dict[str, Dict[str, Union[str, np.ndarray]]]:
        if not (0 <= index < len(self._indices)):
            raise IndexError

        fname = self._indices[index]

        # Look up the row indices of the active split once and reuse them.
        split = 'train' if self.training else 'test'
        indices = self._splits[fname][split]

        item = {
            'X': self._samples[fname][indices],
            'y': self._labels[fname][indices],
            'desc': 'it is possible to add description for each data sample'
        }

        return {'filename': fname, 'item': item}

    def __len__(self) -> int:
        return len(self._indices)

    @property
    def training(self):
        return self._training


class SklearnMetricWrapper(Metric):

    def __init__(self, metric: str):
        super(SklearnMetricWrapper, self).__init__()

        metric = getattr(sklearn.metrics, metric)
        self._metric: Callable[[Iterable[Union[float, int]], Iterable[Union[float, int]]], float] = metric

    def __call__(self, y_true: Iterable[Union[float, int]], y_pred: Iterable[Union[float, int]]) -> float:
        return self._metric(y_true, y_pred)


class CSVReport(Saver):

    def __init__(self, path_to_output: str, sep: str = ';', append: bool = True):
        super(CSVReport, self).__init__()

        self.path_to_output = path_to_output
        self.sep = sep
        self.mode = 'a+' if append else 'w+'

    def save(self, report: List[Dict[str, Any]]) -> None:
        with open(self.path_to_output, self.mode) as csv:
            for entry in report:
                line = self.sep.join([
                    entry['filename'],
                    entry['desc'],
                    str(entry['perf'])  # join() accepts only strings; the score is presumably numeric
                ]) + '\n'
                csv.write(line)

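The `CSVReport` class above joins fields by hand, which is fine as long as no field contains the separator. As an alternative sketch, the standard `csv` module quotes such fields automatically. `write_report` is a hypothetical helper, and the entry keys (`filename`, `desc`, `perf`) are taken from `CSVReport.save`.

```python
import csv
import io
from typing import Any, Dict, List, TextIO


def write_report(stream: TextIO, report: List[Dict[str, Any]], sep: str = ";") -> None:
    """Write one row per report entry, quoting fields only when needed."""
    writer = csv.writer(stream, delimiter=sep, lineterminator="\n")
    for entry in report:
        # csv.writer converts non-string values such as floats itself.
        writer.writerow([entry["filename"], entry["desc"], entry["perf"]])


# An in-memory buffer stands in for the output file.
buffer = io.StringIO()
write_report(buffer, [{"filename": "data_00.npy", "desc": "first; noisy", "perf": 0.5}])
print(buffer.getvalue(), end="")
```
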
Finally, to run your experiment, type the following in your terminal:

cd /path/to/MyAwesomeProject
python main.py --config protocols/experiment_config.json
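
The contents of `main.py` are not shown in this README. A minimal sketch of its command-line side, assuming only the `--config` option used in the command above, might look like the following. The call that hands the loaded configuration to exp-runner is deliberately left out, because its API is not documented here; `parse_args`, `load_config`, and `main` are hypothetical names.

```python
import argparse
import json
import os
import tempfile
from typing import Any, Dict, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run an experiment described by a JSON config.")
    parser.add_argument("--config", required=True, help="path to the JSON configuration file")
    return parser.parse_args(argv)


def load_config(path: str) -> Dict[str, Any]:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def main(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    args = parse_args(argv)
    config = load_config(args.config)
    # Passing `config` to exp-runner would happen here; that call is omitted
    # because the README does not show it.
    return config


# Demonstration with a throwaway config file instead of real command-line arguments.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "experiment_config.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"Setup": {"random_seed": 42}}, f)
    config = main(["--config", path])
```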
