Framework for data analysis and machine learning experiments
Experiment Runner (exp-runner)
exp-runner is a simple and extensible framework for data analysis and machine learning experiments in Python.
Structure
The framework includes the following steps:
- Data loading
- Data transformation
- Model training and testing
- Performance evaluation
- Results saving
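The order of the steps above can be pictured with a small, dependency-free sketch. Everything in it (`run_experiment`, `threshold_model`, `accuracy`, the toy data) is hypothetical and only illustrates the flow of a five-step experiment; it is not exp-runner's own code.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Sample = Tuple[float, int]  # (feature, label)


def run_experiment(
    load: Callable[[], List[Sample]],
    transforms: Sequence[Callable[[float], float]],
    fit_predict: Callable[[List[float]], List[int]],
    metric: Callable[[List[int], List[int]], float],
    save: Optional[Callable[[float], None]] = None,
) -> float:
    data = load()                                    # 1. data loading
    for transform in transforms:                     # 2. data transformation (may be empty)
        data = [(transform(x), y) for x, y in data]
    features = [x for x, _ in data]
    y_true = [y for _, y in data]
    y_pred = fit_predict(features)                   # 3. model training and testing
    score = metric(y_true, y_pred)                   # 4. performance evaluation
    if save is not None:                             # 5. results saving (optional)
        save(score)
    return score


def threshold_model(features: List[float]) -> List[int]:
    """Toy 'model': predict 1 for values above the mean, else 0."""
    mean = sum(features) / len(features)
    return [int(x > mean) for x in features]


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


saved: List[float] = []
score = run_experiment(
    load=lambda: [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)],
    transforms=[lambda x: x / 10],
    fit_predict=threshold_model,
    metric=accuracy,
    save=saved.append,
)
print(score, saved)
```

Passing an empty `transforms` sequence or leaving `save` unset skips the corresponding step without changing the rest of the flow.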
Main features
- Generality: A variety of models and methods are supported, so it can be used for many tasks (such as preprocessing, dimensionality reduction, classification, regression, clustering, and statistical tests)
- Flexibility: Steps can easily be skipped or included
- Dynamic loading: Modules are imported automatically at runtime, so no additional import lines are needed
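Dynamic loading of this kind can be pictured with a minimal sketch built on the standard library's `importlib`. The helper name `load_class` is hypothetical, and this is not exp-runner's actual implementation; it only illustrates how a dotted path such as `sklearn.decomposition.PCA` could be resolved at runtime.

```python
import importlib


def load_class(dotted_path: str):
    """Resolve a dotted path like 'package.module.ClassName' to the object it names.

    Hypothetical helper for illustration; exp-runner's internals may differ.
    """
    module_name, _, attr_name = dotted_path.rpartition('.')
    if not module_name:
        raise ValueError(f'expected "module.ClassName", got {dotted_path!r}')
    module = importlib.import_module(module_name)  # imported only when requested
    return getattr(module, attr_name)


# Standard-library classes stand in for user-defined ones here.
OrderedDict = load_class('collections.OrderedDict')
counter = load_class('collections.Counter')('abracadabra')
print(counter.most_common(1))
```

Because the module is looked up by name, the calling script never needs an explicit `import` line for the classes it uses.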
Installation
pip install exp-runner
Usage
Suppose your project has the following structure:
MyAwesomeProject/
    main.py
    my_custom_module.py
    data/
        data_00.npy
        data_01.npy
        ...
        data_NN.npy
    protocols/
        experiment_config.json
    results/
Just give me the code!
You only need to describe your experiment in a JSON configuration file:
experiment_config.json
{
    "Setup": {
        "description": "You can add detailed description of the experiment",
        "random_seed": 42
    },
    "Dataset": {
        "class": "my_custom_module.MyAwesomeDataLoader",
        "args": {"path_to_data": "data/*.npy"}
    },
    "Transforms": [
        {
            "class": "sklearn.decomposition.PCA",
            "args": {"n_components": 3, "whiten": true}
        }
    ],
    "Model": {
        "class": "sklearn.cluster.KMeans",
        "args": {"n_clusters": 3, "n_jobs": -1, "verbose": 0}
    },
    "Metric": {
        "class": "my_custom_module.SklearnMetricWrapper",
        "args": {"metric": "normalized_mutual_info_score"}
    },
    "Saver": {
        "class": "my_custom_module.CSVReport",
        "args": {"path_to_output": "results/evaluation_results.csv", "sep": ";"}
    }
}
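To make the `class`/`args` convention concrete, here is a minimal sketch of how such a configuration could be turned into objects. The function names `instantiate` and `build_steps` are hypothetical, standard-library classes stand in for the project's own, and the handling of missing sections is an assumption about what skipping a step could look like, not a description of exp-runner's internals.

```python
import importlib
import json


def instantiate(spec: dict):
    """Create an object from {"class": "pkg.mod.Name", "args": {...}}."""
    module_name, _, class_name = spec['class'].rpartition('.')
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**spec.get('args', {}))


def build_steps(config: dict) -> dict:
    """Instantiate every step present in the config; absent steps are skipped."""
    steps = {}
    for name in ('Dataset', 'Model', 'Metric', 'Saver'):
        if name in config:
            steps[name] = instantiate(config[name])
    # "Transforms" is a list, kept in the order given in the config.
    steps['Transforms'] = [instantiate(s) for s in config.get('Transforms', [])]
    return steps


# Stand-in config: stdlib classes replace the real data loader and transform.
config = json.loads('''
{
    "Setup": {"random_seed": 42},
    "Dataset": {"class": "collections.Counter", "args": {"a": 2, "b": 1}},
    "Transforms": [
        {"class": "fractions.Fraction", "args": {"numerator": 3, "denominator": 4}}
    ]
}
''')
steps = build_steps(config)
print(sorted(steps))
```

In this sketch the `args` dictionary is passed straight to the constructor as keyword arguments, which is why the keys in the JSON file have to match the parameter names of the class being created.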
Here are the classes referenced above:
my_custom_module.py
import os
import glob
import numpy as np
import sklearn.metrics
from exp_runner import Dataset, Metric, Saver
from collections import defaultdict
from typing import Any, Dict, List, Union, NoReturn, Iterable, Callable
from sklearn.model_selection import StratifiedShuffleSplit

class MyAwesomeDataLoader(Dataset):
    """Loads every .npy file matching a glob pattern; the last column holds the labels."""

    def __init__(self, path_to_data: str, test_size: float = 0.1, training: bool = True):
        super().__init__()
        self._samples = dict()
        self._labels = dict()
        self._splits = defaultdict(dict)
        # Sort the paths so that the sample order is the same on every run.
        paths_to_data = sorted(glob.glob(path_to_data))
        for path in paths_to_data:
            fname = os.path.basename(path)
            data = np.load(path)
            X = data[:, :-1]  # all columns but the last are features
            y = data[:, -1]   # the last column is the label
            # Only one stratified train/test split per file is needed.
            indices_train, indices_test = next(StratifiedShuffleSplit(
                n_splits=1, test_size=test_size
            ).split(X, y))
            self._samples[fname] = X
            self._labels[fname] = y
            self._splits[fname]['train'] = indices_train
            self._splits[fname]['test'] = indices_test
        self._indices = list(self._samples.keys())
        self._training = training

    def __getitem__(self, index: int) -> Dict[str, Any]:
        if not (0 <= index < len(self._indices)):
            raise IndexError
        fname = self._indices[index]
        # Pick the train or test indices depending on the current mode.
        indices = self._splits[fname]['train' if self.training else 'test']
        item = {
            'X': self._samples[fname][indices],
            'y': self._labels[fname][indices]
        }
        item['desc'] = 'it is possible to add description for each data sample'
        return {'filename': fname, 'item': item}

    def __len__(self) -> int:
        return len(self._indices)

    @property
    def training(self):
        return self._training

class SklearnMetricWrapper(Metric):
    def __init__(self, metric: str):
        super(SklearnMetricWrapper, self).__init__()
        metric = getattr(sklearn.metrics, metric)
        self._metric: Callable[[Iterable[Union[float, int]], Iterable[Union[float, int]]], float] = metric

    def __call__(self, y_true: Iterable[Union[float, int]], y_pred: Iterable[Union[float, int]]) -> float:
        return self._metric(y_true, y_pred)

class CSVReport(Saver):
    def __init__(self, path_to_output: str, sep: str = ';', append: bool = True):
        super(CSVReport, self).__init__()
        self.path_to_output = path_to_output
        self.sep = sep
        self.mode = 'a+' if append else 'w+'

    def save(self, report: List[Dict[str, Any]]) -> None:
        with open(self.path_to_output, self.mode) as csv:
            for entry in report:
                # str() guards against non-string values such as a float score.
                line = self.sep.join([
                    str(entry['filename']),
                    str(entry['desc']),
                    str(entry['perf'])
                ]) + '\n'
                csv.write(line)
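One design note on `CSVReport`: joining fields with `sep` produces a malformed row whenever a description itself contains the separator. A possible alternative, sketched below with the standard `csv` module, quotes such fields automatically. The function name `write_report` and the sample entries are hypothetical; the entry keys (`filename`, `desc`, `perf`) are taken from the `save` method above.

```python
import csv
import io
from typing import Any, Dict, Iterable, TextIO


def write_report(report: Iterable[Dict[str, Any]], stream: TextIO, sep: str = ';') -> None:
    """Write one row per report entry, quoting fields that contain the separator."""
    writer = csv.writer(stream, delimiter=sep, lineterminator='\n')
    for entry in report:
        writer.writerow([entry['filename'], entry['desc'], entry['perf']])


# Hypothetical entries; an in-memory buffer stands in for the output file.
buffer = io.StringIO()
write_report(
    [
        {'filename': 'data_00.npy', 'desc': 'baseline; no whitening', 'perf': 0.5},
        {'filename': 'data_01.npy', 'desc': 'plain', 'perf': 1},
    ],
    buffer,
)
print(buffer.getvalue())
```

The first description contains a semicolon, so the writer wraps it in double quotes and the row still parses back into three fields.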
Finally, to run your experiment, type the following in your terminal:
cd /path/to/MyAwesomeProject
python main.py --config protocols/experiment_config.json
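The contents of `main.py` are not shown here. As a rough sketch of the command-line side only, the `--config` option could be parsed and the JSON file loaded as below. The function name `load_config` is hypothetical, and the step that hands the configuration to exp-runner is deliberately left out because its entry point is not documented in this description.

```python
import argparse
import json
import os
import tempfile
from typing import List, Optional


def load_config(argv: Optional[List[str]] = None) -> dict:
    """Parse --config from argv (sys.argv when None) and return the parsed JSON."""
    parser = argparse.ArgumentParser(description='Run an experiment from a JSON config.')
    parser.add_argument('--config', required=True, help='path to the JSON configuration file')
    args = parser.parse_args(argv)
    with open(args.config, encoding='utf-8') as f:
        return json.load(f)


# Demonstration with a throwaway config file. In a real main.py the call would
# be load_config() with no arguments, so that sys.argv is used.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'experiment_config.json')
    with open(path, 'w', encoding='utf-8') as f:
        json.dump({'Setup': {'random_seed': 42}}, f)
    config = load_config(['--config', path])

print('Loaded sections:', ', '.join(config))
```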