
Python package for instance selection algorithms

Project description

kondo-ML

kondo-ML is a package containing various instance selection algorithms usable with regression models. The implementations are compatible with sklearn and follow its outlier detection interface.
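To make the interface convention concrete, here is a minimal sketch of a selector that follows the sklearn outlier detection style: fit_predict returns 1 for instances to keep and -1 for instances to drop. The class TargetRangeSelector and its n_sigmas parameter are hypothetical and exist only for illustration; they are not part of kondo-ML, and unlike sklearn's unsupervised outlier detectors this toy also takes the regression target y.

```python
import numpy as np


class TargetRangeSelector:
    """Hypothetical toy selector (not part of kondo-ML).

    Keeps instances whose target lies within `n_sigmas` standard
    deviations of the mean target and flags all others.
    """

    def __init__(self, n_sigmas=2.0):
        self.n_sigmas = n_sigmas

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        self.mean_ = y.mean()
        self.std_ = y.std()
        keep = np.abs(y - self.mean_) <= self.n_sigmas * self.std_
        # Outlier-detector convention: 1 = inlier (keep), -1 = outlier (drop)
        self.labels_ = np.where(keep, 1, -1)
        return self

    def fit_predict(self, X, y):
        return self.fit(X, y).labels_


X = np.arange(6, dtype=float).reshape(-1, 1)
y = np.array([1.0, 1.1, 0.9, 1.0, 1.2, 50.0])  # last target is an obvious outlier
labels = TargetRangeSelector(n_sigmas=2.0).fit_predict(X, y)
print(labels)
```

Only the last instance is flagged with -1 here, so a downstream model would be trained on the first five.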

The package is still a work in progress, and some documentation is missing. Until it is complete, please refer to the source code of each algorithm in the instance_selection folder.

Install

The package can be installed via pip:
pip install kondo_ml

Overview of algorithms

Algorithm            Goal
RegCNN               Size reduction
RegENN               Noise filter
RegENNTime           Noise filter/drift handling
DROP-RX              Noise filter/size reduction
Shapley              Utility assignment
FISH                 Drift handling
SELCON               Size reduction
Mutual Information   Noise filter
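As an illustration of what a noise filter for regression does, the following is a simplified, single-pass sketch in the spirit of RegENN. It assumes the rule is: predict each target from its nearest neighbours and flag the instance when the error exceeds alpha times the standard deviation of those neighbours' targets. The function reg_enn_labels is hypothetical and is not the package's implementation, which may differ (for example by removing instances iteratively or by using another regressor).

```python
import numpy as np


def reg_enn_labels(X, y, alpha=1.0, nr_of_neighbors=3):
    """Simplified RegENN-style filter: 1 = keep, -1 = flagged as noise."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Pairwise Euclidean distances; an instance is never its own neighbour.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    labels = np.ones(n, dtype=int)
    for i in range(n):
        # Stable sort so that distance ties are broken by index order.
        neighbours = np.argsort(dists[i], kind="stable")[:nr_of_neighbors]
        prediction = y[neighbours].mean()          # k-NN estimate of y[i]
        threshold = alpha * y[neighbours].std()    # local tolerance
        if abs(y[i] - prediction) > threshold:
            labels[i] = -1
    return labels


X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([1.0, 1.1, 0.9, 1.0, 1.1, 9.0, 0.9, 1.0, 1.1, 0.9])  # y[5] is noisy
labels = reg_enn_labels(X, y, alpha=5.0, nr_of_neighbors=3)
print(labels)
```

A larger alpha makes the filter more tolerant. With a small alpha, instances whose neighbours happen to have nearly identical targets can be flagged even when their own error is small.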

Algorithm sources

RegCNN & RegENN: https://link.springer.com/chapter/10.1007/978-3-642-33266-1_33
DROP-RX: https://www.sciencedirect.com/science/article/abs/pii/S0925231216301953
Shapley: https://proceedings.mlr.press/v97/ghorbani19c/ghorbani19c.pdf
FISH: http://eprints.bournemouth.ac.uk/18567/1/FISH_journal_preprint.pdf
SELCON: https://arxiv.org/abs/2106.12491
Mutual Information: https://research.cs.aalto.fi//aml/Publications/Publication167.pdf

The SELCON implementation is taken from the authors' GitHub repository, with minor changes: https://github.com/abir-de/SELCOn

Example

# import the instance selection algorithm of your choice
from kondo_ml.instance_selection import RegENNSelector
# initialize the selector
reg_enn = RegENNSelector(alpha=1, nr_of_neighbors=3)
# predict labels for features X and targets y (1 to use that instance, -1 to ignore)
labels = reg_enn.fit_predict(X, y)
# transform the -1/1 labels into a boolean mask
from kondo_ml.utils import transform_selector_output_into_mask
boolean_labels = transform_selector_output_into_mask(labels)
# use the selected instances for model training (any model, LR here as an example)
from sklearn.linear_model import LinearRegression
model = LinearRegression().fit(X[boolean_labels], y[boolean_labels])
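
The label-to-mask step can also be reproduced with plain NumPy. This is a minimal sketch that assumes the helper maps 1 to True and -1 to False; the package's own helper may behave differently, and the arrays below are made-up illustration data.

```python
import numpy as np

# Selector output in the -1/1 convention (made-up values for illustration)
labels = np.array([1, -1, 1, 1, -1])

# Assumed equivalent of the mask helper: True where the instance is kept
mask = labels == 1

X = np.arange(10).reshape(5, 2)
y = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

# Boolean indexing keeps only the selected rows and targets
X_selected, y_selected = X[mask], y[mask]
print(X_selected.shape, y_selected)
```

Indexing requires X and y to be NumPy arrays (or objects that support boolean-array indexing); a plain Python list would need to be converted first.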

More examples can be found in the notebooks in the examples folder.

Contribution

Please feel free to contribute documentation, tests, or new algorithms to this package, and let me know if you find any mistakes in the implementations.
