Python package for instance selection algorithms
kondo-ML
kondo-ML is a package containing various instance selection algorithms usable with regression models. The implementations are compatible with sklearn and follow its outlier detection interface.
This is still a work in progress and some documentation is missing. Please refer to the source code for each algorithm in the instance_selection folder.
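As a rough illustration of the outlier detection convention referred to above, the sketch below uses scikit-learn's own `LocalOutlierFactor` rather than a kondo-ML selector. Its `fit_predict` returns 1 for instances to keep and -1 for instances to discard. The toy data and variable names are made up for this illustration. The kondo-ML selectors appear to additionally take the target `y`, as in the example further down this page.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Toy one-dimensional data with one value far away from the rest (illustrative only).
X = np.array([[-1.1], [0.2], [101.1], [0.3]])

# scikit-learn outlier detectors expose fit_predict, which labels
# every instance with 1 (inlier) or -1 (outlier).
detector = LocalOutlierFactor(n_neighbors=2)
labels = detector.fit_predict(X)

# Turn the -1/1 labels into a boolean mask and keep only the inliers.
mask = labels == 1
X_kept = X[mask]
```

Because the labels follow this -1/1 convention, the same masking step works for any estimator that implements the interface.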
Install
The package can be installed via pip:
pip install kondo_ml
Overview of algorithms
Algorithm | Goal |
---|---|
RegCNN | Size reduction |
RegENN | Noise filter |
RegENNTime | Noise filter/drift handling |
DROP-RX | Noise filter/size reduction |
Shapley | Utility assignment |
FISH | Drift handling |
SELCON | Size reduction |
Mutual Information | Noise filter |
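To give an idea of what a "noise filter" for regression can look like, here is a heavily simplified sketch in the spirit of edited nearest neighbours: an instance is flagged when its target differs from the mean target of its nearest neighbours by more than `alpha` times the standard deviation of those neighbour targets. This is not the package's RegENN implementation. The function name, the threshold rule, and the one-pass (non-iterative) procedure are assumptions made for this illustration.

```python
import numpy as np

def noise_filter_sketch(X, y, alpha=1.0, k=3):
    """Hypothetical one-pass noise filter; returns 1 (keep) or -1 (discard) per instance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    labels = np.ones(len(X), dtype=int)

    # Pairwise Euclidean distances; the diagonal is set to infinity
    # so that an instance is never its own neighbour.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)

    for i in range(len(X)):
        neighbours = np.argsort(dists[i])[:k]
        prediction = y[neighbours].mean()        # kNN estimate of the target
        threshold = alpha * y[neighbours].std()  # tolerance from local target spread
        if abs(y[i] - prediction) > threshold:
            labels[i] = -1
    return labels

# Toy data: targets close to 1.0, except for one clearly noisy value at index 4.
X = np.arange(8, dtype=float).reshape(-1, 1)
y = np.array([1.0, 1.2, 0.9, 1.1, 9.0, 1.0, 1.2, 0.8])
labels = noise_filter_sketch(X, y, alpha=1.0, k=3)
```

On this toy data the noisy instance at index 4 is flagged with -1. A threshold this strict can also flag ordinary instances, which is why such filters usually expose a parameter like `alpha` to tune.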
Algorithm sources
RegCNN & RegENN: https://link.springer.com/chapter/10.1007/978-3-642-33266-1_33
DROP-RX: https://www.sciencedirect.com/science/article/abs/pii/S0925231216301953
Shapley: https://proceedings.mlr.press/v97/ghorbani19c/ghorbani19c.pdf
FISH: http://eprints.bournemouth.ac.uk/18567/1/FISH_journal_preprint.pdf
SELCON: https://arxiv.org/abs/2106.12491
Mutual Information: https://research.cs.aalto.fi//aml/Publications/Publication167.pdf
The SELCON implementation is taken from the author's GitHub repository with minor changes: https://github.com/abir-de/SELCOn
Example
# import the instance selection algorithm of your choice
from kondo_ml.instance_selection import RegENNSelector
# initialize the selector
reg_enn = RegENNSelector(alpha=1, nr_of_neighbors=3)
# predict labels (1 to use that instance, -1 to ignore); X is the feature matrix, y the target vector
labels = reg_enn.fit_predict(X, y)
# transform -1/1 labels into boolean 0/1 labels
from kondo_ml.utils import transform_selector_output_into_mask
boolean_labels = transform_selector_output_into_mask(labels)
# use the selected instances for model training (any model, LR here as an example)
from sklearn.linear_model import LinearRegression
model = LinearRegression().fit(X[boolean_labels], y[boolean_labels])
More examples can be found in the notebooks in the examples folder.
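The conversion from -1/1 labels to a mask in the example above can be sketched with plain NumPy. The helper `labels_to_mask` below is a hypothetical stand-in and not the package's `transform_selector_output_into_mask`. It assumes that a true boolean array is what is wanted for indexing, and it shows why integer 0/1 labels would not select the intended rows.

```python
import numpy as np

def labels_to_mask(labels):
    """Hypothetical stand-in: True where the selector returned 1, False for -1."""
    return np.asarray(labels) == 1

# Made-up data and selector output for illustration.
X = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]])
y = np.array([10.0, 20.0, 30.0, 40.0])
labels = np.array([1, -1, 1, 1])

# A boolean mask keeps exactly the rows labelled 1.
mask = labels_to_mask(labels)
X_selected, y_selected = X[mask], y[mask]

# Integer 0/1 values are treated as row indices instead:
# this picks rows 1, 0, 1, 1 rather than filtering.
int_labels = mask.astype(int)
X_wrong = X[int_labels]
```

If a mask arrives as integers, casting it with `astype(bool)` before indexing avoids this pitfall.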
Contribution
Please feel free to contribute documentation, tests, or new algorithms to this package, and let me know if you find any mistakes in the implementations.