Python package for instance selection algorithms

kondo-ML

kondo-ML is a package containing various instance selection algorithms usable with regression models. The implementations are compatible with scikit-learn and follow its outlier detection interface: fit_predict returns 1 for instances to use and -1 for instances to ignore.

This is still a work in progress and some documentation is missing. Please refer to the source code for each algorithm in the instance_selection folder.
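To make the outlier detection interface mentioned above concrete, here is a minimal, standard-library-only sketch of a selector exposing fit_predict. ToyNeighborFilter is a hypothetical class written for illustration; it is not part of kondo-ML and is not the RegENN algorithm from the cited paper. Its threshold rule (flag an instance whose target deviates from the mean of its nearest neighbours' targets by more than alpha times their standard deviation) is an assumption chosen only to produce 1/-1 labels, and the parameter names simply mirror those used in the Example section.

```python
from statistics import mean, pstdev


class ToyNeighborFilter:
    """Hypothetical noise filter for illustration; not kondo-ML code."""

    def __init__(self, alpha=1.0, nr_of_neighbors=3):
        self.alpha = alpha
        self.nr_of_neighbors = nr_of_neighbors

    def fit_predict(self, X, y):
        """Return 1 for instances to keep and -1 for instances to ignore."""
        labels = []
        for i, (x_i, y_i) in enumerate(zip(X, y)):
            # Rank all other instances by squared Euclidean distance to x_i.
            others = [j for j in range(len(X)) if j != i]
            others.sort(key=lambda j: sum((a - b) ** 2 for a, b in zip(X[j], x_i)))
            neighbor_targets = [y[j] for j in others[: self.nr_of_neighbors]]
            # Assumed rule: compare the deviation from the neighbours' mean
            # target with alpha times the neighbours' standard deviation.
            threshold = self.alpha * pstdev(neighbor_targets)
            deviates = abs(y_i - mean(neighbor_targets)) > threshold
            labels.append(-1 if deviates else 1)
        return labels


# Toy data: y roughly equals x, except for one corrupted target at index 3.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0.0, 1.1, 2.0, 30.0, 4.1, 5.0]

labels = ToyNeighborFilter(alpha=1.0, nr_of_neighbors=3).fit_predict(X, y)
print(labels)
```

Only the corrupted instance is labelled -1 here. The algorithms shipped with the package are more involved; the point of the sketch is the calling convention, not the selection rule.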

Install

The package can be installed via pip:
pip install kondo_ml

Overview of algorithms

Algorithm            Goal
RegCNN               Size reduction
RegENN               Noise filter
RegENNTime           Noise filter / drift handling
DROP-RX              Noise filter / size reduction
Shapley              Utility assignment
FISH                 Drift handling
SELCON               Size reduction
Mutual Information   Noise filter

Algorithm sources

RegCNN & RegENN: https://link.springer.com/chapter/10.1007/978-3-642-33266-1_33
DROP-RX: https://www.sciencedirect.com/science/article/abs/pii/S0925231216301953
Shapley: https://proceedings.mlr.press/v97/ghorbani19c/ghorbani19c.pdf
FISH: http://eprints.bournemouth.ac.uk/18567/1/FISH_journal_preprint.pdf
SELCON: https://arxiv.org/abs/2106.12491
Mutual Information: https://research.cs.aalto.fi//aml/Publications/Publication167.pdf

The SELCON implementation is taken from the authors' GitHub repository with minor changes: https://github.com/abir-de/SELCOn

Example

# import the instance selection algorithm of your choice
# (the exact class names are defined in the instance_selection folder)
from kondo_ml.instance_selection import RegEnnSelector
from kondo_ml.utils import transform_selector_output_into_mask
from sklearn.linear_model import LinearRegression

# X is the feature matrix and y the regression targets (arrays of equal length)
# initialize the selector
reg_enn = RegEnnSelector(alpha=1, nr_of_neighbors=3)
# predict labels (1 to use that instance, -1 to ignore it)
labels = reg_enn.fit_predict(X, y)
# transform the -1/1 labels into a boolean mask
boolean_labels = transform_selector_output_into_mask(labels)
# train on the selected instances (any model works; linear regression as an example)
model = LinearRegression().fit(X[boolean_labels], y[boolean_labels])

More examples can be found in the notebooks in the examples folder.
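The label-to-mask step in the example can be illustrated without the package installed. The sketch below uses a hypothetical helper, labels_to_mask, that stands in for transform_selector_output_into_mask; the real function's implementation is not shown in this README, so the mapping here (1 becomes True, -1 becomes False) is an assumption based on the comments in the example. The labels and data are made up.

```python
from itertools import compress


def labels_to_mask(labels):
    """Map selector labels to booleans: 1 -> True (keep), -1 -> False (drop)."""
    return [label == 1 for label in labels]


# Made-up labels in the 1 / -1 convention, plus matching toy data.
labels = [1, -1, 1, 1, -1]
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [0.0, 9.9, 2.0, 3.1, -7.0]

mask = labels_to_mask(labels)
# Keep only the rows whose mask entry is True.
X_selected = list(compress(X, mask))
y_selected = list(compress(y, mask))
print(X_selected, y_selected)
```

With NumPy arrays and a boolean mask array, the same selection is written X[mask] and y[mask], which is the form used in the example above.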

Contribution

Please feel free to contribute documentation, tests, or new algorithms to this package, and let me know if you find any mistakes in the implementations.
