
Kydavra is a scikit-learn-inspired Python library with feature selection methods for Data Science and Machine Learning model development

Project description

kydavra

Kydavra is a Python package for feature selection, inspired by scikit-learn. It uses statistical methods to extract, from plain pandas DataFrames, the columns that are related to the column your model should predict. This version of kydavra provides the following feature selection methods:

  • ANOVA test selector (ANOVASelector)
  • Chi squared selector (ChiSquaredSelector)
  • Genetic Algorithm selector (GeneticAlgorithmSelector)
  • Kendall Correlation selector (KendallCorrelationSelector)
  • Lasso selector (LassoSelector)
  • Pearson Correlation selector (PearsonCorrelationSelector)
  • Point-Biserial selector (PointBiserialCorrSelector)
  • P-value selector (PValueSelector)
  • Spearman Correlation selector (SpermanCorrelationSelector)
  • Shannon selector (ShannonSelector)
  • ElasticNet Selector (ElasticNetSelector)
  • M3U Selector (M3USelector)
  • MUSE Selector (MUSESelector)
  • Mixer Selector (MixerSelector)
  • PCA Filter (PCAFilter)
  • PCA Reducer (PCAReducer)
  • LDA Reducer (LDAReducer)
  • Bregman Divergence selector (BregmanDivergenceSelector)
  • Fisher Selector (FisherSelector)
  • ICA Reducer (ICAReducer)
  • ICA Filter (ICAFilter)
  • Itakura-Saito Divergence selector (ItakuraSaitoSelector)
  • Jensen-Shannon Divergence selector (JensenShannonSelector)
  • Kullback-Leibler selector (KullbackLeiblerSelector)
  • MultiSURF selector (MultiSURFSelector)
  • Phik selector (PhikSelector)
  • ReliefF selector (ReliefFSelector)

All these methods take a pandas DataFrame and the name of the y (target) column, and select from the remaining columns of the DataFrame.
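To make this shared pattern concrete, here is a minimal sketch of a correlation-based selector in plain Python. It is not kydavra's code: the class name CorrelationSelectorSketch, the min_corr threshold, and the use of a dict of lists in place of a pandas DataFrame are all assumptions made so that the example runs without extra dependencies. It keeps every non-target column whose absolute Pearson correlation with the target reaches the threshold, which is roughly the idea one would expect behind a Pearson-correlation selector.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


class CorrelationSelectorSketch:
    """Hypothetical selector mirroring the select(df, target) pattern.

    `df` is a dict mapping column name -> list of numbers (a stand-in for
    a pandas DataFrame); `min_corr` is an assumed threshold on |r|.
    """

    def __init__(self, min_corr=0.5):
        self.min_corr = min_corr

    def select(self, df, target):
        y = df[target]
        # Keep every non-target column whose |r| with the target is large enough.
        return [
            col for col, xs in df.items()
            if col != target and abs(pearson_r(xs, y)) >= self.min_corr
        ]


data = {
    "x1": [1, 2, 3, 4, 5],       # perfectly correlated with the target
    "x2": [5, 3, 4, 1, 2],       # r = -0.8 with the target
    "noise": [1, 3, 2, 3, 1],    # r = 0 with the target
    "target": [2, 4, 6, 8, 10],
}
selector = CorrelationSelectorSketch(min_corr=0.5)
print(selector.select(data, "target"))  # ['x1', 'x2']
```

Because the interface only asks for a table and a target name, swapping the scoring function (for example Spearman or Kendall rank correlation) would leave the select call unchanged.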

How to use kydavra

To use a selector from kydavra, just import it from kydavra as follows:

from kydavra import PValueSelector

Class names are written in parentheses in the list above. Next, create an object of this algorithm (I will use the p-value method as an example).

method = PValueSelector()

To get the best features in the opinion of the method, use the 'select' method, passing as parameters the pandas DataFrame and the name of the column that you want your model to predict.

selected_columns = method.select(df, 'target')

The returned value is a list of the columns selected by the algorithm.
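The README does not say how PValueSelector computes its p-values. Purely to illustrate the general idea of p-value-based filtering and the list of column names it returns, the sketch below swaps in a permutation test on the Pearson correlation; this is a different technique from whatever kydavra uses internally. The functions permutation_p_value and select_by_p_value, the alpha cut-off, and the dict-of-lists stand-in for a DataFrame are all hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for the null hypothesis 'xs and ys are unrelated'.

    Shuffling ys destroys any real association; the p-value is the share of
    shuffles whose |r| is at least as large as the observed |r|.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Adding 1 to both counts avoids ever reporting a p-value of exactly zero.
    return (hits + 1) / (n_permutations + 1)


def select_by_p_value(df, target, alpha=0.05, n_permutations=2000, seed=0):
    """Return the list of non-target columns whose p-value falls below alpha."""
    y = df[target]
    return [
        col for col, xs in df.items()
        if col != target
        and permutation_p_value(xs, y, n_permutations, seed) < alpha
    ]


data = {
    "x1": list(range(1, 11)),                 # linearly related to the target
    "noise": [1, 2, 3, 4, 5, 5, 4, 3, 2, 1],  # zero correlation with the target
    "target": [2 * v + 1 for v in range(1, 11)],
}
selected_columns = select_by_p_value(data, "target")
print(selected_columns)  # ['x1']
```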

Some methods can plot the process of selecting the best features. In these plots, the features that were not selected by the method are shown dotted.

ChiSquaredSelector

method.plot_chi2()

to display the chi-squared plot, and

method.plot_chi2(save=True, file_path='FILE/PATH.png')

to save the plot to a file, and

method.plot_p_value()

to plot the p-values.

LassoSelector

method.plot_process()

You can also save this plot using the same parameters (save and file_path).

PValueSelector

method.plot_process()

Some advice:

  • Use ChiSquaredSelector for categorical features.
  • Use LassoSelector and PValueSelector for regression problems.
  • Use PointBiserialCorrSelector for binary classification problems.
  • Use ShannonSelector to decide whether to keep NaN values (treating them as just another value) or to drop columns with many NaN values.
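The advice to use ChiSquaredSelector for categorical features rests on the chi-squared test of independence between a feature and the target. A minimal standard-library sketch of that statistic follows; it is not kydavra's implementation, and the names chi_squared and rank_by_chi_squared, as well as the dict-of-lists stand-in for a DataFrame, are assumptions for illustration. It only ranks columns by the raw statistic; turning the statistic into a p-value needs the chi-squared distribution with (rows - 1) x (columns - 1) degrees of freedom, which a library routine such as scipy.stats.chi2_contingency provides.

```python
from collections import Counter


def chi_squared(feature, target):
    """Pearson's chi-squared statistic for the contingency table of two
    equal-length categorical sequences (larger means stronger dependence)."""
    n = len(feature)
    observed = Counter(zip(feature, target))
    feature_counts = Counter(feature)
    target_counts = Counter(target)
    stat = 0.0
    for f, f_n in feature_counts.items():
        for t, t_n in target_counts.items():
            # Count expected in this cell if feature and target were independent.
            expected = f_n * t_n / n
            stat += (observed[(f, t)] - expected) ** 2 / expected
    return stat


def rank_by_chi_squared(df, target):
    """Return the non-target columns ordered by decreasing chi-squared statistic."""
    y = df[target]
    scores = {col: chi_squared(xs, y) for col, xs in df.items() if col != target}
    return sorted(scores, key=scores.get, reverse=True)


data = {
    "color": ["red", "red", "blue", "blue", "red", "blue"],
    "size": ["S", "L", "S", "L", "S", "L"],
    "bought": ["yes", "yes", "no", "no", "yes", "no"],
}
print(rank_by_chi_squared(data, "bought"))  # ['color', 'size']
```

Here "color" determines "bought" completely (statistic 6.0), while "size" is only weakly associated with it (statistic about 0.67).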

With love from Sigmoid.

We are open to feedback. Please send your impressions to vladimir.stojoc@gmail.com



Download files


Source Distribution

kydavra-0.3.4.tar.gz (29.0 kB)

Uploaded Source

Built Distribution

kydavra-0.3.4-py3-none-any.whl (47.1 kB)

Uploaded Python 3

File details

Details for the file kydavra-0.3.4.tar.gz.

File metadata

  • Download URL: kydavra-0.3.4.tar.gz
  • Upload date:
  • Size: 29.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.2

File hashes

Hashes for kydavra-0.3.4.tar.gz
Algorithm Hash digest
SHA256 cf45e1fe2492c54d0d38c3f5752c127e116e96c58f4142286d046d7aa5004c44
MD5 673c702add11953cc619a87a28eb884f
BLAKE2b-256 fc0ad2e796ac41845872dcacfdecb1fe3de60eb4458af9512aa37b8ae384b6db


File details

Details for the file kydavra-0.3.4-py3-none-any.whl.

File metadata

  • Download URL: kydavra-0.3.4-py3-none-any.whl
  • Upload date:
  • Size: 47.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.2

File hashes

Hashes for kydavra-0.3.4-py3-none-any.whl
Algorithm Hash digest
SHA256 9fdfc25b85a4b0a4de5b580fa1bcec71000ff4a366072f8ca2d2d424ce2c2b05
MD5 ea67fc9c0446d4dba4b5dc2f0b75133e
BLAKE2b-256 1b6e00488814fd1a9637a1e95e5c6b2a2e06bf06de58cc472e8c8e3f9b845f2b

