Feature Analysis: Simple Feature Analysis in Python

This package performs feature analysis to support data preprocessing and the preparation of data for machine learning.

Methods

There are four methods used to identify features to remove:

  1. Finding Missing Values (find_missing(missing_threshold))

    %% Find the features whose fraction of missing values is above missing_threshold.

  2. Single Unique Values (find_unique())

    %% Find the features that have a single unique value. NaNs do not count as a unique value.

  3. Collinear Features (find_collinear(correlation_threshold, one_hot=False))

    %% Find collinear features, using correlation_threshold as the cutoff on the correlation coefficient between features.

  4. Low Importance Features (find_low_impt(cumulative_impt, label))

    %% Find the lowest-importance features that are not needed to account for the cumulative_impt fraction of the total feature importance from the gradient boosting machine. label corresponds to regression or classification.
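
The four checks above can be illustrated with a minimal, dependency-free sketch. This is not the package's implementation: the function names (sketch_find_missing and so on) are hypothetical, the dataset is assumed to be a plain dict mapping column names to lists instead of a DataFrame, the collinearity check assumes complete numeric columns and compares the absolute Pearson correlation to the threshold, and the feature importances are assumed to be given as input rather than computed by a gradient boosting machine.

```python
import math


def _is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def sketch_find_missing(columns, missing_threshold):
    """Columns whose fraction of missing values is above the threshold."""
    return [
        name for name, values in columns.items()
        if sum(_is_missing(v) for v in values) / len(values) > missing_threshold
    ]


def sketch_find_unique(columns):
    """Columns with exactly one distinct non-missing value."""
    return [
        name for name, values in columns.items()
        if len({v for v in values if not _is_missing(v)}) == 1
    ]


def _pearson(xs, ys):
    """Pearson correlation of two equal-length, complete numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def sketch_find_collinear(columns, correlation_threshold):
    """For each highly correlated pair, flag the later column for removal."""
    names = list(columns)
    to_drop = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if second in to_drop:
                continue
            if abs(_pearson(columns[first], columns[second])) > correlation_threshold:
                to_drop.append(second)
    return to_drop


def sketch_find_low_importance(importances, cumulative_impt):
    """Features left over once the top-ranked ones reach cumulative_impt."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    running, low = 0.0, []
    for name, importance in ranked:
        if running >= cumulative_impt:  # threshold already reached without this feature
            low.append(name)
        running += importance / total
    return low


# Illustrative data only (made up for this sketch).
data = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],     # perfectly correlated with "a"
    "c": [5.0, 5.0, None, 5.0],    # one distinct value, 25% missing
    "d": [None, None, None, 1.5],  # 75% missing
}
numeric = {"a": data["a"], "b": data["b"], "e": [4.0, 1.0, 3.0, 2.0]}
importances = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}

print(sketch_find_missing(data, 0.5))                # ['d']
print(sketch_find_unique(data))                      # ['c', 'd']
print(sketch_find_collinear(numeric, 0.9))           # ['b']
print(sketch_find_low_importance(importances, 0.9))  # ['d']
```

Each function returns a list of column names that are candidates for removal, so the results can be combined or reviewed before anything is actually dropped.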

Usage

Refer to testing.py for examples of how to use the different methods in the module.

Visualizations

The FeatureAnalysis methods mentioned above also include a number of visualizations for inspecting the characteristics of a dataset:

Histogram for missing values

Histogram for unique values

Correlation Heat map

Important Features

Cumulative Feature Importance

Requirements:

Install the dependencies listed in requirements.txt by running the following command in a shell:

pip install -r requirements.txt

Contact

Any questions can be directed to dipanshugolan96@gmail.com.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

Source Distribution

features_anal-0.0.2.tar.gz (2.8 kB)
