
Feature Analysis: Simple Feature Analysis in Python

It performs feature analysis to support data preprocessing and the use of data in machine learning.

Methods

There are four methods used to identify features to remove:

  1. Finding Missing Values (find_missing(missing_threshold))

    Finds the features whose fraction of missing values is above missing_threshold.

  2. Single Unique Values (find_unique())

    Finds the features that have a single unique value. NaNs are not counted as a unique value.

  3. Collinear Features (find_collinear(correlation_threshold, one_hot=False))

    Finds collinear features based on the correlation coefficient between pairs of features, using correlation_threshold as the cutoff.

  4. Low Importance Features (find_low_impt(cumulative_impt, label))

    Finds the lowest-importance features that are not needed to reach the cumulative_impt fraction of the total feature importance computed by a gradient boosting machine. label indicates whether the task is regression or classification.
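The first three checks can be approximated in a few lines of pandas. This is a minimal sketch, assuming the data is held in a pandas DataFrame; the functions below are hypothetical standalone illustrations, not the package's own implementation. Collinearity is measured here with the absolute Pearson correlation (an assumption), and the one_hot option is not modelled.

```python
import numpy as np
import pandas as pd


def find_missing(df: pd.DataFrame, missing_threshold: float) -> list:
    """Return columns whose fraction of missing values exceeds missing_threshold."""
    missing_frac = df.isnull().mean()  # per-column fraction of missing entries
    return missing_frac[missing_frac > missing_threshold].index.tolist()


def find_unique(df: pd.DataFrame) -> list:
    """Return columns with exactly one unique value; NaNs are not counted."""
    n_unique = df.nunique(dropna=True)
    return n_unique[n_unique == 1].index.tolist()


def find_collinear(df: pd.DataFrame, correlation_threshold: float) -> list:
    """Flag one column from each numeric pair whose absolute Pearson
    correlation exceeds correlation_threshold."""
    corr = df.select_dtypes(include="number").corr().abs()
    # Keep only the strict upper triangle so each pair is examined once;
    # the later column of a correlated pair is the one flagged.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [col for col in upper.columns if (upper[col] > correlation_threshold).any()]


# Small example frame
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],    # b = 2 * a, so perfectly collinear
    "c": [7.0, 7.0, 7.0, 7.0, np.nan],  # one unique value once NaN is ignored
    "d": [None, None, None, "x", "y"],  # 60% missing (non-numeric)
})

print(find_missing(df, missing_threshold=0.5))       # ['d']
print(find_unique(df))                                # ['c']
print(find_collinear(df, correlation_threshold=0.9))  # ['b']
```

Because only the later column of each correlated pair is flagged, dropping the returned columns still keeps one representative of every correlated group.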

Usage

Refer to testing.py for examples of how to use the module's methods.

Visualizations

Each of the FeatureAnalysis methods above also produces visualizations for inspecting the characteristics of a dataset:

Histogram for missing values

Histogram for unique values

Correlation Heat map

Important Features

Cumulative Feature Importance
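The cumulative feature importance curve and the cutoff described for find_low_impt both rest on a running sum of normalized importances. A minimal sketch, assuming the importances are already available as a pandas Series (for example, built from the feature_importances_ attribute of a fitted scikit-learn gradient boosting model); the function names are hypothetical, model training and the label argument are out of scope, and this is not the package's code.

```python
import pandas as pd


def cumulative_importance(importances: pd.Series) -> pd.Series:
    """Normalize importances to sum to 1, sort descending, and accumulate."""
    normalized = importances / importances.sum()
    return normalized.sort_values(ascending=False).cumsum()


def find_low_importance(importances: pd.Series, cumulative_impt: float) -> list:
    """Return features not needed to reach cumulative_impt of total importance."""
    cum = cumulative_importance(importances)
    # Keep the smallest top-ranked set whose cumulative importance reaches the threshold
    n_keep = int((cum < cumulative_impt).sum()) + 1
    return cum.index[n_keep:].tolist()


# Hypothetical importances, e.g. taken from a fitted model's feature_importances_
importances = pd.Series({"f1": 50.0, "f2": 30.0, "f3": 15.0, "f4": 5.0})

print(cumulative_importance(importances).round(2).tolist())  # [0.5, 0.8, 0.95, 1.0]
print(find_low_importance(importances, cumulative_impt=0.9))  # ['f4']
```

Plotting the cumulative series against the number of features gives a curve like the one named above; the threshold marks where additional features stop contributing meaningfully.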

Requirements:

Install the dependencies listed in requirements.txt by running the following command in a shell:

pip install -r requirements.txt

Contact

Any questions can be directed to nbansal1_be18@thapar.edu.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.


Download files


Source Distribution

Feature_Analysis-0.0.4.tar.gz (2.7 kB)

File details

Details for the file Feature_Analysis-0.0.4.tar.gz.

File metadata

  • Download URL: Feature_Analysis-0.0.4.tar.gz
  • Upload date:
  • Size: 2.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.5

File hashes

Hashes for Feature_Analysis-0.0.4.tar.gz

Algorithm     Hash digest
SHA256        666196f2e04c06fc2a87fc1f02941ffea60a0ac238c8d9e0209e621fe408e460
MD5           db8eb1c35e2853698cc4c6ad6a7f0b22
BLAKE2b-256   3f1477f050523a6b10fcc1716692e521eac6b97bd4afc2c8cf99d517a5efcd12

