Feature Analysis: Simple Feature Analysis in Python

It performs feature analysis to support data preprocessing and the use of data in machine learning.

Methods

There are four methods used to identify features to remove:

  1. Finding Missing Values (find_missing(missing_threshold))

    %% Finds features whose fraction of missing values exceeds missing_threshold.

  2. Single Unique Values (find_unique())

    %% Finds features that have a single unique value. NaNs do not count as a unique value.

  3. Collinear Features (find_collinear(correlation_threshold,one_hot=False))

    %% Finds collinear features based on the correlation coefficient between features.

  4. Low Importance Features (find_low_impt(cumulative_impt,label))

    %% Finds the lowest-importance features, that is, those not needed to account for the cumulative_impt fraction of the total feature importance from the gradient boosting machine. label indicates whether the task is regression or classification.
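
The first three checks above can be sketched with pandas. This is a minimal illustration under assumptions, not the package's implementation: the use of pandas, the helper names (missing_fraction_above, single_unique, collinear) and the sample data are all hypothetical, and the collinearity check assumes the absolute correlation is compared against the threshold.

```python
import numpy as np
import pandas as pd


def missing_fraction_above(df, missing_threshold):
    """Return columns whose fraction of missing values exceeds the threshold."""
    fractions = df.isnull().mean()
    return fractions[fractions > missing_threshold].index.tolist()


def single_unique(df):
    """Return columns with exactly one distinct value, ignoring NaNs."""
    counts = df.nunique(dropna=True)
    return counts[counts == 1].index.tolist()


def collinear(df, correlation_threshold):
    """Return columns whose absolute correlation with an earlier column
    exceeds the threshold (assumed interpretation of the threshold)."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [col for col in upper.columns
            if (upper[col] > correlation_threshold).any()]


# Hypothetical sample data, used only for illustration.
data = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],           # perfectly correlated with "a"
    "c": [5.0, 5.0, np.nan, 5.0],        # one distinct value besides NaN
    "d": [np.nan, np.nan, np.nan, 1.0],  # 75% missing
    "e": [4.0, 1.0, 3.0, 2.0],
})

print(missing_fraction_above(data, 0.5))
print(single_unique(data))
print(collinear(data, 0.9))
```

Reporting only the later column of each correlated pair keeps one feature from every pair, which is a common choice when the goal is to drop redundant features.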

Usage

Refer to testing.py for examples of how to use the different methods in the module.

Visualizations

Each of the FeatureAnalysis methods mentioned above also includes visualizations for inspecting characteristics of a dataset:

Histogram for missing values

Histogram for unique values

Correlation Heat map

Important Features

Cumulative Feature Importance
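
The idea behind the cumulative feature importance plot, and behind find_low_impt, can be sketched in plain Python. This is only an illustration under assumptions: the function name low_importance and the importance values are hypothetical stand-ins for the importances a trained gradient boosting machine would report, and the package's own logic may differ.

```python
def low_importance(importances, cumulative_impt):
    """Return features not needed to reach the cumulative importance fraction.

    importances: mapping of feature name to a non-negative importance score.
    cumulative_impt: fraction of the total importance to account for (0 to 1).
    """
    total = sum(importances.values())
    # Rank features from most to least important.
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    running = 0.0
    low = []
    for name, value in ranked:
        # Once the target fraction is covered, remaining features are surplus.
        if running >= cumulative_impt:
            low.append(name)
        running += value / total
    return low


# Hypothetical importance scores, used only for illustration.
scores = {"age": 50, "income": 30, "height": 15, "id": 5}
print(low_importance(scores, 0.9))
```

Here the three highest-ranked features together cover 95% of the total importance, so only the last feature is reported as low importance for a 0.9 threshold.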

Requirements:

Install the dependencies listed in requirements.txt by running the following command in a shell:

pip install -r requirements.txt

Contact

Any questions can be directed to nbansal1_be18@thapar.edu.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

