Feature Analysis: Simple Feature Analysis in Python
This package analyzes the features of a dataset to support data preprocessing and the use of that data in machine learning.
Methods
There are four methods used to identify features to remove:
- Finding Missing Values (find_missing(missing_threshold)): finds the features with a fraction of missing values above missing_threshold.
- Single Unique Values (find_unique()): finds the features that have a single unique value. NaNs do not count as a unique value.
- Collinear Features (find_collinear(correlation_threshold, one_hot=False)): finds collinear features based on the correlation coefficient between features.
- Low Importance Features (find_low_impt(cumulative_impt, label)): finds the lowest-importance features that are not needed to account for the cumulative_impt fraction of the total feature importance from the gradient boosting machine. label indicates whether the task is regression or classification.
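To make the first three checks concrete, here is a minimal standalone sketch in pandas. It is not the package's implementation: the function names missing_above, single_unique, and collinear, the sample data, and the choice to flag the later column of each correlated pair are all assumptions made for illustration, and the one_hot option is not covered.

```python
import numpy as np
import pandas as pd


def missing_above(df: pd.DataFrame, missing_threshold: float) -> list:
    """Columns whose fraction of missing values exceeds missing_threshold."""
    missing_fraction = df.isna().mean()
    return missing_fraction[missing_fraction > missing_threshold].index.tolist()


def single_unique(df: pd.DataFrame) -> list:
    """Columns with exactly one distinct value, ignoring NaNs."""
    counts = df.nunique(dropna=True)
    return counts[counts == 1].index.tolist()


def collinear(df: pd.DataFrame, correlation_threshold: float) -> list:
    """Later column of each pair whose absolute correlation exceeds the threshold."""
    corr = df.select_dtypes(include="number").corr().abs()
    # Keep only the strict upper triangle so each pair is considered once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    return [col for col in upper.columns if (upper[col] > correlation_threshold).any()]


# Hypothetical sample data, chosen so that each check flags one column.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],       # exact multiple of "a"
    "c": [7.0, 7.0, np.nan, 7.0, 7.0, 7.0],      # one distinct non-NaN value
    "d": [1.0, np.nan, 5.0, np.nan, 2.0, np.nan],  # half of the values missing
})

print(missing_above(df, 0.4))
print(single_unique(df))
print(collinear(df, 0.9))
```

With this sample data, each call reports a different column: "d" for missing values, "c" for a single unique value, and "b" for collinearity with "a".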
Usage
Refer to testing.py for how to use the different methods in the module.
Visualizations
Each of the FeatureAnalysis methods mentioned above also includes visualizations for inspecting the characteristics of a dataset:
- Histogram for missing values
- Histogram for unique values
- Correlation heat map
- Important features
- Cumulative feature importance
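The cumulative feature importance view, and the low-importance check that relies on it, come down to a running sum of normalized importances. The sketch below shows that arithmetic in plain Python under stated assumptions: the function name low_importance and the importance values are hypothetical, the values would normally come from a trained gradient boosting machine, and the exact cutoff rule used by the package may differ.

```python
def low_importance(importances: dict, cumulative_impt: float) -> list:
    """Features not needed to reach cumulative_impt of the total importance."""
    total = sum(importances.values())
    # Rank features from most to least important.
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    running = 0.0
    dropped = []
    for name, value in ranked:
        if running >= cumulative_impt:
            # The higher-ranked features already cover the target share.
            dropped.append(name)
        running += value / total
    return dropped


# Hypothetical importance scores, for illustration only.
importances = {"age": 50.0, "income": 30.0, "city": 15.0, "id": 5.0}

print(low_importance(importances, 0.9))
```

Here "age", "income", and "city" together account for 95% of the total importance, so only "id" is reported as unnecessary for reaching the 90% target.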
Requirements
Install the dependencies listed in requirements.txt by running the following command in a shell:
pip install -r requirements.txt
Contact
Any questions can be directed to nbansal1_be18@thapar.edu.
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
Source Distribution
File details
Details for the file Feature_Analysis-0.0.3.tar.gz.
File metadata
- Download URL: Feature_Analysis-0.0.3.tar.gz
- Upload date:
- Size: 2.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 077b93e05110a22072347e60ddabd2879260c8b29aa59bbc14e19d890d327fe1
MD5 | 6459abc4f754201e9c1479dc66bc89e2
BLAKE2b-256 | e8c45b81a04771d8def088c9cdc362a3dc02c77677e4b69ff097fe2963c112cb