EDA on sparse data for classification problems
Sparse profile - EDA on sparse data
Module to perform exploratory data analysis (EDA) tasks for a classification problem with sparse data
Currently accepts only numeric values
Sample usage
import pandas as pd
import numpy as np
from sparse_profile import sparse_profile
df = pd.DataFrame({
    'target': [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    'col_1': [1, 0, 0, 0, 0, 0, 0, 0, 0, 9],
    'col_2': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
})
sProfile = sparse_profile(df, 'target')
print(sProfile.top_gain)
Output maximum gain obtained from each column
col_2 0.422810
col_1 0.074882
dtype: float64
print(sProfile.report_sparsity)
Output proportion of zeros in each column
col_1 0.8
col_2 0.1
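The sparsity figures above can be cross-checked with plain pandas. The following is a minimal sketch, not the package's implementation; it assumes that sparsity means the share of entries equal to zero, which agrees with the sample output, and that distinct counts ignore zeros.

```python
import pandas as pd

df = pd.DataFrame({
    'target': [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    'col_1': [1, 0, 0, 0, 0, 0, 0, 0, 0, 9],
    'col_2': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# Profile only the feature columns, not the target.
features = df.drop(columns='target')

# Share of zero entries per column (assumed definition of sparsity).
sparsity = (features == 0).mean()

# Zeros are masked to NaN, and nunique() skips NaN by default,
# so this counts distinct non-zero values per column.
distinct_non_zero = features[features != 0].nunique()

print(sparsity)
print(distinct_non_zero)
```

On the sample frame this gives 0.8 and 0.1 for the zero shares of col_1 and col_2, and 2 and 9 distinct non-zero values respectively.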
The reports are exposed as attributes of the sparse_profile object. Available attributes:
- report_sparsity: pandas dataframe, Proportion of zeros in each column
- report_distinct: pandas dataframe, Count of distinct non-zero values in each column
- report_overall: pandas dataframe, Overall summary of each column (similar to pandas describe())
- report_non_zero: pandas dataframe, Summary of each column after removing zeros
- gain_df: pandas dataframe, Relative information gain at decile cutoffs for each column with respect to the target column
- auc_df: pandas dataframe, AUC of each column with respect to the target column
- top_gain: pandas dataframe, Columns sorted by maximum gain obtained from gain_df
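To make the gain and AUC reports more concrete, here is a rough sketch of how such quantities can be computed by hand. It is not the package's code: the helper names (rank_auc, entropy, relative_gain_at_cutoff, decile_gains) are hypothetical, the gain is assumed to be the entropy reduction of a binary split divided by the entropy of the target, and the cutoffs are assumed to be the 10% to 90% quantiles. The package's exact conventions are not documented here, so these numbers need not match gain_df or top_gain.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'target': [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    'col_1': [1, 0, 0, 0, 0, 0, 0, 0, 0, 9],
    'col_2': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
})

def rank_auc(feature, target):
    """AUC of a feature used directly as a score (Mann-Whitney U form)."""
    ranks = feature.rank(method='average')  # ties share their average rank
    n_pos = int((target == 1).sum())
    n_neg = int((target == 0).sum())
    rank_sum_pos = ranks[target == 1].sum()
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def entropy(labels):
    """Shannon entropy in bits of the class distribution in labels."""
    p = labels.value_counts(normalize=True).to_numpy()
    return float(-(p * np.log2(p)).sum())

def relative_gain_at_cutoff(feature, target, cutoff):
    """Entropy reduction from splitting at feature <= cutoff, relative to base entropy."""
    base = entropy(target)
    left = target[feature <= cutoff]
    right = target[feature > cutoff]
    if base == 0 or len(left) == 0 or len(right) == 0:
        return 0.0  # nothing to gain from an empty split or a pure target
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(target)
    return (base - weighted) / base

def decile_gains(feature, target):
    """Relative gain at each distinct decile cutoff (assumed: 10% to 90% quantiles)."""
    cutoffs = feature.quantile(np.arange(1, 10) / 10).unique()
    return pd.Series({c: relative_gain_at_cutoff(feature, target, c) for c in cutoffs})

columns = ['col_1', 'col_2']
auc = pd.Series({c: rank_auc(df[c], df['target']) for c in columns})
max_gain = pd.Series(
    {c: decile_gains(df[c], df['target']).max() for c in columns}
).sort_values(ascending=False)

print(auc)
print(max_gain)
```

An AUC below 0.5, as col_2 produces on this sample, means that larger values of the column tend to go with class 0 rather than class 1. Sparse columns often share the same value at several deciles, which is why the sketch deduplicates cutoffs before scoring them.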