EDA on sparse data for classification problems
Sparse profile - EDA on sparse data
A module for exploratory data analysis (EDA) on classification problems with sparse data.
Currently accepts only numeric values.
Sample usage
import pandas as pd
import numpy as np
from sparse_profile import sparse_profile
df = pd.DataFrame({
    'target': [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    'col_1': [1, 0, 0, 0, 0, 0, 0, 0, 0, 9],
    'col_2': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
})
sProfile = sparse_profile(df, 'target')
print(sProfile.top_gain)
Output: maximum gain obtained from each column
col_2 0.422810
col_1 0.074882
dtype: float64
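The two gain values above can be reproduced with plain pandas and NumPy. The sketch below is not taken from the sparse_profile source; it assumes the gain is the information gain in nats (natural log, not divided by the target entropy) of a binary split `x <= cutoff`, evaluated at the nine decile cutoffs of each column, with the maximum kept per column. The helper names `entropy` and `max_gain_at_deciles` are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "target": [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    "col_1": [1, 0, 0, 0, 0, 0, 0, 0, 0, 9],
    "col_2": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
})


def entropy(labels: pd.Series) -> float:
    """Shannon entropy of a label column, in nats."""
    p = labels.value_counts(normalize=True).to_numpy()
    return float(-(p * np.log(p)).sum())


def max_gain_at_deciles(x: pd.Series, y: pd.Series) -> float:
    """Largest information gain over splits `x <= cutoff` at the decile cutoffs.

    Assumption: this mirrors what the package reports; it is a sketch, not its code.
    """
    base = entropy(y)
    best = 0.0
    for cutoff in np.quantile(x, np.arange(1, 10) / 10):
        left = x <= cutoff
        if left.all() or not left.any():
            continue  # a split with an empty side carries no information
        w = left.mean()  # share of rows falling on the left side
        gain = base - (w * entropy(y[left]) + (1 - w) * entropy(y[~left]))
        best = max(best, float(gain))
    return best


features = df.drop(columns="target")
top_gain = pd.Series(
    {col: max_gain_at_deciles(features[col], df["target"]) for col in features}
).sort_values(ascending=False)
print(top_gain)
```

On the sample frame this gives about 0.422810 for col_2 (best split at the 40% cutoff, which isolates the four rows with values 0 to 3) and about 0.074882 for col_1. Dividing each gain by the entropy of the target would give a relative version on a 0 to 1 scale.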
print(sProfile.report_sparsity)
Output: fraction of zeros in each column
col_1 0.8
col_2 0.1
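The sparsity figures above (0.8 for col_1, 0.1 for col_2) are the share of zero entries in each feature column. As a minimal cross-check that does not use sparse_profile at all, the same numbers fall out of a one-line pandas expression:

```python
import pandas as pd

df = pd.DataFrame({
    "target": [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    "col_1": [1, 0, 0, 0, 0, 0, 0, 0, 0, 9],
    "col_2": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# Leave the label column out, then average the boolean "is zero" mask per column.
features = df.drop(columns="target")
sparsity = (features == 0).mean()
print(sparsity)
```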
The reports are available as attributes of the sparse_profile object. All available attributes:
- report_sparsity: pandas DataFrame, fraction of zeros in each column (0.8 means 80% zeros)
- report_distinct: pandas DataFrame, count of distinct non-zero values in each column
- report_overall: pandas DataFrame, overall summary of each column (similar to pandas describe())
- report_non_zero: pandas DataFrame, summary of each column after removing zeros
- gain_df: pandas DataFrame, relative information gain at decile cutoffs for each column with respect to the target column
- auc_df: pandas DataFrame, AUC of each column with respect to the target column
- top_gain: pandas DataFrame, columns sorted by the maximum gain obtained from gain_df
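To make the distinct-count, non-zero summary, and AUC reports in the list above concrete, here is a rough pandas-only sketch of how such quantities could be computed. It is an illustration under assumptions, not the package's implementation: the helper `rank_auc` is hypothetical, it treats each column as a score for `target == 1`, and it uses the rank-based (Mann-Whitney) form of the AUC. Whether sparse_profile reports the raw AUC or a direction-adjusted one is not stated here.

```python
import pandas as pd

df = pd.DataFrame({
    "target": [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    "col_1": [1, 0, 0, 0, 0, 0, 0, 0, 0, 9],
    "col_2": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
})
target = df["target"]
features = df.drop(columns="target")

# Mask zeros as NaN so that pandas skips them in the summaries below.
non_zero = features.where(features != 0)

distinct = non_zero.nunique()            # distinct non-zero values per column
non_zero_summary = non_zero.describe()   # describe() computed on non-zero entries only


def rank_auc(x: pd.Series, y: pd.Series) -> float:
    """AUC of x as a score for y == 1, from the Mann-Whitney U statistic."""
    ranks = x.rank(method="average")     # tied values share their mean rank
    n_pos = int((y == 1).sum())
    n_neg = len(y) - n_pos
    u = ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2
    return float(u / (n_pos * n_neg))


auc = pd.Series({col: rank_auc(features[col], target) for col in features})
print(distinct)
print(auc)
```

On the sample frame this yields 2 and 9 distinct non-zero values for col_1 and col_2, and raw AUCs of 0.48 and 0.16. An AUC below 0.5 means larger values of the column go with `target == 0`, so col_2 is informative in the inverse direction.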
File details
Details for the file sparse_profile-0.1.1.tar.gz.
File metadata
- Download URL: sparse_profile-0.1.1.tar.gz
- Upload date:
- Size: 4.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 69c97adc38046be4f54a77554b5bd723ccc7e4a7c9e7a4830f6810b6af1e9ed2 |
| MD5 | b3df4fc4e91b01f9e159cbaf663cf15e |
| BLAKE2b-256 | 6382e099d4738d968132a483d09187734158998ead6a9bfda7f269b3ad81a787 |
File details
Details for the file sparse_profile-0.1.1-py3-none-any.whl.
File metadata
- Download URL: sparse_profile-0.1.1-py3-none-any.whl
- Upload date:
- Size: 5.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d92195cb0f936ff6770de53ac81f472c716a6f58fe2f9544bfe4d7e61a8a1b2d |
| MD5 | e97ccdbf8eacedc3ab5b9bf066889c58 |
| BLAKE2b-256 | 4eff35229ba56301a12b25cedd6f2afcd299082c040a2a20cbaf4f1ca0a829ff |