
EDA on sparse data for classification problems

Project description

Sparse profile - EDA on sparse data

Module to perform EDA tasks for a classification problem with sparse data.
Currently supports only numeric values.

Sample usage

import pandas as pd
import numpy as np
from sparse_profile import sparse_profile

# Toy data: a binary target plus two numeric feature columns
df = pd.DataFrame({
    'target': [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    'col_1':  [1, 0, 0, 0, 0, 0, 0, 0, 0, 9],
    'col_2':  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# Profile the feature columns against the 'target' column
sProfile = sparse_profile(df, 'target')
print(sProfile.top_gain)

Output: maximum gain obtained from each column

col_2    0.422810
col_1    0.074882
dtype: float64
print(sProfile.report_sparsity)

Output: share of zeros in each column (0.8 = 80%)

col_1  0.8
col_2  0.1
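To make report_sparsity and the related non-zero reports concrete, the sketch below recomputes comparable numbers with plain pandas. It is not the package's own code: it assumes the target column is left out of the profile and that sparsity means the share of entries equal to 0, which matches the 0.8 and 0.1 in the output above. The names `features`, `sparsity`, `distinct_non_zero` and `non_zero_summary` are illustrative only.

```python
import pandas as pd

# Same toy data as the sample usage
df = pd.DataFrame({
    'target': [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    'col_1':  [1, 0, 0, 0, 0, 0, 0, 0, 0, 9],
    'col_2':  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# Assumption: the target column is excluded from the profile
features = df.drop(columns='target')

# Share of zero entries per column (0.8 means 80% zeros), cf. report_sparsity
sparsity = (features == 0).mean()

# Number of distinct non-zero values per column, cf. report_distinct
distinct_non_zero = features.apply(lambda s: s[s != 0].nunique())

# describe() computed over the non-zero entries only, cf. report_non_zero
non_zero_summary = features.apply(lambda s: s[s != 0].describe())

print(sparsity)
print(distinct_non_zero)
print(non_zero_summary)
```

Filtering with `s[s != 0]` before calling `nunique()` or `describe()` keeps the dominant zeros from swamping the summary statistics, which is the point of a separate non-zero report for sparse columns.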

Each sparse_profile report is available as an attribute of the sparse_profile object. Available attributes:

  • report_sparsity: pandas DataFrame, share of zero values in each column
  • report_distinct: pandas DataFrame, count of distinct non-zero values in each column
  • report_overall: pandas DataFrame, overall summary of each column (similar to pandas describe())
  • report_non_zero: pandas DataFrame, summary of each column after removing zeros
  • gain_df: pandas DataFrame, relative information gain at decile cutoffs for each column with respect to the target column
  • auc_df: pandas DataFrame, AUC of each column with respect to the target column
  • top_gain: pandas Series, columns sorted by the maximum gain found in gain_df
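The gain and AUC reports can be approximated as follows. This is a sketch under stated assumptions, not the package's implementation: it assumes "relative information gain" means the entropy reduction of the 0/1 target from the split x <= c, divided by the target's entropy, evaluated at each decile cutoff c of the column. On the toy data it gives roughly 0.61 for col_2 and 0.11 for col_1 rather than the 0.422810 and 0.074882 in the sample output, so the package evidently defines or normalizes the gain differently (the ranking agrees). AUC is computed with the Mann-Whitney rank formula, which equals ROC AUC for a binary target. The helpers `entropy`, `relative_gain_at_deciles` and `auc` are hypothetical names.

```python
import numpy as np
import pandas as pd


def entropy(y: pd.Series) -> float:
    """Shannon entropy, in bits, of the class distribution of y."""
    p = y.value_counts(normalize=True).to_numpy()
    return float(-(p * np.log2(p)).sum())


def relative_gain_at_deciles(x: pd.Series, y: pd.Series) -> pd.Series:
    """Hypothetical helper: for each decile cutoff c of x, split rows into
    x <= c and x > c and return the entropy reduction divided by H(y)."""
    base = entropy(y)
    cutoffs = np.unique(x.quantile(np.linspace(0.1, 0.9, 9)))
    gains = {}
    for c in cutoffs:
        left = x <= c
        # A cutoff that leaves one side empty is not a split; skip it
        if left.all() or not left.any():
            continue
        w = left.mean()
        conditional = w * entropy(y[left]) + (1 - w) * entropy(y[~left])
        gains[c] = (base - conditional) / base if base > 0 else 0.0
    return pd.Series(gains, dtype=float)


def auc(x: pd.Series, y: pd.Series) -> float:
    """Hypothetical helper: ROC AUC of score x for a 0/1 target y via the
    Mann-Whitney rank formula (tied values get average ranks)."""
    ranks = x.rank()
    pos = y == 1
    n_pos, n_neg = int(pos.sum()), int((~pos).sum())
    return float((ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))


df = pd.DataFrame({
    'target': [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    'col_1':  [1, 0, 0, 0, 0, 0, 0, 0, 0, 9],
    'col_2':  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
})
y = df['target']
features = df.drop(columns='target')

# Rough analogues of gain_df, top_gain and auc_df
gain = {col: relative_gain_at_deciles(features[col], y) for col in features}
top_gain = pd.Series({col: g.max() for col, g in gain.items()}).sort_values(ascending=False)
auc_scores = pd.Series({col: auc(features[col], y) for col in features})

print(top_gain)
print(auc_scores)
```

Deduplicating the cutoffs with `np.unique` matters for sparse columns: in col_1 eight of the nine decile cutoffs are 0, so only two distinct splits are actually evaluated.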



Download files


Source Distribution

sparse_profile-0.1.1.tar.gz (4.9 kB)

Uploaded Source

Built Distribution

sparse_profile-0.1.1-py3-none-any.whl (5.0 kB)

Uploaded Python 3

File details

Details for the file sparse_profile-0.1.1.tar.gz.

File metadata

  • Download URL: sparse_profile-0.1.1.tar.gz
  • Upload date:
  • Size: 4.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4

File hashes

Hashes for sparse_profile-0.1.1.tar.gz
Algorithm    Hash digest
SHA256       69c97adc38046be4f54a77554b5bd723ccc7e4a7c9e7a4830f6810b6af1e9ed2
MD5          b3df4fc4e91b01f9e159cbaf663cf15e
BLAKE2b-256  6382e099d4738d968132a483d09187734158998ead6a9bfda7f269b3ad81a787


File details

Details for the file sparse_profile-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: sparse_profile-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 5.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4

File hashes

Hashes for sparse_profile-0.1.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       d92195cb0f936ff6770de53ac81f472c716a6f58fe2f9544bfe4d7e61a8a1b2d
MD5          e97ccdbf8eacedc3ab5b9bf066889c58
BLAKE2b-256  4eff35229ba56301a12b25cedd6f2afcd299082c040a2a20cbaf4f1ca0a829ff

