A package for generating EDA reports

Project description

This README covers the following sections:

  • Features
  • Installation
  • Usage
  • Important Note about handling null values
  • Input Parameters with descriptions
  • Example Usage
  • Screenshots
  • License

It also includes an example of removing or imputing null values before generating the EDA report, because null values in your dataset must be handled before the report is created.

EDAExcelReport

EDAExcelReport is a Python package for generating detailed exploratory data analysis (EDA) reports for datasets with a binary target variable. It writes the report as an Excel workbook containing statistics and formatted tables that show the distribution of each feature and its relationship with the target variable.

Features

  • Calculates frequency and distribution of feature values.
  • Computes target rate, percentage of total target, and lift for each feature value.
  • Automatically handles numeric and categorical data.
  • Generates Excel reports with well-formatted tables and conditional formatting.
  • Removes gridlines and adds borders for better readability.
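The per-value metrics listed above can be illustrated with a short pandas sketch. The exact formulas EDAExcelReport uses are not documented in this README, so the sketch assumes the common definitions: target rate is the share of rows with target = 1 within a feature value, percentage of total target is that value's share of all target = 1 rows, and lift is the value's target rate divided by the overall target rate. The helper `summarize_feature` is hypothetical and is not part of the package.

```python
import pandas as pd


def summarize_feature(df: pd.DataFrame, feature: str, target: str) -> pd.DataFrame:
    """Hypothetical helper: per-value frequency, target rate, % of total target and lift.

    Assumes `target` is binary (0/1). These definitions are assumptions,
    not necessarily the ones EDAExcelReport uses internally.
    """
    overall_rate = df[target].mean()    # overall share of rows with target == 1
    total_positives = df[target].sum()  # total number of rows with target == 1

    # Row count and number of positives for each feature value
    grouped = df.groupby(feature)[target].agg(count="size", positives="sum")
    grouped["pct_of_rows"] = grouped["count"] / len(df)
    grouped["target_rate"] = grouped["positives"] / grouped["count"]
    grouped["pct_of_total_target"] = grouped["positives"] / total_positives
    grouped["lift"] = grouped["target_rate"] / overall_rate
    return grouped.reset_index()


# Tiny illustrative dataset (not the credit data used in this README)
toy = pd.DataFrame({
    "CODE_GENDER": ["F", "F", "M", "M", "M", "F"],
    "target":      [1,   0,   1,   1,   0,   0],
})
print(summarize_feature(toy, "CODE_GENDER", "target"))
```

Here the overall target rate is 0.5, so a lift above 1 (M, about 1.33) marks a value with a higher-than-average target rate and a lift below 1 (F, about 0.67) marks a lower one.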

Installation

You can install the package via pip:

```shell
pip install EDAExcelReport
```

Usage

Import the class with:

```python
from EDAR.excel_report import EDAExcelReport
```

Example Usage

The example below loads a credit dataset with a binary `target` column, inspects its columns and null counts, and generates the report.

```python
import pandas as pd
from EDAR.excel_report import EDAExcelReport

# Load the credit dataset
df = pd.read_csv(r"tests\credit_data.csv")
df.columns
```

Output:

```
Index(['ID', 'CODE_GENDER', 'FLAG_OWN_CAR', 'FLAG_OWN_REALTY', 'CNT_CHILDREN',
       'AMT_INCOME_TOTAL', 'NAME_INCOME_TYPE', 'NAME_EDUCATION_TYPE',
       'NAME_FAMILY_STATUS', 'NAME_HOUSING_TYPE', 'DAYS_BIRTH',
       'DAYS_EMPLOYED', 'FLAG_MOBIL', 'FLAG_WORK_PHONE', 'FLAG_PHONE',
       'FLAG_EMAIL', 'OCCUPATION_TYPE', 'CNT_FAM_MEMBERS', 'target'],
      dtype='object')
```

```python
# Count null values per column
df.isna().sum()
```

Output:

```
ID                         0
CODE_GENDER                0
FLAG_OWN_CAR               0
FLAG_OWN_REALTY            0
CNT_CHILDREN               0
AMT_INCOME_TOTAL           0
NAME_INCOME_TYPE           0
NAME_EDUCATION_TYPE        0
NAME_FAMILY_STATUS         0
NAME_HOUSING_TYPE          0
DAYS_BIRTH                 0
DAYS_EMPLOYED              0
FLAG_MOBIL                 0
FLAG_WORK_PHONE            0
FLAG_PHONE                 0
FLAG_EMAIL                 0
OCCUPATION_TYPE        11323
CNT_FAM_MEMBERS            0
target                     0
dtype: int64
```

```python
# Exclude the ID column, OCCUPATION_TYPE (the only column with nulls above),
# and other columns not wanted in this report
ignore_feats = ["ID", "OCCUPATION_TYPE", "DAYS_BIRTH", "DAYS_EMPLOYED", "FLAG_MOBIL"]
report = EDAExcelReport(df, "target", r"tests\test_eda_report.xlsx", ignore_cols=ignore_feats)
```

Output:

```
Your EDA report is ready at tests\test_eda_report_20240610_153828.xlsx
```
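The output message shows that the saved file name differs from the `report_path` passed in: `tests\test_eda_report.xlsx` was saved as `tests\test_eda_report_20240610_153828.xlsx`, which looks like a `_YYYYMMDD_HHMMSS` timestamp inserted before the extension. Assuming that pattern, the sketch below reproduces it so you can predict or locate the generated file. `timestamped_path` is a hypothetical helper, not part of the package, and the package's own naming logic may differ.

```python
import os
from datetime import datetime


def timestamped_path(report_path, now=None):
    """Hypothetical helper: insert a _YYYYMMDD_HHMMSS timestamp before the file extension."""
    if now is None:
        now = datetime.now()
    root, ext = os.path.splitext(report_path)
    return f"{root}_{now:%Y%m%d_%H%M%S}{ext}"


print(timestamped_path(r"tests\test_eda_report.xlsx", datetime(2024, 6, 10, 15, 38, 28)))
# tests\test_eda_report_20240610_153828.xlsx
```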

Important Note

Make sure the columns you analyze are free of null values before passing your dataset to EDAExcelReport (in the example above, OCCUPATION_TYPE, the only column with nulls, is passed in `ignore_cols`). This matters because numeric features are bucketed during the analysis, and null values can interfere with bucket creation. Null values can also make the report inaccurate or misleading when you present it to stakeholders.
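The effect of nulls on bucketing can be seen in a small pandas sketch. It uses `pandas.qcut` purely as a stand-in for the package's own bucketing, which may work differently: rows whose value is null are assigned to no bucket, so the bucket counts no longer add up to the number of rows.

```python
import numpy as np
import pandas as pd

# Illustrative numeric feature with two missing values (not real credit data)
income = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, np.nan, np.nan])

# Split the non-null values into 4 equal-frequency buckets (stand-in for the package's bucketing)
buckets = pd.qcut(income, q=4)

# value_counts drops nulls by default, so the two missing rows are silently lost
counts = buckets.value_counts(sort=False)
print(counts)
print("rows:", len(income), "| rows in buckets:", counts.sum(), "| unbucketed nulls:", buckets.isna().sum())
```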

Example

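```python
# Remove or impute null values, e.g. by forward-filling from the previous row.
# fillna(method='ffill') is deprecated since pandas 2.1; ffill() replaces it.
df = df.ffill()
```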
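Forward-filling has a caveat: a null in the first row has no previous value, so it stays null. A more thorough alternative is sketched below, assuming that the median (for numeric columns) and a placeholder label such as "Unknown" (for categorical columns) are acceptable for your analysis. It fills every null and then checks that none remain. `impute_nulls` is a hypothetical helper, not part of EDAExcelReport.

```python
import numpy as np
import pandas as pd


def impute_nulls(df: pd.DataFrame, placeholder: str = "Unknown") -> pd.DataFrame:
    """Hypothetical helper: median-impute numeric columns, label missing categories."""
    out = df.copy()
    for col in out.columns:
        if out[col].isna().any():
            if pd.api.types.is_numeric_dtype(out[col]):
                out[col] = out[col].fillna(out[col].median())
            else:
                out[col] = out[col].fillna(placeholder)
    return out


raw = pd.DataFrame({
    "OCCUPATION_TYPE": [None, "Laborers", None, "Managers"],
    "AMT_INCOME_TOTAL": [np.nan, 100.0, 200.0, 300.0],
    "target": [0, 1, 0, 1],
})

print(raw.ffill().isna().sum().sum())  # prints 2: forward fill leaves the first-row nulls
clean = impute_nulls(raw)
assert clean.isna().sum().sum() == 0   # no nulls left before generating the report
print(clean)
```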

Input Parameters

EDAExcelReport

```python
class EDAExcelReport:
    def __init__(
        self,
        data,
        target,
        report_path,
        ignore_cols=None,
        cat_label_enco_thresh=0.05,
        num_min_samples_leaf=0.1,
        conditional_color='red',
    ):
        ...
```

  • `data`: The input DataFrame containing the dataset.
  • `target`: The name of the binary target column in the DataFrame.
  • `report_path`: The file path where the Excel report will be saved.
  • `ignore_cols` (optional): A list of column names to exclude from the analysis.
  • `cat_label_enco_thresh` (optional): Threshold for label encoding of categorical variables. Default: 0.05.
  • `num_min_samples_leaf` (optional): Minimum samples per leaf used when bucketing numeric features. Default: 0.1.
  • `conditional_color` (optional): The color used for conditional formatting in the report. Default: 'red'.
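Before creating the report, a quick check can list which columns still contain nulls once `ignore_cols` is taken into account. The sketch below assumes, as the credit example suggests, that columns passed in `ignore_cols` are excluded from the analysis and so do not need to be null-free. `columns_with_nulls` is a hypothetical helper, not part of the package.

```python
import pandas as pd


def columns_with_nulls(df, ignore_cols=None):
    """Hypothetical pre-flight check: null counts for columns outside ignore_cols."""
    ignore = set(ignore_cols or [])
    null_counts = df.isna().sum()
    return {col: int(n) for col, n in null_counts.items() if n > 0 and col not in ignore}


sample = pd.DataFrame({
    "ID": [1, 2, 3],
    "OCCUPATION_TYPE": [None, "Laborers", None],
    "CNT_CHILDREN": [0, None, 2],
    "target": [0, 1, 0],
})
print(columns_with_nulls(sample, ignore_cols=["ID", "OCCUPATION_TYPE"]))
# {'CNT_CHILDREN': 1}
```

An empty dict means every analyzed column is null-free; otherwise, impute or drop the listed columns first.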

You can download the exploratory data analysis Excel file generated for the credit data above here:

Download Excel File

Screenshots

Screenshot 1

Screenshot 2

Screenshot 3

Screenshot 4

License

This project is licensed under the MIT License.

Download files

Source Distribution

EDAExcelReport-0.1.8.tar.gz (11.2 kB)

Uploaded Source

Built Distribution

EDAExcelReport-0.1.8-py3-none-any.whl (18.9 kB)

Uploaded Python 3

File details

Details for the file EDAExcelReport-0.1.8.tar.gz.

File metadata

  • Download URL: EDAExcelReport-0.1.8.tar.gz
  • Size: 11.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.2

File hashes

Hashes for EDAExcelReport-0.1.8.tar.gz
  • SHA256: c677cb59f6af03fb65bf77619ec3fd87be02aa29477b169525bdc799479e54e7
  • MD5: 8da2b25d37a052df5d36608f4beb9556
  • BLAKE2b-256: 0b7238274e7a60e9fcdabb5c2b4a15535f5c2e697dd7c34226937791b22a5ec0

File details

Details for the file EDAExcelReport-0.1.8-py3-none-any.whl.

File hashes

Hashes for EDAExcelReport-0.1.8-py3-none-any.whl
  • SHA256: 991ca84682684ecbe38c732f4330f8df87f85047f73f7e745c4613bff9ccb7e8
  • MD5: eff0150f0b60d71b34a529e427f243d6
  • BLAKE2b-256: f227677aca010a6dbf5e2605349b1c1ac8429f3e58e798b5fd64210184ca6c84
