A simple library for exploratory data analysis

Project description

SimplyEDA

SimplyEDA is a Python library for simple exploratory data analysis tasks. It provides functions to compute outlier bounds, find special characters, calculate the Variance Inflation Factor (VIF), detect duplicates, and visualize continuous data using box plots.

Installation

You can install SimplyEDA via pip:

pip install SimplyEDA

Usage

Below are examples of how to use the functions provided by SimplyEDA.

Importing the Library

import SimplyEDA as eda
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 2, 4, 5, 6, 7, 8, 9, 10],
    'B': [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    'C': [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
    'D': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
})

remove_outlier

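This function computes the lower and upper outlier bounds for a column using the Interquartile Range (IQR) method. It returns the bounds rather than a filtered column, so you apply them to the data yourself.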

lower, upper = eda.remove_outlier(df['A'])
print(f"Lower bound: {lower}, Upper bound: {upper}")

Parameters:

  • col (pd.Series): The column for which to compute outlier bounds.
  • multiplier (float): The multiplier for the IQR to define outliers. Default is 1.5.

Returns:

  • tuple: Lower and upper range for outlier detection.
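
The IQR rule described above can be sketched in plain pandas. The helper name `iqr_bounds` is hypothetical, and this is an assumption about the method rather than SimplyEDA's actual source: it takes the quartiles, scales the IQR by the multiplier, and returns the two fences. The last lines show one way to apply such bounds to a column.

```python
import pandas as pd

def iqr_bounds(col: pd.Series, multiplier: float = 1.5):
    """Hypothetical helper: return (lower, upper) IQR fences for a numeric column."""
    q1 = col.quantile(0.25)
    q3 = col.quantile(0.75)
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr

a = pd.Series([1, 2, 2, 4, 5, 6, 7, 8, 9, 10], name="A")
lower, upper = iqr_bounds(a)
print(f"Lower bound: {lower}, Upper bound: {upper}")

# Keep only the values that fall inside the fences (inclusive).
filtered = a[a.between(lower, upper)]

# Alternatively, cap values at the fences instead of dropping rows.
capped = a.clip(lower=lower, upper=upper)
```

For this sample column every value lies inside the fences, so neither filtering nor capping changes the data.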

find_specialchar

This function finds special characters in a DataFrame.

eda.find_specialchar(df)

Parameters:

  • df (pd.DataFrame): The DataFrame to check.

Returns:

  • None
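
What counts as a "special character" is not specified above, so the following is only a sketch under a stated assumption: anything other than ASCII letters, digits, and whitespace is treated as special. The helper `special_char_counts` and the sample data are hypothetical and are not part of SimplyEDA.

```python
import pandas as pd

# Assumption: "special" means any character that is not a letter, digit, or whitespace.
SPECIAL = r"[^A-Za-z0-9\s]"

def special_char_counts(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: per non-numeric column, count cells containing a special character."""
    counts = {}
    for name in df.columns:
        if pd.api.types.is_numeric_dtype(df[name]):
            continue  # numeric columns cannot hold stray symbols
        as_text = df[name].astype(str)
        counts[name] = int(as_text.str.contains(SPECIAL, regex=True).sum())
    return pd.Series(counts, dtype="int64")

sample = pd.DataFrame({
    "name": ["ann", "b?b", "c#c"],
    "city": ["Oslo", "Rome", "N/A"],
    "n": [1, 2, 3],
})
counts = special_char_counts(sample)
print(counts)
```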

vif_cal

This function calculates the Variance Inflation Factor (VIF) for each feature in the DataFrame.

eda.vif_cal(df[['A', 'B', 'C']])

Parameters:

  • input_data (pd.DataFrame): The DataFrame for which to calculate VIF.

Returns:

  • None
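
For reference, the VIF of feature j is 1 / (1 - R_j²), where R_j² is obtained by regressing feature j on the remaining features. The sketch below computes it with NumPy least squares and an added intercept; `vif_table` is a hypothetical helper, not SimplyEDA's implementation, and its numbers may differ from tools that do not add an intercept. Note that in the sample DataFrame above, C equals B + 10 exactly, so those two columns are perfectly collinear and their VIF is unbounded; the sketch reports that case as infinity.

```python
import numpy as np
import pandas as pd

def vif_table(features: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: VIF per column, regressing each on the others plus an intercept."""
    values = features.to_numpy(dtype=float)
    result = {}
    for j, name in enumerate(features.columns):
        y = values[:, j]
        others = np.delete(values, j, axis=1)
        design = np.column_stack([np.ones(len(y)), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residual = y - design @ coef
        ss_res = float(residual @ residual)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        unexplained = ss_res / ss_tot  # this ratio equals 1 - R^2
        # A (numerically) perfect fit means perfect collinearity.
        result[name] = np.inf if unexplained < 1e-12 else 1.0 / unexplained
    return pd.Series(result, name="VIF")

data = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "x3": [5.0, 3.0, 6.0, 2.0, 1.0, 4.0],
})
vif = vif_table(data)
print(vif)
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic multicollinearity, though the threshold depends on the application.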

dups

This function shows a duplicate summary of a DataFrame.

eda.dups(df)

Parameters:

  • df (pd.DataFrame): The DataFrame to check for duplicates.

Returns:

  • None
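
If you need the duplicate rows themselves rather than a printed summary, plain pandas covers the common cases. This sketch uses only standard pandas calls and does not depend on how `dups` is implemented; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 2, 4, 5, 6, 7, 8, 9, 10],
    "D": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"],
})

# Number of rows that repeat an earlier row across all columns.
n_dup_rows = int(df.duplicated().sum())

# Every row whose value in column A occurs more than once.
dup_in_a = df[df["A"].duplicated(keep=False)]

# Drop repeats in column A, keeping the first occurrence.
deduped = df.drop_duplicates(subset="A", keep="first")

print(n_dup_rows)
print(dup_in_a)
```

Here no full row is duplicated because column D is unique, but column A on its own contains the value 2 twice.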

boxplt_continous

This function plots boxplots for all continuous features in the DataFrame.

eda.boxplt_continous(df)

Parameters:

  • df (pd.DataFrame): The DataFrame to plot.

Returns:

  • None
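
A comparable plot can be produced directly with matplotlib, which may help when you want control over layout or need to save the figure. This is a sketch under the assumption that "continuous" means numeric dtype; `boxplots_for_numeric` is a hypothetical helper, not the library's code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

def boxplots_for_numeric(df: pd.DataFrame):
    """Hypothetical helper: draw one box plot per numeric column, side by side."""
    numeric = df.select_dtypes(include="number")
    n = numeric.shape[1]
    fig, axes = plt.subplots(1, n, figsize=(4 * n, 4), squeeze=False)
    for ax, name in zip(axes[0], numeric.columns):
        ax.boxplot(numeric[name].dropna().to_numpy())
        ax.set_title(name)
    fig.tight_layout()
    return fig

df = pd.DataFrame({
    "A": [1, 2, 2, 4, 5, 6, 7, 8, 9, 10],
    "B": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    "C": [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
    "D": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"],
})
fig = boxplots_for_numeric(df)
# fig.savefig("boxplots.png")  # uncomment to write the figure to disk
```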

Example

Here's a complete example of using SimplyEDA with a sample DataFrame:

import SimplyEDA as eda
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 2, 4, 5, 6, 7, 8, 9, 10],
    'B': [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    'C': [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
    'D': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
})

# Remove outliers
lower, upper = eda.remove_outlier(df['A'])
print(f"Lower bound: {lower}, Upper bound: {upper}")

# Find special characters
eda.find_specialchar(df)

# Calculate VIF
eda.vif_cal(df[['A', 'B', 'C']])

# Detect duplicates
eda.dups(df)

# Plot boxplots for continuous features
eda.boxplt_continous(df)

enhance_summary

Provides an enhanced summary of a pandas DataFrame, including custom percentiles, IQR, outliers, duplicates, missing values, and skewness. It also handles both numerical and categorical variables.

summary = eda.enhance_summary(df, custom_percentiles=[5, 95])
print(summary)

Parameters:

  • dataframe (pd.DataFrame): The DataFrame to summarize.
  • custom_percentiles (list, optional): A list of custom percentiles to include in the summary.

Returns:

  • pd.DataFrame: DataFrame containing the enhanced summary statistics.
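
To make the statistics listed above concrete, here is a minimal pandas sketch that covers the numeric part of such a summary. `summary_sketch` is a hypothetical helper and not SimplyEDA's implementation. It assumes percentiles are given in percent (as in `[5, 95]`) and converts them to the fractions pandas expects; categorical columns are simply skipped.

```python
import pandas as pd

def summary_sketch(df: pd.DataFrame, custom_percentiles=(5, 95)) -> pd.DataFrame:
    """Hypothetical helper: describe() plus IQR, skewness, and missing counts for numeric columns."""
    numeric = df.select_dtypes(include="number")
    # pandas expects fractions in [0, 1], so convert percent values such as 5 and 95.
    fractions = [p / 100 for p in custom_percentiles]
    out = numeric.describe(percentiles=fractions).T
    out["iqr"] = numeric.quantile(0.75) - numeric.quantile(0.25)
    out["skew"] = numeric.skew()
    out["missing"] = numeric.isna().sum()
    return out

df = pd.DataFrame({
    "A": [1, 2, 2, 4, 5, 6, 7, 8, 9, 10],
    "B": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    "D": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"],
})
summary = summary_sketch(df)
print(summary)
```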

Example

Here's a complete example of using SimplyEDA with a sample DataFrame:

import SimplyEDA as eda
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 2, 4, 5, 6, 7, 8, 9, 10],
    'B': [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    'C': [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
    'D': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
})

# Remove outliers
lower, upper = eda.remove_outlier(df['A'])
print(f"Lower bound: {lower}, Upper bound: {upper}")

# Find special characters
eda.find_specialchar(df)

# Calculate VIF
vif = eda.vif_cal(df[['A', 'B', 'C']])
print(vif)

# Detect duplicates
eda.dups(df)

# Plot boxplots for continuous features
eda.boxplt_continous(df)

# Enhanced summary
summary = eda.enhance_summary(df, custom_percentiles=[5, 95])
print(summary)

Author

This project was created by M.R.Vijay Krishnan. You can reach me at vijaykrishnanmr@gmail.com.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

SimplyEDA-0.1.7.tar.gz (5.6 kB view details)

Uploaded Source

Built Distribution

SimplyEDA-0.1.7-py3-none-any.whl (4.8 kB view details)

Uploaded Python 3

File details

Details for the file SimplyEDA-0.1.7.tar.gz.

File metadata

  • Download URL: SimplyEDA-0.1.7.tar.gz
  • Upload date:
  • Size: 5.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.9.19

File hashes

Hashes for SimplyEDA-0.1.7.tar.gz
Algorithm Hash digest
SHA256 f7b693c7a4d1c79cdde43c65c05445977401d077cb1cd8038394c2f197ebb6eb
MD5 a5373330e061c067bbac796b81264f2a
BLAKE2b-256 33dc9c24bd46b3888ae6d98f632b770b40c49d2a69197ecaed6425ac86bd85e5

File details

Details for the file SimplyEDA-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: SimplyEDA-0.1.7-py3-none-any.whl
  • Upload date:
  • Size: 4.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.9.19

File hashes

Hashes for SimplyEDA-0.1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 050e71f1aee2503e85d4e999b6b6a7b2207f7e4b57beb964ff8aca7d8f955a54
MD5 52e0077eecf61fd1aef5d78396c1ec75
BLAKE2b-256 d89ed518481a4b33c1f7f43eaa191167b5dade1e98f75ae02c662c403890f2d8
