perform_eda

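The perform_eda function conducts Exploratory Data Analysis (EDA) on a given dataset. It prints summary information (dimensions, data types, summary statistics, missing values, duplicate rows) and draws plots of the distributions, correlations, and relationships in the data.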

Function Signature

perform_eda takes the dataset as its only argument:

  def perform_eda(data):

Usage

 perform_eda(data)

Functionality

The perform_eda function performs the following steps:

  1. Prints the dimensions of the dataset.
  2. Displays the data types of each column in the dataset.
  3. Provides summary statistics for the dataset.
  4. Checks for missing values and displays the count of null values for each column.
  5. Identifies duplicate rows in the dataset and prints the count of duplicate rows.
  6. Visualizes the distributions and relationships in the data:
    • For categorical variables, generates bar plots showing the value counts for each category.
    • For numeric variables, generates histograms, box plots, scatter plots (numeric vs. numeric), and kernel density plots.
  7. Displays a correlation matrix heatmap.
  8. Generates a pairwise scatter plot for numeric variables.
  9. If the dataset has more than one categorical variable, generates cross-tabulation bar plots to visualize the relationships between them.
  10. Displays a heatmap showing the locations of missing values in the dataset.
  11. If a target variable name is provided, calculates the correlation between each feature and the target variable and displays a bar plot of the feature correlations with the target variable.
  12. Detects outliers in numeric variables by calculating z-scores and identifying values that exceed a threshold of 3 standard deviations from the mean.
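
The z-score rule in step 12 can be sketched as follows. This is a minimal illustration, not the package's own code: the helper name zscore_outliers, the use of the population standard deviation (ddof=0), and the sample data are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def zscore_outliers(df: pd.DataFrame, threshold: float = 3.0) -> dict:
    """Map each numeric column to the values whose |z-score| exceeds threshold.

    Hypothetical helper for illustration; not part of perform_eda.
    """
    outliers = {}
    for col in df.select_dtypes(include=np.number).columns:
        values = df[col].dropna()
        std = values.std(ddof=0)  # assumption: population standard deviation
        if std == 0 or np.isnan(std):
            continue  # constant or empty column: z-scores are undefined
        z = (values - values.mean()) / std
        flagged = values[z.abs() > threshold]
        if not flagged.empty:
            outliers[col] = flagged.tolist()
    return outliers


# Twenty typical values plus one extreme value; the text column is ignored.
sample = pd.DataFrame({"x": [10.0] * 10 + [11.0] * 10 + [100.0],
                       "label": ["a"] * 21})
result = zscore_outliers(sample)
print(result)
```

A single extreme value can only reach a z-score of about the square root of n - 1, so with very few rows nothing exceeds the threshold of 3 and no outliers are reported.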

Note: The feature correlation analysis (step 11) runs only when a target variable name is set. Replace <target_variable_name> with the actual name of your target variable to enable it.
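
The feature-to-target correlation can be sketched like this. The helper target_correlations and the demo columns are hypothetical, and Pearson correlation is used only because it is the pandas default; the README does not say which method the package applies.

```python
import pandas as pd


def target_correlations(df: pd.DataFrame, target: str) -> pd.Series:
    """Correlation of every numeric feature with the target column.

    Hypothetical helper for illustration; not part of perform_eda.
    """
    numeric = df.select_dtypes(include="number")
    if target not in numeric.columns:
        raise ValueError(f"{target!r} must be a numeric column of the dataset")
    # Pearson correlation (pandas default); drop the target's self-correlation.
    corr = numeric.corr()[target].drop(target)
    return corr.sort_values(ascending=False)


demo = pd.DataFrame({
    "feature_up": [1, 2, 3, 4, 5],
    "feature_down": [5, 4, 3, 2, 1],
    "target": [2, 4, 6, 8, 10],
})
correlations = target_correlations(demo, "target")
print(correlations)
```

With matplotlib installed, correlations.plot(kind="bar") would draw a bar plot similar to the one described in step 11.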

Dependencies

The perform_eda function requires the following libraries:

  • pandas
  • numpy
  • matplotlib (matplotlib.pyplot)
  • seaborn
  • scipy (scipy.stats.ttest_ind)

Make sure to have these libraries installed in your Python environment before using the function.
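
One way to check the environment before calling the function is sketched below, using only the standard library. The helper missing_packages is hypothetical, and the top-level package names are assumed from the list above.

```python
from importlib.util import find_spec

# Top-level package names assumed from the dependency list above.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "scipy"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```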

Example

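import pandas as pd

# Assumption: perform_eda is importable from the installed package under this
# path; adjust the import if the package layout differs.
from perform_eda import perform_eda

# Load your dataset
data = pd.read_csv('your_dataset.csv')

# Perform EDA (prints the summaries and shows the plots)
perform_eda(data)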
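
To preview the text-based checks from steps 1 to 5 without a CSV file, the sketch below runs the equivalent pandas calls on a small in-memory DataFrame. The helper basic_summary and the toy data are made up for illustration and are not the package's implementation.

```python
import pandas as pd


def basic_summary(df: pd.DataFrame) -> dict:
    """Collect dimensions, data types, null counts, and duplicate count.

    Hypothetical helper for illustration; not part of perform_eda.
    """
    return {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "null_counts": df.isnull().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Toy data: one missing age, and the last row repeats the second row.
toy = pd.DataFrame({
    "age": [25, 32, None, 32],
    "city": ["Pune", "Delhi", "Pune", "Delhi"],
})

summary = basic_summary(toy)
for key, value in summary.items():
    print(f"{key}: {value}")

# Summary statistics for the numeric columns (step 3)
print(toy.describe())
```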

Source Distribution

perform_eda-0.1.1.tar.gz (3.2 kB)

Uploaded Source

File details

Details for the file perform_eda-0.1.1.tar.gz.

File metadata

  • Download URL: perform_eda-0.1.1.tar.gz
  • Upload date:
  • Size: 3.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for perform_eda-0.1.1.tar.gz
Algorithm      Hash digest
SHA256         02265e677f13b57a189a97203058fe257bb06f90e65ad81bc5696088b0991108
MD5            be905461bb23b3e40d423b7ee3da826c
BLAKE2b-256    8b8d9276794548c25406116b892f762d990c9087017b0b24fc1a1ac59cfa6649
