perform_eda
The perform_eda function runs Exploratory Data Analysis (EDA) on a dataset. It prints summary information and generates visualizations that help you understand the data.
Function Signature
def perform_eda(data):
Usage
perform_eda(data)
Functionality
The perform_eda function performs the following steps:
- Prints the dimensions of the dataset.
- Displays the data types of each column in the dataset.
- Provides summary statistics for the dataset.
- Checks for missing values and displays the count of null values for each column.
- Identifies duplicate rows in the dataset and prints the count of duplicate rows.
- Visualizes the distributions and relationships in the data:
  - For categorical variables, generates bar plots showing the value counts for each category.
  - For numeric variables, generates histograms, box plots, scatter plots (numeric vs. numeric), and kernel density plots.
  - Displays a correlation matrix heatmap.
  - Generates a pairwise scatter plot for numeric variables.
  - If there is more than one categorical variable, generates cross-tabulation bar plots to visualize the relationships between them.
- Displays a heatmap showing the locations of missing values in the dataset.
- If a target variable name is provided, calculates the correlation between each feature and the target variable and displays the results as a bar plot.
- Detects outliers in numeric variables by calculating z-scores and identifying values that exceed a threshold of 3 standard deviations from the mean.
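The non-graphical steps in the list above (dimensions, data types, summary statistics, null counts, duplicate rows, and z-score outliers) can be sketched roughly as follows. This is a minimal illustration and not the package's actual implementation: the helper name `summarize_frame`, its `z_threshold` parameter, and the returned dictionary layout are hypothetical, and using the population standard deviation (`ddof=0`) for the z-scores is an assumption.

```python
import pandas as pd


def summarize_frame(df: pd.DataFrame, z_threshold: float = 3.0) -> dict:
    """Hypothetical helper mirroring the non-graphical EDA steps."""
    numeric = df.select_dtypes(include="number")

    # z-score = (value - column mean) / column standard deviation.
    # ddof=0 (population std) is an assumption; NaN z-scores are never flagged.
    z_scores = (numeric - numeric.mean()) / numeric.std(ddof=0)
    outlier_mask = z_scores.abs() > z_threshold

    return {
        "shape": df.shape,                                # (rows, columns)
        "dtypes": df.dtypes.astype(str).to_dict(),        # data type per column
        "summary": df.describe(),                         # summary statistics
        "null_counts": df.isnull().sum().to_dict(),       # missing values per column
        "duplicate_rows": int(df.duplicated().sum()),     # rows repeating an earlier row
        "outlier_counts": outlier_mask.sum().to_dict(),   # flagged values per numeric column
    }


# Small made-up dataset: one extreme value, one missing value, many repeated rows.
demo = pd.DataFrame({
    "value": [10.0] * 19 + [500.0],
    "group": ["a"] * 10 + ["b"] * 10,
    "note": [None] + ["x"] * 19,
})
report = summarize_frame(demo)
print(report["shape"])
print(report["null_counts"])
print(report["duplicate_rows"])
print(report["outlier_counts"])
```

With a threshold of 3 and the population standard deviation, a single extreme value can only be flagged when the column has more than ten rows, which is why the demo frame uses twenty.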
Note: Replace <target_variable_name> with the actual name of your target variable to enable the feature correlation analysis.
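As a rough illustration of the feature correlation analysis, the sketch below computes the correlation of each numeric feature with a named target column. The helper name `target_correlations` and its `target` parameter are hypothetical, and the use of Pearson correlation restricted to numeric columns is an assumption; the documented signature does not show how the target name is actually passed.

```python
import pandas as pd


def target_correlations(df: pd.DataFrame, target: str) -> pd.Series:
    """Hypothetical helper: Pearson correlation of each numeric feature with `target`."""
    numeric = df.select_dtypes(include="number")
    if target not in numeric.columns:
        raise ValueError(f"target column {target!r} must be a numeric column of df")

    features = numeric.drop(columns=[target])
    # corrwith computes the column-wise correlation against the target Series.
    return features.corrwith(numeric[target]).sort_values(ascending=False)


# Made-up data: "hours" rises with the target, "errors" falls, "city" is non-numeric.
demo = pd.DataFrame({
    "hours": [1.0, 2.0, 3.0, 4.0, 5.0],
    "errors": [9.0, 7.0, 6.0, 3.0, 1.0],
    "city": ["a", "b", "a", "b", "a"],
    "score": [2.0, 4.0, 6.0, 8.0, 10.0],
})
print(target_correlations(demo, target="score"))
```

The result is a pandas Series, so calling `.plot.bar()` on it would draw a bar plot of the correlations, provided matplotlib is installed.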
Dependencies
The perform_eda function requires the following libraries:
- pandas
- numpy
- matplotlib (matplotlib.pyplot)
- seaborn
- scipy (scipy.stats.ttest_ind)
Make sure to have these libraries installed in your Python environment before using the function.
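One way to confirm that the libraries are available before calling the function is a quick check with the standard library. This is only a sketch: the helper name `missing_modules` is hypothetical, and the list uses the top-level import names of the libraries above.

```python
from importlib.util import find_spec

# Top-level import names for the libraries listed above.
REQUIRED_MODULES = ["pandas", "numpy", "matplotlib", "seaborn", "scipy"]


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are available.")
```

Any library reported as missing can be installed with pip under the same name.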
Example
import pandas as pd
# Load your dataset
data = pd.read_csv('your_dataset.csv')
# Perform EDA (perform_eda must already be imported or defined in this scope)
perform_eda(data)