An interactive cheat sheet, AI-powered guide for exploratory data analysis (EDA), and tools for data visualization, cleaning and feature engineering.

🧠 pyedahelper - Simplify Your Exploratory Data Analysis (EDA)

pyedahelper is an educational and practical Python library designed to make Exploratory Data Analysis (EDA) simple, guided, and fast, especially for data analysts, students, and early-career data scientists who want to spend more time analyzing data and less time remembering syntax.

It is lightweight and helps you work faster with guided suggestions, ready-to-use utilities, and clean visualizations.

🌟 Key Features:

  • ⚡ A smart EDA cheat sheet (interactive and collapsible).
  • 💬 An AI-guided EDA assistant that suggests the next logical step (e.g., "View top rows with df.head()").
  • 🧩 A suite of data tools for real-world EDA tasks (loading, cleaning, feature engineering, visualization, and summaries).
  • 💬 Handy code hints and examples you can copy directly into your notebook.

Why pyedahelper?

Performing EDA often means recalling a large amount of syntax. This reinforces the idea that good data professionals are those who know Python syntax by heart, rather than those who can accurately interpret the output of each EDA step. More importantly, data analysts spend more than 80% of their analytics time on iterative EDA, and some of those hours go to checking documentation and Googling.

pyedahelper solves this by combining ready-to-use functions for your data workflow with an AI-powered guide and inline learning, so you can see, learn, and apply the same steps.

✨ What Problem Does pyedahelper Solve?

Exploratory Data Analysis (EDA) is essential, but repetitive.

Across projects, users repeatedly:

  • Forget basic pandas syntax (df.info(), df.describe(), df.groupby())

  • Run the same plots without understanding what matters

  • Miss data issues that affect modeling readiness

  • Lose time recalling workflows rather than reasoning about data

pyedahelper addresses this by guiding users through EDA as a logical process, not a memory test.

⚙️ Installation

pip install pyedahelper

Upgrade

pip install --upgrade pyedahelper

🚀 Quick Start

import edahelper as eda
import pandas as pd

# Load your dataset
df = pd.read_csv("data.csv")

# 📚 Display the interactive EDA cheat-sheet
eda.show()         # for experienced analysts, or
eda.core.show()    # for total newbies

# Start guided suggestions
eda.next("read_csv")   # Suggests: "View first rows with df.head()"

# 💡 View an example command with a short explanation
eda.core.example("describe")

From there, the assistant automatically continues:

df.head() → df.columns → df.shape → df.info() → df.describe() → ...

If you want to skip a suggestion, simply type "Next".
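
The chain above maps onto plain pandas calls. As a minimal sketch, here is each step run on a small in-memory CSV. The data and column names are made up for illustration, and nothing here calls pyedahelper itself.

```python
import io

import pandas as pd

# Small in-memory CSV standing in for "data.csv" (illustrative data only).
csv_text = """name,age,income
Ada,34,52000
Ben,,48000
Cy,29,
Dee,41,61000
"""
df = pd.read_csv(io.StringIO(csv_text))

first_rows = df.head()            # first rows (up to 5)
column_names = list(df.columns)   # column labels
n_rows, n_cols = df.shape         # (rows, columns)
df.info()                         # dtypes and non-null counts, printed to stdout
summary = df.describe()           # summary statistics for numeric columns
missing = df.isnull().sum()       # missing values per column

print(column_names)
print(n_rows, n_cols)
print(missing)
```

Each variable holds the result of one step, so you can inspect them one at a time in a notebook.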

Modules Overview

1️⃣ EDA Guidance (AI Suggestion System)

The next() method in pyedahelper provides contextual next-step suggestions for your data analysis workflow.

Instead of remembering long commands, simply call:

eda.next("read_csv")

...and it will suggest the next logical step in your EDA, cleaning, visualization, or modeling process.

Below is a list of common helper keywords and what next() will suggest for each stage of analysis:

🔹 Basic EDA

| Keyword    | Suggestion                                                         |
| ---------- | ------------------------------------------------------------------ |
| `read_csv` | View first rows with `df.head()`                                   |
| `head`     | Check column names with `df.columns`                               |
| `columns`  | See shape (rows, columns) using `df.shape`                         |
| `shape`    | Get column data types with `df.info()`                             |
| `info`     | Summarize numeric data with `df.describe()`                        |
| `describe` | Check for missing values using `df.isnull().sum()`                 |
| `isnull`   | Get total missing values count using `df.isnull().sum()`           |
| `sum`      | Fill missing values using `df.fillna()` or drop with `df.dropna()` |
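
One way to picture a keyword-to-suggestion table like this is as a dictionary lookup. The sketch below is a hypothetical illustration, not pyedahelper's actual implementation: the names `SUGGESTIONS`, `NEXT_KEYWORD`, `suggest`, and `walk` are assumptions, and the suggestion strings are copied from the table above.

```python
from typing import Dict, List

# Keyword -> suggestion pairs copied from the Basic EDA table.
SUGGESTIONS: Dict[str, str] = {
    "read_csv": "View first rows with df.head()",
    "head": "Check column names with df.columns",
    "columns": "See shape (rows, columns) using df.shape",
    "shape": "Get column data types with df.info()",
    "info": "Summarize numeric data with df.describe()",
    "describe": "Check for missing values using df.isnull().sum()",
}

# Hypothetical: the keyword that follows each keyword in the chain.
NEXT_KEYWORD: Dict[str, str] = {
    "read_csv": "head",
    "head": "columns",
    "columns": "shape",
    "shape": "info",
    "info": "describe",
}


def suggest(keyword: str) -> str:
    """Return the suggestion for a keyword, or a fallback message."""
    return SUGGESTIONS.get(keyword, f"No suggestion recorded for '{keyword}'")


def walk(start: str) -> List[str]:
    """Follow the chain from `start` and collect the keywords visited."""
    path = [start]
    while path[-1] in NEXT_KEYWORD:
        path.append(NEXT_KEYWORD[path[-1]])
    return path


print(suggest("read_csv"))
print(" -> ".join(walk("read_csv")))
```

Keeping the suggestions and the ordering in separate dictionaries lets either one change without touching the other.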

🔹 Missing Values Handling

| Keyword            | Suggestion                                                                  |
| ------------------ | --------------------------------------------------------------------------- |
| `fillna`           | Try filling missing values by data type: numeric, categorical, or datetime. |
| `fill_numeric`     | Fill numeric NaNs with `df['col'].fillna(df['col'].mean())`                 |
| `fill_categorical` | Fill categorical NaNs with `df['col'].fillna(df['col'].mode()[0])`          |
| `fill_datetime`    | Fill datetime NaNs with `df['col'].fillna(df['col'].median())`              |
| `dropna`           | Drop missing rows using `df.dropna()` if too many missing values exist.     |
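
As a minimal sketch of the per-type fills listed above, the example below applies them to a toy DataFrame. The column names and values are illustrative assumptions. Whether mean, mode, or median is the right choice depends on your data, so treat these as starting points.

```python
import pandas as pd

# Toy data with one missing value per column (illustrative only).
df = pd.DataFrame({
    "age": [10.0, None, 30.0, 20.0],
    "city": ["a", "b", None, "a"],
    "joined": pd.to_datetime(["2024-01-01", None, "2024-01-03", "2024-01-05"]),
})

# Numeric: fill with the column mean.
df["age"] = df["age"].fillna(df["age"].mean())

# Categorical: fill with the most frequent value.
df["city"] = df["city"].fillna(df["city"].mode()[0])

# Datetime: fill with the median timestamp.
df["joined"] = df["joined"].fillna(df["joined"].median())

print(df)
print(df.isnull().sum())
```

Assigning the result back to the column avoids relying on in-place modification of a column selection.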

🔹 Data Cleaning

| Keyword           | Suggestion                                                |
| ----------------- | --------------------------------------------------------- |
| `duplicated`      | Check for duplicate rows using `df.duplicated().sum()`    |
| `drop_duplicates` | Remove duplicates with `df.drop_duplicates(inplace=True)` |
| `replace`         | Replace wrong entries with `df.replace({'old':'new'})`    |
| `astype`          | Convert columns to proper data types using `df.astype()`  |
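
The cleaning steps in this table can be chained on a small example. This is a sketch under assumed data: the `city` and `visits` columns are hypothetical. It reassigns results instead of using `inplace=True`, which gives the same outcome here.

```python
import pandas as pd

# Toy data: one exact duplicate row, one mis-cased entry, numbers stored as text.
df = pd.DataFrame({
    "city": ["Lagos", "Lagos", "Abuja", "lagos"],
    "visits": ["3", "3", "5", "2"],
})

# Count rows that are identical to an earlier row.
n_duplicates = int(df.duplicated().sum())

# Remove the duplicates and renumber the index.
df = df.drop_duplicates().reset_index(drop=True)

# Replace a wrong entry in one column.
df["city"] = df["city"].replace({"lagos": "Lagos"})

# Convert the text column to integers.
df = df.astype({"visits": "int64"})

print(n_duplicates)
print(df)
print(df.dtypes)
```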

🔹 Visualization

| Keyword             | Suggestion                                                                                      |
| ------------------- | ----------------------------------------------------------------------------------------------- |
| `plot_distribution` | Plot column distributions using `sns.histplot(df['col'])`                                       |
| `plot_correlation`  | Visualize correlations using `sns.heatmap(df.corr())`                                           |
| `scatterplot`       | Scatter two numeric variables using `sns.scatterplot(x, y, data=df)`                            |
| `cat_num_plot`      | Use `sns.boxplot(x='Category', y='Value', data=df)` for categorical-numerical plots.            |
| `cat_cat_plot`      | Use `sns.countplot(x='Category1', hue='Category2', data=df)` for categorical-categorical plots. |
| `num_num_plot`      | Use `sns.jointplot(x='X', y='Y', data=df)` for numerical-numerical relationships.               |
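
A practical note on the correlation suggestion: in pandas 2.0 and later, `df.corr()` raises an error when the DataFrame contains non-numeric columns, so it helps to select numeric columns first. Similarly, recent seaborn releases expect `x` and `y` as keyword arguments, as in `sns.scatterplot(data=df, x="x", y="y")`. The sketch below covers only the pandas part, on made-up data, and leaves the plotting call as a comment.

```python
import pandas as pd

# Toy data with one non-numeric column (illustrative only).
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
    "z": [4.0, 3.0, 2.0, 1.0],
    "group": ["a", "b", "a", "b"],
})

# Keep only numeric columns before computing correlations.
numeric = df.select_dtypes(include="number")
corr = numeric.corr()

print(corr.round(2))

# The matrix can then be passed to a heatmap, for example:
#   import seaborn as sns
#   sns.heatmap(corr, annot=True)
```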

🔹 Feature Engineering

| Keyword         | Suggestion                                                              |
| --------------- | ----------------------------------------------------------------------- |
| `label_encode`  | Label encode with `LabelEncoder()` for categorical columns.             |
| `onehot_encode` | Use `pd.get_dummies(df, columns=['col'])` for one-hot encoding.         |
| `scale_numeric` | Standardize numerical features using `StandardScaler().fit_transform()` |
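
To show what these three transformations do to the data, here is a pandas-only sketch on a toy DataFrame. Note the swap: it uses category codes in place of `LabelEncoder()` and a manual z-score in place of `StandardScaler()`, so it illustrates the idea instead of the scikit-learn calls named in the table. Column names and values are assumptions.

```python
import pandas as pd

# Toy data (illustrative only).
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "height": [150.0, 160.0, 170.0, 180.0],
})

# Label encoding: map each category to an integer code
# (categories are sorted, so blue=0, green=1, red=2).
color_codes = df["color"].astype("category").cat.codes

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(df, columns=["color"])

# Standardization: subtract the mean and divide by the population
# standard deviation (ddof=0).
height_scaled = (df["height"] - df["height"].mean()) / df["height"].std(ddof=0)

print(color_codes.tolist())
print(one_hot.columns.tolist())
print(height_scaled.round(3).tolist())
```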

🔹 Modeling

| Keyword                 | Suggestion                                                                |
| ----------------------- | ------------------------------------------------------------------------- |
| `train_test_split`      | Split data using `train_test_split(X, y, test_size=0.2, random_state=42)` |
| `fit_model`             | Train a model like `LogisticRegression().fit(X_train, y_train)`           |
| `predict`               | Predict outcomes with `model.predict(X_test)`                             |
| `classification_report` | Evaluate performance using `classification_report(y_test, y_pred)`        |
| `confusion_matrix`      | Plot confusion matrix with `sns.heatmap(confusion_matrix(...))`           |
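
Put together, the modeling suggestions form a short end-to-end script. The sketch below runs them with scikit-learn on a tiny synthetic dataset. The data is invented for illustration, and a real workflow would use your own features and a larger sample.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Toy data (illustrative only): the label is 1 when the feature is >= 5.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = (X.ravel() >= 5).astype(int)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a classifier and predict on the held-out rows.
model = LogisticRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate: per-class metrics and a 2x2 confusion matrix.
print(classification_report(y_test, y_pred, zero_division=0))
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)
```

Passing `labels=[0, 1]` keeps the confusion matrix 2x2 even if the small test split happens to contain only one class.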

This feature helps beginners and professionals alike stay productive and focused on insights rather than remembering syntax.

5️⃣ Visualization Module

Functions for exploring and visualizing data quickly.

from edahelper import visualization as vis

vis.plot_correlation(df)
vis.plot_distribution(df, "Age")
vis.scatter(df, "Age", "Income", hue="Gender")

🎨 Uses matplotlib and seaborn under the hood for fast, clean plots.

📘 The Interactive Cheat-Sheet

When you forget the syntax, simply call:

eda.show()

✨ Displays a colorful guide grouped by topic:

  • Data Loading
  • Overview
  • Missing Values
  • Indexing & Grouping
  • Visualization
  • Feature Engineering
  • NumPy & sklearn tips

🧑🏽‍💻 Example Workflow

import pandas as pd
import edahelper as eda
from edahelper import inspect

df = pd.read_csv("data.csv")

eda.next("read_csv")
df.head()

eda.next("head")
df.columns

eda.next("columns")
df.shape

inspect(df)

📦 Project Structure

edahelper/
│
├── __init__.py
├── core.py        # examples, topics, hints
├── show.py        # display utilities
├── nextstep.py    # guided workflow engine
├── inspector.py   # decision-oriented EDA checks

🛠 Requirements

  • Python 3.8+
  • pandas
  • numpy
  • seaborn
  • scikit-learn
  • matplotlib
  • rich (for colored terminal output)

🧾 License

MIT License © 2025 Chidiebere V. Christopher

Feel free to fork, contribute, or use it in your analytics workflow!

🌟 Contributing

We welcome contributions: bug fixes, new EDA tools, or notebook examples.

  1. Fork the repo
  2. Create your feature branch (git checkout -b feature-name)
  3. Commit your changes
  4. Push and open a Pull Request 🎉

🔗 Links

  • 📦 PyPI: https://pypi.org/project/pyedahelper/
  • 💻 GitHub: https://github.com/93Chidiebere/pyedahelper-Python-EDA-Helper
  • ✉️ Author: Chidiebere V. Christopher

🚀 Learn. Explore. Analyze. Faster. pyedahelper: Stop remembering syntax. Start reasoning about data.
