A lightweight library for common exploratory data analysis (EDA) tasks
🚀 edazer
edazer is a lightweight Python package designed to accelerate exploratory data analysis (EDA) workflows. It provides simple, intuitive, and consistent APIs to inspect, summarize, and understand datasets—supporting both pandas and polars backends.
Instead of rewriting repetitive EDA code for every project, edazer helps you get insights in just a few lines.
📓 Kaggle Tutorial
👉 Quick hands-on guide:
https://www.kaggle.com/code/adarsh79x/edazer-for-quick-eda-pandas-polars-profiling
✨ What’s New in v0.2.0
- Improved backend abstraction for pandas & polars
- Cleaner API for dtype-based column selection
- Enhanced unique value inspection
- Better handling of edge cases (non-hashable columns, dtype normalization)
- Internal performance and structure improvements
🎯 Use Cases
- ⚡ Quick dataset understanding
- 📊 Early-stage data exploration
- 📓 Jupyter notebook workflows
- 🔍 Identifying data quality issues
- 🧠 Feature understanding before modeling
🔧 Features
📌 DataFrame Summary
Get a complete overview in one call:
- Schema / info
- Descriptive statistics
- Null percentages
- Duplicate count
- Unique values
- Shape
dz.summarize_df()
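To make the listed components concrete, here is a minimal, backend-free sketch of how shape, null percentages, and duplicate count can be computed. It is not edazer's implementation: `summarize_rows` is a hypothetical helper, the data is a plain list of dicts rather than a pandas or polars DataFrame, and only `None` is treated as null.

```python
from collections import Counter


def summarize_rows(rows):
    """Compute shape, null percentages, and duplicate count for a list of dict rows.

    Hypothetical helper for illustration only; not part of edazer.
    Assumes every row has the same keys and that all values are hashable.
    """
    columns = list(rows[0].keys()) if rows else []
    n_rows = len(rows)

    # Percentage of None values per column.
    null_pct = {}
    if n_rows:
        for col in columns:
            n_null = sum(row[col] is None for row in rows)
            null_pct[col] = 100.0 * n_null / n_rows

    # A row counts as a duplicate if an identical row appeared earlier.
    counts = Counter(tuple(row[col] for col in columns) for row in rows)
    duplicates = sum(c - 1 for c in counts.values())

    return {
        "shape": (n_rows, len(columns)),
        "null_pct": null_pct,
        "duplicates": duplicates,
    }


rows = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": None},
    {"id": 2, "city": None},
    {"id": 3, "city": "Rome"},
]
summary = summarize_rows(rows)
print(summary)
```

In this toy dataset, one of the four rows repeats an earlier row and half of the `city` values are missing, which is the kind of signal the summary is meant to surface early.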
🔍 Smart Data Inspection
dz.lookup("head") # first rows
dz.lookup("tail") # last rows
dz.lookup("sample") # random sample
🧩 Unique Value Exploration
dz.show_unique_values(
    column_names=["col1", "col2"],
    max_unique=10
)
- Automatically skips noisy columns
- Suggests when to increase threshold
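The thresholding idea can be sketched in plain Python. This is an illustration under assumptions, not edazer's code: `unique_values` is a hypothetical function, and it assumes that a column is "noisy" when it has more than `max_unique` distinct values. Values are compared with `==` instead of being hashed, so list-valued columns do not raise an error.

```python
def unique_values(rows, column_names, max_unique=10):
    """Return ({column: distinct values}, skipped columns) for a list of dict rows.

    Hypothetical helper for illustration only; not part of edazer.
    """
    shown, skipped = {}, []
    for col in column_names:
        seen = []
        for row in rows:
            value = row[col]
            # Membership test uses equality, so unhashable values (lists, dicts) work.
            if value not in seen:
                seen.append(value)
            # Stop scanning as soon as the column exceeds the threshold.
            if len(seen) > max_unique:
                break
        if len(seen) <= max_unique:
            shown[col] = seen
        else:
            skipped.append(col)
    return shown, skipped


rows = [
    {"sex": "male", "ticket": "A1", "tags": ["x"]},
    {"sex": "female", "ticket": "A2", "tags": ["x"]},
    {"sex": "male", "ticket": "A3", "tags": ["y"]},
]
shown, skipped = unique_values(rows, ["sex", "ticket", "tags"], max_unique=2)
print(shown)
print("skipped (consider raising max_unique):", skipped)
```

Here `ticket` has three distinct values and is skipped at `max_unique=2`, while the list-valued `tags` column is still reported.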
🧠 Dtype-Based Column Selection
dz.cols_with_dtype(["float", "int"])
Options:
- exact=True → strict dtype match (float64)
- return_dtype_map=True → returns {column: dtype}
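The difference between loose and strict matching can be illustrated with a small sketch. This assumes that non-exact matching means "dtype name starts with the requested string", which the source does not confirm. `cols_by_dtype` and the `schema` mapping are hypothetical stand-ins, not edazer's API.

```python
def cols_by_dtype(schema, dtypes, exact=False, return_dtype_map=False):
    """Select columns from a {column: dtype string} mapping.

    Hypothetical helper for illustration only; not part of edazer.
    Assumption: loose matching is a case-insensitive prefix match.
    """
    wanted = [d.lower() for d in dtypes]
    matched = {}
    for col, dtype in schema.items():
        name = str(dtype).lower()
        if exact:
            # Strict: the full dtype name must be requested, e.g. "float64".
            hit = name in wanted
        else:
            # Loose: "float" matches "float32" and "float64".
            hit = any(name.startswith(w) for w in wanted)
        if hit:
            matched[col] = str(dtype)
    return matched if return_dtype_map else list(matched)


schema = {"age": "float64", "fare": "float32", "pclass": "int64", "sex": "object"}

loose = cols_by_dtype(schema, ["float"])
strict = cols_by_dtype(schema, ["float64"], exact=True)
mapping = cols_by_dtype(schema, ["int"], return_dtype_map=True)

print(loose)
print(strict)
print(mapping)
```

A prefix match is deliberately crude: in this sketch `"int"` would also match a dtype named `interval`, which is one reason a strict mode is useful.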
🔑 Primary Key Detection
from edazer import get_primary_key
get_primary_key(df, threshold=0.9, n_combos=2)
Find:
- Single-column unique identifiers
- Multi-column composite keys
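One way to search for single-column and composite keys is to test column combinations for uniqueness. The sketch below assumes `threshold` is the minimum ratio of distinct value combinations to row count and `n_combos` is the largest number of columns combined. The source does not spell out either meaning, so treat both as assumptions. `candidate_keys` is a hypothetical function, not `get_primary_key` itself.

```python
from itertools import combinations


def candidate_keys(rows, threshold=0.9, n_combos=2):
    """Return column combinations whose uniqueness ratio meets the threshold.

    Hypothetical helper for illustration only; not part of edazer.
    Assumes every row has the same keys and that all values are hashable.
    """
    if not rows:
        return []
    columns = list(rows[0].keys())
    n_rows = len(rows)
    keys = []
    for size in range(1, n_combos + 1):
        for combo in combinations(columns, size):
            # Skip combinations that merely extend an already accepted key.
            if any(set(key) <= set(combo) for key in keys):
                continue
            distinct = {tuple(row[c] for c in combo) for row in rows}
            if len(distinct) / n_rows >= threshold:
                keys.append(list(combo))
    return keys


rows = [
    {"order_id": 1, "customer": "a", "item": "pen"},
    {"order_id": 2, "customer": "a", "item": "ink"},
    {"order_id": 3, "customer": "b", "item": "pen"},
    {"order_id": 4, "customer": "b", "item": "ink"},
]
keys = candidate_keys(rows, threshold=1.0, n_combos=2)
print(keys)
```

Neither `customer` nor `item` is unique on its own, but together they identify every row, so the pair is reported as a composite key alongside `order_id`. The number of combinations grows quickly with `n_combos`, so small values are advisable on wide tables.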
📊 Data Profiling (Optional)
from edazer.profiling import show_data_profile
show_data_profile(dz)
Powered by ydata-profiling.
🖱️ Interactive Tables
from edazer import interactive_df
interactive_df()
Enables rich DataFrame viewing using itables.
📦 Installation
pip install edazer==0.2.0
⚡ Quick Start
import seaborn as sns
from edazer import Edazer
# Load dataset
df = sns.load_dataset("titanic")
# Initialize
dz = Edazer(df, backend="pandas")
# Summary
dz.summarize_df()
# Unique values
dz.show_unique_values(column_names=["sex", "class"])
# Dtype filtering
print(dz.cols_with_dtype(["float"]))
# Inspect data
dz.lookup("head")
📘 API Reference
Edazer(df, backend="pandas")
Create an analyzer instance.
- df: pd.DataFrame or pl.DataFrame
- backend: "pandas" or "polars"
summarize_df()
Displays:
- Schema/info
- Descriptive stats
- Null/duplicate counts
- Unique values
- Shape
show_unique_values(column_names, max_unique=10)
- column_names: list of columns
- max_unique: max values to display
cols_with_dtype(dtypes=None, exact=False, return_dtype_map=False)
- dtypes: list of dtype strings
- exact: strict match
- return_dtype_map: return dict instead of list
lookup(option="head")
- "head" → first rows
- "tail" → last rows
- "sample" → random rows
get_primary_key(df, threshold=0.9, n_combos=1, valid_column_dtypes=None)
Detect candidate keys.
Returns:
List[str] or List[List[str]]
📊 Example Output
dz.show_unique_values(
    column_names=dz.cols_with_dtype(["object"])
)
sex: ['male', 'female']
embarked: ['S', 'C', 'Q', nan]
class: ['Third', 'First', 'Second']
🤝 Contributing
Contributions are welcome!
GitHub: https://github.com/adarsh-79/edazer
📄 License
MIT License
👨‍💻 Author