
A lightweight library that provides functionality for common EDA tasks.


🚀 edazer

edazer is a lightweight Python package designed to accelerate exploratory data analysis (EDA) workflows. It provides simple, intuitive, and consistent APIs to inspect, summarize, and understand datasets, and it supports both pandas and polars backends.

Instead of rewriting repetitive EDA code for every project, edazer helps you get insights in just a few lines.


📓 Kaggle Tutorial

👉 Quick hands-on guide:
https://www.kaggle.com/code/adarsh79x/edazer-for-quick-eda-pandas-polars-profiling


✨ What’s New in v0.2.0

  • Improved backend abstraction for pandas & polars
  • Cleaner API for dtype-based column selection
  • Enhanced unique value inspection
  • Better handling of edge cases (non-hashable columns, dtype normalization)
  • Internal performance and structure improvements

🎯 Use Cases

  • ⚡ Quick dataset understanding
  • 📊 Early-stage data exploration
  • 📓 Jupyter notebook workflows
  • 🔍 Identifying data quality issues
  • 🧠 Feature understanding before modeling

🔧 Features

📌 DataFrame Summary

Get a complete overview in one call:

  • Schema / info
  • Descriptive statistics
  • Null percentages
  • Duplicate count
  • Unique values
  • Shape

dz.summarize_df()
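
To make the summary items concrete, here is a minimal, backend-agnostic sketch in plain Python of how shape, null percentages, and duplicate count can be computed. It is not edazer's implementation: the helper name `summarize_rows` and the list-of-dicts input are assumptions made for illustration only.

```python
from collections import Counter


def summarize_rows(rows):
    """Compute shape, null percentages, and duplicate count for a list of row dicts.

    Hypothetical helper for illustration; edazer itself works on DataFrames.
    """
    columns = list(rows[0].keys()) if rows else []
    n_rows = len(rows)

    # Share of missing (None) values per column, as a percentage.
    null_pct = {
        col: (100.0 * sum(row[col] is None for row in rows) / n_rows) if n_rows else 0.0
        for col in columns
    }

    # A row counts as a duplicate if an identical row appeared earlier.
    counts = Counter(tuple(row[col] for col in columns) for row in rows)
    duplicate_count = sum(count - 1 for count in counts.values())

    return {
        "shape": (n_rows, len(columns)),
        "null_pct": null_pct,
        "duplicates": duplicate_count,
    }


rows = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": None},
    {"id": 2, "city": None},
    {"id": 3, "city": "Rome"},
]
summary = summarize_rows(rows)
print(summary)
```

In practice, dz.summarize_df() reports these figures for the wrapped DataFrame in a single call, so a helper like this is only useful for understanding what the numbers mean.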

🔍 Smart Data Inspection

dz.lookup("head")     # first rows
dz.lookup("tail")     # last rows
dz.lookup("sample")   # random sample

🧩 Unique Value Exploration

dz.show_unique_values(
    column_names=["col1", "col2"],
    max_unique=10
)

  • Automatically skips noisy columns
  • Suggests when to increase threshold
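
The thresholding idea can be sketched in plain Python as follows. This assumes that a "noisy" column is one with more than max_unique distinct values, which is an interpretation rather than something the package documents; the helper name `unique_values` and its return shape are hypothetical.

```python
def unique_values(rows, column_names, max_unique=10):
    """Return (shown, skipped) for the requested columns.

    shown maps column -> distinct values in first-seen order;
    skipped maps column -> number of distinct values that exceeded max_unique.
    Hypothetical helper, not edazer's implementation.
    """
    shown, skipped = {}, {}
    for col in column_names:
        seen = []
        for row in rows:
            value = row[col]
            # List membership also works for non-hashable values such as lists.
            if value not in seen:
                seen.append(value)
        if len(seen) <= max_unique:
            shown[col] = seen
        else:
            skipped[col] = len(seen)
    return shown, skipped


rows = [
    {"sex": "male", "ticket": "A1"},
    {"sex": "female", "ticket": "B2"},
    {"sex": "male", "ticket": "C3"},
    {"sex": "female", "ticket": "D4"},
]
shown, skipped = unique_values(rows, ["sex", "ticket"], max_unique=3)

for col, values in shown.items():
    print(f"{col}: {values}")
for col, n_unique in skipped.items():
    print(f"{col}: skipped ({n_unique} unique values); raise max_unique to see them")
```

The linear membership check trades speed for the ability to handle non-hashable cell values; for large hashable columns a set would be the better choice.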

🧠 Dtype-Based Column Selection

dz.cols_with_dtype(["float", "int"])

Options:

  • exact=True → strict dtype match (e.g. float64)
  • return_dtype_map=True → returns {column: dtype}
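
The difference between loose and exact matching can be illustrated with a small plain-Python sketch over a {column: dtype name} mapping. The prefix-matching rule and the lower-casing step are assumptions about how such matching could work, not a description of edazer's internals, and `cols_matching_dtype` is a hypothetical name.

```python
def cols_matching_dtype(schema, dtypes, exact=False, return_dtype_map=False):
    """Select columns from a {column: dtype-name} mapping by dtype.

    Hypothetical helper for illustration only.
    """
    wanted = [d.lower() for d in dtypes]

    def matches(dtype_name):
        # Lower-casing lets "Float64" (polars-style) and "float64" compare equal.
        name = dtype_name.lower()
        if exact:
            return name in wanted
        # Loose mode: "float" matches "float32" and "float64".
        return any(name.startswith(prefix) for prefix in wanted)

    selected = {col: dtype for col, dtype in schema.items() if matches(dtype)}
    return selected if return_dtype_map else list(selected)


schema = {"age": "float64", "fare": "float32", "pclass": "int64", "sex": "object"}

loose = cols_matching_dtype(schema, ["float"])
strict = cols_matching_dtype(schema, ["float64"], exact=True)
mapped = cols_matching_dtype(schema, ["float"], return_dtype_map=True)
print(loose, strict, mapped)
```

With this sketch, a loose "float" request picks up both float columns, while exact matching requires the full dtype name.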

🔑 Primary Key Detection

from edazer import get_primary_key

get_primary_key(df, threshold=0.9, n_combos=2)

Finds:

  • Single-column unique identifiers
  • Multi-column composite keys
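
As a rough illustration of the idea, the sketch below searches column combinations for candidate keys in plain Python. It assumes that threshold is a minimum ratio of distinct value tuples to rows and that n_combos caps the number of columns per combination; both readings are assumptions, and `candidate_keys` is a hypothetical helper rather than the package's code.

```python
from itertools import combinations


def candidate_keys(rows, threshold=1.0, n_combos=2):
    """Return column combinations whose uniqueness ratio meets the threshold.

    Hypothetical helper; always returns a list of column-name lists.
    """
    columns = list(rows[0].keys()) if rows else []
    n_rows = len(rows)
    keys = []
    for size in range(1, n_combos + 1):
        for combo in combinations(columns, size):
            # Skip combinations that merely extend a key already found.
            if any(set(key) <= set(combo) for key in keys):
                continue
            distinct = {tuple(row[col] for col in combo) for row in rows}
            if n_rows and len(distinct) / n_rows >= threshold:
                keys.append(list(combo))
    return keys


rows = [
    {"order_id": 1, "customer": "a", "item": "x"},
    {"order_id": 2, "customer": "a", "item": "y"},
    {"order_id": 3, "customer": "b", "item": "x"},
    {"order_id": 4, "customer": "b", "item": "y"},
]
keys = candidate_keys(rows, threshold=1.0, n_combos=2)
print(keys)
```

Here order_id is unique on its own, and the pair (customer, item) is unique only in combination. The number of combinations grows quickly with n_combos, so small values are the practical choice on wide tables.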

📊 Data Profiling (Optional)

from edazer.profiling import show_data_profile

show_data_profile(dz)

Powered by ydata-profiling.


🖱️ Interactive Tables

from edazer import interactive_df

interactive_df()

Enables rich DataFrame viewing using itables.


📦 Installation

pip install edazer==0.2.0

⚡ Quick Start

import seaborn as sns
from edazer import Edazer

# Load dataset
df = sns.load_dataset("titanic")

# Initialize
dz = Edazer(df, backend="pandas")

# Summary
dz.summarize_df()

# Unique values
dz.show_unique_values(column_names=["sex", "class"])

# Dtype filtering
print(dz.cols_with_dtype(["float"]))

# Inspect data
dz.lookup("head")

📘 API Reference

Edazer(df, backend="pandas")

Create an analyzer instance.

  • df: pd.DataFrame or pl.DataFrame
  • backend: "pandas" or "polars"

summarize_df()

Displays:

  • Schema/info
  • Descriptive stats
  • Null percentages and duplicate count
  • Unique values
  • Shape

show_unique_values(column_names, max_unique=10)

  • column_names: list of columns
  • max_unique: max values to display

cols_with_dtype(dtypes=None, exact=False, return_dtype_map=False)

  • dtypes: list of dtype strings
  • exact: strict match
  • return_dtype_map: return dict instead of list

lookup(option="head")

  • "head" → first rows
  • "tail" → last rows
  • "sample" → random rows

get_primary_key(df, threshold=0.9, n_combos=1, valid_column_dtypes=None)

Detect candidate keys.

Returns:

  • List[str] or List[List[str]]

📊 Example Output

dz.show_unique_values(
    column_names=dz.cols_with_dtype(["object"])
)

Output:

sex: ['male', 'female']
embarked: ['S', 'C', 'Q', nan]
class: ['Third', 'First', 'Second']

🤝 Contributing

Contributions are welcome!

GitHub: https://github.com/adarsh-79/edazer


📄 License

MIT License


👨‍💻 Author

adarsh3690704
