DataBeacon — A Python library for EDA, data cleaning, feature engineering, and visualization.
databeacon
A Python library for exploratory data analysis, data cleaning, feature engineering, and visualization.
Installation
pip install databeacon
Or install from source:
git clone https://github.com/Adityasharma-6782/databeacon.git
cd databeacon
pip install -e ".[dev]"
Quick Start
import pandas as pd
from databeacon import (
summarize, describe_numerics, describe_categoricals, correlation_matrix,
drop_duplicates, handle_missing, remove_outliers, fix_dtypes,
encode_categoricals, scale_features, apply_transforms,
plot_distributions, plot_correlations, plot_missing,
)
df = pd.read_csv("your_data.csv")
# ── EDA ──────────────────────────────────────────────────────────
info = summarize(df)
print(info["shape"], info["missing"])
num_stats = describe_numerics(df)
cat_stats = describe_categoricals(df)
corr = correlation_matrix(df, method="spearman")
# ── Cleaning ─────────────────────────────────────────────────────
df = drop_duplicates(df)
df = handle_missing(df, strategy="mean") # or "median", "mode", "drop", "constant"
df = remove_outliers(df, method="iqr") # or "zscore"
df = fix_dtypes(df, datetime_cols=["date_col"])
# ── Feature Engineering ──────────────────────────────────────────
df = encode_categoricals(df, method="onehot") # or "label", "ordinal"
df = apply_transforms(df, columns=["price"], method="log") # transform before scaling: log needs positive values
df = scale_features(df, method="standard") # or "minmax", "robust"
# ── Visualization ────────────────────────────────────────────────
plot_distributions(df)
plot_correlations(df)
plot_missing(df)
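The `describe_numerics` call above reports skewness and kurtosis alongside the usual statistics. To show what those two numbers measure, here is a plain-Python sketch (standard library only, not databeacon's implementation) of the moment-based definitions: skewness g1 = m3 / m2^1.5 and excess kurtosis g2 = m4 / m2^2 - 3. The helper `moment_stats` is hypothetical. It applies no bias correction, so its results will differ slightly from pandas' `skew()` and `kurt()`, which use sample-size-adjusted estimators.

```python
import math
from statistics import fmean


def moment_stats(values: list[float]) -> dict[str, float]:
    """Mean, population std, skewness (g1) and excess kurtosis (g2) of one numeric column.

    Hypothetical helper for illustration; uses biased (divide-by-n) central moments.
    """
    n = len(values)
    mu = fmean(values)
    # Central moments about the mean, each divided by n (no bias correction)
    m2 = sum((x - mu) ** 2 for x in values) / n
    m3 = sum((x - mu) ** 3 for x in values) / n
    m4 = sum((x - mu) ** 4 for x in values) / n
    if m2 == 0:
        # Constant column: the shape statistics are undefined
        return {"mean": mu, "std": 0.0, "skewness": math.nan, "kurtosis": math.nan}
    return {
        "mean": mu,
        "std": math.sqrt(m2),
        "skewness": m3 / m2**1.5,      # > 0: longer right tail, < 0: longer left tail
        "kurtosis": m4 / m2**2 - 3.0,  # excess kurtosis: 0 for a normal distribution
    }


print(moment_stats([1, 2, 3, 4, 5]))  # symmetric data, so skewness is 0
print(moment_stats([1, 1, 1, 10]))    # one large value, so skewness is positive
```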
Modules
databeacon.eda
| Function | Description |
|---|---|
| `summarize(df)` | Shape, dtypes, missing counts, duplicates, and memory usage |
| `describe_numerics(df)` | Extended statistics: mean, std, skewness, kurtosis, percentiles |
| `describe_categoricals(df)` | Count, unique values, top value, and its frequency for object/category columns |
| `correlation_matrix(df, method, threshold)` | Pearson, Spearman, or Kendall correlation with optional masking |
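Spearman correlation is Pearson correlation computed on ranks, so it captures any monotonic relationship, not only linear ones. The plain-Python sketch below shows that idea for a dict of equal-length columns. The helper names are hypothetical, and the masking rule (pairs whose |r| falls below `threshold` become `None`) is an assumption about what the `threshold` parameter of `correlation_matrix` does; this README does not confirm it.

```python
import math


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with order[i]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return math.nan  # correlation is undefined for a constant column
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    # Spearman's rho is Pearson's r applied to the ranks
    return pearson(ranks(x), ranks(y))


def correlation_table(columns: dict[str, list[float]], threshold: float = 0.0) -> dict:
    """Pairwise Spearman correlations; |r| < threshold becomes None (assumed masking rule)."""
    table = {}
    for a in columns:
        for b in columns:
            r = spearman(columns[a], columns[b])
            table[(a, b)] = r if abs(r) >= threshold else None
    return table


data = {"x": [1, 2, 3, 4], "y": [4, 3, 2, 1], "z": [1, 3, 2, 4]}
for pair, r in correlation_table(data, threshold=0.9).items():
    print(pair, r)
```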
databeacon.cleaning
| Function | Description |
|---|---|
| `drop_duplicates(df, subset, keep)` | Remove duplicate rows |
| `handle_missing(df, strategy, fill_value, drop_threshold)` | Impute or drop missing values |
| `remove_outliers(df, method, columns)` | IQR- or Z-score-based outlier removal |
| `fix_dtypes(df, datetime_cols, category_threshold)` | Auto-infer and fix column types |
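For reference, the two outlier rules named above and the basic imputation strategies listed for `handle_missing` look like this in plain Python. This is a sketch on a single list of values with `None` marking a missing entry; the helper names are hypothetical, and the 1.5 × IQR fence and 3.0 z-score cutoff are common conventions assumed here, not databeacon's documented defaults.

```python
import statistics


def iqr_keep(values: list[float], k: float = 1.5) -> list[bool]:
    """True for values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    # method="inclusive" interpolates linearly between data points, like pandas' default quantile
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [q1 - k * iqr <= v <= q3 + k * iqr for v in values]


def zscore_keep(values: list[float], threshold: float = 3.0) -> list[bool]:
    """True for values whose |z| (using the population std) is at most threshold."""
    mu, sd = statistics.fmean(values), statistics.pstdev(values)
    if sd == 0:
        return [True] * len(values)  # constant column: nothing is an outlier
    return [abs(v - mu) / sd <= threshold for v in values]


def impute(values: list, strategy: str = "mean", fill_value=None) -> list:
    """Fill (or drop) None entries using a mean/median/mode/constant/drop strategy."""
    present = [v for v in values if v is not None]
    if strategy == "drop":
        return present
    if strategy == "mean":
        fill = statistics.fmean(present)
    elif strategy == "median":
        fill = statistics.median(present)
    elif strategy == "mode":
        fill = statistics.mode(present)  # most common value; also works for strings
    elif strategy == "constant":
        fill = fill_value
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


prices = [10, 12, 11, 13, 12, 100]
print(iqr_keep(prices))     # [True, True, True, True, True, False]
print(zscore_keep(prices))  # all True: the z-score of 100 is only about 2.24
print(impute([1, None, 3]))                   # [1, 2.0, 3]
print(impute(["a", None, "a", "b"], "mode"))  # ['a', 'a', 'a', 'b']
```

With only six values, a single extreme point cannot exceed |z| = sqrt(n - 1) ≈ 2.24 when the population standard deviation is used, so a 3.0 cutoff keeps the `100` while the IQR rule drops it. On small samples the z-score rule needs a lower cutoff to catch anything.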
databeacon.features
| Function | Description |
|---|---|
| `encode_categoricals(df, method, ordinal_mapping)` | One-hot, label, or ordinal encoding |
| `scale_features(df, method)` | Standard, min-max, or robust scaling |
| `apply_transforms(df, columns, method)` | Log, square-root, square, or Box-Cox transforms |
| `create_interaction_features(df, column_pairs, operations)` | Multiply, add, subtract, or divide column pairs |
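The three scaling methods and the one-hot and label encodings each reduce to a few lines. The sketch below follows the usual conventions (standard: subtract the mean and divide by the population standard deviation; min-max: map to [0, 1]; robust: subtract the median and divide by the interquartile range) on plain lists. The helper names are hypothetical, and whether `scale_features` and `encode_categoricals` use exactly these formulas is an assumption.

```python
import statistics


def scale(values: list[float], method: str = "standard") -> list[float]:
    """Rescale one numeric column with a standard, min-max, or robust scaler."""
    if method == "standard":
        # Subtract the mean, divide by the population standard deviation
        center, spread = statistics.fmean(values), statistics.pstdev(values)
    elif method == "minmax":
        # Map the smallest value to 0 and the largest to 1
        center, spread = min(values), max(values) - min(values)
    elif method == "robust":
        # Subtract the median, divide by the interquartile range (less sensitive to outliers)
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        center, spread = median, q3 - q1
    else:
        raise ValueError(f"unknown method: {method!r}")
    if spread == 0:
        return [0.0] * len(values)  # constant column: map everything to 0
    return [(v - center) / spread for v in values]


def one_hot(values: list[str], prefix: str) -> dict[str, list[int]]:
    """One 0/1 indicator column per distinct category, named prefix_category."""
    return {f"{prefix}_{c}": [int(v == c) for v in values] for c in sorted(set(values))}


def label_encode(values: list[str]) -> list[int]:
    """Replace each category with its index in sorted order."""
    codes = {c: i for i, c in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


print(scale([1, 2, 3, 4, 5], "robust"))      # [-1.0, -0.5, 0.0, 0.5, 1.0]
print(one_hot(["red", "blue", "red"], "color"))
print(label_encode(["red", "blue", "red"]))  # [1, 0, 1]
```

Standardized values are centered on zero, so roughly half of them are negative; this is why a log transform belongs before standard scaling rather than after it.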
databeacon.viz
| Function | Description |
|---|---|
| `plot_distributions(df)` | Histograms with KDE overlays for numeric columns |
| `plot_correlations(df)` | Annotated correlation heatmap |
| `plot_missing(df)` | Horizontal bar chart of the percentage of missing values per column |
| `plot_categorical_counts(df)` | Bar charts of value counts for categorical columns |
| `generate_report(df)` | Automatically generated full-page EDA report |
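`plot_missing` and `plot_categorical_counts` chart simple per-column tallies. This sketch computes those numbers in plain Python for rows stored as dicts, treating `None` as missing (pandas would also count `NaN`), and renders a text version of the horizontal bar chart sorted from most to least missing. The helper names are hypothetical and this is not how databeacon draws its plots.

```python
from collections import Counter


def missing_percentages(rows: list[dict], columns: list[str]) -> dict[str, float]:
    """Percent of rows where each column is missing (None), sorted from most to least missing."""
    n = len(rows)
    pct = {c: 100.0 * sum(r.get(c) is None for r in rows) / n for c in columns}
    return dict(sorted(pct.items(), key=lambda kv: kv[1], reverse=True))


def text_bars(pct: dict[str, float], width: int = 20) -> list[str]:
    """Render percentages as horizontal '#' bars, one line per column."""
    label_width = max(len(name) for name in pct)
    lines = []
    for name, p in pct.items():
        bar = "#" * round(p / 100 * width)
        lines.append(f"{name:<{label_width}} {bar:<{width}} {p:5.1f}%")
    return lines


def value_counts(rows: list[dict], column: str) -> list[tuple]:
    """(value, count) pairs for non-missing entries, most frequent first."""
    return Counter(r.get(column) for r in rows if r.get(column) is not None).most_common()


rows = [
    {"age": 31, "city": None},
    {"age": None, "city": None},
    {"age": 45, "city": "Pune"},
    {"age": 28, "city": "Pune"},
]
print("\n".join(text_bars(missing_percentages(rows, ["age", "city"]))))
print(value_counts(rows, "city"))  # [('Pune', 2)]
```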
Development
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/ -v
# Run tests with coverage
pytest tests/ --cov=databeacon --cov-report=term-missing
# Lint
ruff check databeacon/
# Format
black databeacon/ tests/
Contributing
- Fork the repository.
- Create a feature branch: `git checkout -b feature/my-feature`
- Commit your changes: `git commit -m "Add my feature"`
- Push to the branch: `git push origin feature/my-feature`
- Open a pull request.
Please add tests for any new functionality and ensure all tests pass.
License
MIT — see LICENSE for details.
Download files
File details
Details for the file databeacon-0.1.0.tar.gz.
File metadata
- Download URL: databeacon-0.1.0.tar.gz
- Upload date:
- Size: 22.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b975a8c98ae7a1a32cf335c9861e9915e82cdfcdff9117ba88fd7bda8bd78e0c |
| MD5 | c25bc9b529a11821085f4365bb113b09 |
| BLAKE2b-256 | 666abbe42cd403d6564429d03d7f0f8903d1090e2624e115d73d1d60a3f3643d |
File details
Details for the file databeacon-0.1.0-py3-none-any.whl.
File metadata
- Download URL: databeacon-0.1.0-py3-none-any.whl
- Upload date:
- Size: 22.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9202ba48872b1b196b4f7f45164b82664d7405d3ec3b2bb96f3b6b0f5d5517f4 |
| MD5 | 719a9885397e8a520203c686e4d7c401 |
| BLAKE2b-256 | d990b59f24adde16621e41765d27b031ccd6b261980863bf283104eeff3df5ae |
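To check a downloaded file against the SHA256 digests listed above, hash it in chunks with `hashlib` and compare. `EXPECTED_SHA256` copies the two digests from this page; `sha256_of` and `verify_release` are hypothetical helper names, and the download path in the final comment is only an example.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables on this page (release 0.1.0)
EXPECTED_SHA256 = {
    "databeacon-0.1.0.tar.gz": "b975a8c98ae7a1a32cf335c9861e9915e82cdfcdff9117ba88fd7bda8bd78e0c",
    "databeacon-0.1.0-py3-none-any.whl": "9202ba48872b1b196b4f7f45164b82664d7405d3ec3b2bb96f3b6b0f5d5517f4",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_release(path: str) -> bool:
    """True if the file's SHA256 matches the digest published for its file name."""
    p = Path(path)
    expected = EXPECTED_SHA256.get(p.name)
    if expected is None:
        raise KeyError(f"no published digest for {p.name!r}")
    return sha256_of(p) == expected


# Example: verify_release("downloads/databeacon-0.1.0-py3-none-any.whl")
```

pip can enforce the same check at install time in hash-checking mode: list `databeacon==0.1.0 --hash=sha256:<digest>` in a requirements file and run `pip install --require-hashes -r requirements.txt`.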