DataBeacon — A Python library for EDA, data cleaning, feature engineering, and visualization.

databeacon

A Python library for exploratory data analysis, data cleaning, feature engineering, and visualization.

Requires Python 3.9+. License: MIT.


Installation

pip install databeacon

Or install from source:

git clone https://github.com/Adityasharma-6782/databeacon.git
cd databeacon
pip install -e ".[dev]"

Quick Start

import pandas as pd
from databeacon import (
    summarize, describe_numerics, describe_categoricals, correlation_matrix,
    drop_duplicates, handle_missing, remove_outliers, fix_dtypes,
    encode_categoricals, scale_features, apply_transforms,
    plot_distributions, plot_correlations, plot_missing,
)

df = pd.read_csv("your_data.csv")

# ── EDA ──────────────────────────────────────────────────────────
info = summarize(df)
print(info["shape"], info["missing"])

num_stats = describe_numerics(df)
cat_stats = describe_categoricals(df)
corr = correlation_matrix(df, method="spearman")

# ── Cleaning ─────────────────────────────────────────────────────
df = drop_duplicates(df)
df = handle_missing(df, strategy="mean")          # or "median", "mode", "drop", "constant"
df = remove_outliers(df, method="iqr")            # or "zscore"
df = fix_dtypes(df, datetime_cols=["date_col"])

# ── Feature Engineering ──────────────────────────────────────────
df = encode_categoricals(df, method="onehot")     # or "label", "ordinal"
df = scale_features(df, method="standard")        # or "minmax", "robust"
df = apply_transforms(df, columns=["price"], method="log")

# ── Visualization ────────────────────────────────────────────────
plot_distributions(df)
plot_correlations(df)
plot_missing(df)

Modules

databeacon.eda

Function                                   Description
summarize(df)                              Shape, dtypes, missing counts, duplicates, memory
describe_numerics(df)                      Extended stats: mean, std, skewness, kurtosis, percentiles
describe_categoricals(df)                  Count, unique, top value, frequency for object/category columns
correlation_matrix(df, method, threshold)  Pearson/Spearman/Kendall correlation with optional masking
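
The table does not say what "optional masking" does. As a rough illustration of the idea only, the sketch below uses plain pandas to compute a correlation matrix and blank out entries whose absolute value falls below a threshold. The helper name masked_correlation and the toy columns are hypothetical, and databeacon's own behaviour may differ.

```python
import pandas as pd

# Toy data; the column names are illustrative only.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],   # increases monotonically with "a"
    "c": [5, 3, 4, 1, 2],
})

def masked_correlation(frame, method="pearson", threshold=0.0):
    """Hypothetical helper: correlation matrix with weak entries set to NaN."""
    corr = frame.select_dtypes("number").corr(method=method)
    # Keep only cells whose absolute correlation reaches the threshold.
    return corr.where(corr.abs() >= threshold)

masked = masked_correlation(df, method="spearman", threshold=0.9)
print(masked)
```

With these values the a/b pair survives the mask, while the weaker a/c pair becomes NaN.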

databeacon.cleaning

Function                                                  Description
drop_duplicates(df, subset, keep)                         Remove duplicate rows
handle_missing(df, strategy, fill_value, drop_threshold)  Impute or drop missing values
remove_outliers(df, method, columns)                      IQR or Z-score based outlier removal
fix_dtypes(df, datetime_cols, category_threshold)         Auto-infer and fix column types
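
For readers unfamiliar with IQR-based outlier removal, here is a minimal pandas sketch of the general technique. It is not databeacon's implementation: the function name remove_outliers_iqr, the multiplier k=1.5, and the choice to keep rows with missing values are all assumptions made for illustration.

```python
import pandas as pd

def remove_outliers_iqr(frame, columns=None, k=1.5):
    """Hypothetical helper: drop rows outside [Q1 - k*IQR, Q3 + k*IQR]."""
    cols = columns if columns is not None else frame.select_dtypes("number").columns
    keep = pd.Series(True, index=frame.index)
    for col in cols:
        q1, q3 = frame[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        inside = frame[col].between(q1 - k * iqr, q3 + k * iqr)
        # Missing values are left alone here; imputing them is a separate step.
        keep &= inside | frame[col].isna()
    return frame[keep]

df = pd.DataFrame({"price": [10, 12, 11, 13, 12, 500]})
cleaned = remove_outliers_iqr(df)
print(cleaned)
```

On this toy column the value 500 lies far above the upper fence and is dropped, while the other five rows are kept.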

databeacon.features

Function                                                   Description
encode_categoricals(df, method, ordinal_mapping)           One-hot, label, or ordinal encoding
scale_features(df, method)                                 Standard, min-max, or robust scaling
apply_transforms(df, columns, method)                      Log, square-root, square, or Box-Cox transforms
create_interaction_features(df, column_pairs, operations)  Multiply, add, subtract, or divide column pairs
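
To make the transform and interaction rows concrete, the sketch below does the same kind of work by hand with pandas and NumPy. The output column names and the use of log1p (log of 1 + x) are assumptions for illustration. The library may name its columns differently and may use a plain logarithm.

```python
import numpy as np
import pandas as pd

# Toy data; the column names are illustrative only.
df = pd.DataFrame({"price": [1.0, 10.0, 100.0], "qty": [2, 3, 4]})

# Log transform. log1p computes log(1 + x), so a zero price would not fail.
df["price_log"] = np.log1p(df["price"])

# Interaction feature: element-wise product of two columns.
df["price_x_qty"] = df["price"] * df["qty"]

print(df)
```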

databeacon.viz

Function                     Description
plot_distributions(df)       Histograms + KDE for numeric columns
plot_correlations(df)        Annotated correlation heatmap
plot_missing(df)             Horizontal bar chart of missing-value percentages
plot_categorical_counts(df)  Value count bar charts
generate_report(df)          Automatically generated full-page EDA report
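
As an indication of the numbers a missing-value chart is built from, this pandas-only sketch computes the percentage of missing values per column. It is an assumption about the underlying calculation, not a description of how plot_missing works internally.

```python
import pandas as pd

# Toy data with some gaps; the column names are illustrative only.
df = pd.DataFrame({
    "a": [1, None, 3, None],
    "b": ["x", "y", None, "z"],
    "c": [1, 2, 3, 4],
})

# Fraction of missing cells per column, as a percentage, largest first.
missing_pct = df.isna().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```

If matplotlib is installed, missing_pct.plot.barh() would draw a comparable horizontal bar chart.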

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run tests with coverage
pytest tests/ --cov=databeacon --cov-report=term-missing

# Lint
ruff check databeacon/

# Format
black databeacon/ tests/

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/my-feature
  3. Commit your changes: git commit -m "Add my feature"
  4. Push to the branch: git push origin feature/my-feature
  5. Open a Pull Request

Please add tests for any new functionality and ensure all tests pass.


License

MIT — see LICENSE for details.
