One-line automated EDA for pandas DataFrames: column analysis, cleaning, visualization, and HTML reports.

basicdatainfo

One-line automated EDA for pandas DataFrames.

Send it a DataFrame and it will:

  • Detect each column's type (numeric, categorical, datetime, boolean, text)
  • Compute summary statistics for every column
  • Optionally clean the data (clean=True): drop duplicate rows, strip whitespace, auto-parse date columns, fill missing values, optionally cap outliers
  • Engineer date features (feature_engineer=True, the default): for any column that is (or becomes) a datetime column, derive year, quarter, month, month_name, week_of_year, day_of_month, day_of_week, day_name, is_weekend, is_month_start/end, is_quarter_start/end, and hour (only if a time component is present). The new columns are analyzed and visualized like any other column.
  • Automatically generate visualizations: histograms + boxplots for numeric columns, an extra "count by value" bar chart for low-cardinality numeric columns (e.g. the engineered month/day-of-week/quarter features), a "records over time" trend chart for datetime columns, bar charts for categorical columns, a missing-values chart, and a correlation heatmap
  • Either print a summary + show the plots (default), or generate a single self-contained HTML report (html=True)
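
To make the date-feature step concrete, here is a minimal sketch of how such columns can be derived with plain pandas. It is not the package's actual implementation: the helper name derive_date_features and the "<column>_" naming prefix are assumptions made for illustration, and the real output column names may differ.

```python
import pandas as pd


def derive_date_features(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of df with calendar features derived from one date column."""
    out = df.copy()
    dates = pd.to_datetime(out[column])
    prefix = f"{column}_"  # hypothetical naming scheme, not confirmed by the README

    out[prefix + "year"] = dates.dt.year
    out[prefix + "quarter"] = dates.dt.quarter
    out[prefix + "month"] = dates.dt.month
    out[prefix + "month_name"] = dates.dt.month_name()
    out[prefix + "week_of_year"] = dates.dt.isocalendar().week
    out[prefix + "day_of_month"] = dates.dt.day
    out[prefix + "day_of_week"] = dates.dt.dayofweek  # Monday=0 ... Sunday=6
    out[prefix + "day_name"] = dates.dt.day_name()
    out[prefix + "is_weekend"] = dates.dt.dayofweek >= 5
    out[prefix + "is_month_start"] = dates.dt.is_month_start
    out[prefix + "is_month_end"] = dates.dt.is_month_end
    out[prefix + "is_quarter_start"] = dates.dt.is_quarter_start
    out[prefix + "is_quarter_end"] = dates.dt.is_quarter_end

    # Add an hour column only when some non-missing value has a time of day.
    valid = dates.dropna()
    if (valid.dt.normalize() != valid).any():
        out[prefix + "hour"] = dates.dt.hour
    return out


orders = pd.DataFrame({"order_date": ["2024-03-31", "2024-04-01", "2024-04-06"]})
features = derive_date_features(orders, "order_date")
print(features.filter(like="order_date_").T)
```

Because the sample dates carry no time component, the sketch adds no hour column, mirroring the "if a time component is present" condition above.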

Install

pip install -e .

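This installs the package in editable mode and must be run from the root of a source checkout. Alternatively, build a wheel with python -m build and install it with pip install dist/*.whl.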

Usage

import pandas as pd
import basicdatainfo

df = pd.read_csv("data.csv")

# 1. Just analyze + show plots inline (e.g. in Jupyter)
basicdatainfo.analyze(df)

# 2. Auto-clean the data first, then analyze
result = basicdatainfo.analyze(df, clean=True)
clean_df = result["df"]

# 3. Generate a standalone HTML report instead of inline output
basicdatainfo.analyze(df, clean=True, html=True, output_path="report.html")

analyze() parameters

| Parameter | Default | Description |
|---|---|---|
| clean | False | Auto-clean the dataframe before analysis |
| feature_engineer | True | Auto-derive calendar features from any datetime column |
| drop_date_columns | False | Drop the original datetime column after extracting its features |
| html | False | Save an HTML report instead of printing/showing plots |
| output_path | "eda_report.html" | Where to save the HTML report |
| numeric_strategy | "median" | Fill for missing numeric data: "median", "mean", or "zero" |
| categorical_strategy | "mode" | Fill for missing categorical data: "mode" or "unknown" |
| outlier_method | None | Set to "iqr" to cap numeric outliers using the 1.5×IQR rule |
| max_categories | 10 | Max categories shown in bar charts / top-value tables |
| show | True | Call plt.show() when html=False |
| verbose | True | Print the text summary when html=False |
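
The fill strategies and the 1.5×IQR rule named in the table can be sketched in plain pandas as follows. This is an illustration under stated assumptions, not the package's code: the function names are hypothetical, the "Unknown" placeholder label is a guess at what the "unknown" strategy inserts, and the quartiles use pandas' default linear interpolation.

```python
import pandas as pd


def fill_numeric(series: pd.Series, strategy: str = "median") -> pd.Series:
    """Fill missing numeric values using 'median', 'mean', or 'zero'."""
    if strategy == "median":
        value = series.median()
    elif strategy == "mean":
        value = series.mean()
    elif strategy == "zero":
        value = 0
    else:
        raise ValueError(f"unknown numeric strategy: {strategy!r}")
    return series.fillna(value)


def fill_categorical(series: pd.Series, strategy: str = "mode") -> pd.Series:
    """Fill missing categorical values with the most frequent value or a label."""
    placeholder = "Unknown"  # assumed label; the README does not specify it
    if strategy == "mode":
        modes = series.mode()
        value = modes.iloc[0] if not modes.empty else placeholder
    elif strategy == "unknown":
        value = placeholder
    else:
        raise ValueError(f"unknown categorical strategy: {strategy!r}")
    return series.fillna(value)


def cap_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


prices = pd.Series([10.0, 12.0, None, 11.0, 13.0, 500.0])
filled = fill_numeric(prices, "median")   # the missing value becomes the median
capped = cap_outliers_iqr(filled)         # 500.0 is pulled down to the upper fence
print(capped.tolist())
```

Capping keeps every row and only moves extreme values to the fence, which is why it is described as "cap" rather than "drop".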

Return value

analyze() returns a dict:

{
  "df": <cleaned-or-original dataframe, plus engineered date columns>,
  "analysis": {"overview": {...}, "column_types": {...}, "columns": {...}},
  "cleaning_report": {"actions": [...]} or None,
  "feature_engineering_report": {"actions": [...], "new_columns": [...]} or None,
  "report_path": "report.html",      # only if html=True
  "figures": {...},                   # only if html=False
}
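
Since two of the report entries may be None and two keys are present only in one output mode, code that consumes the result should check before indexing. The sketch below shows one way to do that. The helper summarize_result and the hand-built example dict are hypothetical stand-ins that mirror the shape shown above; they are not output from the package.

```python
from typing import Any


def summarize_result(result: dict[str, Any]) -> list[str]:
    """Describe which optional parts of an analyze()-style result are present."""
    lines = []
    for key in ("cleaning_report", "feature_engineering_report"):
        report = result.get(key)
        if report is None:
            lines.append(f"{key}: none")
        else:
            lines.append(f"{key}: {len(report['actions'])} action(s)")
    # "report_path" is documented for html=True, "figures" for html=False.
    if "report_path" in result:
        lines.append(f"HTML report: {result['report_path']}")
    if "figures" in result:
        lines.append(f"{len(result['figures'])} figure(s) in memory")
    return lines


# Hand-built stand-in with the documented keys (assumed values, html=False case).
example = {
    "df": None,
    "analysis": {"overview": {}, "column_types": {}, "columns": {}},
    "cleaning_report": None,
    "feature_engineering_report": {
        "actions": ["derived order_date_year"],
        "new_columns": ["order_date_year"],
    },
    "figures": {},
}
summary = summarize_result(example)
print("\n".join(summary))
```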
