Automated, local exploratory data analysis: stats, charts, correlations, outliers, a chat assistant, and self-contained HTML reports.

eda-k

Automated, local exploratory data analysis — as a Python library you can import, with an optional Streamlit UI on top.

Runs 100% locally. Your data never leaves your machine, no API key needed.


Install

pip install -e .

Want everything in one shot (library + Streamlit app + OLS trendlines)?

pip install -e ".[app,trend]"

Or pick extras individually:

Need the bundled Streamlit app too?

pip install -e ".[app]"

Need OLS trendlines on scatter plots (charts.pairwise_scatter_with_trendline)?

pip install -e ".[trend]"
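Because the extras are optional, code that relies on them can check for the underlying packages before calling the feature. The following is a minimal sketch using only the standard library. That the `app` extra provides `streamlit` and the `trend` extra provides `statsmodels` is an assumption (check pyproject.toml for the actual dependency lists), and `has_module` is a hypothetical helper, not part of eda_k.

```python
from importlib.util import find_spec


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return find_spec(name) is not None


# Assumed mapping of extras to the modules they pull in (not confirmed by the README).
ASSUMED_EXTRAS = {"app": "streamlit", "trend": "statsmodels"}

for extra, module in ASSUMED_EXTRAS.items():
    if has_module(module):
        status = "available"
    else:
        status = f'missing, try: pip install -e ".[{extra}]"'
    print(f"{extra}: {status}")
```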

Use it as a library

import eda_k

result = eda_k.analyze("data.csv")   # path, file-like, or DataFrame all work

print(result)                  # <EDAResult 'data.csv' rows=150 cols=5 ...>
print(result.summary())        # quick text overview
print(result.ask("which columns have missing values?"))

result.to_html("report.html")     # self-contained HTML report (charts inline)
result.to_csv_zip("tables.zip")   # every summary table as CSVs in one ZIP

result.df is the loaded pandas.DataFrame, and result.results is the raw dict of every table (overview, missing_summary, numeric_summary, outliers, categorical_summary, correlation, top_correlations, dtype_table) if you want to work with the data directly.
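To get a quick view of what the raw dict contains before digging into a specific table, something like the sketch below may help. `describe_tables` is a hypothetical helper, not part of eda_k, and it makes no assumption about the value types beyond the dict having string keys. A stand-in dict is used so the snippet runs without the library installed.

```python
from typing import Any, Mapping


def describe_tables(results: Mapping[str, Any]) -> list[tuple[str, str]]:
    """List each table name together with the type name of its value."""
    return [(name, type(table).__name__) for name, table in results.items()]


# Stand-in data so the sketch runs on its own; with the library installed,
# pass result.results instead.
demo = {"overview": {"rows": 150, "cols": 5}, "outliers": []}
for name, kind in describe_tables(demo):
    print(f"{name}: {kind}")
```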

Lower-level access

The original modules are available as submodules, unchanged, for full control:

from eda_k import eda_engine, charts, chat_assistant, report_builder

# Open the file in a with-block so the handle is closed after loading
with open("data.csv", "rb") as f:
    df = eda_engine.load_file(f, "data.csv")

results = eda_engine.run_full_eda(df)
fig = charts.histogram(df, "some_column")

Use the Streamlit UI

pip install -e ".[app]"
streamlit run apps/streamlit_app.py

Opens a browser tab with file upload, tabs (Overview / Missing / Numeric / Outliers / Categorical / Correlation / Chat / Download), and one-click export of the HTML report or a ZIP of CSVs. The app behaves as it did before; it is now built on top of the installed eda_k package instead of loose scripts.


Project layout

eda-k/
├── pyproject.toml
├── README.md
├── requirements.txt          # convenience: pip install -r requirements.txt == pip install -e ".[app]"
├── src/
│   └── eda_k/
│       ├── __init__.py        # public API: analyze(), EDAResult
│       ├── eda_engine.py       # core analysis (pandas/numpy/scipy, no UI)
│       ├── charts.py           # Plotly chart builders
│       ├── chat_assistant.py   # local rule-based Q&A
│       └── report_builder.py   # self-contained HTML report builder
└── apps/
    └── streamlit_app.py        # optional UI, imports from the installed package

Supported file types

CSV, TSV, TXT (delimiter auto-detected), XLSX, XLS, JSON, Parquet.
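As an illustration of what delimiter auto-detection can look like, here is a minimal sketch built on the standard library's `csv.Sniffer`. It is not necessarily how eda_k detects delimiters; `detect_delimiter`, its candidate set, and the comma fallback are all assumptions made for this example.

```python
import csv


def detect_delimiter(sample: str, candidates: str = ",;\t|", default: str = ",") -> str:
    """Guess the field delimiter from a text sample, falling back to a default."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # The sniffer could not decide (for example, a single-column file)
        return default


print(repr(detect_delimiter("name;age;city\nAnn;30;Oslo\nBob;25;Rome\n")))
print(repr(detect_delimiter("id\tscore\n1\t0.5\n2\t0.7\n")))
```

Passing only the first few kilobytes of a file as `sample` is usually enough and keeps detection fast on large inputs.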

Notes / known limits

  • Very large files (millions of rows) will be slower to chart; consider sampling first if you hit performance issues.
  • The "likely datetime column" detector is a heuristic that inspects only a small sample of values, so confirm its result against the Overview before relying on it.
  • The normality test (Shapiro-Wilk) samples large columns down to 5,000 rows for speed.
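For the sampling suggestion above, one option is to draw a uniform random sample of rows before handing the data to the library. The sketch below uses reservoir sampling from the standard library, so the full file never has to fit in memory. `sample_csv_lines` is a hypothetical helper, not part of eda_k, and it assumes one record per line (no newlines embedded in quoted fields).

```python
import random
from typing import Iterable


def sample_csv_lines(lines: Iterable[str], k: int, seed: int = 0) -> list[str]:
    """Keep the header plus a uniform random sample of at most k data lines."""
    rng = random.Random(seed)
    it = iter(lines)
    header = next(it, None)
    if header is None:
        return []
    reservoir: list[str] = []
    for i, line in enumerate(it):
        if i < k:
            reservoir.append(line)      # fill the reservoir first
        else:
            j = rng.randint(0, i)       # inclusive on both ends
            if j < k:
                reservoir[j] = line     # replace with probability k / (i + 1)
    return [header] + reservoir


# Demo on generated lines; a real run would iterate over an open file instead.
rows = ["id,value"] + [f"{i},{i * i}" for i in range(1000)]
sample = sample_csv_lines(rows, k=5)
print(len(sample))
```

The sampled lines can be written to a smaller file and passed to eda_k.analyze like any other CSV.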

Download files

Source Distribution

eda_k-0.1.0.tar.gz (21.2 kB view details)

Uploaded Source

Built Distribution

eda_k-0.1.0-py3-none-any.whl (21.2 kB view details)

Uploaded Python 3

File details

Details for the file eda_k-0.1.0.tar.gz.

File metadata

  • Download URL: eda_k-0.1.0.tar.gz
  • Upload date:
  • Size: 21.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for eda_k-0.1.0.tar.gz
Algorithm Hash digest
SHA256 bbefabc1074ebc02bd44aa64aa6f6e196fca7e64ccc1d9f6cf1061573fe14d80
MD5 248bf5a00fad58e3497bc88317ff5ea6
BLAKE2b-256 76d3c3f2f74eb8f27bf435fd2e6af781b20ce4f930ec49856b32e3077d75d667
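
A downloaded file can be checked against the SHA256 digest listed above with a few lines of standard-library Python. This is a minimal sketch; `sha256_of` and `matches_published` are hypothetical helper names chosen for this example.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_published("eda_k-0.1.0.tar.gz", "bbefabc1074ebc02bd44aa64aa6f6e196fca7e64ccc1d9f6cf1061573fe14d80") should return True for an intact copy of the source distribution.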

File details

Details for the file eda_k-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: eda_k-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 21.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for eda_k-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 98b6c393823e03bf456fd67b9f776f6833de3993d6526350745e822bc2058c5a
MD5 6b534d3fbf2b635cba0d2e10dba7a054
BLAKE2b-256 4477b6d013498fb26ac79d351e497c6cafb9de1e7e42972963e357acab30e3ff
