Automated, local exploratory data analysis: stats, charts, correlations, outliers, a chat assistant, and self-contained HTML reports.
eda-k
Automated, local exploratory data analysis, available as a Python library you can
import, with an optional Streamlit UI on top.
Runs 100% locally: your data never leaves your machine, and no API key is needed.
Install
pip install -e .
Want everything in one shot (library + Streamlit app + OLS trendlines)?
pip install -e ".[app,trend]"
Or pick extras individually:
Need the bundled Streamlit app too?
pip install -e ".[app]"
Need OLS trendlines on scatter plots (charts.pairwise_scatter_with_trendline)?
pip install -e ".[trend]"
Use it as a library
import eda_k
result = eda_k.analyze("data.csv") # path, file-like, or DataFrame all work
print(result) # <EDAResult 'data.csv' rows=150 cols=5 ...>
print(result.summary()) # quick text overview
print(result.ask("which columns have missing values?"))
result.to_html("report.html") # self-contained HTML report (charts inline)
result.to_csv_zip("tables.zip") # every summary table as CSVs in one ZIP
To work with the data directly, result.df is the loaded pandas.DataFrame and
result.results is the raw dict of every summary table (overview, missing_summary,
numeric_summary, outliers, categorical_summary, correlation, top_correlations,
dtype_table).
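The outliers table is easier to read with a concrete rule in mind. Below is a stdlib-only sketch of one common rule, Tukey's 1.5 * IQR fences. This rule is an assumption for illustration, because the rule eda_k itself applies is not documented here. `iqr_outliers` is a hypothetical name.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return (low_fence, high_fence, outliers) using the k * IQR rule.

    Hypothetical sketch: eda_k's own outlier rule is not documented here.
    """
    # quantiles(n=4) returns the three quartile cut points [Q1, Q2, Q3];
    # method="inclusive" treats the data as the whole population.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Anything strictly outside the fences is flagged.
    return low, high, [v for v in values if v < low or v > high]


if __name__ == "__main__":
    data = [10, 12, 11, 13, 12, 11, 95]  # toy column with one extreme value
    low, high, out = iqr_outliers(data)
    print(f"fences=({low:.2f}, {high:.2f}) outliers={out}")
```

Raising `k` (3 is another common choice) flags only more extreme points.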
Lower-level access
The original modules are available as submodules, unchanged, for full control:
from eda_k import eda_engine, charts, chat_assistant, report_builder
with open("data.csv", "rb") as f:  # load_file takes a file-like object plus its name
    df = eda_engine.load_file(f, "data.csv")
results = eda_engine.run_full_eda(df)
fig = charts.histogram(df, "some_column")
Use the Streamlit UI
pip install -e ".[app]"
streamlit run apps/streamlit_app.py
This opens a browser tab with file upload, tabs for Overview / Missing / Numeric /
Outliers / Categorical / Correlation / Chat / Download, and one-click export of the
HTML report or a ZIP of CSVs. The app behaves as before, but it is now built on top
of the installed eda_k package instead of loose scripts.
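The "ZIP of CSVs" export comes down to serializing each summary table as CSV text and writing all of them into one archive. The minimal sketch below uses only the standard library. It assumes each table arrives as a list of row dicts, which is a simplification, so it is not eda_k's implementation. `tables_to_zip_bytes` is a hypothetical helper.

```python
import csv
import io
import zipfile


def tables_to_zip_bytes(tables):
    """Bundle {name: list-of-row-dicts} into an in-memory ZIP, one CSV per table.

    Hypothetical helper for illustration; not eda_k's to_csv_zip().
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, rows in tables.items():
            text = io.StringIO()
            if rows:
                # Column order follows the keys of the first row.
                writer = csv.DictWriter(text, fieldnames=list(rows[0]))
                writer.writeheader()
                writer.writerows(rows)
            # An empty table becomes an empty CSV file.
            zf.writestr(f"{name}.csv", text.getvalue())
    return buf.getvalue()


if __name__ == "__main__":
    toy_tables = {
        "overview": [{"metric": "rows", "value": 3}],
        "missing_summary": [{"column": "a", "missing": 1}],
    }
    payload = tables_to_zip_bytes(toy_tables)
    with zipfile.ZipFile(io.BytesIO(payload)) as zf:
        print(zf.namelist())
```

The archive is built in memory (`io.BytesIO`) rather than on disk, so a web UI can hand the resulting bytes straight to a download widget.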
Project layout
eda-k/
├── pyproject.toml
├── README.md
├── requirements.txt # convenience: pip install -r requirements.txt == pip install -e ".[app]"
├── src/
│ └── eda_k/
│ ├── __init__.py # public API: analyze(), EDAResult
│ ├── eda_engine.py # core analysis (pandas/numpy/scipy, no UI)
│ ├── charts.py # Plotly chart builders
│ ├── chat_assistant.py # local rule-based Q&A
│ └── report_builder.py # self-contained HTML report builder
└── apps/
└── streamlit_app.py # optional UI, imports from the installed package
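chat_assistant.py is described as local, rule-based Q&A. A rule-based assistant of this kind matches the question against fixed patterns and reads the answer out of tables that were already computed. The toy sketch below assumes a summary of per-column missing counts. `RULES` and `answer_question` are hypothetical names, and these are not the package's own rules.

```python
from typing import Callable, Dict, List, Tuple

# Toy stand-in for computed EDA results: column name -> missing-value count.
Summary = Dict[str, int]


def _missing_columns(summary: Summary) -> str:
    cols = [col for col, n in summary.items() if n > 0]
    if not cols:
        return "No missing values."
    return "Columns with missing values: " + ", ".join(cols)


def _column_count(summary: Summary) -> str:
    return f"The dataset has {len(summary)} columns."


# Ordered rules: the first rule with a keyword found in the question wins.
RULES: List[Tuple[Tuple[str, ...], Callable[[Summary], str]]] = [
    (("missing", "null", "nan"), _missing_columns),
    (("how many columns", "number of columns"), _column_count),
]


def answer_question(question: str, summary: Summary) -> str:
    q = question.lower()
    for keywords, handler in RULES:
        if any(k in q for k in keywords):
            return handler(summary)
    return "Sorry, I don't have a rule for that question yet."


if __name__ == "__main__":
    toy = {"age": 3, "name": 0, "city": 1}
    print(answer_question("Which columns have missing values?", toy))
```

Plain substring matching is crude. For example, "nan" also matches "financial", so rule order and keyword choice matter in this style of assistant.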
Supported file types
CSV, TSV, TXT (delimiter auto-detected), XLSX, XLS, JSON, Parquet.
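For the plain-text formats, the standard library's `csv.Sniffer` can auto-detect the delimiter. It looks for a delimiter that appears a consistent number of times across the sample lines. This sketch assumes that approach, but eda_k may detect delimiters differently. `sniff_delimiter` and its comma fallback are hypothetical choices.

```python
import csv


def sniff_delimiter(sample: str, candidates: str = ",\t;|") -> str:
    """Guess the delimiter of delimited text from a small sample.

    Sketch using csv.Sniffer; falls back to a comma when sniffing fails.
    """
    try:
        # Restricting candidates keeps letters or digits from being picked.
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Sniffer raises csv.Error when it cannot settle on a delimiter.
        return ","
```

In practice the sample would be the first few kilobytes of the file, decoded to text.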
Notes / known limits
- Very large files (millions of rows) will be slower to chart; consider sampling first if you hit performance issues.
- The "likely datetime column" detector is a heuristic run on a small sample; always double-check its guesses against the Overview before relying on them.
- The Shapiro-Wilk normality test automatically samples large columns down to 5,000 rows for speed.
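Both the sampling advice for large files and the 5,000-row cap on the normality test rely on the same idea: draw a bounded random subset before running expensive work. Here is a stdlib sketch. `cap_sample` is a hypothetical helper, and the fixed seed is an assumption added so results are reproducible between runs.

```python
import random


def cap_sample(values, max_n=5000, seed=0):
    """Return at most max_n values, sampled without replacement.

    Hypothetical sketch of the "sample down large columns" idea; small
    inputs are returned unchanged (as a list).
    """
    values = list(values)
    if len(values) <= max_n:
        return values
    # A dedicated Random instance avoids touching global random state.
    return random.Random(seed).sample(values, max_n)
```

The capped list could then be passed to a normality test such as `scipy.stats.shapiro`. SciPy itself warns that the p-value may be inaccurate above 5,000 observations.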
File details
Details for the file eda_k-0.1.0.tar.gz.
File metadata
- Download URL: eda_k-0.1.0.tar.gz
- Upload date:
- Size: 21.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bbefabc1074ebc02bd44aa64aa6f6e196fca7e64ccc1d9f6cf1061573fe14d80 |
| MD5 | 248bf5a00fad58e3497bc88317ff5ea6 |
| BLAKE2b-256 | 76d3c3f2f74eb8f27bf435fd2e6af781b20ce4f930ec49856b32e3077d75d667 |
File details
Details for the file eda_k-0.1.0-py3-none-any.whl.
File metadata
- Download URL: eda_k-0.1.0-py3-none-any.whl
- Upload date:
- Size: 21.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 98b6c393823e03bf456fd67b9f776f6833de3993d6526350745e822bc2058c5a |
| MD5 | 6b534d3fbf2b635cba0d2e10dba7a054 |
| BLAKE2b-256 | 4477b6d013498fb26ac79d351e497c6cafb9de1e7e42972963e357acab30e3ff |
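To check a downloaded file against the SHA256 digests listed above, hash it locally and compare the two values. Below is a stdlib sketch. `file_sha256` and `matches_digest` are hypothetical helper names.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Hex SHA256 digest of a file, read in 1 MiB chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_digest(path, expected_hex):
    """True if the file's SHA256 equals expected_hex (case-insensitive)."""
    return file_sha256(path) == expected_hex.strip().lower()
```

For example, `matches_digest("eda_k-0.1.0.tar.gz", "bbefabc1074ebc02bd44aa64aa6f6e196fca7e64ccc1d9f6cf1061573fe14d80")` should return True for an intact copy of the source distribution.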