A lightweight EDA tool inspired by the curious nature of suricates. Built just for fun 🔬.
pysuricata
A lightweight Python library to generate self-contained HTML reports for exploratory data analysis (EDA).
Installation
Install pysuricata directly from PyPI:
pip install pysuricata
Why use pysuricata?
- Instant reports: Generate clean, self-contained HTML reports directly from pandas DataFrames.
- Out-of-core option (v2): Stream CSV/Parquet in chunks and profile datasets larger than RAM.
- No heavy deps: Minimal runtime dependencies (pandas/pyarrow optional depending on source).
- Rich insights: Summaries for numeric, categorical, and datetime columns, plus missing values, duplicates, correlations, and sample rows.
- Portable: Reports are standalone HTML (with inline CSS/JS/images) that can be easily shared.
- Customizable: Title, sample display, and output path can be tailored to your needs.
Quick Example (classic, in-memory DataFrame)
The following example generates an EDA report for the Iris dataset, loaded with pandas:
import pandas as pd
import pysuricata
from IPython.display import HTML
# Load the Iris dataset directly using Pandas
iris_url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
iris_df = pd.read_csv(iris_url)
# Generate the HTML EDA report and save it to a file
html_report = pysuricata.generate_report(iris_df, output_file="iris_report.html")
# Display the report in a Jupyter Notebook
HTML(html_report)
Out-of-core streaming report (v2)
For large CSV/Parquet files, use the streaming generator in report_v2.
from pysuricata.report_v2 import generate_report, ReportConfig
# From file path (CSV/Parquet)
html = generate_report(
    source="/path/to/big.parquet",  # or .csv
    config=ReportConfig(chunk_size=250_000, compute_correlations=True),
    output_file="report_big.html",
)
# Or from a DataFrame (single chunk)
import pandas as pd
df = pd.read_csv("data.csv")
html = generate_report(df)
# Optional: get a programmatic JSON-like summary too
html, summary = generate_report("/path/to/big.csv", return_summary=True)
Highlights in v2:
- Streams data in chunks to keep peak memory low.
- Shows processed bytes (≈) and precise generation time (e.g., 0.02s).
- Approximate distinct counts (KMV), heavy hitters (Misra–Gries), and quantiles/histograms via reservoir sampling.
- Numeric extras: 95% CI for mean, coefficient of variation, heaping %, granularity hints, bimodality.
- Categorical extras: case/trim variants, empty strings, length stats.
- Datetime details: per-hour, day-of-week, and month breakdown tables + timeline chart.
- Correlation chips (streaming) for numeric columns.
- Hardened HTML escaping for column names and labels.
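The streaming techniques named in the list above (KMV, Misra–Gries, reservoir sampling) can be sketched in a few lines each. The code below is an illustrative, textbook-style version and not pysuricata's own code: the function names, the SHA-256 based hash, and the default parameters are all assumptions made for this example.

```python
import hashlib
import heapq
import random


def _unit_hash(value):
    """Map a value to a deterministic pseudo-uniform float in [0, 1)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_distinct(stream, k=256):
    """Estimate the distinct count from the k minimum hash values (KMV)."""
    heap = []        # negated hashes, so heap[0] is the largest kept hash
    members = set()  # hashes currently kept, used to skip duplicates
    for item in stream:
        h = _unit_hash(item)
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes seen: count is exact
    return int((k - 1) / -heap[0])


def misra_gries(stream, k=10):
    """Return heavy-hitter candidates using at most k - 1 counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def reservoir_sample(stream, size, seed=0):
    """Keep a uniform random sample of up to `size` items (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < size:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # both bounds inclusive
            if j < size:
                sample[j] = item
    return sample


# Usage on a small synthetic stream.
data = [i % 1000 for i in range(20_000)]
print("distinct (approx):", kmv_distinct(data, k=256))
print("heavy hitters:", misra_gries(["a"] * 50 + ["b"] * 30 + ["c"] * 5, k=3))
sample = sorted(reservoir_sample(data, size=200))
print("median (approx):", sample[len(sample) // 2])
```

Misra–Gries guarantees that any value occurring more than n/k times in a stream of n items survives in the counters, although the stored counts are underestimates. Quantiles and histograms computed from a reservoir sample are approximations whose accuracy improves with the sample size.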
What’s New
- Out-of-core report_v2 with CSV/Parquet chunking (pandas/pyarrow backends).
- Processed bytes displayed in Summary and per-variable cards.
- Precise duration in header (e.g., “0.02s”).
- Removed “Likely ID” flag to reduce false positives.
- Datetime Details section with human-readable breakdown tables.
- Numeric extremes now show row IDs (tracked across chunks).
- Optional (html, summary) return for programmatic consumption.