A lightweight EDA tool inspired by the curious nature of suricates. Built just for fun 🔬.

pysuricata

License: MIT

A lightweight Python library to generate self-contained HTML reports for exploratory data analysis (EDA).

📖 Read the documentation

Installation

Install pysuricata directly from PyPI:

pip install pysuricata

Why use pysuricata?

  • Instant reports: Generate clean, self-contained HTML reports directly from pandas DataFrames.
  • Out-of-core option (v2): Stream CSV/Parquet in chunks and profile datasets larger than RAM.
  • No heavy deps: Minimal runtime dependencies (pandas/pyarrow optional depending on source).
  • Rich insights: Summaries for numeric, categorical, datetime columns, missing values, duplicates, correlations, and sample rows.
  • Portable: Reports are standalone HTML (with inline CSS/JS/images) that can be easily shared.
  • Customizable: Title, sample display, and output path can be tailored to your needs.
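
To make the "standalone HTML" point concrete, here is a minimal sketch of how a single-file page can carry its own CSS and images. It is not pysuricata's code: the function name `build_standalone_page` and its parameters are hypothetical, and it only assumes the general technique of inlining a stylesheet and embedding an image as a base64 data URI.

```python
import base64
import html


def build_standalone_page(title: str, css: str, png_bytes: bytes) -> str:
    """Return one HTML string with the stylesheet and image embedded inline.

    Hypothetical helper for illustration; not part of pysuricata.
    """
    # Encode the raw image bytes so they can live inside the src attribute.
    encoded = base64.b64encode(png_bytes).decode("ascii")
    # Escape user-provided text before placing it in the markup.
    safe_title = html.escape(title)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{safe_title}</title>\n'
        f"<style>{css}</style></head>\n"
        f"<body><h1>{safe_title}</h1>\n"
        f'<img alt="chart" src="data:image/png;base64,{encoded}">\n'
        "</body></html>\n"
    )


# Placeholder bytes stand in for a real PNG chart.
page = build_standalone_page("A <b> report", "h1 { color: teal; }", b"\x89PNG placeholder")
```

Because everything lives in one string, the result can be written to a single `.html` file and shared without any accompanying assets.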

Quick Example (classic, in-memory DataFrame)

The following example generates an EDA report for the Iris dataset, loaded with pandas:

import pandas as pd
import pysuricata
from IPython.display import HTML

# Load the Iris dataset directly using Pandas
iris_url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
iris_df = pd.read_csv(iris_url)

# Generate the HTML EDA report and save it to a file
html_report = pysuricata.generate_report(iris_df, output_file="iris_report.html")

# Display the report in a Jupyter Notebook
HTML(html_report)

Out-of-core streaming report (v2)

For large CSV/Parquet files, use the streaming generator in report_v2.

from pysuricata.report_v2 import generate_report, ReportConfig

# From file path (CSV/Parquet)
html = generate_report(
    source="/path/to/big.parquet",  # or .csv
    config=ReportConfig(chunk_size=250_000, compute_correlations=True),
    output_file="report_big.html",
)

# Or from a DataFrame (single chunk)
import pandas as pd
df = pd.read_csv("data.csv")
html = generate_report(df)

# Optional: get a programmatic JSON-like summary too
html, summary = generate_report("/path/to/big.csv", return_summary=True)
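
For intuition about what chunked processing involves, the sketch below reads a CSV a few rows at a time and keeps only running totals in memory. It is an illustration under assumptions, not pysuricata's implementation: `RunningStats` and `iter_chunks` are hypothetical names, the CSV is a tiny in-memory stand-in for a large file, and the merge step uses the standard pairwise update for mean and variance.

```python
import csv
import io
import math
from dataclasses import dataclass
from itertools import islice
from typing import Iterator, List


@dataclass
class RunningStats:
    """Count, mean, and sum of squared deviations (m2) for one numeric column."""

    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update_chunk(self, values: List[float]) -> None:
        """Merge the statistics of one chunk into the running totals."""
        if not values:
            return
        n_b = len(values)
        mean_b = sum(values) / n_b
        m2_b = sum((v - mean_b) ** 2 for v in values)
        delta = mean_b - self.mean
        total = self.n + n_b
        # Pairwise merge: combine two (n, mean, m2) summaries exactly.
        self.m2 += m2_b + delta * delta * self.n * n_b / total
        self.mean += delta * n_b / total
        self.n = total

    @property
    def std(self) -> float:
        """Sample standard deviation (ddof=1); NaN with fewer than two rows."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else float("nan")


def iter_chunks(reader: Iterator[dict], chunk_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most chunk_size rows until the reader is exhausted."""
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


# Hypothetical in-memory CSV standing in for a large file on disk.
csv_text = "x\n" + "\n".join(str(i) for i in range(1, 11))
stats = RunningStats()
for chunk in iter_chunks(csv.DictReader(io.StringIO(csv_text)), chunk_size=3):
    stats.update_chunk([float(row["x"]) for row in chunk])

print(stats.n, stats.mean, round(stats.std, 4))
```

Only one chunk and three numbers per column are held at any time, which is why peak memory stays roughly proportional to the chunk size rather than the file size.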

Highlights in v2:

  • Streams data in chunks to keep peak memory low.
  • Shows processed bytes (≈) and precise generation time (e.g., 0.02s).
  • Approximate distinct counts (KMV), heavy hitters (Misra–Gries), and quantiles/histograms via reservoir sampling.
  • Numeric extras: 95% CI for mean, coefficient of variation, heaping %, granularity hints, bimodality.
  • Categorical extras: case/trim variants, empty strings, length stats.
  • Datetime details: per-hour, day-of-week, and month breakdown tables + timeline chart.
  • Correlation chips (streaming) for numeric columns.
  • Hardened HTML escaping for column names and labels.
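
The streaming sketches named in the list above can be illustrated in a few lines each. The code below is a textbook-style sketch, not pysuricata's implementation: the function names, the default `k`, the fixed seed, and the choice of BLAKE2b over `repr(item)` as the hash are all assumptions made for the example.

```python
import hashlib
import heapq
import random
from typing import Dict, Hashable, Iterable, List


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Return candidate heavy hitters using at most k - 1 counters."""
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def reservoir_sample(stream: Iterable[float], size: int, seed: int = 0) -> List[float]:
    """Keep a uniform random sample of up to `size` items (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[float] = []
    for i, value in enumerate(stream):
        if i < size:
            sample.append(value)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < size:
                sample[j] = value
    return sample


def kmv_distinct(stream: Iterable[Hashable], k: int = 256) -> float:
    """Estimate the distinct count from the k minimum hash values (KMV)."""
    kept = set()  # hash values currently retained
    heap: List[float] = []  # negated values, so heap[0] is the largest kept hash
    for item in stream:
        digest = hashlib.blake2b(repr(item).encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big") / 2**64  # map to [0, 1)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return float(len(heap))  # fewer than k distinct hashes: count is exact
    return (k - 1) / -heap[0]
```

Misra–Gries guarantees that any item occurring more than n/k times in a stream of n items survives in the counters, though the stored counts are lower bounds. Quantiles and histograms can then be computed on the reservoir sample instead of the full column.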

What’s New

  • Out-of-core report_v2 with CSV/Parquet chunking (pandas/pyarrow backends).
  • Processed bytes displayed in Summary and per-variable cards.
  • Precise duration in header (e.g., “0.02s”).
  • Removed “Likely ID” flag to reduce false positives.
  • Datetime Details section with human-readable breakdown tables.
  • Numeric extremes now show row IDs (tracked across chunks).
  • Optional (html, summary) return for programmatic consumption.
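
Tracking row IDs for numeric extremes across chunks comes down to carrying a global row offset from one chunk to the next. The following is a minimal sketch of that idea under assumptions: `track_extremes` is a hypothetical helper, not pysuricata's API, and it ignores missing values and NaN for brevity.

```python
import heapq
from typing import Iterable, List, Tuple

Pair = Tuple[float, int]  # (value, global row id)


def track_extremes(
    chunks: Iterable[List[float]], top_n: int = 3
) -> Tuple[List[Pair], List[Pair]]:
    """Return the top_n smallest and largest values with their global row ids."""
    minima: List[Pair] = []
    maxima: List[Pair] = []
    offset = 0  # number of rows seen in earlier chunks
    for chunk in chunks:
        # Translate chunk-local positions into positions in the whole dataset.
        pairs = [(value, offset + i) for i, value in enumerate(chunk)]
        minima = heapq.nsmallest(top_n, minima + pairs)
        maxima = heapq.nlargest(top_n, maxima + pairs)
        offset += len(chunk)
    return minima, maxima


lows, highs = track_extremes([[5.0, 1.0, 9.0], [7.0, 0.5], [12.0, 3.0]], top_n=2)
```

Only `2 * top_n` pairs are retained between chunks, so the memory cost does not depend on the dataset size.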
