A lightweight EDA tool inspired by the curious nature of suricates. Built just for fun 🔬.

pysuricata

License: MIT

A lightweight Python library to generate self-contained HTML reports for exploratory data analysis (EDA).

📖 Read the documentation

Installation

Install pysuricata directly from PyPI:

pip install pysuricata

Why use pysuricata?

  • Instant reports: Generate clean, self-contained HTML reports directly from pandas DataFrames.
  • Out-of-core option (v2): Consume in-memory DataFrame chunks and profile datasets larger than RAM.
  • No heavy deps: Minimal runtime dependencies (pandas/pyarrow optional depending on source).
  • Rich insights: Summaries for numeric, categorical, datetime columns, missing values, duplicates, correlations, and sample rows.
  • Portable: Reports are standalone HTML (with inline CSS/JS/images) that can be easily shared.
  • Customizable: Title, sample display, and output path can be tailored to your needs.

Quick Example (classic, in-memory DataFrame)

The following example generates an EDA report for the Iris dataset, loaded with pandas:

import pandas as pd
from pysuricata import profile

# Load the Iris dataset directly using Pandas
iris_url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
iris_df = pd.read_csv(iris_url)

# Build the report and save to a file
rep = profile(iris_df)
rep.save_html("iris_report.html")

Streaming report (low memory)

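For large datasets, pass an iterable of DataFrame chunks instead of a single DataFrame. You control how the chunks are produced; only the chunks you yield need to be in memory.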

from pysuricata import profile, ReportConfig
import pandas as pd

def chunk_iter():
    for i in range(10):
        yield pd.read_csv(f"part-{i}.csv")  # You manage chunking externally

rep = profile(chunk_iter(), config=ReportConfig())
rep.save_html("report.html")

# Optional: stats only, no HTML report
# (iris_df is the DataFrame loaded in the quick example above)
from pysuricata import summarize
stats = summarize(iris_df)
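
When the data sits in one large CSV file rather than pre-split parts, pandas can produce the chunks itself. The sketch below is only an illustration: `iter_csv_chunks` and `rows_per_chunk` are hypothetical names, and the commented-out call assumes that `profile` accepts any iterable of DataFrames, as the generator example above suggests.

```python
import pandas as pd


def iter_csv_chunks(path, rows_per_chunk=50_000):
    """Yield DataFrames of at most rows_per_chunk rows from a single CSV file."""
    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of loading the whole file at once.
    yield from pd.read_csv(path, chunksize=rows_per_chunk)


# Assumed usage, mirroring the streaming example above:
# from pysuricata import profile
# rep = profile(iter_csv_chunks("big.csv"))
# rep.save_html("report.html")
```

A smaller `rows_per_chunk` lowers peak memory at the cost of more iterations.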

Highlights:

  • Streams data in chunks to keep peak memory low.
  • Reports the approximate number of bytes processed and the report generation time (e.g., 0.02s).
  • Approximate distinct counts (KMV), heavy hitters (Misra–Gries), and quantiles/histograms via reservoir sampling.
  • Numeric extras: 95% CI for mean, coefficient of variation, heaping %, granularity hints, bimodality.
  • Categorical extras: case/trim variants, empty strings, length stats.
  • Datetime details: per-hour, day-of-week, and month breakdown tables + timeline chart.
  • Correlation chips (streaming) for numeric columns.
  • Hardened HTML escaping for column names and labels.
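
For readers unfamiliar with the streaming techniques named in the highlights, here is a minimal textbook-style sketch of each. It is not pysuricata's implementation: the function names, parameters, and the choice of hash function are assumptions made for illustration only.

```python
import hashlib
import random


def misra_gries(stream, k):
    """Misra-Gries heavy hitters: keep at most k - 1 candidate counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free slot: decrement every counter and drop those reaching zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def reservoir_sample(stream, size, rng=None):
    """Uniform random sample of `size` items from a stream (Algorithm R)."""
    rng = rng or random.Random()
    sample = []
    for n, item in enumerate(stream, start=1):
        if len(sample) < size:
            sample.append(item)
        else:
            j = rng.randrange(n)  # uniform integer in [0, n)
            if j < size:
                sample[j] = item
    return sample


def kmv_distinct(stream, k=256):
    """K-minimum-values estimate of the number of distinct items."""
    smallest = set()  # the k smallest normalized hash values seen so far
    for item in stream:
        digest = hashlib.blake2b(repr(item).encode(), digest_size=8).digest()
        smallest.add(int.from_bytes(digest, "big") / 2**64)
        if len(smallest) > k:
            smallest.remove(max(smallest))
    if len(smallest) < k:
        return float(len(smallest))  # fewer than k distinct hashes: count is exact
    return (k - 1) / max(smallest)
```

Each function makes a single pass and keeps a bounded amount of state, which is what makes this family of techniques suitable for chunked input. Misra–Gries guarantees that any item occurring more than n/k times in a stream of n items is among the returned candidates, although the stored counts are underestimates.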
