A lightweight EDA tool inspired by the curious nature of suricates. Built just for fun 🔬.

pysuricata

License: MIT

A lightweight Python library to generate self-contained HTML reports for exploratory data analysis (EDA).

📖 Read the documentation

Installation

Install pysuricata directly from PyPI:

pip install pysuricata

Why use pysuricata?

  • Instant reports: Generate clean, self-contained HTML reports directly from pandas DataFrames.
  • Out-of-core option (v2): Consume in-memory DataFrame chunks and profile datasets larger than RAM.
  • No heavy deps: Minimal runtime dependencies (pandas/pyarrow optional depending on source).
  • Rich insights: Summaries for numeric, categorical, and datetime columns, plus missing values, duplicates, correlations, and sample rows.
  • Portable: Reports are standalone HTML (with inline CSS/JS/images) that can be easily shared.
  • Customizable: Title, sample display, and output path can be tailored to your needs.

Quick Example (classic, in-memory DataFrame)

The following example generates an EDA report for the Iris dataset, loaded with pandas:

import pandas as pd
from pysuricata import profile

# Load the Iris dataset directly using Pandas
iris_url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
iris_df = pd.read_csv(iris_url)

# Build the report and save to a file
rep = profile(iris_df)
rep.save_html("iris_report.html")

Streaming report (low memory)

For large datasets, pass an iterable of in-memory DataFrame chunks that you produce yourself; the report is built incrementally from them.

import pandas as pd
from pysuricata import profile, ReportConfig

def chunk_iter():
    # Yield one DataFrame per file; you manage chunking externally
    for i in range(10):
        yield pd.read_csv(f"part-{i}.csv")

# Pass the generator directly; chunks are consumed one at a time
rep = profile(chunk_iter(), config=ReportConfig())
rep.save_html("report.html")

# Optional: stats-only (iris_df is the DataFrame from the Quick Example)
from pysuricata import summarize
stats = summarize(iris_df)
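
The example above reads ten pre-split files. When the data sits in a single large CSV instead, one possible way to produce chunks is the `chunksize` argument of `pandas.read_csv`. The sketch below is only an illustration: the helper name `iter_csv_chunks` and the default chunk size are assumptions, not part of pysuricata.

```python
import io

import pandas as pd


def iter_csv_chunks(source, rows_per_chunk=50_000):
    """Yield DataFrame chunks from one CSV without loading it whole.

    `source` can be a path or any file-like object accepted by pandas.
    """
    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of a single DataFrame.
    for chunk in pd.read_csv(source, chunksize=rows_per_chunk):
        yield chunk


# Tiny in-memory demo so the sketch runs without any file on disk.
demo_csv = io.StringIO("a,b\n1,2\n3,4\n5,6\n")
chunk_sizes = [len(chunk) for chunk in iter_csv_chunks(demo_csv, rows_per_chunk=2)]
```

A generator like this could then be handed to `profile(...)` in place of `chunk_iter()` from the example above.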

Highlights:

  • Streams data in chunks to keep peak memory low.
  • Shows processed bytes (≈) and precise generation time (e.g., 0.02s).
  • Approximate distinct counts (KMV), heavy hitters (Misra–Gries), and quantiles/histograms via reservoir sampling.
  • Numeric extras: 95% CI for mean, coefficient of variation, heaping %, granularity hints, bimodality.
  • Categorical extras: case/trim variants, empty strings, length stats.
  • Datetime details: per-hour, day-of-week, and month breakdown tables + timeline chart.
  • Correlation chips (streaming) for numeric columns.
  • Hardened HTML escaping for column names and labels.
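
For readers unfamiliar with the streaming techniques named in the highlights, here is a minimal, standard-library sketch of each one: a K-Minimum-Values (KMV) distinct-count estimate, Misra–Gries heavy hitters, and reservoir sampling (Algorithm R). It does not reproduce pysuricata's internals; the function names, default parameters, and hashing choice are all assumptions made for illustration.

```python
import hashlib
import heapq
import random


def kmv_distinct(values, k=256):
    """Estimate the distinct count by keeping the k smallest hash values."""
    heap = []     # negated hashes, so heap[0] is the largest hash kept
    kept = set()  # hashes currently in the heap, used to skip repeats
    for v in values:
        digest = hashlib.blake2b(repr(v).encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big") / 2**64  # roughly uniform in [0, 1]
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # Replace the largest kept hash with the new, smaller one.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes seen: count is exact
    return int((k - 1) / -heap[0])  # estimator: (k - 1) / k-th smallest hash


def misra_gries(values, k=10):
    """Return candidate heavy hitters using at most k - 1 counters."""
    counters = {}
    for v in values:
        if v in counters:
            counters[v] += 1
        elif len(counters) < k - 1:
            counters[v] = 1
        else:
            # No free counter: decrement all, dropping those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def reservoir_sample(values, size=1000, seed=0):
    """Keep a uniform random sample of `size` items from a stream."""
    rng = random.Random(seed)
    sample = []
    for i, v in enumerate(values):
        if i < size:
            sample.append(v)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < size:
                sample[j] = v
    return sample


stream = ["a"] * 60 + ["b"] * 30 + list(range(10))
distinct_estimate = kmv_distinct(stream)
heavy = misra_gries(stream, k=3)
sample = reservoir_sample(range(10_000), size=100)
approx_median = sorted(sample)[len(sample) // 2]
```

Each function makes a single pass and uses memory bounded by `k` or `size` rather than by the stream length, which is what makes this family of techniques suitable for chunked processing. Misra–Gries counts are lower bounds, so the result is a candidate list rather than exact frequencies.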
