A lightweight EDA tool inspired by the curious nature of suricates. Built just for fun 🔬.

pysuricata

A lightweight Python library to generate self-contained HTML reports for exploratory data analysis (EDA).

📖 Read the documentation

Installation

Install pysuricata directly from PyPI:

pip install pysuricata

Why use pysuricata?

  • Instant reports: Generate clean, self-contained HTML reports directly from pandas DataFrames.
  • Out-of-core option (v2): Consume in-memory DataFrame chunks and profile datasets larger than RAM.
  • No heavy deps: Minimal runtime dependencies (pandas/pyarrow optional depending on source).
  • Rich insights: Summaries for numeric, categorical, datetime columns, missing values, duplicates, correlations, and sample rows.
  • Portable: Reports are standalone HTML (with inline CSS/JS/images) that can be easily shared.
  • Customizable: Title, sample display, and output path can be tailored to your needs.

Quick Example (classic, in-memory DataFrame)

The following example generates an EDA report for the Iris dataset, loaded with pandas:

import pandas as pd
from pysuricata import profile

# Load the Iris dataset directly using Pandas
iris_url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
iris_df = pd.read_csv(iris_url)

# Build the report and save to a file
rep = profile(iris_df)
rep.save_html("iris_report.html")

Streaming report (low memory)

For large datasets, pass an iterable of DataFrame chunks that you produce yourself, so the full dataset never has to be loaded at once.

from pysuricata import profile, ReportConfig
import pandas as pd

def chunk_iter():
    for i in range(10):
        yield pd.read_csv(f"part-{i}.csv")  # You manage chunking externally

rep = profile(chunk_iter(), config=ReportConfig())
rep.save_html("report.html")

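# Optional: compute statistics only, without rendering HTML
# (iris_df is the DataFrame loaded in the quick example above)
from pysuricata import summarize
stats = summarize(iris_df)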
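
When the data lives in one large CSV rather than pre-split part files, pandas can produce the chunks itself through the `chunksize` argument of `read_csv`. The sketch below is a minimal illustration, not part of pysuricata: the helper names `iter_csv_chunks` and `build_report`, the default chunk size, and the file names are assumptions chosen for the example. It relies only on `profile` accepting an iterable of DataFrames, as in the streaming example above.

```python
import pandas as pd


def iter_csv_chunks(path, chunksize=50_000):
    """Yield DataFrames of at most `chunksize` rows from a single CSV file.

    Hypothetical helper: pandas reads the file lazily, so only one
    chunk is materialized at a time.
    """
    yield from pd.read_csv(path, chunksize=chunksize)


def build_report(csv_path, out_path, chunksize=50_000):
    """Profile a CSV chunk by chunk and write the HTML report (hypothetical helper)."""
    # Imported here so the chunking helper can be used without pysuricata installed.
    from pysuricata import profile, ReportConfig

    rep = profile(iter_csv_chunks(csv_path, chunksize), config=ReportConfig())
    rep.save_html(out_path)
```

A call such as `build_report("big.csv", "big_report.html")` would then stream the file through the profiler. Smaller chunk sizes lower peak memory at the cost of more per-chunk overhead.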

Highlights:

  • Streams data in chunks to keep peak memory low.
  • Shows the approximate number of processed bytes (≈) and the report generation time (e.g., 0.02s).
  • Approximate distinct counts (KMV), heavy hitters (Misra–Gries), and quantiles/histograms via reservoir sampling.
  • Numeric extras: 95% CI for mean, coefficient of variation, heaping %, granularity hints, bimodality.
  • Categorical extras: case/trim variants, empty strings, length stats.
  • Datetime details: per-hour, day-of-week, and month breakdown tables + timeline chart.
  • Correlation chips (streaming) for numeric columns.
  • Hardened HTML escaping for column names and labels.
