Fast, Polars-native data profiling with interactive HTML reports and data quality alerts.

dataxid-profiling

Quickstart

import polars as pl
from dataxid_profiling import ProfileReport

df = pl.read_csv("data.csv")
report = ProfileReport(df)
report.to_html("report.html")

Pandas works too:

import pandas as pd
report = ProfileReport(pd.read_csv("data.csv"))

Report Preview

Dataset overview — row/column counts, missing cells, duplicates, memory usage, and column type distribution at a glance.

Column details — per-column statistics, top value distribution, and word clouds for categorical data.

Correlations — interactive heatmap showing relationships between numeric columns.

Interactions — scatter plots for numeric pairs and box plots for categorical × numeric pairs, with dynamic column selection.
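
To make the categorical × numeric box plots concrete, here is a minimal sketch of the per-category summary a box plot draws (min, quartiles, max). It uses only the Python standard library. The function name `box_stats_by_category` and the sample data are hypothetical, and the library may compute its quartiles differently.

```python
from collections import defaultdict
from statistics import quantiles


def box_stats_by_category(categories, values):
    """Group numeric values by category and return box plot statistics."""
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        if cat is not None and val is not None:  # skip pairs with a missing side
            groups[cat].append(val)

    summary = {}
    for cat, vals in groups.items():
        if len(vals) < 2:
            continue  # statistics.quantiles needs at least two data points
        q1, median, q3 = quantiles(vals, n=4, method="inclusive")
        summary[cat] = {
            "min": min(vals),
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": max(vals),
        }
    return summary


# Illustrative data only: subscription plan vs. monthly spend.
plans = ["free", "pro", "free", "pro", "free", "pro"]
spend = [0, 40, 10, 60, 20, 80]
stats = box_stats_by_category(plans, spend)
```

Each entry in `stats` holds the five numbers one box in the plot would show for that category.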

Highlights

  • Built on Polars — fast, memory-efficient, Rust-powered
  • 3 lines to profile any dataset
  • Programmatic-first: .to_dict(), .stats, .alerts
  • Interactive HTML reports with ECharts
  • Accepts Polars, Pandas, CSV, and Parquet
  • 5 column types: numeric, categorical, boolean, datetime, text
  • 7 data quality alerts out of the box
  • 5 correlation types: Pearson, Spearman, Kendall, Cramér's V, Phi K
  • Interactions: scatter plot + box plot with dynamic column selection
  • Two modes: "complete" for deep analysis, "overview" for speed
  • Fully typed
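
Of the correlation types listed above, Cramér's V is the one that measures association between two categorical columns. The sketch below shows the textbook, uncorrected formula in plain Python so the number in the heatmap is easier to interpret. It is an illustration, not the library's implementation: `cramers_v` is a hypothetical helper, and the library may apply a bias correction or handle missing values differently.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equal-length categorical sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("inputs must be non-empty and of equal length")

    joint = Counter(zip(xs, ys))  # observed count per (x, y) cell
    row = Counter(xs)             # marginal counts for the first variable
    col = Counter(ys)             # marginal counts for the second variable

    k = min(len(row), len(col)) - 1
    if k == 0:
        return 0.0  # one variable is constant, so no association is measurable

    # Chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for x, row_count in row.items():
        for y, col_count in col.items():
            expected = row_count * col_count / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    return sqrt(chi2 / (n * k))


perfect = cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"])
independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
```

A value near 1 means one column almost determines the other. A value near 0 means the columns look independent.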

Installation

pip install dataxid-profiling

Usage

Programmatic access

report = ProfileReport(df, title="Customer Data Profile")

stats = report.to_dict()
alerts = report.alerts
column_stats = report.stats["age"]
correlations = report.correlations
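
If the result of `report.to_dict()` is a nested dictionary, flattening it into dotted keys can make it easier to log or diff between runs. The sketch below assumes that shape, and its sample dictionary is a hypothetical stand-in: the real keys produced by the library are not documented here and may differ.

```python
def flatten(nested, prefix=""):
    """Flatten nested dicts into {"a.b.c": value} pairs."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # recurse into sub-dictionaries
        else:
            flat[path] = value
    return flat


# Hypothetical stand-in for report.to_dict(); the real structure may differ.
example = {
    "dataset": {"rows": 100, "columns": 3},
    "columns": {"age": {"mean": 41.5}},
}
flat = flatten(example)
```

With a real report, the call would be `flatten(report.to_dict())`.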

JSON export

report.to_json("report.json")

Configuration

from dataxid_profiling import ProfileReport, ProfileConfig

config = ProfileConfig(
    title="Customer Data Profile",
    mode="overview",
    missing_threshold=0.1,
    histogram_bins=30,
)
report = ProfileReport(df, config=config)
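
The `missing_threshold=0.1` setting above reads most naturally as "flag a column when more than 10% of its values are missing". That reading is an assumption, not something the configuration block confirms. The standalone sketch below shows that rule in plain Python, with hypothetical helper names and sample data.

```python
def missing_fraction(values):
    """Fraction of entries that are None (0.0 for an empty column)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def columns_over_threshold(columns, threshold=0.1):
    """Return {column name: missing fraction} for columns above the threshold."""
    flagged = {}
    for name, values in columns.items():
        frac = missing_fraction(values)
        if frac > threshold:  # assumed semantics: strictly greater than
            flagged[name] = frac
    return flagged


# Illustrative data only.
data = {
    "age": [34, None, 29, 41, None],                   # 2 of 5 missing
    "city": ["Oslo", "Lima", "Pune", "Kyiv", "Bonn"],  # none missing
}
flagged = columns_over_threshold(data, threshold=0.1)
```

Here only `age` is flagged, with a missing fraction of 0.4.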

Modes

Feature "complete" "overview"
Basic stats
Histograms & value counts
Correlations
Interactions
Character analysis
Duplicate rows table

Output formats

Format  Method                         Use case
HTML    report.to_html("report.html")  Interactive report
JSON    report.to_json("report.json")  Machine-readable
Dict    report.to_dict()               Python-native
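
In a script that writes reports to paths chosen by the user, a small wrapper can pick the output method from the file extension. The sketch below uses only the `to_html` and `to_json` methods listed above. `export_report` itself is a hypothetical helper, not part of the package.

```python
from pathlib import Path


def export_report(report, path):
    """Write a report to disk, choosing the method from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".html":
        report.to_html(str(path))
    elif suffix == ".json":
        report.to_json(str(path))
    else:
        raise ValueError(f"unsupported report extension: {suffix!r}")
    return suffix
```

For example, `export_report(report, "report.html")` would call `report.to_html("report.html")`. Unknown extensions raise a `ValueError` rather than silently writing nothing.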

Contributing

Contributions are welcome. See CONTRIBUTING.md for details.

License

Apache-2.0
