Fast, Polars-native data profiling with interactive HTML reports and data quality alerts.

dataxid-profiling

Quickstart

import polars as pl
from dataxid_profiling import ProfileReport

df = pl.read_csv("data.csv")
report = ProfileReport(df)
report.to_html("report.html")

Pandas works too:

import pandas as pd
report = ProfileReport(pd.read_csv("data.csv"))

Report Preview

Dataset overview — row/column counts, missing cells, duplicates, memory usage, and column type distribution at a glance.

[Screenshot: dataset overview and alerts]
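To make the overview numbers concrete, here is a minimal stdlib sketch of how row/column counts, missing cells, and duplicate rows can be computed for a table held as a list of row dicts. It is an illustration under that assumption, not dataxid-profiling's implementation (which works on Polars frames), and it leaves out memory usage and the column type distribution. The function name `dataset_overview` is hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def dataset_overview(rows: list[Row]) -> dict[str, int]:
    """Hypothetical helper: overview counts for a table given as row dicts."""
    columns = sorted({key for row in rows for key in row})
    # A cell counts as missing when its key is absent or its value is None.
    missing_cells = sum(1 for row in rows for col in columns if row.get(col) is None)
    # A duplicate row has the same value in every column as an earlier row.
    seen: set[tuple[Any, ...]] = set()
    duplicate_rows = 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicate_rows += 1
        else:
            seen.add(key)
    return {
        "rows": len(rows),
        "columns": len(columns),
        "missing_cells": missing_cells,
        "duplicate_rows": duplicate_rows,
    }
```

For example, `dataset_overview([{"a": 1, "b": "x"}, {"a": None, "b": "y"}, {"a": 1, "b": "x"}])` reports 3 rows, 2 columns, 1 missing cell, and 1 duplicate row.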

Column details — per-column statistics, top value distribution, and word clouds for categorical data.

[Screenshot: column details with charts and word cloud]
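The "top value distribution" for a categorical column is essentially a frequency table. A minimal stdlib sketch using `collections.Counter`, assuming nulls are excluded and shares are taken over non-null values (the library may use all rows as the denominator instead); `top_values` is a hypothetical name:

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import Optional


def top_values(
    values: Sequence[Optional[Hashable]], k: int = 5
) -> list[tuple[Hashable, int, float]]:
    """Hypothetical helper: up to k most frequent non-null values with count and share."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    total = len(non_null)
    # most_common sorts by count; values with equal counts keep first-seen order.
    return [(value, count, count / total) for value, count in counts.most_common(k)]
```

With `["red", "blue", "red", "green", None, "blue", "red"]` and `k=2`, this returns `red` (3 of 6 non-null values) followed by `blue` (2 of 6).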

Correlations — interactive heatmap showing relationships between numeric columns.

[Screenshot: correlation heatmap]
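Each heatmap cell is a Pearson correlation coefficient r between two numeric columns, ranging from -1 (perfect negative linear relationship) to +1 (perfect positive). The stdlib sketch below computes the full symmetric matrix; it assumes clean, equal-length numeric lists with no nulls and is not the library's own (Polars-based) code. `pearson` and `correlation_matrix` are hypothetical names.

```python
import math
from itertools import combinations_with_replacement


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson's r: covariance divided by the product of standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length columns with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        return math.nan  # r is undefined when a column is constant
    # The 1/n factors of covariance and variance cancel, so sums suffice.
    return cov / math.sqrt(ss_x * ss_y)


def correlation_matrix(columns: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Symmetric matrix of pairwise r values, keyed by (column, column)."""
    matrix: dict[tuple[str, str], float] = {}
    for a, b in combinations_with_replacement(list(columns), 2):
        matrix[(a, b)] = matrix[(b, a)] = pearson(columns[a], columns[b])
    return matrix
```

Computing each unordered pair once and mirroring it halves the work, which matters as the number of numeric columns grows.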

Highlights

  • Built on Polars — two runtime dependencies
  • 3 lines to profile any dataset
  • Programmatic-first: .to_dict(), .stats, .alerts
  • Interactive HTML reports with ECharts
  • Accepts Polars, Pandas, CSV, and Parquet
  • 5 column types: numeric, categorical, boolean, datetime, text
  • 7 data quality alerts out of the box
  • Pearson correlation heatmap
  • Two modes: "complete" for deep analysis, "overview" for speed
  • Fully typed
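The five column types in the list above can be illustrated with a simple type-classification heuristic. This stdlib sketch works on already-parsed Python values, and its rules (including the cutoff of 20 distinct values separating categorical from free text) are hypothetical assumptions; the README does not say how dataxid-profiling classifies columns.

```python
from collections.abc import Sequence
from datetime import date
from typing import Any


def classify_column(values: Sequence[Any], max_categories: int = 20) -> str:
    """Assign one of five column types using simple, hypothetical rules."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        raise ValueError("cannot infer a type from an all-null column")
    # bool is a subclass of int in Python, so it must be checked first.
    if all(isinstance(v, bool) for v in non_null):
        return "boolean"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null):
        return "numeric"
    # datetime is a subclass of date, so this accepts both.
    if all(isinstance(v, date) for v in non_null):
        return "datetime"
    # Everything else: few distinct values -> categorical, many -> free text.
    distinct = len(set(non_null))
    return "categorical" if distinct <= max_categories else "text"
```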

Installation

pip install dataxid-profiling

Usage

Programmatic access

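import polars as pl
from dataxid_profiling import ProfileReport

df = pl.read_csv("data.csv")
report = ProfileReport(df, title="Customer Data Profile")

stats = report.to_dict()              # full profile as a Python dict
alerts = report.alerts                # data quality alerts
column_stats = report.stats["age"]    # statistics for one column ("age" must exist in df)
correlations = report.correlations    # correlation results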
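For a numeric column such as `age`, per-column statistics typically include counts, central tendency, spread, and range. The stdlib sketch below shows the kind of numbers involved; the key names and the choice of sample standard deviation are assumptions, so check `report.stats["age"]` for the fields dataxid-profiling actually returns. `numeric_summary` is a hypothetical name.

```python
import statistics
from collections.abc import Sequence
from typing import Optional


def numeric_summary(values: Sequence[Optional[float]]) -> dict[str, float]:
    """Hypothetical per-column summary; None marks a missing value."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no non-missing values")
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.fmean(present),
        # Sample standard deviation needs at least two values.
        "std": statistics.stdev(present) if len(present) > 1 else 0.0,
        "min": min(present),
        "median": statistics.median(present),
        "max": max(present),
    }
```

For `[34, 29, None, 41, 29]` this gives a count of 4, one missing value, a mean of 33.25, and a median of 31.5.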

JSON export

report.to_json("report.json")
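A common use of the JSON export is gating a pipeline on data quality. The sketch below reads an exported report with the standard `json` module and returns a non-zero exit code when alerts are present. It assumes the alerts sit under a top-level `"alerts"` key; that schema is a hypothetical assumption, so inspect your own `report.json` and adjust `key`. The function names are also hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Union


def alerts_in(report: dict[str, Any], key: str = "alerts") -> list[Any]:
    """Return the alert list from a parsed report; an absent key means no alerts."""
    return list(report.get(key, []))


def load_alerts(path: Union[str, Path], key: str = "alerts") -> list[Any]:
    """Parse an exported JSON report and extract its alerts."""
    with open(path, encoding="utf-8") as fh:
        return alerts_in(json.load(fh), key)


def gate(path: Union[str, Path]) -> int:
    """Print each alert and return an exit code: 1 if any alert was found, else 0."""
    alerts = load_alerts(path)
    for alert in alerts:
        print(f"data quality alert: {alert}")
    return 1 if alerts else 0
```

In a CI step, `sys.exit(gate("report.json"))` fails the job whenever the profile raises an alert.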

Configuration

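import polars as pl
from dataxid_profiling import ProfileReport, ProfileConfig

df = pl.read_csv("data.csv")
config = ProfileConfig(
    title="Customer Data Profile",
    mode="overview",          # "complete" for deep analysis, "overview" for speed
    missing_threshold=0.1,    # missing-value threshold, as a fraction (0.1 = 10%)
    histogram_bins=30,        # number of histogram bins
)
report = ProfileReport(df, config=config)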
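One plausible reading of `missing_threshold=0.1` is "flag any column where more than 10% of values are missing". The stdlib sketch below implements that reading. Whether the library compares with `>` or `>=`, and exactly which alert it raises, are assumptions; `high_missing_columns` is a hypothetical name.

```python
from collections.abc import Sequence
from typing import Any, Optional


def high_missing_columns(
    columns: dict[str, Sequence[Optional[Any]]],
    missing_threshold: float = 0.1,
) -> dict[str, float]:
    """Map each column whose missing fraction exceeds the threshold to that fraction."""
    flagged: dict[str, float] = {}
    for name, values in columns.items():
        if not values:
            continue  # an empty column has no defined missing fraction
        fraction = sum(v is None for v in values) / len(values)
        # Assumption: strictly greater than; the library may use >= instead.
        if fraction > missing_threshold:
            flagged[name] = fraction
    return flagged
```

With the default threshold, a column missing 1 of 11 values (about 9%) passes, while one missing 2 of 4 values (50%) is flagged.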

Modes

Feature "complete" "overview"
Basic stats
Histograms & value counts
Correlations
Character analysis
Duplicate rows table
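Histograms appear among the mode-dependent features above, and the configuration example sets `histogram_bins=30`. A common way to build such a histogram is equal-width binning over the column's range; the stdlib sketch below shows that technique. It is an assumption that dataxid-profiling bins this way, and `equal_width_histogram` is a hypothetical name.

```python
from collections.abc import Sequence


def equal_width_histogram(
    values: Sequence[float], bins: int = 30
) -> list[tuple[float, float, int]]:
    """Return (left_edge, right_edge, count) for each of `bins` equal-width bins."""
    if bins < 1:
        raise ValueError("bins must be at least 1")
    if not values:
        return []
    lo, hi = min(values), max(values)
    if lo == hi:
        return [(lo, hi, len(values))]  # a constant column collapses to one bin
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value falls into the last bin instead of past it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]
```

For the integers 0 to 10 with `bins=5`, each bin is 2 wide and the counts are `[2, 2, 2, 2, 3]`, since the top bin also holds the maximum value.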

Output formats

Format | Method | Use case
HTML | report.to_html("report.html") | Interactive report
JSON | report.to_json("report.json") | Machine-readable
Dict | report.to_dict() | Python-native

Contributing

Contributions are welcome. See CONTRIBUTING.md for details.

License

Apache-2.0


Download files

Source Distribution

dataxid_profiling-0.2.0.tar.gz (746.7 kB)

Uploaded Source

Built Distribution

dataxid_profiling-0.2.0-py3-none-any.whl (98.6 kB)

Uploaded Python 3

File details

Details for the file dataxid_profiling-0.2.0.tar.gz.

File metadata

  • Download URL: dataxid_profiling-0.2.0.tar.gz
  • Size: 746.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.22

File hashes

Hashes for dataxid_profiling-0.2.0.tar.gz
Algorithm Hash digest
SHA256 75b8354ef12723ccb0f653b67d5da5db303b84828bae8cabd4952c8303f64ffa
MD5 51f32044e8d6ddd502cd496a13190d20
BLAKE2b-256 8d27c049a646e0a0207033b462945e72bd77480318f096d653ed543b32800ed6

File details

Details for the file dataxid_profiling-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: dataxid_profiling-0.2.0-py3-none-any.whl
  • Size: 98.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.22

File hashes

Hashes for dataxid_profiling-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7dc383443c7db68271c16db8274e37d887fe886cf646b72681cc0d2544647581
MD5 891f1ccc467bed22dc3559c5b3f46f36
BLAKE2b-256 2952ecdfa86a051e126382ca0f316865a1d2519f94bd14bd5ce6cc7617530e22
