Fast, Polars-native data profiling with interactive HTML reports and data quality alerts.

Project description

dataxid-profiling

Quickstart

import polars as pl
from dataxid_profiling import ProfileReport

df = pl.read_csv("data.csv")
report = ProfileReport(df)
report.to_html("report.html")

Pandas works too:

import pandas as pd
report = ProfileReport(pd.read_csv("data.csv"))

Report Preview

Dataset overview — row/column counts, missing cells, duplicates, memory usage, and column type distribution at a glance.

Dataset overview and alerts

Column details — per-column statistics, top value distribution, and word clouds for categorical data.

Column details with charts and word cloud

Correlations — interactive heatmap showing relationships between numeric columns.

Correlation heatmap

Highlights

  • Built on Polars — two runtime dependencies
  • 3 lines to profile any dataset
  • Programmatic-first: .to_dict(), .stats, .alerts
  • Interactive HTML reports with ECharts
  • Accepts Polars, Pandas, CSV, and Parquet
  • 5 column types: numeric, categorical, boolean, datetime, text
  • 7 data quality alerts out of the box
  • Pearson correlation heatmap
  • Two modes: "complete" for deep analysis, "overview" for speed
  • Fully typed
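
The Pearson correlation named in the highlights can be illustrated without the library. The following is a minimal standard-library sketch of the coefficient itself, not dataxid-profiling's implementation; the helper names `pearson` and `correlation_matrix` and the sample columns are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and squares; the 1/n factors cancel out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict of name -> numeric list."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical sample data, for illustration only.
sample = {
    "age": [23, 35, 47, 51, 62],
    "income": [28000, 42000, 55000, 61000, 70000],
    "visits": [9, 7, 6, 4, 2],
}
for name, row in correlation_matrix(sample).items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

A heatmap is a colored rendering of such a matrix: values near 1 or -1 indicate a strong linear relationship, values near 0 a weak one.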

Installation

pip install dataxid-profiling

Usage

Programmatic access

report = ProfileReport(df, title="Customer Data Profile")

stats = report.to_dict()
alerts = report.alerts
column_stats = report.stats["age"]
correlations = report.correlations

JSON export

report.to_json("report.json")
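
Once exported, the file can be read back with Python's standard `json` module. The sketch below assumes only that the file holds a JSON object at the top level; the report's actual keys are not documented here, so the stand-in content and the `load_report` helper are placeholders rather than the library's schema or API.

```python
import json
import tempfile
from pathlib import Path


def load_report(path):
    """Read a JSON report from disk into plain Python objects."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Stand-in file so the sketch runs without the library installed.
# A real file written by report.to_json() will have its own structure.
placeholder = {"placeholder_section": {"placeholder_key": 1}}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "report.json"
    path.write_text(json.dumps(placeholder), encoding="utf-8")
    loaded = load_report(path)

# Inspect the top-level keys before relying on any particular field.
top_level = sorted(loaded)
print(top_level)
```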

Configuration

from dataxid_profiling import ProfileReport, ProfileConfig

config = ProfileConfig(
    title="Customer Data Profile",
    mode="overview",
    missing_threshold=0.1,
    histogram_bins=30,
)
report = ProfileReport(df, config=config)
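
To make `missing_threshold=0.1` concrete, here is a rough sketch of the kind of check such a threshold suggests: flag a column when more than 10% of its values are missing. This is an assumption about the parameter's meaning, not the library's code; in particular, whether the real comparison is strict (`>`) or inclusive (`>=`) is not stated in this README, and the helper names are hypothetical.

```python
def missing_fraction(values):
    """Fraction of entries that are None; 0.0 for an empty column."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def columns_over_threshold(columns, threshold=0.1):
    """Return {column name: missing fraction} for columns above the threshold.

    Assumption: strict comparison. The library may treat the boundary differently.
    """
    flagged = {}
    for name, values in columns.items():
        fraction = missing_fraction(values)
        if fraction > threshold:
            flagged[name] = fraction
    return flagged


# Hypothetical data: "age" is 25% missing, "city" is complete.
data = {
    "age": [31, None, 45, 52],
    "city": ["Oslo", "Lima", "Pune", "Kyiv"],
}
print(columns_over_threshold(data, threshold=0.1))
```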

Modes

Feature "complete" "overview"
Basic stats
Histograms & value counts
Correlations
Character analysis
Duplicate rows table

Output formats

| Format | Method | Use case |
| --- | --- | --- |
| HTML | report.to_html("report.html") | Interactive report |
| JSON | report.to_json("report.json") | Machine-readable |
| Dict | report.to_dict() | Python-native |

Contributing

Contributions are welcome. See CONTRIBUTING.md for details.

License

Apache-2.0

Download files

Source Distribution

dataxid_profiling-0.1.0.tar.gz (657.3 kB)

Uploaded Source

Built Distribution

dataxid_profiling-0.1.0-py3-none-any.whl (94.9 kB)

Uploaded Python 3

File details

Details for the file dataxid_profiling-0.1.0.tar.gz.

File metadata

  • Download URL: dataxid_profiling-0.1.0.tar.gz
  • Upload date:
  • Size: 657.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.22 (uv publish, macOS)

File hashes

Hashes for dataxid_profiling-0.1.0.tar.gz
Algorithm Hash digest
SHA256 d4ede2052c029763d5f0288eac257668a4ff80a6b3b130dcaaa388fc45ac4072
MD5 ac7c5d6390d8336082bc9d5309e21ccf
BLAKE2b-256 be23f4caa35d9c87b85f8328ad1137f73befb9ef190c5aa1e0b98da7e0aad7ef
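
A downloaded file can be checked against the SHA256 digest listed above using Python's standard `hashlib`. The following is a small sketch; the helper names `sha256_of_file` and `matches_digest` are illustrative and not part of any tool mentioned on this page.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Hex SHA256 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """True if the file's SHA256 equals the expected hex digest (case-insensitive)."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

For the source distribution, this would be called as `matches_digest("dataxid_profiling-0.1.0.tar.gz", "d4ede2052c029763d5f0288eac257668a4ff80a6b3b130dcaaa388fc45ac4072")`, assuming the file sits in the current directory.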

File details

Details for the file dataxid_profiling-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: dataxid_profiling-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 94.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.22 (uv publish, macOS)

File hashes

Hashes for dataxid_profiling-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3518365fe1d0f671bde3a522acea0dafd84edf43d56f250f26fa4f5f4e219389
MD5 c7262c3a7f86b8f0ad58e4e29395a1d2
BLAKE2b-256 5c64e877b9670bb01c6e3a135fded5f49a302b23e14e65941bac742b66fef762
