Modern, fast data profiling in Rust with Python bindings

meti_profil

A modern, Rust-powered data profiling library with Python bindings. It reads CSV, Parquet, and Excel files (or pandas/polars DataFrames) and generates a hybrid Markdown report, readable by humans and structured for consumption by code agents, as well as a self-contained interactive HTML report.

Installation

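```bash
pip install meti_profil
```

Version 0.1.0 ships a prebuilt wheel only for CPython 3.12 on x86-64 Linux (glibc 2.34+). On other platforms pip falls back to the source distribution, which needs a Rust toolchain to build.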

Quick start

```python
import meti_profil as mp

# From a file
report = mp.ProfileReport("data.csv", title="My dataset")

# Interactive HTML report (self-contained, works offline)
report.to_html("profile.html")

# Markdown report (great for diffs and code agents)
report.to_file("profile.md")

# From a pandas DataFrame
import pandas as pd
df = pd.read_csv("data.csv")
report = mp.ProfileReport(df)

# Programmatic access
print(report.get_summary())          # dataset-level metrics
print(report.get_column_info("age")) # per-column schema info
markdown = report.to_markdown()
html = report.to_html()              # returns the HTML as a string
```
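Because get_summary() returns a plain dict, its dataset-level metrics can gate a pipeline step, for example failing a CI job when too many cells are missing. The helper below is a hypothetical sketch, not part of meti_profil: the key names rows, columns, missing_cells and duplicate_rows are assumptions modeled on the frontmatter fields listed under Report format, so inspect the real dict before relying on them.

```python
from typing import Any, Mapping

# ASSUMED key names: check the dict returned by report.get_summary()
# and adjust these before relying on the helper.
ROWS_KEY = "rows"
COLUMNS_KEY = "columns"
MISSING_KEY = "missing_cells"
DUPLICATES_KEY = "duplicate_rows"


def quality_problems(
    summary: Mapping[str, Any],
    max_missing_ratio: float = 0.05,
    allow_duplicates: bool = False,
) -> list[str]:
    """Return human-readable problems found in a dataset-level summary dict."""
    problems: list[str] = []
    total_cells = summary[ROWS_KEY] * summary[COLUMNS_KEY]
    # Guard against an empty dataset to avoid dividing by zero.
    missing_ratio = summary[MISSING_KEY] / total_cells if total_cells else 0.0
    if missing_ratio > max_missing_ratio:
        problems.append(
            f"{missing_ratio:.1%} of cells are missing (limit {max_missing_ratio:.1%})"
        )
    if not allow_duplicates and summary[DUPLICATES_KEY] > 0:
        problems.append(f"{summary[DUPLICATES_KEY]} duplicate rows")
    return problems


# A hand-written dict standing in for report.get_summary():
fake_summary = {"rows": 100, "columns": 4, "missing_cells": 40, "duplicate_rows": 2}
for problem in quality_problems(fake_summary):
    print(problem)
```

With a real report, pass report.get_summary() instead of the hand-written dict and fail the job if the returned list is non-empty.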

In a notebook

In Jupyter or VS Code, make the report the last expression in a cell; it renders inline as an interactive dashboard (sandboxed, no external resources):

```python
report = mp.ProfileReport(df)
report  # interactive histograms, bar charts, correlation heatmap, ...
```
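Inline rendering relies on IPython's rich display protocol: when an object is the last expression in a cell, the front end calls its _repr_html_() method and renders the returned string as HTML. The hypothetical TinyReport class below illustrates only that hook; it does not reproduce meti_profil's dashboard.

```python
import html


class TinyReport:
    """Hypothetical stand-in that only demonstrates the _repr_html_ hook."""

    def __init__(self, title: str, rows: int, columns: int) -> None:
        self.title = title
        self.rows = rows
        self.columns = columns

    def _repr_html_(self) -> str:
        # IPython calls this when the object is displayed in a notebook and
        # renders the returned string as HTML. Escape text from the user.
        return (
            f"<h3>{html.escape(self.title)}</h3>"
            f"<p>{self.rows} rows x {self.columns} columns</p>"
        )


# In a notebook cell, evaluating the object renders it inline:
TinyReport("My dataset", rows=100, columns=4)
```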

ProfileReport parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| source | str, Path, pandas/polars DataFrame | required | Data source. |
| title | str | "Dataset Profile" | Report title (written to the frontmatter). |
| minimal | bool | False | Reserved: reduce heavy analyses. |
| explorative | bool | True | Reserved: enable advanced analyses. |
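The source parameter accepts either a file path or an in-memory DataFrame. One plausible way such an argument could be classified is sketched below; source_kind and its extension table are hypothetical, and the extensions meti_profil actually recognizes may differ.

```python
from pathlib import Path

# ASSUMED extension table; the real library may accept other suffixes.
_READERS = {
    ".csv": "csv",
    ".parquet": "parquet",
    ".xlsx": "excel",
    ".xls": "excel",
}


def source_kind(source: object) -> str:
    """Classify a ProfileReport-style source argument (illustrative only)."""
    if isinstance(source, (str, Path)):
        suffix = Path(source).suffix.lower()
        try:
            return _READERS[suffix]
        except KeyError:
            raise ValueError(f"unsupported file extension: {suffix!r}") from None
    # Identify DataFrames by their top-level module so that neither
    # pandas nor polars has to be installed for this check.
    module = type(source).__module__.split(".")[0]
    if module in ("pandas", "polars"):
        return f"{module}-dataframe"
    raise TypeError(f"unsupported source type: {type(source).__name__}")


print(source_kind("data.csv"), source_kind(Path("sales.PARQUET")))
```

Checking the module name rather than importing pandas or polars keeps both libraries optional dependencies.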

Report format

The Markdown report starts with a YAML frontmatter block (rows, columns, missing cells, duplicates, version) followed by a standard set of ## sections: Overview, Schema, Numeric Columns, Categorical Columns, Missing Values, Duplicate Rows, and Correlations.
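Because the frontmatter and section headings follow a fixed layout, a code agent can split the output of report.to_markdown() into metadata and per-section text without a full Markdown parser. This is a minimal stdlib sketch under two assumptions: the frontmatter holds flat key: value lines, and every section starts with a line beginning "## ". The sample report and its key names are invented for illustration; a YAML library such as PyYAML would be more robust for the frontmatter.

```python
def split_report(markdown: str) -> tuple[dict[str, str], dict[str, str]]:
    """Split a Markdown report into (frontmatter, sections), both as dicts."""
    lines = markdown.splitlines()
    frontmatter: dict[str, str] = {}
    start = 0
    if lines and lines[0].strip() == "---":
        end = lines.index("---", 1)  # closing delimiter of the frontmatter
        for line in lines[1:end]:
            key, sep, value = line.partition(":")
            if sep:
                frontmatter[key.strip()] = value.strip()
        start = end + 1

    sections: dict[str, str] = {}
    current = None
    buffer: list[str] = []
    for line in lines[start:]:
        if line.startswith("## "):
            if current is not None:
                sections[current] = "\n".join(buffer).strip()
            current = line[3:].strip()
            buffer = []
        elif current is not None:
            buffer.append(line)
    if current is not None:
        sections[current] = "\n".join(buffer).strip()
    return frontmatter, sections


# Invented sample; a real report would come from report.to_markdown().
sample = """---
title: My dataset
rows: 3
columns: 2
---
## Overview
Three rows.
## Missing Values
None.
"""
meta, sections = split_report(sample)
print(meta["rows"], list(sections))
```

Values are kept as strings; convert numeric fields with int() or float() as needed.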

Features

  • Fast Rust engine backed by Apache Arrow.
  • Reads CSV, Parquet (snappy/zstd/lz4/brotli/gzip), and Excel files.
  • Accepts pandas and polars DataFrames.
  • Schema/type detection, descriptive numeric statistics, categorical frequencies, missing-value and duplicate-row analysis, and Pearson correlations.
  • Interactive HTML report: a single self-contained file (embedded CSS/JS, no CDN) with histograms, categorical bar charts, a missing-value overview, and a correlation heatmap, all with hover tooltips.
  • Native notebook rendering in Jupyter / VS Code via _repr_html_.
  • Clean Markdown reports optimized for both humans and code agents.
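For reference, several of the statistics named above reduce to short definitions. The pure-Python sketch below computes a Pearson correlation, a missing-cell count, and a duplicate-row count over rows of plain values. It is not the Rust engine's implementation: treating None as missing and dropping incomplete pairs before correlating (pairwise deletion) are assumptions.

```python
import math
from typing import Optional, Sequence


def pearson(xs: Sequence[Optional[float]], ys: Sequence[Optional[float]]) -> Optional[float]:
    """Pearson correlation over pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None  # undefined when either column is constant
    return cov / math.sqrt(var_x * var_y)


def missing_cells(rows: Sequence[Sequence[object]]) -> int:
    """Count cells that are None across all rows."""
    return sum(value is None for row in rows for value in row)


def duplicate_rows(rows: Sequence[Sequence[object]]) -> int:
    """Count rows that exactly repeat an earlier row."""
    seen: set[tuple] = set()
    duplicates = 0
    for row in rows:
        key = tuple(row)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


table = [(1, 2.0, "a"), (2, 4.0, "b"), (None, 5.0, "c"), (1, 2.0, "a")]
ages = [row[0] for row in table]
scores = [row[1] for row in table]
print(pearson(ages, scores), missing_cells(table), duplicate_rows(table))
```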

Output formats

| Method | Output |
|---|---|
| to_html(path) | Write a self-contained interactive HTML file. |
| to_html() | Return the HTML document as a string. |
| to_file(path) | Write the Markdown report. |
| to_markdown() | Return the Markdown report as a string. |
| get_summary() | Dataset-level metrics as a dict. |
| get_column_info(name) | Per-column schema info as a dict. |
| Display in a notebook | Inline interactive dashboard (_repr_html_). |
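to_html serves two roles depending on whether a path is passed. The hypothetical ReportSketch class below shows that dual-mode pattern in isolation; what the real to_html(path) returns after writing is not documented here, so the sketch simply returns None in that case.

```python
from pathlib import Path
from typing import Optional, Union


class ReportSketch:
    """Hypothetical class mirroring the dual-mode to_html pattern."""

    def __init__(self, html_text: str) -> None:
        self._html = html_text

    def to_html(self, path: Optional[Union[str, Path]] = None) -> Optional[str]:
        if path is None:
            # No path: hand the document back to the caller as a string.
            return self._html
        # Path given: write the document to disk as UTF-8.
        Path(path).write_text(self._html, encoding="utf-8")
        return None


report = ReportSketch("<html><body>profile</body></html>")
print(len(report.to_html()))
```

Returning a string when no path is given lets callers embed the report in another page or upload it without touching the filesystem.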

Development

Requires a Rust toolchain (1.78+) and Python 3.10+.

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install maturin pytest pandas polars pyarrow

# Build the extension in-place
maturin develop

# Run the test suites
cargo test --workspace
pytest tests/python -v
```

License

MIT


Download files

Source Distribution

meti_profil-0.1.0.tar.gz (43.7 kB)

Uploaded Source

Built Distribution

meti_profil-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl (4.5 MB)

Uploaded CPython 3.12, manylinux: glibc 2.34+ x86-64

File details

Details for the file meti_profil-0.1.0.tar.gz.

File metadata

  • Download URL: meti_profil-0.1.0.tar.gz
  • Upload date:
  • Size: 43.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.14.1

File hashes

Hashes for meti_profil-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | ece16c10737d17fe97fd8470175bc8a234bae4bb45aed32153e65b10659b3554 |
| MD5 | 7b6ac79f66ccfb45002235cfce83966c |
| BLAKE2b-256 | c10b9fa1185810ee1b7497bfd4a87fa8d32def5f230608e72e49be628f49689d |

File details

Details for the file meti_profil-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl.

File hashes

Hashes for meti_profil-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | c3fb78070451153ec5cc507691e0df2268b97e73295220f261cf05157c41a152 |
| MD5 | 42cad671e78b9d076113c318584ddfb3 |
| BLAKE2b-256 | fbcfacd83ad33bfb388cdf356aa21bef319b7204eb384ead796fdcc8286f65e5 |
