Modern, fast data profiling in Rust with Python bindings
meti_profil
A modern, Rust-powered data profiling library with Python bindings. It reads CSV, Parquet, and Excel files (or pandas / polars DataFrames) and generates a hybrid Markdown report that is readable by humans and structured for consumption by code agents.
Installation
pip install meti_profil
Quick start
import meti_profil as mp
# From a file
report = mp.ProfileReport("data.csv", title="My dataset")
# Interactive HTML report (self-contained, works offline)
report.to_html("profile.html")
# Markdown report (great for diffs and code agents)
report.to_file("profile.md")
# From a pandas DataFrame
import pandas as pd
df = pd.read_csv("data.csv")
report = mp.ProfileReport(df)
# Programmatic access
print(report.get_summary()) # dataset-level metrics
print(report.get_column_info("age")) # per-column schema info
markdown = report.to_markdown()
html = report.to_html() # returns the HTML as a string
In a notebook
In Jupyter / VSCode, just display the report — it renders inline as an interactive dashboard (sandboxed, no external resources):
report = mp.ProfileReport(df)
report # interactive histograms, bar charts, correlation heatmap, ...
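For readers unfamiliar with how inline rendering works: Jupyter-style front ends call an object's `_repr_html_` method when the object is the last expression in a cell, and display the returned HTML. The sketch below illustrates that protocol only. `TinyReport` is a hypothetical stand-in and says nothing about how meti_profil builds its dashboard.

```python
import html


class TinyReport:
    """Hypothetical stand-in for a report object (not meti_profil's class)."""

    def __init__(self, title: str, rows: int) -> None:
        self.title = title
        self.rows = rows

    def _repr_html_(self) -> str:
        # Notebook front ends call this hook and render the returned markup.
        # Escape user-supplied text so it cannot inject HTML.
        return f"<h3>{html.escape(self.title)}</h3><p>{self.rows} rows</p>"


report = TinyReport("A & B", 3)
print(report._repr_html_())
```

Outside a notebook the hook is never called automatically, which is why the quick start uses `to_html()` and `to_markdown()` for scripts.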
ProfileReport parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| source | str, Path, pandas/polars DataFrame | required | Data source. |
| title | str | "Dataset Profile" | Report title (written to the frontmatter). |
| minimal | bool | False | Reserved: reduce heavy analyses. |
| explorative | bool | True | Reserved: enable advanced analyses. |
Report format
The Markdown report starts with a YAML frontmatter block (rows, columns,
missing cells, duplicates, version) followed by normalized ## sections:
Overview, Schema, Numeric Columns, Categorical Columns, Missing Values, Duplicate Rows, and Correlations.
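Because the layout is regular, a script or agent can split a report into its frontmatter and its `##` sections with a few lines of standard-library Python. The following is a minimal sketch, not part of meti_profil: `split_frontmatter` is a hypothetical helper, the sample text and its key names are invented for illustration, and the parser assumes flat `key: value` pairs (use a real YAML parser for anything richer).

```python
def split_frontmatter(markdown: str) -> tuple[dict[str, str], dict[str, str]]:
    """Return (frontmatter fields, {section heading: section body})."""
    lines = markdown.splitlines()
    meta: dict[str, str] = {}
    body_start = 0

    # Frontmatter is delimited by a leading and a closing '---' line.
    if lines and lines[0].strip() == "---":
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                body_start = i + 1
                break
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()  # values stay as strings
        else:
            meta = {}  # no closing delimiter: treat everything as body

    # Group the remaining lines under their '## ' headings.
    sections: dict[str, str] = {}
    current = None
    for line in lines[body_start:]:
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return meta, sections


# Invented sample; a real report would come from report.to_markdown().
SAMPLE = """---
title: My dataset
rows: 3
columns: 2
---
## Overview
Three rows, two columns.

## Schema
age: integer
"""

meta, sections = split_frontmatter(SAMPLE)
print(meta["rows"], list(sections))
```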
Features
- Fast Rust engine backed by Apache Arrow.
- Reads CSV, Parquet (snappy/zstd/lz4/brotli/gzip), and Excel files.
- Accepts pandas and polars DataFrames.
- Schema/type detection, descriptive numeric statistics, categorical frequencies, missing-value and duplicate-row analysis, and Pearson correlations.
- Interactive HTML report: a single self-contained file (embedded CSS/JS, no CDN) with histograms, categorical bar charts, a missing-value overview and a correlation heatmap — all with hover tooltips.
- Native notebook rendering in Jupyter / VSCode via _repr_html_.
- Clean Markdown reports optimized for both humans and code agents.
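As a reminder of what a Pearson correlation measures, here is the textbook formula in plain Python: the covariance of two columns divided by the product of their standard deviations. This is an illustrative sketch only. The `pearson` function is hypothetical, and how meti_profil's Rust engine treats missing values or constant columns is not specified here; returning NaN for a constant column is an assumption of this sketch.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when a column is constant
    return cov / math.sqrt(var_x * var_y)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```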
Output formats
| Method | Output |
|---|---|
| to_html(path) | Write a self-contained interactive HTML file. |
| to_html() | Return the HTML document as a string. |
| to_file(path) | Write the Markdown report. |
| to_markdown() | Return the Markdown report as a string. |
| get_summary() | Dataset-level metrics as a dict. |
| get_column_info(name) | Per-column schema info as a dict. |
| display in a notebook | Inline interactive dashboard (_repr_html_). |
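A common pattern is to write both file formats side by side, for example in a CI job. The helper below is a small sketch under stated assumptions: `export_report` is a hypothetical name, it relies only on the `to_html(path)` and `to_file(path)` methods listed above, and it passes paths as plain strings because the accepted path types for those methods are not documented here.

```python
from pathlib import Path


def export_report(report, out_dir, stem="profile"):
    """Write <stem>.html and <stem>.md into out_dir and return both paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    html_path = out / f"{stem}.html"
    md_path = out / f"{stem}.md"
    report.to_html(str(html_path))  # interactive, self-contained HTML
    report.to_file(str(md_path))    # Markdown report
    return html_path, md_path


# Usage, assuming meti_profil is installed:
#   import meti_profil as mp
#   export_report(mp.ProfileReport("data.csv"), "reports")
```

Any object exposing those two methods works, which keeps the helper easy to test with a stub.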
Development
Requires a Rust toolchain (1.78+) and Python 3.10+.
python3 -m venv .venv
source .venv/bin/activate
pip install maturin pytest pandas polars pyarrow
# Build the extension in-place
maturin develop
# Run the test suites
cargo test --workspace
pytest tests/python -v
License
MIT
File details
Details for the file meti_profil-0.1.0.tar.gz.
File metadata
- Download URL: meti_profil-0.1.0.tar.gz
- Upload date:
- Size: 43.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.14.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ece16c10737d17fe97fd8470175bc8a234bae4bb45aed32153e65b10659b3554 |
| MD5 | 7b6ac79f66ccfb45002235cfce83966c |
| BLAKE2b-256 | c10b9fa1185810ee1b7497bfd4a87fa8d32def5f230608e72e49be628f49689d |
File details
Details for the file meti_profil-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl.
File metadata
- Download URL: meti_profil-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl
- Upload date:
- Size: 4.5 MB
- Tags: CPython 3.12, manylinux: glibc 2.34+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.14.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c3fb78070451153ec5cc507691e0df2268b97e73295220f261cf05157c41a152 |
| MD5 | 42cad671e78b9d076113c318584ddfb3 |
| BLAKE2b-256 | fbcfacd83ad33bfb388cdf356aa21bef319b7204eb384ead796fdcc8286f65e5 |