Python wrapper for the anofox-tabular DuckDB extension — data quality, PII, and validation primitives
anofox-tabular Python package
Python wrapper for the anofox-tabular DuckDB extension — data quality, PII detection, email/phone validation, anomaly detection, diffing, money, and VAT primitives.
Installation
pip install anofox-tabular
Optional extras for DataFrame support:
pip install "anofox-tabular[pandas]" # adds pandas
pip install "anofox-tabular[polars]" # adds polars
pip install "anofox-tabular[pandas,polars]"
Quick start
import anofox
# In-memory database (extension downloaded automatically)
with anofox.connect() as conn:
    # Email validation
    print(conn.execute("SELECT anofox_tab_email_is_valid('hi@example.com', 'regex')").fetchone())

# Or use a locally built extension
conn = anofox.connect(
    extension_path="/path/to/anofox_tabular.duckdb_extension"
)
Python-native API
import anofox
from anofox import validate, quality, pii, diff
conn = anofox.connect()
# ── Email validation ──────────────────────────────────────────────────
validate.email_is_valid(conn, "hi@example.com") # True
validate.email_is_valid(conn, "hi@example.com", mode="dns") # True (DNS checked)
import pandas as pd
df = pd.DataFrame({"email": ["a@b.com", "bad-email", "c@d.org"]})
result_df = validate.email_is_valid(conn, df, column="email")
# Returns DataFrame with added 'email_is_valid' column
# ── Phone validation ──────────────────────────────────────────────────
validate.phone_is_valid(conn, "+14155552671", region="US") # True
validate.phone_format(conn, "+14155552671", "US", "INTERNATIONAL")
# ── Data quality ──────────────────────────────────────────────────────
conn.execute("CREATE TABLE orders AS SELECT * FROM read_parquet('orders.parquet')")
quality.volume(conn, "orders", min_rows=100)
# {"status": "pass", "min_rows": 100, ...}
quality.null_rate(conn, "orders", "amount", max_null_rate=0.05)
quality.distinct_count(conn, "orders", "status", min_distinct=2, max_distinct=10)
quality.schema_check(conn, "orders", ["id", "amount", "created_at"])
# ── High-level profile ────────────────────────────────────────────────
summary = conn.profile(df) # returns pd.DataFrame with per-column metrics
# ── PII detection ─────────────────────────────────────────────────────
pii.pii_contains(conn, "Call me at +1-415-555-2671") # True
pii.pii_detect(conn, "Email: test@example.com") # [{"type": "EMAIL", ...}]
pii.pii_mask(conn, "test@example.com", strategy="redact")
scan_result = pii.pii_scan_table(conn, "orders") # pd.DataFrame
# ── Diff ──────────────────────────────────────────────────────────────
# Table names or DataFrames both work
changes = diff.joindiff(conn, "orders_v1", "orders_v2", primary_keys="id")
changes = diff.joindiff(conn, df_before, df_after, primary_keys="id")
# Returns pd.DataFrame with diff_type: 'added', 'removed', 'changed', 'unchanged'
# ── Schema validation ─────────────────────────────────────────────────
from anofox.validate import EmailRule, PhoneRule
result = conn.validate(df, schema={
    "email": EmailRule(mode="dns"),
    "phone": PhoneRule(region="DE"),
})
print(result.passed) # True / False
print(result.failures) # pd.DataFrame of failed rows
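To make the four diff_type labels concrete, here is a minimal pure-Python sketch of a primary-key join diff. It is an illustration only, not the package's implementation: the helper name classify_rows and the list-of-dicts input are assumptions made for this example, and diff.joindiff itself returns a DataFrame.

```python
from typing import Any, Dict, List


def classify_rows(
    before: List[Dict[str, Any]],
    after: List[Dict[str, Any]],
    primary_key: str,
) -> Dict[Any, str]:
    """Label each primary-key value as added, removed, changed, or unchanged.

    Hypothetical helper for illustration; not part of the anofox API.
    """
    before_by_key = {row[primary_key]: row for row in before}
    after_by_key = {row[primary_key]: row for row in after}

    labels: Dict[Any, str] = {}
    for key in before_by_key.keys() | after_by_key.keys():
        if key not in before_by_key:
            labels[key] = "added"      # only present in the newer version
        elif key not in after_by_key:
            labels[key] = "removed"    # only present in the older version
        elif before_by_key[key] != after_by_key[key]:
            labels[key] = "changed"    # same key, at least one differing value
        else:
            labels[key] = "unchanged"
    return labels


orders_v1 = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 20.0},
    {"id": 3, "amount": 30.0},
]
orders_v2 = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 25.0},
    {"id": 4, "amount": 40.0},
]

labels = classify_rows(orders_v1, orders_v2, primary_key="id")
for key in sorted(labels):
    print(key, labels[key])
```

With the sample data, id 1 is unchanged, id 2 is changed, id 3 is removed, and id 4 is added.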
Module overview
| Module | Functions |
|---|---|
| anofox.validate | email_is_valid, email_validate, phone_is_valid, phone_parse, phone_format, phone_region |
| anofox.quality | volume, null_rate, distinct_count, freshness, zscore, iqr, schema_check |
| anofox.anomaly | isolation_forest, isolation_forest_mv, dbscan, dbscan_mv, outlier_tree |
| anofox.pii | pii_detect, pii_mask, pii_contains, pii_scan_table, pii_audit_table |
| anofox.diff | joindiff, hashdiff |
| anofox.money | make_money, money_from_cents, is_valid_currency, currency_symbol, money_add, etc. |
| anofox.vat | make_vat, vat_is_valid, vat_is_eu_member, vat_country_name, etc. |
CLI
# Profile any file (colored table output)
anofox profile data.parquet
anofox profile data.csv --format json
# Quality checks (exit 0 = pass, exit 1 = fail)
anofox quality data.parquet --volume-min 1000
anofox quality data.csv --null-max 0.05 --column email
Supported formats: .parquet, .csv, .tsv, .json, .ndjson
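Because the quality command reports its result through the exit code, it can act as a gate in a pipeline script. The sketch below is one possible way to wrap it from Python. The helper names build_quality_command and quality_gate are hypothetical, and only the --volume-min flag shown above is used. Any non-zero exit code is treated as a failure, since only 0 and 1 are documented here.

```python
import subprocess
from typing import Callable, List, Optional


def build_quality_command(path: str, volume_min: int) -> List[str]:
    """Build the argv for a volume check using the flag shown in the CLI section."""
    return ["anofox", "quality", path, "--volume-min", str(volume_min)]


def _run_and_get_exit_code(argv: List[str]) -> int:
    # check=False: a failed quality check is an expected outcome, not an exception.
    return subprocess.run(argv, check=False).returncode


def quality_gate(
    path: str,
    volume_min: int,
    run: Optional[Callable[[List[str]], int]] = None,
) -> bool:
    """Return True when the CLI exits with 0 (pass), False otherwise.

    The optional `run` argument lets callers inject a fake runner in tests,
    so the sketch can be exercised without the CLI installed.
    """
    runner = run if run is not None else _run_and_get_exit_code
    return runner(build_quality_command(path, volume_min)) == 0
```

A pipeline step could then call quality_gate("data.parquet", 1000) and stop when it returns False.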
pytest plugin
# Run with: pytest --anofox-check
import pytest
@pytest.mark.anofox_quality("orders", volume_min=100)
def test_orders_table_has_data(anofox_conn):
    ...
The anofox_conn session-scoped fixture is provided automatically. Tests skip if the extension is unavailable.
Extension resolution
The package resolves the extension binary in this order:
1. ANOFOX_EXT_PATH environment variable (path to local binary)
2. extension_path argument to connect()
3. Cached binary in ~/.anofox/extensions/
4. Download from community registry → S3 mirror (https://get.erpl.io)
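The precedence above can be sketched as a small lookup function. This is an illustration of the documented order, not the package's actual loader. The function name resolve_local_extension, the injectable env and cache_dir parameters, and the cached file name are all assumptions made for this example. The download step is left out, and returning None stands for "nothing local, fall back to download".

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Assumed file name inside the cache directory; the real cache layout may differ.
EXTENSION_FILENAME = "anofox_tabular.duckdb_extension"


def resolve_local_extension(
    extension_path: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    cache_dir: Optional[Path] = None,
) -> Optional[Path]:
    """Return a local extension path following the documented order, or None."""
    env = os.environ if env is None else env
    if cache_dir is None:
        cache_dir = Path.home() / ".anofox" / "extensions"

    # 1. The ANOFOX_EXT_PATH environment variable is checked first.
    env_path = env.get("ANOFOX_EXT_PATH")
    if env_path:
        return Path(env_path)

    # 2. Next, the extension_path argument passed to connect().
    if extension_path:
        return Path(extension_path)

    # 3. Then a previously cached binary, if one exists.
    cached = cache_dir / EXTENSION_FILENAME
    if cached.is_file():
        return cached

    # 4. Nothing local was found; the caller would download the binary.
    return None
```

Under this reading, the environment variable takes precedence even when extension_path is passed explicitly, which matters when a stale variable is left set in a development shell.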
Development
# Build the extension first
make release
# Install package in dev mode
cd python
pip install -e ".[dev]"
# Run tests
ANOFOX_EXT_PATH=../build/release/extension/anofox_tabular/anofox_tabular.duckdb_extension \
pytest tests/ -v
# Loader/utils tests run without extension (no env var needed)
pytest tests/test_loader.py tests/test_utils.py -v
License
MIT