
Python wrapper for the anofox-tabular DuckDB extension — data quality, PII, and validation primitives


anofox-tabular Python package

Python wrapper for the anofox-tabular DuckDB extension — data quality, PII detection, email/phone validation, anomaly detection, diffing, money, and VAT primitives.

Installation

pip install anofox-tabular

Optional extras for DataFrame support:

pip install "anofox-tabular[pandas]"   # adds pandas
pip install "anofox-tabular[polars]"   # adds polars
pip install "anofox-tabular[pandas,polars]"
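Because pandas and polars are optional extras, code that passes DataFrames around may want to check which backend is importable before relying on it. The following is a minimal sketch using only the standard library; the helper name `available_backends` is hypothetical and is not part of the anofox package.

```python
from importlib.util import find_spec


def available_backends(candidates=("pandas", "polars")):
    """Return the candidate module names that can be imported in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in candidates if find_spec(name) is not None]


if __name__ == "__main__":
    # The result depends on which extras were installed.
    print(available_backends())
```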

Quick start

import anofox

# In-memory database (extension downloaded automatically)
with anofox.connect() as conn:
    # Email validation
    print(conn.execute("SELECT anofox_tab_email_is_valid('hi@example.com', 'regex')").fetchone())

# Or use a locally built extension
conn = anofox.connect(
    extension_path="/path/to/anofox_tabular.duckdb_extension"
)

Python-native API

import anofox
from anofox import validate, quality, pii, diff

conn = anofox.connect()

# ── Email validation ──────────────────────────────────────────────────
validate.email_is_valid(conn, "hi@example.com")                # True
validate.email_is_valid(conn, "hi@example.com", mode="dns")    # True (DNS checked)

import pandas as pd
df = pd.DataFrame({"email": ["a@b.com", "bad-email", "c@d.org"]})
result_df = validate.email_is_valid(conn, df, column="email")
# Returns DataFrame with added 'email_is_valid' column

# ── Phone validation ──────────────────────────────────────────────────
validate.phone_is_valid(conn, "+14155552671", region="US")     # True
validate.phone_format(conn, "+14155552671", "US", "INTERNATIONAL")

# ── Data quality ──────────────────────────────────────────────────────
conn.execute("CREATE TABLE orders AS SELECT * FROM read_parquet('orders.parquet')")

quality.volume(conn, "orders", min_rows=100)
# {"status": "pass", "min_rows": 100, ...}

quality.null_rate(conn, "orders", "amount", max_null_rate=0.05)
quality.distinct_count(conn, "orders", "status", min_distinct=2, max_distinct=10)
quality.schema_check(conn, "orders", ["id", "amount", "created_at"])

# ── High-level profile ────────────────────────────────────────────────
summary = conn.profile(df)   # returns pd.DataFrame with per-column metrics

# ── PII detection ─────────────────────────────────────────────────────
pii.pii_contains(conn, "Call me at +1-415-555-2671")  # True
pii.pii_detect(conn, "Email: test@example.com")        # [{"type": "EMAIL", ...}]
pii.pii_mask(conn, "test@example.com", strategy="redact")

scan_result = pii.pii_scan_table(conn, "orders")  # pd.DataFrame

# ── Diff ──────────────────────────────────────────────────────────────
# Both table names and DataFrames are accepted
changes = diff.joindiff(conn, "orders_v1", "orders_v2", primary_keys="id")
# df_before and df_after are DataFrames holding the two versions to compare
changes = diff.joindiff(conn, df_before, df_after, primary_keys="id")
# Returns pd.DataFrame with diff_type: 'added', 'removed', 'changed', 'unchanged'

# ── Schema validation ─────────────────────────────────────────────────
from anofox.validate import EmailRule, PhoneRule

result = conn.validate(df, schema={
    "email": EmailRule(mode="dns"),
    "phone": PhoneRule(region="DE"),
})
print(result.passed)      # True / False
print(result.failures)    # pd.DataFrame of failed rows
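To make the four diff_type labels returned by joindiff concrete, here is a conceptual sketch in plain Python. It only illustrates the idea of a key-based join diff; it is not the extension's implementation, and `classify_rows` is a hypothetical helper.

```python
def classify_rows(before, after, key="id"):
    """Label each primary key as added, removed, changed, or unchanged."""
    old = {row[key]: row for row in before}
    new = {row[key]: row for row in after}
    labels = {}
    for k in sorted(old.keys() | new.keys()):
        if k not in old:
            labels[k] = "added"        # key exists only in the new version
        elif k not in new:
            labels[k] = "removed"      # key exists only in the old version
        elif old[k] != new[k]:
            labels[k] = "changed"      # same key, different column values
        else:
            labels[k] = "unchanged"
    return labels


before = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
after = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}, {"id": 4, "amount": 40}]

print(classify_rows(before, after))
# {1: 'unchanged', 2: 'changed', 3: 'removed', 4: 'added'}
```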

Module overview

Module           Functions
anofox.validate  email_is_valid, email_validate, phone_is_valid, phone_parse, phone_format, phone_region
anofox.quality   volume, null_rate, distinct_count, freshness, zscore, iqr, schema_check
anofox.anomaly   isolation_forest, isolation_forest_mv, dbscan, dbscan_mv, outlier_tree
anofox.pii       pii_detect, pii_mask, pii_contains, pii_scan_table, pii_audit_table
anofox.diff      joindiff, hashdiff
anofox.money     make_money, money_from_cents, is_valid_currency, currency_symbol, money_add, etc.
anofox.vat       make_vat, vat_is_valid, vat_is_eu_member, vat_country_name, etc.
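The quality module lists zscore and iqr checks. As background, the sketch below shows the textbook form of both outlier rules in plain Python. The thresholds, the quartile method, and the function names are assumptions made for illustration; the extension's exact definitions are not stated here.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs(v - mean) / stdev > threshold]


amounts = [10, 12, 11, 13, 12, 11, 10, 12, 95]
print(iqr_outliers(amounts))  # [95]
```

With a sample this small, a single extreme value also inflates the standard deviation, so a z-score threshold of 3.0 can miss it; the IQR rule is less sensitive to that effect.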

CLI

# Profile any file (colored table output)
anofox profile data.parquet
anofox profile data.csv --format json

# Quality checks (exit 0 = pass, exit 1 = fail)
anofox quality data.parquet --volume-min 1000
anofox quality data.csv --null-max 0.05 --column email

Supported formats: .parquet, .csv, .tsv, .json, .ndjson
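Since the quality command signals pass or fail through its exit code, it can be driven from a Python script or CI step. The sketch below builds the command line from the flags shown above and interprets the exit code. The helper names `build_quality_command` and `run_quality` are hypothetical, and only flags that appear in this README are used.

```python
import subprocess


def build_quality_command(path, volume_min=None, null_max=None, column=None):
    """Build the argv list for `anofox quality` from the documented flags."""
    argv = ["anofox", "quality", str(path)]
    if volume_min is not None:
        argv += ["--volume-min", str(volume_min)]
    if null_max is not None:
        argv += ["--null-max", str(null_max)]
    if column is not None:
        argv += ["--column", column]
    return argv


def run_quality(path, **checks):
    """Run the check and return True only when the exit code is 0."""
    completed = subprocess.run(build_quality_command(path, **checks))
    return completed.returncode == 0


print(build_quality_command("data.parquet", volume_min=1000))
# ['anofox', 'quality', 'data.parquet', '--volume-min', '1000']
```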

pytest plugin

# Run with: pytest --anofox-check
import pytest

@pytest.mark.anofox_quality("orders", volume_min=100)
def test_orders_table_has_data(anofox_conn):
    ...

The session-scoped anofox_conn fixture is provided automatically. Tests are skipped if the extension is unavailable.

Extension resolution

The package resolves the extension binary in this order:

  1. ANOFOX_EXT_PATH environment variable (path to local binary)
  2. extension_path argument to connect()
  3. Cached binary in ~/.anofox/extensions/
  4. Download from the community registry, then the S3 mirror (https://get.erpl.io)
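The resolution order above can be sketched as a small function. This is an illustration of the documented precedence only, not the package's loader: the function name `resolve_extension`, its return labels, and the assumption that the cache holds a flat file named anofox_tabular.duckdb_extension are all hypothetical.

```python
import os
from pathlib import Path


def resolve_extension(extension_path=None, env=None, cache_dir=None,
                      filename="anofox_tabular.duckdb_extension"):
    """Return (source, path) following the documented precedence."""
    env = os.environ if env is None else env
    if cache_dir is None:
        cache_dir = Path.home() / ".anofox" / "extensions"

    # 1. Environment variable wins over everything else.
    env_path = env.get("ANOFOX_EXT_PATH")
    if env_path:
        return ("env", Path(env_path))

    # 2. Explicit argument, as passed to connect().
    if extension_path:
        return ("argument", Path(extension_path))

    # 3. Previously cached binary (file layout is an assumption).
    cached = Path(cache_dir) / filename
    if cached.is_file():
        return ("cache", cached)

    # 4. Nothing local was found, so a download would be needed.
    return ("download", None)
```

Passing `env` and `cache_dir` explicitly keeps the sketch easy to test without touching the real environment or home directory.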

Development

# Build the extension first
make release

# Install package in dev mode
cd python
pip install -e ".[dev]"

# Run tests
ANOFOX_EXT_PATH=../build/release/extension/anofox_tabular/anofox_tabular.duckdb_extension \
  pytest tests/ -v

# Loader/utils tests run without the extension (no env var needed)
pytest tests/test_loader.py tests/test_utils.py -v

License

MIT
