
Declarative data-quality + load probes for Python. Rust + tokio under the hood.


ematix-probe


Assert on the shape of your data and the behavior of your services with one decorator. Postgres, DuckDB, Parquet (local + S3), HTTP — same primitives, one runner.

# probes.py
from ematix_probe import probe, source

@probe.data(
    source=source.postgres("postgres://user:pass@host/db"),
    table="events",
    schema="analytics",
)
def events_are_healthy(t):
    t.row_count(at_least=1_000, at_most=1_000_000)
    t.column("event_id").not_null().unique()
    t.column("received_at").not_null()

Install, then run every probe in the file from the command line:

pip install ematix-probe
ematix-probe run probes.py

Why ematix-probe

  • Same primitives, every backend. Postgres, DuckDB, Parquet (local + S3), HTTP. Cross-engine consistency tests cover the SQL-pushdown vs. Arrow-scan paths.
  • Three runners, one decorator. ematix-probe run, your pytest suite, or directly from Python — same probe code, same verdicts.
  • ematix-flow integration. Probe a target table from inside a flow pipeline via probe_from_table() — the verdict joins the pipeline's run-history row.
  • Run history, opt-in. Pass --run-history-db to append every verdict to a sqlite file, then query trends across runs.

Status: v0.1.2 on PyPI as ematix-probe. PI-1 closed: data probes, load probes (Rust API), the pytest plugin, and ematix-flow integration are shipped and stable.


Table of contents

  1. Install
  2. Sources
  3. Data probes
  4. Assertions
  5. Load probes
  6. pytest plugin
  7. ematix-flow integration
  8. Run history
  9. CLI
  10. Python API
  11. What's shipped
  12. Development
  13. License

Install

pip install ematix-probe

The core install ships every adapter, the ematix-probe CLI binary, and the pytest plugin (auto-loaded via the pytest11 entry point — no pytest_plugins wiring required).

Optional extras

Extra | What it adds | Install
--- | --- | ---
dev | Test runner, linters, maturin, and testcontainers (Postgres / LocalStack) for the local development workflow. | pip install "ematix-probe[dev]"

The runtime surface (CLI, pytest plugin, every data + load adapter) needs no extras. To build from source, see Development at the bottom.


Sources

Sources are the first thing to set up. Each data probe is handed its source at the call site. Unlike ematix-flow, ematix-probe doesn't ship a connection registry, so credentials live in the URL or in environment variables you pass in.

from ematix_probe import source

postgres   = source.postgres("postgres://user:pass@host/db")
duckdb     = source.duckdb(":memory:")
parquet    = source.parquet("/path/to/file.parquet")
s3_parquet = source.s3_parquet(
    bucket="analytics",
    key="dim/customers.parquet",
    region="us-east-1",
    # endpoint_url= is optional — set it for LocalStack / MinIO.
)

Sources are inert factories — no connection is opened until the probe runs.
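Because there is no connection registry, one way to keep secrets out of probe files is to assemble the URL from environment variables. A minimal sketch, assuming source.postgres accepts any standard postgres:// URL; postgres_url_from_env is a hypothetical helper, and the libpq-style PG* variable names are this example's convention, not something ematix-probe reads on its own.

```python
import os
from typing import Mapping
from urllib.parse import quote


def postgres_url_from_env(env: Mapping[str, str] = os.environ) -> str:
    """Build a postgres:// URL from PG* variables (this sketch's naming convention)."""
    # Percent-encode so characters like '@', ':' or '/' in a password can't break the URL.
    user = quote(env["PGUSER"], safe="")
    password = quote(env.get("PGPASSWORD", ""), safe="")
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    dbname = quote(env["PGDATABASE"], safe="")
    # Only include ":password" when one is set.
    auth = f"{user}:{password}" if password else user
    return f"postgres://{auth}@{host}:{port}/{dbname}"
```

With the variables exported, source.postgres(postgres_url_from_env()) keeps the password out of the file. Since sources are inert, building the URL at import time opens no connection.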


Data probes

A data probe declares a target table + the assertions it must satisfy. The decorator returns a DataProbe object you can run directly, collect via pytest, or list / explain through the CLI.

from ematix_probe import probe, source

@probe.data(
    source=source.postgres("postgres://localhost/warehouse"),
    table="dim_customers",
    schema="public",
)
def customer_dim_quality(t):
    t.column("customer_id").not_null().unique()
    t.column("email").not_null().regex(r".+@.+\..+")
    t.column("status").is_in(["active", "churned", "trial"])
    t.column("age").between(0, 120)
    t.row_count(at_least=1_000, at_most=10_000_000)
    t.freshness("updated_at", within="24h")

Run it directly:

report = customer_dim_quality.run()
print(report.verdict)              # "pass" | "fail" | "error"
for a in report.assertions:
    print(a.name, a.verdict, a.message)
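When a probe fails, it's handy to print only the assertions that didn't pass. A helper needs nothing beyond the attributes used above (report.verdict, plus name, verdict, and message on each entry of report.assertions). This is a sketch: summarize is a hypothetical helper, and the report below is a SimpleNamespace stand-in with a made-up message, not a real RunReport.

```python
from types import SimpleNamespace


def summarize(report) -> str:
    """One line for the probe verdict, plus one line per assertion that didn't pass."""
    lines = [f"probe verdict: {report.verdict}"]
    for a in report.assertions:
        if a.verdict != "pass":
            lines.append(f"  [{a.verdict}] {a.name}: {a.message}")
    return "\n".join(lines)


# Stand-in report shaped like the attributes used above (values are invented).
fake = SimpleNamespace(
    verdict="fail",
    assertions=[
        SimpleNamespace(name="customer_id.not_null", verdict="pass", message=None),
        SimpleNamespace(name="age.between", verdict="fail", message="3 rows outside [0, 120]"),
    ],
)
print(summarize(fake))
```

Against a real probe, call summarize(customer_dim_quality.run()).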

Or write it to JUnit / JSON for CI:

from ematix_probe.report import write_junit, write_json

write_junit([report], "build/probe-results.xml")
write_json([report], "build/probe-results.json")
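If a CI step needs failure counts from the JUnit file, the standard library's ElementTree is enough. A sketch assuming the conventional JUnit layout, where each <testcase> carries a <failure> or <error> child on non-pass; the exact element layout ematix-probe writes isn't documented here, so check it against a real build/probe-results.xml. junit_counts is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def junit_counts(xml_text: str) -> dict:
    """Count test cases, failures, and errors in a JUnit XML document."""
    root = ET.fromstring(xml_text)
    counts = {"tests": 0, "failures": 0, "errors": 0}
    # iter() walks the whole tree, so <testsuites> and bare <testsuite> roots both work.
    for case in root.iter("testcase"):
        counts["tests"] += 1
        if case.find("failure") is not None:
            counts["failures"] += 1
        if case.find("error") is not None:
            counts["errors"] += 1
    return counts


sample = """<testsuites>
  <testsuite name="customer_dim_quality">
    <testcase name="customer_id.not_null"/>
    <testcase name="email.regex"><failure message="regex mismatch"/></testcase>
  </testsuite>
</testsuites>"""
print(junit_counts(sample))  # {'tests': 2, 'failures': 1, 'errors': 0}
```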

The same probe can be picked up by pytest with no extra wiring — see pytest plugin.


Assertions

The assertion vocabulary is the same across every adapter; the adapter chooses pushdown SQL vs. an Arrow scan internally.

Assertion | Meaning
--- | ---
t.column(c).not_null() | Every value in c is non-NULL.
t.column(c).unique() | Every value in c is unique (NULLs allowed).
t.column(c).between(low, high) | Every value in c lies in [low, high], inclusive.
t.column(c).regex(pattern) | Every non-NULL value matches pattern (Postgres POSIX flavor on the SQL path; Rust regex crate on the scan path).
t.column(c).is_in([...]) | Every value is in the allowed set.
t.row_count(at_least=, at_most=) | The table's row count falls in [at_least, at_most]; either end may be left open.
t.freshness(c, within="24h") | The most recent value of c is no older than within (units: h, m, s, d).
t.percentile_between(c, p=99, low=, high=) | The pᵗʰ percentile of c lies in [low, high]. Scan path only.
t.cardinality_between(c, low=, high=) | The number of distinct values in c lies in [low, high]. Scan path only.
t.schema_match({col: type, ...}) | The target's column types match the declared mapping. Scan path only.
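To illustrate the within semantics of the freshness row (an integer followed by h, m, s, or d), here is a plain-Python reference check. It is a sketch of the documented rule, not ematix-probe's parser; parse_within and is_fresh are hypothetical names, and because the README doesn't say whether compound forms like "1h30m" are accepted, this version rejects them.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_within(spec: str) -> timedelta:
    """Parse a duration such as '24h', '15m', '30s' or '7d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", spec.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {spec!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def is_fresh(latest: datetime, within: str, now: Optional[datetime] = None) -> bool:
    """True when the newest timestamp is no older than `within` (boundary inclusive)."""
    # `latest` must be timezone-aware when `now` defaults to the current UTC time.
    if now is None:
        now = datetime.now(timezone.utc)
    return now - latest <= parse_within(within)
```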

Each assertion produces one AssertionResult whose verdict is one of "pass", "fail", or "error", with an actionable message whenever the verdict isn't "pass".
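To pin down the row semantics in the table (NULLs count against not_null but are ignored by unique, between is inclusive, row_count bounds may be open), here is a pure-Python reference model over a list of values, with None standing in for NULL. It only mirrors the table as written: Result is a hypothetical stand-in for AssertionResult, skipping NULLs in between is this sketch's assumption, and real probes push this work down to SQL or an Arrow scan.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Result:  # hypothetical stand-in for AssertionResult
    name: str
    verdict: str  # only "pass" or "fail" in this model
    message: Optional[str] = None


def not_null(col: str, values: Sequence) -> Result:
    nulls = sum(v is None for v in values)
    if nulls:
        return Result(f"{col}.not_null", "fail", f"{nulls} NULL value(s)")
    return Result(f"{col}.not_null", "pass")


def unique(col: str, values: Sequence) -> Result:
    # NULLs are allowed, so they are excluded before looking for duplicates.
    counts = Counter(v for v in values if v is not None)
    dupes = sorted(v for v, n in counts.items() if n > 1)
    if dupes:
        return Result(f"{col}.unique", "fail", f"duplicated: {dupes}")
    return Result(f"{col}.unique", "pass")


def between(col: str, values: Sequence, low, high) -> Result:
    # Inclusive bounds; ignoring NULLs here is an assumption of this sketch.
    bad = [v for v in values if v is not None and not (low <= v <= high)]
    if bad:
        return Result(f"{col}.between", "fail", f"{len(bad)} value(s) outside [{low}, {high}]")
    return Result(f"{col}.between", "pass")


def row_count(n: int, at_least: Optional[int] = None, at_most: Optional[int] = None) -> Result:
    # A bound left as None is an open end.
    ok = (at_least is None or n >= at_least) and (at_most is None or n <= at_most)
    return Result("row_count", "pass" if ok else "fail", None if ok else f"{n} rows")
```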


Load probes

Load probes drive a target with synthetic traffic and assert on the resulting samples. v0.1 ships HTTP and Postgres SQL targets under either a constant-rate (open-model) or a virtual-user (closed-model) scheduler. In v0.1 the load engine is reachable only through the Rust API; Python decorators land in v0.2.
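To make the open-model / closed-model distinction concrete, here is a toy asyncio sketch against a fake 50 ms target. It is not the Rust engine's scheduler, and constant_rate, virtual_users, and slow_target are hypothetical names. The open model starts requests on a fixed clock no matter how many are still in flight; the closed model's virtual users each wait for a response before sending again.

```python
import asyncio
import time
from typing import Awaitable, Callable

Request = Callable[[], Awaitable[None]]


async def constant_rate(rps: float, duration_s: float, request: Request) -> int:
    """Open model: launch on a fixed schedule; slow responses don't delay new requests."""
    total = round(rps * duration_s)
    start = time.monotonic()
    tasks = []
    for i in range(total):
        # Sleep until request i's scheduled start, i / rps seconds after `start`.
        delay = start + i / rps - time.monotonic()
        if delay > 0:
            await asyncio.sleep(delay)
        tasks.append(asyncio.create_task(request()))
    await asyncio.gather(*tasks)
    return total


async def virtual_users(users: int, duration_s: float, request: Request) -> int:
    """Closed model: each user sends its next request only after the previous one returns."""
    deadline = time.monotonic() + duration_s
    completed = 0

    async def user() -> None:
        nonlocal completed
        while time.monotonic() < deadline:
            await request()
            completed += 1

    await asyncio.gather(*(user() for _ in range(users)))
    return completed


async def slow_target() -> None:
    await asyncio.sleep(0.05)  # fake 50 ms response time


if __name__ == "__main__":
    print(asyncio.run(constant_rate(100, 0.2, slow_target)))  # 20
    print(asyncio.run(virtual_users(2, 0.2, slow_target)))    # bounded by response time
```

With a 50 ms target, the open model keeps issuing 100 requests per second however slow responses get, while two virtual users top out near 40 requests per second; that is why the two models stress a service differently.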

The shape of a load plan, in Python-style pseudocode (this does not run in v0.1):

# Pseudocode mirroring the Rust API; full Python load surface ships in v0.2.
from ematix_probe import load
plan = load.http_plan(
    target=load.HttpTarget.get("https://api.example.com/health"),
    duration="60s",
    mode=load.ConstantRate(rps=100),
    warmup="10s",
    assertions=[
        load.p99_under("latency_ms", 200),
        load.error_rate_below(0.005),
        load.throughput_above(95),
        load.status_code_in([200, 304]),
    ],
)

To run a load probe today, use the Rust API directly: cargo run --example load_probe_demo or cargo run --example postgres_load_demo.
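The assertions in the plan are checked against the collected samples once the warmup window has been dropped. A plain-Python sketch of that evaluation, not the Rust engine's code: the Sample fields, the nearest-rank p99 definition, strict comparisons for under / below / above, and measuring throughput over the post-warmup window are all assumptions of this example.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class Sample:  # hypothetical per-request record
    offset_s: float      # seconds since the run started
    latency_ms: float
    status: int
    error: bool = False  # transport-level failure (timeout, refused connection, ...)


def p99(values: List[float]) -> float:
    """Nearest-rank 99th percentile: smallest value with >= 99% of samples at or below it."""
    ordered = sorted(values)
    return ordered[math.ceil(0.99 * len(ordered)) - 1]


def evaluate(samples: Iterable[Sample], warmup_s: float, window_s: float, p99_ms: float,
             max_error_rate: float, min_rps: float, ok_statuses: Set[int]) -> Dict[str, bool]:
    # Drop everything recorded during warmup before evaluating anything.
    kept = [s for s in samples if s.offset_s >= warmup_s]
    if not kept:
        raise ValueError("no samples after warmup")
    errors = sum(s.error for s in kept)
    return {
        "p99_under": p99([s.latency_ms for s in kept]) < p99_ms,
        "error_rate_below": errors / len(kept) < max_error_rate,
        "throughput_above": len(kept) / window_s > min_rps,
        "status_code_in": all(s.status in ok_statuses for s in kept if not s.error),
    }
```

If the first seconds of a run are slow and erroring, evaluating with warmup_s=0 can fail the p99 and error-rate checks that pass once the warmup window is excluded.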


pytest plugin

pip install ematix-probe registers a pytest11 plugin; pytest auto-loads it. Any @probe.data instance at module top-level becomes one pytest test node per assertion:

# tests/test_warehouse_quality.py
from ematix_probe import probe, source

@probe.data(
    source=source.postgres("postgres://localhost/warehouse"),
    table="dim_customers",
)
def customer_dim_quality(t):
    t.column("customer_id").not_null()
    t.column("email").regex(r".+@.+\..+")

pytest -v reports:

tests/test_warehouse_quality.py::customer_dim_quality::customer_id.not_null PASSED
tests/test_warehouse_quality.py::customer_dim_quality::email.regex          FAILED

Each probe runs once, not once per assertion: the per-assertion test nodes share a cached RunReport, so N assertions don't multiply the underlying database / HTTP work.
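The caching idea is easy to picture: execute the probe once, memoize the report, and let every per-assertion check read from it. A stdlib-only sketch of the pattern, not the plugin's internals; run_probe, check, and the call counter are hypothetical stand-ins for the real database work.

```python
from functools import lru_cache

CALLS = {"run": 0}


@lru_cache(maxsize=None)
def run_probe(probe_name: str) -> dict:
    """Pretend to query the database once; probe_name is the cache key."""
    CALLS["run"] += 1
    return {"customer_id.not_null": "pass", "email.regex": "fail"}


def check(probe_name: str, assertion: str) -> bool:
    # Every per-assertion check shares the single cached report.
    return run_probe(probe_name)[assertion] == "pass"


results = [check("customer_dim_quality", a) for a in ("customer_id.not_null", "email.regex")]
print(results, CALLS["run"])  # [True, False] 1
```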


ematix-flow integration

Sibling project ematix-flow ships declarative table classes; ematix-probe consumes them through a duck-typed shim:

from ematix_probe.flow import probe_from_table
from ematix_probe import source

# CustomerDim is any class exposing __tablename__, optional
# __schema__, and an iterable `columns` with .name / .nullable /
# .primary_key — ematix-flow's ManagedTable matches out of the box.
quality = probe_from_table(
    CustomerDim,
    source=source.postgres("postgres://warehouse/db"),
    extend=lambda t: t.column("email").regex(r".+@.+\..+"),
)

probe_from_table auto-derives not_null for every non-nullable column and unique for each primary-key column; extend lets you layer extra assertions on top via the same fluent API. ematix-probe has no hard dependency on ematix-flow: because the shim is protocol-typed, any conforming class participates.
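Because the shim is protocol-typed, a hand-written class should work as well as a ManagedTable. The sketch shows a minimal conforming class plus a plain-Python function that mirrors the documented derivation rule. Column, CustomerDim, and derived_assertions are illustrative names; only probe_from_table itself belongs to ematix-probe.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


class ColumnLike(Protocol):
    name: str
    nullable: bool
    primary_key: bool


@dataclass
class Column:
    name: str
    nullable: bool = True
    primary_key: bool = False


class CustomerDim:
    __tablename__ = "dim_customers"
    __schema__ = "public"
    columns = [
        Column("customer_id", nullable=False, primary_key=True),
        Column("email", nullable=False),
        Column("status"),
    ]


def derived_assertions(table) -> List[str]:
    """Mirror the documented rule: not_null per non-nullable column, unique per PK column."""
    cols: Iterable[ColumnLike] = table.columns
    out = []
    for col in cols:
        if not col.nullable:
            out.append(f"{col.name}.not_null")
        if col.primary_key:
            out.append(f"{col.name}.unique")
    return out


print(derived_assertions(CustomerDim))
# ['customer_id.not_null', 'customer_id.unique', 'email.not_null']
```

A class like this can then be passed straight to probe_from_table(CustomerDim, source=...).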


Run history

Opt-in sqlite persistence. Pass --run-history-db <path> to the CLI, or use the API directly:

from ematix_probe.run_history import RunHistory

h = RunHistory("history.sqlite")
h.record(customer_dim_quality.run())  # any DataProbe, e.g. the one from Data probes

The schema has two tables: runs (one row per probe execution) and assertions (one row per assertion result, joined to runs by run_id). The file is tagged with PRAGMA user_version = 1. It is designed as the substrate for v0.2 drift detection, so future schema changes add columns only and never rename them.
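Only run_id is named above, so before writing trend queries it's worth inspecting the actual columns. A stdlib sqlite3 sketch that reports the user_version tag and each table's column names without assuming any of them; describe_history is a hypothetical helper.

```python
import sqlite3


def describe_history(conn: sqlite3.Connection) -> dict:
    """Report the schema version tag and the column names of every table."""
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = {
        t: [row[1] for row in conn.execute(f'PRAGMA table_info("{t}")')] for t in tables
    }
    return {"user_version": version, "columns": columns}
```

Point it at a real file with describe_history(sqlite3.connect("history.sqlite")), and close the connection afterwards.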


CLI

ematix-probe run <path>           # discover + run probes; non-zero on fail
ematix-probe run <path> --run-history-db history.sqlite

ematix-probe list <path>          # enumerate probes, no execution
ematix-probe explain <path> <probe>   # print compiled plan for one probe
ematix-probe doctor               # environment health check

<path> points at any Python file containing @probe.* decorators. The CLI imports the file, finds module-level DataProbe attributes, runs each, and exits non-zero if any verdict isn't pass.
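The discovery step described above (import the file, collect module-level probe objects) can be approximated with importlib. A sketch under assumptions: DataProbe's import path isn't listed in this README, so the filter is a caller-supplied predicate, and discover is a hypothetical helper, not the CLI's code.

```python
import importlib.util
from pathlib import Path
from typing import Any, Callable, Dict


def discover(path: str, is_probe: Callable[[Any], bool]) -> Dict[str, Any]:
    """Import a Python file and return its module-level attributes that pass `is_probe`."""
    file = Path(path)
    spec = importlib.util.spec_from_file_location(file.stem, str(file))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot import {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file, so its decorators execute
    return {name: obj for name, obj in vars(module).items() if is_probe(obj)}
```

For example, discover("probes.py", lambda obj: type(obj).__name__ == "DataProbe") matches by class name without needing the import path.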


Python API

The package exposes:

  • probe.data(source=..., table=..., schema=None) — data-probe decorator.
  • source.postgres / duckdb / parquet / s3_parquet — source factories.
  • DataProbe.run() — execute a probe, return a RunReport.
  • report.write_junit(reports, path) / report.write_json(reports, path) — CI reports.
  • flow.probe_from_table(cls, source=, extend=) — ematix-flow shim.
  • run_history.RunHistory(path) — opt-in sqlite persistence.
  • pytest_plugin — auto-loaded by pytest; not imported directly.

The Rust load-probe surface (engine::load, adapters::load::http, adapters::load::postgres) is exposed through the workspace's example crates today; the Python load surface lands in v0.2.


What's shipped

Data probes: Postgres, DuckDB, local Parquet, S3 Parquet. Assertions: not_null, unique, between, regex, is_in (enum), row_count, freshness, percentile_between, cardinality_between, schema_match.

Load probes (Rust API): HTTP + Postgres SQL targets; constant-rate (open-model) and virtual-user (closed-model) schedulers. Assertions: p99_under, error_rate_below, throughput_above, status_code_in. Sample-window warmup filtering. Per-tick Samples shared across HTTP and SQL paths through one evaluate_load entry point.

Reporting: JUnit XML + JSON writers; pytest plugin with per-assertion test nodes; opt-in sqlite run history.

Out of v0.1 (planned for v0.2): async PyO3 (async def probe functions + pyo3-asyncio integration), drift detection, distributed load generation, backends beyond the v0.1 set.


Development

# Build the Rust workspace (core + CLI + Python extension crate)
cargo build --release

# Build + install the Python extension into a venv
python -m venv .venv && source .venv/bin/activate
pip install maturin
maturin develop --release

# Run tests
cargo test --workspace                    # default + integration (Docker)
pytest                                    # full Python suite
coverage run -m pytest && coverage report --fail-under=90

Process docs:

Sibling project: ematix-flow.


License

Apache-2.0.



Download files


Source Distribution

ematix_probe-0.1.2.tar.gz (119.6 kB)

Built Distributions


ematix_probe-0.1.2-cp314-cp314-manylinux_2_28_x86_64.whl (19.7 MB): CPython 3.14, manylinux (glibc 2.28+), x86-64

ematix_probe-0.1.2-cp314-cp314-macosx_11_0_arm64.whl (15.7 MB): CPython 3.14, macOS 11.0+, ARM64

ematix_probe-0.1.2-cp313-cp313-manylinux_2_28_x86_64.whl (19.7 MB): CPython 3.13, manylinux (glibc 2.28+), x86-64

ematix_probe-0.1.2-cp313-cp313-macosx_11_0_arm64.whl (15.7 MB): CPython 3.13, macOS 11.0+, ARM64

ematix_probe-0.1.2-cp312-cp312-manylinux_2_28_x86_64.whl (19.7 MB): CPython 3.12, manylinux (glibc 2.28+), x86-64

ematix_probe-0.1.2-cp312-cp312-macosx_11_0_arm64.whl (15.7 MB): CPython 3.12, macOS 11.0+, ARM64

ematix_probe-0.1.2-cp311-cp311-manylinux_2_28_x86_64.whl (19.7 MB): CPython 3.11, manylinux (glibc 2.28+), x86-64

ematix_probe-0.1.2-cp311-cp311-macosx_11_0_arm64.whl (15.7 MB): CPython 3.11, macOS 11.0+, ARM64

File details

Details for the file ematix_probe-0.1.2.tar.gz.

File metadata

  • Download URL: ematix_probe-0.1.2.tar.gz
  • Upload date:
  • Size: 119.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ematix_probe-0.1.2.tar.gz
Algorithm Hash digest
SHA256 beb991d1ec048a264ba6e4fbc6048cb9637bfd47b76de222d73fb7c19b141505
MD5 845eafd59439257ba721bb26c7ec7891
BLAKE2b-256 09e64a6e911afee00a3c86f1b70e5cdc67eada12ac058ba64bacfe2087c23934


Provenance

The following attestation bundles were made for ematix_probe-0.1.2.tar.gz:

Publisher: release.yml on ryan-evans-git/ematix-probe

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ematix_probe-0.1.2-cp314-cp314-manylinux_2_28_x86_64.whl.


File hashes

Hashes for ematix_probe-0.1.2-cp314-cp314-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 d5f36341c8f1deaf43fb42424aa4741899cd19f2f302ac9a35eb1d2fe28aafaa
MD5 6d6135e6f707de227b383c7a034e14a9
BLAKE2b-256 d2e54632763b99c84324eb8984e86cfe2de652ad482665103f340e7bd688a330


Provenance

The following attestation bundles were made for ematix_probe-0.1.2-cp314-cp314-manylinux_2_28_x86_64.whl:

Publisher: release.yml on ryan-evans-git/ematix-probe


File details

Details for the file ematix_probe-0.1.2-cp314-cp314-macosx_11_0_arm64.whl.


File hashes

Hashes for ematix_probe-0.1.2-cp314-cp314-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 4df86a6aa90a6a2db9b9fb93766596fa2738374aa81fa7f379c4054180524ec6
MD5 575fe9a7edf021a201011b4bff87d649
BLAKE2b-256 1bd2e5217086f930aaa52e1337c97d606a1108c3bbed0f8d94eda216c3120d17


Provenance

The following attestation bundles were made for ematix_probe-0.1.2-cp314-cp314-macosx_11_0_arm64.whl:

Publisher: release.yml on ryan-evans-git/ematix-probe


File details

Details for the file ematix_probe-0.1.2-cp313-cp313-manylinux_2_28_x86_64.whl.


File hashes

Hashes for ematix_probe-0.1.2-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 44a223cfccd435e7eede6a71697e779f7ebc3e9fa30997487bebdcb376f7e33c
MD5 2d3469bfaa201da95c10ce8c96ccd843
BLAKE2b-256 f4e17d7baa708e21a13936b58a9b828a261a6b3274a6891bd6d39ecb2700bd03


Provenance

The following attestation bundles were made for ematix_probe-0.1.2-cp313-cp313-manylinux_2_28_x86_64.whl:

Publisher: release.yml on ryan-evans-git/ematix-probe


File details

Details for the file ematix_probe-0.1.2-cp313-cp313-macosx_11_0_arm64.whl.


File hashes

Hashes for ematix_probe-0.1.2-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 c3fc13f06bd5863c11cdedfd909758633600ec362c9332af1532e1f2415cb299
MD5 9a92af381ab4f7dcc4c8ec6803499fa5
BLAKE2b-256 1dce6a241a8c7e3f5ab29bd3edee45810889d407947f4cdecf57182bcf87cea0


Provenance

The following attestation bundles were made for ematix_probe-0.1.2-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: release.yml on ryan-evans-git/ematix-probe


File details

Details for the file ematix_probe-0.1.2-cp312-cp312-manylinux_2_28_x86_64.whl.


File hashes

Hashes for ematix_probe-0.1.2-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 b327491c9eaf8a2c88dcc8e252949499330620ff9efa4f02d5434474490c34b5
MD5 f850ac743efb16c7f1948219fc2238f4
BLAKE2b-256 a27ce53903f2c3f696d70c8ee620b4e0a43c0db2547c63ddaee25085034addc2


Provenance

The following attestation bundles were made for ematix_probe-0.1.2-cp312-cp312-manylinux_2_28_x86_64.whl:

Publisher: release.yml on ryan-evans-git/ematix-probe


File details

Details for the file ematix_probe-0.1.2-cp312-cp312-macosx_11_0_arm64.whl.


File hashes

Hashes for ematix_probe-0.1.2-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 4608902caaaac0cb832c668b9b1e43a0ebfc8a3e898d37e7dcbbdcd5c229ef2c
MD5 95eed335b5a097556b76884b29894a3b
BLAKE2b-256 70ee0952bcf45732187819dbe73c1a1c1f9bf3eb1c49b3db613261fde1d4aef9


Provenance

The following attestation bundles were made for ematix_probe-0.1.2-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: release.yml on ryan-evans-git/ematix-probe


File details

Details for the file ematix_probe-0.1.2-cp311-cp311-manylinux_2_28_x86_64.whl.


File hashes

Hashes for ematix_probe-0.1.2-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 4fb7bde7cd9c7ed9e6592c69f5371a75a9c038a744e50bd75dc979fe5a182d20
MD5 ef5318b485baa5b0e3e0292c603ec845
BLAKE2b-256 f6307180b2c7d711e4382e6246ae5ee9b1b43c5c905f3b54ae38c192d1f83ac0


Provenance

The following attestation bundles were made for ematix_probe-0.1.2-cp311-cp311-manylinux_2_28_x86_64.whl:

Publisher: release.yml on ryan-evans-git/ematix-probe


File details

Details for the file ematix_probe-0.1.2-cp311-cp311-macosx_11_0_arm64.whl.


File hashes

Hashes for ematix_probe-0.1.2-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e2db9a09e5ef6ccf557ef425fa1e282a9241198391c5eda781d6bff2fcdd52ef
MD5 491f19609fdfe11237a12085d45f0f99
BLAKE2b-256 3540f3caba700f7f991f36e1f950b546347f7411ecae2c1069418e912de56bf7


Provenance

The following attestation bundles were made for ematix_probe-0.1.2-cp311-cp311-macosx_11_0_arm64.whl:

Publisher: release.yml on ryan-evans-git/ematix-probe

