
Zero-Configuration Data Quality Framework Powered by Polars


Truthound


Sniffs out bad data.


Truthound 3.1.0 is a layered data quality system built around a Polars-first validation kernel, with first-party orchestration adapters, an additive AI review surface, and an operational console built on top of the same core runtime contract.


Abstract


Truthound 3.1.0 is a layered data quality system. At the center is a small, durable, Polars-first validation kernel. Around that core sit an additive truthound.ai review surface, Truthound Orchestration for host-native execution inside schedulers and workflow systems, and Truthound Dashboard for operating Truthound through an installation-managed control-plane UI.

The point of the 3.x reset is not to hide the broader product line. It is to make the system boundary honest. The core validation kernel is the most rigorously validated contract in the ecosystem, while the AI review layer, orchestration adapters, and dashboard build on top of that contract instead of redefining it.

Documentation: truthound.netlify.app

What's New In 3.1.0

Truthound 3.1.0 keeps the 3.0 kernel boundary and adds the first complete public AI review surface.

  • truthound.ai is now the canonical optional namespace for proposal generation, run analysis, approval history, and controlled apply
  • root feature probes has_ai_support() and get_ai_support_status() make it safe for downstream integrations to feature-gate AI functionality
  • the AI lifecycle is explicit: suggest_suite(...), explain_run(...), approve_proposal(...), reject_proposal(...), apply_proposal(...)
  • live smoke runners now exist for both proposal generation and run analysis
  • the public docs portal now documents Truthound AI directly and keeps the dashboard at a boundary-level overview instead of a mirrored manual

Truthound Product Line

| Layer | Repository | Responsibility | Start Here |
|---|---|---|---|
| Truthound Core | truthound | Validation kernel and data-plane: th.check(), ValidationRunResult, planner/runtime, zero-config workspace, reporters, checkpoints, Data Docs | Core docs |
| Truthound AI | truthound.ai | Optional review-layer APIs for prompt-to-proposal compilation, run analysis, approval history, and controlled apply | /ai/ |
| Truthound Orchestration | truthound-orchestration | First-party execution integration layer for Airflow, Dagster, Prefect, dbt, Mage, and Kestra | /orchestration/ |
| Truthound Dashboard | separately distributed operational console | First-party control-plane for RBAC, sources, artifacts, incidents, secrets, observability, and AI review workflows | /dashboard/ |

Truthound is therefore not a monolithic platform with one flat feature surface. It is a layered system in which the core validation contract stays central, while the AI namespace, orchestration adapters, and dashboard expose first-party operational layers on top of it.

Why Start With Truthound Core

  • Polars-first execution and planner-driven aggregation instead of repeated validator-side scans
  • Extreme zero-configuration by default: th.check(data) creates and reuses a local .truthound/ workspace automatically
  • Deterministic auto-suite selection that starts with schema/nullability/type/range/key heuristics instead of "run everything"
  • Canonical ValidationRunResult shared by checkpoints, reporters, validation docs, and plugins
  • Explicit contracts for contexts, check factories, backends, and artifact generation
  • Failure-first test lanes and migration diagnostics that make framework upgrades safer in production
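To make the idea of deterministic auto-suite selection concrete, here is a minimal sketch in plain Python. It is not Truthound's AutoSuiteBuilder: the function name `build_auto_suite`, the check labels, and the three heuristics are assumptions chosen for illustration. The point it shows is that the same input always yields the same small, relevant suite instead of "run everything".

```python
from numbers import Number


def build_auto_suite(columns: dict[str, list]) -> list[tuple[str, str]]:
    """Return (column, check) pairs picked by simple, deterministic heuristics.

    Hypothetical illustration only; this is not Truthound's selection logic.
    """
    suite: list[tuple[str, str]] = []
    for name in sorted(columns):  # sorted so the result never depends on dict order
        values = columns[name]
        present = [v for v in values if v is not None]

        # Nullability heuristic: assert not-null only when no nulls were observed.
        if len(present) == len(values):
            suite.append((name, "not_null"))

        # Key heuristic: id-like column names get a uniqueness check.
        if name == "id" or name.endswith("_id"):
            suite.append((name, "unique"))
        # Range heuristic: remaining numeric columns get a range check.
        elif present and all(
            isinstance(v, Number) and not isinstance(v, bool) for v in present
        ):
            suite.append((name, "range"))
    return suite


data = {
    "customer_id": [1, 2, 2],
    "email": ["a@example.com", None, "c@example.com"],
    "score": [0.1, 0.2, 0.8],
}
for column, check in build_auto_suite(data):
    print(f"{column}: {check}")
```

In this sketch the `email` column gets no check at all, because it contains a null and matches neither the key nor the range heuristic. A real builder would also consult learned baselines and schema information.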

Measured Advantages Of Truthound Core Over Great Expectations

The latest fixed-runner release-grade benchmark artifact set shows Truthound ahead of Great Expectations on every comparable workload in the current comparison catalog while preserving correctness parity.

| Workload | Truthound Warm (s) | GX Warm (s) | Speedup | Memory Ratio |
|---|---|---|---|---|
| local-mixed-core-suite | 0.028240 | 0.075232 | 2.66x | 44.29% |
| local-null | 0.016487 | 0.024964 | 1.51x | 43.62% |
| local-range | 0.002470 | 0.013219 | 5.35x | 43.84% |
| local-schema | 0.001479 | 0.017303 | 11.70x | 35.88% |
| local-unique | 0.002023 | 0.013785 | 6.81x | 42.28% |
| sqlite-null | 0.007370 | 0.032909 | 4.47x | 48.16% |
| sqlite-range | 0.006053 | 0.022355 | 3.69x | 43.80% |
| sqlite-unique | 0.002066 | 0.015655 | 7.58x | 42.12% |

The practical reasons behind that result are straightforward and core-specific:

  • a Polars-first planner/runtime that deduplicates metric work instead of re-scanning through validator loops
  • deterministic auto-suite selection that keeps default work relevant and exact
  • a smaller zero-config context model that persists baselines and artifacts without forcing a heavy project bootstrap
  • one canonical result contract shared by reporters, checkpoints, and validation docs
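The metric deduplication mentioned in the first bullet can be sketched in a few lines of plain Python. This is an illustration of the general technique, not Truthound's planner: the check names, the metric names, and `plan_and_compute` are all hypothetical. Each check declares the metrics it needs, the plan keeps one entry per distinct (column, metric) pair, and every metric is computed once and shared.

```python
# Hypothetical checks, each declaring the (column, metric) pairs it depends on.
CHECK_REQUIREMENTS = {
    "not_null:email": [("email", "null_count"), ("email", "row_count")],
    "null_ratio:email": [("email", "null_count"), ("email", "row_count")],
    "unique:customer_id": [
        ("customer_id", "distinct_count"),
        ("customer_id", "row_count"),
    ],
}

# Hypothetical metric implementations over plain Python lists.
METRICS = {
    "row_count": lambda values: len(values),
    "null_count": lambda values: sum(v is None for v in values),
    "distinct_count": lambda values: len(set(values)),
}


def plan_and_compute(data, requirements):
    """Deduplicate metric requests, then compute each distinct metric once."""
    # dict.fromkeys drops duplicates while preserving first-seen order.
    planned = list(
        dict.fromkeys(req for reqs in requirements.values() for req in reqs)
    )
    computed = {
        (column, metric): METRICS[metric](data[column]) for column, metric in planned
    }
    return planned, computed


data = {
    "customer_id": [1, 2, 2],
    "email": ["a@example.com", None, "c@example.com"],
}
planned, computed = plan_and_compute(data, CHECK_REQUIREMENTS)
requested = sum(len(reqs) for reqs in CHECK_REQUIREMENTS.values())
print(f"{requested} metric requests, {len(planned)} computations")
```

The saving grows with the number of checks that share a column, which is why a planner-level view can beat per-validator scans.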

This comparison is intentionally bounded. It covers comparable deterministic core checks and SQLite pushdown workloads. It is not a blanket claim about orchestration layers, dashboard operations, or every Great Expectations feature area.
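As a reading aid for the benchmark table above, the Speedup column is consistent with dividing the GX warm time by the Truthound warm time. The short script below recomputes it from the published timings. The `speedup` helper and the row layout are just for this illustration, and the Memory Ratio column is left out because the text does not define how it is derived.

```python
# (workload, Truthound warm seconds, GX warm seconds, published speedup)
ROWS = [
    ("local-mixed-core-suite", 0.028240, 0.075232, 2.66),
    ("local-null", 0.016487, 0.024964, 1.51),
    ("local-range", 0.002470, 0.013219, 5.35),
    ("local-schema", 0.001479, 0.017303, 11.70),
    ("local-unique", 0.002023, 0.013785, 6.81),
    ("sqlite-null", 0.007370, 0.032909, 4.47),
    ("sqlite-range", 0.006053, 0.022355, 3.69),
    ("sqlite-unique", 0.002066, 0.015655, 7.58),
]


def speedup(truthound_seconds: float, gx_seconds: float) -> float:
    """Ratio of the slower baseline time to the Truthound time."""
    return gx_seconds / truthound_seconds


for name, th_s, gx_s, published in ROWS:
    print(f"{name:24s} computed {speedup(th_s, gx_s):6.2f}x  published {published:.2f}x")
```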

Read the published evidence in Latest Verified Benchmark Summary.

What Truthound Core Stabilizes

Truthound Core 3.x centers the public contract around a smaller and more durable kernel:

| Layer | Responsibility |
|---|---|
| TruthoundContext | Auto-discovered project workspace, baselines, run history, docs artifacts, plugin runtime, and resolved defaults |
| contracts | Stable ports such as DataAsset, ExecutionBackend, MetricRepository, ArtifactStore, and plugin capabilities |
| suite | Immutable validation intent via ValidationSuite, CheckSpec, SchemaSpec, evidence policy, and severity policy |
| planning | Scan planning, backend routing, metric deduplication, and pushdown eligibility |
| runtime | Session lifecycle, retries, timeout-safe execution, exception isolation, and evidence capture |
| results | CheckResult, ValidationRunResult, and ExecutionIssue as the canonical output model |

Truthound Orchestration and Truthound Dashboard build on these contracts instead of replacing them. That is the key layered-system boundary.

The design is grounded in proven ideas from Great Expectations, Soda, Deequ, and Pandera, but optimized for a simpler zero-config starting point and a Polars-first execution path.

The practical 3.x kernel changes are:

  • th.check() returns ValidationRunResult directly
  • the local .truthound/ workspace is auto-created and reused
  • validators=None now means deterministic AutoSuiteBuilder, not "run every built-in validator"
  • compare moved to truthound.drift.compare
  • checkpoints standardize on CheckpointResult.validation_run and CheckpointResult.validation_view
  • reporters and validation docs consume ValidationRunResult directly through reporter contract v3

The practical 3.1.0 additions on top of that kernel are:

  • optional AI dependency bundle: truthound[ai]
  • public AI review APIs and CLI commands
  • persisted suite proposal, run analysis, and approval/apply artifacts
  • root AI support probes for downstream services and dashboards

Quick Start

Installation

pip install truthound
# Optional AI review surface (quoted so shells such as zsh do not expand the brackets)
pip install "truthound[ai]"
# Development and docs workflows in this repository
uv sync --extra dev --extra docs

Python API

import truthound as th
from truthound.datadocs import generate_validation_report
from truthound.reporters import get_reporter
from truthound.drift import compare

# Zero-config validation: th.check() returns a ValidationRunResult directly.
run = th.check(
    {"customer_id": [1, 2, 2], "email": ["a@example.com", None, "c@example.com"]},
)

# Inspect how the run executed and which checks the auto-suite selected.
print(run.execution_mode)
print([check.name for check in run.checks])
print(run.metadata["context_root"])

# Reporters and validation docs consume the run result directly.
json_report = get_reporter("json").render(run)
validation_docs = generate_validation_report(run, title="Customer Quality Overview")

# Other facade entry points: context lookup, schema learning, masking.
context = th.get_context()
schema = th.learn({"id": [1, 2], "status": ["active", "inactive"]})
masked = th.mask(
    {"email": ["a@example.com", "b@example.com"]},
    columns=["email"],
    strategy="hash",
)
# Drift comparison lives in truthound.drift, not in the root package.
drift = compare({"score": [0.1, 0.2]}, {"score": [0.1, 0.8]})

CLI

truthound check data.csv --validators null,unique
truthound check --connection "sqlite:///warehouse.db" --table users --pushdown
truthound scan pii.csv
truthound profile data.csv
truthound doctor . --migrate-2to3
truthound doctor . --workspace
truthound plugins list --json
# Optional AI review workflow
truthound ai suggest-suite data.csv --prompt "Require customer_id to be unique"
truthound ai proposals list
truthound ai explain-run --run-id <run_id>

Public Surface

The root package intentionally exports a smaller API:

  • Stable facade: check, scan, mask, profile, learn, read, get_context
  • Core types: TruthoundContext, ValidationSuite, CheckSpec, SchemaSpec, ValidationRunResult, CheckResult
  • th.check() returns ValidationRunResult directly
  • Checkpoint runtime results: CheckpointResult.validation_run is canonical and CheckpointResult.validation_view is the compatibility projection for legacy action formatting
  • Reporter-facing types: truthound.reporters.RunPresentation, truthound.reporters.ReporterContext
  • Validation docs entry points: truthound.datadocs.ValidationDocsBuilder, truthound.datadocs.generate_validation_report
  • Drift comparison: import from truthound.drift.compare
  • Advanced systems: import by namespace, for example truthound.ml, truthound.lineage, truthound.realtime, or truthound.datadocs
  • Optional AI review surface: import truthound.ai after installing truthound[ai]

Optional AI Surface

Truthound now ships an additive truthound.ai namespace that preserves the core hot path and zero-config workflow while exposing a reviewable AI layer.

  • suggest_suite(...) compiles prompts into persisted suite proposal artifacts
  • explain_run(...) compiles run evidence into persisted analysis artifacts
  • approve_proposal(...), reject_proposal(...), and apply_proposal(...) keep approval and mutation in explicit human-reviewed steps
  • has_ai_support() and get_ai_support_status() let downstream integrations feature-gate the AI surface cleanly
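A downstream integration might feature-gate the AI surface along the following lines. This is a defensive sketch under stated assumptions: it assumes has_ai_support() is callable with no arguments on the root package and returns a truthy value when the optional dependencies are available. The wrapper name `ai_review_available` is hypothetical, and the exact return types of both probes should be checked against the AI docs.

```python
def ai_review_available() -> bool:
    """Return True only when Truthound reports that the AI surface is usable.

    Hypothetical wrapper: it assumes the root probe has_ai_support() takes
    no arguments and returns a truthy value when truthound[ai] is installed.
    """
    try:
        import truthound as th
    except ImportError:
        # Truthound itself is not installed in this environment.
        return False

    probe = getattr(th, "has_ai_support", None)
    if not callable(probe):
        # Older releases (before the 3.1.0 probes) would not expose it.
        return False
    return bool(probe())


if ai_review_available():
    print("AI review features enabled")
else:
    print("AI review features disabled; core validation is unaffected")
```

Keeping the import of truthound.ai behind a check like this means a service can run on the core package alone and enable review features only where the optional bundle is present.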

Read the technical docs in docs/ai/index.md.

The public CLI surface is additive as well:

  • truthound ai suggest-suite
  • truthound ai explain-run
  • truthound ai proposals list/show/approve/reject/apply/history
  • truthound ai analyses list/show
  • truthound ai smoke openai
  • truthound ai smoke openai-explain-run

The experimental use_engine and --use-engine switches remain removed.

Zero-Config Workflow

Truthound 3.x auto-creates a .truthound/ workspace at your project root. By default it manages:

  • .truthound/config.yaml: resolved project defaults
  • .truthound/catalog/: asset fingerprints and source signatures
  • .truthound/baselines/: learned schemas and metric history
  • .truthound/runs/: persisted ValidationRunResult metadata
  • .truthound/docs/: generated validation docs
  • .truthound/plugins/: resolved plugin manifest and trust metadata

If you do nothing except call th.check(data), Truthound will:

  1. detect the asset/backend
  2. resolve the active TruthoundContext
  3. load or create a baseline
  4. synthesize an auto-suite
  5. plan and execute the validation
  6. persist the run and validation docs when persistence is enabled

Use truthound doctor . --workspace to verify that the local .truthound/ layout, indexes, baselines, and persisted run artifacts are still structurally healthy.
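The workspace layout listed above can be pictured with a small pathlib sketch. It only mirrors the directory and file names from this section. The helpers `ensure_workspace` and `missing_entries` are hypothetical, the config file content is a placeholder, and the real truthound doctor check covers indexes, baselines, and run artifacts that this sketch does not model.

```python
import tempfile
from pathlib import Path

# Subdirectories named in the workspace layout above.
WORKSPACE_DIRS = ("catalog", "baselines", "runs", "docs", "plugins")


def ensure_workspace(project_root: Path) -> Path:
    """Create the .truthound/ layout if missing and reuse it otherwise."""
    workspace = project_root / ".truthound"
    for name in WORKSPACE_DIRS:
        (workspace / name).mkdir(parents=True, exist_ok=True)
    config = workspace / "config.yaml"
    if not config.exists():
        # Placeholder content; the real file holds resolved project defaults.
        config.write_text("# resolved project defaults\n", encoding="utf-8")
    return workspace


def missing_entries(workspace: Path) -> list[str]:
    """List expected top-level entries that are absent from the workspace."""
    expected = [*WORKSPACE_DIRS, "config.yaml"]
    return [name for name in expected if not (workspace / name).exists()]


with tempfile.TemporaryDirectory() as tmp:
    workspace = ensure_workspace(Path(tmp))
    ensure_workspace(Path(tmp))  # second call reuses the existing layout
    print(missing_entries(workspace))
```

Making creation idempotent is what allows repeated calls to reuse one workspace instead of bootstrapping a new project each time.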

Plugin Platform

Truthound now uses one lifecycle runtime:

  • PluginManager is the canonical plugin manager
  • EnterprisePluginManager is an async, capability-driven facade over the same runtime
  • Plugins register through stable ports such as register_check_factory, register_data_asset_provider, register_reporter, register_hook, and register_capability
  • Reporter plugins should target the contract-v3 surface where ValidationRunResult is the canonical render input and RunPresentation is the shared render projection

Development

uv run --frozen --extra dev python -m pytest -q
uv run --frozen --extra dev python -m pytest --collect-only -q tests
uv run --frozen --extra dev python -m pytest -q -m "contract or fault or e2e" -p no:cacheprovider
uv run --frozen --extra dev python -m pytest -q -m "contract or fault or integration or soak or stress or scale_100m or e2e" --run-integration --run-expensive --run-soak -p no:cacheprovider
uv run --frozen --extra dev python -m pytest -q tests/test_truthound_3_0_contract.py tests/test_api.py tests/test_public_surface.py tests/test_checkpoint.py -p no:cacheprovider
uv run --frozen --extra benchmarks python -m truthound.cli benchmark parity --suite pr-fast --frameworks truthound --backend local --strict
uv run --frozen --extra benchmarks python -m truthound.cli benchmark parity --suite nightly-core --frameworks both --backend local --strict
uv run --frozen --extra benchmarks python -m truthound.cli benchmark parity --suite nightly-sql --frameworks both --backend sqlite --strict
uv run --frozen --extra benchmarks python -m truthound.cli benchmark parity --suite release-ga --frameworks both --strict
python docs/scripts/prepare_public_docs.py --mode full
python docs/scripts/prepare_public_docs.py --mode public
uv run --frozen --extra dev python docs/scripts/check_links.py --mkdocs mkdocs.yml README.md CLAUDE.md build/full-docs
uv run --frozen --extra dev --extra docs mkdocs build --strict
uv run --frozen --extra dev --extra docs mkdocs build --strict -f mkdocs.public.yml
truthound doctor . --migrate-2to3

Official benchmark comparisons should cite the published fixed-runner artifact set: release-ga.json, env-manifest.json, and latest-benchmark-summary.md.

Tests now follow a failure-first lane model:

  • contract: stable public API and compatibility boundaries
  • fault: deterministic failure injection, timeout, corruption, and concurrency scenarios
  • integration: opt-in backend and external-service coverage
  • soak and stress: nightly-only load and chaos coverage

The default local run is intentionally fast. Manual verification artifacts live under verification/phase6 and are intentionally kept out of pytest discovery.

Official performance claims should come only from the verified release-grade parity artifacts under .truthound/benchmarks/release/. Nightly outputs are for trend visibility, not public benchmark positioning.

When adding tests, prefer scenarios that protect public contracts or operational failure modes. Avoid adding default-value, getter/setter, enum-literal, to_dict() round-trip, or CSS-string existence tests unless they prove a compatibility boundary that has failed before.
