Zero-Configuration Data Quality Framework Powered by Polars

Truthound

Sniffs out bad data.

Truthound 3.1.3 is a layered data quality system built around a Polars-first validation kernel, first-party orchestration adapters, an additive AI review surface, and Truthound Depot as the repository console for dataset version-control workflows.

Abstract

Truthound 3.1.3 is a layered data quality system. At the center is a small, durable, Polars-first validation kernel. Around that core sit an additive truthound.ai review surface, Truthound Orchestration for host-native execution inside schedulers and workflow systems, and Truthound Depot for operating dataset repositories through an installation-managed console.

The point of the 3.x reset is not to hide the broader product line. It is to make the system boundary honest. The core validation kernel is the most rigorously validated contract in the ecosystem, while the AI review layer, orchestration adapters, and Depot build on top of that contract instead of redefining it.

Documentation: truthound.netlify.app

What's New In 3.1.3

Truthound 3.1.3 keeps the 3.1 review surface intact while aligning the public package, README, and docs surface with the completed Truthound Depot product direction.

public product-line wording now points to Truthound Depot as the dataset repository console instead of treating the older dashboard wording as the canonical product surface
Core remains the data-plane primitive owner for validation semantics, deterministic fingerprints, summary diffs, and quality gate projections
Depot remains the business-state owner for branch, merge, release, rollback, approval, evidence, and operator workflows
Orchestration remains the execution owner for host-native submit, poll, wait, retry, and projection behavior
the private Depot engine primitive docs now match the implemented Core/Depot/Orchestration ownership contract used by the Depot release line
compatibility wording preserves existing Data Docs dashboard and private runtime identifiers where they are feature names or migration surfaces

Truthound Product Line

Layer	Repository	Responsibility	Start Here
`Truthound Core`	`truthound`	Validation kernel and data-plane: `th.check()`, `ValidationRunResult`, planner/runtime, zero-config workspace, reporters, checkpoints, Data Docs	Core docs
`Truthound AI`	`truthound.ai`	Optional review-layer APIs for prompt-to-proposal compilation, run analysis, approval history, and controlled apply	`/ai/`
`Truthound Orchestration`	`truthound-orchestration`	First-party execution integration layer for Airflow, Dagster, Prefect, dbt, Mage, and Kestra	`/orchestration/`
`Truthound Depot`	`truthound-depot`	First-party dataset repository console for branch, push, compare, merge request, quality gate, release, rollback, evidence, RBAC, and operator workflows	`/dashboard/`

Truthound is therefore not a monolithic platform with one flat feature surface. It is a layered system in which the core validation contract stays central, while the AI namespace, orchestration adapters, and Depot expose first-party operational layers on top of it.

Why Start With Truthound Core

Polars-first execution and planner-driven aggregation instead of repeated validator-side scans
Extreme zero-configuration by default: th.check(data) creates and reuses a local .truthound/ workspace automatically
Deterministic auto-suite selection that starts with schema/nullability/type/range/key heuristics instead of "run everything"
Canonical ValidationRunResult shared by checkpoints, reporters, validation docs, and plugins
Explicit contracts for contexts, check factories, backends, and artifact generation
Failure-first test lanes and migration diagnostics that make framework upgrades safer in production
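
To make the idea of deterministic auto-suite selection concrete, here is a minimal, library-independent sketch. The function name `sketch_auto_suite` and every heuristic in it are assumptions for illustration only; they are not Truthound's actual `AutoSuiteBuilder` rules. The point is that the same input always yields the same ordered list of checks, and that only checks relevant to the observed columns are selected.

```python
from typing import Any


def sketch_auto_suite(data: dict[str, list[Any]]) -> list[tuple[str, str]]:
    """Return a deterministic list of (column, check) pairs.

    Hypothetical heuristics, not Truthound's real selection logic.
    """
    selected: list[tuple[str, str]] = []
    for column in sorted(data):  # sorted so dict ordering cannot change the result
        values = data[column]
        non_null = [v for v in values if v is not None]

        selected.append((column, "schema"))  # every observed column gets a schema check
        if len(non_null) == len(values):
            selected.append((column, "not_null"))  # no nulls observed so far
        if non_null and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
        ):
            selected.append((column, "range"))  # numeric column: bound its values
        if column == "id" or column.endswith("_id"):
            selected.append((column, "unique"))  # name suggests a key column
    return selected


suite = sketch_auto_suite(
    {"customer_id": [1, 2, 2], "email": ["a@example.com", None, "c@example.com"]}
)
print(suite)
```

In this sketch the `email` column only receives a schema check because it contains a null and is not numeric, while `customer_id` receives all four.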

Measured Advantages Of Truthound Core Over Great Expectations

The latest fixed-runner release-grade benchmark artifact set shows Truthound ahead of Great Expectations on every comparable workload in the current comparison catalog while preserving correctness parity.

Workload	Truthound Warm (s)	GX Warm (s)	Speedup	Memory Ratio
local-mixed-core-suite	0.028240	0.075232	2.66x	44.29%
local-null	0.016487	0.024964	1.51x	43.62%
local-range	0.002470	0.013219	5.35x	43.84%
local-schema	0.001479	0.017303	11.70x	35.88%
local-unique	0.002023	0.013785	6.81x	42.28%
sqlite-null	0.007370	0.032909	4.47x	48.16%
sqlite-range	0.006053	0.022355	3.69x	43.80%
sqlite-unique	0.002066	0.015655	7.58x	42.12%
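
The Speedup column is consistent with dividing the GX warm time by the Truthound warm time. That formula is inferred from the published numbers rather than stated by the source, so treat the following as a quick arithmetic check, not as the benchmark harness.

```python
# Warm timings in seconds, copied from the table above: (Truthound, GX).
timings = {
    "local-mixed-core-suite": (0.028240, 0.075232),
    "local-null": (0.016487, 0.024964),
    "local-range": (0.002470, 0.013219),
    "local-schema": (0.001479, 0.017303),
    "local-unique": (0.002023, 0.013785),
    "sqlite-null": (0.007370, 0.032909),
    "sqlite-range": (0.006053, 0.022355),
    "sqlite-unique": (0.002066, 0.015655),
}


def speedup(truthound_s: float, gx_s: float) -> float:
    """Assumed definition: how many times faster Truthound is than GX."""
    return gx_s / truthound_s


for workload, (truthound_s, gx_s) in timings.items():
    print(f"{workload}: {speedup(truthound_s, gx_s):.2f}x")
```

Rounded to two decimals, the recomputed ratios agree with the Speedup column for all eight workloads.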

The practical reasons behind that result are straightforward and core-specific:

a Polars-first planner/runtime that deduplicates metric work instead of re-scanning through validator loops
deterministic auto-suite selection that keeps default work relevant and exact
a smaller zero-config context model that persists baselines and artifacts without forcing a heavy project bootstrap
one canonical result contract shared by reporters, checkpoints, and validation docs
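
The first point, deduplicating metric work, can be sketched without any Truthound or Polars code. In the example below, `plan_and_run`, the `METRICS` table, and the check names are all hypothetical; the sketch only shows the general technique of collecting the distinct metrics that a set of checks needs and computing each one once.

```python
from typing import Any, Callable

MetricKey = tuple[str, str]  # (metric name, column)

# Hypothetical metric implementations over plain Python lists.
METRICS: dict[str, Callable[[list[Any]], int]] = {
    "row_count": len,
    "null_count": lambda values: sum(v is None for v in values),
    "distinct_count": lambda values: len({v for v in values if v is not None}),
}


def plan_and_run(
    data: dict[str, list[Any]],
    checks: dict[str, list[MetricKey]],
) -> tuple[dict[MetricKey, int], int]:
    """Compute each distinct (metric, column) pair once, however many checks ask."""
    plan = sorted({key for keys in checks.values() for key in keys})
    computed = {(name, column): METRICS[name](data[column]) for name, column in plan}
    return computed, len(plan)


data = {"customer_id": [1, 2, 2], "email": ["a@example.com", None, "c@example.com"]}
checks = {
    "email_not_null": [("null_count", "email")],
    "email_completeness": [("null_count", "email"), ("row_count", "email")],
    "customer_id_unique": [("distinct_count", "customer_id"), ("row_count", "customer_id")],
}

computed, evaluations = plan_and_run(data, checks)
requested = sum(len(keys) for keys in checks.values())
print(f"requested={requested}, evaluated={evaluations}")
```

Three checks request five metrics between them, but only four distinct metrics are evaluated because two checks share the null count for `email`.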

This comparison is intentionally bounded. It covers comparable deterministic core checks and SQLite pushdown workloads. It is not a blanket claim about orchestration layers, dashboard operations, or every Great Expectations feature area.

Read the published evidence in the Latest Verified Benchmark Summary (docs/releases/latest-benchmark-summary.md).

What Truthound Core Stabilizes

Truthound Core 3.x centers the public contract around a smaller and more durable kernel:

Layer	Responsibility
`TruthoundContext`	Auto-discovered project workspace, baselines, run history, docs artifacts, plugin runtime, and resolved defaults
`contracts`	Stable ports such as `DataAsset`, `ExecutionBackend`, `MetricRepository`, `ArtifactStore`, and plugin capabilities
`suite`	Immutable validation intent via `ValidationSuite`, `CheckSpec`, `SchemaSpec`, evidence policy, and severity policy
`planning`	Scan planning, backend routing, metric deduplication, and pushdown eligibility
`runtime`	Session lifecycle, retries, timeout-safe execution, exception isolation, and evidence capture
`results`	`CheckResult`, `ValidationRunResult`, and `ExecutionIssue` as the canonical output model

Truthound Orchestration and Truthound Depot build on these contracts instead of replacing them. That is the key layered-system boundary.

Depot Engine Primitives

Truthound Core also carries private Depot engine primitives for dataset repository workflows before they become public API. These Core-owned private primitives are used by Truthound Depot and Truthound Orchestration to exchange redacted artifact envelopes, deterministic dataset fingerprints, summary-level diffs, and ValidationRunResult-based quality gate projections.

This is not a new root public surface: there is no public truthound.datasets and there is no public truthound.depot namespace. Core owns the validation engine, fingerprint/diff primitive, and quality gate projection runtime; Truthound Depot owns repository UI, branch/merge/review/rollback decisions, approval state, and operator workflows.
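
Because these primitives are private, the following is only a conceptual sketch of what a deterministic fingerprint and a summary-level diff can look like. The functions `fingerprint` and `summary_diff` are hypothetical names built on the standard library; they do not reflect Truthound's internal implementation or any public API.

```python
import hashlib
import json
from typing import Any

Dataset = dict[str, list[Any]]


def fingerprint(data: Dataset) -> str:
    """Hash a canonical JSON encoding so equal data gives an equal digest."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def summary_diff(old: Dataset, new: Dataset) -> dict[str, Any]:
    """Compare column sets and row counts only; no row-level values are exposed."""
    old_cols, new_cols = set(old), set(new)
    return {
        "added_columns": sorted(new_cols - old_cols),
        "removed_columns": sorted(old_cols - new_cols),
        "row_count_change": {
            col: len(new[col]) - len(old[col])
            for col in sorted(old_cols & new_cols)
            if len(new[col]) != len(old[col])
        },
    }


v1 = {"id": [1, 2], "status": ["active", "inactive"]}
v2 = {"id": [1, 2, 3], "status": ["active", "inactive", "active"], "score": [0.1, 0.2, 0.3]}

print(fingerprint(v1))
print(summary_diff(v1, v2))
```

Sorting keys before hashing is what makes the fingerprint independent of column order in this sketch; a real implementation would also have to decide how row order and data types affect the digest.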

The design is grounded in proven ideas from Great Expectations, Soda, Deequ, and Pandera, but optimized for a simpler zero-config starting point and a Polars-first execution path.

The practical 3.x kernel changes are:

th.check() returns ValidationRunResult directly
the local .truthound/ workspace is auto-created and reused
validators=None now means deterministic AutoSuiteBuilder, not "run every built-in validator"
compare moved to truthound.drift.compare
checkpoints standardize on CheckpointResult.validation_run and CheckpointResult.validation_view
reporters and validation docs consume ValidationRunResult directly through reporter contract v3

The current 3.x AI additions on top of that kernel are:

optional AI dependency bundle: truthound[ai]
public AI review APIs and CLI commands
persisted suite proposal, run analysis, and approval/apply artifacts
root AI support probes for downstream services and dashboards
prompt hardening for Korean, English, and mixed prompts through deterministic normalization, structured provider output, compiler validation, and explicit review artifacts
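
As a rough illustration of deterministic normalization for Korean, English, and mixed prompts, the sketch below applies Unicode NFKC normalization, whitespace collapsing, and case folding before a keyword lookup. The keyword table and the function names `normalize_prompt` and `intent_candidates` are assumptions made for this example and are not part of `truthound.ai`.

```python
import re
import unicodedata

# Hypothetical keyword table. Korean terms: 고유 (unique), 중복 (duplicate),
# 결측 (missing), 필수 (required).
INTENT_KEYWORDS = {
    "unique": ("unique", "duplicate", "고유", "중복"),
    "not_null": ("not null", "required", "missing", "결측", "필수"),
}


def normalize_prompt(prompt: str) -> str:
    """Fold width variants, collapse whitespace, and ignore letter case."""
    text = unicodedata.normalize("NFKC", prompt)
    text = re.sub(r"\s+", " ", text).strip()
    return text.casefold()


def intent_candidates(prompt: str) -> list[str]:
    """Return intents whose keywords occur in the normalized prompt."""
    text = normalize_prompt(prompt)
    return [
        intent
        for intent, keywords in INTENT_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


print(intent_candidates("Require customer_id to be unique"))
print(intent_candidates("customer_id 중복 없이, email 필수"))
print(intent_candidates("make it pretty"))
```

A prompt that matches no keyword yields an empty candidate list here, which a caller could surface for review instead of treating it as an error.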

Quick Start

Installation

pip install truthound

# Optional AI review surface
pip install "truthound[ai]"

# Development and docs workflows in this repository
uv sync --extra dev --extra docs

Python API

import truthound as th
from truthound.datadocs import generate_validation_report
from truthound.reporters import get_reporter
from truthound.drift import compare

# Zero-config check: th.check() returns a ValidationRunResult directly.
run = th.check(
    {"customer_id": [1, 2, 2], "email": ["a@example.com", None, "c@example.com"]},
)

# Inspect the execution mode, the checks in the run, and the workspace root.
print(run.execution_mode)
print([check.name for check in run.checks])
print(run.metadata["context_root"])

# Reporters and validation docs consume the same run result.
json_report = get_reporter("json").render(run)
validation_docs = generate_validation_report(run, title="Customer Quality Overview")

# Other facade entry points: context lookup, schema learning, masking, and drift.
context = th.get_context()
schema = th.learn({"id": [1, 2], "status": ["active", "inactive"]})
masked = th.mask(
    {"email": ["a@example.com", "b@example.com"]},
    columns=["email"],
    strategy="hash",
)
drift = compare({"score": [0.1, 0.2]}, {"score": [0.1, 0.8]})

CLI

truthound check data.csv --validators null,unique
truthound check --connection "sqlite:///warehouse.db" --table users --pushdown
truthound scan pii.csv
truthound profile data.csv
truthound doctor . --migrate-2to3
truthound doctor . --workspace
truthound plugins list --json

# Optional AI review workflow
truthound ai suggest-suite data.csv --prompt "Require customer_id to be unique"
truthound ai proposals list
truthound ai explain-run --run-id <run_id>

Public Surface

The root package intentionally exports a smaller API:

Stable facade: check, scan, mask, profile, learn, read, get_context
Core types: TruthoundContext, ValidationSuite, CheckSpec, SchemaSpec, ValidationRunResult, CheckResult
th.check() returns ValidationRunResult directly
Checkpoint runtime results: CheckpointResult.validation_run is canonical and CheckpointResult.validation_view is the compatibility projection for legacy action formatting
Reporter-facing types: truthound.reporters.RunPresentation, truthound.reporters.ReporterContext
Validation docs entry points: truthound.datadocs.ValidationDocsBuilder, truthound.datadocs.generate_validation_report
Drift comparison: import compare from truthound.drift (from truthound.drift import compare)
Advanced systems: import by namespace, for example truthound.ml, truthound.lineage, truthound.realtime, or truthound.datadocs
Optional AI review surface: import truthound.ai after installing truthound[ai]

Optional AI Surface

Truthound now ships an additive truthound.ai namespace that preserves the core hot path and zero-config workflow while exposing a reviewable AI layer.

suggest_suite(...) compiles prompts into persisted suite proposal artifacts
explain_run(...) compiles run evidence into persisted analysis artifacts
approve_proposal(...), reject_proposal(...), and apply_proposal(...) keep approval and mutation in explicit human-reviewed steps
has_ai_support() and get_ai_support_status() let downstream integrations feature-gate the AI surface cleanly
Korean, English, and mixed prompt normalization converts common quality requests into canonical validation intent candidates before provider guidance
ambiguous or unsupported prompt requests become reviewable rejected items rather than route failures
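
The separation between approval and mutation described above can be pictured as a small state machine. The sketch below is hypothetical: the `Proposal` class, the `State` enum, and the transition table are illustrative names, not the `truthound.ai` data model. It shows one way to guarantee that a proposal cannot be applied unless it was explicitly approved first.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    APPLIED = "applied"


# Allowed transitions: apply is reachable only through an explicit approval.
ALLOWED: dict[State, set[State]] = {
    State.PENDING: {State.APPROVED, State.REJECTED},
    State.APPROVED: {State.APPLIED},
    State.REJECTED: set(),
    State.APPLIED: set(),
}


@dataclass
class Proposal:
    proposal_id: str
    state: State = State.PENDING
    history: list[tuple[State, State, str]] = field(default_factory=list)

    def transition(self, target: State, actor: str) -> None:
        """Move to the target state or raise if the step is not allowed."""
        if target not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {target.value}")
        self.history.append((self.state, target, actor))  # keep an audit trail
        self.state = target


proposal = Proposal("p-1")
proposal.transition(State.APPROVED, actor="reviewer")
proposal.transition(State.APPLIED, actor="reviewer")
print(proposal.state, len(proposal.history))
```

Recording each transition with its actor is what gives the sketch an approval history that can be reviewed later.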

Read the technical docs in docs/ai/index.md. Read the prompt safety contract in docs/ai/prompt-hardening.md.

The public CLI surface is additive as well:

truthound ai suggest-suite
truthound ai explain-run
truthound ai proposals list/show/approve/reject/apply/history
truthound ai analyses list/show
truthound ai smoke openai
truthound ai smoke openai-explain-run

The experimental use_engine and --use-engine switches remain removed.

Zero-Config Workflow

Truthound auto-creates a .truthound/ workspace at your project root. By default it manages:

.truthound/config.yaml: resolved project defaults
.truthound/catalog/: asset fingerprints and source signatures
.truthound/baselines/: learned schemas and metric history
.truthound/runs/: persisted ValidationRunResult metadata
.truthound/docs/: generated validation docs
.truthound/plugins/: resolved plugin manifest and trust metadata

If you do nothing except call th.check(data), Truthound will:

detect the asset/backend
resolve the active TruthoundContext
load or create a baseline
synthesize an auto-suite
plan and execute the validation
persist the run and validation docs when persistence is enabled

Use truthound doctor . --workspace to verify that the local .truthound/ layout, indexes, baselines, and persisted run artifacts are still structurally healthy.
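
For a quick look at the layout from Python, a few lines of `pathlib` are enough. This sketch only checks that the entries listed above exist; the helper name `missing_workspace_entries` is made up for the example, and it is far less thorough than `truthound doctor`, which the text describes as also checking indexes, baselines, and run artifacts.

```python
import tempfile
from pathlib import Path

# Entry names taken from the workspace layout listed above.
EXPECTED_ENTRIES = ("config.yaml", "catalog", "baselines", "runs", "docs", "plugins")


def missing_workspace_entries(project_root: Path) -> list[str]:
    """Return expected .truthound/ entries that do not exist under project_root."""
    workspace = project_root / ".truthound"
    return [name for name in EXPECTED_ENTRIES if not (workspace / name).exists()]


# Demonstration against a throwaway directory with a partial workspace.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".truthound" / "runs").mkdir(parents=True)
    (root / ".truthound" / "config.yaml").write_text("# placeholder\n")
    print(missing_workspace_entries(root))
```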

Plugin Platform

Truthound now uses one lifecycle runtime:

PluginManager is the canonical plugin manager
EnterprisePluginManager is an async, capability-driven facade over the same runtime
Plugins register through stable ports such as register_check_factory, register_data_asset_provider, register_reporter, register_hook, and register_capability
Reporter plugins should target the contract-v3 surface where ValidationRunResult is the canonical render input and RunPresentation is the shared render projection
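
The registration pattern can be illustrated with a toy registry. Only the method name `register_check_factory` comes from the list above; its signature, the `ToyPluginRegistry` class, `build_check`, and the example factory are assumptions for this sketch and should not be read as the `PluginManager` API.

```python
from typing import Any, Callable

Check = Callable[[list[Any]], bool]
CheckFactory = Callable[..., Check]


class ToyPluginRegistry:
    """Hypothetical stand-in for a plugin runtime; not Truthound's PluginManager."""

    def __init__(self) -> None:
        self._check_factories: dict[str, CheckFactory] = {}

    def register_check_factory(self, name: str, factory: CheckFactory) -> None:
        # Refuse silent overrides so two plugins cannot claim the same name.
        if name in self._check_factories:
            raise ValueError(f"check factory already registered: {name}")
        self._check_factories[name] = factory

    def build_check(self, name: str, **options: Any) -> Check:
        return self._check_factories[name](**options)


def max_length_factory(limit: int) -> Check:
    """Example plugin: build a check that caps string length, ignoring nulls."""

    def check(values: list[Any]) -> bool:
        return all(v is None or len(v) <= limit for v in values)

    return check


registry = ToyPluginRegistry()
registry.register_check_factory("max_length", max_length_factory)
check = registry.build_check("max_length", limit=5)
print(check(["abc", None, "abcde"]))
print(check(["abcdef"]))
```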

Documentation

Main docs portal: truthound.netlify.app
Core overview: docs/index.md
Core getting started: docs/getting-started/index.md
Core architecture: docs/concepts/architecture.md
Depot engine primitives: docs/concepts/depot-engine-primitives.md
Core zero-config context: docs/concepts/zero-config.md
Core guides: docs/guides/index.md
Core reference: docs/reference/index.md
AI docs: docs/ai/index.md
Orchestration layer: truthound.netlify.app/orchestration/
Orchestration getting started: docs/orchestration/getting-started.md
Depot console layer: truthound.netlify.app/dashboard/
Release notes: docs/releases/truthound-3.1.3.md
Latest verified benchmark summary: docs/releases/latest-benchmark-summary.md
Migration guide: docs/guides/migration-3.0.md
Legacy archive: docs/legacy/index.md
ADRs: docs/adr/001-validation-kernel.md, docs/adr/002-plugin-platform.md, docs/adr/003-result-model.md, docs/adr/004-migration-compatibility.md

Development

# Default local test run
uv run --frozen --extra dev python -m pytest -q
# Verify test collection without running tests
uv run --frozen --extra dev python -m pytest --collect-only -q tests
# Failure-first lanes: contract, fault, and e2e
uv run --frozen --extra dev python -m pytest -q -m "contract or fault or e2e" -p no:cacheprovider
# All lanes, including opt-in integration, expensive, and soak runs
uv run --frozen --extra dev python -m pytest -q -m "contract or fault or integration or soak or stress or scale_100m or e2e" --run-integration --run-expensive --run-soak -p no:cacheprovider
# Targeted public-contract tests
uv run --frozen --extra dev python -m pytest -q tests/test_truthound_3_0_contract.py tests/test_api.py tests/test_public_surface.py tests/test_checkpoint.py -p no:cacheprovider
# Benchmark parity suites
uv run --frozen --extra benchmarks python -m truthound.cli benchmark parity --suite pr-fast --frameworks truthound --backend local --strict
uv run --frozen --extra benchmarks python -m truthound.cli benchmark parity --suite nightly-core --frameworks both --backend local --strict
uv run --frozen --extra benchmarks python -m truthound.cli benchmark parity --suite nightly-sql --frameworks both --backend sqlite --strict
uv run --frozen --extra benchmarks python -m truthound.cli benchmark parity --suite release-ga --frameworks both --strict
# Docs preparation, link checks, and builds
python docs/scripts/prepare_public_docs.py --mode full
python docs/scripts/prepare_public_docs.py --mode public
uv run --frozen --extra dev python docs/scripts/check_links.py --mkdocs mkdocs.yml README.md CLAUDE.md build/full-docs
uv run --frozen --extra dev --extra docs mkdocs build --strict
uv run --frozen --extra dev --extra docs mkdocs build --strict -f mkdocs.public.yml
# Migration diagnostics
truthound doctor . --migrate-2to3

Official benchmark comparisons should cite the published fixed-runner artifact set: release-ga.json, env-manifest.json, and latest-benchmark-summary.md.

Tests now follow a failure-first lane model:

contract: stable public API and compatibility boundaries
fault: deterministic failure injection, timeout, corruption, and concurrency scenarios
integration: opt-in backend and external-service coverage
soak and stress: nightly-only load and chaos coverage

The default local run is intentionally fast. Manual verification artifacts live under verification/phase6 and are intentionally kept out of pytest discovery.

Official performance claims should come only from the verified release-grade parity artifacts under .truthound/benchmarks/release/. Nightly outputs are for trend visibility, not public benchmark positioning.

When adding tests, prefer scenarios that protect public contracts or operational failure modes. Avoid adding default-value, getter/setter, enum-literal, to_dict() round-trip, or CSS-string existence tests unless they prove a compatibility boundary that has failed before.

Release history

3.1.3 (this version): May 20, 2026
3.1.2: May 4, 2026
3.1.1: Apr 6, 2026
3.1.0: Apr 1, 2026
3.0.1: Mar 27, 2026
3.0.0: Mar 19, 2026
2.0.0: Mar 19, 2026
1.5.0: Mar 9, 2026
1.3.2: Mar 5, 2026
1.3.1: Feb 15, 2026
1.3.0: Feb 12, 2026
1.2.11: Jan 29, 2026
1.2.10: Jan 28, 2026
1.2.9: Jan 27, 2026
1.2.8: Jan 27, 2026
1.2.7: Jan 26, 2026
1.2.6: Jan 26, 2026
1.2.5: Jan 23, 2026
1.2.4: Jan 22, 2026
1.2.3: Jan 22, 2026
1.2.2: Jan 22, 2026
1.2.1: Jan 21, 2026
1.2.0: Jan 20, 2026
1.1.1: Jan 19, 2026
1.1.0: Jan 19, 2026
1.0.17: Jan 19, 2026
1.0.16: Jan 19, 2026
1.0.15: Jan 15, 2026
1.0.13: Jan 13, 2026
1.0.12: Jan 12, 2026
1.0.11: Jan 11, 2026
1.0.10: Jan 9, 2026
1.0.9: Jan 8, 2026
1.0.8: Jan 6, 2026
1.0.7: Jan 5, 2026
1.0.5: Jan 4, 2026
1.0.4: Jan 3, 2026
1.0.2: Jan 3, 2026
1.0.1: Jan 2, 2026
1.0.0: Dec 28, 2025
0.2.0: Dec 22, 2025
0.1.0: Dec 22, 2025

Download files

Source Distribution

truthound-3.1.3.tar.gz (9.4 MB)

Uploaded May 20, 2026 Source

Built Distribution

truthound-3.1.3-py3-none-any.whl (3.5 MB)

Uploaded May 20, 2026 Python 3

File details

Details for the file truthound-3.1.3.tar.gz.

File metadata

Download URL: truthound-3.1.3.tar.gz
Upload date: May 20, 2026
Size: 9.4 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for truthound-3.1.3.tar.gz
Algorithm	Hash digest
SHA256	`3285b55bc62f38d017ee36ce8afde661159969a543a07b589d5a07be3f9ba9d7`
MD5	`4a87f58dfe25581ba63af6a8fe8dac2d`
BLAKE2b-256	`6ddaf836c03a1f8cf0b7e5c8cdf8bc1068fb1630c9283531cc4682ca3544cb60`

File details

Details for the file truthound-3.1.3-py3-none-any.whl.

File metadata

Download URL: truthound-3.1.3-py3-none-any.whl
Upload date: May 20, 2026
Size: 3.5 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for truthound-3.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`dfd7a3a8705c06ecc5aea350e285244bb72f47641c2b6c5c516497a19b30c6b7`
MD5	`3353350ea9673bd1403a75144665d36e`
BLAKE2b-256	`a00cdb3b034fc2124f6519fb6351484fd7917055db7cebd829c6007ef0ff000f`
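
A downloaded file can be checked against the published SHA256 digests with the standard library alone. The helper names below are illustrative; the digests are copied from the hash tables above, and the file paths are whatever location you downloaded to.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file hash tables above.
PUBLISHED_SHA256 = {
    "truthound-3.1.3.tar.gz": "3285b55bc62f38d017ee36ce8afde661159969a543a07b589d5a07be3f9ba9d7",
    "truthound-3.1.3-py3-none-any.whl": "dfd7a3a8705c06ecc5aea350e285244bb72f47641c2b6c5c516497a19b30c6b7",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path) -> bool:
    """True only if the file name is known and its digest equals the published one."""
    expected = PUBLISHED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected


if __name__ == "__main__":
    for name in PUBLISHED_SHA256:
        candidate = Path(name)
        if candidate.exists():
            print(name, "OK" if matches_published(candidate) else "MISMATCH")
```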
