Data Reliability Index - native validation and filtering of datasets.

Data Reliability Index

Data Reliability Index is a Python package for attaching reliability metadata to data points, enforcing trust policies, and filtering unreliable records before they reach analysis or API boundaries.

The package is built around a simple rule: data should carry the evidence needed to decide whether it is safe to use.

Features

  • Pydantic models for reliability metadata and policies.
  • Scanning engine for computing reliability scores from validation evidence.
  • Tiered trust classification with scores from 0 to 100.
  • Policy-based acceptance checks for individual records.
  • Pandas helpers for filtering DataFrames by reliability metadata.
  • FastAPI example for rejecting low-reliability input at ingestion time.
  • MkDocs documentation for concepts and API usage.
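
As a rough illustration of how validation evidence could be turned into a 0 to 100 score and a trust tier, the sketch below averages per-check values and maps the result to a tier. It does not use the package: the `Tier` enum, the unweighted average, and the 90/70 cut-offs are all assumptions made for this example, and the library's actual scoring rules may differ.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical trust tiers; a lower number means more trusted."""
    TIER_1 = 1
    TIER_2 = 2
    TIER_3 = 3


def score_from_evidence(evidence: dict[str, float]) -> float:
    """Average per-check values in [0, 1] and scale them to a 0-100 score.

    An unweighted mean is an assumption; a real scanner may weight checks.
    """
    if not evidence:
        return 0.0
    for name, value in evidence.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within [0, 1], got {value}")
    return 100.0 * sum(evidence.values()) / len(evidence)


def tier_for_score(score: float) -> Tier:
    """Map a 0-100 score to a tier using illustrative cut-offs (90 and 70)."""
    if score >= 90:
        return Tier.TIER_1
    if score >= 70:
        return Tier.TIER_2
    return Tier.TIER_3


evidence = {"completeness": 1.0, "consistency": 0.5, "provenance": 0.9}
score = score_from_evidence(evidence)
print(round(score, 1), tier_for_score(score).name)
```

Keeping the score and the tier as separate steps means a policy can constrain either one independently.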

Installation

pip install data-reliability-index

For local development from this repository:

pip install -e ".[test]"

Quick Start

from data_reliability import DataTier, ReliabilityPolicy, ReliabilityScanner, ValidationEvidence

# Scan a raw record and attach reliability metadata derived from the evidence.
scanner = ReliabilityScanner()
data = scanner.scan(
    {"temperature": 21.4, "unit": "celsius"},
    source_id="sensor-a",
    evidence=ValidationEvidence(
        completeness=1.0,
        consistency=1.0,
        provenance=1.0,
        cryptographic_verification=1.0,
        calibration=1.0,
        schema_compliance=1.0,
        anomaly_detection=1.0,
        duplicate_detection=1.0,
        metadata_quality=1.0,
    ),
)

# Require a score of at least 90 and a tier no higher than TIER_2.
policy = ReliabilityPolicy(
    minimum_score=90,
    maximum_tier=DataTier.TIER_2,
)

# For a record that satisfies the policy, resolve() returns the original payload.
assert policy.resolve(data) == {"temperature": 21.4, "unit": "celsius"}
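
To make the acceptance step more concrete, here is a minimal standalone sketch of what a policy check with a minimum score and a maximum tier might do. `Record` and `Policy` are hypothetical stand-ins written for this example, not the package's classes, and raising `ValueError` on rejection is an assumption; the Quick Start only shows the accepted case.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a scanned data point."""
    value: Any
    score: float  # 0-100, higher is more reliable
    tier: int     # 1 is assumed to be the most trusted tier


@dataclass(frozen=True)
class Policy:
    """Hypothetical stand-in for a trust policy."""
    minimum_score: float
    maximum_tier: int

    def accepts(self, record: Record) -> bool:
        # Both conditions must hold for the record to be trusted.
        return (
            record.score >= self.minimum_score
            and record.tier <= self.maximum_tier
        )

    def resolve(self, record: Record) -> Any:
        """Return the payload if accepted; otherwise raise (an assumption)."""
        if not self.accepts(record):
            raise ValueError(
                f"rejected: score={record.score}, tier={record.tier}"
            )
        return record.value


policy = Policy(minimum_score=90, maximum_tier=2)
trusted = Record({"temperature": 21.4, "unit": "celsius"}, score=100, tier=1)
untrusted = Record({"temperature": -999.0, "unit": "celsius"}, score=40, tier=3)

print(policy.resolve(trusted))
print(policy.accepts(untrusted))
```

Separating `accepts` from `resolve` lets callers choose between a boolean check and a fail-fast unwrap of the payload.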

Pandas Filtering

import pandas as pd
from data_reliability import DataTier, ReliabilityMetadata, ReliabilityPolicy, filter_reliable_df

# Each row carries its own ReliabilityMetadata in the "reliability" column.
df = pd.DataFrame([
    {
        "value": 10,
        "reliability": ReliabilityMetadata(
            score=95,
            tier=DataTier.TIER_1,
            source_id="sensor-a",
            trace_hash="abc123",
        ),
    },
])

# Filter the DataFrame by applying the policy to each row's reliability metadata.
policy = ReliabilityPolicy(minimum_score=90, maximum_tier=DataTier.TIER_2)
trusted = filter_reliable_df(df, policy)

Documentation

The project documentation lives in docs/ and can be served locally with MkDocs:

pip install mkdocs-material
mkdocs serve

The longer project rationale is available in data-reliability.md.

Development

Run the test suite:

pip install -e ".[test]"
pytest

Build package artifacts:

pip install -e ".[build]"
python -m build
python -m twine check dist/*

Run the FastAPI example:

uvicorn examples.fastapi_app:app --reload

Contributing

Contributions are welcome. See CONTRIBUTING.md for the local workflow and pull request expectations.

Security

Please report security issues privately. See SECURITY.md.

License

Licensed under the Apache License 2.0.

Download files

Source Distribution

data_reliability_index-0.2.0.tar.gz (11.6 kB)

Uploaded Source

Built Distribution

data_reliability_index-0.2.0-py3-none-any.whl (11.0 kB)

Uploaded Python 3

File details

Details for the file data_reliability_index-0.2.0.tar.gz.

File metadata

  • Download URL: data_reliability_index-0.2.0.tar.gz
  • Upload date:
  • Size: 11.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for data_reliability_index-0.2.0.tar.gz
Algorithm Hash digest
SHA256 3065a561cde34e5835c372199f330184efba159a971f263ffe0a2f367538957d
MD5 9822e41307b520647adcb8e0637b3899
BLAKE2b-256 ff82fc3809a04c7374f4c27cf8761540e40f8fcb4bfe17ca26f61ea5606895c3

Provenance

The following attestation bundles were made for data_reliability_index-0.2.0.tar.gz:

Publisher: publish.yml on h3pdesign/data-reliability-index

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file data_reliability_index-0.2.0-py3-none-any.whl.

File hashes

Hashes for data_reliability_index-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e08662023e80d10870c2080863840ec875227e83592b5c9e2f80cd60ae63b6f3
MD5 fd0061384f71ba09fabf2e0d8aa53785
BLAKE2b-256 030903f385ac3636571eff19054146c26cac0d7890f4f34c7f78e8643b10f82b

Provenance

The following attestation bundles were made for data_reliability_index-0.2.0-py3-none-any.whl:

Publisher: publish.yml on h3pdesign/data-reliability-index

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
