GhostDQ SDK: compute data-quality metrics locally and ship them to GhostDQ.

ghostdq — Python SDK

The GhostDQ SDK lets you compute data-quality metrics locally and ship only the aggregated numbers to the GhostDQ cloud — your raw data never leaves your infrastructure.


Install

pip install ghostdq

Avro support (fastavro) and Parquet support (pyarrow) are both included in the core install. The optional dev extra adds the development tooling:

pip install "ghostdq[dev]"   # adds pytest, ruff, mypy, stubs

Quick start

from ghostdq import read_file, parse_contract, compute_metrics, GhostDQClient

# 1. Load your data
df = read_file("sales_2024.parquet")   # .csv / .parquet / .avro

# 2. Parse the contract (the CLI can also fetch it from the API; see the CLI section)
with open("sales_contract.yaml") as f:   # context manager closes the file handle
    contract = parse_contract(f.read())

# 3. Compute metrics *locally* — no raw data leaves your machine
metrics = compute_metrics(df, contract.rules)
# → {"row_count": 120000, "null_rate:country": 0.02, ...}

# 4. Ship the metrics to GhostDQ
client = GhostDQClient(api_key="ghd_your_key")
result = client.create_run(dataset_id="<dataset-uuid>", metrics=metrics)
print(result.run_id, result.status)  # ⇒ <uuid>  pending

CLI

# Validate a file against a local contract
ghostdq run \
  --dataset-id <uuid> \
  --file sales.csv \
  --contract contract.yaml \
  --api-key ghd_xxx

# Fetch the contract automatically from the API
ghostdq run \
  --dataset-id <uuid> \
  --file sales.parquet \
  --api-key ghd_xxx

Environment variable shortcuts:

export GHOSTDQ_API_KEY=ghd_xxx
ghostdq run --dataset-id <uuid> --file sales.csv

The Ingest API defaults to https://ghostdq.com/ingest. Override with --ingest-url or GHOSTDQ_INGEST_URL (e.g. http://localhost:8000 for local dev).
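
The lookup order described above can be sketched in a few lines of Python. This is an illustration only, not the SDK's code: the helper name `resolve_ingest_url` is hypothetical, and the assumption that an explicit value (such as the one passed to --ingest-url) wins over GHOSTDQ_INGEST_URL is ours, not something this README states.

```python
import os

# Default endpoint as documented above.
DEFAULT_INGEST_URL = "https://ghostdq.com/ingest"


def resolve_ingest_url(explicit=None, env=None):
    """Hypothetical helper: choose the ingest URL to use.

    Assumed precedence: explicit argument, then the GHOSTDQ_INGEST_URL
    environment variable, then the documented default.
    """
    env = os.environ if env is None else env
    return explicit or env.get("GHOSTDQ_INGEST_URL") or DEFAULT_INGEST_URL


# Passing `env` explicitly keeps the example independent of the real environment.
print(resolve_ingest_url(env={}))
print(resolve_ingest_url(env={"GHOSTDQ_INGEST_URL": "http://localhost:8000"}))
```

Taking the environment as a parameter makes the precedence easy to check in a unit test without touching the real process environment.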


Supported file formats

| Format | Extension | Engine |
| --- | --- | --- |
| CSV | .csv | pandas |
| Parquet | .parquet | pyarrow |
| Avro | .avro | fastavro |
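
As a rough picture of how a reader might choose an engine from the file extension, here is a minimal sketch. The helper `engine_for` is hypothetical and is not part of the ghostdq API. The mapping comes from the table above. Lowercasing the suffix and raising `ValueError` for unknown formats are choices made for this example only.

```python
from pathlib import Path

# Extension-to-engine mapping copied from the table above.
ENGINES = {".csv": "pandas", ".parquet": "pyarrow", ".avro": "fastavro"}


def engine_for(path):
    """Hypothetical helper: name the engine that would read `path`."""
    # Lowercasing the suffix is an assumption of this sketch.
    suffix = Path(path).suffix.lower()
    if suffix not in ENGINES:
        raise ValueError(f"unsupported file format: {path!r}")
    return ENGINES[suffix]


print(engine_for("sales_2024.parquet"))
```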

Supported rule types

| Rule | Metric key(s) |
| --- | --- |
| row_count | row_count |
| null_rate | null_rate:{column} |
| unique | duplicate_count:{column} |
| value_range | value_min:{column}, value_max:{column} |
| allowed_values | disallowed_count:{column} |
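
To make the metric keys concrete, the sketch below computes each of them over a small list of dicts using only the standard library. It is not the SDK's implementation. The function `sketch_metrics`, the rule dictionaries, and the exact definitions (for example, counting duplicates as non-null values minus distinct non-null values, and ignoring nulls when counting disallowed values) are assumptions made for illustration.

```python
def sketch_metrics(rows, rules):
    """Illustrative only: build the metric keys listed in the table above.

    `rows` is a list of dicts. `rules` is a list of dicts such as
    {"type": "null_rate", "column": "country"} (a made-up shape).
    """
    metrics = {}
    for rule in rules:
        kind, col = rule["type"], rule.get("column")
        values = [row.get(col) for row in rows] if col else []
        present = [v for v in values if v is not None]
        if kind == "row_count":
            metrics["row_count"] = len(rows)
        elif kind == "null_rate":
            nulls = len(values) - len(present)
            metrics[f"null_rate:{col}"] = nulls / len(values) if values else 0.0
        elif kind == "unique":
            # Assumed definition: non-null values beyond the first occurrence.
            metrics[f"duplicate_count:{col}"] = len(present) - len(set(present))
        elif kind == "value_range":
            metrics[f"value_min:{col}"] = min(present, default=None)
            metrics[f"value_max:{col}"] = max(present, default=None)
        elif kind == "allowed_values":
            allowed = set(rule["values"])
            metrics[f"disallowed_count:{col}"] = sum(v not in allowed for v in present)
    return metrics


rows = [
    {"id": 1, "country": "DE", "amount": 10.0},
    {"id": 2, "country": None, "amount": 25.5},
    {"id": 2, "country": "FR", "amount": 3.0},
    {"id": 4, "country": "XX", "amount": 7.25},
]
rules = [
    {"type": "row_count"},
    {"type": "null_rate", "column": "country"},
    {"type": "unique", "column": "id"},
    {"type": "value_range", "column": "amount"},
    {"type": "allowed_values", "column": "country", "values": ["DE", "FR"]},
]
print(sketch_metrics(rows, rules))
```

Only the resulting dictionary of aggregates would be sent to the service. The rows themselves stay local, which is the point of computing metrics on your own machine.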

Local development

Requires Python 3.10+ (3.13 recommended). From the repo root:

python3.13 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest tests
ruff check src tests
mypy src tests --ignore-missing-imports

License & disclaimer

Licensed under Apache License 2.0.

This software is provided “as is”, without warranty of any kind. You are responsible for evaluating whether it fits your use case and for any outcomes from using it. See the LICENSE for the full terms, including limitations of liability.
