GhostDQ SDK: compute data-quality metrics locally and ship them to GhostDQ.

ghostdq — Python SDK

The GhostDQ SDK lets you compute data-quality metrics locally and ship only the aggregated numbers to the GhostDQ cloud. Your raw data never leaves your infrastructure.


Install

pip install ghostdq

Avro and Parquet support need no extras: fastavro and pyarrow are both included in the core install. The optional dev extra adds development tooling:

pip install "ghostdq[dev]"   # adds pytest, ruff, mypy, stubs

Quick start

from pathlib import Path

from ghostdq import read_file, parse_contract, compute_metrics, GhostDQClient

# 1. Load your data
df = read_file("sales_2024.parquet")   # .csv / .parquet / .avro

# 2. Parse the contract (or fetch it from the API, see below)
# read_text() opens and closes the file, so no handle is left open
contract = parse_contract(Path("sales_contract.yaml").read_text())

# 3. Compute metrics *locally* — no raw data leaves your machine
metrics = compute_metrics(df, contract.rules)
# → {"row_count": 120000, "null_rate:country": 0.02, ...}

# 4. Ship the metrics to GhostDQ
client = GhostDQClient(api_key="ghd_your_key")
result = client.create_run(dataset_id="<dataset-uuid>", metrics=metrics)
print(result.run_id, result.status)  # ⇒ <uuid>  pending

CLI

# Validate a file against a local contract
ghostdq run \
  --dataset-id <uuid> \
  --file sales.csv \
  --contract contract.yaml \
  --api-key ghd_xxx

# Fetch the contract automatically from the API
ghostdq run \
  --dataset-id <uuid> \
  --file sales.parquet \
  --api-key ghd_xxx

Environment variable shortcuts:

export GHOSTDQ_API_KEY=ghd_xxx
ghostdq run --dataset-id <uuid> --file sales.csv

The Ingest API defaults to https://ghostdq.com/ingest. Override with --ingest-url or GHOSTDQ_INGEST_URL (e.g. http://localhost:8000 for local dev).
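
The override rule above can be sketched in a few lines of Python. This is an illustration, not the SDK's code: the helper name `resolve_ingest_url` is hypothetical, and the precedence shown (command-line value first, then the environment variable, then the default) is an assumption based on common CLI conventions.

```python
import os

# Default stated in the text above.
DEFAULT_INGEST_URL = "https://ghostdq.com/ingest"


def resolve_ingest_url(cli_value=None, env=None):
    """Hypothetical helper: choose the ingest URL.

    Assumed precedence: explicit CLI value > GHOSTDQ_INGEST_URL > default.
    """
    env = os.environ if env is None else env
    if cli_value:
        return cli_value
    return env.get("GHOSTDQ_INGEST_URL") or DEFAULT_INGEST_URL


local_env = {"GHOSTDQ_INGEST_URL": "http://localhost:8000"}
print(resolve_ingest_url(env={}))                              # default
print(resolve_ingest_url(env=local_env))                       # env var wins over default
print(resolve_ingest_url("http://127.0.0.1:9000", local_env))  # CLI value wins over env var
```

Passing `env` explicitly keeps the helper easy to test without touching the real process environment.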


Supported file formats

Format    Extension   Engine
CSV       .csv        pandas
Parquet   .parquet    pyarrow
Avro      .avro       fastavro
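
As a rough illustration of how a reader could choose an engine from the file extension, the sketch below maps each supported suffix to the engine listed above. The function name `engine_for` and the case-insensitive matching are assumptions; the real `read_file` may behave differently.

```python
from pathlib import Path

# Extension-to-engine mapping taken from the table above.
ENGINES = {".csv": "pandas", ".parquet": "pyarrow", ".avro": "fastavro"}


def engine_for(path):
    """Hypothetical helper: return the engine name for a file path."""
    suffix = Path(path).suffix.lower()  # assumption: extensions match case-insensitively
    if suffix not in ENGINES:
        raise ValueError(f"unsupported file extension: {suffix!r}")
    return ENGINES[suffix]


print(engine_for("sales_2024.parquet"))
print(engine_for("exports/sales.CSV"))
```

Failing early on an unknown extension gives a clearer error than letting a parser choke on the wrong format.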

Supported rule types

Rule             Metric key(s)
row_count        row_count
null_rate        null_rate:{column}
unique           duplicate_count:{column}
value_range      value_min:{column}, value_max:{column}
allowed_values   disallowed_count:{column}
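
To make the metric keys concrete, here is a minimal pure-Python sketch of how each rule type could produce its keys. It is not the SDK's implementation: the function name `sketch_metrics`, the shape of the rule dictionaries, and the exact definitions (for example, counting each repeat of a non-null value as one duplicate) are assumptions for illustration only.

```python
from collections import Counter


def sketch_metrics(rows, rules):
    """Illustrative only: aggregate metrics over a list of dict rows.

    `rules` is a hypothetical list such as
    [{"type": "null_rate", "column": "country"}]; the real contract
    schema may differ.
    """
    metrics = {}
    for rule in rules:
        kind = rule["type"]
        col = rule.get("column")
        values = [row.get(col) for row in rows] if col else []
        present = [v for v in values if v is not None]
        if kind == "row_count":
            metrics["row_count"] = len(rows)
        elif kind == "null_rate":
            nulls = len(values) - len(present)
            metrics[f"null_rate:{col}"] = nulls / len(values) if values else 0.0
        elif kind == "unique":
            # Assumption: every occurrence beyond the first counts as a duplicate.
            counts = Counter(present)
            metrics[f"duplicate_count:{col}"] = sum(n - 1 for n in counts.values())
        elif kind == "value_range":
            metrics[f"value_min:{col}"] = min(present) if present else None
            metrics[f"value_max:{col}"] = max(present) if present else None
        elif kind == "allowed_values":
            allowed = set(rule["allowed"])
            metrics[f"disallowed_count:{col}"] = sum(v not in allowed for v in present)
        else:
            raise ValueError(f"unknown rule type: {kind!r}")
    return metrics


rows = [
    {"id": 1, "country": "DE", "amount": 10.0},
    {"id": 2, "country": None, "amount": 25.5},
    {"id": 2, "country": "XX", "amount": 3.0},
    {"id": 3, "country": "FR", "amount": 7.25},
]
rules = [
    {"type": "row_count"},
    {"type": "null_rate", "column": "country"},
    {"type": "unique", "column": "id"},
    {"type": "value_range", "column": "amount"},
    {"type": "allowed_values", "column": "country", "allowed": ["DE", "FR"]},
]
metrics = sketch_metrics(rows, rules)
print(metrics)
```

Only the resulting dictionary of aggregates would be sent upstream; the rows themselves stay in local memory, which is the property the SDK describes.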

Local development

Requires Python 3.10+ (3.13 recommended). From the repo root:

python3.13 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest tests
ruff check src tests
mypy src tests --ignore-missing-imports

License & disclaimer

Licensed under Apache License 2.0.

This software is provided “as is”, without warranty of any kind. You are responsible for evaluating whether it fits your use case and for any outcomes from using it. See the LICENSE for the full terms, including limitations of liability.

Download files

Source Distribution

ghostdq-0.1.3.tar.gz (17.5 kB)

Uploaded Source

Built Distribution

ghostdq-0.1.3-py3-none-any.whl (14.9 kB)

Uploaded Python 3

File details

Details for the file ghostdq-0.1.3.tar.gz.

File metadata

  • Download URL: ghostdq-0.1.3.tar.gz
  • Size: 17.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ghostdq-0.1.3.tar.gz
Algorithm Hash digest
SHA256 9686fdd29e3cde45b22de39f046e2c4fe7615918c628831ab85209d8b755a2fe
MD5 6c3abfcf4ebc4344ba7f2ec5145d1086
BLAKE2b-256 9d3c624879947545dd42b660a42b61d00a99e16af0862f9a30be8d1a13a6acd6

File details

Details for the file ghostdq-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: ghostdq-0.1.3-py3-none-any.whl
  • Size: 14.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ghostdq-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 83f581919c699420f44a0958cebe78b253c36e817a161f324aa48d13b77be519
MD5 011274fbc66ad8a7657d49689fee314c
BLAKE2b-256 1de76a31aa0d476afbbc37b72c8a627f18233be235e33e6b72df1d41d0746153
