GhostDQ SDK: compute data-quality metrics locally and ship them to GhostDQ.
ghostdq — Python SDK
The GhostDQ SDK lets you compute data-quality metrics locally and ship only the aggregated numbers to the GhostDQ cloud — your raw data never leaves your infrastructure.
Install
pip install ghostdq
Avro support relies on fastavro and Parquet support on pyarrow; both are included in the core install, so no extra is needed for them. The optional dev extra adds the development tooling:
pip install "ghostdq[dev]" # adds pytest, ruff, mypy, stubs
Quick start
from ghostdq import read_file, parse_contract, compute_metrics, GhostDQClient
# 1. Load your data
df = read_file("sales_2024.parquet") # .csv / .parquet / .avro
# 2. Parse the contract (or fetch it from the API — see below)
with open("sales_contract.yaml") as fh:  # close the file handle once the contract is read
    contract = parse_contract(fh.read())
# 3. Compute metrics *locally* — no raw data leaves your machine
metrics = compute_metrics(df, contract.rules)
# → {"row_count": 120000, "null_rate:country": 0.02, ...}
# 4. Ship the metrics to GhostDQ
client = GhostDQClient(
    api_key="ghd_your_key",
    ingest_url="https://ingest.ghostdq.io",
)
result = client.create_run(dataset_id="<dataset-uuid>", metrics=metrics)
print(result.run_id, result.status) # ⇒ <uuid> pending
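Since only aggregated numbers are meant to leave your infrastructure, it can be useful to sanity-check the metrics dictionary before calling `create_run`. The helper below is a minimal sketch, not part of the SDK: `check_aggregates_only` is a hypothetical name, and it assumes every metric value is a finite number or `None`, as in the example output above.

```python
import math
from numbers import Real


def check_aggregates_only(metrics):
    """Hypothetical guard (not an SDK function): reject non-scalar metric values."""
    for key, value in metrics.items():
        if not isinstance(key, str):
            raise TypeError(f"metric key must be a string, got {type(key).__name__}")
        if value is None:
            continue  # assumption: a metric may be absent, e.g. min of an empty column
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(f"{key!r} is not a plain number: {type(value).__name__}")
        if not math.isfinite(value):
            raise ValueError(f"{key!r} is not finite: {value!r}")
    return metrics


# Same shape as the compute_metrics output shown in the quick start.
checked = check_aggregates_only({"row_count": 120000, "null_rate:country": 0.02})
```

Anything that looks like row-level data, such as a list of values or a string, raises before the payload is sent.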
CLI
# Validate a file against a local contract
ghostdq run \
  --dataset-id <uuid> \
  --file sales.csv \
  --contract contract.yaml \
  --api-key ghd_xxx \
  --ingest-url https://ingest.ghostdq.io
# Fetch the contract automatically from the API
ghostdq run \
  --dataset-id <uuid> \
  --file sales.parquet \
  --api-key ghd_xxx \
  --ingest-url https://ingest.ghostdq.io
Environment variable shortcuts:
export GHOSTDQ_API_KEY=ghd_xxx
export GHOSTDQ_INGEST_URL=https://ingest.ghostdq.io
ghostdq run --dataset-id <uuid> --file sales.csv
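The variables above are documented for the CLI; this README does not say whether `GhostDQClient` reads them on its own. If you want the same behaviour in a script, one option is to resolve them yourself. The sketch below uses a hypothetical helper, `resolve_settings`, in which explicit arguments take precedence over the environment.

```python
import os


def resolve_settings(api_key=None, ingest_url=None, env=os.environ):
    """Hypothetical helper: explicit arguments win, then GHOSTDQ_* variables."""
    api_key = api_key or env.get("GHOSTDQ_API_KEY")
    ingest_url = ingest_url or env.get("GHOSTDQ_INGEST_URL")
    missing = [
        name
        for name, value in (
            ("GHOSTDQ_API_KEY", api_key),
            ("GHOSTDQ_INGEST_URL", ingest_url),
        )
        if not value
    ]
    if missing:
        raise RuntimeError("missing settings: " + ", ".join(missing))
    return {"api_key": api_key, "ingest_url": ingest_url}


# Example with an explicit mapping standing in for the real environment.
settings = resolve_settings(
    env={
        "GHOSTDQ_API_KEY": "ghd_xxx",
        "GHOSTDQ_INGEST_URL": "https://ingest.ghostdq.io",
    }
)
# The result can then be passed on, e.g. GhostDQClient(**settings).
```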
Supported file formats
| Format | Extension | Engine |
|---|---|---|
| CSV | .csv | pandas |
| Parquet | .parquet | pyarrow |
| Avro | .avro | fastavro |
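To check up front whether a path has one of the supported extensions, the table can be mirrored in a small lookup. This is an illustrative sketch only: `engine_for` is a hypothetical helper, and the case-insensitive match is an assumption rather than documented `read_file` behaviour.

```python
from pathlib import Path

# Extension-to-engine mapping copied from the table above.
ENGINES = {".csv": "pandas", ".parquet": "pyarrow", ".avro": "fastavro"}


def engine_for(path):
    """Hypothetical helper: name the engine for a file, or fail early."""
    suffix = Path(path).suffix.lower()  # assumption: extensions match case-insensitively
    if suffix not in ENGINES:
        supported = ", ".join(sorted(ENGINES))
        raise ValueError(f"unsupported extension {suffix!r}; expected one of {supported}")
    return ENGINES[suffix]


engine = engine_for("sales_2024.parquet")
```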
Supported rule types
| Rule | Metric key(s) |
|---|---|
| row_count | row_count |
| null_rate | null_rate:{column} |
| unique | duplicate_count:{column} |
| value_range | value_min:{column}, value_max:{column} |
| allowed_values | disallowed_count:{column} |
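To make the metric keys concrete, here is a plain-Python sketch that produces one entry per rule type. It is not the SDK's implementation: the rule dictionaries (`type`, `column`, `values`) are a hypothetical shape rather than the real contract schema, and the exact semantics (for instance, counting every repeat beyond the first occurrence as a duplicate and ignoring nulls for ranges) are assumptions.

```python
from collections import Counter


def sketch_metrics(rows, rules):
    """Illustrative only: rows is a list of dicts, rules a list of hypothetical rule dicts."""
    metrics = {}
    for rule in rules:
        kind = rule["type"]
        col = rule.get("column")
        values = [row.get(col) for row in rows] if col else []
        present = [v for v in values if v is not None]
        if kind == "row_count":
            metrics["row_count"] = len(rows)
        elif kind == "null_rate":
            nulls = len(values) - len(present)
            metrics[f"null_rate:{col}"] = nulls / len(values) if values else 0.0
        elif kind == "unique":
            # Assumption: each repeat beyond the first occurrence counts once.
            counts = Counter(present)
            metrics[f"duplicate_count:{col}"] = sum(n - 1 for n in counts.values())
        elif kind == "value_range":
            metrics[f"value_min:{col}"] = min(present) if present else None
            metrics[f"value_max:{col}"] = max(present) if present else None
        elif kind == "allowed_values":
            allowed = set(rule["values"])
            metrics[f"disallowed_count:{col}"] = sum(1 for v in present if v not in allowed)
        else:
            raise ValueError(f"unknown rule type: {kind}")
    return metrics


rows = [
    {"country": "DE", "amount": 10.0},
    {"country": "DE", "amount": 25.5},
    {"country": None, "amount": 3.0},
    {"country": "XX", "amount": 7.0},
]
rules = [
    {"type": "row_count"},
    {"type": "null_rate", "column": "country"},
    {"type": "unique", "column": "country"},
    {"type": "value_range", "column": "amount"},
    {"type": "allowed_values", "column": "country", "values": ["DE", "FR"]},
]
metrics = sketch_metrics(rows, rules)
```

Every value in the result is a single number (or `None`), which is the point of the design: the rows themselves never need to be transmitted.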
Local development
Requires Python 3.10+ (3.13 recommended). From the repo root:
python3.13 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest tests
ruff check src tests
mypy src tests --ignore-missing-imports
License & disclaimer
Licensed under Apache License 2.0.
This software is provided “as is”, without warranty of any kind. You are responsible for evaluating whether it fits your use case and for any outcomes from using it. See the LICENSE for the full terms, including limitations of liability.
File details
Details for the file ghostdq-0.1.2.tar.gz.
File metadata
- Download URL: ghostdq-0.1.2.tar.gz
- Upload date:
- Size: 17.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cdd521edc3eecfbf860b5ab87bc5b3de075abf92ac583de4985a375c39d929e2 |
| MD5 | f348dbd36a7630494c160b5ac14deb29 |
| BLAKE2b-256 | 2ac12d5f6687e84fd4bbef0f4897d8bc15d80bf6b8646a2b325fdc8447c2056e |
File details
Details for the file ghostdq-0.1.2-py3-none-any.whl.
File metadata
- Download URL: ghostdq-0.1.2-py3-none-any.whl
- Upload date:
- Size: 14.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1bec277a5f0856ab77cbdf3b4fce36f25c9c58a1d0705ecc17eb1c9cc7165c09 |
| MD5 | ec2dcd9dbc40fd863a0b0d21f27dae0d |
| BLAKE2b-256 | 55049ac6de57fd8eff56c53e2ba06fa4ff0f8cabe1d9082cb240e0f12e92ed75 |