Real-time data quality screening API — PASS / WARN / BLOCK in under 10ms

DataScreenIQ Python SDK

Real-time data quality screening at the edge. Screen any data payload and get PASS / WARN / BLOCK in under 10ms.

import datascreeniq as dsiq

client = dsiq.Client("dsiq_live_...")
report = client.screen(rows, source="orders")

print(report.status)       # BLOCK
print(report.health_pct)   # 34.0%
print(report.issues)       # {"type_mismatches": ["amount"], "null_rates": {"email": 0.5}}

Installation

pip install datascreeniq

With pandas support:

pip install datascreeniq[pandas]

With Excel support:

pip install datascreeniq[excel]

Everything:

pip install datascreeniq[all]

Quick start

Get a free API key at datascreeniq.com — 500K rows/month free.

import datascreeniq as dsiq

client = dsiq.Client("dsiq_live_...")

rows = [
    {"order_id": "ORD-001", "amount": 99.50,    "email": "alice@corp.com"},
    {"order_id": "ORD-002", "amount": "broken", "email": None},
    {"order_id": "ORD-003", "amount": 75.00,    "email": None},
]

report = client.screen(rows, source="orders")

print(report.status)        # BLOCK
print(report.health_pct)    # 34.0%
print(report.type_mismatches)  # ["amount"]
print(report.null_rates)       # {"email": 0.5}
print(report.summary())
# 🚨 BLOCK | Health: 34.0% | Rows: 3 | Type mismatches: amount | Null rate: email=50% | (9ms)

API key

Set as environment variable (recommended):

export DATASCREENIQ_API_KEY="dsiq_live_..."

Then create the client without arguments:

client = dsiq.Client()  # reads from env automatically

Or pass directly:

client = dsiq.Client("dsiq_live_...")
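
If a pipeline should fail fast when no key is configured, a small wrapper can check for it before any screening call is made. This is a minimal sketch: `resolve_api_key` is a hypothetical helper, not part of the SDK, and only the variable name `DATASCREENIQ_API_KEY` comes from the text above.

```python
import os

ENV_VAR = "DATASCREENIQ_API_KEY"  # variable name taken from the section above


def resolve_api_key(explicit_key=None):
    """Return the explicit key if given, otherwise the environment value.

    Hypothetical helper (not an SDK function): it raises early with a clear
    message instead of letting a later API call fail.
    """
    key = explicit_key or os.environ.get(ENV_VAR)
    if not key:
        raise RuntimeError(f"Set {ENV_VAR} or pass an API key explicitly")
    return key
```

The result can then be passed to the client, for example `dsiq.Client(resolve_api_key())`.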

Usage

Screen a list of dicts

report = client.screen(rows, source="orders")

Screen a CSV file

report = client.screen_file("orders.csv", source="orders")

Screen an Excel file

# pip install datascreeniq[excel]
report = client.screen_file("orders.xlsx", source="orders", sheet=0)

Screen a pandas DataFrame

# pip install datascreeniq[pandas]
import pandas as pd

df = pd.read_csv("orders.csv")
report = client.screen_dataframe(df, source="orders")

Screen a JSON or XML file

report = client.screen_file("orders.json", source="orders")
report = client.screen_file("orders.xml",  source="orders")

The ScreenReport object

report.status           # "PASS" | "WARN" | "BLOCK"
report.health_score     # float 0.0 – 1.0
report.health_pct       # "94.5%"

report.is_pass          # True / False
report.is_warn          # True / False
report.is_blocked       # True / False

report.issues           # full issues dict
report.type_mismatches  # ["amount", "price"]
report.null_rates       # {"email": 0.50}
report.outlier_fields   # ["amount"]

report.drift            # list of drift events
report.drift_count      # int
report.has_drift        # True / False

report.rows_received    # int
report.rows_sampled     # int
report.latency_ms       # int
report.batch_id         # str
report.timestamp        # ISO string

report.summary()        # human-readable one-liner
report.to_dict()        # raw API response
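
The boolean flags above lend themselves to a simple routing step. The sketch below is illustrative only: `route_batch` and its return values are made-up names, and a `SimpleNamespace` stands in for a real report so the example runs without an API key.

```python
from types import SimpleNamespace


def route_batch(report):
    """Map a report to a pipeline action using the documented boolean flags."""
    if report.is_blocked:
        return "quarantine"
    if report.is_warn or report.has_drift:
        return "load_and_alert"
    return "load"


# Stand-in with the same attribute names as ScreenReport (assumption: a real
# report exposes these as plain attributes, as listed above).
fake_report = SimpleNamespace(
    is_pass=False, is_warn=True, is_blocked=False, has_drift=False
)
print(route_batch(fake_report))  # load_and_alert
```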

Pipeline integration

Raise on block

from datascreeniq.exceptions import DataQualityError

try:
    client.screen(rows, source="orders").raise_on_block()
    # only reaches here if PASS or WARN
    load_to_warehouse(rows)

except DataQualityError as e:
    print(f"Blocked: {e}")
    print(f"Issues:  {e.report.issues}")
    send_to_dead_letter_queue(rows)

Airflow task

from airflow.decorators import task
import datascreeniq as dsiq

@task
def quality_gate(rows: list, source: str) -> dict:
    client = dsiq.Client()   # reads DATASCREENIQ_API_KEY from env
    report = client.screen(rows, source=source)
    if report.is_blocked:
        raise ValueError(f"Data blocked: {report.summary()}")
    return report.to_dict()

Prefect flow

from prefect import flow, task
import datascreeniq as dsiq

@task
def screen_data(rows, source):
    return dsiq.Client().screen(rows, source=source).raise_on_block()

@flow
def my_pipeline():
    rows = extract_from_source()
    screen_data(rows, source="orders")   # blocks flow if quality fails
    load_to_warehouse(rows)

dbt post-hook

import pandas as pd
import datascreeniq as dsiq

def screen_dbt_model(model_name: str, conn):
    df = pd.read_sql(f"SELECT * FROM {model_name} LIMIT 10000", conn)
    return dsiq.Client().screen_dataframe(df, source=model_name).raise_on_block()

Large files — auto chunking

Files with more than 10,000 rows are automatically split into chunks and screened in parallel. Results are merged into a single report:

# 1M row file — 100 API calls, one merged report
report = client.screen_file("events.csv", source="events")
print(f"Screened {report.rows_received:,} rows")
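
To see where the figure of 100 calls comes from, the chunking arithmetic can be sketched locally. This is not the SDK's implementation. The chunk size of 10,000 rows is an assumption inferred from the threshold and the 1M-row example above, and `iter_chunks` and `count_chunks` are hypothetical names.

```python
from itertools import islice


def iter_chunks(rows, chunk_size=10_000):
    """Yield successive lists of at most chunk_size rows from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def count_chunks(total_rows, chunk_size=10_000):
    """Number of chunks needed for total_rows (ceiling division)."""
    return -(-total_rows // chunk_size)


print(count_chunks(1_000_000))  # 100
```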

Error handling

from datascreeniq.exceptions import (
    AuthenticationError,   # invalid API key
    PlanLimitError,        # monthly row limit exceeded
    RateLimitError,        # too many requests
    ValidationError,       # bad payload
    APIError,              # server error
    DataQualityError,      # raised by .raise_on_block()
)

try:
    report = client.screen(rows, source="orders")
except AuthenticationError:
    print("Check your API key")
except PlanLimitError:
    print("Monthly limit reached — upgrade at datascreeniq.com")
except RateLimitError as e:
    print(f"Rate limited: {e}")
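
Rate limiting is usually transient, so one common response is to retry with a growing delay. The helper below is a generic sketch, not an SDK feature: `call_with_retry` and its parameters are hypothetical, and the exception class to retry on is passed in by the caller.

```python
import time


def call_with_retry(func, retry_on, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(); on a retry_on exception, wait and try again.

    The delay doubles after each failed attempt. The final failure is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

With the SDK this might be used as `call_with_retry(lambda: client.screen(rows, source="orders"), retry_on=RateLimitError)`. Other exceptions, such as `AuthenticationError`, pass straight through because retrying would not help.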

Pricing

Plan        Price     Rows / month
Developer   Free      500K
Starter     $19/mo    5M
Growth      $79/mo    50M
Scale       $199/mo   500M
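
As a quick way to match an expected monthly volume to a tier, the table can be encoded and searched. This is only a sketch: the figures are copied from the table above, and `smallest_plan_for` is a hypothetical helper.

```python
# (name, monthly price in USD, rows per month) copied from the pricing table.
PLANS = [
    ("Developer", 0, 500_000),
    ("Starter", 19, 5_000_000),
    ("Growth", 79, 50_000_000),
    ("Scale", 199, 500_000_000),
]


def smallest_plan_for(rows_per_month):
    """Return the first plan whose monthly quota covers the volume, or None."""
    for name, _price, quota in PLANS:
        if rows_per_month <= quota:
            return name
    return None


print(smallest_plan_for(3_000_000))  # Starter
```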

Get your free API key →


Requirements

  • Python 3.8+
  • requests (auto-installed)
  • pandas — optional, for screen_dataframe()
  • openpyxl — optional, for Excel files

License

MIT
