Real-time data quality screening API — PASS / WARN / BLOCK in under 10ms

DataScreenIQ Python SDK

Real-time data quality screening at the edge. Screen any data payload and get PASS / WARN / BLOCK in under 10ms.

import datascreeniq as dsiq

client = dsiq.Client("dsiq_live_...")
report = client.screen(rows, source="orders")

print(report.status)       # BLOCK
print(report.health_pct)   # 34.0%
print(report.issues)       # {"type_mismatches": ["amount"], "null_rates": {"email": 0.5}}

Installation

pip install datascreeniq

With pandas support:

pip install datascreeniq[pandas]

With Excel support:

pip install datascreeniq[excel]

Everything:

pip install datascreeniq[all]

Quick start

Get a free API key at datascreeniq.com — 500K rows/month free.

import datascreeniq as dsiq

client = dsiq.Client("dsiq_live_...")

rows = [
    {"order_id": "ORD-001", "amount": 99.50,    "email": "alice@corp.com"},
    {"order_id": "ORD-002", "amount": "broken", "email": None},
    {"order_id": "ORD-003", "amount": 75.00,    "email": None},
]

report = client.screen(rows, source="orders")

print(report.status)        # BLOCK
print(report.health_pct)    # 34.0%
print(report.type_mismatches)  # ["amount"]
print(report.null_rates)       # {"email": 0.5}
print(report.summary())
# 🚨 BLOCK | Health: 34.0% | Rows: 3 | Type mismatches: amount | Null rate: email=50% | (9ms)

API key

Set as environment variable (recommended):

# shell
export DATASCREENIQ_API_KEY="dsiq_live_..."

# Python
client = dsiq.Client()  # reads from env automatically

Or pass directly:

client = dsiq.Client("dsiq_live_...")
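
If you would rather fail fast at startup than at the first screening call, the key lookup can be checked up front. This is a minimal sketch: `resolve_api_key` is a hypothetical helper, not part of the SDK, and how `Client()` itself reacts to a missing key is not specified here.

```python
import os

ENV_VAR = "DATASCREENIQ_API_KEY"  # variable name documented above


def resolve_api_key(explicit=None, environ=None):
    """Return the explicit key if given, else the env value; raise if neither is set.

    Hypothetical helper for illustration only (not an SDK function).
    """
    environ = os.environ if environ is None else environ
    key = explicit or environ.get(ENV_VAR)
    if not key:
        raise RuntimeError(f"Set {ENV_VAR} or pass an API key explicitly")
    return key
```

The result can then be passed straight to `dsiq.Client(...)`.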

Usage

Screen a list of dicts

report = client.screen(rows, source="orders")

Screen a CSV file

report = client.screen_file("orders.csv", source="orders")

Screen an Excel file

# pip install datascreeniq[excel]
report = client.screen_file("orders.xlsx", source="orders", sheet=0)

Screen a pandas DataFrame

# pip install datascreeniq[pandas]
import pandas as pd

df = pd.read_csv("orders.csv")
report = client.screen_dataframe(df, source="orders")

Screen a JSON or XML file

report = client.screen_file("orders.json", source="orders")
report = client.screen_file("orders.xml",  source="orders")

The ScreenReport object

report.status           # "PASS" | "WARN" | "BLOCK"
report.health_score     # float 0.0 – 1.0
report.health_pct       # "94.5%"

report.is_pass          # True / False
report.is_warn          # True / False
report.is_blocked       # True / False

report.issues           # full issues dict
report.type_mismatches  # ["amount", "price"]
report.null_rates       # {"email": 0.50}
report.outlier_fields   # ["amount"]

report.drift            # list of drift events
report.drift_count      # int
report.has_drift        # True / False

report.rows_received    # int
report.rows_sampled     # int
report.latency_ms       # int
report.batch_id         # str
report.timestamp        # ISO string

report.summary()        # human-readable one-liner
report.to_dict()        # raw API response
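
The boolean flags above are enough to route a batch without string comparisons. A minimal sketch follows; the action names and the `route_batch` helper are assumptions for illustration, and the stand-in object only mimics the attribute names listed above, so no API call is made.

```python
from types import SimpleNamespace


def route_batch(report):
    """Map a report to a pipeline action using the documented boolean flags."""
    if report.is_blocked:
        return "quarantine"
    if report.is_warn or report.has_drift:
        return "load_and_alert"
    return "load"


# Stand-in with the same attribute names as ScreenReport (hypothetical values).
fake_report = SimpleNamespace(is_blocked=False, is_warn=True, has_drift=False)
print(route_batch(fake_report))  # load_and_alert
```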

Pipeline integration

Raise on block

from datascreeniq.exceptions import DataQualityError

try:
    client.screen(rows, source="orders").raise_on_block()
    # only reaches here if PASS or WARN
    load_to_warehouse(rows)

except DataQualityError as e:
    print(f"Blocked: {e}")
    print(f"Issues:  {e.report.issues}")
    send_to_dead_letter_queue(rows)

Airflow task

from airflow.decorators import task
import datascreeniq as dsiq

@task
def quality_gate(rows: list, source: str) -> dict:
    client = dsiq.Client()   # reads DATASCREENIQ_API_KEY from env
    report = client.screen(rows, source=source)
    if report.is_blocked:
        raise ValueError(f"Data blocked: {report.summary()}")
    return report.to_dict()

Prefect flow

from prefect import flow, task
import datascreeniq as dsiq

@task
def screen_data(rows, source):
    return dsiq.Client().screen(rows, source=source).raise_on_block()

@flow
def my_pipeline():
    rows = extract_from_source()
    screen_data(rows, source="orders")   # blocks flow if quality fails
    load_to_warehouse(rows)

dbt post-hook

import pandas as pd
import datascreeniq as dsiq

def screen_dbt_model(model_name: str, conn):
    df = pd.read_sql(f"SELECT * FROM {model_name} LIMIT 10000", conn)
    return dsiq.Client().screen_dataframe(df, source=model_name).raise_on_block()

Large files — auto chunking

Files with more than 10,000 rows are automatically split into chunks and screened in parallel. Results are merged into a single report:

# 1M row file — 100 API calls, one merged report
report = client.screen_file("events.csv", source="events")
print(f"Screened {report.rows_received:,} rows")
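
To make the arithmetic concrete, here is a rough sketch of what chunking at 10,000 rows looks like. It is not the SDK's implementation: `chunked` and `worst_status` are hypothetical helpers, and the "most severe status wins" merge rule is an assumption, since the actual merge logic is not described here.

```python
from itertools import islice

CHUNK_SIZE = 10_000  # chunk threshold quoted above
SEVERITY = {"PASS": 0, "WARN": 1, "BLOCK": 2}


def chunked(rows, size=CHUNK_SIZE):
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def worst_status(statuses):
    """Assumed merge rule: the most severe chunk status wins."""
    return max(statuses, key=SEVERITY.__getitem__)


n_chunks = sum(1 for _ in chunked(range(1_000_000)))
print(n_chunks)  # 100
print(worst_status(["PASS", "WARN", "PASS"]))  # WARN
```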

Error handling

from datascreeniq.exceptions import (
    AuthenticationError,   # invalid API key
    PlanLimitError,        # monthly row limit exceeded
    RateLimitError,        # too many requests
    ValidationError,       # bad payload
    APIError,              # server error
    DataQualityError,      # raised by .raise_on_block()
)

try:
    report = client.screen(rows, source="orders")
except AuthenticationError:
    print("Check your API key")
except PlanLimitError:
    print("Monthly limit reached — upgrade at datascreeniq.com")
except RateLimitError as e:
    print(f"Rate limited: {e}")
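
Rate limiting is usually transient, so a retry with exponential backoff is a common way to handle it. The sketch below is generic and does not import the SDK; `call_with_retries` is a hypothetical helper, and the attempt count and delays are arbitrary defaults rather than recommended values.

```python
import time


def call_with_retries(fn, retry_on, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on `retry_on`, wait base_delay * 2**n and retry, up to `attempts` calls."""
    for n in range(attempts):
        try:
            return fn()
        except retry_on:
            if n == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** n)
```

With the SDK, this might be used as `call_with_retries(lambda: client.screen(rows, source="orders"), retry_on=RateLimitError)`.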

Pricing

Plan        Price     Rows / month
Developer   Free      500K
Starter     $19/mo    5M
Growth      $79/mo    50M
Scale       $199/mo   500M
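
For a quick estimate of which tier covers a given volume, the figures above can be put into a small lookup. This is only a sketch: `smallest_plan` is a hypothetical helper, and it assumes "500K" means 500,000 rows and "M" means millions.

```python
# (name, monthly price in USD, monthly row allowance), copied from the pricing above
PLANS = [
    ("Developer", 0, 500_000),
    ("Starter", 19, 5_000_000),
    ("Growth", 79, 50_000_000),
    ("Scale", 199, 500_000_000),
]


def smallest_plan(rows_per_month):
    """Return (name, price) of the cheapest plan covering the volume, or None."""
    for name, price, allowance in PLANS:
        if rows_per_month <= allowance:
            return name, price
    return None


print(smallest_plan(2_000_000))  # ('Starter', 19)
```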

Get your free API key →


Requirements

  • Python 3.8+
  • requests (auto-installed)
  • pandas — optional, for screen_dataframe()
  • openpyxl — optional, for Excel files

License

MIT
