Real-time data quality screening API — PASS / WARN / BLOCK in under 10ms

DataScreenIQ Python SDK

Most data pipelines don’t fail — they silently corrupt production data, break dashboards, and go unnoticed for days.

DataScreenIQ acts as a gate before your database, detecting schema drift, missing values, and type mismatches in real time.

Real-time data quality screening at the edge. Screen any data payload and get PASS / WARN / BLOCK in milliseconds.

import datascreeniq as dsiq

client = dsiq.Client("dsiq_live_...")
report = client.screen(rows, source="orders")

print(report.status)       # BLOCK
print(report.health_pct)   # 34.0%
print(report.issues)       # {"type_mismatches": ["amount"], "null_rates": {"email": 0.5}}

Installation

pip install datascreeniq

With pandas support:

pip install datascreeniq[pandas]

With Excel support:

pip install datascreeniq[excel]

Everything:

pip install datascreeniq[all]

Quick start

Get a free API key at datascreeniq.com — 500K rows/month free.

import datascreeniq as dsiq

client = dsiq.Client("dsiq_live_...")

rows = [
    {"order_id": "ORD-001", "amount": 99.50,    "email": "alice@corp.com"},
    {"order_id": "ORD-002", "amount": "broken", "email": None},
    {"order_id": "ORD-003", "amount": 75.00,    "email": None},
]

report = client.screen(rows, source="orders")

print(report.status)        # BLOCK
print(report.health_pct)    # 34.0%
print(report.type_mismatches)  # ["amount"]
print(report.null_rates)       # {"email": 0.5}
print(report.summary())
# 🚨 BLOCK | Health: 34.0% | Rows: 3 | Type mismatches: amount | Null rate: email=50% | (9ms)

API key

Set as environment variable (recommended):

export DATASCREENIQ_API_KEY="dsiq_live_..."

client = dsiq.Client()  # reads from env automatically

Or pass directly:

client = dsiq.Client("dsiq_live_...")
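If the environment variable is missing, it can help to fail early with a clear message before any screening call is made. The following is a minimal sketch of such a check; `require_api_key` is a hypothetical helper, not part of the SDK, and it only relies on the `DATASCREENIQ_API_KEY` variable name shown above.

```python
import os
from typing import Mapping

ENV_VAR = "DATASCREENIQ_API_KEY"  # variable name taken from the section above


def require_api_key(environ: Mapping[str, str] = os.environ) -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; the SDK reads the variable itself.
    """
    key = environ.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set; export it before creating the client")
    return key


# Demonstration with an explicit mapping so nothing depends on the real environment.
print(require_api_key({ENV_VAR: "dsiq_live_example"}))  # dsiq_live_example
```

Calling `require_api_key()` at pipeline start-up, before `dsiq.Client()`, turns a missing key into a configuration error rather than a failure in the middle of a run.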

Usage

Screen a list of dicts

report = client.screen(rows, source="orders")

Screen a CSV file

report = client.screen_file("orders.csv", source="orders")

Screen an Excel file

# pip install datascreeniq[excel]
report = client.screen_file("orders.xlsx", source="orders", sheet=0)

Screen a pandas DataFrame

# pip install datascreeniq[pandas]
import pandas as pd

df = pd.read_csv("orders.csv")
report = client.screen_dataframe(df, source="orders")

Screen a JSON or XML file

report = client.screen_file("orders.json", source="orders")
report = client.screen_file("orders.xml",  source="orders")

The ScreenReport object

report.status           # "PASS" | "WARN" | "BLOCK"
report.health_score     # float 0.0 – 1.0
report.health_pct       # "94.5%"

report.is_pass          # True / False
report.is_warn          # True / False
report.is_blocked       # True / False

report.issues           # full issues dict
report.type_mismatches  # ["amount", "price"]
report.null_rates       # {"email": 0.50}
report.outlier_fields   # ["amount"]

report.drift            # list of drift events
report.drift_count      # int
report.has_drift        # True / False

report.rows_received    # int
report.rows_sampled     # int
report.latency_ms       # int
report.batch_id         # str
report.timestamp        # ISO string

report.summary()        # human-readable one-liner
report.to_dict()        # raw API response
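One way to use these attributes is to map each result to a pipeline action. The sketch below reads only `status` and `has_drift` from the list above; the action names it returns and the `route_batch` function are hypothetical, and a stand-in object is used so the example runs without an API call.

```python
from types import SimpleNamespace


def route_batch(report) -> str:
    """Map a screening result to a pipeline action.

    The returned labels are illustrative, not values defined by the SDK.
    """
    if report.status == "BLOCK":
        return "quarantine"
    if report.status == "WARN" or report.has_drift:
        return "load_and_alert"
    return "load"


# Stand-in for a ScreenReport, carrying only the attributes the function reads.
fake_report = SimpleNamespace(status="WARN", has_drift=False)
print(route_batch(fake_report))  # load_and_alert
```

Treating drift on a PASS batch as worth an alert is a design choice for this sketch; a stricter pipeline might quarantine on any drift.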

Pipeline integration

Raise on block

from datascreeniq.exceptions import DataQualityError

try:
    client.screen(rows, source="orders").raise_on_block()
    # only reaches here if PASS or WARN
    load_to_warehouse(rows)

except DataQualityError as e:
    print(f"Blocked: {e}")
    print(f"Issues:  {e.report.issues}")
    send_to_dead_letter_queue(rows)

Airflow task

from airflow.decorators import task
import datascreeniq as dsiq

@task
def quality_gate(rows: list, source: str) -> dict:
    client = dsiq.Client()   # reads DATASCREENIQ_API_KEY from env
    report = client.screen(rows, source=source)
    if report.is_blocked:
        raise ValueError(f"Data blocked: {report.summary()}")
    return report.to_dict()

Prefect flow

from prefect import flow, task
import datascreeniq as dsiq

@task
def screen_data(rows, source):
    return dsiq.Client().screen(rows, source=source).raise_on_block()

@flow
def my_pipeline():
    rows = extract_from_source()
    screen_data(rows, source="orders")   # blocks flow if quality fails
    load_to_warehouse(rows)

dbt post-hook

import pandas as pd
import datascreeniq as dsiq

def screen_dbt_model(model_name: str, conn):
    df = pd.read_sql(f"SELECT * FROM {model_name} LIMIT 10000", conn)
    return dsiq.Client().screen_dataframe(df, source=model_name).raise_on_block()

Large files — auto chunking

Files with more than 10,000 rows are automatically split into chunks and screened in parallel. Results are merged into a single report:

# 1M row file — 100 API calls, one merged report
report = client.screen_file("events.csv", source="events")
print(f"Screened {report.rows_received:,} rows")
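To make the arithmetic behind the "100 API calls" figure concrete, here is a rough sketch of fixed-size chunking. It is not the SDK's implementation: `chunk_rows` and `expected_calls` are hypothetical names, and the sketch assumes the chunk size equals the 10,000-row threshold mentioned above.

```python
import math
from typing import Iterator, List, Sequence

CHUNK_SIZE = 10_000  # threshold quoted above; assumed here to also be the chunk size


def chunk_rows(rows: Sequence[dict], size: int = CHUNK_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield list(rows[start:start + size])


def expected_calls(row_count: int, size: int = CHUNK_SIZE) -> int:
    """Number of chunks (and so API calls) needed for `row_count` rows."""
    return math.ceil(row_count / size)


rows = [{"id": i} for i in range(25_000)]
print([len(chunk) for chunk in chunk_rows(rows)])  # [10000, 10000, 5000]
print(expected_calls(1_000_000))                   # 100
```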

Error handling

from datascreeniq.exceptions import (
    AuthenticationError,   # invalid API key
    PlanLimitError,        # monthly row limit exceeded
    RateLimitError,        # too many requests
    ValidationError,       # bad payload
    APIError,              # server error
    DataQualityError,      # raised by .raise_on_block()
)

try:
    report = client.screen(rows, source="orders")
except AuthenticationError:
    print("Check your API key")
except PlanLimitError:
    print("Monthly limit reached — upgrade at datascreeniq.com")
except RateLimitError as e:
    print(f"Rate limited: {e}")
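Rate-limit errors are usually transient, so a pipeline may prefer to retry rather than fail. The following is a generic sketch of retrying with exponential backoff; `retry_with_backoff` is a hypothetical helper, not part of the SDK, and the exception types to retry on are passed in by the caller.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on `retry_on` with delays of base_delay * 2**n seconds."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

With the imports from the block above, a call might look like `retry_with_backoff(lambda: client.screen(rows, source="orders"), retry_on=(RateLimitError,))`. Errors such as `AuthenticationError` or `PlanLimitError` are left out of `retry_on` on purpose, since retrying does not fix them.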

Why DataScreenIQ exists

  • Dashboards break AFTER bad data is already stored
  • Data tests are usually batch-based and too late
  • Silent corruption is the most expensive failure in data systems

DataScreenIQ moves validation to the edge — before storage, before transformation, before damage.

Why trust this

Built for production workloads:

  • Handles 1M+ rows via auto-chunking
  • Parallel validation engine
  • Sub-second latency decisions

Pricing

Plan        Price      Rows / month
Developer   Free       500K
Starter     $19/mo     5M
Growth      $79/mo     50M
Scale       $199/mo    500M

Get your free API key at datascreeniq.com.


Requirements

  • Python 3.8+
  • requests (auto-installed)
  • pandas — optional, for screen_dataframe()
  • openpyxl — optional, for Excel files

License

MIT
