Real-time data quality screening API — PASS / WARN / BLOCK in under 10ms
DataScreenIQ Python SDK
Most data pipelines don't fail loudly. They silently corrupt production data and break dashboards, and the damage goes unnoticed for days.
DataScreenIQ acts as a gate before your database, detecting schema drift, missing values, and type mismatches in real time.
Real-time data quality screening at the edge. Screen any data payload and get PASS / WARN / BLOCK in milliseconds.
import datascreeniq as dsiq
client = dsiq.Client("dsiq_live_...")
report = client.screen(rows, source="orders")
print(report.status) # BLOCK
print(report.health_pct) # 34.0%
print(report.issues) # {"type_mismatches": ["amount"], "null_rates": {"email": 0.5}}
Installation
pip install datascreeniq
With pandas support:
pip install datascreeniq[pandas]
With Excel support:
pip install datascreeniq[excel]
Everything:
pip install datascreeniq[all]
Quick start
Get a free API key at datascreeniq.com — 500K rows/month free.
import datascreeniq as dsiq
client = dsiq.Client("dsiq_live_...")
rows = [
    {"order_id": "ORD-001", "amount": 99.50, "email": "alice@corp.com"},
    {"order_id": "ORD-002", "amount": "broken", "email": None},
    {"order_id": "ORD-003", "amount": 75.00, "email": None},
]
report = client.screen(rows, source="orders")
print(report.status) # BLOCK
print(report.health_pct) # 34.0%
print(report.type_mismatches) # ["amount"]
print(report.null_rates) # {"email": 0.5}
print(report.summary())
# 🚨 BLOCK | Health: 34.0% | Rows: 3 | Type mismatches: amount | Null rate: email=50% | (9ms)
API key
Set as environment variable (recommended):
export DATASCREENIQ_API_KEY="dsiq_live_..."
client = dsiq.Client() # reads from env automatically
Or pass directly:
client = dsiq.Client("dsiq_live_...")
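For scripts that should fail fast when the key is missing, a small pre-flight check can run before the client is constructed. This is a minimal sketch: `require_api_key` is a hypothetical helper, not part of the SDK, and it relies only on the `DATASCREENIQ_API_KEY` variable name shown above.

```python
import os

ENV_VAR = "DATASCREENIQ_API_KEY"  # variable name taken from the section above


def require_api_key(environ=None):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper (not part of the SDK): it only checks that the
    variable is set and non-empty; it does not validate the key itself.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set; export it before running the pipeline")
    return key
```

Calling `require_api_key()` at startup surfaces a missing key before any rows are read. `dsiq.Client()` can then be constructed without arguments as shown above.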
Usage
Screen a list of dicts
report = client.screen(rows, source="orders")
Screen a CSV file
report = client.screen_file("orders.csv", source="orders")
Screen an Excel file
# pip install datascreeniq[excel]
report = client.screen_file("orders.xlsx", source="orders", sheet=0)
Screen a pandas DataFrame
# pip install datascreeniq[pandas]
import pandas as pd
df = pd.read_csv("orders.csv")
report = client.screen_dataframe(df, source="orders")
Screen a JSON or XML file
report = client.screen_file("orders.json", source="orders")
report = client.screen_file("orders.xml", source="orders")
The ScreenReport object
report.status # "PASS" | "WARN" | "BLOCK"
report.health_score # float 0.0 – 1.0
report.health_pct # "94.5%"
report.is_pass # True / False
report.is_warn # True / False
report.is_blocked # True / False
report.issues # full issues dict
report.type_mismatches # ["amount", "price"]
report.null_rates # {"email": 0.50}
report.outlier_fields # ["amount"]
report.drift # list of drift events
report.drift_count # int
report.has_drift # True / False
report.rows_received # int
report.rows_sampled # int
report.latency_ms # int
report.batch_id # str
report.timestamp # ISO string
report.summary() # human-readable one-liner
report.to_dict() # raw API response
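The three verdicts usually map to three different pipeline actions. The sketch below shows one way to branch on them. It reads only `is_blocked` and `is_warn` from the list above; the action names and the `route_batch` helper are hypothetical, and a `SimpleNamespace` stands in for a real report so the example runs without an API call.

```python
from types import SimpleNamespace


def route_batch(report):
    """Map a screening verdict to a pipeline action.

    Only reads `is_blocked` and `is_warn`, which the list above documents.
    The action names are hypothetical and would be defined by your pipeline.
    """
    if report.is_blocked:
        return "quarantine"
    if report.is_warn:
        return "load_and_alert"
    return "load"


# Stand-in for a ScreenReport so the sketch runs without an API call.
fake_report = SimpleNamespace(is_pass=False, is_warn=True, is_blocked=False)
print(route_batch(fake_report))  # load_and_alert
```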
Pipeline integration
Raise on block
from datascreeniq.exceptions import DataQualityError
try:
    client.screen(rows, source="orders").raise_on_block()
    # only reaches here if PASS or WARN
    load_to_warehouse(rows)
except DataQualityError as e:
    print(f"Blocked: {e}")
    print(f"Issues: {e.report.issues}")
    send_to_dead_letter_queue(rows)
Airflow task
from airflow.decorators import task
import datascreeniq as dsiq
@task
def quality_gate(rows: list, source: str) -> dict:
    client = dsiq.Client()  # reads DATASCREENIQ_API_KEY from env
    report = client.screen(rows, source=source)
    if report.is_blocked:
        raise ValueError(f"Data blocked: {report.summary()}")
    return report.to_dict()
Prefect flow
from prefect import flow, task
import datascreeniq as dsiq
@task
def screen_data(rows, source):
    return dsiq.Client().screen(rows, source=source).raise_on_block()

@flow
def my_pipeline():
    rows = extract_from_source()
    screen_data(rows, source="orders")  # blocks flow if quality fails
    load_to_warehouse(rows)
dbt post-hook
import pandas as pd
import datascreeniq as dsiq
def screen_dbt_model(model_name: str, conn):
    df = pd.read_sql(f"SELECT * FROM {model_name} LIMIT 10000", conn)
    return dsiq.Client().screen_dataframe(df, source=model_name).raise_on_block()
Large files — auto chunking
Files with more than 10,000 rows are automatically split into chunks and screened in parallel. Results are merged into a single report:
# 1M row file — 100 API calls, one merged report
report = client.screen_file("events.csv", source="events")
print(f"Screened {report.rows_received:,} rows")
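The call count in the comment above follows from ceiling division by the chunk size. The sketch below illustrates that arithmetic and the splitting step only. It assumes the SDK chunks at exactly 10,000 rows and does not reproduce the parallel requests or the report merging; `iter_chunks` and `count_calls` are hypothetical names, not SDK functions.

```python
from itertools import islice

CHUNK_SIZE = 10_000  # threshold stated above; the SDK's internal value is assumed to match


def iter_chunks(rows, size=CHUNK_SIZE):
    """Yield successive lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def count_calls(total_rows, size=CHUNK_SIZE):
    """Number of chunks needed for `total_rows` rows (ceiling division)."""
    return -(-total_rows // size)


print(count_calls(1_000_000))  # 100
print([len(c) for c in iter_chunks(range(25_000))])  # [10000, 10000, 5000]
```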
Error handling
from datascreeniq.exceptions import (
    AuthenticationError,  # invalid API key
    PlanLimitError,       # monthly row limit exceeded
    RateLimitError,       # too many requests
    ValidationError,      # bad payload
    APIError,             # server error
    DataQualityError,     # raised by .raise_on_block()
)
try:
    report = client.screen(rows, source="orders")
except AuthenticationError:
    print("Check your API key")
except PlanLimitError:
    print("Monthly limit reached: upgrade at datascreeniq.com")
except RateLimitError as e:
    print(f"Rate limited: {e}")
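Rate limiting is usually transient, so a caller may prefer to retry rather than just log it. The helper below is a minimal sketch of exponential backoff. `call_with_retry` is hypothetical and not part of the SDK; it takes the exception class as a parameter so it runs without the package installed. Whether `RateLimitError` carries a retry-after hint is not stated here, so the delays are fixed multiples of `base_delay`.

```python
import time


def call_with_retry(fn, retry_on, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call `fn()`, retrying on `retry_on` with exponential backoff.

    Re-raises the last exception once `attempts` calls have failed.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            # Delay doubles on each failed attempt: base, 2*base, 4*base, ...
            sleep(base_delay * (2 ** attempt))
```

With the SDK, this might be invoked as `call_with_retry(lambda: client.screen(rows, source="orders"), retry_on=RateLimitError)`, leaving `AuthenticationError` and `PlanLimitError` to propagate immediately since retrying would not help.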
Why DataScreenIQ exists
- Dashboards break AFTER bad data is already stored
- Data tests are usually batch-based and too late
- Silent corruption is the most expensive failure in data systems
DataScreenIQ moves validation to the edge — before storage, before transformation, before damage.
Why trust this
Built for production workloads:
- Handles 1M+ rows via auto-chunking
- Parallel validation engine
- Sub-second latency decisions
Pricing
| Plan | Price | Rows / month |
|---|---|---|
| Developer | Free | 500K |
| Starter | $19/mo | 5M |
| Growth | $79/mo | 50M |
| Scale | $199/mo | 500M |
Requirements
- Python 3.8+
- requests (auto-installed)
- pandas: optional, for screen_dataframe()
- openpyxl: optional, for Excel files
License
MIT