A detective for your data. Zero-config data quality monitoring.
Zero config. Zero YAML. Zero rules to write.
Scherlok learns what "normal" looks like, then tells you when something changes.
The Problem
Every data team has the same nightmare:
A source API silently changes from dollars to cents. Revenue dashboards show wrong numbers for 3 weeks before anyone notices.
A column starts returning NULLs. A table stops updating. Row counts drop 40% on a Tuesday. Nobody knows until the CEO asks why the report looks weird.
Current tools (Great Expectations, Soda, dbt tests) require you to define what "correct" looks like before you can detect what's wrong. Hundreds of rules. Dozens of YAML files. And you still miss things — because you can't write rules for problems you haven't imagined yet.
The Solution
Scherlok takes the opposite approach: learn first, then detect.
scherlok connect postgres://user:pass@host/db # connect once
scherlok investigate # learn your data
scherlok watch # detect anomalies
Three commands. Five minutes. Done.
What It Catches
| Anomaly | What Happened | Severity |
|---|---|---|
| Volume drop | Row count dropped 40% overnight | CRITICAL |
| Volume spike | 3x more rows than normal | WARNING |
| Freshness alert | Table hasn't updated in 12h (normally every 2h) | CRITICAL |
| Schema drift | Column removed or type changed | CRITICAL |
| NULL surge | NULL rate jumped from 2% to 45% | WARNING |
| Distribution shift | Column mean shifted 5+ standard deviations | WARNING |
| Cardinality explosion | Status column went from 5 values to 500 | CRITICAL |
Every anomaly is auto-scored: INFO, WARNING, or CRITICAL. No thresholds to configure.
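The README does not say how the score is computed. As a rough sketch, assuming severity comes from how many standard deviations a metric sits from its learned mean, the mapping could look like the following. The function names and the three cut-offs are hypothetical, chosen only so that the examples in this README (3.2σ is INFO, 5+σ is WARNING) come out the same way.

```python
from typing import Optional

# Hypothetical cut-offs, picked only so the README's examples come out right
# (3.2 sigma -> INFO, 5+ sigma -> WARNING). Scherlok's real scoring may differ.
INFO_SIGMA = 3.0
WARNING_SIGMA = 5.0
CRITICAL_SIGMA = 8.0


def z_score(value: float, mean: float, std: float) -> float:
    """Distance of value from the learned mean, in standard deviations."""
    if std == 0:
        # Constant baseline: any change is treated as infinitely surprising.
        return 0.0 if value == mean else float("inf")
    return abs(value - mean) / std


def score_severity(z: float) -> Optional[str]:
    """Map a deviation to INFO, WARNING, or CRITICAL (None means normal)."""
    if z >= CRITICAL_SIGMA:
        return "CRITICAL"
    if z >= WARNING_SIGMA:
        return "WARNING"
    if z >= INFO_SIGMA:
        return "INFO"
    return None


# Learned baseline for a price column: mean 20.0, standard deviation 2.5.
print(score_severity(z_score(28.0, 20.0, 2.5)))
```

Because the baseline is learned per column, the same cut-offs can apply everywhere without per-table configuration.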
How It Works
1. investigate — Learn the patterns
$ scherlok investigate
Profiling 12 tables...
✓ users — 45,231 rows, 8 columns
✓ orders — 1,203,847 rows, 15 columns
✓ products — 892 rows, 12 columns
...
Done. Profiles saved.
Scherlok profiles every table: row counts, column types, NULL rates, value distributions, freshness cadence, and cardinality. It stores everything locally in SQLite.
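To make the idea of a profile concrete, here is a minimal sketch that collects three of those metrics (row count, NULL rate, cardinality) with plain SQL. It is not Scherlok's code: `profile_table` is a hypothetical helper, and an in-memory SQLite database stands in for the source database.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an SQL identifier; identifiers cannot be bound as parameters."""
    return '"' + name.replace('"', '""') + '"'


def profile_table(conn: sqlite3.Connection, table: str) -> dict:
    """Collect row count plus per-column type, NULL rate, and cardinality."""
    tbl = quote_ident(table)
    row_count = conn.execute(f"SELECT COUNT(*) FROM {tbl}").fetchone()[0]
    columns = {}
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    for _cid, name, col_type, *_rest in conn.execute(f"PRAGMA table_info({tbl})"):
        col = quote_ident(name)
        # COUNT(col) skips NULLs, so the difference to row_count is the NULL count.
        non_null, distinct = conn.execute(
            f"SELECT COUNT({col}), COUNT(DISTINCT {col}) FROM {tbl}"
        ).fetchone()
        null_rate = (row_count - non_null) / row_count if row_count else 0.0
        columns[name] = {
            "type": col_type,
            "null_rate": null_rate,
            "cardinality": distinct,
        }
    return {"table": table, "row_count": row_count, "columns": columns}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "a@x.io"), (2, "b@x.io"), (3, None), (4, "a@x.io")],
)
print(profile_table(conn, "users"))
```

A snapshot like this, saved on each run, is what later checks would compare against.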
2. watch — Detect anomalies
$ scherlok watch
Checking 12 tables against learned profiles...
🔴 CRITICAL orders volume_drop Row count dropped 52% (1,203,847 → 578,412)
🟡 WARNING users null_increase Column "email": NULL rate 2.1% → 18.7%
🔵 INFO products distribution Column "price": mean shifted 3.2σ
3 anomalies detected. Exit code: 1
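The volume line in that output is simple arithmetic on two row counts. A minimal sketch, with hypothetical helper names, that reproduces the figures shown for `orders`:

```python
def volume_change_pct(baseline_rows: int, current_rows: int) -> float:
    """Signed percentage change in row count relative to the learned baseline."""
    if baseline_rows <= 0:
        raise ValueError("baseline_rows must be positive")
    return (current_rows - baseline_rows) / baseline_rows * 100.0


def describe_volume(baseline_rows: int, current_rows: int) -> str:
    """Format the change in the same style as the watch output above."""
    pct = volume_change_pct(baseline_rows, current_rows)
    direction = "dropped" if pct < 0 else "grew"
    return (
        f"Row count {direction} {abs(pct):.0f}% "
        f"({baseline_rows:,} → {current_rows:,})"
    )


print(describe_volume(1_203_847, 578_412))
```

Whether a change of that size counts as an anomaly depends on how much the row count normally fluctuates, which is what the learned profile is for.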
3. Alert — Slack, CI/CD, or both
# Slack
scherlok watch --webhook https://hooks.slack.com/services/...
# Discord
scherlok watch --webhook https://discord.com/api/webhooks/...
# Microsoft Teams
scherlok watch --webhook https://outlook.office.com/webhook/...
# Any endpoint (generic JSON payload)
scherlok watch --webhook https://my-api.com/alerts
# CI/CD gate (fails pipeline on CRITICAL)
scherlok watch --exit-code --fail-on critical
Auto-detects Slack, Discord, and Teams from the URL and formats the payload accordingly. Any other URL receives a generic JSON payload.
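One plausible way to do this kind of detection is to look at the hostname of the webhook URL. The sketch below is an assumption about the approach, not Scherlok's implementation: the function names are hypothetical, and the generic payload shape is invented for illustration. The `text` and `content` keys are the basic message fields that Slack, Teams, and Discord incoming webhooks accept.

```python
from urllib.parse import urlsplit


def detect_webhook_kind(url: str) -> str:
    """Classify a webhook URL by hostname; anything unknown is 'generic'."""
    host = (urlsplit(url).hostname or "").lower()
    if host == "hooks.slack.com":
        return "slack"
    if host in ("discord.com", "discordapp.com"):
        return "discord"
    if host == "outlook.office.com" or host.endswith(".webhook.office.com"):
        return "teams"
    return "generic"


def build_payload(kind: str, message: str) -> dict:
    """Build the simplest JSON body each service accepts."""
    if kind == "discord":
        return {"content": message}
    if kind in ("slack", "teams"):
        return {"text": message}
    # Hypothetical generic shape; the README does not document the real one.
    return {"source": "scherlok", "message": message}


for url in (
    "https://hooks.slack.com/services/...",
    "https://discord.com/api/webhooks/...",
    "https://outlook.office.com/webhook/...",
    "https://my-api.com/alerts",
):
    kind = detect_webhook_kind(url)
    print(kind, build_payload(kind, "orders: row count dropped 52%"))
```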
CI/CD Integration
Use Scherlok as a data quality gate. The ci command does it in one line:
# GitHub Actions
- name: Data quality check
  run: |
    pip install scherlok
    scherlok config --store s3://my-bucket/scherlok/profiles.db
    scherlok ci ${{ secrets.DATABASE_URL }} \
      --webhook ${{ secrets.SLACK_WEBHOOK }} \
      --fail-on critical
If Scherlok detects a critical anomaly, the pipeline fails, so bad data is stopped before it reaches production.
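The gate boils down to turning a list of severities into a process exit code. A small sketch of that logic follows. It assumes that a lower `--fail-on` level (for example `warning`) would also fail on anything more severe; the README only shows `critical`, so treat the ordering and the function name as illustrative.

```python
from typing import Iterable

# Assumed ordering of the three severities named in this README.
SEVERITY_ORDER = {"info": 0, "warning": 1, "critical": 2}


def gate_exit_code(severities: Iterable[str], fail_on: str = "critical") -> int:
    """Return 1 if any finding is at or above the fail_on level, else 0."""
    threshold = SEVERITY_ORDER[fail_on.lower()]
    failed = any(SEVERITY_ORDER[s.lower()] >= threshold for s in severities)
    return 1 if failed else 0


findings = ["CRITICAL", "WARNING", "INFO"]
print(gate_exit_code(findings, fail_on="critical"))
```

CI systems treat any non-zero exit code as a failed step, which is what stops the pipeline.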
Email alerts
export SCHERLOK_SMTP_HOST=smtp.gmail.com
export SCHERLOK_SMTP_USER=alerts@company.com
export SCHERLOK_SMTP_PASSWORD=app-specific-password
scherlok watch --email team@company.com --email cto@company.com
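For readers who want to see what an alert like this involves, here is a sketch using only Python's standard library. It reads the same three environment variables, but everything else is an assumption: the helper names, the subject line, and the use of port 587 with STARTTLS are not taken from Scherlok.

```python
import os
import smtplib
from email.message import EmailMessage
from typing import Sequence


def build_alert_email(
    sender: str, recipients: Sequence[str], findings: Sequence[str]
) -> EmailMessage:
    """Assemble a plain-text alert with one finding per line."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = f"Scherlok alert: {len(findings)} finding(s)"
    msg.set_content("\n".join(findings))
    return msg


def send_alert(msg: EmailMessage) -> None:
    """Send the message through the SMTP server named in the environment."""
    host = os.environ["SCHERLOK_SMTP_HOST"]
    user = os.environ["SCHERLOK_SMTP_USER"]
    password = os.environ["SCHERLOK_SMTP_PASSWORD"]
    # Port 587 with STARTTLS is an assumption; the README names no port.
    with smtplib.SMTP(host, 587) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)


message = build_alert_email(
    "alerts@company.com",
    ["team@company.com", "cto@company.com"],
    ["CRITICAL orders volume_drop", "WARNING users null_increase"],
)
print(message["Subject"])
```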
Connectors
# PostgreSQL
scherlok connect postgres://user:pass@host:5432/db
# BigQuery
pip install scherlok[bigquery]
scherlok connect bigquery://project-id/dataset-name
# Snowflake
pip install scherlok[snowflake]
export SNOWFLAKE_USER=...
export SNOWFLAKE_PASSWORD=...
export SNOWFLAKE_WAREHOUSE=...
scherlok connect snowflake://account/database/schema
| Database | Status |
|---|---|
| PostgreSQL | Available |
| BigQuery | Available |
| Snowflake | Available |
| MySQL | Coming soon |
| DuckDB | Planned |
Remote Storage
Share profiles across CI runs and team members:
# AWS S3
scherlok config --store s3://my-bucket/scherlok/profiles.db
# Google Cloud Storage
scherlok config --store gs://my-bucket/scherlok/profiles.db
# Azure Blob Storage
scherlok config --store az://my-container/scherlok/profiles.db
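All three store URLs share one shape: scheme, then bucket or container, then object path. A sketch of splitting them apart, with a hypothetical `parse_store_url` helper that is not part of Scherlok:

```python
from typing import NamedTuple
from urllib.parse import urlsplit

# Schemes taken from the examples above.
PROVIDERS = {
    "s3": "AWS S3",
    "gs": "Google Cloud Storage",
    "az": "Azure Blob Storage",
}


class StoreLocation(NamedTuple):
    provider: str
    container: str  # bucket (S3, GCS) or container (Azure)
    key: str        # object path inside the bucket or container


def parse_store_url(url: str) -> StoreLocation:
    """Split a store URL into provider, bucket or container, and object key."""
    parts = urlsplit(url)
    if parts.scheme not in PROVIDERS:
        raise ValueError(f"unsupported store scheme: {parts.scheme!r}")
    if not parts.netloc or not parts.path.strip("/"):
        raise ValueError("store URL needs both a bucket and an object path")
    return StoreLocation(PROVIDERS[parts.scheme], parts.netloc, parts.path.lstrip("/"))


print(parse_store_url("s3://my-bucket/scherlok/profiles.db"))
```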
Why Not [Other Tool]?
| | Great Expectations | Soda | Monte Carlo | Scherlok |
|---|---|---|---|---|
| Setup time | Hours | 30 min | Weeks | 5 minutes |
| Config required | Hundreds of rules | YAML checks | Dashboard setup | None |
| Anomaly detection | Manual thresholds | Paid feature | Yes | Yes, free |
| Self-hosted | Yes | Limited | No (SaaS) | Yes |
| CI/CD gate | Yes | Yes | No | Yes |
| Price | Free | Freemium | $50-200K/yr | Free, forever |
CLI Reference
scherlok connect <url>                    Connect to a database
scherlok investigate                      Profile all tables (learn patterns)
scherlok watch [-w <url>] [-e <email>]    Detect anomalies and alert
scherlok ci <url> [opts]                  All-in-one CI/CD command (connect + watch + exit code)
scherlok status                           Quick health dashboard
scherlok report                           Detailed profile summary
scherlok history [--days N]               Timeline of past anomalies
scherlok config --store <url>             Set remote storage
scherlok version                          Show version
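If you schedule checks from a Python job rather than a shell script, the CLI can be wrapped with `subprocess`. This is a sketch, not an official API: the helper names are hypothetical, and it uses only the `watch` flags that appear elsewhere in this README (`--webhook`, `--email`, `--exit-code`, `--fail-on`).

```python
import subprocess
from typing import Optional, Sequence


def build_watch_command(
    webhook: Optional[str] = None,
    emails: Sequence[str] = (),
    fail_on: Optional[str] = None,
) -> list[str]:
    """Assemble the argument list for `scherlok watch`."""
    cmd = ["scherlok", "watch"]
    if webhook:
        cmd += ["--webhook", webhook]
    for address in emails:
        # The README repeats --email once per recipient.
        cmd += ["--email", address]
    if fail_on:
        cmd += ["--exit-code", "--fail-on", fail_on]
    return cmd


def run_watch(**options) -> int:
    """Run the command and return its exit code (needs scherlok on PATH)."""
    return subprocess.run(build_watch_command(**options)).returncode


print(build_watch_command(emails=["team@company.com"], fail_on="critical"))
```

Passing the arguments as a list avoids shell quoting problems with webhook URLs.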
Install
pip install scherlok
# With BigQuery support
pip install scherlok[bigquery]
# With Snowflake support
pip install scherlok[snowflake]
Requires Python 3.10+.
Contributing
Contributions welcome! See CONTRIBUTING.md.
We're especially looking for:
- New database connectors (MySQL, DuckDB)
- Anomaly detection improvements
- Documentation and examples
License
MIT — Developed by Robson Bayer Müller