Fast Semantic Static Validator - A semantic validation framework for OMOP CDM SQL queries

FastSSV — Fast Semantic Static Validator

OMOP SQL that runs without errors can still be analytically wrong.

A query that silently drops 30% of patients because it misses concept descendants, filters on a deprecated concept, or applies a temporal constraint outside a patient's observation window will execute cleanly, return plausible numbers, and produce a flawed study. FastSSV catches these violations before they reach results — 154 rules, no database connection, deterministic.

📖 Full documentation: https://fastomop.github.io/fastssv/


Install

pip install fastssv

Use it

fastssv path/to/query.sql                       # writes output/validation_report.json
fastssv path/to/query.sql --strict              # cohort-grade enforcement
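fastssv path/to/query.sql --dialect bigquery    # auto, postgres, tsql, oracle, redshift, bigquery, snowflake, databricks, duckdb

Or call it from Python:

from fastssv import validate_sql_structured

# Any OMOP CDM query string; this one is the example analysed under "What it catches".
sql = "SELECT * FROM drug_exposure de JOIN concept c ON de.drug_concept_id = c.concept_id WHERE c.concept_name LIKE '%aspirin%'"

# Print one line per violation: severity, rule ID, and message.
for v in validate_sql_structured(sql):
    print(f"[{v.severity.value.upper()}] {v.rule_id}: {v.message}")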
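The loop above prints every violation. A caller may instead want to bucket violations by severity or decide whether a run should fail. The following is a minimal sketch of that post-processing. It assumes only the three attributes used in the snippet above (`severity.value`, `rule_id`, `message`). `FakeSeverity`, `FakeViolation`, `group_by_severity`, and `should_fail` are hypothetical names introduced here so the example runs without fastssv installed; they are not part of the FastSSV API.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


# Hypothetical stand-ins that mimic the attributes used in the snippet above.
# With fastssv installed, the objects returned by validate_sql_structured()
# would be passed to the helper functions instead.
class FakeSeverity(Enum):
    ERROR = "error"
    WARNING = "warning"


@dataclass
class FakeViolation:
    rule_id: str
    severity: FakeSeverity
    message: str


def group_by_severity(violations):
    """Bucket violations by their severity string, preserving input order."""
    buckets = defaultdict(list)
    for v in violations:
        buckets[v.severity.value].append(v)
    return dict(buckets)


def should_fail(violations, fail_on=("error",)):
    """Return True if any violation has a severity listed in fail_on."""
    return any(v.severity.value in fail_on for v in violations)


sample = [
    FakeViolation("anti_patterns.concept_name_lookup", FakeSeverity.WARNING, "..."),
    FakeViolation("concept_standardization.concept_domain_validation", FakeSeverity.WARNING, "..."),
]

for severity, items in group_by_severity(sample).items():
    print(severity, [v.rule_id for v in items])
print("fail:", should_fail(sample))
```

Passing `fail_on=("error", "warning")` makes the same helper treat warnings as failures. That is a local policy choice in this sketch and is not claimed to match what `--strict` does.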

What it catches

This query runs cleanly and returns rows, but every row is analytically suspect:

SELECT *
FROM drug_exposure de
JOIN concept c ON de.drug_concept_id = c.concept_id
WHERE c.concept_name LIKE '%aspirin%';

Running fastssv query.sql writes the following to output/validation_report.json:

{
  "query": "SELECT * FROM drug_exposure de JOIN concept c ON de.drug_concept_id = c.concept_id WHERE c.concept_name LIKE '%aspirin%';",
  "is_valid": true,
  "error_count": 0,
  "warning_count": 3,
  "warnings": [
    {
      "rule_id": "anti_patterns.concept_name_lookup",
      "severity": "warning",
      "issue": "Query filters by concept_name with pattern matching ('%aspirin%'). This is highly unreliable as concept names can vary. Use concept_code + vocabulary_id or concept_id instead.",
      "fix": "REPLACE: `WHERE c.concept_name = '<name>'` WITH `WHERE c.concept_code = '<code>' AND c.vocabulary_id = '<vocab>'`, OR with `WHERE c.concept_id = <id>` if the concept_id is known."
    },
    {
      "rule_id": "concept_standardization.standard_concept_enforcement",
      "severity": "warning",
      "issue": "Query uses STANDARD concept fields without ensuring concepts are standard.",
      "fix": "ADD: `JOIN concept c ON c.concept_id = <table>.<concept_id_col>` AND `WHERE c.standard_concept = 'S'` to filter to standard concepts."
    },
    {
      "rule_id": "concept_standardization.concept_domain_validation",
      "severity": "warning",
      "issue": "drug_exposure.drug_concept_id joined to concept 'c' without domain_id filter. Expected domain 'Drug'.",
      "fix": "ADD: `AND c.domain_id = 'Drug'` to the WHERE/JOIN-ON predicates."
    }
  ]
}

is_valid is true because every violation here is a warning — under normal mode, only error-severity violations gate the exit code. Run fastssv query.sql --strict to escalate best-practice warnings to errors. See the Semantic rules guide for the reasoning behind each category and the Rules reference for the full catalog.
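A report like the one shown above can also be consumed from a script, for example in a CI step. The sketch below relies only on the keys visible in that report (`is_valid`, `error_count`, `warning_count`, and `warnings` with `rule_id` and `severity`). The `summarize` and `passes` helpers and the `max_warnings` threshold are assumptions made for illustration, not FastSSV features.

```python
import json

# Trimmed copy of the report shown above. In practice the report would be read
# from disk, e.g. json.load(open("output/validation_report.json")).
REPORT_JSON = """
{
  "is_valid": true,
  "error_count": 0,
  "warning_count": 3,
  "warnings": [
    {"rule_id": "anti_patterns.concept_name_lookup", "severity": "warning"},
    {"rule_id": "concept_standardization.standard_concept_enforcement", "severity": "warning"},
    {"rule_id": "concept_standardization.concept_domain_validation", "severity": "warning"}
  ]
}
"""


def summarize(report):
    """Return a one-line status followed by one line per warning."""
    lines = [
        f"valid={report['is_valid']} "
        f"errors={report['error_count']} warnings={report['warning_count']}"
    ]
    for w in report.get("warnings", []):
        lines.append(f"  {w['severity']}: {w['rule_id']}")
    return lines


def passes(report, max_warnings=None):
    """Hypothetical local policy: the report must be valid and, if a limit is
    given, must not contain more warnings than that limit."""
    if not report["is_valid"]:
        return False
    return max_warnings is None or report["warning_count"] <= max_warnings


report = json.loads(REPORT_JSON)
print("\n".join(summarize(report)))
print("passes:", passes(report))
```

A warning budget such as `passes(report, max_warnings=0)` gives a stricter gate without changing how FastSSV itself is invoked.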

Why FastSSV

Existing OHDSI tools validate data quality, characterise cohorts, and measure phenotype performance. None of them validate whether the SQL logic itself follows OMOP CDM rules. FastSSV fills that gap.

It targets silent failures. The violations FastSSV catches are not syntax errors or missing columns — they are cases where the SQL is valid, the query returns results, and those results are wrong. Missing hierarchy expansion, reversed concept relationship direction, temporal filters outside observation windows: all of these pass any SQL linter and fail any replication attempt.

It is static and deterministic. FastSSV parses SQL into an abstract syntax tree and checks structural patterns against the OMOP CDM v5.4 schema. The same query produces the same result every time, on any machine, without connecting to a database.

It is AI-agnostic. SQL produced by humans, ATLAS, scripts, or AI agents is validated identically. FastSSV treats any SQL generator as a black box whose output needs checking.

It is rule-based and extensible. Every check is a discrete, documented rule with a unique ID, a severity level, a violation message, and a suggested fix. New rules can be added without touching existing ones.
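To illustrate what "discrete rules with an ID, severity, message, and fix" can look like in code, here is a minimal sketch of a rule registry. It is not FastSSV's plugin API (docs/plugin_architecture.md describes the real one). Every name in it (`Level`, `Finding`, `register`, `run_all`, and the `demo.concept_name_like` rule) is hypothetical, and the rule uses a plain regular expression where a real check would inspect the parsed syntax tree.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Level(Enum):
    ERROR = "error"
    WARNING = "warning"


@dataclass(frozen=True)
class Finding:
    rule_id: str     # "<category>.<rule_name>"
    severity: Level
    message: str
    fix: str


Rule = Callable[[str], List[Finding]]
REGISTRY: Dict[str, Rule] = {}


def register(rule_id: str):
    """Decorator that adds a rule under a unique ID; duplicates are rejected."""
    def wrap(fn: Rule) -> Rule:
        if rule_id in REGISTRY:
            raise ValueError(f"duplicate rule id: {rule_id}")
        REGISTRY[rule_id] = fn
        return fn
    return wrap


@register("demo.concept_name_like")
def concept_name_like(sql: str) -> List[Finding]:
    # Toy text match standing in for a real AST-based check.
    if re.search(r"concept_name\s+LIKE\b", sql, re.IGNORECASE):
        return [Finding(
            "demo.concept_name_like",
            Level.WARNING,
            "concept_name is filtered with LIKE",
            "filter on concept_id, or on concept_code plus vocabulary_id",
        )]
    return []


def run_all(sql: str) -> List[Finding]:
    """Run every registered rule in sorted ID order so output is deterministic."""
    findings: List[Finding] = []
    for rule_id in sorted(REGISTRY):
        findings.extend(REGISTRY[rule_id](sql))
    return findings


for f in run_all("SELECT * FROM concept c WHERE c.concept_name LIKE '%aspirin%'"):
    print(f"[{f.severity.value}] {f.rule_id}: {f.message}")
```

Because each rule is a separate function keyed by its ID, adding a rule means adding one decorated function; existing rules are not edited, which is the property the paragraph above describes.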

Position in the OHDSI ecosystem

FastSSV validates what other OHDSI tools assume to be correct — the SQL logic itself.

| Layer | Tool |
|---|---|
| Data correctness | DataQualityDashboard |
| Data characterisation | Achilles |
| Cohort inspection | CohortDiagnostics |
| Phenotype validity | PheValuator |
| Model performance | PatientLevelPrediction |
| Analysis logic validity | FastSSV |

HTTP API (optional)

pip install "fastssv[api]"
fastssv serve              # http://localhost:8000 — JSON API + HTMX web UI

The service ships with body-size limits, a parse timeout, rate limiting, strict CORS, security headers, and structured JSON logging; a Docker image is provided under deploy/. See the HTTP API guide for endpoints, configuration, and deployment.

Documentation

| Topic | Page |
|---|---|
| Architecture overview | docs/architecture.md |
| Plugin system / writing a rule | docs/plugin_architecture.md |
| Reasoning behind each rule category | docs/semantic_rules_guide.md |
| Per-rule catalog (all 154) | docs/rules_reference.md |
| HTTP API | docs/api.md |
| JSON report format | docs/json_output.md |
| Logging | docs/logging.md |

For contributing, see CONTRIBUTING.md (PR policy and the AI-assisted PR rules) and AGENTS.md (build, test, and conventions). Release notes live in CHANGELOG.md.

Stability

Pre-1.0 (0.x.y). The Python API (validate_sql_structured, validate_sql, RuleViolation, Severity, the registry helpers) and the rule_id format <category>.<rule_name> are stable. The exact rule set, violation wording, and individual severities may change between minor versions as rules are calibrated against real OHDSI corpora. Pin to a minor version (for example, fastssv>=0.3,<0.4 for the 0.3 series) and review CHANGELOG.md before upgrading.

License

Apache 2.0. See LICENSE.
