
Python implementation of OHDSI CIRCE-BE for cohort definition and SQL generation


CIRCE Python Implementation


[!CAUTION] This project is currently under active testing and development. It is a Python implementation of the OHDSI CIRCE-BE Java library. While we aim for 1:1 parity, this version is an Alpha release and should be used with caution in production environments.

A Python implementation of the OHDSI CIRCE-BE (Cohort Inclusion and Restriction Criteria Engine) for generating SQL queries from cohort definitions in the OMOP Common Data Model.

Overview

CIRCE Python provides a comprehensive toolkit for working with OMOP CDM cohort definitions:

  • Cohort Definition Modeling: Create and validate cohort expressions using Pydantic models
  • SQL Generation: Generate SQL queries from cohort definitions for OMOP CDM v5.x
  • Concept Set Management: Handle concepts and concept sets from OMOP vocabularies
  • Validation & Checking: Comprehensive validation with 40+ checker implementations
  • Print-Friendly Output: Generate human-readable markdown descriptions of cohort definitions
  • CLI Interface: Command-line tools for validation, SQL generation, and markdown rendering

Package Status

[!IMPORTANT] This package is currently in Alpha status and undergoing rigorous parity testing against the Java implementation.

  • Version: 0.1.0 (Alpha)
  • Tests: 3,400+ passing
  • Coverage: 34% (Core logic focus)
  • Python: 3.8+
  • License: Apache 2.0

Installation

[!NOTE] This package is currently in private development. Install from source using Git.

From Source (Current Method)

# Clone the repository
git clone https://github.com/OHDSI/ohdsi-circepy.git
cd ohdsi-circepy

# Install in development mode with all dependencies
pip install -e ".[dev]"

# Verify installation
circe --help

See INSTALLATION.md for detailed installation instructions, troubleshooting, and setup options.

From PyPI (Coming Soon)

# Coming in future release
pip install ohdsi-circepy

Quick Start

Command-Line Interface

The easiest way to use CIRCE is through the command-line interface:

# Validate a cohort expression JSON file
circe validate cohort.json

# Generate SQL from a cohort expression
circe generate-sql cohort.json --output cohort.sql

# Render a cohort expression to markdown
circe render-markdown cohort.json --output cohort.md

# Process a cohort expression (validate, generate SQL, and render markdown)
circe process cohort.json --validate --sql --markdown

See the CLI Documentation section below for more details.

Python API

from circe import CohortExpression
from circe.cohortdefinition import PrimaryCriteria, ConditionOccurrence
from circe.cohortdefinition.core import ObservationFilter, ResultLimit
from circe.vocabulary import ConceptSet, ConceptSetExpression, ConceptSetItem, Concept

# Create a cohort expression
cohort = CohortExpression(
    title="Type 2 Diabetes Cohort",
    primary_criteria=PrimaryCriteria(
        criteria_list=[
            ConditionOccurrence(
                codeset_id=1,
                first=True
            )
        ],
        observation_window=ObservationFilter(prior_days=0, post_days=0),
        primary_limit=ResultLimit(type="All")
    ),
    concept_sets=[
        ConceptSet(
            id=1,
            name="Type 2 Diabetes",
            expression=ConceptSetExpression(
                items=[
                    ConceptSetItem(
                        concept=Concept(
                            concept_id=201826,
                            concept_name="Type 2 diabetes mellitus"
                        ),
                        include_descendants=True
                    )
                ]
            )
        )
    ]
)

# Generate SQL using the API
from circe.api import build_cohort_query
from circe.cohortdefinition import BuildExpressionQueryOptions

options = BuildExpressionQueryOptions()
options.cdm_schema = 'cdm'
options.vocabulary_schema = 'cdm'
options.cohort_id = 1
options.target_table = 'scratch.cohort'
sql = build_cohort_query(cohort, options)
print(sql)
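The include_descendants=True flag on the ConceptSetItem above means the concept set should cover Type 2 diabetes mellitus plus every concept below it in the vocabulary hierarchy. As a rough mental model only, here is a pure-Python sketch of how included and excluded items combine. The Item tuple, the expand/resolve functions, and the toy descendant map (with made-up concept IDs) are hypothetical; in CIRCE-BE this resolution happens inside the generated SQL against the vocabulary tables, not in Python, and options such as mapped concepts are ignored here.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Item(NamedTuple):
    """Toy stand-in for a concept set item (hypothetical, not circe's ConceptSetItem)."""
    concept_id: int
    include_descendants: bool = False
    is_excluded: bool = False


# Hypothetical, made-up hierarchy: concept -> all of its descendants (transitive).
TOY_DESCENDANTS: Dict[int, Set[int]] = {100: {101, 102, 103}, 102: {103}}


def expand(item: Item, descendants: Dict[int, Set[int]]) -> Set[int]:
    """Return the concept itself plus, if flagged, all of its descendants."""
    ids = {item.concept_id}
    if item.include_descendants:
        ids |= descendants.get(item.concept_id, set())
    return ids


def resolve(items: Iterable[Item], descendants: Dict[int, Set[int]]) -> Set[int]:
    """Included concepts minus excluded concepts, each expanded per its flags."""
    included: Set[int] = set()
    excluded: Set[int] = set()
    for item in items:
        # Route each item's expansion into the include or exclude bucket.
        target = excluded if item.is_excluded else included
        target |= expand(item, descendants)
    return included - excluded
```

For example, including concept 100 with descendants while excluding 102 with descendants leaves {100, 101}: the exclusion removes 102 and its child 103.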

What's Included

This package provides a complete Python implementation of CIRCE-BE with:

  • 3,400+ passing tests with focused coverage on core logic
  • 18+ SQL builders for all OMOP CDM domains:
    • Condition Occurrence/Era
    • Drug Exposure/Era
    • Procedure Occurrence
    • Measurement, Observation
    • Visit Occurrence/Detail
    • Device Exposure, Specimen
    • Death, Location Region
    • Observation Period, Payer Plan Period
    • And more...
  • Full cohort expression validation with comprehensive error checking
  • Markdown rendering for human-readable cohort descriptions
  • Complete CLI interface with 4 commands (validate, generate-sql, render-markdown, process)
  • Java interoperability: supports both camelCase and snake_case field names for compatibility with Java CIRCE-BE

⚠️ Java Fidelity Requirement

This project aims for 1:1 compatibility with Java CIRCE-BE.

  • Python classes are intended to replicate Java functionality exactly
  • Field names support both Java (camelCase) and Python (snake_case) formats
  • SQL generation is intended to produce results identical to the Java implementation
  • All changes are validated against the Java schema

See JAVA_CLASS_MAPPINGS.md for complete class mappings.
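To make the dual naming concrete, here is a minimal sketch of the kind of key normalization that lets one model accept both spellings. Cohort JSON exported from Atlas typically uses capitalized keys such as PrimaryCriteria and PriorDays, so the sketch handles PascalCase as well as camelCase. The helper names are hypothetical, and the package itself presumably relies on Pydantic field aliases rather than rewriting the JSON like this.

```python
import re
from typing import Any

# Insert "_" before a capitalized word: "primaryCriteria" -> "primary_Criteria".
_WORD_BOUNDARY = re.compile(r"(.)([A-Z][a-z]+)")
# Insert "_" between a lowercase letter/digit and a capital: "conceptID" -> "concept_ID".
_LOWER_UPPER = re.compile(r"([a-z0-9])([A-Z])")


def to_snake(name: str) -> str:
    """Convert camelCase, PascalCase, or UPPER_CASE keys to snake_case."""
    partial = _WORD_BOUNDARY.sub(r"\1_\2", name)
    return _LOWER_UPPER.sub(r"\1_\2", partial).lower()


def normalize_keys(obj: Any) -> Any:
    """Recursively rewrite dict keys in parsed cohort JSON to snake_case."""
    if isinstance(obj, dict):
        return {to_snake(k): normalize_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [normalize_keys(v) for v in obj]
    return obj
```

Keys that are already snake_case pass through unchanged, which is what allows a single model to read both Java-style and Python-style JSON.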

Package Structure

circe/
├── cohortdefinition/          # Core cohort definition classes
│   ├── builders/              # SQL query builders (18+ builders)
│   ├── printfriendly/         # Human-readable markdown output
│   └── negativecontrols/      # Negative control generation
├── vocabulary/                # Concept and concept set management
├── check/                     # Validation and checking framework
│   ├── checkers/              # 40+ specific checker implementations
│   ├── operations/            # Check operations
│   ├── utils/                 # Check utilities
│   └── warnings/              # Warning classes
├── helper/                    # Utility helper classes
├── api.py                     # High-level API functions
└── cli.py                     # Command-line interface

Features

✅ Implemented

  • Complete cohort definition data model with Pydantic validation
  • 18+ SQL builders covering all OMOP CDM domains
  • Comprehensive CLI interface (validate, generate-sql, render-markdown, process)
  • Java interoperability with camelCase/snake_case field support
  • Cohort expression validation with 40+ checker implementations
  • Markdown rendering for print-friendly descriptions
  • Full test suite (3,400+ tests)
  • Type hints throughout with py.typed marker
  • Concept set expression handling
  • Window criteria and correlated criteria support
  • Date adjustments and custom era strategies
  • Observation period and demographic criteria
  • Inclusion rules and censoring criteria
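Window criteria and custom era strategies both come down to date arithmetic. The sketch below is a deliberate simplification for intuition only: in_window and collapse_eras are hypothetical helpers, the real builders express this logic in SQL, and CIRCE's window and era options (for example, which dates a window is anchored to) have more settings than shown here.

```python
from datetime import date, timedelta
from typing import List, Tuple


def in_window(index: date, event: date, start_offset: int, end_offset: int) -> bool:
    """True if event falls within [index + start_offset, index + end_offset] days.

    Negative offsets mean "before the index date", so (-365, 0) is the year
    up to and including the index date.
    """
    lower = index + timedelta(days=start_offset)
    upper = index + timedelta(days=end_offset)
    return lower <= event <= upper


def collapse_eras(spans: List[Tuple[date, date]], gap_days: int) -> List[Tuple[date, date]]:
    """Merge (start, end) spans whose gap to the running era is <= gap_days."""
    eras: List[Tuple[date, date]] = []
    for start, end in sorted(spans):
        if eras and (start - eras[-1][1]).days <= gap_days:
            # Close enough to the current era: extend it (never shrink it).
            eras[-1] = (eras[-1][0], max(eras[-1][1], end))
        else:
            eras.append((start, end))
    return eras
```

With gap_days=3, exposures ending on January 5 and restarting on January 8 form one era; with gap_days=2 they remain separate.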

Command-Line Interface

CIRCE provides a command-line interface for validating cohort expressions, generating SQL from them, and rendering them as markdown.

Validate Command

Validate a cohort expression JSON file against the CIRCE standard:

circe validate cohort.json

Options:

  • --verbose, -v: Display all validation warnings including INFO level
  • --quiet, -q: Suppress non-error output

Exit codes:

  • 0: Valid (no errors or warnings)
  • 1: Invalid (errors found)
  • 2: Valid but has warnings
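Because "valid with warnings" has its own exit code, scripts and CI jobs can tell it apart from outright failure. A minimal sketch, assuming circe is installed and on PATH; interpret_exit_code and validate_file are hypothetical helpers, and the runner is injectable so the logic can be exercised without the CLI.

```python
import subprocess
from typing import Callable

# Exit codes documented for `circe validate`.
EXIT_STATUS = {0: "valid", 1: "invalid", 2: "valid-with-warnings"}


def interpret_exit_code(code: int) -> str:
    """Map a `circe validate` exit code to a label; unknown codes are flagged."""
    return EXIT_STATUS.get(code, f"unexpected exit code {code}")


def validate_file(path: str, run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> str:
    """Run `circe validate <path>` and return the interpreted status."""
    # No check=True, so a nonzero exit code is returned rather than raised.
    result = run(["circe", "validate", path], capture_output=True, text=True)
    return interpret_exit_code(result.returncode)
```

A CI job might fail only on "invalid" and merely report "valid-with-warnings".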

Generate SQL Command

Generate SQL from a cohort expression:

# Output to stdout
circe generate-sql cohort.json

# Output to file
circe generate-sql cohort.json --output cohort.sql

# With custom schema names
circe generate-sql cohort.json --cdm-schema my_cdm --vocab-schema my_vocab --cohort-id 123

Options:

  • --output, -o: Output SQL file path (default: stdout)
  • --sql-options: JSON file with BuildExpressionQueryOptions
  • --cdm-schema: CDM schema name (default: @cdm_database_schema)
  • --vocab-schema: Vocabulary schema name (default: @vocabulary_database_schema)
  • --cohort-id: Cohort ID for SQL generation
  • --validate: Validate before generating SQL (default: True)
  • --no-validate: Skip validation before generating SQL
  • --verbose, -v: Verbose output
  • --quiet, -q: Suppress non-error output
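When driving generate-sql from Python, building the argument list explicitly avoids shell-quoting problems with schema names. The sketch below uses only the flags documented above; generate_sql_args is a hypothetical helper, and any option left as None is omitted so the CLI defaults apply.

```python
from typing import List, Optional


def generate_sql_args(
    cohort_json: str,
    output: Optional[str] = None,
    cdm_schema: Optional[str] = None,
    vocab_schema: Optional[str] = None,
    cohort_id: Optional[int] = None,
    validate: bool = True,
) -> List[str]:
    """Build a `circe generate-sql` argument list from the documented flags.

    Omitted options fall back to the CLI defaults
    (stdout, @cdm_database_schema, @vocabulary_database_schema).
    """
    args = ["circe", "generate-sql", cohort_json]
    if output is not None:
        args += ["--output", output]
    if cdm_schema is not None:
        args += ["--cdm-schema", cdm_schema]
    if vocab_schema is not None:
        args += ["--vocab-schema", vocab_schema]
    if cohort_id is not None:
        args += ["--cohort-id", str(cohort_id)]
    if not validate:
        # Validation is on by default, so only the opt-out flag is ever needed.
        args.append("--no-validate")
    return args
```

To run it, pass the list to subprocess.run(..., check=True), which raises CalledProcessError if the command exits with a nonzero status.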

Render Markdown Command

Render a cohort expression to human-readable markdown:

# Output to stdout
circe render-markdown cohort.json

# Output to file
circe render-markdown cohort.json --output cohort.md

Options:

  • --output, -o: Output markdown file path (default: stdout)
  • --validate: Validate before rendering markdown (default: True)
  • --no-validate: Skip validation before rendering markdown
  • --verbose, -v: Verbose output
  • --quiet, -q: Suppress non-error output

Process Command

Process a cohort expression with multiple operations:

# Validate, generate SQL, and render markdown
circe process cohort.json --validate --sql --markdown

# Generate SQL with custom output file
circe process cohort.json --sql output.sql

# Generate SQL and markdown with default file names
circe process cohort.json --sql --markdown

Options:

  • --validate: Validate the cohort expression
  • --sql [FILE]: Generate SQL (optionally specify output file, default: input file with .sql extension)
  • --markdown [FILE]: Render markdown (optionally specify output file, default: input file with .md extension)
  • --sql-options: JSON file with BuildExpressionQueryOptions
  • --cdm-schema: CDM schema name (default: @cdm_database_schema)
  • --vocab-schema: Vocabulary schema name (default: @vocabulary_database_schema)
  • --cohort-id: Cohort ID for SQL generation
  • --verbose, -v: Verbose output
  • --quiet, -q: Suppress non-error output
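A bare --sql or --markdown writes next to the input file with the extension swapped. This pathlib sketch predicts where those outputs will land. default_output is a hypothetical helper, and it assumes a plain suffix swap, which the option descriptions imply but do not spell out. With several dots, as in cohort.v2.json, with_suffix replaces only the last suffix.

```python
from pathlib import Path
from typing import Optional


def default_output(input_json: str, suffix: str, explicit: Optional[str] = None) -> Path:
    """Return the explicit output path if given, else the input path with `suffix`."""
    if explicit:
        return Path(explicit)
    return Path(input_json).with_suffix(suffix)
```

For example, default_output("my_cohort.json", ".sql") gives my_cohort.sql, matching the second CLI example below.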

CLI Examples

# Validate a cohort expression
circe validate my_cohort.json

# Generate SQL with custom schema
circe generate-sql my_cohort.json --output my_cohort.sql \
    --cdm-schema my_cdm_schema \
    --vocab-schema my_vocab_schema \
    --cohort-id 1

# Generate SQL and markdown in one command
circe process my_cohort.json --sql --markdown

# Validate, generate SQL, and render markdown
circe process my_cohort.json --validate --sql my_cohort.sql --markdown my_cohort.md

Development

Setup Development Environment

# Clone the repository
git clone https://github.com/OHDSI/Circepy.git
cd Circepy

# Install with development dependencies
pip install -e ".[dev]"

# Verify installation
pytest --version
circe --help

Running Tests

pytest

All 3,400+ tests should pass.

Code Formatting

black circe/
isort circe/

Type Checking

mypy circe/

Compatibility Notes

This implementation is designed to be compatible with the Java version of OHDSI CIRCE-BE. The Python package:

  • Accepts JSON cohort definitions from OHDSI Atlas and other tools
  • Aims to generate SQL identical to the Java implementation
  • Supports all OMOP CDM v5.x versions
  • Maintains field name compatibility (camelCase and snake_case)

Troubleshooting

Import Errors

If you encounter import errors, ensure the package is properly installed:

pip install --upgrade ohdsi-circepy
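Import errors often come from installing into a different interpreter than the one running your code. The distribution name is ohdsi-circepy (as in the file names below), while the import name is circe. A quick way to check what the current interpreter actually sees is the standard library's importlib.metadata, available on Python 3.8+. installed_version is a hypothetical helper.

```python
from typing import Optional
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "ohdsi-circepy") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None
```

If this returns None, run python -m pip install --upgrade ohdsi-circepy using the same python executable so that pip targets that interpreter.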

SQL Generation Issues

  • Verify your cohort expression JSON is valid using circe validate
  • Check that all concept IDs reference valid OMOP concepts
  • Ensure schema names are correctly specified
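If --cdm-schema and --vocab-schema are omitted, the generated SQL keeps the @cdm_database_schema and @vocabulary_database_schema placeholders. These must be filled in, for example with OHDSI SqlRender, before the SQL will run. As a pre-flight check, you can list any @-parameters left in the output. unrendered_parameters is a hypothetical helper, and its simple regex may also flag @ characters that appear inside string literals or comments.

```python
import re
from typing import Set

# SqlRender-style parameters look like @name (letters, digits, underscores).
_PARAM = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)")


def unrendered_parameters(sql: str) -> Set[str]:
    """Return the @parameter names still present in a SQL string."""
    return set(_PARAM.findall(sql))
```

An empty set suggests that every schema placeholder has been substituted.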

Performance Considerations

For large cohort definitions with many criteria:

  • SQL generation typically completes in < 1 second
  • Validation runs in < 500ms for most cohorts
  • Memory usage scales with the number of criteria (typically < 100MB)
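These figures will vary with hardware and cohort complexity, so it is worth checking them on your own definitions. The following is a minimal timing sketch using the standard library. time_call is a hypothetical helper that reports the median over a few runs to smooth out noise.

```python
import statistics
import time
from typing import Any, Callable, List, Tuple


def time_call(fn: Callable[[], Any], repeats: int = 5) -> Tuple[Any, float]:
    """Call fn `repeats` times; return the last result and the median wall time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    durations: List[float] = []
    result: Any = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        durations.append(time.perf_counter() - start)
    return result, statistics.median(durations)
```

For example, time_call(lambda: build_cohort_query(cohort, options)) uses the cohort and options from the Python API example. The standard library's tracemalloc module can give a similar rough check of the memory figure.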

FAQ

Q: Is this compatible with OHDSI Atlas? A: Yes, this package can process cohort definition JSON files exported from Atlas.

Q: Can I use this with CDM v5.3? A: Yes, the package supports all OMOP CDM v5.x versions.

Q: How do I convert camelCase JSON to Python? A: The package automatically handles both camelCase and snake_case field names.

Q: Does this replace the Java CIRCE-BE? A: No, this is a complementary Python implementation that aims to produce the same SQL output as the Java library.

Contributing

Contributions are welcome! Please see our Contributing Guidelines for details.

Key areas for contribution:

  • Additional test coverage
  • Performance optimizations
  • Documentation improvements
  • Bug fixes and issue reports

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Acknowledgments

This project is based on the Java CIRCE-BE implementation by the OHDSI community. We thank all contributors to the original Java implementation.

Special thanks to:

  • The OHDSI community for their continued support
  • Contributors to the Java CIRCE-BE project
  • The Pydantic team for their excellent validation library



Download files


Source Distribution

ohdsi_circepy-0.1.0.tar.gz (102.6 kB)

Built Distribution


ohdsi_circepy-0.1.0-py3-none-any.whl (168.4 kB)

File details

Details for the file ohdsi_circepy-0.1.0.tar.gz.

File metadata

  • Download URL: ohdsi_circepy-0.1.0.tar.gz
  • Size: 102.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for ohdsi_circepy-0.1.0.tar.gz
Algorithm Hash digest
SHA256 12da3f2d0c057df001d298c291bc8338c6901914734a77df1cf1bd70ec8d7f2f
MD5 a97df8b27035efe96cd075d17a8d7866
BLAKE2b-256 2027aa777ce6ffcd6c88e7bb5a44c375d5830f20266f7ca354068e30a9c14d31


File details

Details for the file ohdsi_circepy-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: ohdsi_circepy-0.1.0-py3-none-any.whl
  • Size: 168.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for ohdsi_circepy-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8a8a25f8db368132be5891d3dbd13058c47501bbee52f4237f6bbfa2f276d89c
MD5 33120e5304f88f0596664483c764499a
BLAKE2b-256 3d3ad7f72d23bc45958d94d16208c092c72bd6208bc344c4f649d2f24f1b8968

