Python implementation of OHDSI CIRCE-BE for cohort definition and SQL generation
CIRCE Python Implementation
[!CAUTION] This project is currently under active testing and development. It is a Python implementation of the OHDSI CIRCE-BE Java library. While we aim for 1:1 parity, this version is an Alpha release and should be used with caution in production environments.
A Python implementation of the OHDSI CIRCE-BE (Cohort Inclusion and Restriction Criteria Engine) for generating SQL queries from cohort definitions in the OMOP Common Data Model.
Overview
CIRCE Python provides a comprehensive toolkit for working with OMOP CDM cohort definitions:
- Cohort Definition Modeling: Create and validate cohort expressions using Pydantic models
- SQL Generation: Generate SQL queries from cohort definitions for OMOP CDM v5.x
- Concept Set Management: Handle concepts and concept sets from OMOP vocabularies
- Validation & Checking: Comprehensive validation with 40+ checker implementations
- Print-Friendly Output: Generate human-readable markdown descriptions of cohort definitions
- CLI Interface: Command-line tools for validation, SQL generation, and markdown rendering
Package Status
[!IMPORTANT] This package is currently in Alpha status and undergoing rigorous parity testing against the Java implementation.
- Version: 0.1.0 (Alpha)
- Tests: 3,400+ passing
- Coverage: 34% (Core logic focus)
- Python: 3.8+
- License: Apache 2.0
Installation
[!NOTE] This package is currently in private development. Install from source using Git.
From Source (Current Method)
# Clone the repository
git clone https://github.com/OHDSI/ohdsi-circepy.git
cd ohdsi-circepy
# Install in development mode with all dependencies
pip install -e ".[dev]"
# Verify installation
circe --help
See INSTALLATION.md for detailed installation instructions, troubleshooting, and setup options.
From PyPI (Coming Soon)
# Coming in a future release
# pip install ohdsi-circepy
Quick Start
Command-Line Interface
The easiest way to use CIRCE is through the command-line interface:
# Validate a cohort expression JSON file
circe validate cohort.json
# Generate SQL from a cohort expression
circe generate-sql cohort.json --output cohort.sql
# Render a cohort expression to markdown
circe render-markdown cohort.json --output cohort.md
# Process a cohort expression (validate, generate SQL, and render markdown)
circe process cohort.json --validate --sql --markdown
See the CLI Documentation section below for more details.
Python API
from circe import CohortExpression
from circe.cohortdefinition import PrimaryCriteria, ConditionOccurrence
from circe.cohortdefinition.core import ObservationFilter, ResultLimit
from circe.vocabulary import ConceptSet, ConceptSetExpression, ConceptSetItem, Concept
# Create a cohort expression
cohort = CohortExpression(
    title="Type 2 Diabetes Cohort",
    primary_criteria=PrimaryCriteria(
        criteria_list=[
            ConditionOccurrence(
                codeset_id=1,
                first=True
            )
        ],
        observation_window=ObservationFilter(prior_days=0, post_days=0),
        primary_limit=ResultLimit(type="All")
    ),
    concept_sets=[
        ConceptSet(
            id=1,
            name="Type 2 Diabetes",
            expression=ConceptSetExpression(
                items=[
                    ConceptSetItem(
                        concept=Concept(
                            concept_id=201826,
                            concept_name="Type 2 diabetes mellitus"
                        ),
                        include_descendants=True
                    )
                ]
            )
        )
    ]
)
# Generate SQL using the API
from circe.api import build_cohort_query
from circe.cohortdefinition import BuildExpressionQueryOptions
options = BuildExpressionQueryOptions()
options.cdm_schema = 'cdm'
options.vocabulary_schema = 'cdm'
options.cohort_id = 1
options.target_table = 'scratch.cohort'
sql = build_cohort_query(cohort, options)
print(sql)
What's Included
This package provides a complete Python implementation of CIRCE-BE with:
- 3,400+ passing tests with focused coverage on core logic
- 18+ SQL builders for all OMOP CDM domains:
- Condition Occurrence/Era
- Drug Exposure/Era
- Procedure Occurrence
- Measurement, Observation
- Visit Occurrence/Detail
- Device Exposure, Specimen
- Death, Location Region
- Observation Period, Payer Plan Period
- And more...
- Full cohort expression validation with comprehensive error checking
- Markdown rendering for human-readable cohort descriptions
- Complete CLI interface with 4 commands (validate, generate-sql, render-markdown, process)
- Java interoperability - supports both camelCase and snake_case field names for seamless Java CIRCE-BE compatibility
⚠️ Java Fidelity Requirement
This project maintains 1:1 compatibility with Java CIRCE-BE.
- All Python classes replicate Java functionality exactly
- Field names support both Java (camelCase) and Python (snake_case) formats
- SQL generation produces identical results to Java implementation
- All changes are validated against Java schema
See JAVA_CLASS_MAPPINGS.md for complete class mappings.
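To illustrate what dual field-name support means in practice, here is a minimal sketch that normalizes camelCase keys to snake_case in a nested dictionary. It is not the package's own mechanism, which this README does not describe; the names `camel_to_snake` and `normalize_keys` are hypothetical, and the sample keys are camelCase spellings of the fields used in the Python API example.

```python
import re
from typing import Any

# Matches the empty position before every capital letter that is not at the start.
_BOUNDARY = re.compile(r"(?<!^)(?=[A-Z])")


def camel_to_snake(name: str) -> str:
    """Convert a camelCase name such as 'codesetId' to 'codeset_id'."""
    return _BOUNDARY.sub("_", name).lower()


def normalize_keys(value: Any) -> Any:
    """Recursively rewrite dictionary keys to snake_case; values are left untouched."""
    if isinstance(value, dict):
        return {camel_to_snake(k): normalize_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize_keys(item) for item in value]
    return value


# Hypothetical Java-style input (illustrative only).
java_style = {"primaryCriteria": {"criteriaList": [{"codesetId": 1, "first": True}]}}
python_style = normalize_keys(java_style)
print(python_style)
```

Keys that are already snake_case pass through unchanged. Names containing runs of capitals (acronyms) would be split at every capital, so a real implementation would need explicit handling for those.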
Package Structure
circe/
├── cohortdefinition/       # Core cohort definition classes
│   ├── builders/           # SQL query builders (18+ builders)
│   ├── printfriendly/      # Human-readable markdown output
│   └── negativecontrols/   # Negative control generation
├── vocabulary/             # Concept and concept set management
├── check/                  # Validation and checking framework
│   ├── checkers/           # 40+ specific checker implementations
│   ├── operations/         # Check operations
│   ├── utils/              # Check utilities
│   └── warnings/           # Warning classes
├── helper/                 # Utility helper classes
├── api.py                  # High-level API functions
└── cli.py                  # Command-line interface
Features
✅ Implemented
- Complete cohort definition data model with Pydantic validation
- 18+ SQL builders covering all OMOP CDM domains
- Comprehensive CLI interface (validate, generate-sql, render-markdown, process)
- Java interoperability with camelCase/snake_case field support
- Cohort expression validation with 40+ checker implementations
- Markdown rendering for print-friendly descriptions
- Full test suite (3,400+ tests)
- Type hints throughout with py.typed marker
- Concept set expression handling
- Window criteria and correlated criteria support
- Date adjustments and custom era strategies
- Observation period and demographic criteria
- Inclusion rules and censoring criteria
Command-Line Interface
CIRCE provides a comprehensive command-line interface for validating, generating SQL, and rendering cohort expressions.
Validate Command
Validate a cohort expression JSON file against the CIRCE standard:
circe validate cohort.json
Options:
- --verbose, -v: Display all validation warnings, including INFO level
- --quiet, -q: Suppress non-error output
Exit codes:
- 0: Valid (no errors or warnings)
- 1: Invalid (errors found)
- 2: Valid but has warnings
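When calling the validator from a script or CI job, the three exit codes can be mapped to status labels. The following is a minimal sketch, assuming `circe` is on the PATH; the helper names and the label strings are hypothetical, and only the exit codes and the `--verbose` flag come from this README.

```python
import subprocess
from typing import List

# Exit codes as documented for `circe validate`.
EXIT_STATUS = {0: "valid", 1: "invalid", 2: "valid-with-warnings"}


def describe_exit_code(code: int) -> str:
    """Map a `circe validate` exit code to a short status label."""
    return EXIT_STATUS.get(code, f"unexpected exit code {code}")


def build_validate_command(cohort_path: str, verbose: bool = False) -> List[str]:
    """Assemble the argument list for `circe validate`."""
    cmd = ["circe", "validate", cohort_path]
    if verbose:
        cmd.append("--verbose")
    return cmd


def validate_cohort(cohort_path: str) -> str:
    """Run the validator and return its status label (requires circe to be installed)."""
    # check=False: a non-zero exit code is an expected outcome here, not an exception.
    completed = subprocess.run(build_validate_command(cohort_path), check=False)
    return describe_exit_code(completed.returncode)
```

A caller would use `validate_cohort("cohort.json")` and decide for itself whether "valid-with-warnings" should fail the build.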
Generate SQL Command
Generate SQL from a cohort expression:
# Output to stdout
circe generate-sql cohort.json
# Output to file
circe generate-sql cohort.json --output cohort.sql
# With custom schema names
circe generate-sql cohort.json --cdm-schema my_cdm --vocab-schema my_vocab --cohort-id 123
Options:
- --output, -o: Output SQL file path (default: stdout)
- --sql-options: JSON file with BuildExpressionQueryOptions
- --cdm-schema: CDM schema name (default: @cdm_database_schema)
- --vocab-schema: Vocabulary schema name (default: @vocabulary_database_schema)
- --cohort-id: Cohort ID for SQL generation
- --validate: Validate before generating SQL (default: True)
- --no-validate: Skip validation before generating SQL
- --verbose, -v: Verbose output
- --quiet, -q: Suppress non-error output
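The default schema values, `@cdm_database_schema` and `@vocabulary_database_schema`, are `@`-prefixed parameter placeholders in the style used by OHDSI's SqlRender, so SQL generated with the defaults still needs real schema names filled in before it can run. In OHDSI workflows that step normally belongs to SqlRender; the sketch below is only a simplified stand-in to show the idea. The function `fill_placeholders` and the sample query are hypothetical.

```python
import re
from typing import Dict

# An @ followed by an identifier, e.g. @cdm_database_schema.
_PLACEHOLDER = re.compile(r"@(\w+)")


def fill_placeholders(sql: str, params: Dict[str, str]) -> str:
    """Replace each @name token with params[name]; unknown tokens are left as-is."""
    return _PLACEHOLDER.sub(lambda m: params.get(m.group(1), m.group(0)), sql)


# Illustrative query, not actual CIRCE output.
template = (
    "SELECT COUNT(*) FROM @cdm_database_schema.person p "
    "JOIN @vocabulary_database_schema.concept c "
    "ON p.gender_concept_id = c.concept_id"
)
rendered = fill_placeholders(
    template,
    {"cdm_database_schema": "my_cdm", "vocabulary_database_schema": "my_vocab"},
)
print(rendered)
```

Passing `--cdm-schema` and `--vocab-schema` on the command line avoids this step for the two schema names.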
Render Markdown Command
Render a cohort expression to human-readable markdown:
# Output to stdout
circe render-markdown cohort.json
# Output to file
circe render-markdown cohort.json --output cohort.md
Options:
- --output, -o: Output markdown file path (default: stdout)
- --validate: Validate before rendering markdown (default: True)
- --no-validate: Skip validation before rendering markdown
- --verbose, -v: Verbose output
- --quiet, -q: Suppress non-error output
Process Command
Process a cohort expression with multiple operations:
# Validate, generate SQL, and render markdown
circe process cohort.json --validate --sql --markdown
# Generate SQL with custom output file
circe process cohort.json --sql output.sql
# Generate SQL and markdown with default file names
circe process cohort.json --sql --markdown
Options:
- --validate: Validate the cohort expression
- --sql [FILE]: Generate SQL (optionally specify output file, default: input file with .sql extension)
- --markdown [FILE]: Render markdown (optionally specify output file, default: input file with .md extension)
- --sql-options: JSON file with BuildExpressionQueryOptions
- --cdm-schema: CDM schema name (default: @cdm_database_schema)
- --vocab-schema: Vocabulary schema name (default: @vocabulary_database_schema)
- --cohort-id: Cohort ID for SQL generation
- --verbose, -v: Verbose output
- --quiet, -q: Suppress non-error output
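The default output names for `--sql` and `--markdown` follow a simple rule: keep the input file name and swap the extension. A minimal sketch of that rule, assuming the output is placed next to the input file (the README does not say which directory is used); `default_output` is a hypothetical helper, not part of the package.

```python
from pathlib import Path
from typing import Optional


def default_output(input_path: str, suffix: str, explicit: Optional[str] = None) -> Path:
    """Return the explicit output path if given, else the input path with a new suffix."""
    if explicit:
        return Path(explicit)
    return Path(input_path).with_suffix(suffix)


print(default_output("cohort.json", ".sql"))                # default SQL name
print(default_output("cohort.json", ".md"))                 # default markdown name
print(default_output("cohort.json", ".sql", "output.sql"))  # explicit file wins
```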
CLI Examples
# Validate a cohort expression
circe validate my_cohort.json
# Generate SQL with custom schema
circe generate-sql my_cohort.json --output my_cohort.sql \
--cdm-schema my_cdm_schema \
--vocab-schema my_vocab_schema \
--cohort-id 1
# Generate SQL and markdown in one command
circe process my_cohort.json --sql --markdown
# Validate, generate SQL, and render markdown
circe process my_cohort.json --validate --sql my_cohort.sql --markdown my_cohort.md
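For a directory of cohort definitions, the same `generate-sql` invocation can be assembled programmatically. The sketch below only builds the argument lists, using the flags documented above; the function name and the sequential cohort ID scheme are assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterable, List


def build_generate_sql_commands(
    cohort_files: Iterable[str],
    cdm_schema: str,
    vocab_schema: str,
    first_cohort_id: int = 1,
) -> List[List[str]]:
    """Build one `circe generate-sql` command per cohort file, in sorted order."""
    commands = []
    for offset, path in enumerate(sorted(Path(p) for p in cohort_files)):
        commands.append([
            "circe", "generate-sql", str(path),
            "--output", str(path.with_suffix(".sql")),
            "--cdm-schema", cdm_schema,
            "--vocab-schema", vocab_schema,
            # Assumption: cohort IDs are assigned sequentially by sorted file name.
            "--cohort-id", str(first_cohort_id + offset),
        ])
    return commands


for cmd in build_generate_sql_commands(["b.json", "a.json"], "my_cdm", "my_vocab"):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=False)`, inspecting `returncode` afterwards.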
Development
Setup Development Environment
# Clone the repository
git clone https://github.com/OHDSI/Circepy.git
cd Circepy
# Install with development dependencies
pip install -e ".[dev]"
# Verify installation
pytest --version
circe --help
Running Tests
pytest
All 3,400+ tests should pass.
Code Formatting
black circe/
isort circe/
Type Checking
mypy circe/
Compatibility Notes
This implementation is designed to be compatible with the Java version of OHDSI CIRCE-BE. The Python package:
- Accepts JSON cohort definitions from OHDSI Atlas and other tools
- Generates SQL identical to the Java implementation
- Supports all OMOP CDM v5.x versions
- Maintains field name compatibility (camelCase and snake_case)
Troubleshooting
Import Errors
If you encounter import errors, ensure the package is properly installed:
pip install --upgrade ohdsi-circepy
SQL Generation Issues
- Verify your cohort expression JSON is valid using circe validate
- Check that all concept IDs reference valid OMOP concepts
- Ensure schema names are correctly specified
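Checking concept IDs starts with knowing which IDs a definition uses. The following sketch walks a parsed cohort JSON document and collects every integer stored under a concept ID key, so the result can be looked up in the vocabulary schema's `concept` table. The accepted key spellings and the sample document are assumptions; adjust them to match your JSON.

```python
import json
from typing import Any, Set

# Key spellings are assumptions, not taken from the package.
CONCEPT_ID_KEYS = {"concept_id", "conceptId", "CONCEPT_ID"}


def collect_concept_ids(node: Any) -> Set[int]:
    """Recursively gather integer values stored under any concept ID key."""
    found: Set[int] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key in CONCEPT_ID_KEYS and isinstance(value, int):
                found.add(value)
            else:
                found |= collect_concept_ids(value)
    elif isinstance(node, list):
        for item in node:
            found |= collect_concept_ids(item)
    return found


# Hypothetical document mirroring the concept set from the Python API example.
sample = json.loads(
    '{"concept_sets": [{"id": 1, "expression": {"items": '
    '[{"concept": {"concept_id": 201826}, "include_descendants": true}]}}]}'
)
ids = collect_concept_ids(sample)
print(sorted(ids))
```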
Performance Considerations
For large cohort definitions with many criteria:
- SQL generation typically completes in < 1 second
- Validation runs in < 500ms for most cohorts
- Memory usage scales with the number of criteria (typically < 100MB)
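These are typical figures; to check them against your own cohort definitions, a small timing helper is enough. The sketch below uses `time.perf_counter` from the standard library; `timed` is a hypothetical helper, and `sum` stands in for whatever call you want to measure.

```python
import time
from typing import Any, Callable, Tuple


def timed(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload; replace with the call you want to measure.
result, seconds = timed(sum, [1, 2, 3])
print(f"result={result}, elapsed={seconds:.6f}s")
```

With the Python API example in this README, the call would look like `sql, seconds = timed(build_cohort_query, cohort, options)`.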
FAQ
Q: Is this compatible with OHDSI Atlas?
A: Yes, this package can process cohort definition JSON files exported from Atlas.
Q: Can I use this with CDM v5.3?
A: Yes, the package supports all OMOP CDM v5.x versions.
Q: How do I convert camelCase JSON to Python?
A: The package automatically handles both camelCase and snake_case field names.
Q: Does this replace the Java CIRCE-BE?
A: No, this is a complementary Python implementation. Both produce identical SQL output.
Contributing
Contributions are welcome! Please see our Contributing Guidelines for details.
Key areas for contribution:
- Additional test coverage
- Performance optimizations
- Documentation improvements
- Bug fixes and issue reports
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Acknowledgments
This project is based on the Java CIRCE-BE implementation by the OHDSI community. We thank all contributors to the original Java implementation.
Special thanks to:
- The OHDSI community for their continued support
- Contributors to the Java CIRCE-BE project
- The Pydantic team for their excellent validation library
Support
- Repository: https://github.com/OHDSI/ohdsi-circepy
- Issues: https://github.com/OHDSI/ohdsi-circepy/issues
- Installation Guide: INSTALLATION.md
- PyPI: https://pypi.org/project/ohdsi-circepy/ (coming soon)
- Documentation: https://ohdsi-circepy.readthedocs.io/ (coming soon)
Related Projects
- OHDSI CIRCE-BE (Java) - Original Java implementation
- OHDSI Common Data Model - OMOP CDM specification
- OHDSI Atlas - Web-based cohort definition tool
- OHDSI WebAPI - RESTful API for OHDSI tools
Download files
File details
Details for the file ohdsi_circepy-0.1.0.tar.gz.
File metadata
- Download URL: ohdsi_circepy-0.1.0.tar.gz
- Upload date:
- Size: 102.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 12da3f2d0c057df001d298c291bc8338c6901914734a77df1cf1bd70ec8d7f2f |
| MD5 | a97df8b27035efe96cd075d17a8d7866 |
| BLAKE2b-256 | 2027aa777ce6ffcd6c88e7bb5a44c375d5830f20266f7ca354068e30a9c14d31 |
File details
Details for the file ohdsi_circepy-0.1.0-py3-none-any.whl.
File metadata
- Download URL: ohdsi_circepy-0.1.0-py3-none-any.whl
- Upload date:
- Size: 168.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8a8a25f8db368132be5891d3dbd13058c47501bbee52f4237f6bbfa2f276d89c |
| MD5 | 33120e5304f88f0596664483c764499a |
| BLAKE2b-256 | 3d3ad7f72d23bc45958d94d16208c092c72bd6208bc344c4f649d2f24f1b8968 |