A package to read in and convert OMOP data to the right schema.
OMOP Schema
omop_schema is a Python package designed to read, manage, and convert OMOP (Observational Medical Outcomes Partnership) data into the correct schema. It provides tools to handle OMOP CDM (Common Data Model) tables, convert schemas between different formats (e.g., PyArrow, Polars, Pandas), and load datasets efficiently.
Features
- Schema Management: Define and manage schemas for OMOP CDM tables across different versions (e.g., v5.3, v5.4).
- Schema Conversion: Convert PyArrow schemas to Polars or Pandas-compatible schemas.
- Dataset Loading: Load datasets from CSV files into PyArrow tables, ensuring they match the defined schema.
- Optional Dependencies: Support for Polars and Pandas as optional dependencies for schema conversion.
Installation
Install the package using pip:
pip install omop_schema
To include optional dependencies:
pip install "omop_schema[polars]"
pip install "omop_schema[pandas]"
pip install "omop_schema[polars,pandas]"
Usage
1. Define and Retrieve OMOP Schemas
The package provides predefined schemas for OMOP CDM tables. You can retrieve the schema for a specific table:
from omop_schema.schema.v54 import OMOPSchemaV54
schema_v54 = OMOPSchemaV54()
concept_schema = schema_v54.get_pyarrow_schema("concept")
print(concept_schema)
2. Load Datasets
You can load datasets from a folder containing CSV files. The files are matched to the predefined schemas:
from omop_schema.schema.v54 import OMOPSchemaV54
schema_v54 = OMOPSchemaV54()
datasets = schema_v54.load_csv_dataset("path/to/csv/folder")
# Access a specific table
concept_table = datasets["concept"]
print(concept_table)
3. Convert PyArrow Schema to Polars Schema
If Polars is installed, you can convert a PyArrow schema to a Polars-compatible schema:
from omop_schema.utils import pyarrow_to_polars_schema
import pyarrow as pa
arrow_schema = pa.schema(
    [
        pa.field("column1", pa.int64()),
        pa.field("column2", pa.string()),
    ]
)
polars_schema = pyarrow_to_polars_schema(arrow_schema)
print(polars_schema)
4. Convert PyArrow Schema to Pandas Schema
If Pandas is installed, you can convert a PyArrow schema to a Pandas-compatible schema:
from omop_schema.utils import pyarrow_to_pandas_schema
import pyarrow as pa
arrow_schema = pa.schema(
    [
        pa.field("column1", pa.int64()),
        pa.field("column2", pa.string()),
    ]
)
pandas_schema = pyarrow_to_pandas_schema(arrow_schema)
print(pandas_schema)
Optional Dependencies
- Polars: For converting PyArrow schemas to Polars schemas.
- Pandas: For converting PyArrow schemas to Pandas schemas.
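Because these backends are optional, it can be useful to check which ones are present before calling a conversion function. The snippet below is a small sketch using only the standard library; `has_optional_dependency` is a hypothetical helper written for this example and is not part of omop_schema.

```python
from importlib.util import find_spec


def has_optional_dependency(module_name: str) -> bool:
    """Return True if the named top-level module can be found for import."""
    return find_spec(module_name) is not None


# Report which optional backends are available in the current environment.
for backend in ("polars", "pandas"):
    status = "available" if has_optional_dependency(backend) else "not installed"
    print(f"{backend}: {status}")
```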
Contributing
Contributions are welcome! Please open an issue or submit a pull request on the GitHub repository.
License
This project is licensed under the MIT License. See the LICENSE file for details.