Streamlined Data Transformation to OMOP
Carrot Transform automates data transformation processes and facilitates the standardisation of datasets to the OMOP vocabulary, simplifying the integration of diverse data sources.
Carrot Mapper is a web app that lets the user take the metadata of a dataset (as output by WhiteRabbit) and produce rules, in JSON format, for mapping it to the OMOP standard. Carrot Transform can ingest these rules to map the contents of the dataset to OMOP.
Carrot Transform transforms input data into tab-separated values (TSV) files of standard OMOP tables, with concepts mapped according to the provided rules (generated by Carrot Mapper).
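To make the idea concrete, here is a minimal sketch of rule-driven mapping from a CSV source to a tab-separated table. It is an illustration only: the RULES dictionary, the column names id and sex, and the transform function are hypothetical and do not reflect Carrot Mapper's actual JSON schema or Carrot Transform's internals. The concept IDs 8507 and 8532 are the standard OMOP gender concepts for male and female.

```python
import csv
import io

# Hypothetical, simplified rules: NOT Carrot Mapper's real JSON schema.
# Each rule maps one (source column, source value) pair to an OMOP concept ID.
RULES = {
    ("sex", "M"): 8507,  # OMOP standard concept: MALE
    ("sex", "F"): 8532,  # OMOP standard concept: FEMALE
}


def transform(source_csv: str) -> str:
    """Map rows of a source CSV to a tab-separated, person-like table."""
    reader = csv.DictReader(io.StringIO(source_csv))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["person_id", "gender_concept_id"])
    for row in reader:
        concept_id = RULES.get(("sex", row["sex"]))
        if concept_id is None:
            continue  # no rule covers this value, so the row is skipped
        writer.writerow([row["id"], concept_id])
    return out.getvalue()


source = "id,sex\n1,M\n2,F\n3,U\n"
result = transform(source)
print(result)
```

The row with the unmapped value U produces no output, which mirrors the general idea that only values covered by a rule end up in the OMOP tables.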
Quick Start
To have the project up and running, please follow the Quick Start Guide.
If you need to perform development, there's a brief guide here to get the tool up and running.
Formatting and Linting
This project is using ruff to check formatting and linting.
The only dependency is the uv command line tool.
The .vscode/tasks.json file contains a task to run this tool for the currently open file.
The commands can also be run on their own (in the root folder) like this:
# reformat all the files in `./`
λ uv run ruff format .
# run linting checks on all the files in `./`
λ uv run ruff check .
# check and fix all the files in `./`
λ uv run ruff check --fix .
# check and fix all the files in `./`, but do so more aggressively
λ uv run ruff check --fix --unsafe-fixes .
SQLAlchemy Workflow
Carrot-Transform can read input tables through SQLAlchemy.
This is experimental, and requires specifying a connection string as --input-db-url instead of an input directory.
The person-file parameter and Carrot Mapper workflow should still be used as if working with .csv files; only the source of the input tables changes to an SQLAlchemy database.
- Extract/export some rows from the various tables
  - something like SELECT column_name(s) FROM patients LIMIT 1000; is written to patients.csv
- the usual scan reports are performed on these subsets
- when carrot-transform is invoked, instead of --input-dir one specifies --input-db-url with a database connection string
  - the --person-file parameter should still point to the equivalent of person_tablename.csv
  - the --rules-file parameter needs to refer to a file on the disk as usual
- carrot-transform will still write data to --output-dir and otherwise operate as normal
  - The following parameters have undefined behaviour with this functionality
    - --write-mode
    - --saved-person-id-file
    - --use-input-person-ids
    - --last-used-ids-file
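The first step of this workflow (exporting a limited sample of each table to a .csv file for scanning) could be sketched as follows. This is a minimal illustration that uses Python's standard-library sqlite3 module with an in-memory database as a stand-in for the real database; the export_sample helper, the patients table, and its columns are hypothetical and not part of carrot-transform.

```python
import csv
import io
import sqlite3


def export_sample(conn, table, limit, out):
    """Write up to `limit` rows of `table`, preceded by a header row, as CSV."""
    # Table names cannot be bound as query parameters, so only plain
    # identifiers are accepted before interpolating the name into the SQL.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,))
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


# Stand-in database with a hypothetical patients table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER, sex TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [(1, "M"), (2, "F"), (3, "M")],
)

buffer = io.StringIO()
export_sample(conn, "patients", 2, buffer)
sample_csv = buffer.getvalue()
print(sample_csv)
```

In practice the output would be an open file such as patients.csv and the limit something like 1000, matching the example query in the list above.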
Release Procedure
To release a new version of carrot-transform follow these steps:
1. Prepare the repository
- First ensure that the repository is clean and all required changes have been merged.
- Pull the latest changes from main with git pull origin main.
2. Create a release branch
- Now create a new feature branch named release/v<NEW-VERSION> (e.g. release/v0.2.0).
3. Update the version number
- Use poetry to bump the version. For example, for a minor version update invoke:
poetry version minor
- Commit and push the changes (to the release feature branch):
NEW_VERSION=$(poetry version -s)
git add pyproject.toml
git commit -m "Bump version to $NEW_VERSION"
git push --set-upstream origin release/v$NEW_VERSION
4. Create pull request
- Open a pull request from release/v$NEW_VERSION to main and await approval.
5. Merge and tag
- After approval, merge the feature branch into main.
- Check out main, pull updates, and create a tag corresponding to the new version number.
git checkout main
git pull origin main
git tag -a "$NEW_VERSION" -m "Release $NEW_VERSION"
git push origin "$NEW_VERSION"
6. Create a release
- We must now link the tag to a release in the GitHub repository. To do this from the command line, first install the GitHub command line tool gh and then invoke:
gh release create "$NEW_VERSION" --title "$NEW_VERSION" --notes "Release for $NEW_VERSION"
- Alternatively, follow the instructions in the GitHub documentation to manually create a release.
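As a rough illustration of what step 3 produces, the sketch below shows how a version number changes under a major, minor, or patch bump and how the release branch name follows from it. It implements plain MAJOR.MINOR.PATCH semantic versioning only and is not Poetry's implementation (Poetry also handles pre-release rules); the bump function and the starting version 0.1.3 are hypothetical.

```python
def bump(version: str, part: str) -> str:
    """Return `version` (MAJOR.MINOR.PATCH) with the given part incremented."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


# Hypothetical starting version; a minor bump resets the patch number.
new_version = bump("0.1.3", "minor")
release_branch = f"release/v{new_version}"
print(new_version, release_branch)
```

Note that the branch name carries a v prefix, while the tag created in step 5 is the bare version number.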
License
This repository's source code is available under the MIT license.