Streamlined Data Transformation to OMOP


Carrot Transform automates data transformation processes and facilitates the standardisation of datasets to the OMOP vocabulary, simplifying the integration of diverse data sources.


Carrot Mapper is a web application that uses a dataset's metadata (as output by WhiteRabbit) to produce mapping rules to the OMOP standard in JSON format. Carrot Transform ingests these rules to map the contents of the dataset to OMOP.

Carrot Transform transforms input data into tab-separated values (TSV) files of standard OMOP tables, with concepts mapped according to the provided rules (generated by Carrot Mapper).
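
Because the output is plain tab-separated text, it can be inspected with standard tooling. The following is a minimal sketch and not part of Carrot Transform: the helper name `read_omop_tsv` is hypothetical, and it assumes that the first line of each output file is a header row.

```python
import csv
from pathlib import Path


def read_omop_tsv(path):
    """Return the rows of a tab-separated file as a list of dicts.

    Assumption: the first line of the file is a header row naming the columns.
    All values are returned as strings, exactly as they appear in the file.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

For example, `read_omop_tsv("output/person.tsv")` would return one dictionary per row, where `output/person.tsv` is a placeholder for a file in your own output directory.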

Quick Start

To have the project up and running, please follow the Quick Start Guide.

If you need to perform development, there's a brief guide here to get the tool up and running.

Formatting and Linting

This project uses ruff to check formatting and linting. The only dependency is the uv command line tool. The .vscode/tasks.json file contains a task to run this tool on the currently open file. The commands can also be run on their own (in the root folder) like this:

# reformat all the files in `./`
uv run ruff format .

# run linting checks on all the files in `./`
uv run ruff check .

# check and fix all the files in `./`
uv run ruff check --fix .

# check and fix all the files in `./`, but more aggressively
uv run ruff check --fix --unsafe-fixes .

SQLAlchemy Workflow

Carrot-Transform can read input tables from a database through SQLAlchemy. This is experimental and requires passing a connection string as --input-db-url instead of an input directory. The --person-file parameter and the carrot-mapper workflow should still be used as if working with .csv files, but carrot-transform reads the data from the database.

  1. Extract or export a sample of rows from each of the source tables.
    • For example, the result of SELECT column_name(s) FROM patients LIMIT 1000; is written to patients.csv.
  2. Run the usual scan reports on these subsets.
  3. When invoking carrot-transform, specify --input-db-url with a database connection string instead of --input-dir.
    • The --person-file parameter should still point to the equivalent of person_tablename.csv.
    • The --rules-file parameter needs to refer to a file on disk as usual.
  4. carrot-transform still writes data to --output-dir and otherwise operates as normal.
    • The following parameters have undefined behaviour with this functionality:
      • --write-mode
      • --saved-person-id-file
      • --use-input-person-ids
      • --last-used-ids-file
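
Step 1 of the list above can be scripted. The sketch below is an illustration under stated assumptions, not part of carrot-transform: it uses Python's built-in sqlite3 module as a stand-in for whatever database you actually connect to, and the helper name `export_sample` is hypothetical.

```python
import csv
import sqlite3


def export_sample(db_path, table, csv_path, limit=1000):
    """Write up to `limit` rows of `table` to `csv_path`, preceded by a header row.

    Assumption: the source is a SQLite file; other databases need their own driver.
    """
    # Table names cannot be bound as SQL parameters, so accept only plain identifiers.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    connection = sqlite3.connect(db_path)
    try:
        cursor = connection.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,))
        # cursor.description holds one tuple per column; the first item is the column name.
        header = [column[0] for column in cursor.description]
        with open(csv_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(cursor)
    finally:
        connection.close()
```

Calling `export_sample("source.db", "patients", "patients.csv")`, where `source.db` is a placeholder path, would produce a patients.csv that can then be scanned as in step 2.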

Release Procedure

To release a new version of carrot-transform follow these steps:

1. Prepare the repository

  • First ensure that the repository is clean and all required changes have been merged.
  • Pull the latest changes from main with git pull origin main.

2. Create a release branch

  • Now create a new feature branch named release/v<NEW-VERSION> (e.g. release/v0.2.0).

3. Update the version number

  • Use poetry to bump the version. For example, for a minor version update invoke:
poetry version minor 
  • Commit and push the changes (to the release feature branch):
NEW_VERSION=$(poetry version -s)
git add pyproject.toml
git commit -m "Bump version to $NEW_VERSION"
git push --set-upstream origin release/v$NEW_VERSION
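
To make the effect of a bump concrete, here is a small sketch of how a plain MAJOR.MINOR.PATCH version changes and how the branch name follows from it. It illustrates the convention only and is not poetry's implementation; the function names `bump_version` and `release_branch` are hypothetical, and pre-release or other non-numeric version parts are not handled.

```python
def bump_version(version, part):
    """Return `version` (MAJOR.MINOR.PATCH) with `part` incremented and lower parts reset to zero."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


def release_branch(version):
    """Name the release branch using the release/v<NEW-VERSION> convention."""
    return f"release/v{version}"
```

Starting from a hypothetical current version of 0.1.3, `bump_version("0.1.3", "minor")` gives `0.2.0`, and `release_branch("0.2.0")` gives `release/v0.2.0`.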

4. Create pull request

  • Open a pull request from release/v$NEW_VERSION to main and await approval.

5. Merge and tag

  • After approval, merge the release branch into main.
  • Check out main, pull updates, and create a tag corresponding to the new version number.
git checkout main
git pull origin main
git tag -a "$NEW_VERSION" -m "Release $NEW_VERSION"
git push origin "$NEW_VERSION"

6. Create a release

  • We must now link the tag to a release in the GitHub repository. To do this from the command line, first install the GitHub command line tool gh and then invoke the following, reusing the NEW_VERSION variable that was used to create the tag:
gh release create "$NEW_VERSION" --title "$NEW_VERSION" --notes "Release for $NEW_VERSION"

License

This repository's source code is available under the MIT license.

Download files

Source Distribution

carrot_transform-0.6.1.tar.gz (249.0 kB)

Uploaded Source

Built Distribution

carrot_transform-0.6.1-py3-none-any.whl (264.1 kB)

Uploaded Python 3

File details

Details for the file carrot_transform-0.6.1.tar.gz.

File metadata

  • Download URL: carrot_transform-0.6.1.tar.gz
  • Upload date:
  • Size: 249.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for carrot_transform-0.6.1.tar.gz
Algorithm Hash digest
SHA256 48de8192b47a0be39abd2ee26096cab25cc493c439e9d1bdfc6e62c6144821d8
MD5 93ef187fd4c767812a6ce7e06ca86bb2
BLAKE2b-256 50ce7d8145ea0ced389aeb8150ea5bd50f5d99f72d71dcca8d02eaf55b104f4d

Provenance

The following attestation bundles were made for carrot_transform-0.6.1.tar.gz:

Publisher: pypi.publish.yml on Health-Informatics-UoN/carrot-transform

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file carrot_transform-0.6.1-py3-none-any.whl.

File hashes

Hashes for carrot_transform-0.6.1-py3-none-any.whl
Algorithm Hash digest
SHA256 07eb8088377857859fca961abe49a59ff90d2fe2508f53c246699a747f397062
MD5 1b30167256b9054281ba7a6452f97c97
BLAKE2b-256 a18b6870e195ddd5fe89174c048ef11b82117035d66b9190185b8aa81ad62b17

Provenance

The following attestation bundles were made for carrot_transform-0.6.1-py3-none-any.whl:

Publisher: pypi.publish.yml on Health-Informatics-UoN/carrot-transform
