VICC normalization routines for therapeutics

Project description

Thera-Py

Thera-Py normalizes free-text names and references for drugs and other biomedical therapeutics to stable, unambiguous concept identifiers to support genomic knowledge harmonization.

Live OpenAPI service

Installation

Install from PyPI:

python3 -m pip install thera-py

Docker Installation (Preferred)

We recommend installing the Therapy Normalizer using Docker.

Requirements

Docker

Build, (re)create, and start containers

docker compose up

Point your browser to http://localhost:8001/therapy/.

Usage

Deploying DynamoDB Locally

We use Amazon DynamoDB for data storage. To deploy locally, follow these instructions.

Setting Environment Variables

RxNorm requires a UMLS license, which you can register for here. You must set the UMLS_API_KEY environment variable to your API key, which can be found in the UTS 'My Profile' area after signing in.

export UMLS_API_KEY=12345-6789-abcdefg-hijklmnop  # make sure to replace with your key!

HemOnc.org data requires a Harvard Dataverse API token. To get one, create a user account on the Harvard Dataverse website; you can follow these instructions to create an account and generate an API token. Once you have an API token, set the following environment variable:

export HARVARD_DATAVERSE_API_KEY=12345-6789-abcdefgh-hijklmnop  # make sure to replace with your key!
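
A missing key tends to surface only partway through an update, so it can help to check the environment up front. The following is a minimal sketch and not part of Thera-Py: the helper name `missing_credentials` is hypothetical, and only the two variable names come from the instructions above.

```python
import os
from collections.abc import Mapping

# Variable names taken from the instructions above.
REQUIRED_VARS = ("UMLS_API_KEY", "HARVARD_DATAVERSE_API_KEY")


def missing_credentials(environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or blank.

    Hypothetical helper for illustration; Thera-Py does not provide it.
    """
    return [name for name in REQUIRED_VARS if not environ.get(name, "").strip()]
```

Calling `missing_credentials()` with no arguments inspects the current process environment; an empty list means both keys are present.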

Update source(s)

The Therapy Normalizer currently aggregates therapy data from:

Direct data management requires installation of the etl dependency group:

python3 -m pip install 'thera-py[etl]'

To update source(s), pass them as arguments to the command thera-py update. For example, the following command updates ChEMBL and Wikidata:

thera-py update chembl wikidata

You can update all sources at once with the --all flag:

thera-py update --all

Thera-Py can retrieve all required data itself, using the wags-tails library. By default, data will be housed under ~/.local/share/wags_tails/ in a format like the following:

~/.local/share/wags_tails
├── chembl
│   └── chembl_27.db
├── chemidplus
│   └── chemidplus_20200327.xml
├── drugbank
│   └── drugbank_5.1.8.csv
├── guidetopharmacology
│   ├── guidetopharmacology_ligand_id_mapping_2021.3.tsv
│   └── guidetopharmacology_ligands_2021.3.tsv
├── hemonc
│   ├── hemonc_concepts_20210225.csv
│   ├── hemonc_rels_20210225.csv
│   └── hemonc_synonyms_20210225.csv
├── ncit
│   └── ncit_20.09d.owl
├── rxnorm
│   ├── rxnorm_drug_forms_20210104.yaml
│   └── rxnorm_20210104.RRF
└── wikidata
    └── wikidata_20210425.json
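
To see which data releases are already on disk, the layout above can be walked with the standard library. This is a rough sketch under one assumption: every file is named as a label, an underscore, a version, and a single extension, as in the listing. The function names are hypothetical and are not part of Thera-Py or wags-tails.

```python
from pathlib import Path


def split_data_filename(filename: str) -> tuple[str, str]:
    """Split e.g. 'drugbank_5.1.8.csv' into ('drugbank', '5.1.8').

    Assumes the naming seen in the listing above: label, underscore,
    version, then one file extension.
    """
    stem, _, _extension = filename.rpartition(".")
    label, _, version = stem.rpartition("_")
    return label, version


def list_local_data(data_root: Path) -> dict[str, list[tuple[str, str]]]:
    """Map each source directory name to the (label, version) pairs inside it."""
    found: dict[str, list[tuple[str, str]]] = {}
    if not data_root.is_dir():
        return found  # nothing has been downloaded yet
    for source_dir in sorted(p for p in data_root.iterdir() if p.is_dir()):
        names = sorted(f.name for f in source_dir.iterdir() if f.is_file())
        found[source_dir.name] = [split_data_filename(name) for name in names]
    return found
```

For the default location, this would be called as `list_local_data(Path.home() / ".local" / "share" / "wags_tails")`.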

Updates to the HemOnc source depend on the Disease Normalizer service. If the Disease Normalizer database appears to be empty or incomplete, updates to HemOnc will also trigger a refresh of the Disease Normalizer database. See its README for additional data requirements.

Create Merged Concept Groups

The /normalize endpoint relies on merged concept groups. The --normalize flag generates these groups:

thera-py update --normalize

Specifying the database URL endpoint

The default URL endpoint is http://localhost:8000. There are two different ways to specify the database URL endpoint.

The first way is to set the --db_url flag to the URL endpoint.

thera-py update --all --db_url=http://localhost:8001

The second way is to set the environment variable THERAPY_NORM_DB_URL to the URL endpoint.

export THERAPY_NORM_DB_URL="http://localhost:8001"
thera-py update --all

Starting the therapy normalization service

From the project root, run the following:

uvicorn therapy.main:app --reload

Next, view the OpenAPI docs on your local machine:

http://127.0.0.1:8000/therapy
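
Once the service is running, it can also be queried from a script. The sketch below uses only the standard library. It assumes the /normalize endpoint is served under the /therapy prefix and accepts the search term in a query parameter named `q`; both are assumptions to confirm against the OpenAPI docs. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000/therapy"  # address from the step above


def build_normalize_url(term: str, base_url: str = BASE_URL) -> str:
    """Build a request URL for the /normalize endpoint.

    The parameter name `q` is an assumption; check the OpenAPI docs.
    """
    return f"{base_url.rstrip('/')}/normalize?{urlencode({'q': term})}"


def normalize(term: str, base_url: str = BASE_URL) -> dict:
    """Send the request and decode the JSON response body."""
    with urlopen(build_normalize_url(term, base_url), timeout=10) as response:
        return json.load(response)
```

With the service up, `normalize("cisplatin")` would return the decoded response as a dictionary; the response structure is not described here, so inspect it before relying on specific keys.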

FAQ

A data import method raised a SourceFormatError instance. How do I proceed?

Thera-Py automatically tries to acquire the latest version of data for each source, but sources sometimes alter the structure of their data (e.g., by adding or removing CSV columns). If you encounter a SourceFormatError while importing data, please notify us by creating a new issue if one doesn't already exist, and we will attempt to resolve it.

In the meantime, you can force Thera-Py to use an older data release: remove the incompatible version from the source data folder, manually download an older version of the data and place it there following the structure described above, and call the CLI with the --use_existing argument.

Citation

If you use Thera-Py in scientific works, please cite the following article:

Matthew Cannon, James Stevenson, Kori Kuzma, Susanna Kiwala, Jeremy L Warner, Obi L Griffith, Malachi Griffith, Alex H Wagner, Normalization of drug and therapeutic concepts with Thera-Py, JAMIA Open, Volume 6, Issue 4, December 2023, ooad093, https://doi.org/10.1093/jamiaopen/ooad093

Development

Clone the repo and create a virtual environment:

git clone https://github.com/cancervariants/therapy-normalization
cd therapy-normalization
python3 -m virtualenv venv
source venv/bin/activate

Install development dependencies and prek:

python3 -m pip install -e '.[dev,tests]'
prek install

Check style with ruff:

python3 -m ruff format . && python3 -m ruff check --fix .

Run tests with pytest:

python3 -m pytest

By default, tests will employ an existing DynamoDB database. For test environments where this is unavailable (e.g. in CI), the THERAPY_TEST environment variable can be set to initialize a local DynamoDB instance with miniature versions of input data files before tests are executed.

export THERAPY_TEST=true

Sometimes, sources will update their data, and our test fixtures and data will become incorrect. The tests/scripts/ subdirectory includes scripts to rebuild data files, although most fixtures will need to be updated manually.

Project details

Release history

Version	Release date
0.13.0 (this version)	Jan 13, 2026
0.12.0	Nov 4, 2025
0.11.1	Oct 31, 2025
0.11.0	Jul 22, 2025
0.10.0	Apr 22, 2025
0.9.0	Feb 13, 2025
0.8.0	Jan 30, 2025
0.7.1	Jan 2, 2025
0.7.0	Jan 2, 2025
0.6.0	Jul 15, 2024
0.5.0.dev5 (pre-release)	Jun 12, 2024
0.5.0.dev4 (pre-release)	Jun 7, 2024
0.5.0.dev3 (pre-release)	Jan 4, 2024
0.5.0.dev2 (pre-release)	Dec 29, 2023
0.5.0.dev1 (pre-release)	Dec 4, 2023
0.5.0.dev0 (pre-release)	Nov 10, 2023
0.4.0	Jan 11, 2023
0.4.dev0 (pre-release)	Oct 2, 2022
0.3.10	May 7, 2023
0.3.9	Jan 11, 2023
0.3.8	Jan 6, 2023
0.3.7	Nov 2, 2022
0.3.6	Aug 25, 2022
0.3.5	May 25, 2022
0.3.4	Mar 31, 2022
0.3.3	Jan 27, 2022
0.3.2	Dec 14, 2021
0.3.1	Dec 7, 2021
0.3.0rc1 (pre-release)	Dec 7, 2021
0.2.26	Sep 8, 2021
0.2.24	Aug 3, 2021
0.2.23	Aug 3, 2021
0.2.20	May 11, 2021
0.2.19	May 10, 2021
0.2.18	May 6, 2021
0.2.17	Apr 30, 2021
0.2.16	Apr 28, 2021
0.2.15	Apr 13, 2021
0.2.12	Mar 31, 2021
0.2.10	Mar 29, 2021
0.2.8	Mar 15, 2021
0.2.7	Mar 12, 2021
0.2.2	Mar 10, 2021
0.2.1	Mar 10, 2021
0.2.0	Mar 3, 2021
0.0.1	May 31, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

thera_py-0.13.0.tar.gz (641.3 kB)

Uploaded Jan 13, 2026 Source

Built Distribution

thera_py-0.13.0-py3-none-any.whl (74.5 kB)

Uploaded Jan 13, 2026 Python 3

File details

Details for the file thera_py-0.13.0.tar.gz.

File metadata

Download URL: thera_py-0.13.0.tar.gz
Upload date: Jan 13, 2026
Size: 641.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for thera_py-0.13.0.tar.gz
Algorithm	Hash digest
SHA256	`5225f65bcd3737d9f76cbfd32a32bc4f44f0c97c0b9728c4281f0c8e8e4251de`
MD5	`320b9e32c136a66ca3b6859d630134f3`
BLAKE2b-256	`80a642fb130ab63764bc4e50755a05f55bf245e9f83e1ab67a0437969e1871f3`
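
A downloaded file can be checked against the published digest before installation. The following is a small sketch using the standard library; the function names are hypothetical, and the expected value is the SHA256 digest listed above for the source distribution.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for thera_py-0.13.0.tar.gz.
EXPECTED_SHA256 = "5225f65bcd3737d9f76cbfd32a32bc4f44f0c97c0b9728c4281f0c8e8e4251de"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a local file's digest with the published one."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_published_hash(Path("thera_py-0.13.0.tar.gz"))` should be true for an intact copy of that release file.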

Provenance

The following attestation bundles were made for thera_py-0.13.0.tar.gz:

Publisher: release.yml on cancervariants/therapy-normalization

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: thera_py-0.13.0.tar.gz
- Subject digest: 5225f65bcd3737d9f76cbfd32a32bc4f44f0c97c0b9728c4281f0c8e8e4251de
- Sigstore transparency entry: 819003849
- Sigstore integration time: Jan 13, 2026
Source repository:
- Permalink: cancervariants/therapy-normalization@5be89aac898688bfd598c7ed1b90b1f1bfdeeae6
- Branch / Tag: refs/tags/0.13.0
- Owner: https://github.com/cancervariants
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@5be89aac898688bfd598c7ed1b90b1f1bfdeeae6
- Trigger Event: release

File details

Details for the file thera_py-0.13.0-py3-none-any.whl.

File metadata

Download URL: thera_py-0.13.0-py3-none-any.whl
Upload date: Jan 13, 2026
Size: 74.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for thera_py-0.13.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`541dbd0810fb007b60070b192adc3f16509f5958e8941efcf03591f7432221e9`
MD5	`5547c09c5334e2fc526d1a767d710b7c`
BLAKE2b-256	`53ed0430c0318f25dc6b544fc7d6d0709ab92c9335968189870bd4802b6a6773`

Provenance

The following attestation bundles were made for thera_py-0.13.0-py3-none-any.whl:

Publisher: release.yml on cancervariants/therapy-normalization

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: thera_py-0.13.0-py3-none-any.whl
- Subject digest: 541dbd0810fb007b60070b192adc3f16509f5958e8941efcf03591f7432221e9
- Sigstore transparency entry: 819003872
- Sigstore integration time: Jan 13, 2026
Source repository:
- Permalink: cancervariants/therapy-normalization@5be89aac898688bfd598c7ed1b90b1f1bfdeeae6
- Branch / Tag: refs/tags/0.13.0
- Owner: https://github.com/cancervariants
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@5be89aac898688bfd598c7ed1b90b1f1bfdeeae6
- Trigger Event: release
