preon (PREcision Oncology Normalization)

preon is a fuzzy search tool for medical entities.

Installation

You can install preon from PyPI: python -m pip install preon

Examples

Let's first import the normalizer and the helpers for storing and loading EBI drug names with their CHEMBL ids.

>>> from preon.normalization import PrecisionOncologyNormalizer
>>> from preon.drug import store_ebi_drugs, load_ebi_drugs

Please download the EBI compound CSV file and store it as a local resource. This step is only needed once, when the resource file is first created, and again whenever you want to update it.

>>> store_ebi_drugs("/Users/Username/Downloads/compounds.csv")

Next, we can fit the normalizer with the drug names and ids as its reference data.

>>> drug_names, chembl_ids = load_ebi_drugs()
>>> normalizer = PrecisionOncologyNormalizer().fit(drug_names, chembl_ids)

We can now search for drug names and retrieve their CHEMBL ids. Let's search for the cancer drug "Avastin".

>>> normalizer.query("Avastin")
(['avastin'], [['CHEMBL1201583']], {'match_type': 'exact'})

The query returns three things: a list of matching normalized drug names (in this case ['avastin']), a list of associated CHEMBL ids for every returned drug name ([['CHEMBL1201583']]), and some meta information about the match ({'match_type': 'exact'}). We can also search for multi-token drug names like "Ixabepilone Epothilone B analog" and find CHEMBL ids for the relevant tokens.

>>> normalizer.query("Ixabepilone Epothilone B analog")
(['ixabepilone'], [['CHEMBL1201752']], {'match_type': 'substring'})

We find the relevant drug name ['ixabepilone'], and preon provides the meta information that the match is based on a substring of the query. By default, preon only looks for single matching tokens. It can also look for n-grams if you set the n_grams parameter of the query method. Let's take a harder example, say "Isavuconazonium", but misspell it as "Isavuconaconium".

>>> normalizer.query("Isavuconaconium")
(['isavuconazonium'], [['CHEMBL1183349']], {'match_type': 'partial', 'edit_distance': 0.067})

preon finds the correct drug "Isavuconazonium" and provides the meta information that it is a partial match with an edit distance of about 7% (0.067). By default, it returns drug names with a distance smaller than 20%. To change this cutoff, set the threshold argument of the query method.
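The three match types seen above (exact, substring, partial) can be pictured with a small standalone sketch. This is not preon's implementation: the function names `levenshtein` and `toy_query`, the dictionary `toy_reference`, and the choice to normalize the edit distance by the longer string's length are all assumptions made for illustration. Only the parameter names `n_grams` and `threshold`, their defaults (1 token, 20%), and the shape of the returned tuple are taken from the examples above.

```python
from typing import Dict, List, Optional, Tuple

Result = Tuple[List[str], List[List[str]], dict]


def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def toy_query(query: str, reference: Dict[str, List[str]],
              n_grams: int = 1, threshold: float = 0.2) -> Optional[Result]:
    """Illustrative lookup: exact match, then token n-grams, then fuzzy match."""
    text = query.lower()

    # 1) Exact match on the whole normalized query.
    if text in reference:
        return [text], [reference[text]], {"match_type": "exact"}

    # 2) Substring match: try every run of 1..n_grams consecutive tokens.
    tokens = text.split()
    hits: List[str] = []
    for size in range(1, n_grams + 1):
        for start in range(len(tokens) - size + 1):
            gram = " ".join(tokens[start:start + size])
            if gram in reference and gram not in hits:
                hits.append(gram)
    if hits:
        return hits, [reference[h] for h in hits], {"match_type": "substring"}

    # 3) Partial match: closest name whose normalized distance is below threshold.
    best_name, best_distance = None, threshold
    for name in reference:
        distance = levenshtein(text, name) / max(len(text), len(name))
        if distance < best_distance:
            best_name, best_distance = name, distance
    if best_name is None:
        return None  # assumption: nothing close enough means no result
    meta = {"match_type": "partial", "edit_distance": round(best_distance, 3)}
    return [best_name], [reference[best_name]], meta


# Hypothetical in-memory reference data, using the ids from the examples above.
toy_reference = {
    "avastin": ["CHEMBL1201583"],
    "ixabepilone": ["CHEMBL1201752"],
    "isavuconazonium": ["CHEMBL1183349"],
}

print(toy_query("Avastin", toy_reference))
print(toy_query("Ixabepilone Epothilone B analog", toy_reference))
print(toy_query("Isavuconaconium", toy_reference))
```

In this sketch the misspelled query differs from "isavuconazonium" in one of 15 characters, which gives 1/15 ≈ 0.067 and stays below the 0.2 cutoff. Lowering `threshold` makes the fuzzy step stricter, and raising `n_grams` lets the substring step match reference names that span several tokens.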

In a similar fashion, you can normalize cancer types or genes. We provide gold standards for preon, which we use to test it. For more detail, see the example notebooks. We also use preon in practice to normalize and integrate medical data in the PREDICT project.

Citation

The preon package is actively maintained and updated, and it is intended for practical application. If you use it in a scientific publication, we would appreciate the following citation:

@article {preon2023,
	author = {Arik Ermshaus and Michael Piechotta and Gina R{\"u}ter and Ulrich Keilholz and Ulf Leser and Manuela Benary},
	title = {preon: Fast and accurate entity normalization for drug names and cancer types in precision oncology},
	year = {2023},
	doi = {10.1101/2023.05.22.540912},
	journal = {bioRxiv}
}
