
Finds drug names in a string

Project description

Drug named entity recognition

Developed by Fast Data Science, https://fastdatascience.com

Source code at https://github.com/fastdatascience/drug_named_entity_recognition

This is a lightweight Python library for finding drug names in a string.

Please note that this library finds only high-confidence drug names.

It finds only the English names of drugs. Names in other languages are not supported.

It also does not find short code names of drugs, such as the abbreviations commonly used in medicine (for example, “Ceph” for “Cephradine”), because these are highly ambiguous.

Requirements

Python 3.9 and above

Installation

pip install drug-named-entity-recognition

Usage examples

You must first tokenise your input text using a tokeniser of your choice (NLTK, spaCy, etc.).

Then pass the resulting list of tokens (strings) to the find_drugs function.
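The examples below tokenise with split(" "), which is enough for those sentences. On real text, splitting on spaces leaves punctuation attached to words (for instance "Phenoxymethylpenicillin," with a trailing comma), which may stop a token from matching a dictionary entry. A minimal, standard-library-only sketch of a slightly better tokeniser, using a regular expression in place of NLTK or spaCy, is shown here; simple_tokenise is a hypothetical helper and not part of this library.

```python
import re

# Hypothetical helper (not part of drug_named_entity_recognition):
# keep runs of word characters and hyphens together as one token,
# and emit every other non-space character as its own token.
TOKEN_PATTERN = re.compile(r"[\w-]+|[^\w\s]")


def simple_tokenise(text: str) -> list[str]:
    """Split text into word tokens and single punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


tokens = simple_tokenise("I bought some Phenoxymethylpenicillin, then took it.")
print(tokens)
# ['I', 'bought', 'some', 'Phenoxymethylpenicillin', ',', 'then', 'took', 'it', '.']
```

The resulting list can be passed to find_drugs in place of the split(" ") output. Hyphens are deliberately kept inside tokens so that hyphenated names such as "co-amoxiclav" stay in one piece.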

Example 1

from drug_named_entity_recognition import find_drugs

find_drugs("i bought some Phenoxymethylpenicillin".split(" "))

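This outputs a list of tuples. Each tuple holds a dictionary describing the matched drug, followed by two integers; here both are 3, the index of the token Phenoxymethylpenicillin in the input list: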

[({'name': 'Phenoxymethylpenicillin',
   'synonyms': {'Penicillin', 'Phenoxymethylpenicillin'},
   'nhs_url': 'https://www.nhs.uk/medicines/phenoxymethylpenicillin',
   'drugbank_id': 'DB00417'},
  3,
  3)]
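To turn a result back into text, you can unpack each tuple. The sketch below hard-codes the output of Example 1 so that it runs without the library installed, and it assumes, based only on that single example, that the two integers are the inclusive start and end token indices of the match; check this against the library source before relying on it. describe_matches is a hypothetical helper.

```python
tokens = "i bought some Phenoxymethylpenicillin".split(" ")

# Output copied from Example 1, hard-coded so this snippet runs on its own.
results = [
    (
        {
            "name": "Phenoxymethylpenicillin",
            "synonyms": {"Penicillin", "Phenoxymethylpenicillin"},
            "nhs_url": "https://www.nhs.uk/medicines/phenoxymethylpenicillin",
            "drugbank_id": "DB00417",
        },
        3,
        3,
    )
]


def describe_matches(tokens, results):
    """Return one line per match.

    Assumes (start, end) are inclusive token indices into `tokens`.
    """
    lines = []
    for drug, start, end in results:
        matched_text = " ".join(tokens[start : end + 1])
        lines.append(
            f"{drug['name']} ({drug['drugbank_id']}) "
            f"at tokens {start}-{end}: {matched_text!r}"
        )
    return lines


for line in describe_matches(tokens, results):
    print(line)
# Phenoxymethylpenicillin (DB00417) at tokens 3-3: 'Phenoxymethylpenicillin'
```

With the library installed, you would pass the return value of find_drugs(tokens) as results instead of the hard-coded list.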

You can ignore case with:

find_drugs("i bought some phenoxymethylpenicillin".split(" "), is_ignore_case=True)
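Conceptually, ignoring case means normalising both the tokens and the dictionary entries to the same case before comparing them. The toy sketch below illustrates that idea with a hypothetical two-entry vocabulary and a hypothetical toy_find_drugs function. It matches single tokens only and is not how find_drugs is implemented internally.

```python
# Hypothetical toy vocabulary mapping a surface form to a canonical drug name.
# The real library loads much larger dictionaries from its own data files.
SYNONYMS = {
    "Phenoxymethylpenicillin": "Phenoxymethylpenicillin",
    "Penicillin": "Phenoxymethylpenicillin",
}


def toy_find_drugs(tokens, ignore_case=False):
    """Return (canonical_name, start, end) for each single-token match."""
    if ignore_case:
        # Build a lower-cased index once, so each lookup compares lower-case forms.
        lookup = {surface.lower(): name for surface, name in SYNONYMS.items()}
    else:
        lookup = SYNONYMS
    matches = []
    for i, token in enumerate(tokens):
        key = token.lower() if ignore_case else token
        if key in lookup:
            matches.append((lookup[key], i, i))
    return matches


tokens = "i bought some phenoxymethylpenicillin".split(" ")
print(toy_find_drugs(tokens))                    # []
print(toy_find_drugs(tokens, ignore_case=True))  # [('Phenoxymethylpenicillin', 3, 3)]
```

Lower-casing the index once, rather than on every comparison, keeps each token check to a single dictionary lookup.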

Data sources

The main data source is Drugbank, augmented by datasets from the NHS, MeSH, MedlinePlus and Wikipedia.

Update the Drugbank dictionary

If you want to update the Drugbank dictionary, download the data dump from Drugbank and use it to replace the file drugbank vocabulary.csv.
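Before dropping in a new file, you may want to check that it parses as expected. A minimal sketch with Python's csv module follows. It assumes the vocabulary file has columns named DrugBank ID, Common name and Synonyms, with multiple synonyms separated by " | "; verify the header of the file you actually download, since the layout may differ between releases. load_drugbank_vocabulary is a hypothetical helper, not part of this library.

```python
import csv
from collections.abc import Iterable


def load_drugbank_vocabulary(lines: Iterable[str]) -> dict[str, str]:
    """Map every common name and synonym to its DrugBank ID.

    Assumes columns named 'DrugBank ID', 'Common name' and 'Synonyms',
    with multiple synonyms separated by ' | '. `lines` can be an open
    file or any iterable of CSV lines.
    """
    name_to_id: dict[str, str] = {}
    for row in csv.DictReader(lines):
        drugbank_id = row["DrugBank ID"]
        names = [row["Common name"] or ""] + (row["Synonyms"] or "").split(" | ")
        for name in names:
            name = name.strip()
            if name:  # skip empty cells, e.g. drugs with no synonyms
                name_to_id[name] = drugbank_id
    return name_to_id


# Usage, assuming the file sits in the current directory:
# with open("drugbank vocabulary.csv", newline="", encoding="utf-8") as fh:
#     vocab = load_drugbank_vocabulary(fh)
#     print(len(vocab), "names loaded")
```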

Update the Wikipedia dictionary

If you want to update the Wikipedia dictionary, download the dump from:

and run extract_drug_names_and_synonyms_from_wikipedia_dump.py

Update the MeSH dictionary

If you want to update the MeSH dictionary, download the open data dump from https://www.nlm.nih.gov/ and run extract_drug_names_and_synonyms_from_mesh_dump.py.
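One possible way to pull names and synonyms out of a MeSH descriptor file is sketched below; it is not necessarily what extract_drug_names_and_synonyms_from_mesh_dump.py does. It assumes the XML descriptor format, in which each DescriptorRecord contains DescriptorUI, DescriptorName/String, TreeNumberList/TreeNumber and ConceptList/Concept/TermList/Term/String elements. It uses membership of MeSH tree D ("Chemicals and Drugs") as a crude drug filter, which also admits many non-drug chemicals. iter_mesh_chemicals is a hypothetical name.

```python
import io
import xml.etree.ElementTree as ET


def iter_mesh_chemicals(fh):
    """Yield (descriptor UI, preferred name, set of term strings) for each
    descriptor with at least one tree number in tree D."""
    for _, elem in ET.iterparse(fh, events=("end",)):
        if elem.tag != "DescriptorRecord":
            continue
        name = elem.findtext("DescriptorName/String")
        tree_numbers = [t.text or "" for t in elem.iterfind("TreeNumberList/TreeNumber")]
        if name and any(t.startswith("D") for t in tree_numbers):
            terms = {
                s.text
                for s in elem.iterfind("ConceptList/Concept/TermList/Term/String")
                if s.text
            }
            yield elem.findtext("DescriptorUI"), name, terms
        elem.clear()  # release the processed record


# A tiny made-up sample in the shape of a MeSH descriptor file.
MESH_SAMPLE = b"""<DescriptorRecordSet LanguageCode="eng">
  <DescriptorRecord>
    <DescriptorUI>D999991</DescriptorUI>
    <DescriptorName><String>Examplomycin</String></DescriptorName>
    <TreeNumberList><TreeNumber>D02.999</TreeNumber></TreeNumberList>
    <ConceptList><Concept><TermList>
      <Term><String>Examplomycin</String></Term>
      <Term><String>Examplomycin Sodium</String></Term>
    </TermList></Concept></ConceptList>
  </DescriptorRecord>
  <DescriptorRecord>
    <DescriptorUI>D999992</DescriptorUI>
    <DescriptorName><String>Example Syndrome</String></DescriptorName>
    <TreeNumberList><TreeNumber>C99.999</TreeNumber></TreeNumberList>
  </DescriptorRecord>
</DescriptorRecordSet>"""

for ui, name, terms in iter_mesh_chemicals(io.BytesIO(MESH_SAMPLE)):
    print(ui, name, sorted(terms))
# D999991 Examplomycin ['Examplomycin', 'Examplomycin Sodium']
```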

Download files

Source Distribution

drug-named-entity-recognition-0.1.tar.gz (958.3 kB)

Built Distribution

drug_named_entity_recognition-0.1-py3-none-any.whl (962.7 kB)

File details

Details for the file drug-named-entity-recognition-0.1.tar.gz.

File hashes

Hashes for drug-named-entity-recognition-0.1.tar.gz

Algorithm     Hash digest
SHA256        f05f9cfcf236b3a980737fbc34deb3e94cecf9574122750bfbe9f2d8b20245b7
MD5           8b7c950ca70e1d17223fc80f052a0835
BLAKE2b-256   1e82780f461eef19de32598b01bd54b95dbdbd0cee34064dd2a609c5588f17bd

File details

Details for the file drug_named_entity_recognition-0.1-py3-none-any.whl.

File hashes

Hashes for drug_named_entity_recognition-0.1-py3-none-any.whl

Algorithm     Hash digest
SHA256        da7037b26c7a5a0207634e3494e3c43b406aa04e4bc942afb9e04476e0ba5322
MD5           c2e22e5af8e429e3f7320f0efdc73e14
BLAKE2b-256   1c60e1685b0bd2a3be6c915537f59f4611927a51b53a299fcc85f845f34018bc