Finds drug names in a string
Drug named entity recognition
Developed by Fast Data Science, https://fastdatascience.com
Source code at https://github.com/fastdatascience/drug_named_entity_recognition
This is a lightweight Python library for finding drug names in a string.
Please note that this library finds only drug names it can identify with high confidence.
It also finds only the English names of drugs. Names in other languages are not supported.
It does not find short code names of drugs, such as the abbreviations commonly used in medicine (for example “Ceph” for “Cephradin”), because these are highly ambiguous.
Requirements
Python 3.9 and above
Installation
pip install drug-named-entity-recognition
Usage examples
You must first tokenise your input text using a tokeniser of your choice (NLTK, spaCy, etc.), then pass the resulting list of strings to the find_drugs function.
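The examples in this README split on spaces, which leaves punctuation attached to words (for example "paracetamol." instead of "paracetamol") and may prevent a match. If you would rather not depend on NLTK or spaCy, here is a minimal standard-library tokeniser sketch. The helper name `tokenise` and its regular expression are illustrative assumptions, not part of this library.

```python
import re

# Hypothetical helper (not part of drug_named_entity_recognition):
# runs of word characters, hyphens and apostrophes become tokens;
# whitespace and other punctuation are dropped.
TOKEN_PATTERN = re.compile(r"[\w'-]+")


def tokenise(text: str) -> list:
    """Split text into a list of word tokens, discarding punctuation."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenise("I was prescribed Phenoxymethylpenicillin, then paracetamol.")
print(tokens)
```

The resulting list can be passed straight to find_drugs, e.g. find_drugs(tokenise(text)). A regex like this is deliberately simple; a full tokeniser will handle edge cases such as dosages ("500mg") more carefully.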
Example 1
from drug_named_entity_recognition import find_drugs
find_drugs("i bought some Phenoxymethylpenicillin".split(" "))
This returns a list of tuples, each containing a dictionary describing the matched drug followed by two integers:
[({'name': 'Phenoxymethylpenicillin', 'synonyms': {'Penicillin', 'Phenoxymethylpenicillin'}, 'nhs_url': 'https://www.nhs.uk/medicines/phenoxymethylpenicillin', 'drugbank_id': 'DB00417'}, 3, 3)]
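In this output both integers are 3, the index of the token "Phenoxymethylpenicillin", so they appear to be the start and end token indices of the match (inclusive). The sketch below assumes that interpretation and uses a hypothetical helper, `describe_matches`, to turn such results back into readable text. The sample data is copied from the output above, so the snippet runs without the library installed.

```python
# Sample result copied from the example output above.
tokens = "i bought some Phenoxymethylpenicillin".split(" ")
results = [
    (
        {
            "name": "Phenoxymethylpenicillin",
            "synonyms": {"Penicillin", "Phenoxymethylpenicillin"},
            "nhs_url": "https://www.nhs.uk/medicines/phenoxymethylpenicillin",
            "drugbank_id": "DB00417",
        },
        3,
        3,
    )
]


def describe_matches(tokens, results):
    """Hypothetical helper: pair each matched span of text with drug details.

    ASSUMPTION: the two integers in each tuple are the inclusive start and
    end token indices of the match.
    """
    described = []
    for drug, start, end in results:
        matched_text = " ".join(tokens[start : end + 1])
        described.append((matched_text, drug["name"], drug.get("drugbank_id")))
    return described


for matched_text, name, drugbank_id in describe_matches(tokens, results):
    print(f"{matched_text!r} -> {name} ({drugbank_id})")
```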
You can ignore case with:
find_drugs("i bought some phenoxymethylpenicillin".split(" "), is_ignore_case=True)
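To illustrate what ignoring case means for dictionary matching, here is a toy single-token lookup that compares tokens after str.casefold(). It is not how this library is implemented: the dictionary `TOY_DRUGS` and function `toy_find` are hypothetical, and only the keyword name is_ignore_case is borrowed from find_drugs.

```python
# Toy illustration only: NOT the implementation of drug_named_entity_recognition.
TOY_DRUGS = {"Phenoxymethylpenicillin": "DB00417"}

# Pre-compute a casefolded index so case-insensitive lookups are O(1).
TOY_DRUGS_CASEFOLDED = {name.casefold(): name for name in TOY_DRUGS}


def toy_find(tokens, is_ignore_case=False):
    """Return (canonical name, token index) for each token found in TOY_DRUGS."""
    matches = []
    for index, token in enumerate(tokens):
        if is_ignore_case:
            name = TOY_DRUGS_CASEFOLDED.get(token.casefold())
        else:
            name = token if token in TOY_DRUGS else None
        if name is not None:
            matches.append((name, index))
    return matches


tokens = "i bought some phenoxymethylpenicillin".split(" ")
print(toy_find(tokens))                       # exact comparison finds nothing
print(toy_find(tokens, is_ignore_case=True))  # casefolded comparison matches
```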
Data sources
The main data source is Drugbank, augmented with datasets from the NHS, MeSH, Medline Plus and Wikipedia.
Update the Drugbank dictionary
If you want to update the Drugbank dictionary, download the open data dump from https://go.drugbank.com/releases/latest#open-data and use it to replace the file drugbank vocabulary.csv.
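If you want to inspect the vocabulary file before replacing it, the sketch below builds a name-to-ID mapping with the standard csv module. It assumes the file has (among others) the columns "DrugBank ID", "Common name" and "Synonyms", with synonyms separated by " | "; check these against the header of the file you actually download. The sample row is made up for illustration, and how the library itself processes the file is not described here.

```python
import csv
import io

# Illustrative stand-in for "drugbank vocabulary.csv" (made-up row).
# ASSUMPTION: columns "DrugBank ID", "Common name", "Synonyms";
# synonyms separated by " | ". The real file has more columns.
SAMPLE_CSV = """DrugBank ID,Common name,Synonyms
DB00417,Phenoxymethylpenicillin,Penicillin | Phenoxymethylpenicillin
"""


def load_vocabulary(handle):
    """Map every common name and synonym to its DrugBank ID."""
    name_to_id = {}
    for row in csv.DictReader(handle):
        drugbank_id = row["DrugBank ID"]
        names = [row["Common name"]]
        # The Synonyms column may be empty for some drugs.
        if row.get("Synonyms"):
            names.extend(s.strip() for s in row["Synonyms"].split("|"))
        for name in names:
            if name:
                name_to_id[name] = drugbank_id
    return name_to_id


vocabulary = load_vocabulary(io.StringIO(SAMPLE_CSV))
print(vocabulary)
```

With the real download you would pass a file handle instead, e.g. open("drugbank vocabulary.csv", newline="", encoding="utf-8"); the encoding is an assumption.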
Update the Wikipedia dictionary
If you want to update the Wikipedia dictionary, download the dump from:
and run extract_drug_names_and_synonyms_from_wikipedia_dump.py
Update the MeSH dictionary
If you want to update the MeSH dictionary, download the open data dump from https://www.nlm.nih.gov/ and run extract_drug_names_and_synonyms_from_mesh_dump.py.
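As a rough idea of what extracting names and synonyms from a MeSH dump involves, the sketch below streams a descriptor XML file with xml.etree.ElementTree.iterparse and collects each record's preferred name and term strings. It is not the project's extraction script: it assumes the DescriptorRecord / DescriptorName / ConceptList / Term layout of the MeSH descriptor XML, uses a made-up fragment as input, and does not filter descriptors down to drugs.

```python
import io
import xml.etree.ElementTree as ET

# Made-up fragment shaped like a MeSH descriptor XML file.
# ASSUMPTION: DescriptorRecord / DescriptorName / ConceptList / Term layout.
SAMPLE_XML = b"""<DescriptorRecordSet>
  <DescriptorRecord>
    <DescriptorUI>D000000</DescriptorUI>
    <DescriptorName><String>Examplemycin</String></DescriptorName>
    <ConceptList>
      <Concept>
        <TermList>
          <Term><String>Examplemycin</String></Term>
          <Term><String>Example-Mycin</String></Term>
        </TermList>
      </Concept>
    </ConceptList>
  </DescriptorRecord>
</DescriptorRecordSet>"""


def extract_mesh_names(source):
    """Yield (descriptor UI, preferred name, sorted synonyms) per record.

    iterparse streams the input, so a large dump need not fit in memory.
    """
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag != "DescriptorRecord":
            continue
        ui = element.findtext("DescriptorUI")
        name = element.findtext("DescriptorName/String")
        synonyms = {
            term.text
            for term in element.iterfind("ConceptList/Concept/TermList/Term/String")
            if term.text
        }
        yield ui, name, sorted(synonyms)
        element.clear()  # release memory held by the processed record


for record in extract_mesh_names(io.BytesIO(SAMPLE_XML)):
    print(record)
```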
Hashes for drug-named-entity-recognition-0.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | f05f9cfcf236b3a980737fbc34deb3e94cecf9574122750bfbe9f2d8b20245b7
MD5 | 8b7c950ca70e1d17223fc80f052a0835
BLAKE2b-256 | 1e82780f461eef19de32598b01bd54b95dbdbd0cee34064dd2a609c5588f17bd
Hashes for drug_named_entity_recognition-0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | da7037b26c7a5a0207634e3494e3c43b406aa04e4bc942afb9e04476e0ba5322
MD5 | c2e22e5af8e429e3f7320f0efdc73e14
BLAKE2b-256 | 1c60e1685b0bd2a3be6c915537f59f4611927a51b53a299fcc85f845f34018bc