Drug Named Entity Recognition library to find and resolve drug names in a string (drug named entity linking)

Drug named entity recognition Python library

Drug named entity recognition

Developed by Fast Data Science, https://fastdatascience.com

Source code at https://github.com/fastdatascience/drug_named_entity_recognition

Tutorial at https://fastdatascience.com/drug-named-entity-recognition-python-library/

This is a lightweight Python library for finding drug names in a string.

Please note that this library returns only high-confidence matches.

It finds only the English names of drugs. Names in other languages are not supported.

It also doesn't find short code names of drugs, such as the abbreviations commonly used in medicine (for example "Ceph" for "Cephradin"), because these are highly ambiguous.

Requirements

Python 3.9 and above

Who to contact?

You can contact Thomas Wood or the Fast Data Science team at https://fastdatascience.com/.

Installing drug named entity recognition Python package

You can install from PyPI.

pip install drug-named-entity-recognition

Usage examples

You must first tokenise your input text using a tokeniser of your choice (NLTK, spaCy, etc).

You pass a list of strings to the find_drugs function.
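If you do not want to install a separate tokeniser, a regular expression from the standard library may be enough for simple text. The sketch below is only an illustration: the function name simple_tokenise is hypothetical and is not part of this library, and the pattern is a rough stand-in for a real tokeniser.

```python
import re


def simple_tokenise(text):
    """Split text into word tokens and single punctuation tokens.

    Hypothetical helper for illustration only; not part of
    drug_named_entity_recognition.
    """
    # \w+ matches a run of letters, digits or underscores;
    # [^\w\s] matches one punctuation character.
    return re.findall(r"\w+|[^\w\s]", text)


tokens = simple_tokenise("I bought some Prednisone, twice.")
print(tokens)
```

The resulting list of strings is the kind of input that find_drugs expects. Unlike a plain split on spaces, this keeps punctuation out of the word tokens.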

Example 1

from drug_named_entity_recognition import find_drugs

find_drugs("i bought some Prednisone".split(" "))

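This outputs a list of tuples. Each tuple holds a dictionary describing the matched drug, followed by the start and end token indices of the match (here both are 3, the position of "Prednisone" in the token list).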

[({'name': 'Prednisone', 'synonyms': {'Sone', 'Sterapred', 'Deltasone', 'Panafcort', 'Prednidib', 'Cortan', 'Rectodelt', 'Prednisone', 'Cutason', 'Meticorten', 'Panasol', 'Enkortolon', 'Ultracorten', 'Decortin', 'Orasone', 'Winpred', 'Dehydrocortisone', 'Dacortin', 'Cortancyl', 'Encorton', 'Encortone', 'Decortisyl', 'Kortancyl', 'Pronisone', 'Prednisona', 'Predniment', 'Prednisonum', 'Rayos'}, 'medline_plus_id': 'a601102', 'mesh_id': 'D018931', 'drugbank_id': 'DB00635'}, 3, 3)]
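The output above can be unpacked with ordinary tuple unpacking. The sketch below does not call the library. It works on a literal copied from the output above (with the synonym set shortened) and assumes, based on that output, that the two integers are inclusive token indices. The function name summarise is hypothetical.

```python
# Sample result copied from the output above, synonym set shortened.
results = [
    (
        {
            "name": "Prednisone",
            "synonyms": {"Deltasone", "Rayos", "Prednisone"},
            "medline_plus_id": "a601102",
            "mesh_id": "D018931",
            "drugbank_id": "DB00635",
        },
        3,
        3,
    )
]
tokens = "i bought some Prednisone".split(" ")


def summarise(results, tokens):
    """Return one flat dictionary per match (hypothetical helper)."""
    rows = []
    for drug, start, end in results:
        rows.append(
            {
                # Assumption: start and end are inclusive token indices.
                "matched_text": " ".join(tokens[start:end + 1]),
                "name": drug["name"],
                # Not every entry carries every identifier, so use .get().
                "drugbank_id": drug.get("drugbank_id"),
                "mesh_id": drug.get("mesh_id"),
            }
        )
    return rows


for row in summarise(results, tokens):
    print(row)
```

Using .get() for the identifiers is deliberate: the Rimonabant output elsewhere in this document has no medline_plus_id key, so some keys may be absent.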

You can ignore case with:

find_drugs("i bought some prednisone".split(" "), is_ignore_case=True)

Compatibility with other natural language processing libraries

The Drug Named Entity Recognition library is independent of other NLP tools and has no dependencies. It is lightweight and has no special system requirements. However, it combines well with other libraries such as spaCy or the Natural Language Toolkit (NLTK).

Using Drug Named Entity Recognition together with spaCy

Here is an example call to the tool with a spaCy Doc object:

from drug_named_entity_recognition import find_drugs
import spacy
nlp = spacy.blank("en")
doc = nlp("i routinely rx rimonabant and pts prefer it")
find_drugs([t.text for t in doc], is_ignore_case=True)

outputs:

[({'name': 'Rimonabant', 'synonyms': {'Acomplia', 'Rimonabant', 'Zimulti'}, 'mesh_id': 'D063387', 'drugbank_id': 'DB06155'}, 3, 3)]
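The indices in the output refer to tokens, not characters. If you need character offsets in the original string (for example to highlight the match), one option is to locate each token in the text yourself. The sketch below is standard-library only. The function name token_char_spans is hypothetical, and it assumes that every token appears verbatim in the text in order, and that the end index is inclusive as the output above suggests.

```python
def token_char_spans(text, tokens):
    """Return (start, end) character offsets for each token.

    Hypothetical helper; assumes tokens occur verbatim and in order.
    """
    spans = []
    cursor = 0
    for token in tokens:
        # Search from the end of the previous token so repeated
        # words map to the correct occurrence.
        start = text.index(token, cursor)
        end = start + len(token)
        spans.append((start, end))
        cursor = end
    return spans


text = "i routinely rx rimonabant and pts prefer it"
tokens = text.split(" ")
spans = token_char_spans(text, tokens)

# Token indices taken from the output above: (..., 3, 3).
start_token, end_token = 3, 3
char_start = spans[start_token][0]
char_end = spans[end_token][1]
print(char_start, char_end, text[char_start:char_end])
```

With a spaCy Doc you could instead read the offsets from the token objects themselves; the version here is tokeniser-agnostic.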

Using Drug Named Entity Recognition together with NLTK

You can also use the tool together with the Natural Language Toolkit (NLTK):

from drug_named_entity_recognition import find_drugs
from nltk.tokenize import wordpunct_tokenize
tokens = wordpunct_tokenize("i routinely rx rimonabant and pts prefer it")
find_drugs(tokens, is_ignore_case=True)

Data sources

The main data source is Drugbank, augmented by datasets from the NHS, MeSH, Medline Plus and Wikipedia.

Update the Drugbank dictionary

If you want to update the dictionary, you can use the data dump from Drugbank and replace the file drugbank vocabulary.csv:

Download the open data dump from https://go.drugbank.com/releases/latest#open-data
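Before replacing the file, it can help to inspect what the dump contains. The sketch below reads a vocabulary CSV with the standard library. The column names ("DrugBank ID", "Common name", "Synonyms") and the " | " synonym separator are assumptions about the open data file and should be checked against your download. The function name load_vocabulary is hypothetical, and this is not how the library itself loads the data. The sample row is built from values shown elsewhere in this document, with unused columns left empty.

```python
import csv
import io

# Illustrative sample; header and separator are assumptions.
SAMPLE_CSV = (
    "DrugBank ID,Accession Numbers,Common name,CAS,UNII,Synonyms,Standard InChI Key\n"
    "DB00635,,Prednisone,,,Dehydrocortisone | Prednisona | Prednisonum,\n"
)


def load_vocabulary(handle):
    """Map each common name to its ID and synonym list (hypothetical helper)."""
    vocabulary = {}
    for row in csv.DictReader(handle):
        # Assumption: synonyms are separated by a pipe character.
        synonyms = [s.strip() for s in row["Synonyms"].split("|") if s.strip()]
        vocabulary[row["Common name"]] = {
            "drugbank_id": row["DrugBank ID"],
            "synonyms": synonyms,
        }
    return vocabulary


vocabulary = load_vocabulary(io.StringIO(SAMPLE_CSV))
print(vocabulary["Prednisone"])
```

To run this against a real download, pass an open file handle (opened with newline="" and an explicit encoding) instead of the StringIO sample.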

Update the Wikipedia dictionary

If you want to update the Wikipedia dictionary, download the dump from:

https://meta.wikimedia.org/wiki/Data_dump_torrents#English_Wikipedia

and run extract_drug_names_and_synonyms_from_wikipedia_dump.py

Update the MeSH dictionary

If you want to update the MeSH dictionary, run

python download_mesh_dump_and_extract_drug_names_and_synonyms.py

This will download the latest XML file from NIH.

If the link doesn't work, download the open data dump manually from https://www.nlm.nih.gov/ (it should be called something like desc2023.xml) and comment out the Wget/Curl commands in the code.
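If you download the file manually, you may want to check its contents before running the script. The sketch below parses a tiny inline sample with the standard library. The element names (DescriptorRecord, DescriptorUI, DescriptorName, ConceptList, Concept, TermList, Term, String) are assumptions about the MeSH descriptor XML layout and should be verified against the real file. The function name extract_terms is hypothetical, the sample record is illustrative, and this is not the logic of the script named above.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; the element layout is an assumption.
SAMPLE_XML = """<DescriptorRecordSet>
  <DescriptorRecord>
    <DescriptorUI>D018931</DescriptorUI>
    <DescriptorName><String>Prednisone</String></DescriptorName>
    <ConceptList>
      <Concept>
        <TermList>
          <Term><String>Prednisone</String></Term>
          <Term><String>Deltasone</String></Term>
        </TermList>
      </Concept>
    </ConceptList>
  </DescriptorRecord>
</DescriptorRecordSet>"""


def extract_terms(xml_text):
    """Map each descriptor ID to its name and sorted term list."""
    root = ET.fromstring(xml_text)
    records = {}
    for record in root.iter("DescriptorRecord"):
        mesh_id = record.findtext("DescriptorUI")
        name = record.findtext("DescriptorName/String")
        # Collect every term string under every concept, de-duplicated.
        terms = sorted(
            {
                node.text
                for node in record.iterfind("ConceptList/Concept/TermList/Term/String")
            }
        )
        records[mesh_id] = {"name": name, "synonyms": terms}
    return records


records = extract_terms(SAMPLE_XML)
print(records)
```

The real file is large and covers far more than drugs, so a streaming parser (ET.iterparse) and a filter for drug-related descriptors would probably be needed. Both are left out of this sketch.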

License information

Data from Drugbank is licensed under CC0.

To the extent possible under law, the person who associated CC0 with the DrugBank Open Data has waived all copyright and related or neighboring rights to the DrugBank Open Data. This work is published from: Canada.

Text from the Wikipedia data dump is licensed under the GNU Free Documentation License and the Creative Commons Attribution-Share-Alike 3.0 License.

Contributing to the Drug Named Entity Recognition library

If you'd like to contribute to this project, you can contact us at https://fastdatascience.com/ or make a pull request on our GitHub repository. You can also raise an issue.

Developing the Drug Named Entity Recognition library

Automated tests

Test code is in the tests/ folder and uses unittest.

The testing tool tox is used in the automation with GitHub Actions CI/CD.

Use tox locally

Install tox and run it:

pip install tox
tox

In our configuration, tox checks the source distribution using check-manifest, runs setuptools's check, and runs the unit tests using pytest. check-manifest requires your repo to be git-initialized (git init) and its files to be added (git add .) at least. You don't need to install check-manifest or pytest yourself; tox installs them in a separate environment.

The automated tests are run against several Python versions, but on your machine you might have only one version of Python. If that is Python 3.9, run:

tox -e py39

Thanks to GitHub Actions' automated process, you don't need to generate distribution files locally.

Continuous integration/deployment to PyPI

This package is based on the template https://pypi.org/project/example-pypi-package/

This package

uses GitHub Actions for both testing and publishing
is tested when pushing to the master or main branch, and is published when a release is created
includes test files in the source distribution
uses setup.cfg for version single-sourcing (setuptools 46.4.0+)

Re-releasing the package manually

The code to re-release this package on PyPI is as follows:

source activate py311
pip install twine
rm -rf dist
python setup.py sdist
twine upload dist/*

Who worked on the Drug Named Entity Recognition library?

The tool was developed by:

Thomas Wood (Fast Data Science)

License

Citing the Drug Named Entity Recognition library

Wood, T.A., Drug Named Entity Recognition [Computer software], Version 1.0.1, accessed at https://fastdatascience.com/drug-named-entity-recognition-python-library/, Fast Data Science Ltd (2023)

@unpublished{drugnamedentityrecognition,
    AUTHOR = {Wood, T.A.},
    TITLE  = {Drug Named Entity Recognition (Computer software), Version 1.0.1},
    YEAR   = {2023},
    Note   = {To appear},
}

Release history

2.0.9: Jul 15, 2025
2.0.8: Apr 13, 2025
2.0.7: Mar 28, 2025
2.0.6: Mar 28, 2025
2.0.5: Jan 24, 2025
2.0.4: Oct 14, 2024
2.0.1: Oct 10, 2024
2.0.0: Sep 6, 2024
1.0.11: Jun 21, 2024
1.0.10: Jun 21, 2024
1.0.9: Jun 21, 2024
1.0.8: Jun 20, 2024
1.0.3: Apr 14, 2024
1.0.2: Sep 27, 2023
1.0.1: Jul 7, 2023 (this version)
0.5.2: Jun 20, 2024
0.1: Jun 17, 2022

Download files

Source Distribution

drug-named-entity-recognition-1.0.1.tar.gz (1.0 MB view details)

Uploaded Jul 7, 2023 Source

Built Distribution

drug_named_entity_recognition-1.0.1-py3-none-any.whl (1.0 MB view details)

Uploaded Jul 7, 2023 Python 3

File details

Details for the file drug-named-entity-recognition-1.0.1.tar.gz.

File metadata

Download URL: drug-named-entity-recognition-1.0.1.tar.gz
Upload date: Jul 7, 2023
Size: 1.0 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for drug-named-entity-recognition-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`4fde6441b8ca5adb4a5f7bd99132371026688b571e941f47d697a8f51d7ff7fa`
MD5	`7f085b1b1d2ce3723f1b89b4e2af91f4`
BLAKE2b-256	`7fb6ee97913fcabbd5ef0e5753b766aeee025ca4464a92bfe13d88571fbd1a85`

File details

Details for the file drug_named_entity_recognition-1.0.1-py3-none-any.whl.

File metadata

Download URL: drug_named_entity_recognition-1.0.1-py3-none-any.whl
Upload date: Jul 7, 2023
Size: 1.0 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for drug_named_entity_recognition-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`aa3312583bbdcd64385222a7001d9847387035c5746f3ab51cd6570680d4de5e`
MD5	`756a7f8bf27b285d5188f4d84230fd5f`
BLAKE2b-256	`a32d2175acdcec129a45a29e7ceff92ca6e0af8371cfe5e829c28184280ba16d`
