Knowledge Extraction for COVID-19 Publications (KEP)
Content
This package extracts and displays key knowledge from COVID-19 publications so that they can be understood quickly. It currently reports the key topics and the top disease and location mentions of the input publication. More functions are planned.
Prerequisites
Install the following packages before use (tested versions in parentheses):
- Python >=3.8
- spacy (3.0.8)
- scispacy (0.4.0)
- gensim (4.1.2)
- nltk (3.7)
- en-core-web-sm (https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz)
- en-ner-bc5cdr-md (https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_ner_bc5cdr_md-0.4.0.tar.gz)
- bs4 (BeautifulSoup4)
- wordcloud
- pandas
- matplotlib
Python standard library (no installation needed):
- re
- string
- os
- urllib
- xml
Local package:
- nltk_data (https://github.com/nltk/nltk_data/tree/gh-pages/packages): download the "packages" folder, rename it to "nltk_data", and place it in your own Python project
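Before running KEP, it can help to confirm which of the packages above are installed and at what version. The following is a minimal sketch that uses only the standard library's importlib.metadata. The helper name report_versions and the TESTED mapping are illustrative and not part of KEP, and the distribution names are assumed to match the names pip uses.

```python
from importlib.metadata import PackageNotFoundError, version

# Tested versions from the list above; this mapping is illustrative.
TESTED = {
    "spacy": "3.0.8",
    "scispacy": "0.4.0",
    "gensim": "4.1.2",
    "nltk": "3.7",
}


def report_versions(tested=TESTED):
    """Return {name: (installed_version_or_None, tested_version)}."""
    report = {}
    for name, tested_version in tested.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed
        report[name] = (installed, tested_version)
    return report


for name, (installed, tested_version) in report_versions().items():
    status = installed if installed is not None else "not installed"
    print(f"{name}: installed={status}, tested={tested_version}")
```

A version that differs from the tested one is not necessarily a problem; the report only shows where your environment departs from the tested setup.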
Instructions
Install the KEP package
pip install KEP
Sample code
from KEP import To_Generate_Disease, To_Generate_Key_Word, To_Generate_Location, To_Generate_All
To_Generate_All(7824075)
# To process multiple publications, uncomment the following lines and replace the PMC IDs:
# To_Generate_All(7824075)
# To_Generate_All(7824470)
# To_Generate_All(6988269)
# ...
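To process several publications in one run, the calls can also be wrapped in a loop instead of being repeated by hand. The sketch below assumes only what the sample shows, namely that To_Generate_All accepts a single PMC ID. The helper run_batch is hypothetical and not part of KEP; it takes the function as an argument so that one failing ID does not stop the rest.

```python
def run_batch(pmc_ids, generate):
    """Call generate(pmc_id) for each ID and return the IDs that failed.

    `generate` is any callable that takes one PMC ID, for example
    KEP's To_Generate_All. This helper is illustrative only.
    """
    failed = []
    for pmc_id in pmc_ids:
        try:
            generate(pmc_id)
        except Exception as exc:  # keep going if one publication fails
            print(f"PMC{pmc_id}: {exc}")
            failed.append(pmc_id)
    return failed


# Usage (requires KEP to be installed):
# from KEP import To_Generate_All
# run_batch([7824075, 7824470, 6988269], To_Generate_All)
```

The returned list makes it easy to retry only the publications that could not be processed.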
Input
Provide the PMC ID of a publication (or several PMC IDs, one call per ID).
(For example, 7824075 refers to the publication https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7824075/)
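PMC IDs are often copied with a "PMC" prefix or as part of an article URL, while the sample code passes the bare number. The following sketch shows one way to normalize such inputs before calling KEP. The functions normalize_pmc_id and article_url are hypothetical helpers, not part of KEP, and the URL form is taken from the example above.

```python
import re


def normalize_pmc_id(value):
    """Return the numeric PMC ID as an int.

    Accepts an int, "7824075", "PMC7824075", or an article URL
    that contains "PMC7824075".
    """
    if isinstance(value, int):
        return value
    text = str(value).strip()
    # Prefer an explicit "PMC<digits>" token, e.g. inside a URL.
    match = re.search(r"PMC(\d+)", text, re.IGNORECASE)
    if match is None:
        # Otherwise accept a string made of digits only.
        match = re.fullmatch(r"(\d+)", text)
    if match is None:
        raise ValueError(f"not a PMC ID: {value!r}")
    return int(match.group(1))


def article_url(pmc_id):
    """Build the article URL in the form shown in the example above."""
    return f"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC{normalize_pmc_id(pmc_id)}/"


print(normalize_pmc_id("PMC7824075"))  # 7824075
print(article_url(7824075))
```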
Output
- The bar graph and word cloud of the key topics
- The bar graph of the top disease mentions
- The bar graph of the top location mentions
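Each bar graph ranks items by how often they are mentioned. As a rough illustration of that ranking step only, the sketch below counts already-extracted mentions with collections.Counter. It is not KEP's implementation: the function name top_mentions, the case-folding, and the sample mentions are assumptions made for illustration.

```python
from collections import Counter


def top_mentions(mentions, n=10):
    """Return the n most frequent mentions as (mention, count) pairs.

    Mentions are stripped and lower-cased so that "Fever" and "fever"
    are counted together; this normalization is an assumption.
    """
    counts = Counter(m.strip().lower() for m in mentions if m.strip())
    return counts.most_common(n)


sample = ["Fever", "pneumonia", "fever", "COVID-19", "covid-19", "fever"]
print(top_mentions(sample, n=2))  # [('fever', 3), ('covid-19', 2)]
```

The resulting pairs can be split into labels and counts and passed to a plotting library such as matplotlib to draw a bar graph.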
References
- Chen, Q., Allot, A., & Lu, Z. (2020). Keep up with the latest coronavirus research. Nature, 579(7798), 193-194.
- Comeau, D. C., Wei, C. H., Islamaj Doğan, R., & Lu, Z. (2019). PMC text mining subset in BioC: about three million full-text articles and growing. Bioinformatics, 35(18), 3533-3535.
- BioC API for PMC: https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/BioC-PMC/
- Part of the functionality is based on the en-ner-bc5cdr-md, en-core-web-sm, and gensim packages.
Citation
TBA.
Report an issue
Should you have any questions or comments, please feel free to contact the author.
Source Distribution
File details
Details for the file KEP-1.0.0.tar.gz.
File metadata
- Download URL: KEP-1.0.0.tar.gz
- Upload date:
- Size: 1.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.8.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4804bd680f095c85c55aeb9351917640d2cf76eb1a2559842cb8c3585d6a6bfc
MD5 | 4b244b211787ce972a48e28c896b9b13
BLAKE2b-256 | 5dff3352920b0e1f8a59ee4cc526e6c9112a01cf9dc552cd372fad2627a8e06b