Data collection from PubMed made easy
PUBMED-FLOW
Open-source data collection tool for fetching data from PubMed
Contribute and Support
🎮 Features
- Fetch PubMed IDs (PMIDs) for a keyword query (queries with multiple keywords are supported)
- Fetch the abstracts of research papers from PubMed by PMID
- Download the full PDF for a PMID when it is available on PubMed Central (PMC)
- If the PDF is not available on PMC, the tool falls back to Sci-Hub internally
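The first two features map naturally onto NCBI's public E-utilities endpoints (ESearch for PMIDs, EFetch for abstracts). The following is a minimal sketch of that flow, not pubmedflow's actual implementation: the function names are hypothetical, joining keywords with `AND` is an assumption, and the sample reply is hand-written with placeholder IDs. No network request is made.

```python
import json
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(keywords, api_key="", max_documents=20):
    """Build an ESearch URL that looks up PMIDs for one or more keywords."""
    # Assumption: keywords are combined with AND; pubmedflow may combine them differently.
    term = " AND ".join(keywords)
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": max_documents}
    if api_key:
        params["api_key"] = api_key
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_efetch_url(pmids, api_key=""):
    """Build an EFetch URL that returns plain-text abstracts for the given PMIDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "rettype": "abstract", "retmode": "text"}
    if api_key:
        params["api_key"] = api_key
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def parse_pmids(esearch_json):
    """Extract the list of PMIDs from an ESearch JSON reply."""
    return json.loads(esearch_json)["esearchresult"]["idlist"]


# Hand-written sample shaped like an ESearch JSON reply (the IDs are placeholders).
sample_reply = '{"esearchresult": {"count": "2", "idlist": ["11111111", "22222222"]}}'

pmids = parse_pmids(sample_reply)
search_url = build_esearch_url(["chronic", "disease"])
fetch_url = build_efetch_url(pmids)
print(search_url)
print(fetch_url)
```

In a real run, the ESearch URL would be requested first and its reply passed to `parse_pmids` before building the EFetch URL.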
How to obtain an NCBI API key?
- Follow this tutorial
Installation
From source
python setup.py install
OR
pip install git+https://github.com/nfflow/pubmedflow
How to use the API?
Arguments:
| Name | Input | Description |
|---|---|---|
| folder_name | Optional, str | path to store output data |
Quick Start:
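Download PubMed articles as PDFs and a DataFrame:
import eutils
from pubmedflow import LazyPubmed

# Example search keyword (illustrative value; replace with your own query)
title_query = 'chronic'

pb = LazyPubmed(title_query,
                folder_name='pubmed_data',
                api_key='',  # your NCBI API key
                max_documents=None,
                download_pdf=True,
                scihub=False)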
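To avoid hard-coding the NCBI key in a script, the keyword arguments shown above can be assembled from the environment. This is a small sketch under stated assumptions: the helper `lazy_pubmed_kwargs` and the variable name `NCBI_API_KEY` are hypothetical and not part of pubmedflow; only the parameter names come from the Quick Start call.

```python
import os


def lazy_pubmed_kwargs(folder_name="pubmed_data", max_documents=None, environ=None):
    """Collect keyword arguments for LazyPubmed, reading the API key from the environment."""
    env = os.environ if environ is None else environ
    return {
        "folder_name": folder_name,
        # NCBI_API_KEY is a hypothetical variable name chosen for this sketch.
        "api_key": env.get("NCBI_API_KEY", ""),
        "max_documents": max_documents,
        "download_pdf": True,
        "scihub": False,
    }


# Demo with an explicit mapping instead of the real environment.
kwargs = lazy_pubmed_kwargs(environ={"NCBI_API_KEY": "my-key"})
print(kwargs)

# With pubmedflow installed, the dictionary could then be unpacked:
# pb = LazyPubmed('chronic', **kwargs)
```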
Perform unsupervised learning to make a pre-trained model from the collected data:
pb.pubmed_train(model_name='sentence-transformers/all-mpnet-base-v2',
                model_output_path='pubmedflow_model',
                model_architecture='ct')
Run question answering on the downloaded text to get answer spans from each article:
qa_results = pb.pubmed_qa(qa_query='What are the chronic diseases')
print(qa_results)
Summarise each article:
summ_results = pb.pubmed_summarise()
print(summ_results)
Perform entity extraction on each article:
ents = pb.pubmed_entity_extraction()
print(ents)
Source Distribution
pubmedflow-0.0.2.tar.gz (7.9 kB)
File details
Details for the file pubmedflow-0.0.2.tar.gz.
File metadata
- Download URL: pubmedflow-0.0.2.tar.gz
- Upload date:
- Size: 7.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e0db5d6712feb95092cfb0f584517b37f8e750c553b12e9970f89cff1ea19590 |
| MD5 | 2cd873be3bcb389a4a947b0a90d2a195 |
| BLAKE2b-256 | d2e648799e96d9a7055216ad9b1a04a7855088b6df69853ccbf62836e6e2a18b |
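The digests above can be used to check a downloaded archive before installing it. The sketch below shows one way to do that with Python's standard `hashlib`; the helper names are hypothetical, and the demo runs on a throwaway temporary file rather than the real archive.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demo on a temporary file with known contents (not the real archive).
payload = b"pubmedflow demo bytes"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    demo_path = tmp.name

demo_ok = matches_digest(demo_path, hashlib.sha256(payload).hexdigest())
demo_bad = matches_digest(demo_path, "0" * 64)
os.remove(demo_path)
print(demo_ok, demo_bad)
```

For the real file, `matches_digest` would be called with the path to the downloaded `pubmedflow-0.0.2.tar.gz` and the SHA256 value from the table.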