Data collection from PubMed made easy
PUBMED-FLOW
Open-source data collection tool for fetching data from PubMed
Contribute and Support
🎮 Features
- Fetch PubMed IDs (PMIDs) for a keyword query (multiple keywords are supported)
- Fetch the abstracts of research papers from PubMed by PMID
- Download the full-text PDF for each PMID, if it is available on PubMed Central (PMC)
- If the PDF is not available on PMC, fall back to downloading it from Sci-Hub
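To make the first two features concrete, here is a minimal sketch of the kind of NCBI E-utilities requests that a PMID search and an abstract fetch correspond to. It is not taken from pubmedflow's source: the helper names (`build_esearch_url`, `build_efetch_url`, `parse_pmids`) are hypothetical, and joining multiple keywords with `AND` is an assumption about how a multi-keyword query might be combined. The sketch only builds URLs and parses a response body, so it performs no network calls.

```python
import json
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(keywords, api_key="", retmax=20):
    """Build an ESearch URL that looks up PMIDs for one or more keywords."""
    params = {
        "db": "pubmed",
        # Assumption: multiple keywords are combined with AND.
        "term": " AND ".join(keywords),
        "retmax": retmax,
        "retmode": "json",
    }
    if api_key:  # the key is optional, so only send it when one is set
        params["api_key"] = api_key
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_efetch_url(pmids, api_key=""):
    """Build an EFetch URL that returns plain-text abstracts for the PMIDs."""
    params = {
        "db": "pubmed",
        "id": ",".join(pmids),
        "rettype": "abstract",
        "retmode": "text",
    }
    if api_key:
        params["api_key"] = api_key
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def parse_pmids(esearch_json):
    """Extract the PMID list from an ESearch JSON response body."""
    return json.loads(esearch_json)["esearchresult"]["idlist"]
```

The two-step shape (search for IDs first, then fetch records by ID) mirrors the feature list above; pubmedflow itself may issue these requests differently.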
How to obtain an NCBI API key?
- Follow this tutorial
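Once you have a key, one common way to keep it out of your scripts is to read it from an environment variable. The sketch below assumes a variable named `NCBI_API_KEY`; that name and the `load_api_key` helper are illustrative choices, not something pubmedflow requires.

```python
import os


def load_api_key(env=os.environ, var_name="NCBI_API_KEY"):
    """Return the API key from the environment, or '' if it is not set."""
    # Assumption: the key is stored under NCBI_API_KEY; use any name you like.
    return env.get(var_name, "").strip()


api_key = load_api_key()
# The value can then be passed as api_key=... when constructing LazyPubmed.
```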
Installation
From source
python setup.py install
OR
pip install git+https://github.com/nfflow/pubmedflow
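After either install route, a quick way to confirm that the package is visible to your interpreter is to look it up without importing it. This is a generic sketch using the standard library; `is_installed` is a hypothetical helper name.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    print("pubmedflow installed:", is_installed("pubmedflow"))
```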
How to use the API?
Arguments:
Name | Input | Description |
---|---|---|
folder_name | Optional, str | path to store output data |
Quick Start:
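Download PubMed articles as PDFs and a DataFrame:
import eutils
from pubmedflow import LazyPubmed

# Example search keywords (placeholder); replace with your own query
title_query = 'chronic diseases'

pb = LazyPubmed(title_query,
                folder_name='pubmed_data',  # path to store output data
                api_key='',                 # your NCBI API key
                max_documents=None,
                download_pdf=True,
                scihub=False)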
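To see what a run produced, you can inspect the output folder afterwards. The exact file names and formats that pubmedflow writes are not documented here, so the sketch below makes no assumption about them: it simply groups whatever files it finds under `folder_name` by extension. `group_outputs` is a hypothetical helper.

```python
from collections import defaultdict
from pathlib import Path


def group_outputs(folder):
    """Map each file extension found under `folder` to a sorted list of file names."""
    groups = defaultdict(list)
    for path in Path(folder).rglob("*"):
        if path.is_file():
            # Lower-case the suffix so '.PDF' and '.pdf' land in the same group.
            groups[path.suffix.lower()].append(path.name)
    return {suffix: sorted(names) for suffix, names in groups.items()}
```

For the quick-start call above, `group_outputs('pubmed_data')` would list the contents of the folder passed as `folder_name`.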
Perform unsupervised learning to make a pre-trained model from the collected data:
pb.pubmed_train(model_name='sentence-transformers/all-mpnet-base-v2',
                model_output_path='pubmedflow_model',
                model_architecture='ct')
Do question answering on the downloaded text to get answer spans from each article:
qa_results = pb.pubmed_qa(qa_query='What are the chronic diseases')
print(qa_results)
Summarise each downloaded article:
summ_results = pb.pubmed_summarise()
print(summ_results)
Perform entity extraction on each downloaded article:
ents = pb.pubmed_entity_extraction()
print(ents)