
Data collection from PubMed made easy


PUBMED-FLOW

Open-source data collection tool for fetching data from PubMed

Contribute and Support

License: MIT. PRs welcome.

🎮 Features

  • Fetch PubMed IDs (PMIDs) for a keyword query (multiple keywords are supported)
  • Fetch abstracts of research papers from PubMed by PMID
  • Download the full PDF for each PMID, if it is available on PubMed Central (PMC)
  • If the PDF is not available on PMC, download it from Sci-Hub internally
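
The first two features can be pictured as two requests against the public NCBI E-utilities endpoints. The sketch below only builds the request URLs and is not taken from pubmedflow's own code: the function names, the use of AND to combine keywords, and the placeholder PMIDs are all assumptions made for illustration.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities base URL (not taken from pubmedflow itself).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_search_url(keywords, max_documents=20, api_key=""):
    """Build an ESearch URL that returns PMIDs matching all keywords."""
    # Assumption: multiple keywords are combined with AND.
    term = " AND ".join(keywords)
    params = {"db": "pubmed", "term": term,
              "retmax": max_documents, "retmode": "json"}
    if api_key:
        params["api_key"] = api_key
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_abstract_url(pmids, api_key=""):
    """Build an EFetch URL that returns plain-text abstracts for the PMIDs."""
    params = {"db": "pubmed", "id": ",".join(str(p) for p in pmids),
              "rettype": "abstract", "retmode": "text"}
    if api_key:
        params["api_key"] = api_key
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Hypothetical keywords and placeholder PMIDs, for illustration only.
print(build_search_url(["chronic disease", "diabetes"], max_documents=5))
print(build_abstract_url([12345678, 23456789]))
```

Fetching either URL with any HTTP client returns the PMIDs (as JSON) or the abstracts (as text); the library wraps this kind of workflow behind a single class.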

How to obtain an NCBI API key?

Installation

From source

python setup.py install

OR

pip install git+https://github.com/nfflow/pubmedflow

How to use the API?

Arguments:

Name         | Input          | Description
folder_name  | Optional, str  | Path to store output data

Quick Start:

Download PubMed articles as PDFs and a DataFrame:

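import eutils
from pubmedflow import LazyPubmed

# Example keyword query (placeholder value; replace with your own search terms)
title_query = 'chronic diseases'

pb = LazyPubmed(title_query,
                folder_name='pubmed_data',  # path to store output data
                api_key='',                 # your NCBI API key
                max_documents=None,
                download_pdf=True,
                scihub=False)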
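
The quick start leaves api_key as an empty string. One way to avoid hard-coding a key is to read it from an environment variable and pass the result on as keyword arguments. This is a minimal sketch, not part of pubmedflow: the helper name and the NCBI_API_KEY variable name are hypothetical, and the remaining values simply repeat the quick-start arguments.

```python
import os


def lazy_pubmed_kwargs(env=None, folder_name="pubmed_data"):
    """Collect LazyPubmed keyword arguments, taking the API key from the environment."""
    env = os.environ if env is None else env
    return {
        "folder_name": folder_name,
        # NCBI_API_KEY is a hypothetical variable name; an empty string
        # mirrors the quick-start example when the variable is unset.
        "api_key": env.get("NCBI_API_KEY", ""),
        "max_documents": None,
        "download_pdf": True,
        "scihub": False,
    }


kwargs = lazy_pubmed_kwargs()
print(sorted(kwargs))
# With pubmedflow installed: pb = LazyPubmed(title_query, **kwargs)
```

Keeping the key out of the source file makes it easier to share scripts or notebooks without leaking credentials.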
                    

Perform unsupervised learning to make a pre-trained model from the collected data:

pb.pubmed_train(model_name='sentence-transformers/all-mpnet-base-v2',
                model_output_path='pubmedflow_model',
                model_architecture='ct')

Do question answering on the downloaded text to get answer spans from each article:

qa_results = pb.pubmed_qa(qa_query='What are the chronic diseases')
print(qa_results)

Summarise each article:

summ_results = pb.pubmed_summarise()
print(summ_results)

Perform entity extraction on each article:

ents = pb.pubmed_entity_extraction()
print(ents)
