pubmed-mapper: A Python library that maps PubMed XML to Python objects
1. Philosophy
Accessing PubMed articles programmatically is a common task for me. Luckily, with the help of eutils, we can fetch full article data in XML format. What I need is Python objects, not just XML strings, so pubmed-mapper was born.
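To make the eutils step concrete, here is a minimal sketch of how an article's XML can be requested from the NCBI E-utilities `efetch` endpoint using only the standard library. The helper names `build_efetch_url` and `fetch_pubmed_xml` are made up for illustration, and this is not necessarily how pubmed-mapper performs the request internally.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities endpoint for fetching full records.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmid):
    """Build the efetch URL that returns one PubMed record as XML."""
    query = urlencode({"db": "pubmed", "id": str(pmid), "retmode": "xml"})
    return f"{EFETCH_URL}?{query}"


def fetch_pubmed_xml(pmid, timeout=30):
    """Download the raw XML bytes for a PubMed ID (needs network access)."""
    with urlopen(build_efetch_url(pmid), timeout=timeout) as response:
        return response.read()


# Only the URL is built here; no request is sent.
url = build_efetch_url("32329900")
print(url)
```

The returned bytes are a `PubmedArticleSet` document, which is the kind of XML string that then has to be turned into Python objects.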
2. Installation
pip install pubmed-mapper
3. Usage
3.1 use as a library
3.1.1 parse a PubMed ID
from pubmed_mapper import Article
article = Article.parse_pmid('32329900')
# PubMed ID
print(article.pmid) # 32329900
# ids
print(article.ids) # [pubmed: 32329900, doi: 10.1111/jgs.16467]
print(article.ids[1].id_type) # doi
print(article.ids[1].id_value) # 10.1111/jgs.16467
# title
print(article.title) # Associations of Coffee...
# abstract
print(article.abstract) # <p><strong>Background: </strong>Coffee and tea...
# keywords
print(article.keywords) # ['aging', 'coffee; diet; longevity', 'tea']
# MeSH headings
print(article.mesh_headings) # ['Aged', 'Body Mass Index', '...']
# authors
print(article.authors) # [Shadyab AH Aladdin H, Manson JE JoAnn E, ...]
print(article.authors[0].last_name) # Shadyab
print(article.authors[0].forename) # Aladdin H
print(article.authors[0].initials) # AH
print(article.authors[0].affiliation) # Department of Family...
# journal
print(article.journal) # Journal of the American Geriatrics Society
print(article.journal.issn) # 1532-5415
print(article.journal.issn_type) # Electronic
print(article.journal.title) # Journal of the American Geriatrics Society
print(article.journal.abbr) # J Am Geriatr Soc
# volume
print(article.volume) # 68
# issue
print(article.issue) # 9
# references
print(article.references) # [n. 2013;129:643-659....]
print(article.references[0].citation) # Lotfield E, Freedman ND...
print(article.references[0].ids) # []
# pubdate
print(article.pubdate) # 2020-09-01
3.1.2 parse a downloaded XML file
from lxml import etree
from pubmed_mapper import Article
infile = 'xxx.xml'
# parse the whole PubmedArticleSet document
with open(infile) as fp:
    root = etree.parse(fp)
# map every PubmedArticle element to an Article object
articles = []
for pubmed_article_element in root.xpath('/PubmedArticleSet/PubmedArticle'):
    article = Article.parse_element(pubmed_article_element)
    articles.append(article)
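For readers unfamiliar with the layout of a PubMed XML file, the sketch below shows the `PubmedArticleSet` / `PubmedArticle` nesting that the loop above iterates over. It uses only the standard library `xml.etree.ElementTree` and a heavily trimmed sample with placeholder values; real records carry many more elements, and this is not pubmed-mapper's own parsing code.

```python
import xml.etree.ElementTree as ET

# Trimmed, made-up sample that follows the PubMed element nesting.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <ArticleTitle>First placeholder title</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article>
        <ArticleTitle>Second placeholder title</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""

# The root element is PubmedArticleSet; each child is one article record.
root = ET.fromstring(SAMPLE_XML)

records = []
for element in root.findall("PubmedArticle"):
    pmid = element.findtext("MedlineCitation/PMID")
    title = element.findtext("MedlineCitation/Article/ArticleTitle")
    records.append((pmid, title))

print(records)
```

Each `PubmedArticle` element is self-contained, which is why a single file can be mapped to a list of independent objects.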
3.2 use as a command-line tool
3.2.1 parse a PubMed ID
pubmed-mapper pmid -p 32329900
3.2.2 parse a single PubMed XML file
pubmed-mapper file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl
3.2.3 parse a directory that contains multiple PubMed XML files
pubmed-mapper directory -i data/ -o output/pubmed-mapper.jl
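The `.jl` extension on the output files suggests JSON Lines (one JSON object per line), although the format is not stated here. Assuming that is the case, a file could be read back as sketched below. The helper `read_json_lines` and the keys `pmid` and `title` in the demo data are hypothetical; inspect a real output file to see which fields it actually contains.

```python
import json
import os
import tempfile


def read_json_lines(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as fp:
        for line in fp:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Demo with a throwaway file; the keys below are placeholders, not the
# documented pubmed-mapper output schema.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.jl")
    with open(demo_path, "w", encoding="utf-8") as fp:
        fp.write(json.dumps({"pmid": "1", "title": "placeholder"}) + "\n")
        fp.write(json.dumps({"pmid": "2", "title": "placeholder"}) + "\n")
    demo_records = read_json_lines(demo_path)

print(len(demo_records))
```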
4. FAQs
4.1 There are many types of PubMed publication dates. How do you convert them to datetime.date objects?
Parsing publication dates is hard work, and pubmed-mapper cannot yet parse every type. The types it can parse, and the resulting values, are:
publication date type | parsed value |
---|---|
2021-03-13 | 2021-03-13 |
2021-03 | 2021-03-01 |
2021 Spring | 2021-04-01 |
2021 | 2021-01-01 |
2021 Jan-Feb | 2021-01-01 |
2021 Mar 13-15 | 2021-03-13 |
2021 Mar-2022 Jan | 2021-03-01 |
2021-2022 | 2021-01-01 |
2021 Mar 13-Dec 15 | 2021-03-13 |
1976-1977 Winter | 1976-01-01 |
1977-1978 Fall-Winter | 1977-10-01 |
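The table can be summarised by one rule: keep the first year, the first month or season, and the first day, and default any missing part to 1. The sketch below reproduces the rows of the table under that rule. The function `parse_pubdate` is written for illustration only and is not pubmed-mapper's actual implementation. The season-to-month mapping is inferred from the table (Winter is January, Spring is April, Fall is October), and the Summer entry is an assumption because the table has no Summer row.

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}
# Winter, Spring, and Fall come from the table; Summer (July) is an assumption.
SEASONS = {"winter": 1, "spring": 4, "summer": 7, "fall": 10}


def parse_pubdate(raw):
    """Map a raw publication date string to the start of the period it names."""
    raw = raw.strip()

    # Numeric forms: 2021-03-13 and 2021-03.
    iso = re.fullmatch(r"(\d{4})-(\d{2})(?:-(\d{2}))?", raw)
    if iso:
        year, month, day = iso.groups()
        return date(int(year), int(month), int(day or 1))

    # Leading year, optionally a year range such as 1976-1977.
    lead = re.match(r"(\d{4})(?:-\d{4})?\s*(.*)", raw)
    if not lead:
        raise ValueError(f"unsupported publication date: {raw!r}")
    year, rest = int(lead.group(1)), lead.group(2)
    if not rest:
        return date(year, 1, 1)  # 2021 or 2021-2022

    # First month or season name, optionally followed by a day number.
    part = re.match(r"([A-Za-z]+)(?:\s+(\d{1,2}))?", rest)
    if not part:
        raise ValueError(f"unsupported publication date: {raw!r}")
    token = part.group(1).lower()
    month = SEASONS.get(token) or MONTHS.get(token[:3])
    if month is None:
        raise ValueError(f"unknown month or season in: {raw!r}")
    return date(year, month, int(part.group(2) or 1))


for sample in ["2021 Spring", "2021 Mar 13-Dec 15", "1977-1978 Fall-Winter"]:
    print(sample, "->", parse_pubdate(sample))
```

Anything outside these shapes raises `ValueError`, so unsupported formats fail visibly.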
4.2 What is the pubmed-mapper.log file generated by pubmed-mapper?
pubmed-mapper.log is the default log file generated by pubmed-mapper. You can change the file with the --log-file option:
pubmed-mapper --log-file my-custom.log file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl
Check this log file for more details about parsing.
4.3 How do I get more detailed messages in my log file?
Use the --log-level option to log more detailed messages:
pubmed-mapper --log-file my-custom.log --log-level DEBUG file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl