Convert PubMed articles (PMIDs or PMCIDs) to clean, structured markdown with full text, abstracts, and supplementary materials

PubMed Downloader

Convert PubMed articles to clean, structured markdown. Handles the full pipeline: PMID resolution, full-text extraction via PubMed Central, HTML-to-markdown conversion, and supplementary material retrieval.

Articles without open-access full text automatically fall back to abstract-only download.

Installation

pip install pubmed-markdown

Requires Python 3.11+.

Setup

Set your email for NCBI API identification (required to avoid 403 errors):

export NCBI_EMAIL=your-email@institution.edu

Or pass it directly:

from pubmed_markdown import PubMedMarkdown
downloader = PubMedMarkdown(email="your-email@institution.edu")
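
Which of the two options wins when both are set is not stated here. As a minimal sketch of one plausible resolution order, assuming an explicit argument takes precedence over `NCBI_EMAIL` (`resolve_email` is a hypothetical helper for illustration, not part of the package):

```python
import os


def resolve_email(email=None, env=None):
    """Return the email to identify with, or raise if none is configured.

    Hypothetical helper: assumes an explicit argument takes precedence
    over the NCBI_EMAIL environment variable.
    """
    env = os.environ if env is None else env
    resolved = (email or env.get("NCBI_EMAIL", "")).strip()
    if not resolved:
        raise ValueError("Set NCBI_EMAIL or pass email=... explicitly")
    return resolved
```

Failing early like this surfaces a missing email as a clear configuration error before any request is sent.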

Quick Start

from pubmed_markdown import PubMedMarkdown

downloader = PubMedMarkdown()

# Get markdown string from a PMID
markdown = downloader.pmid_to_markdown("12895196")

Usage

Python API

Get markdown strings (single or batch, no files created):

from pubmed_markdown import PubMedMarkdown

downloader = PubMedMarkdown()

# From PMID — accepts a single string or a list
markdown = downloader.pmid_to_markdown("12895196")
markdowns = downloader.pmid_to_markdown(["12895196", "17872605"])

# From PMCID directly — also accepts a single string or a list
markdown = downloader.pmcid_to_markdown("PMC1884285")
markdowns = downloader.pmcid_to_markdown(["PMC1884285", "PMC6435416"])

# Skip supplementary materials
markdown = downloader.pmid_to_markdown("12895196", include_supplements=False)

Save markdown files to disk (single or batch):

from pubmed_markdown import PubMedMarkdown

downloader = PubMedMarkdown()
downloader.pmids_to_markdown_files(["12895196", "17872605"], save_dir="data")

# Also works with a single PMID
downloader.pmids_to_markdown_files("25051018", save_dir="data")

# Overwrite existing files
downloader.pmids_to_markdown_files(["12895196"], save_dir="data", overwrite=True)

This creates:

data/
├── html/          # Raw HTML from PMC
└── markdown/      # Converted markdown files

Full-text articles are saved as {PMCID}.md. Articles without open-access full text are saved as PMID{PMID}.md with abstract only.
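
To check whether a given article has already been saved, the naming rule above can be reproduced in a few lines. This is a sketch only: `markdown_path` is a hypothetical helper, and it assumes markdown files sit directly inside the `markdown/` subdirectory of `save_dir`.

```python
from pathlib import Path


def markdown_path(save_dir, pmid, pmcid=None):
    """Build the expected output path for one article (hypothetical helper).

    Follows the naming rule described above: {PMCID}.md when a PMCID is
    known, PMID{PMID}.md for abstract-only articles.
    """
    name = f"{pmcid}.md" if pmcid else f"PMID{pmid}.md"
    return Path(save_dir) / "markdown" / name
```

For example, `markdown_path("data", "25051018").exists()` would report whether an abstract-only file for that PMID is already on disk.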

Individual utility functions:

from pubmed_markdown import (
    get_pmcid_from_pmid,
    get_html_from_pmcid,
    get_abstract_markdown_from_pmid,
    fetch_bioc_supplement,
    format_supplement_as_markdown,
)

# Resolve PMIDs to PMCIDs (returns dict mapping PMID -> PMCID or None)
mapping = get_pmcid_from_pmid(["12895196", "17872605"])

# Fetch raw HTML from PMC
html = get_html_from_pmcid("PMC1884285")

# Get abstract for non-open-access articles
abstract_md = get_abstract_markdown_from_pmid("12345678")

# Get raw supplementary material text
supplement = fetch_bioc_supplement("PMC6435416")

# Get supplementary materials formatted as a markdown section
supplement_md = format_supplement_as_markdown("PMC6435416")
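
Because `get_pmcid_from_pmid` is described as returning a dict that maps each PMID to a PMCID or `None`, the result can be split into full-text and abstract-only groups before deciding which other helper to call. The sketch below works on a plain dict; `split_by_availability` is a hypothetical name and the IDs are made-up placeholders, not real articles.

```python
def split_by_availability(mapping):
    """Split a PMID -> PMCID-or-None mapping into two groups."""
    full_text = {pmid: pmcid for pmid, pmcid in mapping.items() if pmcid}
    abstract_only = [pmid for pmid, pmcid in mapping.items() if not pmcid]
    return full_text, abstract_only


# Placeholder values only; a real mapping would come from get_pmcid_from_pmid().
example = {"11111111": "PMC0000001", "22222222": None}
full_text, abstract_only = split_by_availability(example)
```

The PMCIDs in `full_text` could then go to `get_html_from_pmcid`, and the PMIDs in `abstract_only` to `get_abstract_markdown_from_pmid`.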

Command Line

# Convert PMIDs from a file (one PMID per line)
pubmed-download --file_path=pmids.txt --save_dir=data

# Overwrite existing files
pubmed-download --file_path=pmids.txt --save_dir=data --overwrite

# Specify email directly
pubmed-download --file_path=pmids.txt --email=your-email@institution.edu
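
The input file expects one PMID per line. A small sketch for generating such a file from a Python list follows; `write_pmid_file` is a hypothetical helper, and the digit-only check and de-duplication are assumptions about what makes a clean input, not documented requirements of the CLI.

```python
from pathlib import Path


def write_pmid_file(pmids, path):
    """Write unique, digit-only PMIDs to a text file, one per line."""
    unique = []
    for raw in pmids:
        pmid = str(raw).strip()
        if not pmid.isdigit():
            raise ValueError(f"Not a PMID: {raw!r}")
        if pmid not in unique:
            unique.append(pmid)
    Path(path).write_text("".join(f"{p}\n" for p in unique), encoding="utf-8")
    return unique
```

The resulting file can be passed to `pubmed-download` via `--file_path`.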

API Reference

Method	Creates Files	Returns	Use Case
`pmid_to_markdown()`	No	Markdown string(s)	Single or batch, programmatic use
`pmcid_to_markdown()`	No	Markdown string(s)	Direct PMCID conversion
`pmids_to_markdown_files()`	Yes	None	Batch processing, building datasets
`pmids_to_pmcids()`	No	List of PMCIDs	PMID to PMCID resolution
`pmcids_to_html()`	Yes	None	Fetch and save raw HTML
`local_html_to_markdown()`	Yes	None	Re-convert existing HTML files

All methods accepting IDs take either a single string or a list of strings.
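
Code that wraps these methods may want the same single-or-list flexibility. One common way to get it is to normalize the input up front. How the library does this internally is not shown here, so `as_id_list` below is a hypothetical helper rather than its actual implementation.

```python
def as_id_list(ids):
    """Normalize a single ID string or an iterable of IDs to a list of strings."""
    if isinstance(ids, str):
        # A bare string is one ID, not an iterable of characters.
        return [ids]
    return [str(item) for item in ids]
```

Checking for `str` first matters because a string is itself iterable and would otherwise be split into single characters.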

How It Works

1. PMID to PMCID -- Uses NCBI's ID Converter API with batching and rate limiting
2. HTML extraction -- Fetches full article HTML from PubMed Central
3. Markdown conversion -- Converts HTML to structured markdown preserving tables, figures, citations, and references
4. Supplementary materials -- Fetches pre-processed supplement text via NCBI's BioC API
5. Abstract fallback -- Articles not in PMC Open Access get abstract + metadata via NCBI E-Fetch
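
The batching and rate limiting mentioned for PMID-to-PMCID resolution can be sketched generically. This is not the library's code: `fetch` is a caller-supplied function standing in for the actual API request, and the default `batch_size` and `min_interval` are assumed values chosen for illustration.

```python
import time
from itertools import islice


def batched(ids, size):
    """Yield successive lists of at most `size` IDs."""
    iterator = iter(ids)
    while batch := list(islice(iterator, size)):
        yield batch


def resolve_in_batches(ids, fetch, batch_size=200, min_interval=0.34, sleep=time.sleep):
    """Call `fetch(batch)` once per batch, pausing between requests.

    `fetch` must return a dict for the batch it receives. batch_size and
    min_interval are assumptions, not values taken from the library.
    """
    results = {}
    for index, batch in enumerate(batched(ids, batch_size)):
        if index:
            # Pause before every request except the first.
            sleep(min_interval)
        results.update(fetch(batch))
    return results
```

Passing `sleep` as a parameter keeps the pacing logic testable without real delays. A local `batched` helper is used because `itertools.batched` only exists from Python 3.12, while the package supports 3.11.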

Configuration

Environment Variable	Default	Description
`NCBI_EMAIL`	None	Email for NCBI API identification

License

MIT

Release history

Version	Release date
0.2.5	Mar 17, 2026
0.2.4	Mar 13, 2026
0.2.2	Mar 13, 2026
0.2.1	Mar 13, 2026
0.2.0	Mar 13, 2026
0.1.1	Mar 12, 2026
0.1.0	Mar 11, 2026
