Identifier -> the retrievable artifacts of a scholarly article: a pluggable source ladder + identifier resolvers.

litfetch

Resolve a scholarly article identifier to its retrievable artifacts — the full-text body and any supplementary material — and fetch their bytes.

litfetch is two cooperating seams:

- a fetch ladder: pluggable Fetcher backends (PMC Open Access S3, Europe PMC, Elsevier OA) tried in priority order; the first to serve the body wins, returning a Blob (a File plus its bytes);
- an optional resolver layer: pluggable Resolvers that enrich what you know about a paper (pmid → pmcid/doi, etc.) so the ladder can act.

You hand it an ArticleIds bundle (any of pmid / pmcid / doi). Resolution is demand-driven: a resolver only runs when the next fetcher needs an identifier you don't yet have, and runs at most once.
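The ladder and the demand-driven resolution described above can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not litfetch's implementation: Ids, ToyFetcher, run_ladder and the identifier values are all hypothetical names invented for illustration, and bodies are bare bytes rather than a Blob.

```python
import asyncio
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Ids:
    """Hypothetical identifier bundle (a stand-in, not litfetch's ArticleIds)."""
    pmid: Optional[str] = None
    pmcid: Optional[str] = None
    doi: Optional[str] = None


class ToyFetcher:
    """Hypothetical backend: a name, the one id field it needs, canned bodies."""

    def __init__(self, name, needs, bodies):
        self.name = name
        self.needs = needs      # name of the Ids field this backend requires
        self.bodies = bodies    # maps an id value to body bytes

    async def fetch(self, ids):
        return self.bodies.get(getattr(ids, self.needs))


async def run_ladder(ids, fetchers, resolver=None):
    """Try fetchers in priority order; resolve lazily and at most once."""
    resolved = False
    for fetcher in fetchers:
        if getattr(ids, fetcher.needs) is None and resolver and not resolved:
            ids = await resolver(ids)   # only runs when an id is missing
            resolved = True
        if getattr(ids, fetcher.needs) is None:
            continue                    # still unusable: move to the next rung
        body = await fetcher.fetch(ids)
        if body is not None:
            return fetcher.name, body   # first backend to serve the body wins
    return None


calls = []


async def toy_resolver(ids):
    calls.append(ids)
    return replace(ids, pmcid=ids.pmcid or "PMC0000001")  # placeholder value


ladder = [
    ToyFetcher("first", "pmcid", {}),                          # nothing to serve
    ToyFetcher("second", "pmcid", {"PMC0000001": b"<article/>"}),
    ToyFetcher("third", "pmid", {"123": b"never reached"}),
]
winner = asyncio.run(run_ladder(Ids(pmid="123"), ladder, toy_resolver))
print(winner, len(calls))
```

The resolver runs once, when the first rung asks for a pmcid the caller did not supply; a caller who already holds the pmcid never triggers it.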

An article is modelled as a file-set: a collection of File references (the body in its various media types, plus supplementary material, distinguished by FileKind), each hosted upstream. litfetch fetches the raw artifacts and reports their access terms; it does not render them. To turn a fetched JATS/Elsevier body into markdown, run litdown on the bytes yourself (see Render to markdown).

The examples below are a tour; docs/api.md is the full reference for the public surface.

Install

pip install litfetch

bioRxiv / medRxiv preprint full text needs a browser-fingerprint HTTP client, enabled by the biorxiv extra:

pip install 'litfetch[biorxiv]'

Usage

Fetch the body

Hand fetch_body an ArticleIds; the default ladder serves the first available body as a Blob:

from litfetch import ArticleIds, fetch_body

blob = await fetch_body(ArticleIds(pmcid='PMC5334499'))
if blob:
    print(blob.file.source, blob.file.media_type, len(blob.content))

Render to markdown

litfetch returns raw bytes, not markdown. Convert a JATS/Elsevier body with litdown — you pick and pin the converter:

import io
import litdown
from litfetch import ArticleIds, fetch_body

blob = await fetch_body(ArticleIds(pmcid='PMC5334499'))
if blob:
    markdown = litdown.convert(io.BytesIO(blob.content))

Inject your own resolver

A resolver is an async callable (ArticleIds, Http) -> ArticleIds; the session running it supplies the Http. Enrich from whatever you have (a corpus client, a local cache, an API) and merge the result in. This one ignores Http, hence the _http parameter name:

from litfetch import ArticleIds, Http, fetch_body

async def my_resolver(ids: ArticleIds, _http: Http) -> ArticleIds:
    if not ids.pmid:
        return ids
    # my_corpus stands in for your own lookup client; it is not part of litfetch
    pmcid, doi = await my_corpus.lookup(ids.pmid)
    return ids.merge(ArticleIds(pmcid=pmcid, doi=doi))

blob = await fetch_body(ArticleIds(pmid='29622564'), resolver=my_resolver)

Use a bundled resolver

Bundled resolvers are constructed with their config, then passed in the same slot. chain(...) composes several (yours first, fallbacks after); it stops once every identifier is known:

from litfetch import ArticleIds, fetch_body
from litfetch.resolvers import SemanticScholarResolver, NcbiIdConverterResolver, chain

resolver = chain(
    my_resolver,                              # your own
    SemanticScholarResolver(api_key=S2_KEY),  # bundled
    NcbiIdConverterResolver(tool='myapp'),    # bundled
)
blob = await fetch_body(ArticleIds(pmid='29622564'), resolver=resolver)

Polite-pool identification (NCBI/Crossref email, Unpaywall's required email) comes from a session contact, not a hardcoded default — set it on the session: async with litfetch.Session(contact='you@example.org') as s: await s.fetch_body(...).

default_resolver() is a batteries-included, keyless chain (Europe PMC search + NCBI ID Converter).
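The composition rule for chain(...) stated above (earlier resolvers take precedence, and the chain stops once every identifier is known) can be illustrated with a small stand-alone sketch. Everything here is hypothetical: Ids, its merge and complete methods, toy_chain and the identifier values are invented for the example and are not litfetch's own code.

```python
import asyncio
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class Ids:
    """Hypothetical identifier bundle (a stand-in, not litfetch's ArticleIds)."""
    pmid: Optional[str] = None
    pmcid: Optional[str] = None
    doi: Optional[str] = None

    def merge(self, other):
        """Fill only the gaps: a value already known here is never overwritten."""
        gaps = {
            f.name: getattr(other, f.name)
            for f in fields(self)
            if getattr(self, f.name) is None
        }
        return replace(self, **gaps)

    def complete(self):
        return all(getattr(self, f.name) is not None for f in fields(self))


def toy_chain(*resolvers):
    """Compose resolvers in order; stop as soon as every identifier is known."""
    async def chained(ids, http=None):
        for resolver in resolvers:
            if ids.complete():
                break
            ids = ids.merge(await resolver(ids, http))
        return ids
    return chained


ran = []


async def first(ids, http):
    ran.append("first")
    return Ids(pmcid="PMC-A")


async def second(ids, http):
    ran.append("second")
    return Ids(pmcid="PMC-B", doi="10.0000/example")  # conflicting pmcid


async def third(ids, http):
    ran.append("third")
    return Ids()


result = asyncio.run(toy_chain(first, second, third)(Ids(pmid="1")))
print(result, ran)
```

In this sketch the pmcid from the first resolver survives the conflicting value offered by the second, and the third resolver is never called because nothing is left to fill.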

No resolver — you already hold the IDs

A non-PubMed paper you only have a DOI for, plus your own Elsevier key:

from litfetch import ArticleIds, fetch_body

blob = await fetch_body(
    ArticleIds(doi='10.1016/j.cell.2020.01.001'),
    credentials={'elsevier_api_key': key},  # key: your Elsevier API key
)

Supplementary material

list_files enumerates the file-set (references, no bytes); fetch_file materialises one:

from litfetch import ArticleIds, FileKind, list_files, fetch_file

files = await list_files(ArticleIds(pmcid='PMC5334499'), kind=FileKind.SUPPLEMENTARY)
for file in files:
    blob = await fetch_file(file)

Access terms

Read the licence from the fetched bytes, falling back to an access authority (Unpaywall) when the bytes carry none:

from litfetch import ArticleIds, extract_source_metadata, resolve_access

# blob is the Blob returned by an earlier fetch_body call
meta = extract_source_metadata(blob)          # from the JATS/Elsevier bytes
if meta.licence is None:
    meta = await resolve_access(ArticleIds(doi='10.1016/j.cell.2020.01.001'))
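To make the "bytes first, authority second" order concrete, here is a minimal sketch that reads a licence URL from a JATS body with the standard library and only calls a fallback when the body carries none. It is not how extract_source_metadata works internally; licence_from_jats, licence_with_fallback and the sample documents are hypothetical, and it assumes the licence is given as an xlink:href attribute on a JATS license element.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

# Clark-notation name of the xlink:href attribute used on JATS <license>.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def licence_from_jats(body: bytes) -> Optional[str]:
    """Return the first licence URL found in the body, or None."""
    root = ET.fromstring(body)
    for element in root.iter("license"):
        href = element.get(XLINK_HREF)
        if href:
            return href
    return None


def licence_with_fallback(body: bytes, fallback: Callable[[], str]) -> str:
    """Prefer the licence in the bytes; ask the fallback only when absent."""
    return licence_from_jats(body) or fallback()


WITH_LICENCE = b"""
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front><article-meta><permissions>
    <license xlink:href="https://creativecommons.org/licenses/by/4.0/"/>
  </permissions></article-meta></front>
</article>
"""
WITHOUT_LICENCE = b"<article><front/></article>"

print(licence_with_fallback(WITH_LICENCE, lambda: "from-authority"))
print(licence_with_fallback(WITHOUT_LICENCE, lambda: "from-authority"))
```

The fallback is a callable so that the (possibly remote) lookup is only made when the bytes give no answer.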

Resolvers stand alone

Each resolver is usable on its own as a cross-reference tool, independent of fetching. A resolver is given the Http to use, so run it inside a session:

from litfetch import ArticleIds, Session
from litfetch.resolvers import SemanticScholarResolver

async with Session() as s:
    ids = await SemanticScholarResolver()(ArticleIds(doi='10.1016/j.cell.2020.01.001'), s)
print(ids.pmid, ids.pmcid)

Batch: one session, a scope per paper

The one-shot functions above each open a throwaway session. For many papers, hold one Session (pooled connections, shared pacing) and open a scope per paper. The scope caches within itself, so a duplicate upstream call (e.g. Unpaywall for both licence and PDF) is fetched once:

from litfetch import ArticleIds, Session

async with Session() as session:
    for pmid in pmids:
        async with session.scope() as s:
            blob = await s.fetch_body(ArticleIds(pmid=pmid))
            access = await s.resolve_access(ArticleIds(pmid=pmid))
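The per-scope caching described above amounts to memoising upstream calls for the lifetime of one scope. A minimal sketch of that idea follows; ToyScope, its get method and the cache key are hypothetical and say nothing about how litfetch's Session or scope are built.

```python
import asyncio


class ToyScope:
    """Hypothetical per-paper scope: memoises upstream calls by key."""

    def __init__(self, upstream):
        self._upstream = upstream
        self._cache = {}

    async def get(self, key):
        if key not in self._cache:          # first request goes upstream
            self._cache[key] = await self._upstream(key)
        return self._cache[key]             # repeats are served from memory

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc_info):
        self._cache.clear()                 # nothing leaks into the next paper


hits = []


async def upstream(key):
    hits.append(key)
    return f"payload for {key}"


async def demo():
    async with ToyScope(upstream) as scope:
        for_licence = await scope.get("unpaywall:10.0000/example")
        for_pdf = await scope.get("unpaywall:10.0000/example")
    async with ToyScope(upstream) as scope:  # a new scope starts cold
        await scope.get("unpaywall:10.0000/example")
    return for_licence, for_pdf


for_licence, for_pdf = asyncio.run(demo())
print(len(hits))
```

Two lookups inside one scope cost one upstream call; the second scope pays again because each scope starts with an empty cache.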

Extending

- A new body fetcher: implement the Fetcher protocol, which is a name, a requires: frozenset[str] of the ArticleIds fields it needs, and an async fetch(ids, *, credentials, http) returning a body Blob or None. Add it to a fetchers= list (or your own default_fetchers).
- A new file source: implement the FileSource protocol, which is a name plus async list_files(ids, ...) and fetch_file(file, ...), to enumerate and materialise an article's file-set (body renditions and supplementary alike).
- A new resolver: write an async (ArticleIds, Http) -> ArticleIds that fills gaps via ArticleIds.merge and never overwrites a known id.
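The shape of a body fetcher listed above (a name, a requires set, and an async fetch taking keyword-only credentials and http) can be written down as a structural type. The sketch below is an approximation under assumptions: FetcherLike and InMemoryFetcher are hypothetical names, ids is a plain namespace, and fetch returns bare bytes where the real protocol returns a Blob. Consult docs/api.md for the actual definitions.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, Mapping, Optional, Protocol, runtime_checkable


@runtime_checkable
class FetcherLike(Protocol):
    """Hypothetical structural type mirroring the described Fetcher shape."""
    name: str
    requires: frozenset

    async def fetch(
        self, ids: Any, *, credentials: Mapping[str, str], http: Any
    ) -> Optional[bytes]: ...


class InMemoryFetcher:
    """Hypothetical backend serving bodies from a dict keyed by DOI."""
    name = "in-memory"
    requires = frozenset({"doi"})

    def __init__(self, store):
        self._store = dict(store)

    async def fetch(self, ids, *, credentials, http):
        # None signals "not available here", so a ladder can try the next rung.
        return self._store.get(ids.doi)


fetcher = InMemoryFetcher({"10.0000/example": b"<article/>"})
hit = asyncio.run(
    fetcher.fetch(SimpleNamespace(doi="10.0000/example"), credentials={}, http=None)
)
miss = asyncio.run(
    fetcher.fetch(SimpleNamespace(doi="10.0000/absent"), credentials={}, http=None)
)
print(isinstance(fetcher, FetcherLike), hit, miss)
```

Because the protocol is structural, the class needs no base class; having the three members is enough for it to be used where a fetcher is expected.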

Development

uv sync
uv run ruff check . && uv run ruff format --check .
uv run pyright
uv run pytest

Project details

Maintainers

folded

Release history

This version

0.1.0

Jul 3, 2026

Download files

Source Distribution

litfetch-0.1.0.tar.gz (48.6 kB view details)

Uploaded Jul 3, 2026 Source

Built Distribution

litfetch-0.1.0-py3-none-any.whl (39.2 kB view details)

Uploaded Jul 3, 2026 Python 3

File details

Details for the file litfetch-0.1.0.tar.gz.

File metadata

Download URL: litfetch-0.1.0.tar.gz
Upload date: Jul 3, 2026
Size: 48.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for litfetch-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`cca2a4a220fbb617c38e4a752731407459eb878ce3ddc67a782935ca52c631e0`
MD5	`f1e7fbfe94700795ffab32b4513cf7b9`
BLAKE2b-256	`76797470d55c7c951cce63360b88b0000b9ed27bc6089d7d82b0b2d4c522f6df`

Provenance

The following attestation bundles were made for litfetch-0.1.0.tar.gz:

Publisher: release.yml on populationgenomics/litfetch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: litfetch-0.1.0.tar.gz
- Subject digest: cca2a4a220fbb617c38e4a752731407459eb878ce3ddc67a782935ca52c631e0
- Sigstore transparency entry: 2055579337
- Sigstore integration time: Jul 3, 2026
Source repository:
- Permalink: populationgenomics/litfetch@9ee565b47bd06422dcf5cee83e16e4a0563209e6
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/populationgenomics
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@9ee565b47bd06422dcf5cee83e16e4a0563209e6
- Trigger Event: release

File details

Details for the file litfetch-0.1.0-py3-none-any.whl.

File metadata

Download URL: litfetch-0.1.0-py3-none-any.whl
Upload date: Jul 3, 2026
Size: 39.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for litfetch-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bf8dd8b4a3bde7494f202344a50fd4f819d3ea51415d2b15f7dca5d7f32a896f`
MD5	`bef809d77f39841474bdd45e43c9a58a`
BLAKE2b-256	`ceaf24dbf8efeeb2174a2d758d7c03c396fc2a76c14f0deedffab0a226cbd43a`

Provenance

The following attestation bundles were made for litfetch-0.1.0-py3-none-any.whl:

Publisher: release.yml on populationgenomics/litfetch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: litfetch-0.1.0-py3-none-any.whl
- Subject digest: bf8dd8b4a3bde7494f202344a50fd4f819d3ea51415d2b15f7dca5d7f32a896f
- Sigstore transparency entry: 2055579593
- Sigstore integration time: Jul 3, 2026
Source repository:
- Permalink: populationgenomics/litfetch@9ee565b47bd06422dcf5cee83e16e4a0563209e6
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/populationgenomics
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@9ee565b47bd06422dcf5cee83e16e4a0563209e6
- Trigger Event: release
