OpenCite

Academic literature search, citation management, and PDF retrieval CLI.

Searches Semantic Scholar, OpenAlex, PubMed, arXiv, bioRxiv, medRxiv, OSF Preprints (PsyArXiv/SocArXiv/...), Zenodo, Figshare, CrossRef, and CORE in parallel and deduplicates the results. It also supports BibTeX output, citation graph traversal, PDF retrieval (with HTML full-text shortcuts for arXiv ar5iv and bioRxiv .full), batch downloads, and PDF-to-markdown conversion.

Quick Start

Install and set up your API keys:

uv pip install opencite                # or: pip install opencite
opencite config init                   # creates ~/.opencite/config.toml

Add your API keys to ~/.opencite/config.toml or export them as environment variables:

export SEMANTIC_SCHOLAR_API_KEY=your_key
export PUBMED_API_KEY=your_key
export OPENALEX_API_KEY=your_key

Start searching:

opencite search "transformer attention mechanism"
opencite lookup 10.1038/nature12345
opencite canonical "deep learning" --min-citations 500
opencite cite 10.1038/nature12345
opencite pdf 10.1038/nature12345 -o paper.pdf --convert

[!NOTE] AI-agent skill: The opencite skill ships in neuromechanist/research-skills alongside the other research-tooling skills (figures, grant, manuscript, neuroinformatics, presentation, project). It works with Claude Code, Codex / OpenAI, and VS Code GitHub Copilot. For Claude Code, run /plugin marketplace add neuromechanist/research-skills, then open /plugin and install opencite from the research-skills marketplace; see the research-skills README for setup with the other agents.

[!TIP] PDF conversion requires pip install 'opencite[pdf]'. If MISTRAL_API_KEY is set, markit-mistral is used (better for math and complex layouts); otherwise markitdown is used (free, local).

Commands

search - Find papers

opencite search "query" [--max N] [--source all|openalex|s2|pubmed]
    [--year-from YYYY] [--year-to YYYY] [--oa-only]
    [--sort relevance|citations|year] [-f text|json|bibtex|csv] [-o FILE] [-v]

lookup - Look up papers by identifier

opencite lookup IDENTIFIER [IDENTIFIER ...] [--enrich] [--append-bib FILE]
    [-f text|json|bibtex] [-o FILE] [-v]

Accepts DOI, pmid:X, pmc:X, arxiv:X, S2 ID, or OpenAlex ID. Supports multiple IDs.
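
To make the accepted identifier shapes concrete, here is a minimal sketch that guesses an identifier's kind from its form. It is not opencite's own parser: classify_identifier and the regular expressions are hypothetical, and the sketch assumes OpenAlex work IDs look like W followed by digits and Semantic Scholar paper IDs are 40-character hex strings.

```python
import re

# Hypothetical helper, not part of opencite: guess an identifier's kind
# from its shape, using the prefixes listed above.
_PATTERNS = [
    ("pmid", re.compile(r"^pmid:\d+$", re.IGNORECASE)),
    ("pmcid", re.compile(r"^pmc:(PMC)?\d+$", re.IGNORECASE)),
    ("arxiv", re.compile(r"^arxiv:\S+$", re.IGNORECASE)),
    ("doi", re.compile(r"^10\.\d{4,9}/\S+$")),
    # Assumption: OpenAlex work IDs are "W" followed by digits.
    ("openalex", re.compile(r"^W\d+$")),
    # Assumption: Semantic Scholar paper IDs are 40-character hex hashes.
    ("s2", re.compile(r"^[0-9a-f]{40}$")),
]


def classify_identifier(raw: str) -> str:
    """Return the first matching kind, or "unknown" if nothing matches."""
    text = raw.strip()
    for kind, pattern in _PATTERNS:
        if pattern.match(text):
            return kind
    return "unknown"
```

A pre-check like this can catch malformed entries in a long ID list before they are passed to opencite lookup.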

cite - Citation graph

opencite cite IDENTIFIER [--direction citing|references|both] [--max N]
    [--sort citations|year] [--min-citations N] [-f text|json|bibtex] [-o FILE]

canonical - Most-cited papers in a field

opencite canonical "topic" [--max N] [--year-from YYYY] [--min-citations N]
    [-f text|json|bibtex] [-o FILE]

pdf - Download PDF

opencite pdf IDENTIFIER [-o PATH] [--filename NAME] [--convert]
    [--converter auto|markitdown|mistral]

-o accepts a file path (e.g., paper.pdf) or a directory. With --convert, a markdown file is also generated alongside the PDF.

convert - PDF to markdown

opencite convert FILE.pdf [-o FILE] [--converter auto|markitdown|mistral]
    [--extract-images] [--images-dir DIR]

Auto mode uses markit-mistral when MISTRAL_API_KEY is set (better for math and complex layouts), otherwise falls back to markitdown (free, local).
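
The selection rule described above can be sketched in a few lines. This illustrates the documented behavior only and is not opencite's actual code; pick_converter and its env parameter are hypothetical names.

```python
import os

VALID_CONVERTERS = {"auto", "markitdown", "mistral"}


def pick_converter(requested="auto", env=None):
    """Resolve a --converter value to a concrete converter name (hypothetical helper)."""
    if requested not in VALID_CONVERTERS:
        raise ValueError(f"unknown converter: {requested!r}")
    env = os.environ if env is None else env
    if requested != "auto":
        return requested  # an explicit choice is used as given
    # Auto mode: use Mistral when a key is present, else the local converter.
    return "mistral" if env.get("MISTRAL_API_KEY") else "markitdown"
```

Passing the environment as a parameter keeps the rule easy to test without touching the real process environment.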

batch-fetch - Batch download PDFs

opencite batch-fetch FILE [-o DIR] [--convert] [--concurrency N] [--summary FILE]
opencite batch-fetch --from-json FILE [options]
opencite batch-fetch --from-stdin [options]

Downloads PDFs for multiple papers with controlled concurrency. Supports text files (one ID per line), JSON files (array of DOIs or opencite search results), and stdin. With --convert, output is organized into pdf/, markdown/, and markdown/img/ subdirectories.

Example workflow:

# Search and save as JSON, then batch download with conversion
opencite search "tDCS motor cortex" --max 30 -f json -o results.json
opencite batch-fetch --from-json results.json --convert --summary report.json -o ./papers
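
If the DOIs come from somewhere other than opencite search, the JSON array form of the input can be produced with a short script. This is a minimal sketch: dois_to_json_array is a hypothetical helper, and the case-insensitive de-duplication is an assumption added here for convenience, not something batch-fetch is documented to require.

```python
import json


def dois_to_json_array(dois):
    """Return a JSON array string of trimmed, de-duplicated DOIs in first-seen order."""
    seen = set()
    unique = []
    for doi in dois:
        cleaned = doi.strip()
        key = cleaned.lower()  # DOIs are case-insensitive, so compare lowercased
        if cleaned and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return json.dumps(unique, indent=2)
```

Write the returned string to a file such as ids.json (a placeholder name) and pass it to opencite batch-fetch --from-json ids.json.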

ids - Convert between identifiers

opencite ids IDENTIFIER [IDENTIFIER ...] [-f text|json]

Converts between DOI, PMID, and PMCID using the NCBI ID Converter API.

config - Manage configuration

opencite config init    # create ~/.opencite/config.toml template
opencite config show    # display resolved config (keys masked)
opencite config path    # show config file location

Output Formats

All search/lookup/cite/canonical commands support -f/--format:

  • text (default) - human-readable output
  • json - structured JSON
  • bibtex - BibTeX entries for citation managers
  • csv - comma-separated values (search only)

Use -o/--output FILE to write to a file instead of stdout.

Installation

# uv (recommended)
uv pip install opencite              # core (no PDF-to-markdown conversion)
uv pip install 'opencite[pdf]'       # with PDF download and markdown conversion

# pip
pip install opencite                 # core (no PDF-to-markdown conversion)
pip install 'opencite[pdf]'          # with PDF download and markdown conversion

# uvx (no install needed, runs from cache)
uvx opencite --version

PDF conversion support (markitdown and markit-mistral) is available via the [pdf] extra. Install opencite[pdf] when you need opencite pdf, opencite convert, or opencite batch-fetch --convert. Preprint HTML full-text retrieval (arXiv ar5iv, bioRxiv/medRxiv .full HTML) also depends on markitdown, so it needs the same extra.

For development:

git clone https://github.com/neuromechanist/opencite.git
cd opencite
uv sync --extra dev

Configuration

OpenCite supports TOML config, .env files, and environment variables.

opencite config init    # creates ~/.opencite/config.toml with template
opencite config show    # display resolved config (keys masked)
opencite config path    # show config file location

Config loading priority

Later sources override earlier ones:

  1. ~/.opencite/config.toml
  2. ~/.opencite/.env
  3. .env in working directory
  4. Environment variables
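
The override order above amounts to a layered merge. The sketch below models each source as a plain dictionary and is not opencite's loader: resolve_config is a hypothetical name, and skipping empty values so they do not mask an earlier source is an assumption.

```python
from typing import Mapping, Optional


def resolve_config(*layers: Mapping[str, Optional[str]]) -> dict:
    """Merge layers in order; a later layer overrides an earlier one."""
    resolved = {}
    for layer in layers:
        for key, value in layer.items():
            if value:  # assumption: unset or empty values do not override
                resolved[key] = value
    return resolved


# Layers listed lowest priority first, matching the numbered list above.
config_toml = {"PUBMED_API_KEY": "from-toml", "OPENALEX_API_KEY": "from-toml"}
home_env = {"PUBMED_API_KEY": "from-home-env"}
cwd_env = {}
environment = {"OPENALEX_API_KEY": "from-environment"}

resolved = resolve_config(config_toml, home_env, cwd_env, environment)
```

Here PUBMED_API_KEY resolves to the ~/.opencite/.env value and OPENALEX_API_KEY to the environment variable. Run opencite config show to see what was actually resolved.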

API keys

Required for academic database access:

export SEMANTIC_SCHOLAR_API_KEY=your_key
export PUBMED_API_KEY=your_key
export OPENALEX_API_KEY=your_key

Optional:

export MISTRAL_API_KEY=your_key        # for PDF-to-markdown via Mistral OCR

Publisher tokens (optional)

For authenticated PDF downloads from paywalled publishers:

export ELSEVIER_API_KEY=your_key       # Elsevier/ScienceDirect
export WILEY_TDM_TOKEN=your_token      # Wiley TDM
export SPRINGER_API_KEY=your_key       # Springer Nature

These can also be set in ~/.opencite/config.toml:

[publishers]
elsevier = "your_key"
wiley_tdm = "your_token"
springer = "your_key"

Redistribution and licensing

OpenCite retrieves PDFs and markdown for you and reports what it found, but it does not enforce a redistribution policy. The publication-vs-reuse decision belongs to the caller.

What we report:

  • Paper.oa_status -- the OpenAlex Open Access status (gold, hybrid, green, bronze, closed, diamond, or empty when unknown). is_oa = True collapses all open categories together; oa_status distinguishes them. Notably, bronze is free-to-read but not openly licensed.
  • PDFLocation.license and version -- per-source license string (cc-by, cc-by-nc, etc.) and version (publishedVersion, acceptedVersion, submittedVersion) where the upstream API surfaces them. Available in opencite lookup --format json --verbose and opencite search.
  • <pdf>.license.json -- a sidecar written next to every downloaded PDF containing {url, source, license, version, oa_status, publisher_tdm, doi, retrieved_at}. A later "is this PDF safe to commit?" check can run by scanning sidecars without re-querying the original Paper.

If your pipeline publishes its artifacts (e.g. commits PDFs/markdown to a public repo), be deliberate about which sources you enable. Publisher TDM tokens (Elsevier, Wiley, Springer) almost universally prohibit redistribution of the bytes they return; the sidecar's publisher_tdm: true flag is a machine-readable signal for downstream scanners.
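
A downstream scanner of the kind mentioned above might look like the following sketch. It relies only on the sidecar fields listed earlier (license, publisher_tdm). The allow-list, the function names, and the *.license.json glob pattern are assumptions to adapt to your own policy, not part of opencite.

```python
import json
from pathlib import Path

# Assumption: the licenses you treat as safe to redistribute. This is a
# policy choice for the caller, not an opencite default.
ALLOWED_LICENSES = {"cc-by"}


def is_redistributable(sidecar: dict) -> bool:
    """Conservative check: block TDM-sourced files and unknown or unlisted licenses."""
    if sidecar.get("publisher_tdm"):
        return False
    license_name = (sidecar.get("license") or "").lower()
    return license_name in ALLOWED_LICENSES


def find_blocked_sidecars(root):
    """Return sidecar paths under root whose PDFs should not be committed."""
    blocked = []
    for path in sorted(Path(root).rglob("*.license.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        if not is_redistributable(data):
            blocked.append(path)
    return blocked
```

Running find_blocked_sidecars("./papers") in a pre-commit hook or CI step gives a list of files to exclude. A missing license is treated as blocked, which errs on the side of not redistributing.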

License

MIT

