
Scientific paper search, enrichment, download, and management for the SciTeX ecosystem


SciTeX Scholar (scitex-scholar)

Scientific paper search, enrichment, PDF download, and library management for reproducible research.


Full Documentation · pip install scitex-scholar


Problem

Literature management spans many tools and APIs: searching databases, resolving DOIs, downloading PDFs through institutional access, enriching BibTeX metadata, and keeping a reproducible, deduplicated library. Each step involves a different library, authentication flow, and data format.

Solution

scitex-scholar provides a unified workflow:

  • Search across CrossRef, Semantic Scholar, PubMed, arXiv, and OpenAlex
  • Resolve DOIs from titles; enrich BibTeX with abstracts, citation counts, impact factors (JCR 2024), PMIDs, and arXiv IDs
  • Download PDFs through institutional access (OpenAthens / SSO) with Playwright browser automation
  • Organize papers in a MASTER-hash library with per-project symlinks at ~/.scitex/scholar/library/
  • Highlight each sentence of a PDF by rhetorical role — claim, method, limitation, supportive citation, contradicting citation — via Claude
  • Automate the same operations from the CLI, a Python API, or the SciTeX MCP server
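Deduplicating a library needs one canonical key per paper. The following is a minimal sketch, assuming the DOI serves as that key; normalize_doi and dedupe are hypothetical helpers, not part of scitex-scholar. They strip URL and "doi:" prefixes and lower-case the result, which is safe because DOIs are case-insensitive.

```python
import re

# Matches https://doi.org/, http://dx.doi.org/ and "doi:" prefixes
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(raw: str) -> str:
    """Hypothetical helper: canonicalise a DOI so URL/case variants compare equal."""
    return _DOI_PREFIX.sub("", raw.strip()).lower()


def dedupe(records: list[dict]) -> list[dict]:
    """Keep the first record per normalized DOI; records without a DOI are kept."""
    seen: set[str] = set()
    unique: list[dict] = []
    for rec in records:
        doi = rec.get("doi")
        if doi:
            key = normalize_doi(doi)
            if key in seen:
                continue  # same paper already recorded under another spelling
            seen.add(key)
        unique.append(rec)
    return unique


records = [
    {"doi": "10.1038/nature12373", "title": "A"},
    {"doi": "https://doi.org/10.1038/NATURE12373", "title": "A (dup)"},
    {"doi": None, "title": "B"},
]
print([r["title"] for r in dedupe(records)])  # ['A', 'B']
```

Records without a DOI would need a fallback key (for example a normalized title) in practice; the sketch simply keeps them.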

Installation

pip install scitex-scholar                 # core
pip install "scitex-scholar[pdf]"          # PDF text extraction
pip install "scitex-scholar[mcp]"          # MCP server deps (fastmcp)
pip install "scitex-scholar[browser]"      # Playwright automation
pip install "scitex-scholar[all]"          # everything

Python usage

from scitex_scholar import Scholar, Paper, Papers, apply_filters, to_bibtex

scholar = Scholar()
papers = scholar.search("deep learning EEG", year_min=2020)   # auto-enriched
papers.save("results.bib")

# Filter + export
top = apply_filters(papers, min_citations=50, min_impact_factor=5.0)
print(to_bibtex(top))
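apply_filters works on a Papers collection. To make the threshold semantics concrete, here is a minimal sketch over plain dicts; filter_records and the field names citation_count and impact_factor are hypothetical. It assumes thresholds are inclusive and that a paper missing a metric fails any threshold set on that metric; the real apply_filters may treat missing values differently.

```python
from typing import Optional


def filter_records(records: list[dict],
                   min_citations: Optional[int] = None,
                   min_impact_factor: Optional[float] = None) -> list[dict]:
    """Hypothetical stand-in for apply_filters, operating on plain dicts."""
    def keep(rec: dict) -> bool:
        # Every (threshold, value) pair must pass; unset (None) thresholds are
        # ignored, and a missing value fails any threshold that is set.
        checks = [
            (min_citations, rec.get("citation_count")),
            (min_impact_factor, rec.get("impact_factor")),
        ]
        return all(t is None or (v is not None and v >= t) for t, v in checks)

    return [rec for rec in records if keep(rec)]


papers = [
    {"title": "A", "citation_count": 120, "impact_factor": 9.1},
    {"title": "B", "citation_count": 30, "impact_factor": 12.0},
    {"title": "C", "citation_count": 80, "impact_factor": None},
]
print([p["title"] for p in filter_records(papers, min_citations=50, min_impact_factor=5.0)])  # ['A']
```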

CLI

Entry point: python -m scitex_scholar <subcommand>.

# Single paper by DOI or title
python -m scitex_scholar single --doi 10.1038/nature12373 --project demo

# Multiple DOIs/titles in parallel
python -m scitex_scholar parallel --dois 10.1038/xxx 10.1126/yyy --project demo --num-workers 4

# Process a whole BibTeX file
python -m scitex_scholar bibtex --bibtex refs.bib --project demo --output refs.enriched.bib

# Start the (legacy) MCP server
python -m scitex_scholar mcp

Common flags: --browser-mode {stealth,interactive}, --chrome-profile NAME, --force.
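The parallel subcommand processes several identifiers at once, with --num-workers bounding concurrency. A minimal sketch of that fan-out pattern with a thread pool, assuming each identifier is independent and the work is I/O-bound; process_identifier and process_all are hypothetical stand-ins, not scitex-scholar internals.

```python
from concurrent.futures import ThreadPoolExecutor


def process_identifier(identifier: str) -> dict:
    # Hypothetical worker: a real pipeline would resolve metadata and fetch a
    # PDF here (network-bound work, which is why threads are a reasonable fit).
    kind = "doi" if identifier.startswith("10.") else "title"
    return {"input": identifier, "kind": kind}


def process_all(identifiers: list[str], num_workers: int = 4) -> list[dict]:
    # At most num_workers items run at once; map() yields results in input order
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(process_identifier, identifiers))


results = process_all(["10.1038/nature12373", "Attention is all you need"], num_workers=2)
print([r["kind"] for r in results])  # ['doi', 'title']
```

Every DOI begins with the "10." directory prefix, so the sketch uses that to tell DOIs from titles.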

Core API

Symbol                                           Purpose
Scholar                                          Main search / enrich / download / save interface
Paper, Papers                                    Single paper / collection with export methods
ScholarConfig                                    Paths, API keys, auto-enrich toggle, browser settings
apply_filters                                    Filter a Papers collection
to_bibtex, to_ris, to_endnote, to_text_citation  Export formats
generate_cite_key, make_citation_key             Deterministic BibTeX keys
CitationGraphBuilder, plot_citation_graph        Optional citation graph
pdf_highlight.highlight_pdf                      Overlay semantic highlights on a PDF
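generate_cite_key and make_citation_key produce deterministic BibTeX keys: the same metadata always yields the same key. Their exact format is not documented here, so the following is a minimal sketch of one common scheme (surname + year + first significant title word); make_key and the stopword list are hypothetical.

```python
import re
import unicodedata

# Hypothetical scheme; NOT necessarily what generate_cite_key/make_citation_key emit
_STOPWORDS = {"a", "an", "the", "on", "of", "for", "in", "and", "to", "with"}


def _ascii_lower(text: str) -> str:
    # Strip accents (e.g. "Müller" -> "muller") and keep only a-z and 0-9
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]", "", ascii_text.lower())


def make_key(first_author_surname: str, year: int, title: str) -> str:
    # First title word that is not a stopword after normalisation
    words = [_ascii_lower(w) for w in title.split()]
    first_word = next((w for w in words if w and w not in _STOPWORDS), "")
    return f"{_ascii_lower(first_author_surname)}{year}{first_word}"


print(make_key("Müller", 2020, "The Deep Learning Revolution in EEG"))  # muller2020deep
```

A scheme like this can still collide (two 2020 "Deep ..." papers by authors named Müller), so a real generator would need a disambiguating suffix.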

Source subpackages: core/, search_engines/, metadata_engines/, pdf_download/, pipelines/, browser/, auth/, storage/, pdf_highlight/, _mcp/.

Semantic PDF Highlighting

Overlay colour-coded highlights on a PDF that separate what the paper claims from its methods, its self-admitted limitations, and its stance toward related work. Highlights are standard PDF annotation objects placed on a copy of the source; the original file is left unchanged, and any PDF viewer can display or remove the annotations.

colour  category               meaning
green   focal_claim            what the paper clarifies, suggests, or demonstrates
purple  focal_method           novel method, model, cohort, or analysis
red     focal_limitation       self-admitted caveat or threat to validity
blue    related_supportive     prior work whose finding supports the paper
orange  related_contradictive  prior work whose finding contradicts the paper

A compact colour legend and a signature (model name and timestamp) are stamped in the lower-right corner of the last page. See the documentation for full details.

export ANTHROPIC_API_KEY=sk-ant-...
scitex-scholar highlight paper.pdf            # sentence-level, Haiku, writes paper.highlighted.pdf
scitex-scholar highlight paper.pdf --stub     # offline keyword heuristic (no API calls)

# Python API
from scitex_scholar.pdf_highlight import highlight_pdf
result = highlight_pdf("paper.pdf", output_path="paper.highlighted.pdf")
print(result.counts(), result.annotations_added)
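The --stub mode is described only as an offline keyword heuristic. To illustrate the general idea, here is a minimal sketch that splits text into sentences and assigns each the first category whose cue phrase it contains. The cue lists, CUES, split_sentences and classify are all hypothetical and are not the heuristic scitex-scholar ships.

```python
import re
from typing import Optional

# Hypothetical cue phrases, checked in order: the first matching category wins
CUES = [
    ("focal_limitation", ("limitation", "caveat", "may not generalize")),
    ("related_contradictive", ("in contrast to", "contrary to", "unlike prior")),
    ("related_supportive", ("consistent with", "in line with", "agrees with")),
    ("focal_method", ("we propose", "we introduce", "we recruited", "our model")),
    ("focal_claim", ("we show", "we demonstrate", "suggest that", "these results")),
]


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? when followed by whitespace
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def classify(sentence: str) -> Optional[str]:
    lowered = sentence.lower()
    for category, cues in CUES:
        if any(cue in lowered for cue in cues):
            return category
    return None  # sentence is left unhighlighted


text = ("We propose a spectral model. Results are consistent with earlier reports. "
        "A limitation is the small cohort.")
for s in split_sentences(text):
    print(classify(s), "|", s)
```

Ordering the checks lets a specific category such as focal_limitation win over a generic one when a sentence contains cues for both. The regex splitter will also break on abbreviations like "et al.", one reason a model-based classifier is the default.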

Highlighting is also exposed as the scholar_highlight_pdf MCP tool (served by the unified scitex serve server) and as the semantic-highlight agent skill under src/scitex_scholar/_skills/scitex-scholar/.

MCP integration

The package ships MCP tool handlers consumed by the unified scitex serve server (tools prefixed scholar_*). A standalone server at scitex_scholar.mcp_server is still shipped but deprecated. See src/scitex_scholar/_skills/scitex-scholar/SKILL.md for the full tool list.

Storage layout

~/.scitex/scholar/library/
├── MASTER/<HASH>/               # Canonical per-paper storage (metadata.json + PDF)
└── <project>/<human-label> -> ../MASTER/<HASH>

Cache and authentication state live under ~/.scitex/scholar/cache/ (URL resolver cache, Chrome profiles, OpenAthens cookies). Set the SCITEX_DIR environment variable to override the default location.
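The layout keeps exactly one canonical directory per paper and exposes it to projects through relative symlinks. A minimal sketch of that pattern, under loud assumptions: the 8-character hash (first 8 hex digits of SHA-256 over the lower-cased DOI), the idea that SCITEX_DIR replaces ~/.scitex, and the names library_root, paper_hash, add_to_project and the label "Smith-2013-Nature" are all hypothetical, not scitex-scholar's actual scheme.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def library_root() -> Path:
    # Assumption: SCITEX_DIR, when set, replaces ~/.scitex as the base directory
    base = Path(os.environ.get("SCITEX_DIR", Path.home() / ".scitex"))
    return base / "scholar" / "library"


def paper_hash(doi: str) -> str:
    # Hypothetical hash scheme: first 8 hex digits of SHA-256 over the lower-cased DOI
    return hashlib.sha256(doi.strip().lower().encode("utf-8")).hexdigest()[:8].upper()


def add_to_project(root: Path, doi: str, project: str, label: str) -> Path:
    # Canonical storage: one MASTER/<HASH> directory per paper, created once
    master_dir = root / "MASTER" / paper_hash(doi)
    master_dir.mkdir(parents=True, exist_ok=True)
    # Per-project view: a human-readable symlink pointing back into MASTER
    project_dir = root / project
    project_dir.mkdir(parents=True, exist_ok=True)
    link = project_dir / label
    if not link.is_symlink():
        # Relative target, so the whole library directory can be moved intact
        link.symlink_to(Path("..") / "MASTER" / master_dir.name)
    return link


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    link = add_to_project(root, "10.1038/nature12373", "demo", "Smith-2013-Nature")
    print(os.readlink(link))
    print(link.resolve() == (root / "MASTER" / paper_hash("10.1038/nature12373")).resolve())  # True
```

Because several projects can link to the same MASTER entry, a paper is stored once no matter how many projects cite it.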

License

AGPL-3.0-only.

Part of SciTeX

scitex-scholar is part of SciTeX.

Four Freedoms for Research

  1. The freedom to run your research anywhere — your machine, your terms.
  2. The freedom to study how every step works — from raw data to final manuscript.
  3. The freedom to redistribute your workflows, not just your papers.
  4. The freedom to modify any module and share improvements with the community.

AGPL-3.0 — because we believe research infrastructure deserves the same freedoms as the software it runs on.





Download files


Source Distribution

scitex_scholar-1.1.2.tar.gz (1.1 MB)

Uploaded Source

Built Distribution


scitex_scholar-1.1.2-py3-none-any.whl (1.8 MB)

Uploaded Python 3

File details

Details for the file scitex_scholar-1.1.2.tar.gz.

File metadata

  • Download URL: scitex_scholar-1.1.2.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for scitex_scholar-1.1.2.tar.gz
Algorithm Hash digest
SHA256 8a1c899ababcde9b8ee2cdae574e6170397d42926141cba83869083f12990b5a
MD5 c5df371ab0c65520d0fd4636f2d695a4
BLAKE2b-256 0b48b7815a4329584537cc7036e62c22e3a4181f14ef716958b5939a6f7d3780
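To check a downloaded archive against the SHA256 digest listed for scitex_scholar-1.1.2.tar.gz, a minimal sketch using only the standard library; sha256_of and verify are illustrative helper names, and the archive is assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path

EXPECTED_SDIST_SHA256 = "8a1c899ababcde9b8ee2cdae574e6170397d42926141cba83869083f12990b5a"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    # Stream the file in 1 MiB chunks so large archives never sit fully in memory
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    # Hex digests are case-insensitive; normalise before comparing
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    archive = Path("scitex_scholar-1.1.2.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if verify(archive, EXPECTED_SDIST_SHA256) else "MISMATCH")
```

pip can enforce the same check at install time in hash-checking mode (--require-hashes), where each requirement carries a --hash=sha256:... option.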


Provenance

The following attestation bundles were made for scitex_scholar-1.1.2.tar.gz:

Publisher: publish-pypi.yml on ywatanabe1989/scitex-scholar

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file scitex_scholar-1.1.2-py3-none-any.whl.

File metadata

  • Download URL: scitex_scholar-1.1.2-py3-none-any.whl
  • Upload date:
  • Size: 1.8 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for scitex_scholar-1.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 4cfa6db55426bca5ad3a11b2d8d3d9f74db9d8da5a68e261a89e99541ac372fb
MD5 93dc47e94d7bdc5f2be440c8998ce484
BLAKE2b-256 4255cd3f8ba1a219dd5e739ffdc88650d5762588c64eb9e24377a4a3e4fdc54c


Provenance

The following attestation bundles were made for scitex_scholar-1.1.2-py3-none-any.whl:

Publisher: publish-pypi.yml on ywatanabe1989/scitex-scholar

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
