Scientific paper search, enrichment, download, and management for the SciTeX ecosystem
SciTeX Scholar (scitex-scholar)
Scientific paper search, enrichment, PDF download, and library management for reproducible research.
Full Documentation · pip install scitex-scholar
Problem and Solution
| # | Problem | Solution |
|---|---|---|
| 1 | Literature search is balkanized -- CrossRef / OpenAlex / Semantic Scholar / arXiv / PubMed each have different APIs, rate limits, auth | Unified search -- scitex scholar search "topic" federates across all, deduplicates by DOI, returns ranked results |
| 2 | BibTeX from the wild is missing abstracts / DOIs / impact factors -- manuscript prep wastes hours | scitex scholar bibtex enrichment -- one call resolves DOIs, fetches abstracts, adds impact factors, normalizes formatting |
| 3 | Paywalled PDFs require institutional login per journal -- manual login-download-rename is the bottleneck | Browser-automation + OAuth -- persistent Chrome profile with stealth; scitex scholar fetch 10.1038/... grabs the PDF end-to-end |
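The "deduplicates by DOI" step in row 1 can be illustrated with a minimal sketch. This is not the scitex-scholar implementation: `normalize_doi` and `merge_by_doi` are hypothetical helper names, records are assumed to be plain dicts with an optional `doi` field, and ranking is left out because the criterion is not stated here.

```python
def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip common URL/prefix forms so equal DOIs compare equal."""
    doi = doi.strip().lower()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def merge_by_doi(*result_lists):
    """Merge per-source result lists, keeping one record per normalized DOI.

    The first record seen for a DOI wins; later records only fill in
    fields that are missing or empty. Records without a DOI are kept as-is.
    """
    merged = {}
    without_doi = []
    for results in result_lists:
        for record in results:
            doi = record.get("doi")
            if not doi:
                without_doi.append(dict(record))
                continue
            key = normalize_doi(doi)
            if key not in merged:
                merged[key] = dict(record, doi=key)
                continue
            for field, value in record.items():
                if merged[key].get(field) in (None, ""):
                    merged[key][field] = value
    return list(merged.values()) + without_doi
```

Merging instead of dropping duplicates lets a record from one source pick up an abstract or identifier that only another source provides.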
Problem
Literature management spans many tools and APIs: searching databases, resolving DOIs, downloading PDFs through institutional access, enriching BibTeX metadata, and keeping a reproducible, deduplicated library. Each step involves a different library, auth flow, and data format.
Solution
scitex-scholar provides a unified workflow:
- Search across CrossRef, Semantic Scholar, PubMed, arXiv, and OpenAlex
- Resolve DOIs from titles; enrich BibTeX with abstracts, citation counts, impact factors (JCR 2024), PMIDs, and arXiv IDs
- Download PDFs through institutional access (OpenAthens / SSO) with Playwright browser automation
- Organize papers in a MASTER-hash library with per-project symlinks at ~/.scitex/scholar/library/
- Highlight each sentence of a PDF by rhetorical role (claim, method, limitation, supportive citation, contradicting citation) via Claude
- Automate the same operations from the CLI, a Python API, or the SciTeX MCP server
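As an illustration of the "resolve DOIs from titles" step, here is a minimal sketch against the public CrossRef REST API. It is an assumption about how such a lookup could work, not a description of what scitex-scholar does internally; `build_title_query` and `pick_doi` are hypothetical names, and the network request itself is left to the caller.

```python
import re
from typing import Optional
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def build_title_query(title: str, rows: int = 5) -> str:
    """Build a CrossRef works URL that searches bibliographic fields for a title."""
    return f"{CROSSREF_WORKS}?{urlencode({'query.bibliographic': title, 'rows': rows})}"


def _squash(text: str) -> str:
    """Lowercase and drop everything except letters and digits for tolerant comparison."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def pick_doi(response: dict, title: str) -> Optional[str]:
    """Return the DOI of the first item whose title matches exactly after squashing.

    `response` is the decoded JSON body of a CrossRef works query, where
    candidates live under message.items and each item's title is a list.
    """
    wanted = _squash(title)
    for item in response.get("message", {}).get("items", []):
        for candidate in item.get("title", []):
            if _squash(candidate) == wanted:
                return item.get("DOI")
    return None
```

Requiring an exact match after normalization is deliberately conservative: a wrong DOI in a bibliography is worse than a missing one.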
Installation
pip install scitex-scholar # core
pip install "scitex-scholar[pdf]" # PDF text extraction
pip install "scitex-scholar[mcp]" # MCP server deps (fastmcp)
pip install "scitex-scholar[browser]" # Playwright automation
pip install "scitex-scholar[all]" # everything
4 Interfaces
Python API
from scitex_scholar import Scholar, Paper, Papers, apply_filters, to_bibtex
scholar = Scholar()
papers = scholar.search("deep learning EEG", year_min=2020) # auto-enriched
papers.save("results.bib")
# Filter + export
top = apply_filters(papers, min_citations=50, min_impact_factor=5.0)
print(to_bibtex(top))
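To make the threshold semantics of the filter call above concrete, here is a standalone sketch on plain records. It does not use or reproduce apply_filters; `Record` and `filter_records` are hypothetical, and dropping records whose metric is unknown is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    title: str
    citations: Optional[int] = None
    impact_factor: Optional[float] = None


def filter_records(
    records: List[Record],
    min_citations: Optional[int] = None,
    min_impact_factor: Optional[float] = None,
) -> List[Record]:
    """Keep records that meet every threshold that was supplied.

    A record with an unknown (None) value fails any threshold on that field.
    """
    kept = []
    for record in records:
        if min_citations is not None and (
            record.citations is None or record.citations < min_citations
        ):
            continue
        if min_impact_factor is not None and (
            record.impact_factor is None or record.impact_factor < min_impact_factor
        ):
            continue
        kept.append(record)
    return kept
```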
CLI
Entry point: scitex-scholar <subcommand> (Click-based).
# Discover everything
scitex-scholar --help
scitex-scholar --help-recursive # full overview, every leaf
scitex-scholar --version # or -V
# Paper(s)
scitex-scholar paper fetch --doi 10.1038/nature12373 --project demo
scitex-scholar paper fetch-batch --dois 10.1038/xxx --dois 10.1126/yyy --project demo --num-workers 4
# BibTeX file
scitex-scholar bibtex import --bibtex refs.bib --project demo --output refs.enriched.bib
# PDF post-processing
scitex-scholar pdf highlight paper.pdf
# Library
scitex-scholar library link-project-tree .
scitex-scholar library db build --dry-run
scitex-scholar library db audit --json
# Auth (institutional SSO — OpenAthens / EZProxy / Shibboleth)
scitex-scholar auth status # exit 0 if any session valid, 1 otherwise
scitex-scholar auth login # trigger SSO flow now (debug-friendly)
scitex-scholar auth logout -y # clear cached cookies (--yes required)
scitex-scholar auth refresh # logout + login
# MCP server
scitex-scholar mcp start
scitex-scholar mcp list-tools --json
# Shell completion
scitex-scholar install-shell-completion --shell bash
scitex-scholar print-shell-completion --shell bash
# Skills + Python API introspection
scitex-scholar skills list
scitex-scholar list-python-apis -v
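Because auth status reports session validity through its exit code (0 if any session is valid, 1 otherwise), a script can gate a batch job on it. The sketch below is one way to do that from Python; `session_is_valid` is a hypothetical helper, and only the exit-code contract stated above is relied on.

```python
import subprocess
from typing import Sequence


def session_is_valid(
    command: Sequence[str] = ("scitex-scholar", "auth", "status"),
) -> bool:
    """Return True when the command exits with status 0.

    A missing executable is treated as "no valid session" rather than an error.
    """
    try:
        completed = subprocess.run(
            list(command), capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return False
    return completed.returncode == 0


if __name__ == "__main__":
    if session_is_valid():
        print("SSO session is valid")
    else:
        print("No valid session; run: scitex-scholar auth login")
```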
Debugging the SSO automator
Every browser-automation step writes a screenshot + HTML pair to
~/.scitex/scholar/cache/engine/screenshots/ and
~/.scitex/browser/cache/debug/. When a selector breaks (e.g. an
Okta UI refresh), ls -lt the artifact dirs to get a frame-by-frame
storyboard — the screenshot shows what was rendered, the HTML
shows what the locator was reasoning over. See
_skills/scitex-browser/11_debugging-visuals.md for the full pattern.
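The ls -lt step can also be scripted. The following sketch lists the newest artifacts across both directories in one view; `newest_artifacts` is a hypothetical helper, and like ls -lt it looks only at the top level of each directory.

```python
from pathlib import Path

# The two artifact directories named above.
ARTIFACT_DIRS = (
    Path.home() / ".scitex/scholar/cache/engine/screenshots",
    Path.home() / ".scitex/browser/cache/debug",
)


def newest_artifacts(dirs=ARTIFACT_DIRS, limit=20):
    """Return up to `limit` files from `dirs`, most recently modified first.

    Directories that do not exist are skipped silently.
    """
    files = []
    for directory in dirs:
        directory = Path(directory)
        if directory.is_dir():
            files.extend(p for p in directory.iterdir() if p.is_file())
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    for path in newest_artifacts():
        print(path)
```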
Mutating verbs accept --dry-run and -y/--yes. Read verbs support --json.
Common paper/bibtex flags: --browser-mode {stealth,interactive}, --chrome-profile NAME, --force.
Migration (1.3.0): the CLI moved to noun-verb groups. Old top-level commands (single, parallel, top-level bibtex --bibtex, highlight, link-project-tree, materialize, dematerialize, db) still work but emit a DeprecationWarning and will be removed in 1.4.0. See CHANGELOG.md for the full migration table.
MCP Server
The package ships MCP tool handlers consumed by the unified scitex serve
server (tools prefixed scholar_*). A standalone server at
scitex_scholar.mcp_server is still shipped but deprecated. See the
Skills documentation
for the full tool list.
Skills
Agent skill pages are published at
scitex-scholar.readthedocs.io/en/latest/skills.html.
The semantic-highlight skill documents the PDF-highlighting workflow.
Core API
| Symbol | Purpose |
|---|---|
| Scholar | Main search / enrich / download / save interface |
| Paper, Papers | Single paper / collection with export methods |
| ScholarConfig | Paths, API keys, auto-enrich toggle, browser settings |
| apply_filters | Filter a Papers collection |
| to_bibtex, to_ris, to_endnote, to_text_citation | Export formats |
| generate_cite_key, make_citation_key | Deterministic BibTeX keys |
| CitationGraphBuilder, plot_citation_graph | Optional citation graph |
| pdf_highlight.highlight_pdf | Overlay semantic highlights on a PDF |
Sources: core/, search_engines/, metadata_engines/, pdf_download/, pipelines/, browser/, auth/, storage/, pdf_highlight/, _mcp/.
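"Deterministic BibTeX keys" means the same paper always maps to the same key, so re-running an enrichment does not churn the .bib file. The sketch below shows one common scheme (surname + year + first significant title word). It is an illustrative assumption, not the scheme used by generate_cite_key or make_citation_key; `example_cite_key` and the stopword list are hypothetical.

```python
import re
import unicodedata

# Hypothetical stopword list; leading words in this set are skipped.
STOPWORDS = {"a", "an", "the", "on", "of", "in", "for", "and", "to", "with"}


def _fold(text: str) -> str:
    """Strip accents by decomposing characters and dropping non-ASCII marks."""
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


def _slug(text: str) -> str:
    """Lowercase and keep only ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", _fold(text).lower())


def example_cite_key(first_author_surname: str, year: int, title: str) -> str:
    """Build a key such as 'muller2021deep' from surname, year, and title."""
    words = [
        w for w in re.findall(r"[A-Za-z0-9]+", _fold(title))
        if w.lower() not in STOPWORDS
    ]
    first_word = words[0] if words else ""
    return f"{_slug(first_author_surname)}{year}{_slug(first_word)}"
```

A scheme like this can still collide (same author, year, and first word), so a real implementation would need a tie-breaker such as a letter suffix.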
Semantic PDF Highlighting
Overlay colour-coded highlights on a PDF that separate what the paper claims from its methods, self-admitted limitations, and stance toward related work. Highlights are standard PDF annotation objects placed on a copy of the source — the original bytes are unchanged and any viewer can show or strip them.
| colour | category | meaning |
|---|---|---|
| green | focal_claim | what the paper clarifies, suggests, demonstrates |
| purple | focal_method | novel method, model, cohort, or analysis |
| red | focal_limitation | self-admitted caveat or threat to validity |
| blue | related_supportive | prior work whose finding supports the paper |
| orange | related_contradictive | prior work whose finding contradicts the paper |
A compact colour legend + signature (model name, timestamp) is stamped in the lower-right corner of the last page. See docs for full details.
export ANTHROPIC_API_KEY=sk-ant-...
scitex-scholar pdf highlight paper.pdf # sentence-level, Haiku, writes paper.highlighted.pdf
scitex-scholar pdf highlight paper.pdf --stub # offline keyword heuristic (no API calls)
from scitex_scholar.pdf_highlight import highlight_pdf
result = highlight_pdf("paper.pdf", output_path="paper.highlighted.pdf")
print(result.counts(), result.annotations_added)
Also exposed as the scholar_highlight_pdf MCP tool (unified scitex serve server) and as the
semantic-highlight agent skill (see
skills documentation).
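To give a feel for what an offline keyword heuristic like --stub might do, here is a minimal sketch that assigns sentences to the five categories from the table above. The cue phrases, their priority order, and the names `CUES`, `split_sentences`, and `classify` are all hypothetical; the actual stub heuristic is not documented here.

```python
import re
from typing import List, Optional

# Category -> colour, as listed in the table above.
COLOURS = {
    "focal_claim": "green",
    "focal_method": "purple",
    "focal_limitation": "red",
    "related_supportive": "blue",
    "related_contradictive": "orange",
}

# Hypothetical cue phrases, checked in order; the first matching category wins.
CUES = [
    ("focal_limitation", ("limitation", "caveat", "we could not")),
    ("related_contradictive", ("in contrast to", "contrary to", "unlike previous")),
    ("related_supportive", ("consistent with", "in line with", "in agreement with")),
    ("focal_method", ("we propose", "we developed", "we trained", "our method")),
    ("focal_claim", ("we show", "we demonstrate", "our results suggest", "we found")),
]


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def classify(sentence: str) -> Optional[str]:
    """Return the first category whose cue phrase occurs in the sentence, else None."""
    lowered = sentence.lower()
    for category, cues in CUES:
        if any(cue in lowered for cue in cues):
            return category
    return None


if __name__ == "__main__":
    text = "We show that X improves Y. A limitation is the small cohort."
    for sentence in split_sentences(text):
        category = classify(sentence)
        print(COLOURS.get(category, "none"), "|", sentence)
```

Sentences that match no cue are left unhighlighted, which keeps a keyword-only pass from colouring the whole page.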
Storage layout
~/.scitex/scholar/library/
├── MASTER/<HASH>/ # Canonical per-paper storage (metadata.json + PDF)
└── <project>/<human-label> -> ../MASTER/<HASH>
Cache and auth state live under ~/.scitex/scholar/cache/ (URL resolver, Chrome profiles, OpenAthens cookies). Override with SCITEX_DIR.
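The layout above can be reproduced in a few lines, which helps when inspecting or repairing a library by hand. This sketch makes two assumptions that are not stated in this README: the hash is taken as the first 8 hex characters of SHA-256 over the lowercased DOI, and `store_paper` is a hypothetical helper, not part of the package.

```python
import hashlib
import json
import os
from pathlib import Path


def store_paper(library_root, project: str, label: str, metadata: dict) -> Path:
    """Write metadata under MASTER/<HASH>/ and link it from <project>/<label>.

    Returns the path of the project symlink.
    """
    root = Path(library_root)

    # Assumption: 8 hex chars of SHA-256 over the lowercased DOI.
    digest = hashlib.sha256(metadata["doi"].lower().encode("utf-8")).hexdigest()
    paper_hash = digest[:8].upper()

    master_dir = root / "MASTER" / paper_hash
    master_dir.mkdir(parents=True, exist_ok=True)
    (master_dir / "metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )

    project_dir = root / project
    project_dir.mkdir(parents=True, exist_ok=True)
    link = project_dir / label
    if link.is_symlink():
        link.unlink()  # replace a stale link; a real directory here would raise below
    # Relative target, so the library can be moved as a whole.
    link.symlink_to(Path("..") / "MASTER" / paper_hash, target_is_directory=True)
    return link
```

Keeping the canonical copy under MASTER and linking from projects means a paper shared by several projects is stored once, and removing a project only removes links.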
License
AGPL-3.0-only.
Part of SciTeX
scitex-scholar is part of SciTeX. Install via
the umbrella with pip install scitex[scholar] to use as
scitex.scholar (Python) or scitex scholar ... (CLI).
Four Freedoms for Research
- The freedom to run your research anywhere — your machine, your terms.
- The freedom to study how every step works — from raw data to final manuscript.
- The freedom to redistribute your workflows, not just your papers.
- The freedom to modify any module and share improvements with the community.
AGPL-3.0 — because we believe research infrastructure deserves the same freedoms as the software it runs on.