Academic research MCP server — search, extract, and manage papers

Maintainers

STSNaive

GRaDOS

  .oooooo.    ooooooooo.             oooooooooo.     .oooooo.    .oooooo..o
 d8P'  `Y8b   `888   `Y88.           `888'   `Y8b   d8P'  `Y8b  d8P'    `Y8
888            888   .d88'  .oooo.    888      888 888      888 Y88bo.     
888            888ooo88P'  `P  )88b   888      888 888      888  `"Y8888o. 
888     ooooo  888`88b.     .oP"888   888      888 888      888      `"Y88b
`88.    .88'   888  `88b.  d8(  888   888     d88' `88b    d88' oo     .d8P
 `Y8bood8P'   o888o  o888o `Y888""8o o888bood8P'    `Y8bood8P'  8""88888P'

Graduate Research and Document Operating System

The enrichment-grade MCP server for academic paper workflows. For science.

GRaDOS gives AI agents (Claude, Codex, Cursor, and similar clients) a single stdio MCP server that can search academic databases, fetch papers through paywalls, parse PDFs into canonical Markdown, and revisit saved papers for citation-grounded writing.

Architecture 🧭

GRaDOS keeps one boundary clear: the host agent plans and writes; GRaDOS retrieves, materializes, verifies, and stores evidence.

flowchart TD
  Agent["Host Agent<br/>Claude / Codex / Cursor"]

  subgraph GRADOS["GRaDOS MCP"]
    direction TB
    Server["stdio MCP Server"]
    Tools["Research Tools"]
    Recovery["Recovery Layer<br/>Run Manifest / Operation Registry"]
    Server --> Tools
    Tools --> Recovery
  end

  subgraph RESEARCH["Research Flow"]
    direction TB
    Local["Local-first Lookup<br/>papers/*.md / structure cards"]
    Remote["Remote Search<br/>Crossref / PubMed / ..."]
    Fetch["Full-text Fetch<br/>api / browser / codex / scihub"]
    Parse["PDF Parse<br/>Docling / MinerU / PyMuPDF"]
    Local --> Remote --> Fetch --> Parse
  end

  subgraph STORAGE["Canonical Storage"]
    direction TB
    Pdfs["Raw PDFs<br/>downloads/*.pdf"]
    Markdown["Canonical Markdown<br/>papers/*.md"]
    Indexes["Search Indexes<br/>Chroma / FTS / Metadata"]
    Pdfs --> Markdown --> Indexes
  end

  subgraph EVIDENCE["Evidence Layer"]
    direction TB
    Candidate["Candidate Evidence"]
    Packs["Current-valid Evidence Packs"]
    Audits["Grids / Comparisons / Audits"]
    Candidate --> Packs --> Audits
  end

  subgraph OUTPUT["Outputs"]
    direction TB
    Reread["Canonical Reread"]
    Consult["ChatGPT Pro Packet<br/>Optional Advisory"]
    Writing["Citation-grounded Writing"]
    Reread --> Writing
    Consult --> Writing
  end

  Agent --> Server
  Tools --> Local
  Parse --> Pdfs
  Indexes --> Candidate
  Audits --> Reread
  Audits --> Consult

  classDef agent fill:#F8FAFC,stroke:#475569,stroke-width:1px,color:#0F172A
  classDef process fill:#EEF6FF,stroke:#2563EB,stroke-width:1px,color:#172554
  classDef storage fill:#F0FDF4,stroke:#16A34A,stroke-width:1px,color:#14532D
  classDef evidence fill:#FAF5FF,stroke:#9333EA,stroke-width:1px,color:#3B0764
  classDef output fill:#FFF7ED,stroke:#EA580C,stroke-width:1px,color:#7C2D12

  class Agent agent
  class Server,Tools,Recovery,Local,Remote,Fetch,Parse process
  class Pdfs,Markdown,Indexes storage
  class Candidate,Packs,Audits evidence
  class Reread,Consult,Writing output

Core contracts:

Local papers are checked first, then remote sources are searched and fetched through configured full-text routes.
PDFs are parsed into canonical Markdown; raw PDFs, parser provenance, assets, semantic indexes, lexical indexes, and remote metadata remain visible on disk.
Search snippets, scores, evidence grids, comparisons, and audits are navigation material. Final citation support requires canonical rereading through read_saved_paper or a current-valid pack from verify_evidence_pack.
ChatGPT Pro packets, research_run_manifest, and Operation Registry records are advisory or recovery metadata. They do not replace canonical paper text as evidence.
Evidence-grounded writing lives in the bundled skill profiles under references/paper_writing.md; the MCP runtime remains the evidence and recovery layer.
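
The "current-valid pack" contract above can be pictured as a hash comparison over canonical paragraphs. This is a minimal sketch, not the GRaDOS implementation: the function names, the blank-line paragraph split, the index-to-hash pack layout, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def block_hashes(markdown_text: str) -> list[str]:
    """Hash each non-empty paragraph of a canonical Markdown document."""
    blocks = [b.strip() for b in markdown_text.split("\n\n") if b.strip()]
    return [hashlib.sha256(b.encode("utf-8")).hexdigest() for b in blocks]


def pack_is_current(pack_blocks: dict[int, str], markdown_text: str) -> bool:
    """Return True when every block recorded in the pack still hashes the same.

    pack_blocks maps a paragraph index to the hash stored at pack time.
    This layout is hypothetical, not the real evidence_pack schema.
    """
    current = block_hashes(markdown_text)
    return all(
        index < len(current) and current[index] == stored
        for index, stored in pack_blocks.items()
    )
```

Under these assumptions, editing a cited paragraph in papers/*.md changes its hash, so a pack built earlier stops being current until it is rebuilt or re-verified.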

MCP Toolsets 🧰

By default, MCP tools/list exposes the research_default profile rather than every public GRaDOS tool. This keeps ordinary research agents focused on the end-to-end loop: local/remote search, extraction, structure cards, canonical rereading, operation polling, evidence packs, audits, evidence grids, comparison, default external consult, and run-linked artifact saving.

Toolsets only control MCP tool visibility. They do not remove Python functions, CLI commands, resources, internal workflows, or existing storage paths. The paper resources grados://papers/index and grados://papers/{safe_doi} remain registered by default and are not counted as tools.

Setting	Exposed tools
unset / `GRADOS_MCP_TOOLSETS=research_default`	Default research workflow tools.
`GRADOS_MCP_TOOLSETS=all` or `GRADOS_MCP_TOOLSETS=full`	The full public MCP surface, currently all 32 tools.
`GRADOS_MCP_TOOLSETS=research_default,local_pdf`	Default research tools plus local PDF import/parse/handoff/asset tools.
`GRADOS_MCP_TOOLS=read_saved_paper,prepare_evidence_pack`	Exact tool allow-list when no toolset is also set.
`GRADOS_MCP_TOOLSETS=research_default` plus `GRADOS_MCP_TOOLS=read_paper_asset`	Default profile plus explicitly named tools.

Named toolsets: research_default, local_pdf, analysis_extra, evidence_extra, evidence_recovery, external_recovery, maintenance, zotero, all, and full. Unknown toolset or tool names fail at server startup so misconfigured MCP clients do not silently run with the wrong surface.
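
The visibility rules in the table above can be sketched as a small resolver. This is illustrative only: the toolset membership below is a hypothetical, heavily abbreviated subset, and resolve_tools is not a GRaDOS function. It only mirrors the documented behavior: default profile when nothing is set, union of toolsets and named tools, and a hard failure on unknown names.

```python
# Hypothetical, abbreviated membership; the real profiles contain more tools.
TOOLSETS = {
    "research_default": {"search_academic_papers", "read_saved_paper", "prepare_evidence_pack"},
    "local_pdf": {"parse_pdf_file", "import_local_pdf_library", "read_paper_asset"},
}
ALL_TOOLS = set().union(*TOOLSETS.values())


def _names(raw: str) -> list[str]:
    """Split a comma-separated environment value into clean names."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def resolve_tools(env: dict[str, str]) -> set[str]:
    """Compute the exposed tool names from the two environment variables."""
    toolsets = _names(env.get("GRADOS_MCP_TOOLSETS", ""))
    tools = _names(env.get("GRADOS_MCP_TOOLS", ""))
    if not toolsets and not tools:
        toolsets = ["research_default"]  # documented default when nothing is set
    exposed: set[str] = set()
    for name in toolsets:
        if name in ("all", "full"):
            exposed |= ALL_TOOLS
        elif name in TOOLSETS:
            exposed |= TOOLSETS[name]
        else:
            raise ValueError(f"unknown toolset: {name}")  # fail at startup
    for name in tools:
        if name not in ALL_TOOLS:
            raise ValueError(f"unknown tool: {name}")  # fail at startup
        exposed.add(name)
    return exposed
```

Calling resolve_tools(dict(os.environ)) at startup (after import os) would surface a misspelled toolset immediately instead of letting a client run with the wrong surface.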

MCP Tools 🔧

Server	Tool	Description
GRaDOS	`search_academic_papers`	Search remote academic databases for paper metadata, DOI deduplication, resumable continuation tokens, and local saved/full-text/summary state. Optional `indepth=true` starts a background research run for the returned candidates with the same `limit`; poll `get_operation_status` with the returned operation id.
GRaDOS	`search_saved_papers`	Search the local saved-paper library with semantic retrieval, SQLite FTS/BM25 fallback, exact lookup, metadata filters, and hybrid RRF. Returned snippets and Evidence Anchor JSON blocks are screening/reranking material, not citation evidence.
GRaDOS	`extract_paper_full_text`	Fetch, parse, QA-check, and save one paper's canonical full text by DOI. Already-saved and native-full-text paths stay synchronous; PDF-obtained parser work can return a pending operation receipt.
GRaDOS	`read_saved_paper`	Read paragraph windows from one saved paper for canonical deep reading and citation verification. Accepts a DOI, safe DOI, or `grados://papers/...` URI.
GRaDOS	`get_saved_paper_structure`	Return a low-token structure card for one saved paper with preview text, headings, asset summary, and parser provenance summary when available. Use it for screening before deep reading, not as the final citation source.
GRaDOS	`read_paper_asset`	List or read parser-generated figures, tables, formulas, page images, and debug/source assets for a saved paper. Images are returned inline only on request and within configured size limits.
GRaDOS	`import_local_pdf_library`	Scan a local PDF file or directory, then process parsing/import as a background import run. Returns `operation_id`, progress, and a `get_operation_status` next action.
GRaDOS	`parse_pdf_file`	Parse a local PDF into markdown. Without a DOI it returns a truncated preview; with a DOI it saves the paper into the canonical library, materializes the managed PDF when `copy_to_library=true`, and may return `parse_in_progress` while GRaDOS continues a durable background parse attempt.
GRaDOS	`get_operation_status`	Inspect pending external consult, DOI-bound extraction/PDF parse, indepth search, local PDF import, or Codex download handoff operations. Extraction status can also be looked up with `doi:<DOI>` or a bare DOI. With `detail=true`, recover a ChatGPT browser response without resending the prompt and return registry events/debug pointers.
GRaDOS	`ingest_codex_downloaded_pdf`	Complete a `codex` Chrome-extension handoff by validating either `downloaded_file_path` or one scanned watch-dir candidate, then reuse the same canonical parse/save path. Ambiguous, missing, or invalid candidates are recorded as recoverable `codex_download_handoff` operations; long parser runs return in-progress rather than parse failure.
GRaDOS	`plan_library_pdf_cleanup`	Dry-run duplicate PDF cleanup under `downloads/`, reporting noncanonical publisher-name PDFs that have the same hash as a DOI's managed `downloads/{safe_doi}.pdf`. It never deletes files.
GRaDOS	`save_paper_to_zotero`	Save one paper to the configured Zotero library through the Web API, typically for papers that actually support the final answer.
GRaDOS	`save_research_artifact`	Persist reusable intermediate outputs such as search snapshots, extraction receipts, evidence grids, compression-safe evidence checkpoints, and run-linked artifacts in the local SQLite state store. Include `metadata.research_run_id` to attach an artifact to a run manifest.
GRaDOS	`query_research_artifacts`	Query previously saved research artifacts by id, kind, or keyword. `detail=true` returns the full stored content.
GRaDOS	`prepare_evidence_pack`	Retrieve candidate anchors, reread canonical blocks from `papers/*.md`, filter non-evidence fragments, and persist a minimal `evidence_pack` artifact with pack hash, block hashes, answerability, and scoped DOI coverage.
GRaDOS	`read_evidence_pack`	Restore a persisted evidence pack by pack id or artifact id.
GRaDOS	`verify_evidence_pack`	Rebuild canonical block manifests from current `papers/*.md` and report snapshot/current validity, missing papers, document changes, relocation, and hash mismatches.
GRaDOS	`preview_external_consult_packet`	Dry-run a compact external-consult packet from one current-valid evidence pack without saving artifacts or contacting external services.
GRaDOS	`prepare_external_consult_packet`	Persist an `external_consult_packet` artifact with verified anchor ids, canonical paragraph coordinates, excerpts, candidate claims, limitations, and prompt hash, returning the host prompt as a regenerable view.
GRaDOS	`prepare_external_consult_from_topic`	Prepare a fresh evidence pack from a topic and persist a verified external-consult packet in one route, returning both pack and packet ids plus the host prompt.
GRaDOS	`consult_chatgpt_pro`	Consult ChatGPT Pro through the private GRaDOS browser profile. Requires a prompt; packs, packets, artifacts, and files are optional context. It records model/thinking strategies, sends once, uses bounded auto-reattach by default, saves advisory output, supports `manual_response` pasted fallback saves, and does not auto-audit.
GRaDOS	`run_external_consult`	Prepare a topic or current-valid pack into external consult packet context, then send that packet through `consult_chatgpt_pro`; this route no longer auto-audits by default.
GRaDOS	`save_external_consult_result`	Save a host-provided ChatGPT Pro response as advisory `external_consult_result` state linked to its source pack, optional packet, prompt hash, and session metadata. Defaults to `audit=true`.
GRaDOS	`audit_external_consult_result`	Audit a saved external consult result against its linked packet when available, otherwise its source pack, using structured `claims[].anchor_ids` as the primary handoff contract while still reporting prose risks.
GRaDOS	`audit_answer_against_pack`	Audit draft claims using only evidence items inside one verified pack. It returns `verified`, `minor_distortion`, `major_distortion`, `unverifiable`, or `unverifiable_access` verdicts and does not search the full library to fill gaps. Optional `include_suggestions=true` attaches follow-up planning.
GRaDOS	`suggest_missing_evidence`	Suggest follow-up evidence or revision work for non-verified pack-audit claims without changing strict audit results.
GRaDOS	`manage_failure_cases`	Record, inspect, and summarize failed fetch, parse, search, or citation attempts. Can also suggest conservative retry steps from local failure memory.
GRaDOS	`get_citation_graph`	Return lightweight local citation relationships, including citation neighbors, common references, and reverse citing-paper lookups.
GRaDOS	`get_papers_full_context`	Return structured full-context material for context-budgeted saved-paper batches, with token estimates or actual section content for CAG-style deep reading.
GRaDOS	`build_evidence_grid`	Build topic- or subquestion-centered evidence grids from the local paper library before drafting. Rows carry reread anchors for agent-side reranking before citation verification, scoped DOI calls report requested/covered/missing coverage, and weak rows expose `eligibility`, `rejection_reason`, and `evidence_warning` instead of looking citation-ready.
GRaDOS	`compare_papers`	Extract aligned comparison material across multiple saved papers, focused on methods, results, or full text. Returned excerpts carry per-axis reread anchors, avoid backmatter sections by default, and leave an axis empty when no eligible excerpt exists.
GRaDOS	`audit_draft_support`	Audit draft claims against the local paper library and return first-pass `verified`, `minor_distortion`, `major_distortion`, `unverifiable`, or `unverifiable_access` verdicts with eligible candidate evidence snippets, issue types, revision actions, and anchors. `author_year` citations include bracketed, parenthetical, and narrative markers such as `Dou et al. (2026)`, `Dou et al., 2026`, and `张三等，2025`; author/year text is stripped from retrieval queries while attribution checks remain strict. `candidate_limit` controls candidates per claim.
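
The "hybrid RRF" mentioned for search_saved_papers refers to reciprocal rank fusion, which merges several ranked lists by summing 1 / (k + rank) per document. The sketch below shows the general technique only; the function name and the constant k = 60 (a commonly used default) are assumptions, and GRaDOS may weight or tune its fusion differently.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked id lists; each list contributes 1 / (k + rank) per id."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense = ["a", "b", "c"]    # e.g. semantic retrieval order
lexical = ["b", "d", "a"]  # e.g. FTS/BM25 order
print(reciprocal_rank_fusion([dense, lexical]))  # ['b', 'a', 'd', 'c']
```

A document that ranks well in both lists ("b") outranks one that is first in only one list, which is why fused results remain screening material rather than citation evidence.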

MCP Resources 📚

Resource	Description
`grados://papers/index`	Low-token index of all saved papers.
`grados://papers/{safe_doi}`	Canonical overview card for one saved paper.

safe_doi is an opaque GRaDOS paper ID returned by save receipts, search results, or resource URIs. New saves include a short normalized-DOI hash suffix to avoid filename collisions; older IDs such as 10_1234_demo still resolve. Prefer passing the DOI itself or the returned URI instead of deriving a paper ID by replacing DOI punctuation.

Local Paper Library 🗂️

After extraction or import, GRaDOS keeps papers in a visible on-disk layout:

Directory	Content	Purpose
`config.json`	Runtime configuration	One config file for the whole install
`papers/`	Canonical Markdown papers with YAML front-matter	Deep reading, structure cards, and retrieval
`papers/_parsed/`	Parser provenance sidecars keyed by safe DOI	PDF/parser provenance, source/canonical hashes, block mapping, and asset manifest pointers; not citation content
`papers/_assets/`	Parser-generated assets and manifests	Figures, tables, formulas, page images, and source/debug assets fetched with `read_paper_asset`; not indexed as text
`downloads/`	Raw `.pdf` files	Archival copies of fetched or imported papers
`database/chroma/`	ChromaDB collections	Built-in semantic retrieval store
`database/fts.sqlite3`	Rebuildable SQLite FTS5/BM25 index	Deterministic lexical fallback and hybrid retrieval candidate generation
`database/remote_metadata/`	ChromaDB collection	Remote paper metadata, fetch status, and browser-resume cache
`database/research.sqlite3`	Research artifacts and failure memory	Evidence packs, run manifests, checkpoints, extraction receipts, and recoverable failure records
`research_checkpoints/`	`checkpoint.json` and rendered `checkpoint.md` files	Recoverable indepth research workflow state
`paper_summaries/`	Query-independent derived paper summaries	Navigation and context recovery, never citation evidence
`browser/`	Managed Chromium, publisher/ChatGPT profiles, session records	Browser strategy assets for publisher PDF access and gated ChatGPT external consult
`models/`	Embedding and OCR model caches	Runtime assets warmed by setup

Repository Map 🗺️

README.md / README.zh-CN.md: primary installation and usage guides
.mcp.json: repo-local MCP wiring example
.claude-plugin/: native Claude Code plugin manifests
.agents/plugins/marketplace.json: repo-hosted Codex marketplace manifest
plugin.mcp.json: root plugin-scoped MCP config used by the Claude Code plugin
plugins/grados/.codex-plugin/: self-contained Codex plugin bundle used by the marketplace
plugins/grados/plugin.mcp.json: plugin-scoped MCP config copied into the Codex bundle
skills/grados/SKILL.md: structured research workflow built on top of the MCP tools
skills/grados/references/paper_writing.md: evidence-grounded writing workflow router
skills/grados/references/writing_profiles/: task profiles for protocols, reviews, reports, and manuscripts
skills/grados/references/domain_profiles/: domain-specific writing guardrails, currently including mechanics and elastic metamaterials

Installation 🚀

Option A: `uv tool install` (recommended)

uv tool install grados
grados setup
grados client install all

This creates ~/GRaDOS/config.json, prepares the visible directory layout, installs managed browser assets, and warms the default Harrier embedding runtime. docling is now included in the default install because the canonical parsing pipeline is Docling-first. MinerU is an optional authenticated cloud parser in the same waterfall; it runs only when MINERU_API_KEY is configured. Use grados auth set <provider> to store API keys in the OS keychain. Plaintext keys placed in config.json are treated as a one-time import path and are cleared after a successful migration.

Option B: extras, zero-install, or pip

# Default install (includes Docling)
uv tool install grados

# Zero-install run
uvx grados version

# Traditional Python install
pip install grados

Extras in the current package:

grados: core MCP server, CLI, ChromaDB storage, Docling-first parser, optional MinerU cloud fallback, PyMuPDF fallback, browser automation, and built-in Zotero save support
grados[docling]: compatibility alias for the built-in Docling runtime
grados[marker]: compatibility alias only; Marker is no longer bundled because the current marker-pdf release pins vulnerable parser dependencies
grados[full]: compatibility alias only

Option C: from source

git clone https://github.com/STSNaive/GRaDOS.git
cd GRaDOS
uv sync --all-extras
uv run grados setup
uv run grados client install all
uv run grados status

Quick Start ⚡

Install GRaDOS with uv tool install grados (this now includes Docling by default)
Run grados setup
Run grados client install all to register Claude Code and Codex in one step
Run grados auth set elsevier (and any other providers you need)
Run grados status to confirm dependencies, browser assets, keychain health, and API-key sources
If you already have a PDF library, run grados import-pdfs --from /path/to/papers --recursive
If you are upgrading from an older MiniLM-backed index, run grados reindex once before semantic search

Configure your clients 🔌

Recommended:

grados client install all

This currently installs GRaDOS into both Claude Code and Codex:

registers the grados MCP server through each client's own CLI
copies the bundled grados skill into the user's skills directory

You can also target a single client:

grados client install claude
grados client install codex
grados client list
grados client doctor

After installing or updating the Codex MCP registration, restart Codex or start a new thread so any already-running MCP process picks up the new GRaDOS runtime.

Manual MCP wiring (fallback)

Claude Code / Claude Desktop:

{
  "mcpServers": {
    "grados": {
      "command": "uvx",
      "args": ["grados"]
    }
  }
}

Codex:

[mcp_servers.grados]
command = "uvx"
args = ["grados"]

Use uvx when you want zero-install MCP launching. For long-lived local use, uv tool install grados plus the grados executable remains the primary path, and now brings Docling with it by default. If you want a custom data root, set GRADOS_HOME in your MCP client's environment.
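
For clients that read a Claude-style mcpServers JSON file, a custom data root can be passed through the server entry's environment. The helper below is a sketch under that assumption: it presumes the client accepts an env map per server entry, and both function names and the /data/grados path are made up for illustration.

```python
import json
from typing import Optional


def grados_server_entry(grados_home: Optional[str] = None) -> dict:
    """Build the server entry shown above, optionally with GRADOS_HOME set."""
    entry: dict = {"command": "uvx", "args": ["grados"]}
    if grados_home:
        # Assumption: the client passes this env map to the spawned process.
        entry["env"] = {"GRADOS_HOME": grados_home}
    return entry


def merge_into_config(config: dict, grados_home: Optional[str] = None) -> dict:
    """Return a copy of config with the grados entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["grados"] = grados_server_entry(grados_home)
    merged["mcpServers"] = servers
    return merged


print(json.dumps(merge_into_config({}, "/data/grados"), indent=2))
```

Merging into a copy keeps any other registered servers intact, which matters when the config file is shared with other MCP tools.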

Native Plugin Install 🧩

GRaDOS now ships native plugins for Codex and Claude Code.

Claude Code:

/plugin marketplace add STSNaive/GRaDOS
/plugin install grados@grados-plugins
/reload-plugins

Codex:

codex plugin marketplace add STSNaive/GRaDOS
codex
/plugins

Then choose the GRaDOS Plugins marketplace, install the GRaDOS plugin, and start a new thread. You can call @grados explicitly or just describe the research task directly.

Companion Skill 🤖

GRaDOS still ships a repo-local skill in skills/grados/. The grados client install ... flow above is now the preferred path for local use. Plugin install remains the alternative when you specifically want the native plugin packaging.

skills/grados/SKILL.md contains the current search -> structure -> deep read -> cite -> verify workflow
skills/grados/references/tools.md documents the current MCP tools and 2 resources
skills/grados/references/paper_writing.md routes evidence-grounded writing tasks to focused profiles for protocols, reviews, reports, and manuscripts
skills/grados/agents/openai.yaml describes the OpenAI / Codex-facing dependency on the grados MCP server

Codex and Claude Code use the same skill directory shape, <skills-root>/grados/SKILL.md, with the same supporting files under that directory. Only the skills root differs:

Codex personal skills: ~/.agents/skills
Claude Code personal skills: ~/.claude/skills
Claude Code project skills: .claude/skills

Install it by copying the entire skills/grados/ directory into the appropriate skills root:

mkdir -p "<skills-root>"
cp -R skills/grados "<skills-root>/"

For Codex, set <skills-root> to ~/.agents/skills
For Claude Code personal skills, set <skills-root> to ~/.claude/skills
For Claude Code project skills, set <skills-root> to .claude/skills

This fallback assumes the grados MCP server is already registered in your client. This repository's .mcp.json is the minimal repo-local example; after copying the skill, reload your client so it can discover the new skill files.

Configuration ⚙️

Keep grados-config.example.json as the commented reference; edits take effect on the next CLI run or MCP server restart.

Research Workflow Knobs

research.indepth: disabled by default; controls whether remote search immediately materializes returned candidates for checkpointed full-text review.
research.external_consult: disabled by default; a GRaDOS-native ChatGPT Pro consult transport with enabled and response_wait_total_seconds. Gate automation with grados external-consult is-enabled --quiet; inspect details with grados external-consult status --json; initialize the private profile with grados external-consult setup-browser. When enabled, GRaDOS can send prompt-only or context-bounded consults through its private ChatGPT browser profile, save advisory responses, and optionally audit pack-linked results through explicit follow-up tools. When this is off, GRaDOS does not call ChatGPT, open Chrome, or change evidence reading.
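
Automation can use the predicate command above as a gate before attempting any consult. A minimal sketch, assuming the grados executable is on PATH; the wrapper name and the injectable run parameter are illustrative additions so the check can be exercised without the CLI installed.

```python
import subprocess
from typing import Callable, Optional, Sequence


def external_consult_enabled(
    run: Optional[Callable[[Sequence[str]], int]] = None,
) -> bool:
    """Return True when `grados external-consult is-enabled --quiet` exits 0."""
    command = ["grados", "external-consult", "is-enabled", "--quiet"]
    if run is None:
        # Default runner: execute the real CLI and return its exit code.
        def run(cmd: Sequence[str]) -> int:
            return subprocess.run(list(cmd), check=False).returncode
    # Exit 0 means enabled; any other exit code is treated as disabled here.
    return run(command) == 0
```

A script would then branch on external_consult_enabled() and skip the consult step entirely when it returns False.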

Timeout / Retry Knobs

search: connect_timeout, read_timeout
extract: fetch_connect_timeout, fetch_read_timeout, pdf_read_timeout
extract.headless_browser: legacy-named config section for the browser strategy (disable_pdf_viewer, download_inbox, deadline_seconds, networkidle_timeout, pdf_backfill_timeout, poll_min_seconds, poll_max_seconds)
research.external_consult: response_wait_total_seconds
extract.codex_handoff: watch-dir ingest controls used only after a codex Chrome-extension handoff (download_watch_dir, download_max_age_seconds, download_settle_seconds, download_settle_max_wait_seconds, download_scan_recursive)
retry_policy: max_attempts, max_wait, respect_retry_after
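
To show how the three retry_policy knobs could interact, here is a generic retry loop. It is a sketch of one plausible reading, not GRaDOS code: the exponential backoff schedule, the default values, and the RetryableError class are assumptions.

```python
import time
from typing import Callable, Optional


class RetryableError(Exception):
    """Hypothetical error carrying an optional server-suggested delay."""

    def __init__(self, message: str, retry_after: Optional[float] = None):
        super().__init__(message)
        self.retry_after = retry_after


def next_wait(attempt: int, max_wait: float, retry_after: Optional[float],
              respect_retry_after: bool) -> float:
    """Exponential backoff (1s, 2s, 4s, ...) capped at max_wait."""
    backoff = float(2 ** (attempt - 1))
    if respect_retry_after and retry_after is not None:
        backoff = max(backoff, retry_after)  # honor the server hint
    return min(backoff, max_wait)


def call_with_retry(operation: Callable[[], object], max_attempts: int = 3,
                    max_wait: float = 30.0, respect_retry_after: bool = True,
                    sleep: Callable[[float], None] = time.sleep) -> object:
    """Run operation, retrying on RetryableError up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RetryableError as error:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(next_wait(attempt, max_wait, error.retry_after, respect_retry_after))
```

In this reading max_wait bounds every individual pause, so a very large Retry-After value cannot stall a run indefinitely.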

Size Guards

extract.security: byte ceilings for remote PDFs, remote text/XML/HTML responses, local PDFs, browser PDF captures, MinerU result zips, and MinerU full.md. Defaults are intentionally generous for normal paper PDFs; raise them only for trusted oversized inputs.
extract.assets: controls parser asset bundles under papers/_assets/{safe_doi}/ (mode=all|referenced|none), Docling image scale, per-file/total asset size ceilings, inline image ceiling, and max asset count. Asset bytes are stored beside canonical Markdown and are fetched with read_paper_asset, not indexed into Chroma.

Commands 🧰

Command	Purpose
`grados`	Start the MCP stdio server
`grados setup`	Create directories, write `config.json`, install browser assets, and warm models
`grados client install claude`	Register GRaDOS in Claude Code and install bundled skills into `~/.claude/skills`
`grados client install codex`	Register GRaDOS in Codex and install bundled skills into `~/.agents/skills`
`grados client install all`	Install GRaDOS into both Claude Code and Codex
`grados client list`	Show which supported clients currently have GRaDOS installed
`grados client doctor`	Run a lightweight health check for supported clients
`grados client remove claude|codex`	Remove GRaDOS from the named client
`grados auth set/status/migrate/clear`	Manage provider API keys in the OS keychain
`grados browser status --json`	Inspect the publisher PDF browser runtime, managed executable, profile status, lock, and session directory
`grados browser doctor [--live --doi DOI]`	Check publisher browser prerequisites; `--live` runs a PDF-acquisition probe without saving `papers/*.md`
`grados external-consult is-enabled --quiet`	Predicate gate for the optional ChatGPT Pro external consult transport; exit 0 means enabled, exit 1 means disabled
`grados external-consult status --json`	Show the same external consult gate plus config path details as structured diagnostics; profile initialization means Chrome profile markers only, not ChatGPT login readiness
`grados external-consult setup-browser [--keep-open]`	Open the private GRaDOS ChatGPT profile for first-time ChatGPT login; closes after stable login detection by default, while `--keep-open` keeps the command and profile lock alive until the setup browser closes
`grados external-consult doctor [--live]`	Check external consult browser prerequisites; `--live` probes ChatGPT login plus no-submit model, thinking, and composer readiness
`grados import-pdfs --from /path/to/papers --recursive`	Import an existing local PDF library into the canonical paper store
`grados eval-retrieval --fixture cases.jsonl`	Evaluate saved-paper retrieval against local golden cases using dense, FTS/BM25, exact lookup, and RRF unless `--dense-only` is set
`grados status`	Show config, dependency, runtime-asset, and API-key health
`grados paths`	Show the resolved GRaDOS filesystem layout
`grados update-db`	Incrementally refresh the ChromaDB index from `papers/` when the active indexing config is unchanged
`grados reindex`	Rebuild the semantic index from scratch after embedding-model or chunking changes
`grados version`	Show package versions

If you change indexing.model_id, indexing.max_length, or the section-aware chunking settings in config.json, use grados reindex instead of grados update-db.

Changing only indexing.batch_size is a runtime-only tuning knob and does not require a rebuild.
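
The rule above reduces to a comparison of the old and new indexing settings. The helper below is a sketch: required_index_command is not part of the CLI, and because the section-aware chunking key names are not listed here, they are passed in as a parameter with a placeholder default of "chunking".

```python
def required_index_command(
    old_indexing: dict,
    new_indexing: dict,
    chunking_keys: frozenset = frozenset({"chunking"}),  # placeholder key name
) -> str:
    """Pick the index command implied by a change to the indexing config."""
    rebuild_keys = {"model_id", "max_length"} | set(chunking_keys)
    changed = {
        key
        for key in set(old_indexing) | set(new_indexing)
        if old_indexing.get(key) != new_indexing.get(key)
    }
    if changed & rebuild_keys:
        return "grados reindex"  # embeddings or chunks would no longer match
    return "grados update-db"    # batch_size and similar are runtime-only
```

For example, switching model_id to microsoft/harrier-oss-v1-0.6b yields "grados reindex", while changing only batch_size yields "grados update-db".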

Indexing Defaults 🧠

Default model: microsoft/harrier-oss-v1-270m
Heavier opt-in model: microsoft/harrier-oss-v1-0.6b
Default indexing.max_length: 4096
Default indexing.batch_size: 0 (auto, conservative on CPU/MPS and wider on CUDA)
Overlong single paragraphs are re-split by sentence or clause before embedding so grados reindex does not send giant chunks into SentenceTransformer.encode()
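
The re-splitting of overlong paragraphs can be illustrated with a small greedy packer. This is a simplified sketch rather than the actual indexer: it measures length in characters instead of tokens, uses basic punctuation regexes for sentence and clause boundaries, and the function names are invented.

```python
import re

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
CLAUSE_END = re.compile(r"(?<=[,;:])\s+")


def _pieces(text: str, max_chars: int) -> list[str]:
    """Break text into pieces of at most max_chars, preferring natural boundaries."""
    pieces: list[str] = []
    for sentence in SENTENCE_END.split(text):
        if len(sentence) <= max_chars:
            pieces.append(sentence)
            continue
        for clause in CLAUSE_END.split(sentence):
            # Last resort: hard-slice a clause that is still too long.
            pieces.extend(clause[i:i + max_chars] for i in range(0, len(clause), max_chars))
    return [piece for piece in pieces if piece]


def split_overlong(paragraph: str, max_chars: int) -> list[str]:
    """Greedily pack sentence and clause pieces into chunks of at most max_chars."""
    chunks: list[str] = []
    current = ""
    for piece in _pieces(paragraph.strip(), max_chars):
        candidate = f"{current} {piece}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Splitting at sentence boundaries first keeps each chunk readable on its own, and clause-level splitting only applies to sentences that are individually too long.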

GRaDOS does not assume FlashAttention is available on local macOS / CPU setups. If your runtime says it can use SDPA, that still does not guarantee a fused CUDA FlashAttention path; the safer default is smaller chunks, a shorter indexing length, and conservative batching.

Filesystem Layout 🗄️

By default, GRaDOS keeps everything in a visible directory:

~/GRaDOS/
├── config.json
├── papers/
├── downloads/
├── browser/
│   ├── chromium/
│   ├── profile/
│   ├── pdf-sessions/
│   ├── chatgpt-profile/
│   ├── chatgpt-sessions/
│   └── extensions/
├── models/
├── database/
│   ├── chroma/
│   └── remote_metadata/
├── logs/
└── cache/

Root selection priority:

GRADOS_HOME
~/GRaDOS
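
The two-step priority above amounts to an environment lookup with a home-directory fallback. A minimal sketch; resolve_grados_home is an illustrative name, and treating a blank GRADOS_HOME as unset is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_grados_home(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return GRADOS_HOME when set, otherwise ~/GRaDOS."""
    env = os.environ if env is None else env
    override = env.get("GRADOS_HOME", "").strip()
    if override:
        return Path(override).expanduser()
    return Path.home() / "GRaDOS"
```

The grados paths command reports the layout the installed runtime actually resolved, so prefer it over a reimplementation when debugging.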

Local PDF tools such as parse_pdf_file, ingest_codex_downloaded_pdf(downloaded_file_path=...), and import_local_pdf_library read host file paths from a trusted local MCP/CLI session and enforce extract.security.max_local_pdf_bytes before and while loading the file. Long extraction, parser, indepth, import, external-consult, and Codex handoff work returns durable pending or needs-input receipts with operation_id; poll get_operation_status instead of repeating the original request. If a DOI extraction outlives the MCP host timeout before a receipt is visible, poll get_operation_status(operation_id="doi:<DOI>", detail=true).
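
The polling pattern described above might look like the following on the host side. Everything about the response shape is an assumption: the status field name and the values in PENDING_STATES are placeholders, and call_tool stands in for whatever function your MCP client uses to invoke a tool.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical status values; check real receipts before relying on them.
PENDING_STATES = {"pending", "parse_in_progress"}


def wait_for_operation(
    call_tool: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    operation_id: str,
    poll_seconds: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_operation_status until the operation leaves a pending state."""
    arguments = {"operation_id": operation_id, "detail": True}
    for _ in range(max_polls):
        status = call_tool("get_operation_status", arguments)
        if status.get("status") not in PENDING_STATES:
            return status  # finished, failed, or needs input
        sleep(poll_seconds)
    raise TimeoutError(f"operation {operation_id} still pending after {max_polls} polls")
```

For a DOI extraction whose receipt never arrived, the same loop can be started with operation_id="doi:<DOI>" rather than re-issuing extract_paper_full_text.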

API Keys 🔑

Key	Source	Required
`ELSEVIER_API_KEY`	Elsevier Developer Portal	No
`PUBMED_API_KEY`	NCBI E-utilities API key	No
`WOS_API_KEY`	Clarivate Developer Portal	No
`SPRINGER_meta_API_KEY`	Springer Nature Metadata API	No
`SPRINGER_OA_API_KEY`	Springer Nature Open Access API	No
`MINERU_API_KEY`	MinerU API token	No
`ZOTERO_API_KEY`	Zotero Settings -> Keys	No

Crossref works without an API key. PubMed also works without one, but PUBMED_API_KEY is available as an optional pacing upgrade for E-utilities. GRaDOS will use whichever services are configured and skip the rest; the default remote search flow still works with the free sources, and the local paper workflow works without any third-party key.

The preferred path is grados auth set <provider>, which stores the secret in the OS keychain. If you temporarily place a plaintext key in ~/GRaDOS/config.json, GRaDOS will import it into the keychain on the next run and then clear the plaintext value from the file.

Runtime Order 🌊

Search priority:

{
  "search": {
    "order": ["Elsevier", "Springer", "WebOfScience", "Crossref", "PubMed"]
  }
}

Full-text fetch priority:

{
  "extract": {
    "fetch_strategy": {
      "order": ["api", "browser", "codex", "scihub"],
      "enabled": {
        "api": true,
        "browser": true,
        "codex": false,
        "scihub": true
      }
    },
    "unpaywall": {
      "enabled": true
    }
  }
}

Unpaywall is an optional DOI-to-OA-location resolver, not a download strategy. When extract.unpaywall.enabled=true, GRaDOS resolves best_oa_location / oa_locations before codex or browser runs and uses the best url_for_pdf or url_for_landing_page as that route's start URL. It does not affect the api or scihub routes. Legacy oa entries left in old fetch_strategy.order or enabled maps are ignored.
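
As an illustration of the resolver step, the function below picks a start URL from an Unpaywall-style record. The field names best_oa_location, oa_locations, url_for_pdf, and url_for_landing_page come from the text above; the function name and the exact preference order (any PDF URL before any landing page, falling back to the DOI link) are assumptions.

```python
from typing import Optional


def pick_start_url(unpaywall_record: Optional[dict], doi: str) -> str:
    """Choose the start URL for the browser or codex route."""
    record = unpaywall_record or {}
    locations = []
    if record.get("best_oa_location"):
        locations.append(record["best_oa_location"])
    locations.extend(record.get("oa_locations") or [])
    # Assumed preference: a direct PDF link beats a landing page.
    for field in ("url_for_pdf", "url_for_landing_page"):
        for location in locations:
            if location.get(field):
                return location[field]
    # No OA location found: start from the DOI resolver instead.
    return f"https://doi.org/{doi}"
```

Because the resolver only changes where a route starts, disabling it leaves the api and scihub routes untouched.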

Legacy fetch-strategy aliases such as TDM, SciHub, and Headless are still accepted while existing configs migrate. The current scihub runtime uses extract.sci_hub.endpoints as an ordered access list: the first endpoint is tried first, and later entries are fallbacks. The legacy extract.sci_hub.fallback_mirror value is still accepted when endpoints is omitted or empty.
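
Combining the order list, the enabled map, and the legacy names, the effective route sequence could be computed as below. This is a sketch: the alias targets (TDM to api, Headless to browser, SciHub to scihub) are guesses, since only the alias names are documented here, and treating a route missing from enabled as disabled is also an assumption.

```python
# Assumed alias targets; only the alias names themselves are documented.
LEGACY_ALIASES = {"tdm": "api", "headless": "browser", "scihub": "scihub"}
KNOWN_ROUTES = ("api", "browser", "codex", "scihub")


def _canonical(name: str) -> str:
    """Map a configured name (possibly a legacy alias) to a route name."""
    key = name.strip().lower()
    return LEGACY_ALIASES.get(key, key)


def effective_fetch_order(fetch_strategy: dict) -> list[str]:
    """Return enabled routes in configured order, ignoring unknown entries."""
    enabled = {
        _canonical(name): bool(flag)
        for name, flag in fetch_strategy.get("enabled", {}).items()
    }
    order: list[str] = []
    for name in fetch_strategy.get("order", []):
        route = _canonical(name)
        if route not in KNOWN_ROUTES:
            continue  # e.g. a legacy "oa" entry is ignored
        if enabled.get(route, False) and route not in order:
            order.append(route)
    return order
```

With the default config shown earlier this yields api, browser, scihub, because codex is listed in the order but disabled.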

The browser strategy is a first-class path for institutional publisher access. It uses the GRaDOS-managed publisher profile (browser/profile), profile locking, operational PDF browser session records under browser/pdf-sessions, and response/download/CDP/backfill PDF capture. By default extract.headless_browser.disable_pdf_viewer=true writes Chrome profile prefs so PDF URLs download into GRADOS_HOME/browser_inbox/ (download_inbox) instead of opening in Chrome's PDF viewer; downloads/ remains only the canonical DOI-bound archive after materialization, hash/QA, and conflict checks.

Browser acquisition never writes papers/*.md directly: it returns PDF bytes or a challenge plus browser capture metadata, then extract_paper_full_text materializes the PDF and routes DOI-bound parsing through a durable parse attempt. Retained browser windows keep manual/challenge pages open while each DOI gets a job-owned page.

If a publisher verification page blocks PDF capture, GRaDOS marks the page title with GRaDOS ACTION REQUIRED best-effort, records controlled-wait telemetry, keeps listening during the bounded browser deadline, and can capture a user-opened PDF tab through pdf_url_backfill_after_manual. If the bounded wait still expires, GRaDOS records a challenge with manual-resume metadata in remote_metadata; complete the verification in the managed browser profile, then call extract_paper_full_text again with resume_browser=true, or pass a known downloaded PDF path through the existing local ingestion/parse route.

codex is disabled by default. When enabled and placed in extract.fetch_strategy.order, it acts as a Codex Chrome extension host-agent handoff at that exact point in the order: extract_paper_full_text returns a Chrome download receipt with a codex_download_handoff operation, and the host agent then uses the Codex @chrome plugin / Codex Chrome extension as the acquisition route. If Unpaywall finds an OA URL, the receipt starts from that URL instead of https://doi.org/{doi}.

If the host knows the absolute PDF path, call ingest_codex_downloaded_pdf(doi=..., downloaded_file_path=...) or parse_pdf_file(file_path=..., doi=..., copy_to_library=true, acquisition_via="codex"); otherwise ingest_codex_downloaded_pdf scans extract.codex_handoff.download_watch_dir. That watch dir is scan-only: it does not configure Chrome, and an empty scan means you should pass the real path rather than click the publisher download button again. If multiple plausible PDFs are found, GRaDOS returns a needs-input disambiguation result instead of guessing.

If a DOI-bound local parse exceeds extract.parsing.foreground_wait_seconds, GRaDOS returns a parse_in_progress receipt, keeps the background parse attempt running, and later reconciles repeated calls by DOI plus PDF hash. Do not redownload the PDF just because the foreground call returned before MinerU or another parser finished.
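The three watch-dir scan outcomes described above (nothing found, exactly one PDF, several candidates) can be sketched as follows. The function name and the status strings are hypothetical, and this sketch treats every *.pdf in the directory as plausible, whereas the real scan presumably applies stricter matching.

```python
from pathlib import Path
from typing import Any


def scan_watch_dir(watch_dir: Path) -> dict[str, Any]:
    """Scan a download directory for PDFs without guessing between candidates."""
    pdfs = sorted(p for p in watch_dir.glob("*.pdf") if p.is_file())
    if not pdfs:
        # Empty scan: the caller should supply the real absolute path instead.
        return {"status": "not_found", "hint": "pass the absolute PDF path"}
    if len(pdfs) > 1:
        # Several candidates: ask the host to choose rather than pick one.
        return {"status": "needs_input", "candidates": [str(p) for p in pdfs]}
    return {"status": "found", "path": str(pdfs[0])}
```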

All PDF acquisition routes that copy into the library now share one materialization boundary. The managed raw PDF for a DOI is downloads/{safe_doi}.pdf; publisher filenames and external local PDFs are only acquisition inputs. Same-DOI same-hash candidates reuse, rename, or copy to the managed path. Same-DOI different-hash candidates return a conflict receipt that keeps both the existing canonical PDF and the candidate input; if a conflicting candidate exists only as captured bytes, GRaDOS preserves it under downloads/_conflicts/{safe_doi}.{hash12}.pdf for manual review. Fetched PDFs are parsed and QA-checked before GRaDOS reports an ordinary fulltext success; parser QA failures continue through the configured parser/fetch fallbacks, and unresolved QA failures are saved only as partial_success. New papers/*.md frontmatter keeps only reading metadata and pointers such as parsed_manifest_path / assets_manifest_path; PDF paths, hashes, acquisition route, and parser/materialization provenance live in the receipt, remote_metadata.fetch_via, and papers/_parsed/{safe_doi}.json.
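The same-hash / different-hash decision above can be illustrated with a small sketch for captured bytes. Several details are assumptions rather than documented behavior: SHA-256 as the hash, hash12 as the first 12 hex characters of the digest, the safe_doi sanitizer, and the status strings.

```python
import hashlib
from pathlib import Path


def safe_doi(doi: str) -> str:
    """Hypothetical sanitizer: replace filesystem-unsafe characters with '_'."""
    return "".join(c if c.isalnum() or c in "._-" else "_" for c in doi)


def materialize(downloads: Path, doi: str, pdf_bytes: bytes) -> dict[str, str]:
    """Store captured PDF bytes at the managed path, or preserve a conflict copy."""
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    managed = downloads / f"{safe_doi(doi)}.pdf"
    if not managed.exists():
        downloads.mkdir(parents=True, exist_ok=True)
        managed.write_bytes(pdf_bytes)
        return {"status": "stored", "path": str(managed)}
    if hashlib.sha256(managed.read_bytes()).hexdigest() == digest:
        # Same DOI, same hash: reuse the existing canonical PDF.
        return {"status": "reused", "path": str(managed)}
    # Same DOI, different hash: keep both files and report a conflict.
    conflict = downloads / "_conflicts" / f"{safe_doi(doi)}.{digest[:12]}.pdf"
    conflict.parent.mkdir(parents=True, exist_ok=True)
    conflict.write_bytes(pdf_bytes)
    return {"status": "conflict", "path": str(managed), "candidate": str(conflict)}
```

The canonical file is never overwritten in the conflict branch, which is the property that makes manual review possible.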

If research.external_consult.enabled=true, the default tool is consult_chatgpt_pro: it requires a prompt, may include a pack, packet, saved artifact, or local file as optional bounded context, and then opens the dedicated GRaDOS ChatGPT profile. model_strategy can select the visible Pro target, record the current label, or skip with a warning; thinking_strategy can select the highest visible effort, record the current label, or skip with a warning. GRaDOS sends the prompt once, saves session, transcript, snapshot, capture, strategy, and advisory-result metadata, and does not auto-audit by default.

research.external_consult.response_wait_total_seconds is the total response wait budget counted from initial prompt submission; the default wait_policy=auto splits it across the initial wait plus bounded reattach/capture attempts before returning status=pending. Call get_operation_status(operation_id=..., detail=true) to run a short no-resend recovery probe on the same conversation; if the answer is already capturable, GRaDOS saves it, and if it is still generating, the operation remains pending with the same recovery handle. If automatic capture still fails but the answer is visible, copy it manually and call consult_chatgpt_pro with recover_session_id plus manual_response; GRaDOS saves it with manual_copy capture metadata without reopening the browser. Recovery stores only recoverable ChatGPT /c/<id> conversation URLs; home/project shell URLs remain diagnostic last_observed_url values and return conversation_url_missing_or_not_recoverable instead of reopening the ChatGPT home page.

run_external_consult remains available for topic-or-pack packet preparation before consult. preview_external_consult_packet, prepare_external_consult_from_topic, prepare_external_consult_packet, save_external_consult_result, and audit_external_consult_result remain available for dry runs, recovery, explicit result save, and explicit audit. ChatGPT Pro output is advisory only; final citations still require verified evidence packs or canonical paragraph rereads. This does not remove the separate extract.fetch_strategy.codex PDF acquisition route.
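The rule that only /c/<id> conversation URLs are recoverable can be sketched as a path check. The helper is hypothetical, the example host in the usage note is an assumption, and the real check may accept or reject other URL shapes.

```python
import re
from urllib.parse import urlparse

# Assumption: a recoverable URL has a path of exactly /c/<id>.
_CONVERSATION_PATH = re.compile(r"^/c/[^/]+/?$")


def is_recoverable_conversation_url(url: str) -> bool:
    """Return True only for URLs whose path looks like /c/<id>."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and bool(_CONVERSATION_PATH.match(parsed.path))
```

Under these assumptions, a URL such as https://chatgpt.com/c/abc-123 passes, while a bare home page URL does not and would be kept only as a diagnostic last_observed_url.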

PDF parsing priority:

{
  "extract": {
    "parsing": {
      "order": ["Docling", "MinerU", "PyMuPDF"],
      "enabled": {
        "Docling": true,
        "MinerU": true,
        "PyMuPDF": true
      },
      "foreground_wait_seconds": 90.0,
      "attempt_stale_seconds": 1800.0
    }
  }
}
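One way to read this block is "try parsers in order, skipping disabled ones". A minimal sketch of that reading, assuming a parser missing from the enabled map counts as disabled (the README does not say what happens in that case); active_parsers is a hypothetical helper:

```python
from typing import Any


def active_parsers(parsing_cfg: dict[str, Any]) -> list[str]:
    """Return parser names in priority order, keeping only enabled ones."""
    enabled = parsing_cfg.get("enabled", {})
    # Assumption: a parser absent from the enabled map is treated as disabled.
    return [name for name in parsing_cfg.get("order", []) if enabled.get(name, False)]


if __name__ == "__main__":
    cfg = {
        "order": ["Docling", "MinerU", "PyMuPDF"],
        "enabled": {"Docling": True, "MinerU": False, "PyMuPDF": True},
    }
    print(active_parsers(cfg))
```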

MinerU is an authenticated cloud parser. When enabled and MINERU_API_KEY is present, GRaDOS uploads the local PDF through MinerU's signed upload API, polls for the extraction zip, reads full.md as the parser output, and saves allowed images, tables, formulas, page/debug files, and source JSON into the paper's asset bundle. GRaDOS enforces extract.security.max_mineru_zip_bytes, extract.security.max_mineru_full_md_bytes, and extract.assets.* size/count limits before exposing assets. Use grados auth set mineru to store the token in the OS keychain.
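The size guards mentioned above can be illustrated with a sketch that checks an extraction zip before reading full.md. The function name and error messages are hypothetical, full.md is assumed to sit at the archive root, and the two limits stand in for extract.security.max_mineru_zip_bytes and extract.security.max_mineru_full_md_bytes.

```python
import io
import zipfile


def read_full_md(zip_bytes: bytes, max_zip_bytes: int, max_full_md_bytes: int) -> str:
    """Read full.md from an in-memory zip, enforcing both size limits."""
    if len(zip_bytes) > max_zip_bytes:
        raise ValueError("extraction zip exceeds the configured zip size limit")
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        info = zf.getinfo("full.md")  # raises KeyError if full.md is missing
        if info.file_size > max_full_md_bytes:
            raise ValueError("full.md exceeds the configured size limit")
        with zf.open(info) as fh:
            # Read one byte past the limit so a wrong declared size is still caught.
            data = fh.read(max_full_md_bytes + 1)
    if len(data) > max_full_md_bytes:
        raise ValueError("full.md exceeds the configured size limit")
    return data.decode("utf-8")
```

Checking the declared size and then capping the actual read guards against archives whose headers understate the uncompressed size.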

extract.parsing.foreground_wait_seconds controls how long parse_pdf_file(file_path=..., doi=...) waits for canonical save before returning parse_in_progress; the parser keeps running in a GRaDOS-owned background attempt. extract.parsing.attempt_stale_seconds controls when an inactive running attempt or retryable failed attempt can be restarted from the same local PDF path. These settings do not change individual parser timeouts such as mineru_timeout. MCP progress or cancellation can improve UX, but timeout safety comes from durable pending receipts plus get_operation_status.
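The foreground-wait behavior can be sketched with a thread pool: wait up to a deadline, return a pending receipt if the parser is still running, and let a repeated call with the same DOI plus PDF hash reattach to the same attempt. This is an in-memory illustration only; the names are hypothetical, and GRaDOS's durable attempts and stale-attempt restarts are not modeled.

```python
import concurrent.futures
from typing import Any, Callable

_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=2)
# Running or finished attempts, keyed by (doi, pdf_hash).
_ATTEMPTS: dict[tuple[str, str], concurrent.futures.Future] = {}


def parse_with_foreground_wait(
    doi: str,
    pdf_hash: str,
    parse: Callable[[], str],
    foreground_wait_seconds: float,
) -> dict[str, Any]:
    """Wait briefly for a parse; return a pending receipt if it takes longer."""
    key = (doi, pdf_hash)
    future = _ATTEMPTS.get(key)
    if future is None:
        # First call for this DOI + hash: start the background attempt.
        future = _EXECUTOR.submit(parse)
        _ATTEMPTS[key] = future
    try:
        markdown = future.result(timeout=foreground_wait_seconds)
    except concurrent.futures.TimeoutError:
        # The attempt keeps running; the caller should poll, not redownload.
        return {"status": "parse_in_progress", "doi": doi, "pdf_hash": pdf_hash}
    return {"status": "success", "markdown": markdown}
```

Calling the function again with the same key does not start a second parse; it waits on the attempt that is already running, which mirrors the "do not redownload" guidance.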

Importing Existing PDF Libraries ♻️

If you already have a local PDF library, use grados import-pdfs to parse and copy those files into the canonical papers/ + downloads/ layout:

grados import-pdfs --from /path/to/papers --recursive
grados status

Development 🛠️

uv sync --all-extras
uv run grados version
uv run pytest
uv build

Project Docs 📚

- ADR.md: records accepted architectural decisions and why the project chose them.
- CHANGELOG.md: records completed, user-visible changes across releases and unreleased work.

Release history

0.6.34

Jun 23, 2026

This version

0.6.33

Jun 8, 2026

0.6.32

Jun 5, 2026

0.6.31

Jun 4, 2026

0.6.30

Jun 4, 2026

0.6.29

Jun 3, 2026

0.6.28

Jun 2, 2026

0.6.27

May 31, 2026

0.6.26

May 31, 2026

0.6.25

May 28, 2026

0.6.24

May 25, 2026

0.6.23

May 25, 2026

0.6.22

May 24, 2026

0.6.21

May 22, 2026

0.6.20

May 21, 2026

0.6.19

May 21, 2026

0.6.18

May 13, 2026

0.6.17

May 12, 2026

0.6.16

May 11, 2026

0.6.14

May 8, 2026

0.6.13

May 7, 2026

0.6.12

May 7, 2026

0.6.11

May 6, 2026

0.6.10

Apr 21, 2026

0.6.9

Apr 16, 2026

0.6.8

Apr 16, 2026

0.6.7

Apr 15, 2026

0.6.6

Apr 7, 2026

Download files

Source Distribution

grados-0.6.33.tar.gz (544.3 kB)

Uploaded Jun 8, 2026 Source

Built Distribution

grados-0.6.33-py3-none-any.whl (381.7 kB)

Uploaded Jun 8, 2026 Python 3

File details

Details for the file grados-0.6.33.tar.gz.

File metadata

Download URL: grados-0.6.33.tar.gz
Upload date: Jun 8, 2026
Size: 544.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for grados-0.6.33.tar.gz
Algorithm	Hash digest
SHA256	`6a595afdf715f14ea4d743bc977f9f9c054bcec92ea6a08da650ce0731f5fa63`
MD5	`2edd8f4ca95acc9282f4b959bbe8f0c0`
BLAKE2b-256	`5c93807f1a82e39290741ae9aaa652f2f0024af884b511b9f787468251ae78f4`

Provenance

The following attestation bundles were made for grados-0.6.33.tar.gz:

Publisher: publish.yml on STSNaive/GRaDOS

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: grados-0.6.33.tar.gz
- Subject digest: 6a595afdf715f14ea4d743bc977f9f9c054bcec92ea6a08da650ce0731f5fa63
- Sigstore transparency entry: 1756411510
- Sigstore integration time: Jun 8, 2026
Source repository:
- Permalink: STSNaive/GRaDOS@f9763145d59a9beb24f5f98fa1480df2a9591a1c
- Branch / Tag: refs/tags/v0.6.33
- Owner: https://github.com/STSNaive
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f9763145d59a9beb24f5f98fa1480df2a9591a1c
- Trigger Event: push

File details

Details for the file grados-0.6.33-py3-none-any.whl.

File metadata

Download URL: grados-0.6.33-py3-none-any.whl
Upload date: Jun 8, 2026
Size: 381.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for grados-0.6.33-py3-none-any.whl
Algorithm	Hash digest
SHA256	`194437f99e828d710d420fb0bcb60f2d9a1d20c70e50ff150eb6ac8bfe7ffcb5`
MD5	`0ac47db6c5b7ce30459bca067ea0996e`
BLAKE2b-256	`0551780affd80942948ba30fe1ab4bce774a7fadc175113d4ef8d5d9c0195d91`

Provenance

The following attestation bundles were made for grados-0.6.33-py3-none-any.whl:

Publisher: publish.yml on STSNaive/GRaDOS

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: grados-0.6.33-py3-none-any.whl
- Subject digest: 194437f99e828d710d420fb0bcb60f2d9a1d20c70e50ff150eb6ac8bfe7ffcb5
- Sigstore transparency entry: 1756411518
- Sigstore integration time: Jun 8, 2026
Source repository:
- Permalink: STSNaive/GRaDOS@f9763145d59a9beb24f5f98fa1480df2a9591a1c
- Branch / Tag: refs/tags/v0.6.33
- Owner: https://github.com/STSNaive
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f9763145d59a9beb24f5f98fa1480df2a9591a1c
- Trigger Event: push
