All-in-one MCP server for academic paper processing: OCR, metadata, structure extraction, translation, summary, arXiv, and Zotero.

fro-wang-academic-tools-mcp

fro-wang-academic-tools-mcp is an MCP server for academic paper workflows, built on FastMCP. It combines OCR, metadata extraction/enrichment, section parsing, translation, summary generation, arXiv tools, and Zotero integration.

Detailed system design is documented in docs/README_SYSTEM.md.

Motivation

This project makes a local library of academic papers usable by AI agents. Agents work well on local files, but they struggle with the dominant format of academic literature: PDF. Whether you are conducting a literature review, reading, or searching for papers, most of your local library consists of PDF files.

Local coding agents prefer plain text. They rely on tools like grep or semantic search to navigate local files. These tools cannot natively search inside PDFs, making it difficult for agents to assist with literature management effectively.

Having frequently run into this bottleneck, I built fro-wang-academic-tools-mcp to bridge the gap. The tools package the core workflow of my website, frowang.com: they convert PDFs into agent-friendly formats (such as Markdown) and enrich them with metadata.

Features

  • End-to-end paper pipeline via process_paper
  • OCR with MinerU (ocr_paper)
  • Metadata extraction + enrichment (extract_metadata)
  • Section structure extraction (extract_sections)
  • Markdown translation (translate_paper)
  • Summary report generation (generate_summary)
  • Folder rename by metadata convention (rename_paper_folder)
  • arXiv search/download tools
  • Zotero library tools (search, notes, collections, attachments, annotations)

Tool Groups

This server currently registers 22 MCP tools:

  • Paper processing: ocr_paper, extract_metadata, extract_sections, translate_paper, generate_summary, rename_paper_folder
  • Pipeline: process_paper, start_process_paper_job, get_process_paper_job, cancel_process_paper_job
  • arXiv: search_papers, download_paper
  • Zotero: zotero_search_items, zotero_get_item_metadata, zotero_get_item_fulltext, zotero_get_collections, zotero_get_collection_items, zotero_get_tags, zotero_get_recent, zotero_get_annotations, zotero_get_notes, zotero_create_note

Requirements

  • Python >=3.11 (project pin: 3.13.7, see .python-version)
  • MinerU token (for OCR)
  • OpenAI-compatible LLM API key (default endpoint is DeepSeek)
  • Zotero credentials (for remote mode) or local Zotero desktop mode

Installation Modes

There are two ways to install and use this MCP server. Developer mode is recommended, as the installed mode has not been fully tested yet.

Mode 1: Developer Mode (Recommended)

Clone the repository from GitHub, fill in a local .env file, then point your MCP client to the local project directory.

Step 1: Clone the repo and set up the environment:

git clone https://github.com/your-org/fro-wang-academic-tools-mcp.git
cd fro-wang-academic-tools-mcp
uv python pin 3.13.7
uv venv --python 3.13.7
uv sync --extra dev
Copy-Item .env.example .env   # PowerShell; on macOS/Linux use: cp .env.example .env
# Edit .env and fill in your keys

Edit .env at least for:

  • LLM_API_KEY
  • MINERU_API_KEY_1
  • ZOTERO_LIBRARY_ID and ZOTERO_API_KEY (if using Zotero remote mode)
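
Before pointing a client at the server, it can help to confirm that the keys listed above are actually set. The following is a minimal sketch, not part of the project: the helper names parse_env_text and missing_keys are hypothetical, and the parser handles only plain KEY=VALUE lines, which may be narrower than what the server itself accepts.

```python
from pathlib import Path

# Keys the README asks you to fill in; the Zotero pair applies to remote mode only.
REQUIRED_KEYS = ("LLM_API_KEY", "MINERU_API_KEY_1")
ZOTERO_REMOTE_KEYS = ("ZOTERO_LIBRARY_ID", "ZOTERO_API_KEY")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], zotero_remote: bool = False) -> list[str]:
    """Return the required keys that are absent or empty."""
    wanted = REQUIRED_KEYS + (ZOTERO_REMOTE_KEYS if zotero_remote else ())
    return [key for key in wanted if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    print("Missing keys:", missing_keys(parse_env_text(text), zotero_remote=True) or "none")
```

Run it from the project directory; pass zotero_remote=False if you use the local Zotero desktop mode.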

Step 2: Configure your MCP client to run the server from the local project path.

The key advantage of this mode is that all credentials live in a local .env file, so you do not need to embed them in the MCP client config.

Codex (~/.codex/config.toml)

[mcp_servers.academic_tools]
command = "uv"
args = ["--directory", "/absolute/path/to/mcps/fro-wang-academic-tools-mcp", "run", "fro-wang-academic-tools-mcp"]
startup_timeout_sec = 30.0

Environment variables are read from the .env file in the project directory, so you do not need to set them again in the MCP config.

JSON-based MCP clients (Cursor, Claude Desktop, Cline, etc.)

{
  "mcpServers": {
    "academic_tools": {
      "command": "uv",
      "args": ["--directory", "/absolute/path/to/mcps/fro-wang-academic-tools-mcp", "run", "fro-wang-academic-tools-mcp"]
    }
  }
}

Claude Code

claude mcp add academic-tools -- uv --directory /absolute/path/to/mcps/fro-wang-academic-tools-mcp run fro-wang-academic-tools-mcp

Mode 2: Installed Mode via uv tool install (Not Yet Fully Tested)

Warning: This mode has not been fully tested. Use developer mode if you encounter issues.

Install the package directly as a uv tool:

uv tool install fro-wang-academic-tools-mcp

In this mode, the package is installed to a managed location and you cannot easily edit its internal .env. Instead, all credentials must be passed via the MCP client's env block.

Codex (~/.codex/config.toml)

[mcp_servers.academic_tools]
command = "fro-wang-academic-tools-mcp"
args = []
startup_timeout_sec = 30.0

[mcp_servers.academic_tools.env]
LLM_API_KEY = "your_llm_key"
LLM_BASE_URL = "https://api.deepseek.com"
LLM_MODEL = "deepseek-chat"
MINERU_API_KEY_1 = "your_mineru_key"
ZOTERO_LIBRARY_ID = "your_zotero_library_id"
ZOTERO_API_KEY = "your_zotero_api_key"

JSON-based MCP clients (Cursor, Claude Desktop, Cline, etc.)

{
  "mcpServers": {
    "academic_tools": {
      "command": "fro-wang-academic-tools-mcp",
      "args": [],
      "env": {
        "LLM_API_KEY": "your_llm_key",
        "LLM_BASE_URL": "https://api.deepseek.com",
        "LLM_MODEL": "deepseek-chat",
        "MINERU_API_KEY_1": "your_mineru_key",
        "ZOTERO_LIBRARY_ID": "your_zotero_library_id",
        "ZOTERO_API_KEY": "your_zotero_api_key"
      }
    }
  }
}

Optionally, you can also point to an external env file instead of embedding keys inline:

ACADEMIC_TOOLS_ENV_FILE=/absolute/path/to/.env
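
For JSON-based clients, the entry can be generated rather than typed by hand. This is a rough sketch under one stated assumption: that ACADEMIC_TOOLS_ENV_FILE is passed through the client's env block like the other variables, which the text above does not spell out. The function build_server_entry is a hypothetical helper, not part of the package.

```python
import json


def build_server_entry(env: dict[str, str], name: str = "academic_tools") -> dict:
    """Build an mcpServers entry for the installed command, dropping empty values."""
    return {
        "mcpServers": {
            name: {
                "command": "fro-wang-academic-tools-mcp",
                "args": [],
                "env": {key: value for key, value in env.items() if value},
            }
        }
    }


# Assumption: the env-file variable is set via the env block instead of inline keys.
config = build_server_entry({"ACADEMIC_TOOLS_ENV_FILE": "/absolute/path/to/.env"})
print(json.dumps(config, indent=2))
```

Paste the printed JSON into your client's config file, merging it with any existing mcpServers entries.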

Long-Running Jobs (Recommended for MCP Clients)

Many MCP clients apply a ~60s timeout per tool call. For full pipeline runs, prefer async jobs:

  1. Start the job with start_process_paper_job (returns immediately with a job_id)
  2. Poll with get_process_paper_job(job_id)
  3. Optionally cancel with cancel_process_paper_job(job_id)

This avoids client timeouts while the OCR/LLM stages continue in the background.
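
The start/poll/cancel pattern above can be sketched as a small polling loop. This is illustrative only: poll_job is a hypothetical helper, the caller supplies get_job (whatever function invokes get_process_paper_job through your MCP client), and the "status" field and its terminal values are assumptions, since the job payload format is not documented here.

```python
import time
from typing import Any, Callable

# Hypothetical terminal states; check the real job payload for the actual values.
TERMINAL_STATES = frozenset({"completed", "failed", "cancelled"})


def poll_job(
    get_job: Callable[[str], dict[str, Any]],
    job_id: str,
    interval_sec: float = 5.0,
    timeout_sec: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call get_job(job_id) until the job is terminal or the deadline passes."""
    deadline = time.monotonic() + timeout_sec
    while True:
        job = get_job(job_id)
        if job.get("status") in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout_sec}s")
        sleep(interval_sec)


# Stand-in for a real MCP call, so the loop can be tried without a server.
fake_responses = iter([{"status": "running"}, {"status": "completed"}])
result = poll_job(lambda _job_id: next(fake_responses), "job-123", sleep=lambda _s: None)
print(result["status"])
```

Each individual call stays short, so it fits within a per-call client timeout while the overall wait is bounded by timeout_sec.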

Development

Run the server locally for testing:

uv run fro-wang-academic-tools-mcp

or

uv run python -m academic_tools

Run tests and lint:

uv run pytest
uv run ruff check src

Check Python version used by the project env:

uv run python -V
uv run python -c "import sys; print(sys.executable)"

Common Issues

  • uv init says project already initialized:
    • Expected behavior. This repo already has pyproject.toml.
  • uv sync --extra dev warns about VIRTUAL_ENV mismatch:
    • You likely activated another project's venv. Deactivate it and run again in this folder.
  • Readme file does not exist: README.md during build:
    • This file is required by [project].readme in pyproject.toml. Keep README.md in project root.

Project Entry Points

  • CLI script: fro-wang-academic-tools-mcp (defined in pyproject.toml)
  • Module entry: python -m academic_tools
  • Server wiring: src/academic_tools/server.py
