MCP server for reading and searching EPUB/PDF documents

mcp-ebook-read

A local MCP server for Codex to read and retrieve content from EPUB/PDF documents.

One-Command Docker Setup

Qdrant (required)

docker rm -f qdrant 2>/dev/null || true && docker run -d --name qdrant -p 6333:6333 -p 6334:6334 qdrant/qdrant:v1.16.3

GROBID (required by startup preflight and `document_ingest_pdf_paper`)

docker rm -f grobid 2>/dev/null || true && docker run -d --name grobid -p 8070:8070 lfoppiano/grobid:0.8.0

Verify Services

curl -sS http://localhost:6333/collections
curl -sS http://localhost:8070/api/isalive

Expected:

Qdrant returns JSON with "status":"ok"
GROBID returns true
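
The same two checks can be scripted. The following is a minimal Python sketch, assuming only what the expected output above states: the Qdrant response is JSON with "status":"ok" and GROBID answers with the text true. The helper names (http_get, qdrant_ok, grobid_ok, check_services) and the injectable fetch parameter are illustrative and are not part of mcp-ebook-read.

```python
import json
import urllib.request


def http_get(url, timeout=5.0):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def qdrant_ok(body):
    """True when the body is JSON whose top-level "status" is "ok"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def grobid_ok(body):
    """True when the isalive endpoint answered with the text "true"."""
    return body.strip() == "true"


def check_services(qdrant_url, grobid_url, fetch=http_get):
    """Return a dict mapping each service name to a health boolean."""
    results = {}
    for name, url, is_ok in (
        ("qdrant", f"{qdrant_url}/collections", qdrant_ok),
        ("grobid", f"{grobid_url}/api/isalive", grobid_ok),
    ):
        try:
            results[name] = is_ok(fetch(url))
        except OSError:  # connection refused, timeout, DNS failure, HTTP error
            results[name] = False
    return results
```

Calling check_services("http://localhost:6333", "http://localhost:8070") with both containers running should report both services as healthy. Passing a stub as fetch lets the parsing logic be exercised without a network.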

Run MCP Server (PyPI via `uvx`)

QDRANT_URL=http://localhost:6333 GROBID_URL=http://localhost:8070 GROBID_TIMEOUT_SECONDS=120 uvx mcp-ebook-read

If startup preflight fails, the server exits with a structured error payload on stderr that includes missing env vars and setup hints.
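
To illustrate the fail-fast idea, here is a rough Python sketch of an environment check that reports missing variables as a structured payload on stderr. It is not the server's actual implementation: the payload keys (error, missing_env, hints), the hint wording, and the exit code 1 are assumptions made for the example.

```python
import json
import os
import sys

# Required variables as listed in this README.
REQUIRED_ENV = ("QDRANT_URL", "GROBID_URL")


def preflight(env):
    """Return None when all required variables are set, else an error payload.

    The payload shape is hypothetical; the real server's format may differ.
    """
    missing = [name for name in REQUIRED_ENV if not env.get(name, "").strip()]
    if not missing:
        return None
    return {
        "error": "preflight_failed",
        "missing_env": missing,
        "hints": [f"set {name} to the service URL" for name in missing],
    }


def main():
    """Print the payload to stderr and return a non-zero code on failure."""
    payload = preflight(os.environ)
    if payload is not None:
        print(json.dumps(payload), file=sys.stderr)
        return 1  # assumed exit code; the README does not specify one
    return 0
```

A real preflight would also probe that both services are reachable, as the Notes section describes; this sketch covers only the environment-variable half.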

First Run Recommendation

Before configuring this MCP inside an MCP client, run it once manually from a terminal:

QDRANT_URL=http://localhost:6333 GROBID_URL=http://localhost:8070 GROBID_TIMEOUT_SECONDS=120 uvx mcp-ebook-read

This lets uvx resolve, download, and cache the package and its runtime dependencies ahead of time, which helps avoid a long first activation once the MCP client starts the server.

To refresh the uvx-cached package to the latest published version, run:

QDRANT_URL=http://localhost:6333 GROBID_URL=http://localhost:8070 GROBID_TIMEOUT_SECONDS=120 uvx mcp-ebook-read@latest

If you installed the tool persistently via uv tool install, use uv tool upgrade mcp-ebook-read instead.

Environment Variables

Required:

QDRANT_URL (for example http://127.0.0.1:6333)
GROBID_URL (for example http://127.0.0.1:8070)

Optional:

GROBID_TIMEOUT_SECONDS (default 20; recommended 120 for large papers)
QDRANT_COLLECTION (default mcp_ebook_read_chunks)
QDRANT_TIMEOUT_SECONDS (default 10)
FASTEMBED_MODEL (FastEmbed model override)
FASTEMBED_CACHE_PATH (FastEmbed cache root override; defaults to ~/Library/Caches/mcp-ebook-read/fastembed on macOS and $XDG_CACHE_HOME/mcp-ebook-read/fastembed or ~/.cache/mcp-ebook-read/fastembed elsewhere)
DOCLING_FORMULA_ENRICHMENT (true by default)
PDF_FORMULA_REQUIRE_ENGINE (true by default)
PDF_FORMULA_BATCH_SIZE (auto by default; or an explicit integer)
PDF_DOCLING_NUM_THREADS (override Docling CPU threads)
PDF_DOCLING_BATCH_SIZE (override Docling OCR/layout/table batch sizes together)
PDF_DOCLING_DEVICE (override Docling accelerator device, for example auto or cpu)
PDF_DOCLING_TUNING_PROFILE_PATH (override the local autotune profile JSON path)
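
The FASTEMBED_CACHE_PATH defaults described above can be read as a small resolution order. The sketch below is one way to express it in Python, assuming the platform string follows sys.platform ("darwin" on macOS) and treating every other platform as POSIX-like. The function name and parameters are illustrative, not the package's API.

```python
from pathlib import PurePosixPath


def fastembed_cache_dir(env, platform, home):
    """Resolve the FastEmbed cache root following the documented defaults."""
    override = env.get("FASTEMBED_CACHE_PATH")
    if override:
        # An explicit override wins over every default.
        return PurePosixPath(override)
    if platform == "darwin":  # sys.platform value on macOS
        base = PurePosixPath(home) / "Library" / "Caches"
    elif env.get("XDG_CACHE_HOME"):
        base = PurePosixPath(env["XDG_CACHE_HOME"])
    else:
        base = PurePosixPath(home) / ".cache"
    return base / "mcp-ebook-read" / "fastembed"
```

For example, on Linux with no overrides and a home directory of /home/a, this resolves to /home/a/.cache/mcp-ebook-read/fastembed.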

Persistence Model

Persistence is sidecar-based: state is stored next to each document, so it is routed automatically by document location.
For each document, the server writes state to <document_dir>/.mcp-ebook-read/.
The sidecar contains:
- catalog.db
- docs/<doc_id>/reading/reading.md
- docs/<doc_id>/assets/...
- docs/<doc_id>/evidence/...
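
As an illustration of this layout, the following Python sketch derives the sidecar paths for one document. It assumes POSIX-style paths and uses a placeholder doc_id; how the server actually generates doc_id values is not described here, and the function name is hypothetical.

```python
from pathlib import PurePosixPath

SIDECAR_DIRNAME = ".mcp-ebook-read"


def sidecar_layout(document_path, doc_id):
    """Map a document path and doc_id to the sidecar locations listed above."""
    root = PurePosixPath(document_path).parent / SIDECAR_DIRNAME
    doc_root = root / "docs" / doc_id
    return {
        "sidecar": root,
        "catalog": root / "catalog.db",
        "reading": doc_root / "reading" / "reading.md",
        "assets": doc_root / "assets",
        "evidence": doc_root / "evidence",
    }
```

With document_path "/books/math/algebra.pdf" and the placeholder doc_id "abc123", the catalog resolves to /books/math/.mcp-ebook-read/catalog.db. Documents in the same directory share one sidecar and one catalog.db under this reading of the layout.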

Notes

Use library_scan to discover .pdf/.epub files under a root and register updates/removals.
After a fresh server restart, call library_scan(root=...) or storage_list_sidecars(root=...) before using tools that only take doc_id.
Use search for global semantic retrieval and read for locator-based chunk windows.
Startup preflight is fail-fast and requires both Qdrant and GROBID to be configured and reachable.
FastEmbed model cache defaults to a stable per-user cache directory under mcp-ebook-read/fastembed instead of the system temp directory.
FastEmbed startup performs bounded retries. When the local cache is corrupted, or a transient download failure leaves incomplete model files behind, it clears the broken per-model cache state before retrying.
Use document_ingest_pdf_book to queue a background ingest job for a PDF book.
Use document_ingest_epub_book to queue a background ingest job for an EPUB book.
Use document_ingest_pdf_paper to queue a background ingest job for a PDF paper. Docling remains the canonical source of the page-aware outline; GROBID enriches the paper metadata and title.
Use document_ingest_status to poll the current status of one ingest job (or the latest job for a document).
Use document_ingest_list_jobs to inspect recent ingest job history for one document.
Use document_autotune_pdf_parser before a long PDF ingest when you want to benchmark a few Docling thread/batch profiles on sampled pages and persist the best local profile for later runs.
Use search_in_outline_node when you need chapter-scoped retrieval (recommended for reading workflows).
Use get_outline to fetch document outline nodes before chapter/formula/image scoped reading.
Use read_outline_node to read a chapter/outline node directly without locator stitching.
Use render_pdf_page for PDF evidence rendering.
PDF image extraction is on-demand: ingest does not pre-extract PDF images.
Use pdf_list_images to trigger/list extracted PDF figure/table images (optionally scoped to one outline node).
Use pdf_read_image to get one extracted PDF image path plus nearby text context.
Use pdf_book_list_formulas / pdf_book_read_formula for formula-centric reading on PDF books.
Use pdf_paper_list_formulas / pdf_paper_read_formula for formula-centric reading on PDF papers.
Use epub_list_images to list extracted EPUB images (optionally scoped to one outline node).
Use epub_read_image to get one EPUB image path plus nearby text context.
Use storage_list_sidecars to inspect sidecar persistence under a root.
Use storage_delete_document to remove one document's persisted state.
Use storage_cleanup_sidecars to prune missing docs/orphan artifacts and compact catalogs.
For large papers, increase GROBID_TIMEOUT_SECONDS (for example 120) to reduce timeout failures.
PDF ingest now uses a mixed formula pipeline:
- Docling structure extraction with do_formula_enrichment.
- Pix2Text as the primary formula recovery engine.
- Fail-fast when formula markers exist but Pix2Text is unavailable.
Optional formula env controls:
- DOCLING_FORMULA_ENRICHMENT (true by default)
- PDF_FORMULA_REQUIRE_ENGINE (true by default)
- PDF_FORMULA_BATCH_SIZE (auto by default; auto-detected from CPU and memory, or set an explicit integer)
Optional Docling performance controls:
- document_autotune_pdf_parser benchmarks a sampled subset of one PDF and writes the selected profile to a local JSON cache.
- By default the tuning profile lives at ~/Library/Caches/mcp-ebook-read/docling_pdf_tuning.json on macOS and $XDG_CACHE_HOME/mcp-ebook-read/docling_pdf_tuning.json (or ~/.cache/...) elsewhere.
- PDF_DOCLING_NUM_THREADS and PDF_DOCLING_BATCH_SIZE override the cached profile when you need a fixed setting.
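
The override precedence in the last item can be sketched as follows. This is only an illustration: the profile keys num_threads and batch_size are hypothetical names, since the JSON schema of the tuning profile is not documented here, and the validation rule (positive integers only) is an assumption.

```python
def effective_docling_settings(profile, env):
    """Overlay PDF_DOCLING_* environment overrides on a cached tuning profile."""
    settings = dict(profile)  # copy so the cached profile is not mutated
    for env_name, key in (
        ("PDF_DOCLING_NUM_THREADS", "num_threads"),
        ("PDF_DOCLING_BATCH_SIZE", "batch_size"),
    ):
        raw = env.get(env_name, "").strip()
        if raw:
            value = int(raw)  # raises ValueError on non-integer input
            if value < 1:
                raise ValueError(f"{env_name} must be a positive integer")
            settings[key] = value
    return settings
```

An unset or empty variable leaves the cached value in place, so a single variable can pin one setting while the other keeps following the autotuned profile.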
Sidecar cleanup is explicit:
- library_scan no longer triggers threshold-based auto compaction.
- Use storage_cleanup_sidecars(..., compact_catalog=true) when you want compaction.
Ingest is now asynchronous by design:
- the document_ingest_* tools submit work and return immediately with job_id/doc_id;
- poll document_ingest_status(doc_id=..., job_id=...) until status becomes succeeded or failed;
- use document_ingest_list_jobs(doc_id=...) when you need recent history or lost the latest job_id.
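
The submit-then-poll flow above can be sketched as a small loop. In this Python example, call_tool(name, arguments) is a hypothetical callable supplied by whatever MCP client you use, and it is assumed to return the tool result as a dict with a "status" key. Only the terminal values succeeded and failed come from this README; any other status is treated as still in progress.

```python
import time


def wait_for_ingest(call_tool, doc_id, job_id,
                    interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll document_ingest_status until the job succeeds or fails.

    call_tool is a hypothetical client hook, not an mcp-ebook-read API.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = call_tool(
            "document_ingest_status",
            {"doc_id": doc_id, "job_id": job_id},
        )
        if result.get("status") in ("succeeded", "failed"):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"ingest job {job_id} still running after {timeout}s"
            )
        sleep(interval)
```

The sleep parameter is injectable so the loop can be tested without real delays. The interval and timeout defaults are arbitrary choices for the sketch; large PDFs may need a longer timeout.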

No-Label Formula Benchmark

Use your own non-scanned PDF corpus as a no-label regression baseline (without manual annotations).

uvx mcp-ebook-formula-benchmark \
  --samples-dir /ABSOLUTE/PATH/TO/pdf-formula-benchmark-corpus \
  --passes 2 \
  --max-unresolved-rate 0.15 \
  --min-latex-valid-rate 0.85 \
  --min-stability-rate 1.0

Output is JSON with per-document metrics and a threshold pass/fail flag. Exit code is 0 when thresholds pass, otherwise 2.
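
For use in CI, the command and its exit codes can be wrapped in a short Python script. The sketch below uses only the flags shown above. It assumes the JSON report is written to stdout and treats any exit code other than 0 or 2 as an unexpected error; both points are assumptions, and the function names are illustrative.

```python
import json
import subprocess


def build_benchmark_command(samples_dir, passes=2, max_unresolved_rate=0.15,
                            min_latex_valid_rate=0.85, min_stability_rate=1.0):
    """Assemble the benchmark command line from the documented flags."""
    return [
        "uvx", "mcp-ebook-formula-benchmark",
        "--samples-dir", str(samples_dir),
        "--passes", str(passes),
        "--max-unresolved-rate", str(max_unresolved_rate),
        "--min-latex-valid-rate", str(min_latex_valid_rate),
        "--min-stability-rate", str(min_stability_rate),
    ]


def interpret_exit_code(code):
    """Map the exit code to a verdict: 0 passes, 2 fails the thresholds."""
    if code == 0:
        return "pass"
    if code == 2:
        return "fail"
    return "error"  # assumption: anything else is an unexpected failure


def run_benchmark(samples_dir):
    """Run the benchmark and return (verdict, parsed report or None)."""
    proc = subprocess.run(
        build_benchmark_command(samples_dir),
        capture_output=True,
        text=True,
    )
    verdict = interpret_exit_code(proc.returncode)
    # Assumes the JSON report is on stdout when the run completed.
    report = json.loads(proc.stdout) if verdict != "error" else None
    return verdict, report
```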

Claude Code MCP Configuration (JSON via `uvx`)

You can register this server in a Claude Code-compatible mcpServers JSON config.

Published package

{
  "mcpServers": {
    "mcp-ebook-read": {
      "command": "uvx",
      "args": [
        "mcp-ebook-read"
      ],
      "env": {
        "QDRANT_URL": "http://127.0.0.1:6333",
        "QDRANT_COLLECTION": "mcp_ebook_read_chunks",
        "GROBID_URL": "http://127.0.0.1:8070",
        "GROBID_TIMEOUT_SECONDS": "120"
      }
    }
  }
}
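
If you keep several servers in one config, it can help to generate this entry instead of editing the file by hand. The Python sketch below builds the same structure as the JSON above and merges it into an existing config without touching other servers. The helper names are illustrative, and reading or writing the actual config file is left out because its location depends on the client.

```python
import json


def ebook_server_entry(qdrant_url, grobid_url, grobid_timeout_seconds=120):
    """Build the server entry shown in the JSON example above."""
    return {
        "command": "uvx",
        "args": ["mcp-ebook-read"],
        "env": {
            "QDRANT_URL": qdrant_url,
            "GROBID_URL": grobid_url,
            # Env values are written as strings, matching the example.
            "GROBID_TIMEOUT_SECONDS": str(grobid_timeout_seconds),
        },
    }


def add_server(config, name, entry):
    """Return a copy of config with the entry registered under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


def render(config):
    """Serialize the config as indented JSON text."""
    return json.dumps(config, indent=2)
```

For example, render(add_server({}, "mcp-ebook-read", ebook_server_entry("http://127.0.0.1:6333", "http://127.0.0.1:8070"))) produces a config equivalent to the one above, minus the optional QDRANT_COLLECTION entry.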

Security note

Do not put real passwords, API keys, or tokens directly in committed JSON files.
Use environment variables or secret managers, and keep example values as placeholders only.

Codex MCP Configuration (TOML)

Codex reads MCP server definitions from TOML (for example, in a Codex MCP config file).

Example

[mcp_servers.mcp-ebook-read]
command = "uvx"
args = [ "mcp-ebook-read" ]
startup_timeout_sec = 60

[mcp_servers.mcp-ebook-read.env]
QDRANT_URL = "http://127.0.0.1:6333"
QDRANT_COLLECTION = "mcp_ebook_read_chunks"
GROBID_URL = "http://127.0.0.1:8070"
GROBID_TIMEOUT_SECONDS = "120"

Project details

Release history

0.1.10 (this version): Apr 18, 2026
0.1.9: Apr 9, 2026
0.1.7: Mar 18, 2026
0.1.6: Mar 15, 2026
0.1.5: Mar 15, 2026
0.1.4: Mar 15, 2026
0.1.3: Mar 9, 2026
0.1.2: Feb 26, 2026
0.1.1: Feb 25, 2026
0.1.0: Feb 25, 2026
0.1.0rc3 (pre-release): Feb 25, 2026

Download files

Source Distribution

mcp_ebook_read-0.1.10.tar.gz (105.9 kB)

Uploaded Apr 18, 2026 Source

Built Distribution

mcp_ebook_read-0.1.10-py3-none-any.whl (75.0 kB)

Uploaded Apr 18, 2026 Python 3

File details

Details for the file mcp_ebook_read-0.1.10.tar.gz.

File metadata

Download URL: mcp_ebook_read-0.1.10.tar.gz
Upload date: Apr 18, 2026
Size: 105.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.7

File hashes

Hashes for mcp_ebook_read-0.1.10.tar.gz
Algorithm	Hash digest
SHA256	`db4d6c5ff34007e387668e57c70d6dc56c69831b69bd1237a9e2b591840fbb85`
MD5	`a26d41854f6460c2695def04bfc83deb`
BLAKE2b-256	`8f987b8971188a31b059dc17ef412b0e0de11e112ab034e1b7cc9999785e5c1e`

File details

Details for the file mcp_ebook_read-0.1.10-py3-none-any.whl.

File metadata

Download URL: mcp_ebook_read-0.1.10-py3-none-any.whl
Upload date: Apr 18, 2026
Size: 75.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.7

File hashes

Hashes for mcp_ebook_read-0.1.10-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6b53c911367a3468a1be1eb8a0b147f51299ac6027e523c2e99bb7dc08ba9236`
MD5	`a0cfbfc71c3aaa14a9c503efc7226829`
BLAKE2b-256	`0bf0f99e5bdcda161afd649dde3757148bbce4aeb96e4c198848b44a466c118d`
