
MCP server for high-accuracy scientific PDF OCR using Meta's Nougat


Nougat-MCP


nougat-mcp is a Model Context Protocol (MCP) server for high-fidelity OCR of scientific PDFs using Meta's Nougat.

It is designed for agent workflows that need equations, tables, and document structure preserved more faithfully than traditional OCR manages.

Why This Server

  • Scientific OCR quality tailored for papers, formulas, and dense layouts.
  • MCP-native interface for Codex, Claude, Cursor, Antigravity, and other clients.
  • Output-format control:
    • mmd: raw Nougat/Mathpix-style output.
    • md: renderer-friendly conversion (math delimiter and KaTeX compatibility fixes).
  • Settings file support so agents can read a shared default format policy.

Installation

Install from PyPI:

uv pip install nougat-mcp

This package installs nougat-ocr and pins the version ranges of dependencies known to break Nougat when upgraded (see Compatibility Pins).

Tools

parse_research_paper

Arguments:

  • file_path (string): Absolute path to a local PDF.
  • output_format (string, optional):
    • default (the default): uses the format set in the server settings.
    • mmd: raw Nougat output.
    • md: converted markdown-friendly output.

Returns:

  • OCR result as a single text string in the requested format.

get_output_settings

Returns the resolved server output settings, including the file they were loaded from.
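For readers wiring up a custom client, here is a minimal sketch of the JSON-RPC 2.0 `tools/call` messages that an MCP client sends to invoke these tools, as defined by the MCP specification. The helper names (`tools_call`, `parse_paper_request`) are hypothetical, and the client-side checks simply mirror the argument rules listed above; the server may validate differently. A real client must also complete the MCP `initialize` handshake first, which this sketch omits.

```python
import json
import os
from typing import Any, Dict, Optional

# Values accepted for output_format, per the argument list above.
OUTPUT_FORMATS = {"default", "mmd", "md"}


def tools_call(request_id: int, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a tool invocation in an MCP JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def parse_paper_request(
    request_id: int, file_path: str, output_format: Optional[str] = None
) -> Dict[str, Any]:
    """Build a parse_research_paper call (hypothetical client-side helper)."""
    if not os.path.isabs(file_path):
        raise ValueError("file_path must be an absolute path")
    arguments: Dict[str, Any] = {"file_path": file_path}
    if output_format is not None:  # omitted -> server falls back to its settings
        if output_format not in OUTPUT_FORMATS:
            raise ValueError(f"output_format must be one of {sorted(OUTPUT_FORMATS)}")
        arguments["output_format"] = output_format
    return tools_call(request_id, "parse_research_paper", arguments)


if __name__ == "__main__":
    # The stdio transport carries one JSON message per line (no embedded newlines).
    print(json.dumps(parse_paper_request(2, "/papers/2405.08770v1.pdf", "md")))
    print(json.dumps(tools_call(3, "get_output_settings", {})))
```

Leaving `output_format` out of the arguments, rather than sending `"default"`, keeps the request minimal; both should resolve to the server's configured format.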

Output Conversion (mmd -> md)

When output_format="md", the server applies compatibility conversions:

  • \[ ... \] -> $$ ... $$
  • \( ... \) -> $ ... $
  • \tag{...} -> visible equation label \qquad\text{(...)}
  • KaTeX delimiter normalization, for example:
    • \bigl{\|} ... \bigr{\|} -> \bigl\| ... \bigr\|

This avoids common renderer parse errors in markdown environments that are not fully MathJax-compatible.
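The rewrite rules above can be approximated with a few regular expressions. This is an illustrative sketch, not the package's actual converter: the function name `mmd_to_md` is hypothetical, and mapping its two flags to the `md_rewrite_tags` and `md_fix_sized_delimiters` settings is an assumption based on their names. It ignores edge cases such as `\\[2pt]` row spacing inside arrays.

```python
import re

# \bigl{\|}, \Bigr{(}, \biggl{\{} ... : a sized delimiter whose delimiter is wrapped in braces.
_SIZED_DELIM = re.compile(r"\\((?:big|Big|bigg|Bigg)[lr])\{(\\\||\\\{|\\\}|[()\[\]|./])\}")


def mmd_to_md(text: str, rewrite_tags: bool = True, fix_sized_delimiters: bool = True) -> str:
    """Sketch of the mmd -> md compatibility rewrites (hypothetical helper)."""
    if rewrite_tags:
        # \tag{3.2} -> \qquad\text{(3.2)}: keeps the equation label visible.
        text = re.sub(r"\\tag\{([^{}]*)\}", r"\\qquad\\text{(\1)}", text)
    if fix_sized_delimiters:
        # \bigl{\|} -> \bigl\| : drop the braces around the delimiter.
        text = _SIZED_DELIM.sub(r"\\\1\2", text)
    # Display math \[ ... \] -> $$ ... $$ with the delimiters on their own lines.
    text = re.sub(
        r"\\\[(.+?)\\\]",
        lambda m: "$$\n" + m.group(1).strip() + "\n$$",
        text,
        flags=re.DOTALL,
    )
    # Inline math \( ... \) -> $ ... $
    text = re.sub(r"\\\((.+?)\\\)", r"$\1$", text, flags=re.DOTALL)
    return text


if __name__ == "__main__":
    print(mmd_to_md(r"\[DV=V_{x}. \tag{3.2}\]"))
```

Running it on the mmd line from the showcase further down produces the same three-line md block shown there. The tag rewrite runs first so that labels inside display math are converted before the `\[ ... \]` delimiters are replaced.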

Server Settings

Settings are read in this order:

  1. The file named by the NOUGAT_MCP_SETTINGS environment variable (if set)
  2. ./settings.json (in the current working directory)

Example settings.json:

{
  "nougat_mcp": {
    "default_output_format": "md",
    "md_rewrite_tags": true,
    "md_fix_sized_delimiters": true
  }
}
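A minimal sketch of the lookup order described above, assuming the first existing file wins and that only the `nougat_mcp` object inside it is relevant. The function `resolve_settings` is hypothetical; what the real server does when no file is found (presumably built-in defaults) is not documented here, so the sketch just returns an empty dict.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, Mapping, Optional, Tuple


def resolve_settings(
    env: Mapping[str, str] = os.environ,
    cwd: Optional[Path] = None,
) -> Tuple[Dict[str, Any], Optional[str]]:
    """Return (nougat_mcp settings, source path), or ({}, None) if nothing is found."""
    candidates = []
    env_path = env.get("NOUGAT_MCP_SETTINGS")
    if env_path:
        candidates.append(Path(env_path))  # 1. explicit path from the environment
    candidates.append((cwd or Path.cwd()) / "settings.json")  # 2. ./settings.json
    for path in candidates:
        # Assumption: a missing file falls through to the next candidate.
        if path.is_file():
            with path.open(encoding="utf-8") as fh:
                data = json.load(fh)
            return dict(data.get("nougat_mcp", {})), str(path)
    return {}, None  # the real server likely applies its own defaults here


if __name__ == "__main__":
    settings, source = resolve_settings()
    print(source or "no settings file found", settings)
```

Returning the source path alongside the values mirrors what get_output_settings reports, which makes it easier to see which file an agent actually picked up.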

Agent Configuration

Codex CLI

Add to ~/.codex/config.toml:

[mcp_servers.nougat]
command = "uvx"
args = ["nougat-mcp"]
enabled = true

[mcp_servers.nougat.env]
NOUGAT_MCP_SETTINGS = "/absolute/path/to/settings.json"

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "nougat": {
      "command": "uvx",
      "args": ["nougat-mcp"],
      "env": {
        "NOUGAT_MCP_SETTINGS": "/absolute/path/to/settings.json"
      }
    }
  }
}

Antigravity / Gemini Desktop

Add to ~/.gemini/settings.json:

{
  "mcpServers": {
    "nougat": {
      "type": "stdio",
      "command": "uvx",
      "args": ["nougat-mcp"],
      "env": {
        "NOUGAT_MCP_SETTINGS": "/absolute/path/to/settings.json"
      }
    }
  }
}

Cursor

In Cursor MCP settings, add:

{
  "mcpServers": {
    "nougat": {
      "command": "uvx",
      "args": ["nougat-mcp"],
      "env": {
        "NOUGAT_MCP_SETTINGS": "/absolute/path/to/settings.json"
      }
    }
  }
}

Note: Cursor MCP config location can vary by version/platform; use the MCP settings UI or your current JSON settings file.
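The Claude Desktop, Antigravity/Gemini, and Cursor examples above all share the same `mcpServers` shape. As a convenience, here is a hypothetical helper that merges the nougat entry into an existing config without touching other servers. It assumes the file is plain JSON (a file containing comments would fail to parse), and it rewrites the file with its own indentation.

```python
import copy
import json
from pathlib import Path
from typing import Any, Dict


def add_nougat_server(
    config: Dict[str, Any], settings_path: str, stdio_type: bool = False
) -> Dict[str, Any]:
    """Return a copy of config with an mcpServers.nougat entry added or replaced."""
    updated = copy.deepcopy(config)  # never mutate the caller's dict
    servers = updated.setdefault("mcpServers", {})
    entry: Dict[str, Any] = {}
    if stdio_type:
        entry["type"] = "stdio"  # the Antigravity / Gemini example includes this key
    entry.update(
        {
            "command": "uvx",
            "args": ["nougat-mcp"],
            "env": {"NOUGAT_MCP_SETTINGS": settings_path},
        }
    )
    servers["nougat"] = entry
    return updated


def update_config_file(config_path: str, settings_path: str, stdio_type: bool = False) -> None:
    """Read (or create) a JSON config file and write back the merged version."""
    path = Path(config_path).expanduser()
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    merged = add_nougat_server(config, settings_path, stdio_type)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
```

For example, `update_config_file("~/.gemini/settings.json", "/absolute/path/to/settings.json", stdio_type=True)` would add the entry from the Antigravity / Gemini example while keeping any other keys in that file.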

Showcase (Real Page Example)

A real extraction from page 5 of src/2405.08770v1.pdf is included:

Quick comparison:

# mmd
\[DV=V_{x}. \tag{3.2}\]

# md
$$
DV=V_{x}. \qquad\text{(3.2)}
$$

Performance Notes

  • First run may download model weights (~1.4 GB).
  • CPU inference is significantly slower than GPU inference.
  • Where possible, process only the pages you need (for example, by extracting them into a smaller PDF first) to reduce runtime.
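Because parse_research_paper, as documented above, takes only a file path and an output format, one way to OCR a page subset is to write those pages to a new PDF and pass that file instead. This sketch uses pypdfium2 (one of the pinned dependencies) and its `PdfDocument.new()`, `import_pages()`, and `save()` calls; the helper names and the "1-3,5" range syntax are hypothetical.

```python
from typing import List


def parse_page_range(spec: str, page_count: int) -> List[int]:
    """Turn a 1-based, inclusive spec like '1-3,5' into zero-based indices [0, 1, 2, 4]."""
    indices: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
        else:
            start = end = int(part)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"page range {part!r} is outside 1..{page_count}")
        indices.extend(range(start - 1, end))
    return indices


def extract_pages(src_path: str, dst_path: str, spec: str) -> None:
    """Copy the selected pages of src_path into a new PDF at dst_path."""
    import pypdfium2 as pdfium  # imported lazily so parse_page_range works without it

    src = pdfium.PdfDocument(src_path)
    dst = pdfium.PdfDocument.new()
    try:
        dst.import_pages(src, pages=parse_page_range(spec, len(src)))
        dst.save(dst_path)
    finally:
        dst.close()
        src.close()
```

For the showcase below, `extract_pages("src/2405.08770v1.pdf", "/tmp/page5.pdf", "5")` would produce a one-page PDF; pass its absolute path as file_path.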

Release to PyPI

This repository includes automated publishing via GitHub Actions: .github/workflows/publish-pypi.yml.

One-time setup (recommended)

  1. Create the nougat-mcp project on PyPI.
  2. In PyPI project settings, configure a Trusted Publisher:
    • Owner: svretina
    • Repository: nougat-mcp
    • Workflow: publish-pypi.yml
    • Environment: pypi
  3. In GitHub, ensure Actions are enabled for the repo.

Release flow

  1. Bump version in pyproject.toml.
  2. Commit and push to master.
  3. Create and push a version tag:
     git tag v0.1.0
     git push origin v0.1.0
  4. The workflow builds, validates (twine check), and publishes to PyPI.

Compatibility Pins

To keep Nougat stable across environments, the package pins sensitive dependency ranges:

  • transformers>=4.35,<4.38
  • albumentations>=1.3,<1.4
  • pypdfium2<5.0
  • huggingface-hub<1.0
  • fsspec<=2025.10.0
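To check whether an existing environment already satisfies these pins, a small script can compare installed versions via `importlib.metadata`. This is a simplified stdlib-only sketch: it compares only the numeric release segment and supports `>=`, `<=`, `==`, `>`, `<`; for full PEP 440 semantics (pre-releases, `~=`), the `packaging` library's `SpecifierSet` is the better tool.

```python
import re
from importlib import metadata
from typing import Dict, List, Tuple

# Pins copied from the list above.
PINS: Dict[str, str] = {
    "transformers": ">=4.35,<4.38",
    "albumentations": ">=1.3,<1.4",
    "pypdfium2": "<5.0",
    "huggingface-hub": "<1.0",
    "fsspec": "<=2025.10.0",
}

_COMPARE = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


def release_tuple(version: str) -> Tuple[int, ...]:
    """Leading numeric release segment, e.g. '4.37.2.dev0' -> (4, 37, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies(version: str, spec: str) -> bool:
    """True if version meets every comma-separated clause in spec."""
    installed = release_tuple(version)
    for clause in spec.split(","):
        clause = clause.strip()
        op = next((o for o in (">=", "<=", "==", ">", "<") if clause.startswith(o)), None)
        if op is None:
            raise ValueError(f"unsupported specifier {clause!r}")
        bound = release_tuple(clause[len(op):])
        width = max(len(installed), len(bound))  # pad so 4.38 compares like 4.38.0
        a = installed + (0,) * (width - len(installed))
        b = bound + (0,) * (width - len(bound))
        if not _COMPARE[op](a, b):
            return False
    return True


def check_pins(pins: Dict[str, str] = PINS) -> List[str]:
    """Return a list of human-readable problems (empty means all pins are met)."""
    problems: List[str] = []
    for name, spec in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if not satisfies(installed, spec):
            problems.append(f"{name} {installed} does not satisfy {spec}")
    return problems


if __name__ == "__main__":
    print("\n".join(check_pins()) or "all pins satisfied")
```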

Credits

License

GNU General Public License v3.0 (LICENSE).


Download files


Source Distribution

nougat_mcp-0.1.0.tar.gz (1.3 MB)

Uploaded Source

Built Distribution


nougat_mcp-0.1.0-py3-none-any.whl (33.2 kB)

Uploaded Python 3

File details

Details for the file nougat_mcp-0.1.0.tar.gz.

File metadata

  • Download URL: nougat_mcp-0.1.0.tar.gz
  • Upload date:
  • Size: 1.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nougat_mcp-0.1.0.tar.gz
Algorithm Hash digest
SHA256 46ead874878aa3d95e3fc391363434324ff399d80bfb0e5e3999f64902ee80d5
MD5 20e55f37347322d9fd499e91bb4062cb
BLAKE2b-256 31b3c228cfd19fdc68c549b861d076735d31d8a5fbd9f1e776a6005f87794c6d


Provenance

The following attestation bundles were made for nougat_mcp-0.1.0.tar.gz:

Publisher: publish-pypi.yml on svretina/nougat-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file nougat_mcp-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: nougat_mcp-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 33.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nougat_mcp-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 55719ed64be1c4323116efb9736dd18376a14cd6fa94c8f8611409c32172b0e2
MD5 4462ae0af11e6e8eb00f9c8d1743a92e
BLAKE2b-256 79be254f9c6ccf3c712f5986904952563438ea8bb40f4b675d62d38b9470091e


Provenance

The following attestation bundles were made for nougat_mcp-0.1.0-py3-none-any.whl:

Publisher: publish-pypi.yml on svretina/nougat-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
