High-quality PDF ↔ Markdown converter with MCP integration and Unicode support

活水 PDF 转换器 (Huoshui PDF Converter)

A high-quality, cross-platform PDF ↔ Markdown converter implemented as an MCP (Model Context Protocol) server. It converts in both directions and fully supports Unicode, including CJK characters.

Features

Core Capabilities

PDF → Markdown: Extract text and images with layout preservation
Markdown → PDF: Generate beautiful PDFs with multiple rendering engines
Unicode Support: Full support for Chinese, Japanese, Korean, and other Unicode characters
Cross-Platform: Works on Windows, macOS, and Linux
MCP Integration: Use with Claude Desktop or any MCP-compatible client

Technical Features

Pure Python: No external system dependencies required
Automatic Font Detection: Finds and uses system Unicode fonts
Smart Engine Selection: Automatically switches engines based on content
Comprehensive Error Handling: Graceful degradation and detailed logging
Async Architecture: Non-blocking operations for better performance

Installation

From MCP Registry (Recommended)

This server is available in the Model Context Protocol Registry. Install it using your MCP client.

mcp-name: io.github.huoshuiai42/huoshui-pdf-converter

As a Python Package

pip install huoshui-pdf-converter

Or using uv (recommended):

uv pip install huoshui-pdf-converter

As an MCP Server

Add to your Claude Desktop configuration:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "huoshui-pdf-converter": {
      "command": "uvx",
      "args": ["huoshui-pdf-converter"],
      "env": {}
    }
  }
}

Or if you prefer to use a specific Python environment:

{
  "mcpServers": {
    "huoshui-pdf-converter": {
      "command": "python",
      "args": ["-m", "huoshui_pdf_converter.server"],
      "env": {}
    }
  }
}
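
If you would rather not edit the file by hand, the same entry can be merged in with a short script. This is a minimal sketch and not part of the package: `default_config_path` and `add_server` are hypothetical helper names, and the paths are the ones listed above.

```python
import json
import os
import sys
from pathlib import Path


def default_config_path() -> Path:
    """Return the Claude Desktop config path listed above for this platform."""
    if sys.platform == "darwin":
        return Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"
    if sys.platform.startswith("win"):
        return Path(os.environ.get("APPDATA", "")) / "Claude" / "claude_desktop_config.json"
    return Path.home() / ".config/Claude/claude_desktop_config.json"


def add_server(config_path: Path) -> dict:
    """Merge the huoshui-pdf-converter entry into the config, keeping other servers."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["huoshui-pdf-converter"] = {
        "command": "uvx",
        "args": ["huoshui-pdf-converter"],
        "env": {},
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Calling `add_server(default_config_path())` writes the entry; restart Claude Desktop afterwards so it rereads the file.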

Usage

Command Line Interface

# Convert PDF to Markdown
huoshui-pdf pdf-to-md input.pdf output.md

# Convert Markdown to PDF
huoshui-pdf md-to-pdf input.md output.pdf

# With options
huoshui-pdf md-to-pdf input.md output.pdf --page-size A4 --margin 2cm --font-size 12

As a Python Library

import asyncio
from huoshui_pdf_converter import PDFToMarkdownConverter, MarkdownToPDFConverter

async def main():
    # PDF to Markdown
    pdf_converter = PDFToMarkdownConverter()
    result = await pdf_converter.convert(
        pdf_path="input.pdf",
        output_path="output.md",
        extract_images=True,
        preserve_formatting=True
    )

    # Markdown to PDF
    md_converter = MarkdownToPDFConverter()
    result = await md_converter.convert(
        markdown_path="input.md",
        output_path="output.pdf",
        page_size="A4",
        margin="2cm",
        font_size=12
    )

asyncio.run(main())
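
Because `convert` is a coroutine, several files can be processed concurrently. The sketch below assumes only the `convert(pdf_path=..., output_path=...)` call shown above; `convert_many` and `max_concurrent` are hypothetical names, and whether a single converter instance can safely be shared across tasks is an assumption that this README does not confirm.

```python
import asyncio
from pathlib import Path


async def convert_many(converter, pdf_paths, out_dir, max_concurrent=4):
    """Convert several PDFs concurrently, at most `max_concurrent` at a time.

    `converter` is any object with an async `convert(pdf_path=..., output_path=...)`
    method, such as the PDFToMarkdownConverter shown above.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    semaphore = asyncio.Semaphore(max_concurrent)

    async def convert_one(pdf_path):
        output_path = out_dir / (Path(pdf_path).stem + ".md")
        async with semaphore:
            return await converter.convert(
                pdf_path=str(pdf_path), output_path=str(output_path)
            )

    # return_exceptions=True reports each failure as a result instead of
    # raising the first one, so the caller sees the outcome for every file.
    return await asyncio.gather(
        *(convert_one(p) for p in pdf_paths), return_exceptions=True
    )
```

The returned list is in the same order as `pdf_paths`; entries that are exception instances mark the files that failed.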

MCP Tools

When used as an MCP server, the following tools are available:

pdf_to_markdown: Convert PDF files to Markdown

{
  "pdf_path": "path/to/input.pdf",
  "output_path": "path/to/output.md",
  "extract_images": true,
  "preserve_formatting": true
}

markdown_to_pdf: Convert Markdown files to PDF

{
  "markdown_path": "path/to/input.md",
  "output_path": "path/to/output.pdf",
  "page_size": "A4",
  "margin": "2cm",
  "font_size": 12
}

list_supported_formats: Get supported formats and engines
validate_file: Validate input files before conversion
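
An MCP client normally builds the request for you, but it can help to see what a tool invocation looks like on the wire. MCP uses JSON-RPC 2.0, and tools are invoked with the `tools/call` method. The sketch below only constructs the message; `build_tool_call` is a hypothetical helper, and the transport (stdio in the configurations above) is left to the client.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request for an MCP server."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = build_tool_call(
    1,
    "pdf_to_markdown",
    {
        "pdf_path": "path/to/input.pdf",
        "output_path": "path/to/output.md",
        "extract_images": True,
        "preserve_formatting": True,
    },
)
print(json.dumps(request, indent=2))
```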

Supported Formats

Input Formats

PDF: All standard PDF files (PDF 1.0 - 1.7)
Markdown: CommonMark and GitHub Flavored Markdown
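
A quick local check can catch files that are not PDFs at all before they reach the converter. The sketch below reads the `%PDF-x.y` header and compares it with the version range listed above. It is an illustration only, not the implementation behind the `validate_file` tool; `pdf_version` and `looks_like_supported_pdf` are hypothetical names, and searching the first 1024 bytes rather than byte 0 is a leniency chosen for this example.

```python
import re

_HEADER = re.compile(rb"%PDF-(\d+)\.(\d+)")


def pdf_version(path):
    """Return the (major, minor) version from the PDF header, or None if absent."""
    with open(path, "rb") as handle:
        head = handle.read(1024)
    match = _HEADER.search(head)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def looks_like_supported_pdf(path):
    """Check that the header declares a version in the 1.0 - 1.7 range."""
    version = pdf_version(path)
    return version is not None and (1, 0) <= version <= (1, 7)
```

A passing header check does not guarantee the file is well-formed; it only rules out obviously wrong inputs.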

Output Options

Page Sizes: A4, A3, Letter, Legal
Margins: Customizable (e.g., "1cm", "0.5in")
Font Sizes: Any size in points
Images: PNG, JPEG extraction from PDFs
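
To see how a margin string and a page size interact, it helps to convert both to PDF points (1 in = 72 pt, 1 cm = 72/2.54 pt). The sketch below is illustrative arithmetic, not the converter's own parser: `to_points` and `content_box` are hypothetical names, and accepting `mm` and `pt` in addition to the `cm` and `in` examples above is an assumption.

```python
import re

# Points per unit; PDF user space uses 72 points per inch.
POINTS_PER_UNIT = {"pt": 1.0, "in": 72.0, "cm": 72.0 / 2.54, "mm": 72.0 / 25.4}

# Page sizes as (width, height) in points.
PAGE_SIZES = {
    "A4": (210 * POINTS_PER_UNIT["mm"], 297 * POINTS_PER_UNIT["mm"]),
    "A3": (297 * POINTS_PER_UNIT["mm"], 420 * POINTS_PER_UNIT["mm"]),
    "Letter": (8.5 * 72.0, 11 * 72.0),
    "Legal": (8.5 * 72.0, 14 * 72.0),
}


def to_points(length):
    """Convert a string such as "2cm" or "0.5in" to points."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(pt|in|cm|mm)\s*", length)
    if match is None:
        raise ValueError(f"unrecognized length: {length!r}")
    value, unit = match.groups()
    return float(value) * POINTS_PER_UNIT[unit]


def content_box(page_size, margin):
    """Return the (width, height) in points left after a uniform margin."""
    width, height = PAGE_SIZES[page_size]
    m = to_points(margin)
    return width - 2 * m, height - 2 * m
```

For example, `content_box("Letter", "1in")` leaves a 468 by 648 point text area.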

Unicode and Font Support

The converter automatically detects and uses appropriate fonts for different languages:

macOS: Arial Unicode, PingFang SC, STHeiti
Windows: Microsoft YaHei, SimSun, Arial Unicode MS
Linux: Noto Sans CJK, Source Han Sans, WenQuanYi
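
The detection step can be pictured as a prioritized search through the system font directories. The sketch below is one possible approach, not the converter's actual code: `system_font_dirs` and `find_unicode_font` are hypothetical names, and the directory list and the file-name fragments in the usage note are assumptions about how these fonts are typically installed.

```python
import os
import sys
from pathlib import Path

FONT_SUFFIXES = {".ttf", ".ttc", ".otf"}


def system_font_dirs():
    """Return the usual font directories for the current platform."""
    if sys.platform == "darwin":
        return [Path("/System/Library/Fonts"), Path("/Library/Fonts"),
                Path.home() / "Library/Fonts"]
    if sys.platform.startswith("win"):
        return [Path(os.environ.get("WINDIR", r"C:\Windows")) / "Fonts"]
    return [Path("/usr/share/fonts"), Path("/usr/local/share/fonts"),
            Path.home() / ".fonts", Path.home() / ".local/share/fonts"]


def find_unicode_font(font_dirs, name_fragments):
    """Return the first font file matching a fragment, in fragment priority order.

    Matching is a case-insensitive substring test on the file name.
    """
    files = sorted(
        path
        for directory in font_dirs
        if Path(directory).is_dir()
        for path in Path(directory).rglob("*")
        if path.suffix.lower() in FONT_SUFFIXES
    )
    for fragment in name_fragments:
        for path in files:
            if fragment.lower() in path.name.lower():
                return path
    return None
```

On Linux, for instance, `find_unicode_font(system_font_dirs(), ["NotoSansCJK", "SourceHanSans", "wqy"])` would prefer Noto Sans CJK and fall back to the others, returning `None` if none is installed.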

Architecture

Conversion Engines

PDF → Markdown

PyMuPDF (MuPDF): High-quality text and image extraction

Markdown → PDF

ReportLab: Best Unicode support, cross-platform compatibility
xhtml2pdf: Good HTML/CSS rendering (fallback)
fpdf2: Basic PDF generation (last resort)

Engine Selection Logic

Detects CJK characters → Uses ReportLab
Complex formatting → Uses xhtml2pdf
Basic documents → Uses any available engine
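
The selection rules above can be sketched as a small function. This is a hedged illustration, not the converter's internal logic: `choose_engine` is a hypothetical name, the Unicode ranges cover only the most common CJK blocks, and treating tables and raw HTML as "complex formatting" is an assumption made for the example.

```python
import re

# Hiragana/Katakana, CJK Extension A, CJK Unified Ideographs, Hangul syllables.
_CJK = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]")

# Assumed markers of complex formatting: pipe-table rows and raw HTML tags.
_COMPLEX = re.compile(r"^\s*\|.*\|\s*$|<[a-zA-Z][^>]*>", re.MULTILINE)


def choose_engine(markdown_text, available=("reportlab", "xhtml2pdf", "fpdf2")):
    """Pick a rendering engine name following the three rules listed above."""
    if not available:
        raise RuntimeError("no PDF engine available")
    if _CJK.search(markdown_text) and "reportlab" in available:
        return "reportlab"
    if _COMPLEX.search(markdown_text) and "xhtml2pdf" in available:
        return "xhtml2pdf"
    # Basic documents: any available engine will do, so take the first.
    return available[0]
```

Checking CJK first means a document that has both Chinese text and tables still goes to ReportLab, which matches the order of the rules as listed.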

Development

Setup Development Environment

# Clone the repository
git clone https://github.com/yourusername/huoshui-pdf-converter.git
cd huoshui-pdf-converter

# Install dependencies
uv pip install -e ".[dev]"

# Run tests
python test_converter.py

Project Structure

huoshui-pdf-converter/
├── huoshui_pdf_converter/
│   ├── __init__.py
│   ├── server.py           # MCP server implementation
│   ├── pdf_converter.py    # PDF to Markdown converter
│   └── markdown_converter.py # Markdown to PDF converter
├── pyproject.toml
├── README.md
├── LICENSE
└── test_converter.py

Troubleshooting

Common Issues

Chinese characters not displaying:
- Ensure Arial Unicode or a similar CJK-capable font is installed
- The converter will automatically detect and use appropriate fonts

Import errors:
- Install all dependencies: pip install "huoshui-pdf-converter[all]" (the quotes keep shells such as zsh from interpreting the brackets)

MCP connection issues:
- Check Claude Desktop logs
- Ensure the command named in your configuration (uvx or python) is in your PATH

Logging

Enable debug logging:

import logging
logging.basicConfig(level=logging.DEBUG)

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Built with FastMCP for Model Context Protocol support
Uses PyMuPDF for PDF parsing
Uses ReportLab for PDF generation
Inspired by the need for better PDF ↔ Markdown conversion tools

Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Email: your.email@example.com

Release history

1.0.4: Sep 9, 2025 (this version)
1.0.3: Jul 5, 2025
1.0.2: Jul 5, 2025
1.0.1: Jul 5, 2025
1.0.0: Jul 5, 2025

Download files

Source Distribution

huoshui_pdf_converter-1.0.4.tar.gz (21.9 kB)

Uploaded Sep 9, 2025 Source

Built Distribution

huoshui_pdf_converter-1.0.4-py3-none-any.whl (23.2 kB)

Uploaded Sep 9, 2025 Python 3

File details

Details for the file huoshui_pdf_converter-1.0.4.tar.gz.

File metadata

Download URL: huoshui_pdf_converter-1.0.4.tar.gz
Upload date: Sep 9, 2025
Size: 21.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for huoshui_pdf_converter-1.0.4.tar.gz
Algorithm	Hash digest
SHA256	`96e9c157984c608377cf98a124b57f3f61a346913c2532b4e7dabdbe47a0946a`
MD5	`3e11f5ea1f1ccd8c90e1d6d83b7fd793`
BLAKE2b-256	`14f74c62813a3f5aec20b0648e63eadc8376bb54d91b3784d6541e3d9ddf31db`

File details

Details for the file huoshui_pdf_converter-1.0.4-py3-none-any.whl.

File metadata

Download URL: huoshui_pdf_converter-1.0.4-py3-none-any.whl
Upload date: Sep 9, 2025
Size: 23.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for huoshui_pdf_converter-1.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b0aae1f058d625dc37f71d46ff6ab53322097cdb48d14699c200759984d9b1bf`
MD5	`974f01723e70d48a4a0dd3c4e5c16e09`
BLAKE2b-256	`81ae58681aab2375d934bdc7f0b21f8adc117dd4c3be1bf8c32c3e2135e4550a`
