Python wrapper for mhtml-to-html Go tool with automatic encoding detection

MHTML to HTML (Python)

A Python wrapper for the excellent gonejack/mhtml-to-html Go tool, adding automatic encoding detection for Chinese, Japanese, and Korean content.

NOTE: This is 100% vibe coded, including all the effusive LLM slop below. It works and has tests, that's all I can say.

Features

  • 🌍 Smart Encoding Detection: Automatically detects and converts Chinese (GBK, GB18030), Japanese (Shift_JIS), Korean (EUC-KR) and other encodings
  • Fast Performance: Uses an optimized Go binary under the hood
  • 🖥️ Cross-Platform: Works on Linux, macOS, and Windows (x64 & ARM64)
  • 🐍 Simple Python API: Clean interface with optional CLI
  • 📦 Zero Dependencies: Self-contained with embedded binaries
  • 🙏 Built on Excellence: Wraps the proven mhtml-to-html Go tool

Installation

pip install mhtml-to-html-py

Quick Start

Python API

from mhtml_converter import convert_mhtml

# Convert MHTML to HTML string
html_content = convert_mhtml("document.mht")

# Save to file with verbose encoding detection
convert_mhtml("chinese_doc.mht", output_file="output.html", verbose=True)

# Convert with explicit encoding (if detection fails)
html_content = convert_mhtml("document.mht", encoding="gbk")
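For a whole folder of files, the calls above can be wrapped in a small loop. The sketch below is only an illustration: batch_convert is a hypothetical helper, not part of this package, and it takes the converter as a parameter so it relies on nothing beyond the convert_mhtml(path, output_file=...) form shown above. The exception types the converter raises are not documented here, so the sketch catches Exception broadly.

```python
from pathlib import Path
from typing import Callable, Dict, Union


def batch_convert(
    src_dir: Union[str, Path],
    out_dir: Union[str, Path],
    convert: Callable[..., object],
) -> Dict[str, str]:
    """Convert every .mht/.mhtml file in src_dir and collect failures.

    `convert` is assumed to accept (path, output_file=...), like the
    convert_mhtml calls above. Returns {source file name: error message}.
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    failures: Dict[str, str] = {}
    for path in sorted(src.iterdir()):
        if path.suffix.lower() not in {".mht", ".mhtml"}:
            continue  # skip anything that is not an MHTML file
        target = out / (path.stem + ".html")
        try:
            convert(str(path), output_file=str(target))
        except Exception as exc:  # exception types are an unknown here
            failures[path.name] = str(exc)
    return failures


# Possible usage, assuming the package is installed:
#   from mhtml_converter import convert_mhtml
#   failures = batch_convert("saved_pages", "converted", convert_mhtml)
```

Collecting failures instead of stopping at the first one keeps a long batch running; files that fail could then be retried with an explicit encoding.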

Command Line

# Convert single file
mhtml-to-html-py input.mht -o output.html

# Verbose mode to see encoding detection
mhtml-to-html-py input.mht -o output.html --verbose

# Convert multiple files
mhtml-to-html-py *.mht --output-dir converted/

Why This Package?

Many MHTML files, especially those saved from Chinese, Japanese, or Korean websites, use non-UTF-8 encodings that cause garbled text when converted naively. This package:

  1. Detects encoding from HTML meta tags and content analysis
  2. Converts properly to UTF-8 for universal compatibility
  3. Preserves formatting and embedded resources
  4. Works reliably across different platforms and languages
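The first two steps can be sketched in plain Python. This is not the package's actual implementation, only a minimal illustration of the idea: look for a charset declaration in the raw bytes, and if it is missing or unusable, fall back to trial decoding. The names detect_encoding and to_utf8 and the fallback order are assumptions made for this example.

```python
import re
from typing import Optional, Sequence

# Matches declarations such as <meta charset="gbk"> or
# content="text/html; charset=Shift_JIS".
_META_CHARSET = re.compile(
    rb"""charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)

# Fallback order is an assumption made for this sketch.
_FALLBACKS = ("utf-8", "gb18030", "shift_jis", "euc-kr")


def detect_encoding(raw: bytes, fallbacks: Sequence[str] = _FALLBACKS) -> Optional[str]:
    """Return the first candidate codec that decodes `raw`, or None."""
    candidates = []
    match = _META_CHARSET.search(raw[:4096])
    if match:
        candidates.append(match.group(1).decode("ascii"))
    candidates.extend(fallbacks)
    for name in candidates:
        try:
            raw.decode(name)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown codec name, or bytes invalid in this codec
        return name
    return None


def to_utf8(raw: bytes) -> bytes:
    """Re-encode `raw` as UTF-8 using the detected source encoding."""
    encoding = detect_encoding(raw)
    if encoding is None:
        raise ValueError("could not determine the source encoding")
    return raw.decode(encoding).encode("utf-8")
```

Trial decoding alone is a weak signal, because permissive codecs such as GB18030 accept many byte sequences that were written in another encoding. That is why the declared charset is tried first and why an explicit encoding override remains useful.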

Use Cases

  • Converting saved web pages from Asian websites
  • Processing email archives in MHTML format
  • Batch conversion of documentation
  • Web scraping pipeline preprocessing
  • Digital preservation workflows

Technical Details

This package wraps the high-performance mhtml-to-html Go binary that handles the actual conversion. The Python layer provides a clean API, handles platform detection automatically, and adds enhanced encoding detection capabilities.
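Platform detection of this kind can be sketched with the standard library. The binary naming scheme below is hypothetical (the real package may lay out its embedded files differently), and binary_name and run_binary are illustrative names, not part of the package's API.

```python
import platform
import subprocess
from pathlib import Path
from typing import List, Optional

# Hypothetical naming scheme for the bundled binaries.
_OS_ALIASES = {"linux": "linux", "darwin": "darwin", "windows": "windows"}
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "arm64": "arm64",
    "aarch64": "arm64",
}


def binary_name(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Map an OS/architecture pair to a (hypothetical) binary file name."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    try:
        os_part = _OS_ALIASES[system]
        arch_part = _ARCH_ALIASES[machine]
    except KeyError:
        raise RuntimeError(f"unsupported platform: {system}/{machine}") from None
    suffix = ".exe" if os_part == "windows" else ""
    return f"mhtml-to-html-{os_part}-{arch_part}{suffix}"


def run_binary(bin_dir: Path, args: List[str]) -> str:
    """Run the binary for this platform and return its standard output.

    Raises subprocess.CalledProcessError if the binary exits non-zero.
    """
    exe = bin_dir / binary_name()
    result = subprocess.run(
        [str(exe), *args], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Normalising the machine name matters because the same architecture is reported differently across systems, for example AMD64 on Windows and x86_64 on Linux.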

Supported Platforms

OS        Architecture
Linux     x86_64
Linux     ARM64
macOS     Intel (x86_64)
macOS     Apple Silicon (ARM64)
Windows   x86_64

Credits

This project is a Python wrapper around the excellent gonejack/mhtml-to-html Go tool. All the heavy lifting for MHTML parsing and conversion is done by that project. We've added:

  • Python packaging and distribution
  • Cross-platform binary embedding
  • Enhanced encoding detection
  • Simplified Python API

License

MIT License - see LICENSE file for details.

Contributing

Issues and pull requests are welcome.

Download files

Source Distribution

mhtml_to_html_py-0.1.0.tar.gz (14.1 MB view details)

Uploaded Source

Built Distribution

mhtml_to_html_py-0.1.0-py3-none-any.whl (28.3 MB view details)

Uploaded Python 3

File details

Details for the file mhtml_to_html_py-0.1.0.tar.gz.

File metadata

  • Download URL: mhtml_to_html_py-0.1.0.tar.gz
  • Upload date:
  • Size: 14.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for mhtml_to_html_py-0.1.0.tar.gz
Algorithm Hash digest
SHA256 5cc16d72353e51fa4e25377afb9d2cae0933c06dc30e0d7f4b3ca01fe6650b16
MD5 5c4922cb410e1364697d420623348405
BLAKE2b-256 653343ea9ea085c9c378cbc0238f5c1f242dbd1b3c1efd1c428e8a9e79a2a7cd
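A downloaded file can be checked against the SHA256 value above with Python's hashlib. This is a minimal sketch; sha256_of and matches_published are illustrative helper names.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Union[str, Path], expected_hex: str) -> bool:
    """Compare a file's digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Possible usage with the digest listed above:
#   matches_published(
#       "mhtml_to_html_py-0.1.0.tar.gz",
#       "5cc16d72353e51fa4e25377afb9d2cae0933c06dc30e0d7f4b3ca01fe6650b16",
#   )
```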

Provenance

The following attestation bundles were made for mhtml_to_html_py-0.1.0.tar.gz:

Publisher: publish.yml on mpr1255/mhtml-to-html-py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mhtml_to_html_py-0.1.0-py3-none-any.whl.

File hashes

Hashes for mhtml_to_html_py-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 171b3c2f26c441e93ef076d6945e8658c7624d45dcd7bebcbb983799434aceac
MD5 2aa2c04354d520b0aad987fce8ebf6d1
BLAKE2b-256 6588d6aa02be0ca33f3e58bb6ad8862762ce02b067b430492ea6212d7e08b399

Provenance

The following attestation bundles were made for mhtml_to_html_py-0.1.0-py3-none-any.whl:

Publisher: publish.yml on mpr1255/mhtml-to-html-py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
