Python wrapper for mhtml-to-html Go tool with automatic encoding detection

Reason this release was yanked:

wrong version

Project description

MHTML to HTML (Python)

A Python wrapper for the excellent gonejack/mhtml-to-html Go tool, adding automatic encoding detection for Chinese, Japanese, and Korean content.

NOTE: This is 100% vibe coded, including all the effusive LLM slop below. It works and has tests; that's all I can say.

Features

  • 🌍 Smart Encoding Detection: Automatically detects and converts Chinese (GBK, GB18030), Japanese (Shift_JIS), Korean (EUC-KR), and other encodings
  • Fast Performance: Uses an optimized Go binary under the hood
  • 🖥️ Cross-Platform: Works on Linux, macOS, and Windows (x64 & ARM64)
  • 🐍 Simple Python API: Clean interface with optional CLI
  • 📦 Zero Dependencies: Self-contained with embedded binaries
  • 🙏 Built on Excellence: Wraps the proven mhtml-to-html Go tool

Installation

pip install mhtml-to-html-py

Quick Start

Python API

from mhtml_converter import convert_mhtml

# Convert MHTML to HTML string
html_content = convert_mhtml("document.mht")

# Save to file with verbose encoding detection
convert_mhtml("chinese_doc.mht", output_file="output.html", verbose=True)

# Convert with explicit encoding (if detection fails)
html_content = convert_mhtml("document.mht", encoding="gbk")

Command Line

# Convert single file
mhtml-to-html-py input.mht -o output.html

# Verbose mode to see encoding detection
mhtml-to-html-py input.mht -o output.html --verbose

# Convert multiple files
mhtml-to-html-py *.mht --output-dir converted/
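
The same batch conversion can be sketched from Python. This is a minimal sketch, assuming only the `convert_mhtml(path, output_file=...)` call shown in the Python API example above; `batch_convert` and its `convert` parameter are hypothetical helpers introduced here for illustration and are not part of the package.

```python
from pathlib import Path
from typing import Callable, List, Optional


def batch_convert(src_dir: str, out_dir: str,
                  convert: Optional[Callable[..., object]] = None) -> List[Path]:
    """Convert every .mht/.mhtml file in src_dir, writing .html files into out_dir."""
    if convert is None:
        # Imported lazily so the helper can also be driven by a stand-in converter.
        from mhtml_converter import convert_mhtml as convert

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    for src in sorted(Path(src_dir).iterdir()):
        if src.suffix.lower() not in {".mht", ".mhtml"}:
            continue  # skip anything that is not an MHTML file
        target = out / (src.stem + ".html")
        # output_file= is the keyword used in the Quick Start example.
        convert(str(src), output_file=str(target))
        written.append(target)
    return written
```

Calling `batch_convert("saved_pages", "converted")` would then mirror the `--output-dir` invocation, with the directory names being placeholders.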

Why This Package?

Many MHTML files, especially those saved from Chinese, Japanese, or Korean websites, use non-UTF-8 encodings that cause garbled text when converted naively. This package:

  1. Detects encoding from HTML meta tags and content analysis
  2. Converts properly to UTF-8 for universal compatibility
  3. Preserves formatting and embedded resources
  4. Works reliably across different platforms and languages
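
Steps 1 and 2 can be illustrated with a standard-library sketch. This is not the package's actual detection code: the regular expression, the fallback order, and the names `detect_encoding` and `to_utf8` are assumptions made for illustration. Trial decoding is also a crude stand-in for real content analysis, since several East Asian codecs will accept each other's bytes without error.

```python
import re

# Encodings tried in order when no usable charset declaration is found.
# The order is an assumption of this sketch.
FALLBACK_ENCODINGS = ("utf-8", "gb18030", "shift_jis", "euc-kr")

# Matches both <meta charset="gbk"> and
# <meta http-equiv="Content-Type" content="text/html; charset=gbk">.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?\s*([A-Za-z0-9_\-]+)", re.IGNORECASE
)


def detect_encoding(raw: bytes) -> str:
    """Return a codec name for raw HTML bytes: meta tag first, then trial decoding."""
    match = META_CHARSET.search(raw[:4096])
    if match:
        declared = match.group(1).decode("ascii")
        try:
            raw.decode(declared)  # trust the declaration only if it really decodes
            return declared.lower()
        except (LookupError, UnicodeDecodeError):
            pass  # unknown codec name or wrong declaration; fall through
    for name in FALLBACK_ENCODINGS:
        try:
            raw.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return "latin-1"  # last resort: decodes any byte sequence


def to_utf8(raw: bytes) -> bytes:
    """Re-encode raw HTML bytes as UTF-8 using the detected source encoding."""
    return raw.decode(detect_encoding(raw)).encode("utf-8")
```

Checking the declared charset before falling back keeps a correct meta tag authoritative while still tolerating pages whose declaration is missing or wrong.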

Use Cases

  • Converting saved web pages from Asian websites
  • Processing email archives in MHTML format
  • Batch conversion of documentation
  • Web scraping pipeline preprocessing
  • Digital preservation workflows

Technical Details

This package wraps the high-performance mhtml-to-html Go binary that handles the actual conversion. The Python layer provides a clean API, handles platform detection automatically, and adds enhanced encoding detection capabilities.

Supported Platforms

OS       Architecture    Status
Linux    x86_64
Linux    ARM64
macOS    Intel
macOS    Apple Silicon
Windows  x86_64
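
As a rough illustration of automatic platform detection, the table above can be expressed as a lookup keyed on `platform.system()` and `platform.machine()`. The binary file names, the alias table, and `binary_name` are hypothetical; the package may name and locate its embedded binaries differently.

```python
import platform

# Hypothetical file names, one per row of the platform table.
BINARIES = {
    ("linux", "x86_64"): "mhtml-to-html-linux-amd64",
    ("linux", "arm64"): "mhtml-to-html-linux-arm64",
    ("darwin", "x86_64"): "mhtml-to-html-darwin-amd64",
    ("darwin", "arm64"): "mhtml-to-html-darwin-arm64",
    ("windows", "x86_64"): "mhtml-to-html-windows-amd64.exe",
}

# platform.machine() reports the same CPU under different names per OS.
ARCH_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",   # typical on Windows
    "arm64": "arm64",    # typical on macOS
    "aarch64": "arm64",  # typical on Linux
}


def binary_name(system=None, machine=None) -> str:
    """Pick the binary for the given (or current) OS and CPU architecture."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    arch = ARCH_ALIASES.get(machine)
    try:
        return BINARIES[(system, arch)]
    except KeyError:
        raise RuntimeError(f"unsupported platform: {system}/{machine}") from None
```

Normalizing the architecture name first keeps the lookup table to one entry per supported platform.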

Credits

This project is a Python wrapper around the excellent gonejack/mhtml-to-html Go tool. All the heavy lifting for MHTML parsing and conversion is done by that project. We've added:

  • Python packaging and distribution
  • Cross-platform binary embedding
  • Enhanced encoding detection
  • Simplified Python API

License

MIT License - see LICENSE file for details.

Contributing

Issues and pull requests welcome! This project wraps the excellent gonejack/mhtml-to-html Go tool with Python convenience layers.
