Python wrapper for mhtml-to-html Go tool with automatic encoding detection
MHTML to HTML (Python)
A Python wrapper for the excellent gonejack/mhtml-to-html Go tool, adding automatic encoding detection for Chinese, Japanese, and Korean content.
NOTE: This is 100% vibe coded, including all the effusive LLM slop below. It works and has tests, that's all I can say.
Features
- 🌍 Smart Encoding Detection: Automatically detects and converts Chinese (GBK, GB18030), Japanese (Shift_JIS), Korean (EUC-KR) and other encodings
- ⚡ Fast Performance: Uses optimized Go binary under the hood
- 🖥️ Cross-Platform: Runs on Linux and macOS (x86_64 and ARM64) and on Windows (x86_64)
- 🐍 Simple Python API: Clean interface with optional CLI
- 📦 Zero Dependencies: Self-contained with embedded binaries
- 🙏 Built on Excellence: Wraps the proven mhtml-to-html Go tool
Installation
```bash
pip install mhtml-to-html-py
```
Quick Start
Python API
```python
from mhtml_converter import convert_mhtml

# Convert MHTML to an HTML string
html_content = convert_mhtml("document.mht")

# Save to a file and print the encoding detection steps
convert_mhtml("chinese_doc.mht", output_file="output.html", verbose=True)

# Convert with an explicit encoding (if detection fails)
html_content = convert_mhtml("document.mht", encoding="gbk")
```
Command Line
```bash
# Convert a single file
mhtml-to-html-py input.mht -o output.html

# Verbose mode to see encoding detection
mhtml-to-html-py input.mht -o output.html --verbose

# Convert multiple files
mhtml-to-html-py *.mht --output-dir converted/
```
Why This Package?
Many MHTML files, especially those saved from Chinese, Japanese, or Korean websites, use non-UTF-8 encodings that cause garbled text when converted naively. This package:
- Detects encoding from HTML meta tags and content analysis
- Converts properly to UTF-8 for universal compatibility
- Preserves formatting and embedded resources
- Works reliably across different platforms and languages
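The detect-then-convert idea above can be sketched in plain Python. This is only an illustration under stated assumptions, not the package's actual implementation: `sniff_encoding`, `to_utf8`, and the `FALLBACK_ENCODINGS` list are hypothetical names invented here. Trial decoding is also ambiguous for CJK encodings (GB18030 in particular accepts many byte sequences), so the order of the fallback list affects the result.

```python
import re

# Encodings tried in order when no usable <meta> charset is found.
# The list and its ordering are assumptions made for this sketch.
FALLBACK_ENCODINGS = ("utf-8", "gb18030", "shift_jis", "euc-kr")

# Matches both <meta charset="gbk"> and
# <meta http-equiv="Content-Type" content="text/html; charset=gbk">.
META_CHARSET = re.compile(rb"<meta[^>]+charset=[\"']?([A-Za-z0-9_-]+)", re.IGNORECASE)


def sniff_encoding(raw: bytes) -> str:
    """Guess the encoding of raw HTML bytes (hypothetical helper)."""
    match = META_CHARSET.search(raw[:4096])
    if match:
        declared = match.group(1).decode("ascii").lower()
        try:
            raw.decode(declared)
            return declared
        except (LookupError, UnicodeDecodeError):
            pass  # unknown or wrong declaration; fall back to trial decoding
    for candidate in FALLBACK_ENCODINGS:
        try:
            raw.decode(candidate)
            return candidate
        except UnicodeDecodeError:
            continue
    return "latin-1"  # decodes any byte sequence, so it never fails


def to_utf8(raw: bytes) -> bytes:
    """Re-encode HTML bytes as UTF-8 using the guessed source encoding."""
    return raw.decode(sniff_encoding(raw)).encode("utf-8")
```

Trusting a declared charset only when the bytes actually decode with it guards against pages whose meta tag is wrong, which is one plausible cause of garbled output.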
Use Cases
- Converting saved web pages from Asian websites
- Processing email archives in MHTML format
- Batch conversion of documentation
- Web scraping pipeline preprocessing
- Digital preservation workflows
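For the batch conversion use case, a small driver loop around the converter may be enough. The sketch below assumes only the `convert(path, output_file=...)` call shape shown in the Quick Start; `convert_tree` is a hypothetical helper that is not part of the package, and it takes the converter as an argument so it can be tried out with a stand-in function.

```python
from pathlib import Path
from typing import Callable


def convert_tree(src_dir, out_dir, convert: Callable[..., object]) -> list[Path]:
    """Convert every .mht/.mhtml file directly inside src_dir into out_dir.

    `convert` is called as convert(input_path, output_file=target_path).
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.iterdir()):
        if path.suffix.lower() not in {".mht", ".mhtml"}:
            continue  # skip anything that is not an MHTML archive
        target = out / (path.stem + ".html")
        convert(str(path), output_file=str(target))
        written.append(target)
    return written
```

With the package installed, a call would presumably look like `convert_tree("saved_pages", "converted", convert_mhtml)`. The loop is deliberately non-recursive so that two files with the same stem in different subdirectories cannot overwrite each other.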
Technical Details
This package wraps the high-performance mhtml-to-html Go binary that handles the actual conversion. The Python layer provides a clean API, handles platform detection automatically, and adds enhanced encoding detection capabilities.
Supported Platforms
| OS | Architecture | Status |
|---|---|---|
| Linux | x86_64 | ✅ |
| Linux | ARM64 | ✅ |
| macOS | Intel | ✅ |
| macOS | Apple Silicon | ✅ |
| Windows | x86_64 | ✅ |
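Automatic platform detection for the combinations in the table above could look roughly like the following. This is a sketch, not the package's code: the binary file names in `BINARIES` are placeholders invented for illustration, and `binary_name` is a hypothetical helper. Only the `platform.system()` and `platform.machine()` calls come from the standard library.

```python
import platform

# Placeholder file names; the binaries actually bundled may be named differently.
BINARIES = {
    ("linux", "x86_64"): "mhtml-to-html-linux-amd64",
    ("linux", "arm64"): "mhtml-to-html-linux-arm64",
    ("darwin", "x86_64"): "mhtml-to-html-darwin-amd64",
    ("darwin", "arm64"): "mhtml-to-html-darwin-arm64",
    ("windows", "x86_64"): "mhtml-to-html-windows-amd64.exe",
}

# platform.machine() reports the same architecture under different names
# depending on the OS (for example "AMD64" on Windows, "aarch64" on Linux).
ARCH_ALIASES = {"amd64": "x86_64", "x64": "x86_64", "aarch64": "arm64"}


def binary_name(system=None, machine=None):
    """Return the bundled binary name for the given or current platform."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    machine = ARCH_ALIASES.get(machine, machine)
    try:
        return BINARIES[(system, machine)]
    except KeyError:
        raise RuntimeError(f"unsupported platform: {system}/{machine}") from None
```

Accepting `system` and `machine` as optional arguments keeps the lookup testable on any host, and an unsupported combination fails with a clear error instead of trying to run a missing binary.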
Credits
This project is a Python wrapper around the excellent gonejack/mhtml-to-html Go tool. All the heavy lifting for MHTML parsing and conversion is done by that project. We've added:
- Python packaging and distribution
- Cross-platform binary embedding
- Enhanced encoding detection
- Simplified Python API
License
MIT License - see LICENSE file for details.
Contributing
Issues and pull requests welcome! This project wraps the excellent gonejack/mhtml-to-html Go tool with Python convenience layers.
File details
Details for the file mhtml_to_html_py-0.1.0.tar.gz.
File metadata
- Download URL: mhtml_to_html_py-0.1.0.tar.gz
- Upload date:
- Size: 14.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5cc16d72353e51fa4e25377afb9d2cae0933c06dc30e0d7f4b3ca01fe6650b16 |
| MD5 | 5c4922cb410e1364697d420623348405 |
| BLAKE2b-256 | 653343ea9ea085c9c378cbc0238f5c1f242dbd1b3c1efd1c428e8a9e79a2a7cd |
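A downloaded file can be checked against the published SHA256 digest with the standard library alone. The helper names `sha256_of` and `matches_published` are made up for this sketch; only `hashlib.sha256` is a real API.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare a file's digest with a published hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published("mhtml_to_html_py-0.1.0.tar.gz", "5cc16d72...")` with the full digest listed for that file should return `True` for an intact download. Reading in chunks keeps memory use flat for multi-megabyte archives.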
Provenance
The following attestation bundles were made for mhtml_to_html_py-0.1.0.tar.gz:
Publisher: publish.yml on mpr1255/mhtml-to-html-py
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mhtml_to_html_py-0.1.0.tar.gz
- Subject digest: 5cc16d72353e51fa4e25377afb9d2cae0933c06dc30e0d7f4b3ca01fe6650b16
- Sigstore transparency entry: 224177507
- Sigstore integration time:
- Permalink: mpr1255/mhtml-to-html-py@5e9c3b7e9d66fe4111f9db217077e8b07cbfcf47
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/mpr1255
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5e9c3b7e9d66fe4111f9db217077e8b07cbfcf47
- Trigger Event: push
File details
Details for the file mhtml_to_html_py-0.1.0-py3-none-any.whl.
File metadata
- Download URL: mhtml_to_html_py-0.1.0-py3-none-any.whl
- Upload date:
- Size: 28.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 171b3c2f26c441e93ef076d6945e8658c7624d45dcd7bebcbb983799434aceac |
| MD5 | 2aa2c04354d520b0aad987fce8ebf6d1 |
| BLAKE2b-256 | 6588d6aa02be0ca33f3e58bb6ad8862762ce02b067b430492ea6212d7e08b399 |
Provenance
The following attestation bundles were made for mhtml_to_html_py-0.1.0-py3-none-any.whl:
Publisher: publish.yml on mpr1255/mhtml-to-html-py
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mhtml_to_html_py-0.1.0-py3-none-any.whl
- Subject digest: 171b3c2f26c441e93ef076d6945e8658c7624d45dcd7bebcbb983799434aceac
- Sigstore transparency entry: 224177513
- Sigstore integration time:
- Permalink: mpr1255/mhtml-to-html-py@5e9c3b7e9d66fe4111f9db217077e8b07cbfcf47
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/mpr1255
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5e9c3b7e9d66fe4111f9db217077e8b07cbfcf47
- Trigger Event: push