Convert HWP and HWPX files to Markdown
pyhwp2md
Convert HWP (Hangul Word Processor) and HWPX files to Markdown format.
Features
- 🔄 Convert both HWP (binary) and HWPX (XML) files
- 📝 Extracts text, paragraphs, and tables
- 📊 Converts tables to Markdown pipe format
- 🎯 Simple CLI interface
- 🐍 Python 3.10+ support
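To make the "Markdown pipe format" feature concrete, here is a minimal sketch of how a grid of cell strings can be rendered as a pipe table. It is not pyhwp2md's actual implementation; `rows_to_pipe_table` is a hypothetical helper, and treating the first row as the header is an assumption.

```python
def rows_to_pipe_table(rows: list[list[str]]) -> str:
    """Render rows of cell text as a Markdown pipe table.

    Hypothetical helper for illustration only; the first row is
    assumed to be the header row.
    """
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def render(cells: list[str]) -> str:
        # Pad short rows, escape literal pipes, and flatten in-cell newlines.
        padded = list(cells) + [""] * (width - len(cells))
        cleaned = [c.replace("|", "\\|").replace("\n", " ").strip() for c in padded]
        return "| " + " | ".join(cleaned) + " |"

    header, *body = rows
    separator = "|" + "|".join(["---"] * width) + "|"
    return "\n".join([render(header), separator, *(render(r) for r in body)])
```

Escaping `|` matters because an unescaped pipe inside a cell would otherwise be read as a column boundary.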
Quick Start
Run without installation (uvx)
```bash
# Convert directly without installing
uvx pyhwp2md document.hwp

# Save to file
uvx pyhwp2md document.hwp -s

# Specify output path
uvx pyhwp2md document.hwpx -o output.md
```
Installation
Using pip
```bash
pip install pyhwp2md
```
Using uv
```bash
uv pip install pyhwp2md
```
From source
```bash
git clone https://github.com/pitzcarraldo/pyhwp2md.git
cd pyhwp2md
pip install -e .
```
Usage
Command Line
```bash
# Output to stdout (default)
pyhwp2md document.hwp

# Save to .md file in same directory
pyhwp2md document.hwp -s
pyhwp2md document.hwpx --save

# Specify output path
pyhwp2md document.hwp -o output.md
```
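When calling the CLI from a script, it can help to spell out the output path explicitly with `-o` rather than rely on `-s`. The sketch below builds such a command line. `default_markdown_path` and `build_command` are hypothetical helpers, and the assumption that `-s` keeps the source file's base name and only swaps the extension is not confirmed by this README.

```python
from __future__ import annotations

from pathlib import Path


def default_markdown_path(source: str | Path) -> Path:
    """Same directory, same stem, .md extension (assumed -s behaviour)."""
    return Path(source).with_suffix(".md")


def build_command(source: str | Path, output: str | Path | None = None) -> list[str]:
    """Build an argv list that uses only the documented -o flag."""
    target = Path(output) if output is not None else default_markdown_path(source)
    return ["pyhwp2md", str(source), "-o", str(target)]
```

The resulting list can be passed to `subprocess.run(..., check=True)` once `pyhwp2md` is on the `PATH`.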
Python API
```python
from pyhwp2md import convert

# Convert and get markdown string
markdown = convert("document.hwp")
print(markdown)

# Convert and save to file
markdown = convert("document.hwpx", output_path="output.md")
```
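A common follow-up is converting a whole folder. The sketch below walks a directory tree and calls a converter for each `.hwp` or `.hwpx` file. `find_documents` and `convert_tree` are hypothetical helpers, not part of pyhwp2md. The converter is passed in as an argument and is only assumed to accept the `convert(path, output_path=...)` form shown above.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable

HWP_SUFFIXES = {".hwp", ".hwpx"}


def find_documents(root: Path) -> list[Path]:
    """Return all .hwp/.hwpx files under root, sorted for stable order."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in HWP_SUFFIXES
    )


def convert_tree(
    root: str | Path,
    convert_fn: Callable[..., str],
) -> dict[Path, Exception | None]:
    """Convert every document under root, writing a sibling .md file.

    Returns a mapping of source path to None on success, or to the
    exception raised for that file, so one bad file does not stop the run.
    Note: a.hwp and a.hwpx in the same folder would both target a.md.
    """
    results: dict[Path, Exception | None] = {}
    for source in find_documents(Path(root)):
        target = source.with_suffix(".md")
        try:
            convert_fn(str(source), output_path=str(target))
            results[source] = None
        except Exception as exc:  # record the failure and keep going
            results[source] = exc
    return results
```

With the package installed, usage would look like `convert_tree("docs", convert)` after `from pyhwp2md import convert`. Which exceptions `convert` raises on malformed input is not documented here, so the sketch catches `Exception` broadly.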
Supported Formats
| Format | Extension | Description |
|---|---|---|
| HWP | .hwp | Binary format (HWP 5.0+) |
| HWPX | .hwpx | XML-based format |
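File extensions are not always reliable, so a quick container check before conversion can be useful. HWP 5.0 files are stored as OLE compound files and HWPX files are ZIP packages, so the leading bytes distinguish the two. The sketch below is a hypothetical pre-check (`sniff_container` is not a pyhwp2md function) and only identifies the container. Other formats share the same signatures, so a match does not prove the file is a valid HWP or HWPX document.

```python
from __future__ import annotations

from pathlib import Path

OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE compound file header
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP local file header


def sniff_container(path: str | Path) -> str | None:
    """Return 'ole' (HWP 5.0 container), 'zip' (HWPX container), or None."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head.startswith(OLE_MAGIC):
        return "ole"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return None
```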
Supported Elements
- ✅ Paragraphs
- ✅ Headings (H1-H6)
- ✅ Tables
- ✅ Lists (bulleted/numbered)
- ✅ Line breaks
- ⚠️ Images (coming soon)
- ⚠️ Links (partial support)
Development
Setup
```bash
# Clone repository
git clone https://github.com/pitzcarraldo/pyhwp2md.git
cd pyhwp2md

# Install with dev dependencies (quoted so shells such as zsh do not expand the brackets)
pip install -e ".[dev]"
```
Running Tests
```bash
# Run tests
pytest

# Run tests with coverage
pytest --cov=pyhwp2md

# Run linter
ruff check src/ tests/

# Run type checker
mypy src/
```
Dependencies
- pyhwp - HWP binary file parser
- python-hwpx - HWPX XML file parser
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Acknowledgments
- pyhwp by mete0r
- python-hwpx by airmang
File details
Details for the file pyhwp2md-0.1.2.tar.gz.
File metadata
- Download URL: pyhwp2md-0.1.2.tar.gz
- Upload date:
- Size: 10.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5c38e13596ab4e9e9da7226b6e846b50bb522594c2010d96bdb6f1bf246360f3 |
| MD5 | ca507b45404c3cbecb9ed010994c9721 |
| BLAKE2b-256 | 8d3072216d7a4d6c1e8dfdb4921adb2e1d656b3b5b69758f8857b8a6aba69428 |
File details
Details for the file pyhwp2md-0.1.2-py3-none-any.whl.
File metadata
- Download URL: pyhwp2md-0.1.2-py3-none-any.whl
- Upload date:
- Size: 12.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b9921be507bacecfad52ddd2e1c2071eedcf4e1bb5cb0d98ef38165540dbed38 |
| MD5 | d821f192d1889cab58001836be769b8f |
| BLAKE2b-256 | 3577942543eda20c62ecdf465f4ecebc0cbf7c1eada383a8ef230c0e99217dff |
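The SHA256 digests listed above can be used to check a manually downloaded file. The sketch below uses only the standard library. `sha256_of` and `matches_digest` are hypothetical helper names, and the local file path is whatever you saved the download as.

```python
from __future__ import annotations

import hashlib
import hmac
from pathlib import Path


def sha256_of(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str | Path, expected_hex: str) -> bool:
    """Compare the file's SHA256 with an expected hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, `matches_digest("pyhwp2md-0.1.2.tar.gz", "<SHA256 from the table>")` should return `True` for an intact download of the source distribution.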