Convert HWP and HWPX files to Markdown
pyhwp2md
Convert HWP (Hangul Word Processor) and HWPX files to Markdown format.
Features
- 🔄 Converts both HWP (binary) and HWPX (XML) files
- 📝 Extracts text, paragraphs, and tables
- 📊 Converts tables to Markdown pipe format
- 🎯 Simple command-line interface
- 🐍 Python 3.10+ support
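To illustrate what "Markdown pipe format" means for tables, here is a minimal sketch of rendering rows of cells as a pipe table. The function name `rows_to_pipe_table` is hypothetical and is not part of pyhwp2md. The escaping rules and the choice to treat the first row as the header are assumptions; the library's own output may differ.

```python
def rows_to_pipe_table(rows: list[list[str]]) -> str:
    """Render rows as a Markdown pipe table (hypothetical helper, not pyhwp2md API).

    The first row is treated as the header row (an assumption for this sketch).
    """
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(row: list[str]) -> str:
        # Escape pipes and flatten newlines so a cell cannot break the table.
        cells = [str(c).replace("|", "\\|").replace("\n", " ") for c in row]
        # Pad short rows so every line has the same number of columns.
        cells += [""] * (width - len(cells))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    lines = [fmt(header), "|" + "---|" * width]
    lines.extend(fmt(row) for row in body)
    return "\n".join(lines)


print(rows_to_pipe_table([["Format", "Extension"], ["HWP", ".hwp"]]))
```

Padding short rows keeps the column count consistent, which most Markdown renderers need in order to recognize the table.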
Quick Start
Run without installation (uvx)
# Convert directly without installing
uvx pyhwp2md document.hwp
# Save to file
uvx pyhwp2md document.hwp -s
# Specify output path
uvx pyhwp2md document.hwpx -o output.md
Installation
Using pip
pip install pyhwp2md
Using uv
uv pip install pyhwp2md
From source
git clone https://github.com/pitzcarraldo/pyhwp2md.git
cd pyhwp2md
pip install -e .
Usage
Command Line
# Output to stdout (default)
pyhwp2md document.hwp
# Save to .md file in same directory
pyhwp2md document.hwp -s
pyhwp2md document.hwpx --save
# Specify output path
pyhwp2md document.hwp -o output.md
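The `-s` / `--save` option is described as writing a .md file in the same directory as the input. A plausible way to derive that path is sketched below. `default_output_path` is a hypothetical name, and the assumption that only the extension changes is an inference from the description above, not confirmed behavior of the CLI.

```python
from pathlib import Path


def default_output_path(input_path: str) -> Path:
    """Return the input path with its extension swapped for .md (hypothetical helper)."""
    # Same directory, same base name, only the suffix changes (assumed behavior).
    return Path(input_path).with_suffix(".md")


print(default_output_path("docs/report.hwp"))
```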
Python API
from pyhwp2md import convert
# Convert and get markdown string
markdown = convert("document.hwp")
print(markdown)
# Convert and save to file
markdown = convert("document.hwpx", output_path="output.md")
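For converting many files at once, a small wrapper around `convert` may be convenient. The sketch below is not part of pyhwp2md: `convert_directory` is a hypothetical helper that takes the converter as a parameter, assuming only what the examples above show (a callable that accepts a file path and returns a Markdown string).

```python
from pathlib import Path
from typing import Callable


def convert_directory(src_dir: str, converter: Callable[[str], str]) -> list[Path]:
    """Convert every .hwp/.hwpx file directly under src_dir (hypothetical helper).

    Each result is written next to its source with a .md extension.
    """
    written: list[Path] = []
    # Materialize the listing first so the new .md files do not affect iteration.
    for path in sorted(Path(src_dir).iterdir()):
        if not path.is_file() or path.suffix.lower() not in {".hwp", ".hwpx"}:
            continue
        target = path.with_suffix(".md")
        target.write_text(converter(str(path)), encoding="utf-8")
        written.append(target)
    return written


# Intended usage, assuming pyhwp2md is installed:
# from pyhwp2md import convert
# convert_directory("documents", convert)
```

If a directory holds both `report.hwp` and `report.hwpx`, the two outputs would share the name `report.md` and the later one would overwrite the earlier; a real script would need to handle that case.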
Supported Formats
| Format | Extension | Description |
|---|---|---|
| HWP | .hwp | Binary format (HWP 5.0+) |
| HWPX | .hwpx | XML-based format |
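A caller that wants to reject unsupported files before conversion could check the extension against the table above. This is a minimal sketch: `detect_format` and `FORMATS` are hypothetical names, and whether pyhwp2md itself selects a parser by extension or by inspecting file contents is not stated here.

```python
from pathlib import Path

# Extension-to-format mapping taken from the table above.
FORMATS = {".hwp": "HWP", ".hwpx": "HWPX"}


def detect_format(path: str) -> str:
    """Return "HWP" or "HWPX" based on the file extension (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    try:
        return FORMATS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported extension: {suffix!r}") from None


print(detect_format("document.hwpx"))
```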
Supported Elements
- ✅ Paragraphs
- ✅ Headings (H1-H6)
- ✅ Tables
- ✅ Lists (bulleted/numbered)
- ✅ Line breaks
- ⚠️ Images (coming soon)
- ⚠️ Links (partial support)
Development
Setup
# Clone repository
git clone https://github.com/pitzcarraldo/pyhwp2md.git
cd pyhwp2md
# Install with dev dependencies
# (quotes keep shells such as zsh from expanding the brackets)
pip install -e ".[dev]"
Running Tests
# Run tests
pytest
# Run tests with coverage
pytest --cov=pyhwp2md
# Run linter
ruff check src/ tests/
# Run type checker
mypy src/
Dependencies
- pyhwp - HWP binary file parser
- python-hwpx - HWPX XML file parser
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Acknowledgments
- pyhwp by mete0r
- python-hwpx by airmang