Convert HWP and HWPX files to Markdown
pyhwp2md
Convert HWP (Hangul Word Processor) and HWPX files to Markdown format.
Features
- 🔄 Convert both HWP (binary) and HWPX (XML) files
- 📝 Extracts text, paragraphs, and tables
- 📊 Converts tables to Markdown pipe format
- 🎯 Simple CLI interface
- 🐍 Python 3.10+ support
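pyhwp2md's own table renderer isn't reproduced here. As a rough illustration of what "Markdown pipe format" means for a parsed table, the sketch below assumes a table arrives as a list of rows of plain cell text, with the first row used as the header. `to_pipe_table` is a hypothetical name invented for this example, not part of the package API.

```python
from __future__ import annotations


def to_pipe_table(rows: list[list[str]]) -> str:
    """Render rows of cell text as a Markdown pipe table; the first row is the header."""
    if not rows:
        return ""
    width = max(len(r) for r in rows)

    def cell(text: str) -> str:
        # A raw "|" would split the cell and a newline would end the row,
        # so escape the pipe and turn newlines into inline <br> tags.
        return text.replace("|", "\\|").replace("\n", "<br>").strip()

    def line(row: list[str]) -> str:
        padded = list(row) + [""] * (width - len(row))  # pad ragged rows
        return "| " + " | ".join(cell(c) for c in padded) + " |"

    header, *body = rows
    out = [line(header), "|" + "|".join(["---"] * width) + "|"]
    out.extend(line(r) for r in body)
    return "\n".join(out)
```

Pipe tables have no notion of merged cells, so any rowspan/colspan in the source table has to be flattened (for example by repeating or blanking cells) before a function like this sees it.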
Quick Start
Run without installation (uvx)
# Convert directly without installing
uvx pyhwp2md document.hwp
# Save to a .md file in the same directory as the input
uvx pyhwp2md document.hwp -s
# Specify output path
uvx pyhwp2md document.hwpx -o output.md
Installation
Using pip
pip install pyhwp2md
Using uv
uv pip install pyhwp2md
From source
git clone https://github.com/pitzcarraldo/pyhwp2md.git
cd pyhwp2md
pip install -e .
Usage
Command Line
# Output to stdout (default)
pyhwp2md document.hwp
# Save to a .md file in the same directory
pyhwp2md document.hwp -s
pyhwp2md document.hwpx --save
# Specify output path
pyhwp2md document.hwp -o output.md
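The three output modes above can be summarized as a small decision function. This is a hypothetical sketch of the documented behaviour, not the CLI's actual code: `resolve_output` is an invented name, and letting `-o` win when both `-s` and `-o` are passed is an assumption.

```python
from __future__ import annotations

from pathlib import Path


def resolve_output(input_path: str, save: bool = False, output: str | None = None) -> Path | None:
    """Return where Markdown should be written, or None to print to stdout.

    Hypothetical helper mirroring the documented flags:
      no flag     -> stdout (None)
      -s/--save   -> same directory as the input, extension swapped to .md
      -o PATH     -> PATH as given (assumed to take precedence over -s)
    """
    if output is not None:
        return Path(output)
    if save:
        return Path(input_path).with_suffix(".md")
    return None
```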
Python API
from pyhwp2md import convert
# Convert and get markdown string
markdown = convert("document.hwp")
print(markdown)
# Convert and save to file
markdown = convert("document.hwpx", output_path="output.md")
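For a whole folder of documents, a thin wrapper around `convert` is enough. The sketch below relies only on the call shape shown above (`convert(path)` returning a Markdown string). `convert_tree` and its `converter` parameter are hypothetical; the parameter exists so the function can be exercised without real HWP files.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def convert_tree(src: str | Path, dst: str | Path,
                 converter: Callable[[str], str] | None = None) -> list[Path]:
    """Convert every .hwp/.hwpx file under src into a mirrored tree of .md files under dst."""
    if converter is None:
        # Only the documented call shape is used: convert(path) -> Markdown string.
        from pyhwp2md import convert as converter
    src, dst = Path(src), Path(dst)
    written: list[Path] = []
    for path in sorted(src.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in {".hwp", ".hwpx"}:
            continue
        # Keep the relative layout and swap the extension for .md.
        target = (dst / path.relative_to(src)).with_suffix(".md")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(converter(str(path)), encoding="utf-8")
        written.append(target)
    return written
```

If a folder contains both `a.hwp` and `a.hwpx`, both map to `a.md` and the one processed later overwrites the other; rename or check for collisions first if that matters.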
Supported Formats
| Format | Extension | Description |
|---|---|---|
| HWP | .hwp | Binary format (HWP 5.0+) |
| HWPX | .hwpx | XML-based format |
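The table distinguishes the formats by extension. When files arrive with a missing or wrong extension, their leading bytes are a cheap extra check: HWP 5.0 documents are OLE2 Compound File Binary containers and HWPX documents are ZIP packages. `sniff_hwp_format` below is a hypothetical helper, not part of pyhwp2md. Note that it only identifies the container type, so any OLE2 file (such as a legacy `.doc`) or any ZIP file would also match.

```python
from __future__ import annotations

from pathlib import Path

# Container signatures: OLE2 / Compound File Binary (HWP 5.0) and ZIP local file header (HWPX).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
ZIP_MAGIC = b"PK\x03\x04"


def sniff_hwp_format(path: str | Path) -> str | None:
    """Return "hwp", "hwpx", or None based on the file's first bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head == OLE2_MAGIC:
        return "hwp"
    if head.startswith(ZIP_MAGIC):
        return "hwpx"
    return None
```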
Supported Elements
- ✅ Paragraphs
- ✅ Headings (H1-H6)
- ✅ Tables
- ✅ Lists (bulleted/numbered)
- ✅ Line breaks
- ⚠️ Images (coming soon)
- ⚠️ Links (partial support)
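To make the element list concrete, the sketch below shows one way headings, paragraphs with line breaks, and bulleted/numbered lists map onto Markdown syntax. The `Block` model and `render` function are hypothetical stand-ins for whatever intermediate representation the converter uses internally; only the Markdown output conventions (ATX headings limited to H1-H6, backslash hard line breaks, tight lists) are standard CommonMark.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Block:
    kind: str       # "heading", "paragraph", "bullet" or "numbered"
    text: str
    level: int = 1  # heading level; ignored for other kinds


def render(blocks: list[Block]) -> str:
    parts: list[str] = []
    number = 0           # running counter within a run of numbered items
    prev_is_list = False
    for b in blocks:
        number = number + 1 if b.kind == "numbered" else 0
        if b.kind == "heading":
            line = "#" * min(max(b.level, 1), 6) + " " + b.text  # clamp to H1-H6
        elif b.kind == "bullet":
            line = "- " + b.text
        elif b.kind == "numbered":
            line = f"{number}. {b.text}"
        else:
            # A backslash before a newline is a CommonMark hard line break.
            line = b.text.replace("\n", "\\\n")
        is_list = b.kind in ("bullet", "numbered")
        if parts:
            # Consecutive list items stay tight; other blocks get a blank line between them.
            parts.append("\n" if is_list and prev_is_list else "\n\n")
        parts.append(line)
        prev_is_list = is_list
    return "".join(parts)
```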
Development
Setup
# Clone repository
git clone https://github.com/pitzcarraldo/pyhwp2md.git
cd pyhwp2md
# Install with dev dependencies
pip install -e ".[dev]"
Running Tests
# Run tests
pytest
# Run tests with coverage
pytest --cov=pyhwp2md
# Run linter
ruff check src/ tests/
# Run type checker
mypy src/
Dependencies
- pyhwp - HWP binary file parser
- python-hwpx - HWPX XML file parser
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Acknowledgments
- pyhwp by mete0r
- python-hwpx by airmang
Download files
Source Distribution
pyhwp2md-0.1.3.tar.gz (10.8 kB)
Built Distribution
pyhwp2md-0.1.3-py3-none-any.whl (13.5 kB)
File details
Details for the file pyhwp2md-0.1.3.tar.gz.
File metadata
- Download URL: pyhwp2md-0.1.3.tar.gz
- Upload date:
- Size: 10.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 26ebe5948635f26dfd08932dcd79501cc739ea2a755eb7f903925a0ec08ad7bf |
| MD5 | 9cd6850415c805392c03bbb4d9792ed1 |
| BLAKE2b-256 | eef163508720a4eb3efcc7e5816ea1c26e9273d53f60b8cf5ff9180e2c7dfb1c |
File details
Details for the file pyhwp2md-0.1.3-py3-none-any.whl.
File metadata
- Download URL: pyhwp2md-0.1.3-py3-none-any.whl
- Upload date:
- Size: 13.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 05dcc9c575430ec79a232dd3d9470e821a828af5d27a3a0aab875ae661c218d7 |
| MD5 | d5d34a3e82f2bcc96cd13b3996faa854 |
| BLAKE2b-256 | df02a527567e8a982d8cb24e47b7e365fc16a1ab3a325651d75411f48b5c5ec8 |
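To check a downloaded file against the SHA256 digests listed in the File hashes tables, the standard library is sufficient. `sha256_of` and `verify` are illustrative helper names; the file is read in chunks so memory use stays flat regardless of size (on Python 3.11+, `hashlib.file_digest` does the same job).

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: str | Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 digest of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str | Path, expected_sha256: str) -> bool:
    """True if the file's digest matches the published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, `verify("pyhwp2md-0.1.3-py3-none-any.whl", "05dcc9c575430ec79a232dd3d9470e821a828af5d27a3a0aab875ae661c218d7")` should return True for an unmodified copy of the wheel. For installs, pip's hash-checking mode (`pyhwp2md==0.1.3 --hash=sha256:<digest>` in a requirements file, used with `pip install --require-hashes -r requirements.txt`) performs the same comparison automatically.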