Read Korean HWP/HWPX documents in Python; edit paragraphs and table cells in HWPX; natural-language edits via Claude or OpenAI. AI-friendly API.
master-of-hwp
Read Korean HWP/HWPX documents in Python, edit paragraphs in HWPX, and expose structure to AI workflows.
master-of-hwp is a Python-first library for opening real .hwp and .hwpx files, inspecting sections / paragraphs / tables, querying content, and performing immutable paragraph edits. The API is designed to be LLM-friendly: results are plain Python data structures, every mutation returns a new document, and a round-trip fidelity harness validates that edits preserve document structure.
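As a rough illustration of the "every mutation returns a new document" idea, here is a minimal sketch using a frozen dataclass. `TinyDoc` is a hypothetical stand-in invented for this example; it is not the library's `HwpDocument` and says nothing about how the real class is implemented.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TinyDoc:
    """Hypothetical stand-in for an immutable document (not the real HwpDocument)."""

    # One tuple of paragraph strings per section.
    sections: tuple[tuple[str, ...], ...]

    def replace_paragraph(self, section: int, paragraph: int, text: str) -> "TinyDoc":
        """Return a new TinyDoc with one paragraph swapped; self is left untouched."""
        paragraphs = list(self.sections[section])
        paragraphs[paragraph] = text
        sections = list(self.sections)
        sections[section] = tuple(paragraphs)
        return replace(self, sections=tuple(sections))


original = TinyDoc(sections=(("Old intro", "Body"),))
edited = original.replace_paragraph(0, 0, "New intro text")

print(original.sections[0][0])  # Old intro
print(edited.sections[0][0])    # New intro text
```

Because the original object never changes, a caller (or an LLM agent) can always compare the two versions or discard the edit.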
Why this exists
Korean government, education, and enterprise workflows rely on HWP documents. Most AI tooling can't read them directly, so files get round-tripped through DOCX, which shreds tables and formatting. master-of-hwp reads the real format, exposes the structure AI needs, and keeps edits byte-level honest.
30-Second Quickstart
```
pip install master-of-hwp
```

```python
from master_of_hwp import HwpDocument

doc = HwpDocument.open("report.hwpx")

# Inspect
print(f"{doc.sections_count} sections, {len(list(doc.iter_paragraphs()))} paragraphs")
print(doc.summary())

# Query ("보도자료" means "press release")
for section, paragraph, text in doc.find_paragraphs("보도자료"):
    print(f"§{section}.{paragraph}: {text}")

# Edit (HWPX): immutable, returns a new document
edited = doc.replace_paragraph(0, 0, "New intro text")
edited.path.with_suffix(".edited.hwpx").write_bytes(edited.raw_bytes)
```
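The last quickstart line writes the edited bytes next to the source file rather than over it. The naming comes from standard `pathlib` behaviour, shown here in isolation; the `reports/` directory is a made-up example path.

```python
from pathlib import PurePosixPath

# with_suffix() swaps only the final suffix, so the original file name stays recognisable.
source = PurePosixPath("reports/report.hwpx")
target = source.with_suffix(".edited.hwpx")

print(target)  # reports/report.edited.hwpx
```

The source file is left untouched, which keeps a known-good copy around if an edit needs to be compared or thrown away.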
API at a Glance
| API | Purpose |
|---|---|
| `HwpDocument.open(path)` | Open .hwp / .hwpx as an immutable document |
| `.sections_count` | Number of sections |
| `.byte_size` | Size of raw bytes |
| `.section_texts` | Plain text per section |
| `.section_paragraphs` | Paragraphs per section (nested list) |
| `.section_tables` | Tables: `[section][table][row][cell][paragraph]` |
| `.plain_text` | All sections concatenated, format-agnostic normalization |
| `.iter_paragraphs()` | Yield `(section, paragraph, text)` tuples |
| `.find_paragraphs(query, regex=, case_sensitive=)` | Substring or regex search |
| `.summary()` | Compact JSON-serializable overview for LLM context |
| `.replace_paragraph(s, p, text)` | Return a new document with one paragraph replaced |
| `.replace_table_cell_paragraph(s, t, r, c, p, text)` | Edit a paragraph inside a table cell (HWPX) |
| `.ai_edit(request, provider=, dry_run=)` | Natural-language edit pipeline (intent → locate → apply → verify) |
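To make the index conventions in the table concrete, the sketch below re-creates the documented shapes with plain nested lists and walks them. The sample data, the standalone helper functions, and the keyword defaults (`regex=False`, `case_sensitive=True`) are assumptions for illustration; this is not the library's implementation.

```python
import re
from typing import Iterator

# Hypothetical data shaped like `.section_paragraphs`: [section][paragraph].
section_paragraphs = [
    ["보도자료", "2024 Budget Overview"],
    ["Appendix", "보도자료 문의처"],
]

# Hypothetical data shaped like `.section_tables`: [section][table][row][cell][paragraph].
section_tables = [[[[["Name"], ["Fee"]], [["Kim"], ["30,000"]]]]]


def iter_paragraphs(sections: list[list[str]]) -> Iterator[tuple[int, int, str]]:
    """Yield (section, paragraph, text), mirroring the documented tuple order."""
    for s, paragraphs in enumerate(sections):
        for p, text in enumerate(paragraphs):
            yield s, p, text


def find_paragraphs(sections, query, regex=False, case_sensitive=True):
    """Substring search by default; treat `query` as a pattern when regex=True."""
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(query if regex else re.escape(query), flags)
    return [(s, p, t) for s, p, t in iter_paragraphs(sections) if pattern.search(t)]


def iter_table_cells(tables):
    """Yield (s, t, r, c, p, text), the same index order the cell-edit call takes."""
    for s, section in enumerate(tables):
        for t, table in enumerate(section):
            for r, row in enumerate(table):
                for c, cell in enumerate(row):
                    for p, text in enumerate(cell):
                        yield s, t, r, c, p, text


hits = find_paragraphs(section_paragraphs, "보도자료")
cells = list(iter_table_cells(section_tables))
print(hits)       # [(0, 0, '보도자료'), (1, 1, '보도자료 문의처')]
print(cells[-1])  # (0, 0, 1, 1, 0, '30,000')
```

The index tuples printed here are the same coordinates that `replace_paragraph(s, p, text)` and `replace_table_cell_paragraph(s, t, r, c, p, text)` expect.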
Supported Formats
| Capability | HWP 5.0 (.hwp) | HWPX (.hwpx) |
|---|---|---|
| Open document | ✅ | ✅ |
| Count sections | ✅ | ✅ |
| Extract section text | ✅ | ✅ |
| Enumerate paragraphs | ✅ | ✅ |
| Enumerate tables | Best effort* | ✅ |
| Replace paragraph | Same-length only** | ✅ |
| Replace table cell paragraph | ❌ (v0.3) | ✅ |
| Insert / delete | ❌ (v0.3) | ❌ (v0.3) |
* Minimal heuristic anchored on the TABLE(0x5B) record; exact row/cell recovery is pending a richer record-level parser.
** Different-length HWP 5.0 edits require a CFBF stream resize writer, scheduled for v0.3.
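The two formats differ at the container level: HWP 5.0 is a CFBF (compound file) container, while HWPX is a ZIP package. The sketch below tells them apart by their leading bytes and checks the "same-length" condition for an in-place HWP 5.0 edit. Both helper names are hypothetical, and measuring length in UTF-16 code units is an assumption about what "same-length" means, not a statement of the library's exact rule.

```python
CFBF_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # Compound File Binary Format signature
ZIP_MAGIC = b"PK\x03\x04"  # local file header at the start of a non-empty ZIP archive


def sniff_format(head: bytes) -> str:
    """Guess the container from the first bytes of a file.

    The signatures identify the container only: other formats also use CFBF and ZIP,
    so a real reader still has to inspect the contents.
    """
    if head.startswith(CFBF_MAGIC):
        return "hwp"
    if head.startswith(ZIP_MAGIC):
        return "hwpx"
    return "unknown"


def same_length_utf16(old: str, new: str) -> bool:
    """Assumed rule: the replacement must occupy as many UTF-16 bytes as the original."""
    return len(old.encode("utf-16-le")) == len(new.encode("utf-16-le"))


print(sniff_format(CFBF_MAGIC + b"\x00" * 8))     # hwp
print(sniff_format(b"PK\x03\x04" + b"\x00" * 8))  # hwpx
print(same_length_utf16("급식비", "수업료"))        # True
print(same_length_utf16("급식비", "수업료 안내"))   # False
```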
Natural-Language Editing
```
pip install "master-of-hwp[ai]"  # adds anthropic SDK
export ANTHROPIC_API_KEY=sk-ant-...
```

```python
from master_of_hwp import HwpDocument
from master_of_hwp.ai.providers import AnthropicProvider

doc = HwpDocument.open("가정통신문.hwpx")

# Request: "Change '급식비' (meal fee) to '수업료' (tuition) in the first paragraph"
result = doc.ai_edit(
    "첫 번째 문단의 '급식비'를 '수업료'로 바꿔줘",
    provider=AnthropicProvider(),
)

if result.status == "applied":
    result.new_doc.path.with_suffix(".edited.hwpx").write_bytes(result.new_doc.raw_bytes)
else:
    print(result.message)  # refused / failed explanation
```
Without an API key, a rule-based fallback parser handles simple patterns (바꿔 "change", 변경 "modify", keyword matches). See master_of_hwp.ai.providers for the LLMProvider Protocol, which lets you plug in OpenAI, local Ollama, etc.
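The intent → locate → apply → verify flow can be sketched without any LLM for the simplest "replace X with Y" request. Everything below is a hypothetical illustration on a plain list of strings: the regex, the `EditResult` fields, and the `"dry_run"` status value are assumptions, and the sketch ignores positional hints such as "first paragraph", editing the first paragraph that contains the old text instead.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches requests such as "'급식비'를 '수업료'로 바꿔줘" (also with 변경).
REPLACE_RULE = re.compile(r"'(.+?)'\s*[을를]\s*'(.+?)'\s*(?:으로|로)\s*(?:바꿔|변경)")


@dataclass(frozen=True)
class EditResult:
    status: str  # "applied", "dry_run", "refused", or "failed"
    message: str
    paragraphs: tuple[str, ...]


def parse_intent(request: str) -> Optional[tuple[str, str]]:
    """Intent step: extract (old, new) from a quoted replace request."""
    match = REPLACE_RULE.search(request)
    return (match.group(1), match.group(2)) if match else None


def rule_based_edit(paragraphs: list[str], request: str, dry_run: bool = False) -> EditResult:
    intent = parse_intent(request)
    if intent is None:
        return EditResult("refused", "no supported pattern in request", tuple(paragraphs))
    old, new = intent

    # Locate step: first paragraph containing the old text.
    index = next((i for i, text in enumerate(paragraphs) if old in text), None)
    if index is None:
        return EditResult("refused", f"{old!r} not found", tuple(paragraphs))
    if dry_run:
        return EditResult("dry_run", f"would edit paragraph {index}", tuple(paragraphs))

    # Apply step: build a new list so the caller's input is never mutated.
    edited = list(paragraphs)
    edited[index] = edited[index].replace(old, new)

    # Verify step: the new text is present and the paragraph count is unchanged.
    if new not in edited[index] or len(edited) != len(paragraphs):
        return EditResult("failed", "verification failed", tuple(paragraphs))
    return EditResult("applied", f"edited paragraph {index}", tuple(edited))


result = rule_based_edit(
    ["이번 달 급식비 안내", "감사합니다"],
    "첫 번째 문단의 '급식비'를 '수업료'로 바꿔줘",
)
print(result.status, result.paragraphs[0])  # applied 이번 달 수업료 안내
```

A provider-backed version would swap only the intent step for a model call; the locate, apply, and verify steps can stay deterministic.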
Studio (Non-developer GUI)
For teachers and office workers who want a one-click experience, the rhwp WYSIWYG editor is bundled as of v0.2:
```
pip install master-of-hwp-studio
mohwp studio       # launches web GUI + MCP server + bundled rhwp editor
mohwp mcp-config   # prints Claude Desktop config snippet
```
No Node.js setup required. The rhwp editor runs automatically on localhost:7700.
See studio/README.md.
Fidelity Harness
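```python
from pathlib import Path

from master_of_hwp.fidelity.harness import verify_replace_roundtrip
from master_of_hwp.core.document import SourceFormat

# Raw bytes of the document under test (file name reused from the quickstart)
raw_bytes = Path("report.hwpx").read_bytes()

report = verify_replace_roundtrip(
    raw_bytes, SourceFormat.HWPX, section_index=0, paragraph_index=5, new_text="New content"
)
assert report.structural_equal
assert report.edited_paragraph_applied
```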
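For intuition about what the two report flags could mean, here is a minimal sketch that compares paragraph lists before and after an edit. The field names mirror the report attributes used above, but `check_replace` and its definition of "structural" (same section and paragraph counts, all other paragraphs unchanged) are assumptions for illustration, not the harness's actual checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoundtripReport:
    structural_equal: bool
    edited_paragraph_applied: bool


def check_replace(before, after, section_index, paragraph_index, new_text):
    """Compare [section][paragraph] lists taken before and after one replacement."""
    # Same number of sections and the same number of paragraphs in each section.
    same_shape = [len(s) for s in before] == [len(s) for s in after]
    # Every paragraph except the edited one must be identical.
    untouched_equal = same_shape and all(
        before[s][p] == after[s][p]
        for s in range(len(before))
        for p in range(len(before[s]))
        if (s, p) != (section_index, paragraph_index)
    )
    applied = same_shape and after[section_index][paragraph_index] == new_text
    return RoundtripReport(untouched_equal, applied)


before = [["Title", "Old body"], ["Footer"]]
after = [["Title", "New content"], ["Footer"]]
report = check_replace(before, after, section_index=0, paragraph_index=1, new_text="New content")
print(report)  # RoundtripReport(structural_equal=True, edited_paragraph_applied=True)
```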
Examples
```
python examples/01_read_sections.py samples/public-official/table-vpos-01.hwpx
python examples/02_extract_tables.py samples/public-official/table-vpos-01.hwpx
python examples/03_edit_paragraph.py samples/public-official/table-vpos-01.hwpx outputs/edited.hwpx
```
Roadmap
- v0.1 ✅ — Read path, HWPX paragraph replacement, fidelity harness, AI scaffold
- v0.2 — HWP 5.0 resize writer, paragraph insert/delete, table cell edit
- v0.3 — Full agentic edit loop (intent → locate → operate → verify → rollback)
- v1.0 — API compatibility contract starts
Details: docs/ROADMAP.md, docs/ARCHITECTURE.md.
Philosophy
- Platform-first — infrastructure, not a template app.
- Round-trip fidelity is the contract — opening and saving must not corrupt structure; proved by a benchmark, not a hope.
- Agentic document intelligence — documents should understand themselves.
- Solo OSS · no commercial pressure · quality first — take the time it needs.
Contributing
Contributions are very welcome — this is an open, community-driven project.
- 🐛 Bug reports / feature requests: open an issue
- 💻 Code contributions: fork → branch → PR. See CONTRIBUTING.md for dev setup, test expectations, and scope.
- 💬 Questions / discussion: GitHub Discussions
Areas we'd love help on:
- HWP 5.0 CFBF resize writer (v0.3)
- Paragraph insert / delete operations for both formats
- Additional LLM providers (OpenAI, Gemini, local Ollama) on top of the LLMProvider Protocol
- Windows / Linux installer for master-of-hwp-studio
- Accessibility improvements to the web GUI
No contribution is too small. Documentation fixes, typo corrections, and sample HWP files are equally valuable.
Acknowledgments
The WYSIWYG editor bundled in master-of-hwp-studio is built on rhwp by @edwardkim — a Rust + WebAssembly HWP parsing / rendering engine. This project would not be possible without their work. If you find master-of-hwp-studio useful, please star rhwp too.
License
MIT — see LICENSE.
Korean overview
For a Korean introduction to the project, see README.ko.md.