Read Korean HWP/HWPX documents in Python; edit paragraphs and table cells in HWPX; natural-language edits via Claude Code CLI, Codex CLI, or Anthropic/OpenAI API. AI-friendly API.

master-of-hwp

Read Korean HWP/HWPX documents in Python, edit paragraphs in HWPX, and expose structure to AI workflows.

master-of-hwp is a Python-first library for opening real .hwp and .hwpx files, inspecting sections / paragraphs / tables, querying content, and performing immutable paragraph edits. The API is designed to be LLM-friendly: results are plain Python data structures, every mutation returns a new document, and a round-trip fidelity harness validates that edits preserve document structure.
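The "every mutation returns a new document" idea can be pictured with a small stand-in. This is only a sketch: `TinyDoc` is a hypothetical class invented for illustration and is not the library's `HwpDocument`, whose internals may look quite different.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TinyDoc:
    """Hypothetical immutable document: a tuple of sections, each a tuple of paragraphs."""

    sections: tuple[tuple[str, ...], ...]

    def replace_paragraph(self, section: int, paragraph: int, text: str) -> "TinyDoc":
        paras = self.sections[section]
        if not 0 <= paragraph < len(paras):
            raise IndexError("paragraph index out of range")
        # Build new tuples instead of mutating; the original object stays untouched.
        new_paras = paras[:paragraph] + (text,) + paras[paragraph + 1:]
        new_sections = self.sections[:section] + (new_paras,) + self.sections[section + 1:]
        return replace(self, sections=new_sections)


original = TinyDoc(sections=(("Intro", "Body"),))
edited = original.replace_paragraph(0, 0, "New intro text")
print(original.sections[0][0])
print(edited.sections[0][0])
```

Because the original object never changes, a caller (or an LLM-driven loop) can keep it around for comparison or rollback.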

Why this exists

Korean government, education, and enterprise workflows rely on HWP documents. Most AI tooling cannot read them directly, so files get converted to DOCX and back, which mangles tables and formatting. master-of-hwp reads the native format, exposes the structure AI needs, and checks edits with a round-trip fidelity harness so that document structure is preserved.

30-Second Quickstart

pip install master-of-hwp
from master_of_hwp import HwpDocument

doc = HwpDocument.open("report.hwpx")

# Inspect
print(f"{doc.sections_count} sections, {len(list(doc.iter_paragraphs()))} paragraphs")
print(doc.summary())

# Query ("보도자료" means "press release")
for section, paragraph, text in doc.find_paragraphs("보도자료"):
    print(f"{section}.{paragraph}: {text}")

# Edit (HWPX) — immutable: returns a new document
edited = doc.replace_paragraph(0, 0, "New intro text")
edited.path.with_suffix(".edited.hwpx").write_bytes(edited.raw_bytes)
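To make the query step more concrete, here is a minimal sketch of a substring-or-regex search over nested paragraph lists. It is a standalone illustration, not the library's implementation: the function body, the default values of `regex` and `case_sensitive`, and the sample data are all assumptions.

```python
import re
from typing import Iterator


def find_paragraphs(
    section_paragraphs: list[list[str]],
    query: str,
    regex: bool = False,
    case_sensitive: bool = True,
) -> Iterator[tuple[int, int, str]]:
    """Yield (section, paragraph, text) for every paragraph matching the query."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # A plain substring query is escaped so that it is matched literally.
    pattern = re.compile(query if regex else re.escape(query), flags)
    for s, paragraphs in enumerate(section_paragraphs):
        for p, text in enumerate(paragraphs):
            if pattern.search(text):
                yield s, p, text


# Made-up sample data shaped like a nested list of paragraphs per section.
sample = [["보도자료 배포 안내", "Contact: press office"], ["Appendix"]]

for hit in find_paragraphs(sample, "보도자료"):
    print(hit)
for hit in find_paragraphs(sample, "appendix", case_sensitive=False):
    print(hit)
```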

API at a Glance

| API | Purpose |
|---|---|
| HwpDocument.open(path) | Open .hwp / .hwpx as an immutable document |
| .sections_count | Number of sections |
| .byte_size | Size of raw bytes |
| .section_texts | Plain text per section |
| .section_paragraphs | Paragraphs per section (nested list) |
| .section_tables | Tables: [section][table][row][cell][paragraph] |
| .plain_text | All sections concatenated, format-agnostic normalization |
| .iter_paragraphs() | Yield (section, paragraph, text) tuples |
| .find_paragraphs(query, regex=, case_sensitive=) | Substring or regex search |
| .summary() | Compact JSON-serializable overview for LLM context |
| .replace_paragraph(s, p, text) | Return a new document with one paragraph replaced |
| .replace_table_cell_paragraph(s, t, r, c, p, text) | Edit a paragraph inside a table cell (HWPX) |
| .ai_edit(request, provider=, dry_run=) | Natural-language edit pipeline (intent → locate → apply → verify) |
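The `[section][table][row][cell][paragraph]` nesting is easier to follow with a small walk over sample data. The sketch below assumes plain nested lists of strings and uses made-up content; `iter_cell_paragraphs` is a hypothetical helper, not part of the library.

```python
from typing import Iterator

Index = tuple[int, int, int, int, int]


def iter_cell_paragraphs(section_tables) -> Iterator[tuple[Index, str]]:
    """Yield ((section, table, row, cell, paragraph), text) for every cell paragraph."""
    for s, tables in enumerate(section_tables):
        for t, rows in enumerate(tables):
            for r, cells in enumerate(rows):
                for c, paragraphs in enumerate(cells):
                    for p, text in enumerate(paragraphs):
                        yield (s, t, r, c, p), text


# One section containing one 2x2 table; the last cell holds two paragraphs.
sample_tables = [[[
    [["Item"], ["Fee"]],
    [["Lunch"], ["30,000", "per month"]],
]]]

for index, text in iter_cell_paragraphs(sample_tables):
    print(index, text)
```

The index tuple is in the same order as the `s, t, r, c, p` arguments of `.replace_table_cell_paragraph`, so a located cell paragraph can be passed straight to an edit call.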

Supported Formats

| Capability | HWP 5.0 (.hwp) | HWPX (.hwpx) |
|---|---|---|
| Open document | ✅ | ✅ |
| Count sections | ✅ | ✅ |
| Extract section text | ✅ | ✅ |
| Enumerate paragraphs | ✅ | ✅ |
| Enumerate tables | Best effort* | ✅ |
| Replace paragraph | Same-length only** | ✅ |
| Replace table cell paragraph | ❌ (v0.3) | ✅ |
| Insert / delete | ❌ (v0.3) | ❌ (v0.3) |

* Minimal heuristic anchored on the TABLE(0x5B) record; exact row/cell recovery is pending a richer record-level parser.

** Different-length HWP 5.0 edits require a CFBF stream resize writer, scheduled for v0.3.
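The "same-length only" restriction can be illustrated with an in-place byte patch. This is a rough sketch under two assumptions: paragraph text is stored as UTF-16LE, and "same length" means the same encoded byte length. `replace_same_length` is a hypothetical helper, and the library's real check and writer may work differently.

```python
def replace_same_length(buffer: bytes, old_text: str, new_text: str) -> bytes:
    """Patch old_text with new_text without changing the buffer's size."""
    old = old_text.encode("utf-16-le")
    new = new_text.encode("utf-16-le")
    if len(old) != len(new):
        # A different length would shift every later byte and require resizing the stream.
        raise ValueError("replacement must have the same encoded length")
    # Only accept matches on a 2-byte boundary so UTF-16 code units stay aligned.
    offset = buffer.find(old)
    while offset != -1 and offset % 2:
        offset = buffer.find(old, offset + 1)
    if offset == -1:
        raise ValueError("old text not found")
    return buffer[:offset] + new + buffer[offset + len(old):]


# Made-up buffer: two header bytes, some text, two trailer bytes.
stream = b"\x01\x00" + "급식비 안내".encode("utf-16-le") + b"\xff\xff"
patched = replace_same_length(stream, "급식비", "수업료")
print(len(stream) == len(patched))
print(patched[2:-2].decode("utf-16-le"))
```

Since the buffer size never changes, surrounding offsets and size fields stay valid, which is why this kind of edit does not need a resize writer.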

Natural-Language Editing

pip install "master-of-hwp[ai]"  # adds anthropic SDK; quotes keep shells like zsh from expanding the brackets
export ANTHROPIC_API_KEY=sk-ant-...
from master_of_hwp import HwpDocument
from master_of_hwp.ai.providers import AnthropicProvider

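doc = HwpDocument.open("가정통신문.hwpx")  # a school notice sent home to parents
result = doc.ai_edit(
    # "Change '급식비' (meal fee) to '수업료' (tuition) in the first paragraph"
    "첫 번째 문단의 '급식비'를 '수업료'로 바꿔줘",
    provider=AnthropicProvider(),
)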
if result.status == "applied":
    result.new_doc.path.with_suffix(".edited.hwpx").write_bytes(result.new_doc.raw_bytes)
else:
    print(result.message)  # refused / failed explanation

Without an API key, a rule-based fallback parser handles simple patterns: the verbs 바꿔 ("change to") and 변경 ("modify"), plus keyword matches. See master_of_hwp.ai.providers for the LLMProvider Protocol, which lets you plug in OpenAI, a local Ollama model, and other backends.
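As a rough picture of what a rule-based fallback might do, the sketch below pulls a replace intent out of a request with one regular expression. Everything here is an assumption made for illustration (`ReplaceIntent`, `parse_replace_request`, and the pattern itself); the library's actual parser is not shown in this README.

```python
import re
from typing import NamedTuple, Optional


class ReplaceIntent(NamedTuple):
    """Hypothetical result type: replace `old` with `new`."""

    old: str
    new: str


# Matches requests shaped like: 'X'를 'Y'로 바꿔 / 'X'을 'Y'으로 변경
_REPLACE_PATTERN = re.compile(
    r"'([^']+)'[을를]\s*'([^']+)'(?:으로|로)\s*(?:바꿔|변경)"
)


def parse_replace_request(request: str) -> Optional[ReplaceIntent]:
    """Return the replace intent if the request fits the simple pattern, else None."""
    match = _REPLACE_PATTERN.search(request)
    if match is None:
        return None
    return ReplaceIntent(old=match.group(1), new=match.group(2))


print(parse_replace_request("첫 번째 문단의 '급식비'를 '수업료'로 바꿔줘"))
print(parse_replace_request("표를 삭제해줘"))  # "delete the table": not a replace request
```

Returning `None` for anything the pattern does not recognize lets the caller refuse the request instead of guessing.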

Studio (Non-developer GUI)

For teachers and office workers who want a one-click experience. As of v0.2, the rhwp WYSIWYG editor is bundled:

pip install master-of-hwp-studio
mohwp studio                    # launches web GUI + MCP server + bundled rhwp editor
mohwp mcp-config                # prints Claude Desktop config snippet

No Node.js setup required. The rhwp editor runs automatically on localhost:7700.

See studio/README.md.

Fidelity Harness

from pathlib import Path

from master_of_hwp.fidelity.harness import verify_replace_roundtrip
from master_of_hwp.core.document import SourceFormat

# Read the document's raw bytes (the file name is illustrative).
raw_bytes = Path("report.hwpx").read_bytes()

report = verify_replace_roundtrip(
    raw_bytes, SourceFormat.HWPX, section_index=0, paragraph_index=5, new_text="New content"
)
assert report.structural_equal
assert report.edited_paragraph_applied
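One way to read the two report fields is sketched below on plain nested lists. The field names are borrowed from the snippet above, but `RoundtripReport` and `check_roundtrip` are hypothetical, and the definition of "structurally equal" used here (same shape, only the target paragraph differs) is an assumption rather than the harness's documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoundtripReport:
    """Hypothetical report mirroring the two fields used above."""

    structural_equal: bool
    edited_paragraph_applied: bool


def check_roundtrip(before, after, section_index, paragraph_index, new_text):
    """Compare paragraph lists before and after an edit."""
    # Same number of sections, and the same number of paragraphs in each section.
    same_shape = [len(s) for s in before] == [len(s) for s in after]
    target = (section_index, paragraph_index)
    # Every paragraph other than the edited one must be unchanged.
    untouched_equal = same_shape and all(
        before[s][p] == after[s][p]
        for s in range(len(before))
        for p in range(len(before[s]))
        if (s, p) != target
    )
    applied = same_shape and after[section_index][paragraph_index] == new_text
    return RoundtripReport(structural_equal=untouched_equal, edited_paragraph_applied=applied)


before = [["a", "b"], ["c"]]
after = [["a", "X"], ["c"]]
report = check_roundtrip(before, after, 0, 1, "X")
print(report)
```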

Examples

python examples/01_read_sections.py  samples/public-official/table-vpos-01.hwpx
python examples/02_extract_tables.py samples/public-official/table-vpos-01.hwpx
python examples/03_edit_paragraph.py samples/public-official/table-vpos-01.hwpx outputs/edited.hwpx

Roadmap

  • v0.1 ✅ — Read path, HWPX paragraph replacement, fidelity harness, AI scaffold
  • v0.2 — HWP 5.0 resize writer, paragraph insert/delete, table cell edit
  • v0.3 — Full agentic edit loop (intent → locate → operate → verify → rollback)
  • v1.0 — API compatibility contract starts

Details: docs/ROADMAP.md, docs/ARCHITECTURE.md.

Philosophy

  • Platform-first — infrastructure, not a template app.
  • Round-trip fidelity is the contract — opening and saving must not corrupt structure; proved by a benchmark, not a hope.
  • Agentic document intelligence — documents should understand themselves.
  • Solo OSS · no commercial pressure · quality first — take the time it needs.

Contributing

Contributions are very welcome — this is an open, community-driven project.

Areas we'd love help on:

  • HWP 5.0 CFBF resize writer (v0.3)
  • Paragraph insert / delete operations for both formats
  • Additional LLM providers (OpenAI, Gemini, local Ollama) on top of the LLMProvider Protocol
  • Windows / Linux installer for master-of-hwp-studio
  • Accessibility improvements to the web GUI

No contribution is too small. Documentation fixes, typo corrections, and sample HWP files are equally valuable.

Acknowledgments

The WYSIWYG editor bundled in master-of-hwp-studio is built on rhwp by @edwardkim — a Rust + WebAssembly HWP parsing / rendering engine. This project would not be possible without their work. If you find master-of-hwp-studio useful, please star rhwp too.

License

MIT — see LICENSE.

Korean Overview

For a Korean-language introduction to the project, see README.ko.md.
