Read Korean HWP/HWPX documents in Python; edit paragraphs in HWPX. AI-friendly API.

master-of-hwp

Read Korean HWP/HWPX documents in Python, with paragraph editing for HWPX and an API designed for AI workflows.

master-of-hwp is a Python-first library for opening real .hwp and .hwpx files, inspecting sections, paragraphs, and tables, and performing immutable paragraph replacement where the underlying adapter supports it. Version 0.1.0 focuses on a dependable read path plus an initial write primitive for HWPX.

30-Second Quickstart

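pip install master-of-hwp

from pathlib import Path

from master_of_hwp import HwpDocument

# Open a sample HWPX file; the returned document object is immutable.
doc = HwpDocument.open("samples/public-official/table-vpos-01.hwpx")
print(doc.sections_count)

# Pick the first non-empty paragraph across all sections.
first_paragraph = next(
    text
    for paragraphs in doc.section_paragraphs
    for text in paragraphs
    if text
)
print(first_paragraph)

# Replace one paragraph; `doc` is left unchanged and a new document is returned.
edited = doc.replace_paragraph(0, 0, "PyPI quickstart paragraph")

# Make sure the output directory exists before writing the new package bytes.
output_path = Path("outputs/quickstart-edited.hwpx")
output_path.parent.mkdir(parents=True, exist_ok=True)
output_path.write_bytes(edited.raw_bytes)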

API at a Glance

| API | What it does |
|---|---|
| HwpDocument.open(path) | Open a .hwp or .hwpx file into an immutable document object |
| HwpDocument.sections_count | Count sections |
| HwpDocument.section_texts | Read plain text per section |
| HwpDocument.section_paragraphs | Read paragraphs per section |
| HwpDocument.section_tables | Read nested table data |
| HwpDocument.replace_paragraph(...) | Return a new document with one paragraph replaced |
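
The paragraph accessor composes with plain Python. As a minimal sketch, assuming section_paragraphs yields one sequence of paragraph strings per section (as the quickstart suggests), the hypothetical helper find_first_non_empty below locates the first paragraph with visible text and reports its position. The helper is not part of master-of-hwp, and the sample data only stands in for a real document.

```python
from typing import Iterable, Optional, Sequence, Tuple


def find_first_non_empty(
    section_paragraphs: Iterable[Sequence[str]],
) -> Optional[Tuple[int, int, str]]:
    """Return (section_index, paragraph_index, text) for the first paragraph
    that contains non-whitespace text, or None when every paragraph is blank.

    Hypothetical helper for illustration; not part of master-of-hwp.
    """
    for section_index, paragraphs in enumerate(section_paragraphs):
        for paragraph_index, text in enumerate(paragraphs):
            if text.strip():
                return section_index, paragraph_index, text
    return None


# Stand-in for doc.section_paragraphs (assumed shape: one list of strings per section).
sample = [["", "  "], ["Title", "Body text"]]
print(find_first_non_empty(sample))
```

With a real document this could be called as find_first_non_empty(doc.section_paragraphs). The returned indices might then be passed to replace_paragraph, assuming its first two arguments are the section and paragraph indices as the quickstart call implies.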

Supported Formats

| Capability | HWP 5.0 (.hwp) | HWPX (.hwpx) |
|---|---|---|
| Open document | Yes | Yes |
| Count sections | Yes | Yes |
| Extract section text | Yes | Yes |
| Enumerate paragraphs | Yes | Yes |
| Enumerate tables | Best effort | Yes |
| Replace paragraph | Same-text no-op only | Yes |
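
Because write support differs by format, it can help to check which container a file uses before attempting an edit. The sketch below relies only on well-known container signatures: HWP 5.0 files are OLE2 compound files and HWPX files are ZIP packages. The function names sniff_format and supports_paragraph_replacement are hypothetical and not part of master-of-hwp; the capability check simply mirrors the table above for version 0.1.0.

```python
# Container signatures (facts about the file containers, not library API).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file, used by HWP 5.0
ZIP_MAGIC = b"PK\x03\x04"  # ZIP local file header, used by HWPX


def sniff_format(header: bytes) -> str:
    """Classify a file by its leading bytes as 'hwp', 'hwpx', or 'unknown'."""
    if header.startswith(OLE2_MAGIC):
        return "hwp"
    if header.startswith(ZIP_MAGIC):
        return "hwpx"
    return "unknown"


def supports_paragraph_replacement(fmt: str) -> bool:
    """Mirror the 0.1.0 capability table: only HWPX allows real replacement."""
    return fmt == "hwpx"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify it."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(8))
```

A ZIP signature alone does not prove a file is a valid HWPX package, so this check is a quick pre-filter rather than validation.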

Quickstart Notes

  • replace_paragraph is a pure function: the original HwpDocument stays unchanged.
  • HWPX paragraph replacement rewrites the ZIP package in memory and returns new bytes.
  • HWP 5.0 write support is intentionally partial in 0.1.0 and will expand in 0.2.
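
The first two notes describe a general pattern: build a new ZIP package in memory and leave the input untouched. The following is a minimal standard-library sketch of that pattern, not the library's actual implementation. The function name replace_member and the member path Contents/section0.xml are illustrative assumptions, and a real HWPX edit would also need to produce valid section XML.

```python
import io
import zipfile


def replace_member(package: bytes, member: str, new_data: bytes) -> bytes:
    """Return new ZIP bytes with one member replaced; `package` is not modified.

    Illustrative sketch only; not the master-of-hwp implementation.
    """
    out_buffer = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as source:
        if member not in source.namelist():
            raise KeyError(f"{member!r} not found in package")
        with zipfile.ZipFile(out_buffer, "w") as target:
            for info in source.infolist():
                # Copy every entry as-is except the one being replaced.
                data = new_data if info.filename == member else source.read(info)
                # Passing the ZipInfo keeps each entry's name and compression type.
                target.writestr(info, data)
    return out_buffer.getvalue()


def build_demo_package() -> bytes:
    """Create a tiny ZIP that stands in for an HWPX package (assumed layout)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("Contents/section0.xml", "<p>old</p>")
        archive.writestr("other.txt", "keep")
    return buffer.getvalue()


def read_member(package: bytes, member: str) -> bytes:
    """Read one member from ZIP bytes held in memory."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        return archive.read(member)


original = build_demo_package()
edited = replace_member(original, "Contents/section0.xml", b"<p>new</p>")
```

Since bytes objects are immutable and the function only reads from its input, the original package can still be opened afterwards, which matches the pure-function behavior described for replace_paragraph.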

Examples

python examples/01_read_sections.py samples/public-official/table-vpos-01.hwpx
python examples/02_extract_tables.py samples/public-official/table-vpos-01.hwpx
python examples/03_edit_paragraph.py samples/public-official/table-vpos-01.hwpx outputs/edited.hwpx

Roadmap

  • v0.1 — Read path for HWP/HWPX, HWPX paragraph replacement, fidelity helpers
  • v0.2 — Broader write path: insert/delete operations and stronger HWP 5.0 editing support
  • v0.3 — AI-oriented editing loop and provider abstractions

Longer project direction lives in docs/ROADMAP.md and docs/ARCHITECTURE.md.

Maintainer Release Notes

  • The repository includes .github/workflows/release.yml for PyPI Trusted Publishing on v*.*.* tags.
  • PyPI project creation, Trusted Publisher registration, and release tagging are manual maintainer steps.
  • Validate a release locally with python -m build and python -m twine check dist/* before tagging.

Contributing

Contributions are welcome. Start with CONTRIBUTING.md for development setup, test expectations, and project scope.

License

MIT. See LICENSE.

Korean README

For the original Korean project overview, see README.ko.md.


Download files

Source Distribution

master_of_hwp-0.1.0.tar.gz (8.5 MB)

Uploaded Source

Built Distribution

master_of_hwp-0.1.0-py3-none-any.whl (24.5 kB)

Uploaded Python 3

File details

Details for the file master_of_hwp-0.1.0.tar.gz.

File metadata

  • Download URL: master_of_hwp-0.1.0.tar.gz
  • Upload date:
  • Size: 8.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for master_of_hwp-0.1.0.tar.gz
Algorithm Hash digest
SHA256 240de37a259149a229703516818de66752d911f28d80d30021c387a39340982d
MD5 78d081a8324a1a6a5cfc2852d9417acd
BLAKE2b-256 a506d62059332837e56c8951b660ca7e9f12ab1cd6efead4675644c580a23520

Provenance

The following attestation bundles were made for master_of_hwp-0.1.0.tar.gz:

Publisher: release.yml on reallygood83/master-of-hwp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file master_of_hwp-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: master_of_hwp-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 24.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for master_of_hwp-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4e958e7dac9a870f3189724b9052fe1bdb9d4342cc5fade4d3228f7071d1cdd9
MD5 63a1b2b595eea71e6a8544585c17e915
BLAKE2b-256 7c343aa21c308d2de7e39d3a8ae362ff1ac5b2074bcc54fe0c054b1cbe67a2e5

Provenance

The following attestation bundles were made for master_of_hwp-0.1.0-py3-none-any.whl:

Publisher: release.yml on reallygood83/master-of-hwp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
