Read Korean HWP/HWPX documents in Python; edit paragraphs in HWPX. AI-friendly API.
master-of-hwp
Read Korean HWP/HWPX documents in Python, with paragraph editing for HWPX and an API designed for AI workflows.
master-of-hwp is a Python-first library for opening real .hwp and .hwpx files, inspecting sections, paragraphs, and tables, and performing immutable paragraph replacement where the underlying adapter supports it. Version 0.1.0 focuses on a dependable read path plus an initial write primitive for HWPX.
30-Second Quickstart
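Install from PyPI:

    pip install master-of-hwp

Then open, read, and edit a document:

    from pathlib import Path

    from master_of_hwp import HwpDocument

    # Open a sample HWPX file as an immutable document object.
    doc = HwpDocument.open("samples/public-official/table-vpos-01.hwpx")
    print(doc.sections_count)

    # Find the first non-empty paragraph across all sections.
    first_paragraph = next(
        text
        for paragraphs in doc.section_paragraphs
        for text in paragraphs
        if text
    )
    print(first_paragraph)

    # replace_paragraph returns a new document; `doc` itself is unchanged.
    edited = doc.replace_paragraph(0, 0, "PyPI quickstart paragraph")

    # Create the output directory first so write_bytes does not fail.
    output_path = Path("outputs/quickstart-edited.hwpx")
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_bytes(edited.raw_bytes)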
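The quickstart's `next(...)` call raises `StopIteration` when every paragraph is empty. Below is a minimal sketch of a safer lookup, assuming `section_paragraphs` behaves like a sequence of sequences of strings (the shape the quickstart loop implies). `first_non_empty` is a hypothetical helper, not part of master-of-hwp, and the sample data is a stand-in for a real document.

```python
from __future__ import annotations

from collections.abc import Iterable


def first_non_empty(section_paragraphs: Iterable[Iterable[str]]) -> str | None:
    """Return the first non-empty paragraph text, or None if there is none."""
    return next(
        (text for paragraphs in section_paragraphs for text in paragraphs if text),
        None,  # default value avoids StopIteration on all-empty input
    )


# Stand-in data shaped like the nested structure the quickstart iterates over.
sample = [["", "First real paragraph"], ["Second section"]]
print(first_non_empty(sample))
print(first_non_empty([[""], []]))
```

With a real document, the same helper could presumably be called as `first_non_empty(doc.section_paragraphs)`.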
API at a Glance
| API | What it does |
|---|---|
| `HwpDocument.open(path)` | Open a .hwp or .hwpx file into an immutable document object |
| `HwpDocument.sections_count` | Count sections |
| `HwpDocument.section_texts` | Read plain text per section |
| `HwpDocument.section_paragraphs` | Read paragraphs per section |
| `HwpDocument.section_tables` | Read nested table data |
| `HwpDocument.replace_paragraph(...)` | Return a new document with one paragraph replaced |
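The table describes an immutable document whose `replace_paragraph` returns a new object rather than editing in place. The sketch below illustrates that general pattern with a frozen dataclass. `ToyDocument` and its `sections` field are hypothetical stand-ins, not the library's implementation, and the `(section, paragraph, text)` argument order is only inferred from the quickstart call `replace_paragraph(0, 0, ...)`.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyDocument:
    """Hypothetical stand-in for an immutable document (not master-of-hwp's class)."""

    sections: tuple[tuple[str, ...], ...]

    def replace_paragraph(self, section: int, paragraph: int, text: str) -> ToyDocument:
        old = self.sections[section]  # raises IndexError for a bad section index
        if not 0 <= paragraph < len(old):
            raise IndexError("paragraph index out of range")
        # Build new tuples instead of mutating; the original object is untouched.
        new_section = old[:paragraph] + (text,) + old[paragraph + 1:]
        new_sections = (
            self.sections[:section] + (new_section,) + self.sections[section + 1:]
        )
        return replace(self, sections=new_sections)


original = ToyDocument(sections=(("Hello", "World"),))
edited = original.replace_paragraph(0, 0, "Hi")
print(original.sections)
print(edited.sections)
```

Because the dataclass is frozen and its fields are tuples, callers can keep the original around for diffing or undo without defensive copies.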
Supported Formats
| Capability | HWP 5.0 (.hwp) | HWPX (.hwpx) |
|---|---|---|
| Open document | Yes | Yes |
| Count sections | Yes | Yes |
| Extract section text | Yes | Yes |
| Enumerate paragraphs | Yes | Yes |
| Enumerate tables | Best effort | Yes |
| Replace paragraph | Same-text no-op only | Yes |
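Because the two formats differ in capability, it can help to check which container a file uses before attempting an edit. The sketch below relies on two general facts: HWP 5.0 files are OLE compound files and HWPX files are ZIP packages. `sniff_container` is a hypothetical helper, not part of master-of-hwp. It only identifies the container by its leading bytes, does not prove the file is a valid document, and the library itself may detect formats differently.

```python
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary signature
ZIP_MAGIC = b"PK\x03\x04"  # ZIP local file header signature


def sniff_container(data: bytes) -> str:
    """Guess the container type from the first bytes of a file."""
    if data.startswith(OLE_MAGIC):
        return "hwp"  # OLE compound file, as used by HWP 5.0
    if data.startswith(ZIP_MAGIC):
        return "hwpx"  # ZIP package, as used by HWPX
    return "unknown"


print(sniff_container(OLE_MAGIC + b"\x00" * 8))
print(sniff_container(ZIP_MAGIC + b"\x00" * 8))
print(sniff_container(b"plain text"))
```

A caller could read the first eight bytes of a path and, for example, skip paragraph replacement when the result is `"hwp"`.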
Quickstart Notes
- `replace_paragraph` is a pure function: the original `HwpDocument` stays unchanged.
- HWPX paragraph replacement rewrites the ZIP package in memory and returns new bytes.
- HWP 5.0 write support is intentionally partial in `0.1.0` and will expand in `0.2`.
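To make "rewrites the ZIP package in memory and returns new bytes" concrete, here is a minimal standard-library sketch of that general technique. It is not the library's code: `rewrite_member`, `build_toy_package`, and the member names `meta.txt` and `section.xml` are hypothetical, and a real HWPX package has its own internal layout.

```python
import io
import zipfile


def rewrite_member(package: bytes, member: str, new_payload: bytes) -> bytes:
    """Return new ZIP bytes with one member replaced; the input bytes are not modified."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            payload = new_payload if info.filename == member else src.read(info)
            # Keep the original entry order, names, and compression method.
            dst.writestr(info.filename, payload, compress_type=info.compress_type)
    return out.getvalue()


def build_toy_package() -> bytes:
    """Create a tiny in-memory ZIP to stand in for a real package."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("meta.txt", "toy package")
        zf.writestr("section.xml", "<p>old</p>")
    return buf.getvalue()


original = build_toy_package()
edited = rewrite_member(original, "section.xml", b"<p>new</p>")
with zipfile.ZipFile(io.BytesIO(edited)) as zf:
    print(zf.namelist())
    print(zf.read("section.xml"))
```

Nothing touches the filesystem until the caller decides to write the returned bytes, which mirrors the `Path(...).write_bytes(edited.raw_bytes)` step in the quickstart.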
Examples
python examples/01_read_sections.py samples/public-official/table-vpos-01.hwpx
python examples/02_extract_tables.py samples/public-official/table-vpos-01.hwpx
python examples/03_edit_paragraph.py samples/public-official/table-vpos-01.hwpx outputs/edited.hwpx
Roadmap
- `v0.1`: Read path for HWP/HWPX, HWPX paragraph replacement, fidelity helpers
- `v0.2`: Broader write path: insert/delete operations and stronger HWP 5.0 editing support
- `v0.3`: AI-oriented editing loop and provider abstractions
Longer project direction lives in docs/ROADMAP.md and docs/ARCHITECTURE.md.
Maintainer Release Notes
- The repository includes `.github/workflows/release.yml` for PyPI Trusted Publishing on `v*.*.*` tags.
- PyPI project creation, Trusted Publisher registration, and release tagging are manual maintainer steps.
- Validate a release locally with `python -m build` and `python -m twine check dist/*` before tagging.
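Since tagging is a manual step, a quick local check that a tag name fits the `v*.*.*` pattern can catch typos before pushing. This is only a sketch: `looks_like_release_tag` is a hypothetical helper, and `fnmatchcase` approximates rather than reproduces GitHub Actions' filter syntax (its `*` also matches `/`, which GitHub's does not).

```python
from fnmatch import fnmatchcase

TAG_PATTERN = "v*.*.*"  # the glob the release workflow is described as using


def looks_like_release_tag(tag: str) -> bool:
    """Approximate check of a tag name against the release tag glob."""
    return fnmatchcase(tag, TAG_PATTERN)


for candidate in ("v0.1.0", "v0.1", "0.1.0"):
    print(candidate, looks_like_release_tag(candidate))
```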
Contributing
Contributions are welcome. Start with CONTRIBUTING.md for development setup, test expectations, and project scope.
License
MIT. See LICENSE.
Korean README
For the original Korean project overview, see README.ko.md.