
Python library for creating, editing, and previewing HWPX (Hancom Office) documents without Hancom Office


pyhwpxlib

Python library for creating, converting, and editing HWPX (Hancom Office) documents programmatically. No Hancom Office installation required.

Why pyhwpxlib?

  • Generate HWPX reports on a server with zero desktop dependencies
  • Convert Markdown, HTML, or legacy HWP 5.x files to HWPX
  • Auto-fill government forms and contracts from data
  • Let AI agents (Claude Code, Cursor, etc.) produce native Korean documents
  • Extract text, Markdown, or HTML from existing HWPX files
  • Merge multiple HWPX files into one

Install

pip install pyhwpxlib

This installs both the Python library and the pyhwpxlib CLI command.

Python 3.10+ required. No external dependencies for core features.

# Quote the extras so shells such as zsh do not treat the brackets as a glob pattern

# Optional: image support
pip install "pyhwpxlib[images]"    # Pillow

# Optional: faster XML parsing
pip install "pyhwpxlib[lxml]"      # lxml

# Optional: HWP 5.x → HWPX conversion
pip install "pyhwpxlib[hwp]"       # olefile

# Install everything
pip install "pyhwpxlib[all]"

Quick Start

Create a document in a few lines

from pyhwpxlib import HwpxBuilder

doc = HwpxBuilder()
doc.add_heading("Project Report", level=1)
doc.add_paragraph("April 2026")
doc.add_table([
    ["Item", "Qty", "Price"],
    ["Server", "3", "9,000,000"],
    ["License", "10", "5,000,000"],
])
doc.add_heading("1. Overview", level=2)
doc.add_paragraph("This report covers...")
doc.save("report.hwpx")

Convert Markdown from the terminal

pyhwpxlib md2hwpx report.md -o report.hwpx

Fill a government form template

from pyhwpxlib.api import fill_template_checkbox

fill_template_checkbox(
    "contract_template.hwpx",
    data={">Name<": ">Name  John Doe<"},
    checks=["Agree"],
    output_path="contract_filled.hwpx",
)

CLI Reference

pip install pyhwpxlib installs the pyhwpxlib command with 9 subcommands:

md2hwpx -- Markdown to HWPX

pyhwpxlib md2hwpx report.md -o report.hwpx
pyhwpxlib md2hwpx report.md -o report.hwpx -s github   # style preset

Auto-detects: headings (#), bold, italic, bullet/numbered lists, code blocks, tables, horizontal rules.

hwpx2html -- HWPX to HTML

pyhwpxlib hwpx2html document.hwpx -o document.html

Produces a self-contained HTML file with images embedded as base64.

text -- Extract text from HWPX

pyhwpxlib text document.hwpx                      # plain text (default)
pyhwpxlib text document.hwpx -f markdown           # as Markdown
pyhwpxlib text document.hwpx -f html               # as HTML

fill -- Fill template with data

# Key-value pairs
pyhwpxlib fill template.hwpx -o filled.hwpx -d name=Hong age=30

# From JSON file
pyhwpxlib fill template.hwpx -o filled.hwpx -d data.json
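
How the `-d` values are interpreted is not spelled out here. As a rough illustration only, a hypothetical helper (`parse_data_args`, not part of pyhwpxlib) could accept both forms like this: a single argument ending in `.json` is read as a JSON file, and anything else is split on the first `=`.

```python
import json
from pathlib import Path


def parse_data_args(values):
    """Turn CLI-style data arguments into a dict.

    Hypothetical helper for illustration; not pyhwpxlib's own parser.
    """
    # Assumption: a single *.json argument names a file holding the mapping.
    if len(values) == 1 and values[0].lower().endswith(".json"):
        return json.loads(Path(values[0]).read_text(encoding="utf-8"))

    data = {}
    for item in values:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        data[key] = value  # values stay strings, e.g. "30"
    return data
```

Under these assumptions, `parse_data_args(["name=Hong", "age=30"])` yields a dict with string values, so numeric fields would need converting by the caller.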

info -- Inspect HWPX file

pyhwpxlib info document.hwpx

Shows file size, section count, image list, text character/line counts, and a text preview.

merge -- Merge multiple HWPX files

pyhwpxlib merge part1.hwpx part2.hwpx part3.hwpx -o combined.hwpx

Inserts page breaks between documents automatically.

unpack -- Extract HWPX to folder

pyhwpxlib unpack document.hwpx -o unpacked/

Extracts all XML and binary files from the HWPX ZIP for direct editing.

pack -- Re-package folder as HWPX

pyhwpxlib pack unpacked/ -o output.hwpx

Re-creates a valid HWPX file from an unpacked folder. The mimetype entry is stored uncompressed per the OWPML spec.
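
To make the "stored uncompressed" rule concrete, here is a minimal standard-library sketch of packing a folder. It is not the library's implementation; `pack_folder` is a hypothetical name, and writing `mimetype` as the first entry is an assumption borrowed from similar ZIP-based formats.

```python
import zipfile
from pathlib import Path


def pack_folder(folder, output):
    """Zip `folder` into `output`, keeping `mimetype` uncompressed."""
    root = Path(folder)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    # Assumption: mimetype goes first, as in other ZIP-based document formats.
    files.sort(key=lambda p: p.relative_to(root).as_posix() != "mimetype")
    with zipfile.ZipFile(output, "w") as zf:
        for path in files:
            name = path.relative_to(root).as_posix()
            # Store mimetype as-is; deflate everything else.
            if name == "mimetype":
                method = zipfile.ZIP_STORED
            else:
                method = zipfile.ZIP_DEFLATED
            zf.write(path, arcname=name, compress_type=method)
```

Keeping `mimetype` uncompressed lets a reader identify the file type from the raw bytes without inflating anything.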

validate -- Validate HWPX structure

pyhwpxlib validate output.hwpx

Checks that the required files (mimetype, header.xml, section0.xml, content.hpf) are present and that the XML parses without errors. Returns exit code 0 on success, 1 on failure.
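
The same kind of check can be approximated with the standard library. The sketch below is an illustration, not the CLI's code: `check_hwpx` is a hypothetical name, required files are matched by base name because their folder inside the archive is not stated here, and only `.xml` entries are parsed.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

REQUIRED = ("mimetype", "header.xml", "section0.xml", "content.hpf")


def check_hwpx(source):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
        # Assumption: match by base name, whatever folder the entry is in.
        basenames = {PurePosixPath(name).name for name in names}
        for required in REQUIRED:
            if required not in basenames:
                problems.append(f"missing {required}")
        for name in names:
            if name.endswith(".xml"):
                try:
                    ET.fromstring(zf.read(name))
                except ET.ParseError as exc:
                    problems.append(f"{name}: {exc}")
    return problems
```

A script could mirror the exit-code convention with `sys.exit(1 if check_hwpx(path) else 0)`.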


Python API

Document Creation (HwpxBuilder)

High-level builder for creating HWPX documents. Includes table style presets (corporate, government, academic, default).

doc = HwpxBuilder(table_preset='corporate')

| Method | Description |
| --- | --- |
| add_heading(text, level) | Headings (level 1--4) |
| add_paragraph(text, bold, italic, font_size, text_color, alignment) | Styled paragraphs |
| add_table(data, header_bg, col_widths, merge_info, cell_colors, ...) | Tables with auto-preset styling |
| add_bullet_list(items, bullet_char) | Bullet lists (custom bullet character, e.g. -) |
| add_numbered_list(items, format_string) | Numbered lists (^1., ^1), (^1)) |
| add_nested_bullet_list(items) | Multi-level bullet lists (level 0--6) |
| add_nested_numbered_list(items) | Multi-level numbered lists |
| add_image(path, width, height) | Local image |
| add_image_from_url(url, width, height) | Image from URL (auto-download) |
| add_page_break() | Page break |
| add_line() | Horizontal divider |
| add_header(text) / add_footer(text) | Header / Footer |
| add_page_number(pos) | Page numbers (4 positions) |
| add_footnote(text) | Footnotes |
| add_equation(script) | Math equations |
| add_highlight(text, color) | Highlighted text |
| add_rectangle(...) / add_draw_line(...) | Shapes |
| save(path) | Save as .hwpx |
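
As a small sketch of how several of the methods above might be combined, the hypothetical helper below fills any builder-like object passed to it. It assumes the optional parameters (such as `bullet_char`) have defaults, which is not confirmed here, and it is not part of pyhwpxlib.

```python
def build_status_report(doc, title, highlights, rows):
    """Fill a builder-like object and return it.

    `doc` is expected to offer the HwpxBuilder methods listed above;
    this helper is illustrative and not part of pyhwpxlib.
    """
    doc.add_header(title)              # running header on each page
    doc.add_heading(title, level=1)    # document title
    doc.add_bullet_list(highlights)    # assumes bullet_char has a default
    doc.add_page_break()
    doc.add_heading("Details", level=2)
    doc.add_table(rows)                # first row is treated as the header
    return doc
```

With the library installed, this would presumably be called as `build_status_report(HwpxBuilder(table_preset='corporate'), "Status", [...], [...]).save("status.hwpx")`. Passing the builder in as an argument keeps the layout logic easy to test with a stand-in object.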

Low-Level API (pyhwpxlib.api)

For fine-grained control over the HWPX object model:

from pyhwpxlib.api import create_document, add_paragraph, add_table, save

doc = create_document()
add_paragraph(doc, "Hello, World!", bold=True, font_size=14)
add_table(doc, rows=3, cols=2, data=[["A","B"],["1","2"],["3","4"]])
save(doc, "output.hwpx")

Additional low-level functions:

| Category | Functions |
| --- | --- |
| Text | add_paragraph, add_styled_paragraph, add_heading, add_hyperlink, add_code_block |
| Lists | add_bullet_list, add_numbered_list, add_nested_bullet_list, add_nested_numbered_list |
| Tables | add_table (with merge, gradient, per-cell styles) |
| Images & Shapes | add_image, add_rectangle, add_ellipse, add_line, add_arc, add_polygon, add_curve, add_connect_line, add_textart, add_rectangle_with_image_fill |
| Layout | add_header, add_footer, add_page_number, add_page_break, set_page_setup, set_columns |
| Annotations | add_footnote, add_bookmark, add_indexmark, add_hidden_comment, add_highlight, add_dutmal |
| Special | add_equation, add_tab, add_special_char, add_container |
| Form Controls | add_checkbox, add_radio_button, add_button, add_combobox, add_listbox, add_edit_field, add_scrollbar |
| Conversion | convert_md_to_hwpx, convert_md_file_to_hwpx, convert_html_to_hwpx, convert_html_file_to_hwpx, convert_hwpx_to_html |
| Reading | open_document, extract_text, extract_markdown, extract_html |
| Templates | fill_template, fill_template_checkbox, fill_template_batch, extract_schema, analyze_schema_with_llm |
| Documents | merge_documents |
| Page Setup | set_page_setup(paper="A4"/"A3"/"B5"/"LETTER"/"LEGAL", landscape=True, margin_*) |

Conversions

| Direction | CLI | Python |
| --- | --- | --- |
| Markdown → HWPX | pyhwpxlib md2hwpx in.md -o out.hwpx | convert_md_file_to_hwpx("in.md", "out.hwpx") |
| HTML → HWPX | -- | convert_html_file_to_hwpx("in.html", "out.hwpx") |
| HWPX → HTML | pyhwpxlib hwpx2html in.hwpx -o out.html | convert_hwpx_to_html("in.hwpx", "out.html") |
| HWP 5.x → HWPX | -- | from pyhwpxlib.hwp2hwpx import convert; convert("old.hwp", "new.hwpx") |
| HWPX → Text | pyhwpxlib text in.hwpx | extract_text("document.hwpx") |
| HWPX → Markdown | pyhwpxlib text in.hwpx -f markdown | extract_markdown("document.hwpx") |
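
Because the file-based converters above share an `(input_path, output_path)` shape, a folder of files can be converted with a small loop. The sketch below is a hypothetical helper (`convert_folder` is not part of pyhwpxlib) that takes the converter as an argument, so any of the two-path functions could be passed in.

```python
from pathlib import Path


def convert_folder(src_dir, out_dir, convert):
    """Convert every *.md directly inside src_dir into out_dir.

    `convert` is any callable taking (input_path, output_path) strings,
    for example convert_md_file_to_hwpx from pyhwpxlib.api.
    """
    out_root = Path(out_dir)
    out_root.mkdir(parents=True, exist_ok=True)
    written = []
    for md in sorted(Path(src_dir).glob("*.md")):
        target = out_root / (md.stem + ".hwpx")
        convert(str(md), str(target))
        written.append(target)
    return written
```

A call would look like `convert_folder("docs", "out", convert_md_file_to_hwpx)`; the return value lists the target paths in the order they were processed.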

Template Automation

from pyhwpxlib.api import extract_schema, fill_template_checkbox, fill_template_batch

# 1. Discover what fields a template has
schema = extract_schema("form_template.hwpx")
print(schema)  # {'title': '...', 'fields': [...], 'checkboxes': [...]}

# 2. Fill a single document
fill_template_checkbox(
    "form_template.hwpx",
    data={">Name<": ">Name  Jane Doe<"},
    checks=["Agree"],
    output_path="filled.hwpx",
)

# 3. Batch-generate from a list of records
fill_template_batch(
    "form_template.hwpx",
    records=[
        {"data": {">Name<": ">Name  Alice<"}, "checks": ["Agree"]},
        {"data": {">Name<": ">Name  Bob<"},   "checks": ["Agree"]},
    ],
    output_dir="output/",
)
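
The `>` and `<` characters in the keys above hint that replacement may operate on the raw XML, where a label sits between a closing `>` and an opening `<`. That is an inference from the examples, not documented behaviour. Under that assumption, the sketch below shows why including the delimiters keeps a match anchored to a whole text node; `replace_in_xml` and the sample fragment are hypothetical.

```python
def replace_in_xml(xml, data):
    """Apply plain substring replacements to an XML string."""
    for old, new in data.items():
        xml = xml.replace(old, new)
    return xml


# Hypothetical fragment with two labels that both start with "Name".
fragment = "<hp:t>Name</hp:t><hp:t>Name of guarantor</hp:t>"

# Delimited key: only the text node that is exactly "Name" changes.
anchored = replace_in_xml(fragment, {">Name<": ">Name  John Doe<"})

# Bare key: every occurrence of "Name" changes, including the second label.
loose = replace_in_xml(fragment, {"Name": "Name  John Doe"})
```

If the library does work this way, keys should be copied from the template exactly, including spacing, since a substring match has no tolerance for differences.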

Edit Existing Documents (Unpack/Pack)

pyhwpxlib unpack document.hwpx -o unpacked/    # Extract ZIP to folder
# Edit XML files in unpacked/Contents/ directly
pyhwpxlib pack unpacked/ -o output.hwpx         # Re-package as HWPX
pyhwpxlib validate output.hwpx                  # Validate structure
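
When this workflow runs from a script, the validator's exit code (0 on success, 1 on failure) can gate the next step. The helper below is a hypothetical wrapper around the CLI; the `command` parameter is an illustration-only hook so that a different executable can be substituted.

```python
import subprocess


def is_valid_hwpx(path, command=("pyhwpxlib", "validate")):
    """Run the validator on `path` and report whether it exited with 0."""
    result = subprocess.run(
        [*command, str(path)],
        capture_output=True,  # keep the validator's output off the console
        text=True,
    )
    return result.returncode == 0
```

A build script might then stop early with `if not is_valid_hwpx("output.hwpx"): raise SystemExit(1)`.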

Preview (HWP/HWPX → SVG)

Render HWP or HWPX documents to SVG for visual inspection or LLM review:

pip install "pyhwpxlib[preview]"

from pyhwpxlib.rhwp_bridge import RhwpEngine

engine = RhwpEngine()  # one-time WASM load
with engine.load("sample.hwp") as doc:   # HWP or HWPX
    print(doc.page_count)
    svg = doc.render_page_svg(0, embed_fonts=True)   # Korean fonts embedded
    all_svgs = doc.render_all_svgs(embed_fonts=True)

Important: Always use embed_fonts=True when Korean text is involved. Without it, Korean characters may appear as empty boxes (tofu) on machines without Korean fonts installed.
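
For review workflows it is often handy to write the rendered pages to disk. The sketch below assumes that `render_all_svgs` returns a sequence of SVG strings, which is not confirmed here; `write_svg_pages` is a hypothetical helper, not part of pyhwpxlib.

```python
from pathlib import Path


def write_svg_pages(svgs, out_dir, prefix="page"):
    """Write each SVG string to out_dir as prefix-001.svg, prefix-002.svg, ..."""
    out_root = Path(out_dir)
    out_root.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, svg in enumerate(svgs, start=1):
        path = out_root / f"{prefix}-{number:03d}.svg"
        path.write_text(svg, encoding="utf-8")  # UTF-8 keeps Korean text intact
        paths.append(path)
    return paths
```

Under that assumption, `write_svg_pages(doc.render_all_svgs(embed_fonts=True), "preview/")` would produce one file per page.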

For accurate text measurement and font embedding, install the preview-fonts extra:

pip install "pyhwpxlib[preview-fonts]"

The preview feature is ideal for LLM-in-the-loop workflows where a language model generates an HWPX document and needs to visually verify the output.

Third-Party Notice

The preview feature bundles a pre-built WebAssembly binary from the rhwp project (MIT License, © 2025-2026 Edward Kim). The binary is redistributed unmodified. See NOTICE.md and pyhwpxlib/vendor/LICENSE.rhwp.txt for details.


What is HWPX?

HWPX is the modern document format for Hancom Office, the standard office suite in South Korea. It is a ZIP archive containing XML files (OWPML spec), similar to .docx for Microsoft Word, and is used by Korean government agencies, public institutions, and enterprises.
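
Since an HWPX file is an ordinary ZIP archive, its contents can be listed with the standard library alone. The helper name below is hypothetical, and the sketch makes no assumption about which entries a given file contains.

```python
import zipfile


def list_hwpx_entries(source):
    """Return (name, size_in_bytes) for every file in the archive."""
    with zipfile.ZipFile(source) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]
```

Printing the result for a real document is a quick way to see the layout before editing the XML by hand.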

Credits

| Project | Author | License | Usage |
| --- | --- | --- | --- |
| hwp2hwpx | neolord0 | Apache 2.0 | HWP→HWPX conversion (ported to Python) |
| hwplib | neolord0 | Apache 2.0 | HWP binary parser (ported to Python) |
| python-hwpx | Kyuhyun Ko | MIT | HWPX dataclass model |
| rhwp | Edward Kim | MIT | HWP/HWPX → SVG renderer (bundled WASM, [preview] extras) |

Known Limitations

  • Complex cell-merge layouts may require manual review
  • Preview requires the optional [preview] extra (SVG rendering); without it, verify output in Hancom Office or Whale
  • CSS→HWPX mapping covers 46 major properties only
  • Image OCR for form text requires a separate API key

License

Dual license -- see LICENSE.md for details.

| Files | License |
| --- | --- |
| hwp2hwpx.py, hwp_reader.py, value_convertor.py | Apache 2.0 (derivative works) |
| All other files | BSL 1.1 |

BSL 1.1 summary: Personal/non-commercial/educational/open-source use is free. Commercial use requires a license.
