
Convert HWP (Hangul Word Processor) files to HWPX format


hwp2hwpx

Convert HWP files to HWPX format: the only pip-installable HWP→HWPX converter.

HWP is the legacy binary format used by Hangul (한글), the dominant word processor in South Korea. HWPX is the modern XML-based format (an OWPML ZIP archive, similar in spirit to ODF). This package converts HWP to HWPX programmatically, with no Hangul installation or GUI required.


Why?

| Tool | What it does | Limitation |
|---|---|---|
| Hangul GUI | Open HWP → Save As HWPX | Manual, not scriptable |
| HwpxConverter.exe | GUI converter bundled with Hangul | No CLI, Windows only |
| kordoc | Parses HWP → Markdown/JSON | Extracts content, doesn't convert format |
| hwp2hwpx (this package) | Converts HWP → HWPX (valid ZIP/XML) | Needs Java runtime |

If you only need to read HWP content, use kordoc. If you need a real HWPX file that you can open and edit in Hangul, use this package.

Install

pip install hwp2hwpx

Requires Java Runtime (JRE) 8+:

# Windows
winget install EclipseAdoptium.Temurin.21.JDK

# macOS
brew install temurin

# Linux (Debian/Ubuntu)
sudo apt install default-jre
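Because the converter runs on the JVM, it can help to confirm that a Java 8+ runtime is actually on PATH before converting. A minimal sketch, assuming only that the `java` launcher is on PATH; `parse_java_major` and `java_major_version` are hypothetical helpers, not part of hwp2hwpx. It handles both the legacy `1.8.x` version scheme and the modern `21.x` scheme.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major version from `java -version` output.

    Legacy scheme: "1.8.0_392" -> 8. Modern scheme: "21.0.1" -> 21.
    Returns None if no version string is found.
    """
    m = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not m:
        return None
    first = int(m.group(1))
    if first == 1 and m.group(2):
        return int(m.group(2))  # legacy "1.x" numbering
    return first


def java_major_version() -> Optional[int]:
    """Return the major version of the `java` on PATH, or None if absent."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes its banner to stderr, not stdout.
    proc = subprocess.run([java, "-version"], capture_output=True, text=True)
    return parse_java_major(proc.stderr + proc.stdout)


if __name__ == "__main__":
    major = java_major_version()
    if major is None or major < 8:
        print("Java 8+ not found; install a JRE first.")
    else:
        print(f"Found Java {major}")
```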

Usage

CLI

# Single file
hwp2hwpx document.hwp

# Multiple files
hwp2hwpx *.hwp

# Output directory
hwp2hwpx document.hwp -o output/

# Recursive folder conversion
hwp2hwpx ./documents/ -r

Python API

from hwp2hwpx import convert, convert_batch

# Single file
output_path = convert("document.hwp")
output_path = convert("document.hwp", "output.hwpx")

# Batch
results = convert_batch(["a.hwp", "b.hwp"], output_dir="output/")
for input_path, output_path, error in results:
    if error:
        print(f"FAIL: {input_path}: {error}")
    else:
        print(f"OK: {output_path}")
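The CLI's `-r` flag covers recursive conversion; if you would rather drive it from Python, you can collect the files yourself and feed them to `convert_batch`. A minimal sketch, assuming only the `(input_path, output_path, error)` result shape shown in the example above; `collect_hwp_files`, `split_results`, `convert_tree` and the injectable `batch` parameter are hypothetical, and matching `.HWP` case-insensitively is this sketch's own choice.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Optional, Tuple

# One convert_batch() result, unpacked as in the example above:
# (input_path, output_path, error); error is falsy on success.
Result = Tuple[str, Optional[str], object]


def collect_hwp_files(root: str) -> List[str]:
    """Recursively find .hwp files under root (extension match ignores case)."""
    return sorted(
        str(p) for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".hwp"
    )


def split_results(results: Iterable[Result]) -> Tuple[List[str], List[Tuple[str, object]]]:
    """Separate successful output paths from (input_path, error) failures."""
    ok: List[str] = []
    failed: List[Tuple[str, object]] = []
    for input_path, output_path, error in results:
        if error:
            failed.append((input_path, error))
        else:
            ok.append(output_path)
    return ok, failed


def convert_tree(root: str, output_dir: str,
                 batch: Optional[Callable[..., Iterable[Result]]] = None):
    """Convert every .hwp under root; batch defaults to hwp2hwpx.convert_batch."""
    if batch is None:
        from hwp2hwpx import convert_batch as batch  # the package's batch API
    return split_results(batch(collect_hwp_files(root), output_dir=output_dir))
```

Injecting `batch` keeps the file-walking and result-sorting logic testable without Java installed.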

How it works

The package bundles the neolord0/hwp2hwpx Java library and its dependencies as a fat JAR:

  • hwplib — reads HWP binary (OLE2/CFB compound document)
  • hwpxlib — writes HWPX XML (ZIP archive with OWPML structure)
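Since the HWP side is an OLE2/CFB compound document and the HWPX side is a ZIP archive, the first few bytes of a file are enough to weed out inputs that are not HWP binaries, or are already HWPX, before handing them to the converter. A minimal sketch; `sniff_container` is a hypothetical helper, and it only identifies the container, since Word `.doc` files are also OLE2 and `.docx` files are also ZIP.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary (OLE2) header
ZIP_MAGIC = b"PK\x03\x04"                        # ZIP local file header


def sniff_container(path: str) -> str:
    """Classify a file by its first bytes: 'ole2', 'zip', or 'unknown'."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(OLE2_MAGIC):
        return "ole2"  # container used by HWP binary files
    if head.startswith(ZIP_MAGIC):
        return "zip"   # container used by HWPX
    return "unknown"
```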

Pure file-format conversion. No Hangul installation, no COM API, no DRM issues.

On Windows, file paths containing Korean characters are handled automatically through a temporary-file workaround that bypasses a JVM path-encoding issue.
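The package's exact workaround is internal. As a rough illustration of the general technique only: copy the input to an ASCII-named temporary file, convert there, then move the result to the real destination. `convert_via_ascii_temp`, its `convert` callable (any function taking an input and an output path) and `tmp_root` are all hypothetical; `tmp_root` exists because the default Windows temp directory sits under the user profile, which may itself contain Korean characters.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional


def convert_via_ascii_temp(src: str, dst: str,
                           convert: Callable[[str, str], None],
                           tmp_root: Optional[str] = None) -> Path:
    """Run convert(in_path, out_path) on ASCII-named temp files.

    File names inside the temp directory are ASCII; the directory itself
    comes from tmp_root (or the system default), so pass an ASCII
    tmp_root if the default location contains non-ASCII characters.
    """
    src_path, dst_path = Path(src), Path(dst)
    with tempfile.TemporaryDirectory(dir=tmp_root) as tmp:
        tmp_in = Path(tmp) / "input.hwp"
        tmp_out = Path(tmp) / "output.hwpx"
        shutil.copyfile(src_path, tmp_in)         # Korean name -> ASCII name
        convert(str(tmp_in), str(tmp_out))        # converter sees only ASCII names
        dst_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(tmp_out), str(dst_path))  # back to the real (Korean) path
    return dst_path
```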

Output format

The output HWPX is a standard ZIP archive containing:

META-INF/container.xml
Contents/header.xml
Contents/section0.xml
Contents/section1.xml
...

Fully compatible with Hangul 2020+ and any OWPML-aware tool.
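Because the output is an ordinary ZIP archive, Python's standard `zipfile` module is enough to sanity-check a converted file. A minimal sketch based only on the entries listed above; `list_sections` and `looks_like_hwpx` are hypothetical helpers, and since that listing is not exhaustive, the check covers only the entries it names.

```python
import re
import zipfile
from typing import List

_SECTION_RE = re.compile(r"^Contents/section(\d+)\.xml$")


def list_sections(hwpx_path: str) -> List[str]:
    """Return the Contents/sectionN.xml entries, ordered by N numerically."""
    with zipfile.ZipFile(hwpx_path) as zf:
        names = zf.namelist()
    found = []
    for name in names:
        m = _SECTION_RE.match(name)
        if m:
            found.append((int(m.group(1)), name))
    # Numeric sort keeps section10 after section9 (a plain string sort would not).
    return [name for _, name in sorted(found)]


def looks_like_hwpx(path: str) -> bool:
    """Cheap sanity check: an intact ZIP with a header and at least one section."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        if zf.testzip() is not None:  # name of the first corrupt member, else None
            return False
        has_header = "Contents/header.xml" in zf.namelist()
    return has_header and bool(list_sections(path))
```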

License

Apache License 2.0

Based on Java libraries by neolord0: hwp2hwpx, hwplib, and hwpxlib.


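Korean

A Python package that converts HWP (Hangul Word Processor) files to HWPX (OWPML) format.

Install with a single command, pip install hwp2hwpx, and start using it right away. No Hangul installation required.

Install

pip install hwp2hwpx

Requires Java: winget install EclipseAdoptium.Temurin.21.JDK

Usage

hwp2hwpx 문서.hwp
hwp2hwpx *.hwp -o 출력폴더/

from hwp2hwpx import convert
convert("문서.hwp")

Difference from kordoc

  • kordoc: reads HWP and extracts the content as Markdown/JSON (text parsing)
  • hwp2hwpx: converts HWP into an HWPX file (a complete document you can open in Hangul)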



Download files


Source Distribution

hwp2hwpx-1.0.1.tar.gz (2.2 MB)

Uploaded Source

Built Distribution


hwp2hwpx-1.0.1-py3-none-any.whl (2.2 MB)

Uploaded Python 3

File details

Details for the file hwp2hwpx-1.0.1.tar.gz.

File metadata

  • Download URL: hwp2hwpx-1.0.1.tar.gz
  • Upload date:
  • Size: 2.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for hwp2hwpx-1.0.1.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | e419a1d97b9547bf3c0d365adbc41696990b7fa96b5d5d8d08580875cb719a23 |
| MD5 | db0feedf13f761e28290e96588036338 |
| BLAKE2b-256 | 9980fa9aba1c796bfb01db64cf887b1fa03cc180466b813afd4ed4d18cdfc0be |


File details

Details for the file hwp2hwpx-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: hwp2hwpx-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 2.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for hwp2hwpx-1.0.1-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | e44606b65d840a33ceae42272194f1ae63665676fa01ea88bd763fede72798ea |
| MD5 | 668e7ad0a00b9c324d81294a430a2e1e |
| BLAKE2b-256 | 43b50e6ff236cc92db5a15afd9b38b34c7bf219ee7648365daab80f31e080dbd |
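To check a manually downloaded file against the SHA256 digests listed here, the standard `hashlib` module is enough. A minimal sketch; `sha256_of` and `verify_download` are hypothetical helper names, and the file is read in chunks so memory use stays small regardless of file size.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """True if the file's SHA-256 matches the published digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, verify_download("hwp2hwpx-1.0.1-py3-none-any.whl", "e44606b65d840a33ceae42272194f1ae63665676fa01ea88bd763fede72798ea") should return True for an intact wheel. pip can also enforce the check itself in hash-checking mode, using a requirements-file line of the form hwp2hwpx==1.0.1 --hash=sha256:<digest>.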

