
Parse legacy WPS Writer (.wps) OLE2 binary files into structured text and Markdown.


wps2md

A tiny Python library and CLI for converting legacy WPS Writer .wps files (OLE2 Word-binary format, FIB magic 0xA5EC/0xA5DC) into structured text and Markdown.

Unlike .docx (an OOXML zip archive that python-docx can read), .wps files saved by WPS Office are binary OLE2 compound documents. This library reads the WordDocument stream, validates the FIB (File Information Block), recovers each paragraph's style index (istd) via PlcfBtePapx → FKPs, and renders Heading 1-9 styles as # through ######### in Markdown, one # per heading level.
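
To make the FIB validation step concrete, here is a minimal sketch that checks the first two bytes of an already-extracted WordDocument stream against the two magic values named above. The function names `has_word_fib_magic` and `heading_level_from_istd` are hypothetical and are not part of wps2md. The sketch assumes the Word-binary convention that the magic is a little-endian 16-bit value at offset 0 and that istd 1 to 9 are the built-in Heading 1 to 9 styles.

```python
import struct

# FIB magic values named in the project description.
FIB_MAGICS = {0xA5EC, 0xA5DC}


def has_word_fib_magic(word_document: bytes) -> bool:
    """Return True if the stream starts with a recognised FIB magic.

    Assumption: the magic is a little-endian uint16 at offset 0.
    """
    if len(word_document) < 2:
        return False
    (w_ident,) = struct.unpack_from("<H", word_document, 0)
    return w_ident in FIB_MAGICS


def heading_level_from_istd(istd: int) -> int:
    """Map a paragraph style index to a heading level (0 = not a heading).

    Assumption: istd 1..9 are the built-in Heading 1..9 styles.
    """
    return istd if 1 <= istd <= 9 else 0
```

Extracting the WordDocument stream from the OLE2 container and walking PlcfBtePapx are separate steps that this sketch does not cover.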

Install

pip install wps2md

CLI

wps2md example.wps                 # print Markdown to stdout
wps2md example.wps > example.md
python -m wps2md example.wps       # equivalent

Library

from wps2md import parse, to_markdown

doc = parse("example.wps")
print(doc.main_text)                # plain text of the main body
print(doc.num_pages)                # from OLE SummaryInformation
print(to_markdown(doc.paragraphs))  # Markdown with H1-H9 from Word styles

for p in doc.paragraphs:
    print(p.heading_level, p.text)  # 0 for normal text, 1-9 for headings
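
The loop above can be turned into a quick document outline. The sketch below relies only on the documented `heading_level` and `text` attributes. The `outline` helper is hypothetical, and the sample uses `SimpleNamespace` objects in place of real parsed paragraphs so it runs without a .wps file.

```python
from types import SimpleNamespace


def outline(paragraphs) -> str:
    """Build an indented bullet outline from heading paragraphs only."""
    lines = []
    for p in paragraphs:
        level = p.heading_level
        if level:  # 0 means normal text, which is skipped
            lines.append("  " * (level - 1) + "- " + p.text.strip())
    return "\n".join(lines)


# Stand-ins for doc.paragraphs; real objects would come from parse().
sample = [
    SimpleNamespace(heading_level=1, text="Intro"),
    SimpleNamespace(heading_level=0, text="Some body text."),
    SimpleNamespace(heading_level=2, text="Scope"),
]
print(outline(sample))
```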

API

  • parse(path) -> WpsDocument — parse a .wps file.
  • WpsDocument — dataclass with main_text, paragraphs, footnotes, headers_footers, annotations, encoding, num_pages.
  • Paragraph(istd: int, text: str) — one paragraph; heading_level returns 1-9 for built-in Heading styles, else 0.
  • to_markdown(paragraphs) -> str — render paragraphs as Markdown.
  • WpsParseError — raised for non-.wps inputs, encrypted files, or unreadable streams.
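
As an illustration of how `Paragraph` and `to_markdown` fit together, the following is a simplified sketch and not the library's implementation. The `Paragraph` class here is a stand-in that mirrors the documented fields, `render_markdown` is a hypothetical name, and the istd-to-heading mapping (istd 1 to 9 for Heading 1 to 9) is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Paragraph:
    """Stand-in for the documented Paragraph(istd, text); not the library's class."""

    istd: int
    text: str

    @property
    def heading_level(self) -> int:
        # Assumption: istd 1..9 are the built-in Heading 1..9 styles.
        return self.istd if 1 <= self.istd <= 9 else 0


def render_markdown(paragraphs: Iterable[Paragraph]) -> str:
    """Render headings as '#' * level and everything else as plain paragraphs."""
    blocks = []
    for p in paragraphs:
        text = p.text.strip()
        if not text:
            continue  # skip empty paragraphs
        if p.heading_level:
            blocks.append("#" * p.heading_level + " " + text)
        else:
            blocks.append(text)
    return "\n\n".join(blocks)
```

Note that CommonMark only recognises ATX headings with one to six # characters, so levels 7 to 9 rendered this way may show up as ordinary text in many Markdown renderers.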

Limitations

  • Tables, images, paragraph styles inside footnotes and headers, complex fields, and CHPX (character formatting such as bold/italic) are not currently surfaced; only paragraph-level Heading styles drive Markdown output.
  • Encrypted/password-protected files are rejected.
  • Only the OLE2 Word-binary variant of .wps is supported (modern WPS Office still writes this for .wps; the OOXML .docx variant should be read with python-docx instead).
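
Because both container types can appear in the wild, it may help to sniff the file signature before choosing a reader. The sketch below is one way to do that and is not part of wps2md. `sniff_container` and `sniff_file` are hypothetical helpers that compare the leading bytes against the standard OLE2 compound-file signature and the ZIP local-file-header signature.

```python
OLE2_SIGNATURE = bytes.fromhex("d0cf11e0a1b11ae1")  # OLE2 compound file header
ZIP_SIGNATURE = b"PK\x03\x04"                       # ZIP local file header (OOXML)


def sniff_container(header: bytes) -> str:
    """Classify a file by its leading bytes: 'ole2', 'zip', or 'unknown'."""
    if header.startswith(OLE2_SIGNATURE):
        return "ole2"
    if header.startswith(ZIP_SIGNATURE):
        return "zip"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify its container type."""
    with open(path, "rb") as fh:
        return sniff_container(fh.read(8))
```

A result of "ole2" suggests trying wps2md, while "zip" suggests the file is OOXML and python-docx is the better fit. A signature match alone does not guarantee that the file is a valid document of either kind.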

License

MIT
