Parse legacy WPS Writer (.wps) OLE2 binary files into structured text and Markdown.
wps2md
A tiny Python library and CLI for converting legacy WPS Writer .wps
files (OLE2 Word-binary format, FIB magic 0xA5EC/0xA5DC) into
structured text and Markdown.
Unlike .docx (which is OOXML/zip and can be read by python-docx),
.wps files saved by WPS Office are binary OLE2 compound documents.
This library reads the WordDocument stream, validates the FIB,
recovers paragraph style indices (istd) via PlcfBtePapx → FKPs,
and renders Heading 1-9 styles as # through ######### (one # per level) in Markdown.
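The FIB validation step can be illustrated with a short sketch. This is not the library's actual code: `has_word_fib_magic` is a hypothetical helper, and it assumes the `WordDocument` stream has already been extracted from the OLE2 container as raw bytes. It relies on the FIB starting with a 16-bit little-endian identifier field.

```python
import struct

# FIB magic values named in the description above.
WORD_FIB_MAGICS = {0xA5EC, 0xA5DC}


def has_word_fib_magic(stream: bytes) -> bool:
    """Return True if the stream starts with a known FIB magic number.

    Hypothetical helper for illustration; not part of wps2md's public API.
    """
    if len(stream) < 2:
        return False
    # The identifier is the first FIB field: unsigned 16-bit, little-endian.
    (w_ident,) = struct.unpack_from("<H", stream, 0)
    return w_ident in WORD_FIB_MAGICS
```

A matching magic number only shows that the stream looks like a Word-binary FIB; a real parser would still have to check the remaining header fields, such as the encryption flag.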
Install
pip install wps2md
CLI
wps2md example.wps # print Markdown to stdout
wps2md example.wps > example.md
python -m wps2md example.wps # equivalent
Library
from wps2md import parse, to_markdown
doc = parse("example.wps")
print(doc.main_text) # plain text of the main body
print(doc.num_pages) # from OLE SummaryInformation
print(to_markdown(doc.paragraphs)) # Markdown with H1-H9 from Word styles
for p in doc.paragraphs:
    print(p.heading_level, p.text) # 0 for normal text, 1-9 for headings
API
- parse(path) -> WpsDocument: parse a .wps file.
- WpsDocument: dataclass with main_text, paragraphs, footnotes, headers_footers, annotations, encoding, num_pages.
- Paragraph(istd: int, text: str): one paragraph; heading_level returns 1-9 for built-in Heading styles, else 0.
- to_markdown(paragraphs) -> str: render paragraphs as Markdown.
- WpsParseError: raised for non-.wps inputs, encrypted files, or unreadable streams.
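To make the relationship between `istd`, `heading_level`, and the Markdown output concrete, here is a minimal standalone sketch. It does not reproduce the library's implementation: `SketchParagraph` and `render_markdown` are hypothetical stand-ins, the mapping of style indices 1-9 to Heading 1-9 is an assumption about how built-in styles are numbered, and joining paragraphs with blank lines is a guess at the output layout.

```python
from dataclasses import dataclass


@dataclass
class SketchParagraph:
    """Hypothetical stand-in for the library's Paragraph(istd, text)."""

    istd: int
    text: str

    @property
    def heading_level(self) -> int:
        # Assumption: built-in Heading 1-9 styles use style indices 1-9.
        return self.istd if 1 <= self.istd <= 9 else 0


def render_markdown(paragraphs) -> str:
    """Prefix headings with one '#' per level; leave other text unchanged."""
    lines = []
    for p in paragraphs:
        level = p.heading_level
        lines.append(f"{'#' * level} {p.text}" if level else p.text)
    # Assumption: paragraphs are separated by a blank line.
    return "\n\n".join(lines)
```

For example, `render_markdown([SketchParagraph(2, "Scope"), SketchParagraph(0, "Body text")])` produces a `## Scope` heading followed by a plain paragraph.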
Limitations
- Tables, images, paragraph styles in footnotes and headers, complex fields, and CHPX (character formatting such as bold/italic) are not currently surfaced; only paragraph-level Heading styles drive Markdown output.
- Encrypted/password-protected files are rejected.
- Only the OLE2 Word-binary variant of .wps is supported (modern WPS Office still writes this for .wps; the OOXML .docx variant should be read with python-docx instead).
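Because the supported variant is distinguished by its container rather than its file extension, it can help to sniff the first bytes of a file before choosing a reader. The sketch below is an illustration under stated assumptions, not something wps2md ships: `sniff_container` and `sniff_file` are hypothetical names, and the check only identifies the container type (OLE2 compound file versus ZIP-based OOXML), not whether the contents are a valid document.

```python
# Well-known file signatures (first bytes of the file).
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound document
ZIP_SIGNATURE = b"PK\x03\x04"  # ZIP local file header, used by OOXML packages


def sniff_container(header: bytes) -> str:
    """Classify a file by its leading bytes: 'ole2', 'zip', or 'unknown'."""
    if header.startswith(OLE2_SIGNATURE):
        return "ole2"
    if header.startswith(ZIP_SIGNATURE):
        return "zip"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of the file to classify its container."""
    with open(path, "rb") as fh:
        return sniff_container(fh.read(8))
```

A caller could route `"ole2"` files to `parse` and `"zip"` files to python-docx, treating anything else as unsupported.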
License
MIT