Tool to parse Microsoft Rich Text Format (RTF)
rtfparse
RTF Parser. So far it can only de-encapsulate HTML content from an RTF file, but it properly parses the RTF structure and lets you write your own custom RTF renderers. The HTML de-encapsulator provided with rtfparse is one such renderer: it extracts the HTML content from its RTF encapsulation and saves it to a given HTML file.
rtfparse can also decompress RTF from MS Outlook .msg files and parse that.
Installation
Install rtfparse from your local repository with pip:
pip install rtfparse
Installation creates an executable file rtfparse in your Python scripts folder, which should be in your $PATH.
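To check that the executable is reachable from your $PATH, a minimal sketch using only the standard library may help. The helper name find_rtfparse is made up for this example and is not part of rtfparse.

```python
import shutil


def find_rtfparse(name="rtfparse"):
    """Return the full path of the executable if it is on PATH, else None."""
    # find_rtfparse is a hypothetical helper, not an rtfparse API.
    return shutil.which(name)


if __name__ == "__main__":
    location = find_rtfparse()
    if location is None:
        print("rtfparse was not found on PATH; check your Python scripts folder.")
    else:
        print(f"rtfparse executable: {location}")
```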
Usage From Command Line
Use the rtfparse executable from the command line. Run rtfparse --help to see the available options.
rtfparse writes its logs to the following files in ~/rtfparse/:
rtfparse.debug.log
rtfparse.info.log
rtfparse.errors.log
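When a run fails, the end of the errors log is usually the first place to look. The following is a small sketch for printing the last lines of one of the log files listed above. The helper tail_log is hypothetical, and reading the logs as UTF-8 is an assumption of this example.

```python
from pathlib import Path


def tail_log(log_path, max_lines=20):
    """Return the last max_lines lines of a log file, or [] if it does not exist."""
    path = Path(log_path)
    if not path.is_file():
        return []
    # Assumption: the log is UTF-8 text; undecodable bytes are replaced.
    with open(path, encoding="utf-8", errors="replace") as log_file:
        return log_file.read().splitlines()[-max_lines:]


if __name__ == "__main__":
    log_dir = Path.home() / "rtfparse"
    for line in tail_log(log_dir / "rtfparse.errors.log"):
        print(line)
```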
Example: De-encapsulate HTML from an uncompressed RTF file
rtfparse --rtf-file "path/to/rtf_file.rtf" --de-encapsulate-html --output-file "path/to/extracted.html"
Example: De-encapsulate HTML from MS Outlook email file
rtfparse uses extract_msg and compressed_rtf internally to read the .msg file and decompress its RTF:
rtfparse --msg-file "path/to/email.msg" --de-encapsulate-html --output-file "path/to/extracted.html"
Example: Only decompress the RTF from MS Outlook email file
rtfparse --msg-file "path/to/email.msg" --output-file "path/to/extracted.rtf"
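If you want to drive these commands from a script, one option is to build the argument list in Python and hand it to subprocess. This is only a sketch: build_command and run_rtfparse are hypothetical helpers, they use just the flags shown in the examples above, and the rule that exactly one input file is given is an assumption of this example rather than documented CLI behaviour.

```python
import subprocess


def build_command(output_file, rtf_file=None, msg_file=None, de_encapsulate_html=False):
    """Build the argument list for one rtfparse call from the flags shown above."""
    # Assumption of this sketch: exactly one input (RTF or MSG) per call.
    if (rtf_file is None) == (msg_file is None):
        raise ValueError("pass exactly one of rtf_file or msg_file")
    command = ["rtfparse"]
    if rtf_file is not None:
        command += ["--rtf-file", str(rtf_file)]
    else:
        command += ["--msg-file", str(msg_file)]
    if de_encapsulate_html:
        command.append("--de-encapsulate-html")
    command += ["--output-file", str(output_file)]
    return command


def run_rtfparse(**kwargs):
    """Run rtfparse; subprocess raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_command(**kwargs), check=True)
```

For example, build_command("path/to/extracted.rtf", msg_file="path/to/email.msg") yields the arguments of the decompress-only call above. Passing a list instead of a single shell string avoids quoting problems with paths that contain spaces.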
Example: De-encapsulate HTML from MS Outlook email file and save (and later embed) the attachments
When extracting the RTF from the .msg file, you can save the attachments (which include images embedded in the email text) in a directory:
rtfparse --msg-file "path/to/email.msg" --output-file "path/to/extracted.rtf" --attachments-dir "path/to/dir"
In rtfparse version 1.x you will be able to embed these images in the de-encapsulated HTML. This functionality will be provided by the package embedimg.
rtfparse --msg-file "path/to/email.msg" --output-file "path/to/extracted.rtf" --attachments-dir "path/to/dir" --embed-img
In the current version the option --embed-img does nothing.
Programmatic usage in a Python module
from pathlib import Path
from rtfparse.parser import Rtf_Parser
from rtfparse.renderers.de_encapsulate_html import De_encapsulate_HTML
source_path = Path(r"path/to/your/rtf/document.rtf")
target_path = Path(r"path/to/your/html/de_encapsulated.html")
parser = Rtf_Parser(rtf_path=source_path)
parsed = parser.parse_file()
renderer = De_encapsulate_HTML()
with open(target_path, mode="w", encoding="utf-8") as html_file:
    renderer.render(parsed, html_file)
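The same calls can be wrapped in a loop to convert a whole directory. The sketch below reuses only the rtfparse names from the example above. The helpers html_target_for and de_encapsulate_directory are hypothetical, and creating a fresh renderer for every file is a cautious choice of this example, not a documented requirement.

```python
from pathlib import Path


def html_target_for(rtf_path, output_dir):
    """Map e.g. mail/a.rtf to <output_dir>/a.html."""
    return Path(output_dir) / (Path(rtf_path).stem + ".html")


def de_encapsulate_directory(source_dir, output_dir):
    """De-encapsulate every *.rtf file in source_dir into output_dir."""
    # Imported here so the path helper above works without rtfparse installed.
    from rtfparse.parser import Rtf_Parser
    from rtfparse.renderers.de_encapsulate_html import De_encapsulate_HTML

    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for rtf_path in sorted(Path(source_dir).glob("*.rtf")):
        target_path = html_target_for(rtf_path, output_dir)
        parsed = Rtf_Parser(rtf_path=rtf_path).parse_file()
        # A fresh renderer per file; reusing one instance is not assumed to be safe.
        renderer = De_encapsulate_HTML()
        with open(target_path, mode="w", encoding="utf-8") as html_file:
            renderer.render(parsed, html_file)
        written.append(target_path)
    return written
```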
RTF Specification Links
If you find a working official Microsoft link to the RTF specification and add it here, you'll be remembered fondly.
File details
Details for the file rtfparse-0.8.0.tar.gz.
File metadata
- Download URL: rtfparse-0.8.0.tar.gz
- Upload date:
- Size: 13.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.23.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e6b4e658909af34b191530f14f6f14a6aad62dc2c402eb1c5e7313d531c3bbf1 |
| MD5 | f6d4063e7627a53a47a27255107a8813 |
| BLAKE2b-256 | 862ea49f303fb095fced49552845d38478b0ed03d89186d87eadc2809e535e1f |
File details
Details for the file rtfparse-0.8.0-py3-none-any.whl.
File metadata
- Download URL: rtfparse-0.8.0-py3-none-any.whl
- Upload date:
- Size: 15.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.23.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1edb8eed645e1bbc0572fd6436e60ad7aa5960bab41c4a064347414804ac8362 |
| MD5 | 6e0f80f761d9854a72e8bdc062ed6ab5 |
| BLAKE2b-256 | c9396581bda75ae525a09f370f98bf96d354d1589bf7095c0eeb96e63b3e94ee |
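A downloaded file can be checked against the SHA256 digests listed above with the standard library. This is a minimal sketch: the helper names sha256_of and matches_sha256 are made up for this example, and the file is read in chunks so that large downloads do not need to fit in memory.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as binary_file:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: binary_file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare the file's digest with an expected hex string, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, matches_sha256("rtfparse-0.8.0.tar.gz", "e6b4e658909af34b191530f14f6f14a6aad62dc2c402eb1c5e7313d531c3bbf1") should return True if the download is intact.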