RTF parser
rtfparse
RTF parser. So far it can only de-encapsulate HTML content from an RTF file, but it properly parses the RTF structure and allows you to write your own custom RTF renderers. The HTML de-encapsulator provided with rtfparse is just one such custom renderer: it liberates the HTML content from its RTF encapsulation and saves it to a given HTML file.
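To make "RTF structure" concrete: an RTF document is a tree of brace-delimited groups that contain control words (a backslash, letters, and an optional numeric parameter), control symbols, and plain text. The sketch below illustrates that structure on a simplified subset of RTF. It is not rtfparse's implementation; the names `tokenize` and `build_tree` are hypothetical, and binary data, Unicode escapes, and code pages are ignored.

```python
import re

# One alternative per token kind: group open, group close,
# control word (with optional numeric parameter and one delimiting space),
# control symbol, or a run of plain text.
TOKEN = re.compile(
    r"(?P<open>\{)"
    r"|(?P<close>\})"
    r"|\\(?P<word>[a-zA-Z]+)(?P<param>-?\d+)? ?"
    r"|\\(?P<symbol>[^a-zA-Z])"
    r"|(?P<text>[^{}\\]+)"
)


def tokenize(rtf: str):
    """Yield (kind, value) tuples for a simplified subset of RTF."""
    for m in TOKEN.finditer(rtf):
        if m.group("open"):
            yield ("open", "{")
        elif m.group("close"):
            yield ("close", "}")
        elif m.group("word"):
            param = m.group("param")
            yield ("word", (m.group("word"), int(param) if param else None))
        elif m.group("symbol"):
            yield ("symbol", m.group("symbol"))
        else:
            yield ("text", m.group("text"))


def build_tree(rtf: str) -> list:
    """Nest tokens into lists, one list per brace-delimited group."""
    root = []
    stack = [root]
    for kind, value in tokenize(rtf):
        if kind == "open":
            group = []
            stack[-1].append(group)
            stack.append(group)
        elif kind == "close":
            # Ignore an unbalanced closing brace instead of failing.
            if len(stack) > 1:
                stack.pop()
        else:
            stack[-1].append((kind, value))
    return root


if __name__ == "__main__":
    print(build_tree(r"{\rtf1\ansi{\b Hello} world}"))
```

A custom renderer, in this simplified picture, is any function that walks such a tree and writes output for the entities it cares about.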
Dependencies
argcomplete
extract-msg
compressed_rtf
Installation
Install rtfparse with pip:
pip install rtfparse
Installation creates an executable file rtfparse in your Python scripts folder, which should be in your $PATH.
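If you are unsure whether the scripts folder is on your $PATH, a quick check from Python might look like the following. This is a minimal sketch using only the standard library; the helper name `find_rtfparse_executable` is hypothetical.

```python
import shutil


def find_rtfparse_executable(name: str = "rtfparse"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_rtfparse_executable()
    if location is None:
        print("rtfparse was not found on PATH; check your Python scripts folder.")
    else:
        print(f"rtfparse found at {location}")
```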
First Run
When you run rtfparse for the first time, it starts a configuration wizard that guides you through creating a default configuration file and specifying the location of its folders. (These folders serve as locations for saving extracted RTF or HTML files.)
In the configuration wizard you can press A for care-free automatic configuration, which looks something like this:
$ rtfparse
Config file missing, creating new default config file
____ ____ __ _ ____ _ ____ _ _ ____ ____ ___ _ ____ __ _
|___ [__] | \| |--- | |__, |__| |--< |--| | | [__] | \|
_ _ _ ___ ____ ____ ___
|/\| | /__ |--| |--< |__>
◊ email_rtf (C:\Users\nagidal\rtfparse\email_rtf) does not exist!
(A) Automatically configure this and all remaining rtfparse settings
(C) Create this path automatically
(M) Manually input correct path to use or to create
(Q) Quit and edit `email_rtf` in rtfparse_configuration.ini
Created directory C:\Users\nagidal\rtfparse
Created directory C:\Users\nagidal\rtfparse\email_rtf
Created directory C:\Users\nagidal\rtfparse\html
rtfparse also creates the folder .rtfparse (beginning with a dot) in your home directory, where it saves its default configuration and its log files.
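To inspect the settings the wizard wrote, you could read the configuration file with the standard library. This is a sketch under assumptions: it assumes the file is a regular INI file readable by `configparser`, and that it lives at `~/.rtfparse/rtfparse_configuration.ini`; neither the exact location nor the section names are confirmed here, so the code simply lists whatever sections it finds. The helper name `read_settings` is hypothetical.

```python
import configparser
import pathlib


def read_settings(config_path):
    """Return {section: {key: value}} for every section of an INI file."""
    # Interpolation is disabled so that "%" in Windows paths is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(config_path, encoding="utf-8"):
        raise FileNotFoundError(config_path)
    return {section: dict(parser[section]) for section in parser.sections()}


if __name__ == "__main__":
    # Assumed location; adjust if your configuration file is elsewhere.
    path = pathlib.Path.home() / ".rtfparse" / "rtfparse_configuration.ini"
    for section, values in read_settings(path).items():
        print(section, values)
```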
Usage From Command Line
Use the rtfparse executable from the command line. For example, if you want to de-encapsulate the HTML from an RTF file, do it like this:
rtfparse -f "path/to/rtf_file.rtf" -d
Or you can de-encapsulate the HTML from an MS Outlook message, thanks to extract_msg and compressed_rtf:
rtfparse -m "path/to/email.msg" -d
The resulting HTML file will be saved to the html folder you set in rtfparse_configuration.ini. The command reference is available via rtfparse --help.
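If you want to drive the command-line tool from a script, the two invocations above can be wrapped with `subprocess`. This is a minimal sketch that uses only the `-f`, `-m`, and `-d` options shown above and assumes the rtfparse executable is on your $PATH; the helper names `build_command` and `de_encapsulate` are hypothetical.

```python
import subprocess


def build_command(source, is_msg=False):
    """Build the argument list for de-encapsulating HTML from an RTF or MSG file."""
    flag = "-m" if is_msg else "-f"
    return ["rtfparse", flag, str(source), "-d"]


def de_encapsulate(source, is_msg=False):
    """Run rtfparse and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_command(source, is_msg), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.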
Usage in Python Module
import pathlib

from rtfparse.parser import Rtf_Parser
from rtfparse.renderers import de_encapsulate_html

# Input RTF file and output HTML file
source_path = pathlib.Path(r"D:\trace\email\test_mail_sw_release.rtf")
target_path = pathlib.Path(r"D:\trace\email\extracted_with_rtfparse.html")

# Parse the RTF file into its structure
parser = Rtf_Parser(rtf_path=source_path)
parsed = parser.parse_file()

# Render the parsed structure as de-encapsulated HTML
renderer = de_encapsulate_html.De_encapsulate_HTML()
with open(target_path, mode="w", encoding="utf-8") as html_file:
    renderer.render(parsed, html_file)
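The same calls can be applied to a whole folder of RTF files. The following is a sketch, not part of rtfparse: the helpers `html_target_for` and `convert_folder` are hypothetical, and the only rtfparse calls used are the ones shown in the example above. It assumes every `*.rtf` file in the folder contains encapsulated HTML.

```python
import pathlib


def html_target_for(source_path, out_dir):
    """Map e.g. mail/a.rtf to <out_dir>/a.html."""
    return pathlib.Path(out_dir) / (pathlib.Path(source_path).stem + ".html")


def convert_folder(in_dir, out_dir):
    """De-encapsulate the HTML of every *.rtf file in in_dir into out_dir."""
    # Imported lazily so the path helper works without rtfparse installed.
    from rtfparse.parser import Rtf_Parser
    from rtfparse.renderers import de_encapsulate_html

    out_dir = pathlib.Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source_path in sorted(pathlib.Path(in_dir).glob("*.rtf")):
        target_path = html_target_for(source_path, out_dir)
        # A fresh parser and renderer per file, as in the single-file example.
        parsed = Rtf_Parser(rtf_path=source_path).parse_file()
        renderer = de_encapsulate_html.De_encapsulate_HTML()
        with open(target_path, mode="w", encoding="utf-8") as html_file:
            renderer.render(parsed, html_file)
        written.append(target_path)
    return written
```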
RTF Specification Links
If you find a working official Microsoft link to the RTF specification and add it here, you'll be remembered fondly.
Project details
File details
Details for the file rtfparse-0.7.5.tar.gz.
File metadata
- Download URL: rtfparse-0.7.5.tar.gz
- Upload date:
- Size: 17.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/51.1.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3bf8f6f76f4f7bac9475b0a7d07bcbcde6c857e8860effc198c1c1cd28a20b55
MD5 | 0c98e2c3cc486e9b9d2f80ffa0c1d1e4
BLAKE2b-256 | dc679a0a4298b67ee8f2eb2307b1ee412959824038fe20ac5d3f2c3332b339a2
File details
Details for the file rtfparse-0.7.5-py3-none-any.whl.
File metadata
- Download URL: rtfparse-0.7.5-py3-none-any.whl
- Upload date:
- Size: 20.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/51.1.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 414ff1c371435a4152642c79dc7bde85a7aea61c83f2a4973c73eb22f4fdc206
MD5 | 3d30188798fe60b2b9ca02839b8d1458
BLAKE2b-256 | ae792fc77330e8d401258fc565ff619570812fa609a720381fec38073451f743