RTF parser

Project description

rtfparse

RTF parser. So far, the only renderer it ships de-encapsulates HTML content from an RTF file, but it parses the RTF structure properly and lets you write your own custom RTF renderers. The HTML de-encapsulator provided with rtfparse is one such renderer: it extracts the HTML content from its RTF encapsulation and saves it to a given HTML file.
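Before de-encapsulating, it can help to check whether an RTF file carries encapsulated HTML at all. The sketch below is a rough heuristic and not part of rtfparse: it assumes that encapsulated HTML is announced by the `\fromhtml1` control word in the RTF header, as written by MS Outlook. The function name and the `scan_limit` parameter are made up for this illustration.

```python
def looks_like_encapsulated_html(rtf_bytes: bytes, scan_limit: int = 1024) -> bool:
    """Heuristic: True if the RTF header carries the \\fromhtml control word."""
    # Only the beginning of the file is inspected (assumption: the control
    # word appears in the header, well within the first scan_limit bytes).
    header = rtf_bytes[:scan_limit]
    return header.lstrip().startswith(b"{\\rtf") and b"\\fromhtml" in header


# Hand-written sample resembling RTF with encapsulated HTML
sample = rb"{\rtf1\ansi\ansicpg1252\fromhtml1 \deff0{\fonttbl{\f0 Arial;}}{\*\htmltag64 <p>}Hello{\*\htmltag72 </p>}}"
plain = rb"{\rtf1\ansi Plain text without HTML}"

print(looks_like_encapsulated_html(sample))
print(looks_like_encapsulated_html(plain))
```

A file that fails this check is probably ordinary RTF, for which the HTML de-encapsulator is unlikely to produce useful output.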

Dependencies

argcomplete
extract-msg
compressed_rtf

Installation

Install rtfparse from PyPI with pip:

pip install rtfparse

Installation creates an executable file rtfparse in your Python scripts folder, which should be in your $PATH.
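To confirm that the executable really is reachable through your $PATH, a small check like the following may help. It uses only the standard library; the helper name `find_rtfparse_executable` is made up for this sketch.

```python
import shutil


def find_rtfparse_executable(name="rtfparse"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(name)


location = find_rtfparse_executable()
if location is None:
    print("rtfparse was not found on PATH; check your Python scripts folder")
else:
    print(f"rtfparse found at {location}")
```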

First Run

When you run rtfparse for the first time, it starts a configuration wizard that guides you through creating a default configuration file and specifying the location of its folders. (These folders are where extracted RTF or HTML files are saved.)

In the configuration wizard, you can press A for care-free automatic configuration, which looks something like this:

$ rtfparse
Config file missing, creating new default config file

 ____ ____ __ _ ____ _ ____ _  _ ____ ____ ___ _ ____ __ _
 |___ [__] | \| |--- | |__, |__| |--< |--|  |  | [__] | \|
 _  _ _ ___  ____ ____ ___
 |/\| |  /__ |--| |--< |__>


◊ email_rtf (C:\Users\nagidal\rtfparse\email_rtf) does not exist!

(A) Automatically configure this and all remaining rtfparse settings
(C) Create this path automatically
(M) Manually input correct path to use or to create
(Q) Quit and edit `email_rtf` in rtfparse_configuration.ini

Created directory C:\Users\nagidal\rtfparse
Created directory C:\Users\nagidal\rtfparse\email_rtf
Created directory C:\Users\nagidal\rtfparse\html

rtfparse also creates the folder .rtfparse (beginning with a dot) in your home directory, where it saves its default configuration and its log files.
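If you want to see what rtfparse has written there, the folder can be listed from Python. This is a minimal sketch: it assumes only that the folder is named .rtfparse and sits directly in your home directory, as described above. The helper name `rtfparse_home` is hypothetical, and no particular file names inside the folder are assumed.

```python
import pathlib


def rtfparse_home(home=None):
    """Return the expected location of the .rtfparse folder."""
    base = pathlib.Path(home) if home is not None else pathlib.Path.home()
    return base / ".rtfparse"


config_dir = rtfparse_home()
if config_dir.is_dir():
    # Print whatever rtfparse has stored (configuration, log files, ...)
    for entry in sorted(config_dir.iterdir()):
        print(entry.name)
else:
    print(f"{config_dir} does not exist yet; run rtfparse once to create it")
```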

Usage From Command Line

Use the rtfparse executable from the command line. For example, to de-encapsulate the HTML from an RTF file:

rtfparse -f "path/to/rtf_file.rtf" -d

Or you can de-encapsulate the HTML from an MS Outlook message, thanks to extract_msg and compressed_rtf:

rtfparse -m "path/to/email.msg" -d

The resulting HTML file is saved to the html folder you set in rtfparse_configuration.ini. For the full command reference, run rtfparse --help.
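To process many files, the command above can be driven from a script. The following is a sketch under a few assumptions: the rtfparse executable is on your $PATH, the first-run configuration wizard has already been completed (otherwise the command would wait for input), and only the -f and -d options shown above are used. The helper names `build_command` and `de_encapsulate_folder` are made up for this example.

```python
import pathlib
import subprocess


def build_command(rtf_file, executable="rtfparse"):
    """Build the argument list for: rtfparse -f <file> -d"""
    return [executable, "-f", str(rtf_file), "-d"]


def de_encapsulate_folder(folder, dry_run=True):
    """Run rtfparse on every .rtf file in folder; only print commands if dry_run."""
    commands = [build_command(path) for path in sorted(pathlib.Path(folder).glob("*.rtf"))]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if rtfparse exits with an error
            subprocess.run(command, check=True)
    return commands
```

Calling `de_encapsulate_folder("path/to/folder")` prints the commands without running them; pass `dry_run=False` to execute them.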

Usage in Python Module

import pathlib
from rtfparse.parser import Rtf_Parser
from rtfparse.renderers import de_encapsulate_html

# Input RTF file and the HTML file to write
source_path = pathlib.Path(r"D:\trace\email\test_mail_sw_release.rtf")
target_path = pathlib.Path(r"D:\trace\email\extracted_with_rtfparse.html")

# Parse the RTF file into a structure that renderers can consume
parser = Rtf_Parser(rtf_path=source_path)
parsed = parser.parse_file()

# De-encapsulate the HTML and write it to the target file
renderer = de_encapsulate_html.De_encapsulate_HTML()
with open(target_path, mode="w", encoding="utf-8") as html_file:
    renderer.render(parsed, html_file)
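The same calls can be wrapped in a reusable function, for instance to convert several files into one output folder. This sketch relies only on the rtfparse calls shown above (`Rtf_Parser`, `parse_file`, `De_encapsulate_HTML`, `render`); the helper names `html_target_for` and `de_encapsulate` are hypothetical, and the output folder is assumed to exist already.

```python
import pathlib


def html_target_for(rtf_path, output_dir):
    """Derive the output path: same file name as the RTF, with an .html suffix."""
    return pathlib.Path(output_dir) / (pathlib.Path(rtf_path).stem + ".html")


def de_encapsulate(rtf_path, output_dir):
    """Parse one RTF file and write its encapsulated HTML into output_dir."""
    # Imported here so the path helper above can be used without rtfparse installed
    from rtfparse.parser import Rtf_Parser
    from rtfparse.renderers import de_encapsulate_html

    target = html_target_for(rtf_path, output_dir)
    parsed = Rtf_Parser(rtf_path=pathlib.Path(rtf_path)).parse_file()
    renderer = de_encapsulate_html.De_encapsulate_HTML()
    with open(target, mode="w", encoding="utf-8") as html_file:
        renderer.render(parsed, html_file)
    return target
```

A loop such as `for path in pathlib.Path("mails").glob("*.rtf"): de_encapsulate(path, "html_out")` would then convert a whole folder; both folder names are placeholders.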

RTF Specification Links

If you find a working official Microsoft link to the RTF specification and add it here, you'll be remembered fondly.

Download files

Source Distribution

rtfparse-0.7.5.tar.gz (17.5 kB)

Uploaded Source

Built Distribution

rtfparse-0.7.5-py3-none-any.whl (20.4 kB)

Uploaded Python 3

File details

Details for the file rtfparse-0.7.5.tar.gz.

File metadata

  • Download URL: rtfparse-0.7.5.tar.gz
  • Size: 17.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/51.1.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for rtfparse-0.7.5.tar.gz
Algorithm    Hash digest
SHA256       3bf8f6f76f4f7bac9475b0a7d07bcbcde6c857e8860effc198c1c1cd28a20b55
MD5          0c98e2c3cc486e9b9d2f80ffa0c1d1e4
BLAKE2b-256  dc679a0a4298b67ee8f2eb2307b1ee412959824038fe20ac5d3f2c3332b339a2

File details

Details for the file rtfparse-0.7.5-py3-none-any.whl.

File metadata

  • Download URL: rtfparse-0.7.5-py3-none-any.whl
  • Size: 20.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/51.1.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for rtfparse-0.7.5-py3-none-any.whl
Algorithm    Hash digest
SHA256       414ff1c371435a4152642c79dc7bde85a7aea61c83f2a4973c73eb22f4fdc206
MD5          3d30188798fe60b2b9ca02839b8d1458
BLAKE2b-256  ae792fc77330e8d401258fc565ff619570812fa609a720381fec38073451f743
