A library for extracting HTML content from RTF-encapsulated HTML, as commonly found in the Exchange MSG email format.

Project description

RTFDE: RTF De-Encapsulator

A Python 3 library for extracting encapsulated HTML and plain text content from the RTF bodies of .msg files.

De-encapsulation enables previously encapsulated HTML and plain text content to be extracted and rendered as HTML and plain text instead of the encapsulating RTF content. After de-encapsulation, the HTML and plain text should differ only minimally from the original HTML or plain text content.

Features

  • De-encapsulate HTML from RTF encapsulated HTML.
  • De-encapsulate plain text from RTF encapsulated text.

Known Issues

  • This library fully unquotes the text it de-encapsulates, because it cannot tell which text was quoted during the RTF conversion and which was quoted in the original HTML or text. For instance, escaped Quoted-Printable text is returned unescaped.
  • This library currently cannot combine attachments from a .MSG Message object with the de-encapsulated HTML. This is mostly because I could not get a good set of examples of encapsulated HTML with attachment objects that needed to be integrated back into the body of the HTML.

Anti-Features (I don't intend to have this library do this.)

  • Extract plain text from RTF encapsulated HTML. If you want this, then you will have to parse the HTML using another library.
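As a rough illustration of the "parse the HTML using another library" route, here is a minimal sketch built on the standard library's `html.parser`. The names `TextExtractor` and `html_to_text` are hypothetical helpers, not part of RTFDE. The function accepts either `str` or `bytes`, since which type `rtf_obj.html` holds is not stated here, and the UTF-8 decoding is an assumption.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, skipping script/style."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the text of `html`, one text node per line."""
    if isinstance(html, bytes):
        # Assumption: the HTML is UTF-8; undecodable bytes are replaced.
        html = html.decode("utf-8", errors="replace")
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

A dedicated HTML library will usually handle whitespace and block layout better than this; the sketch only shows where such a step would sit after de-encapsulation.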

Installation

To install the package with pip:

pip3 install RTFDE

Usage

De-encapsulating HTML or TEXT

from RTFDE.deencapsulate import DeEncapsulator

with open('rtf_file', 'rb') as fp:
    raw_rtf  = fp.read()
    rtf_obj = DeEncapsulator(raw_rtf)
    rtf_obj.deencapsulate()
    if rtf_obj.content_type == 'html':
        print(rtf_obj.html)
    else:
        print(rtf_obj.text)
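Before handing a file to `DeEncapsulator`, it can be useful to guess whether the RTF is encapsulated at all. The sketch below is a heuristic that does not use RTFDE: it looks for the `\fromhtml1` and `\fromtext` control words that the RTF encapsulation format places in the RTF header. The function name `guess_encapsulated_type` and the 4096-byte search window are assumptions made for this example.

```python
import re

# Control words that mark encapsulated HTML or plain text in the RTF header.
_FROM_HTML = re.compile(rb"\\fromhtml1(?![0-9a-zA-Z])")
_FROM_TEXT = re.compile(rb"\\fromtext(?![0-9a-zA-Z])")


def guess_encapsulated_type(raw_rtf, window=4096):
    """Return 'html', 'text', or None based on header control words.

    Heuristic only: it inspects the first `window` bytes (an assumed
    limit) and does not parse the RTF.
    """
    header = raw_rtf[:window]
    if not header.lstrip().startswith(b"{\\rtf"):
        return None  # Does not look like RTF at all.
    if _FROM_HTML.search(header):
        return "html"
    if _FROM_TEXT.search(header):
        return "text"
    return None
```

A `None` result suggests the file is plain RTF or not RTF, in which case de-encapsulation is unlikely to succeed.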

Enabling Logging

All logging, including its verbosity, is controlled through Python's standard logging module. To enable RTFDE's logging at the top level, get the "RTFDE" logger and set its level.

import logging

log = logging.getLogger("RTFDE")
log.setLevel(logging.INFO)

To see how to enable more in-depth logging for debugging, check out the CONTRIBUTING.md file.

import logging

# Now, get the log that you want
# The main logger is simply called RTFDE. That will get you all the *normal* logs.
rtfde_log = logging.getLogger("RTFDE")
rtfde_log.setLevel(logging.DEBUG)
# Pass records up to the root logger's handlers as well.
rtfde_log.propagate = True
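Setting a level on a logger does not by itself print INFO or DEBUG records: without a configured handler, Python's logging module only emits warnings and above through its last-resort handler. The sketch below attaches a stream handler to the "RTFDE" logger using only the standard library. The helper name `enable_rtfde_logging` and the format string are choices made for this example.

```python
import logging
import sys


def enable_rtfde_logging(level=logging.INFO, stream=None):
    """Set the level of the "RTFDE" logger and make sure it has a handler."""
    log = logging.getLogger("RTFDE")
    log.setLevel(level)
    if not log.handlers:
        # Add a handler only once so repeated calls do not duplicate output.
        handler = logging.StreamHandler(stream or sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
        )
        log.addHandler(handler)
    return log
```

Calling `enable_rtfde_logging(logging.DEBUG)` once at program start should be enough. If the application already configures the root logger, relying on propagation instead of a dedicated handler avoids duplicate lines.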

Contribute

Please check the contributing guidelines.

License

Please see the license file for license information on RTFDE. If you have further questions related to licensing, PLEASE create an issue about it on GitHub.

Download files

Source Distribution

rtfde-0.1.2.2.tar.gz (33.4 kB view details)

Uploaded Source

Built Distribution

rtfde-0.1.2.2-py3-none-any.whl (36.7 kB view details)

Uploaded Python 3

File details

Details for the file rtfde-0.1.2.2.tar.gz.

File metadata

  • Download URL: rtfde-0.1.2.2.tar.gz
  • Upload date:
  • Size: 33.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for rtfde-0.1.2.2.tar.gz
Algorithm Hash digest
SHA256 2f0cd6ecd644071e39452e6fc4f4a1435453af0ec7c90ea86fb4fc96010c7f1b
MD5 a4b1a370d629715d09f6440deffb330e
BLAKE2b-256 9e5c116a016b38af589e8141160bc9b034b73dde2e50c22a921751f4d982a7ca

File details

Details for the file rtfde-0.1.2.2-py3-none-any.whl.

File metadata

  • Download URL: rtfde-0.1.2.2-py3-none-any.whl
  • Upload date:
  • Size: 36.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for rtfde-0.1.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 d43868c74f21ae9ea5acbfd4176d5de1f2cfae0ff7f267698471c606287c04ec
MD5 f7ed97db84964b2bc89bdada367c9872
BLAKE2b-256 14245a653278259be44c1845ddd56dd30cfa7265281ba149b9342b79f9d4f788
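
The published digests can be checked against a downloaded file. Here is a minimal sketch using the standard library's `hashlib`. The function names are hypothetical, and the path and expected digest are whatever you downloaded and copied from the listing above.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fp:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("rtfde-0.1.2.2.tar.gz", "<SHA256 from the listing>")` should return `True` for an intact download.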
