A library for extracting HTML content from RTF-encapsulated HTML, as commonly found in the Exchange MSG email format.

RTFDE: RTF De-Encapsulator

A Python 3 library for extracting encapsulated HTML and plain text content from the RTF bodies of .msg files.

De-encapsulation enables previously encapsulated HTML and plain text content to be extracted and rendered as HTML and plain text instead of the encapsulating RTF content. After de-encapsulation, the HTML and plain text should differ only minimally from the original HTML or plain text content.
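
As a rough illustration of what "encapsulated" means here: the RTF encapsulation format used by Exchange marks the header of such a body with the control word `\fromhtml1` (encapsulated HTML) or `\fromtext` (encapsulated plain text). The sketch below is not part of RTFDE; `guess_encapsulation` and its `header_chars` parameter are hypothetical names, and scanning only the first characters of the body is a simplifying assumption rather than a full RTF parse.

```python
import re
from typing import Optional

# Header control words that flag encapsulated content.
# The negative lookahead stops a longer control word or parameter
# (for example \fromhtml10) from matching.
_FROMHTML = re.compile(r"\\fromhtml1(?![0-9])")
_FROMTEXT = re.compile(r"\\fromtext(?![A-Za-z])")


def guess_encapsulation(raw_rtf: str, header_chars: int = 1024) -> Optional[str]:
    """Return 'html', 'text', or None based on a quick look at the RTF header.

    This is a heuristic: it does not tokenize the RTF, so an escaped
    backslash followed by the same letters would fool it.
    """
    header = raw_rtf[:header_chars]
    if _FROMHTML.search(header):
        return "html"
    if _FROMTEXT.search(header):
        return "text"
    return None
```

A check like this could be used to skip RTF bodies that carry no encapsulated content before handing them to the library.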

Features

  • De-encapsulate HTML from RTF encapsulated HTML.
  • De-encapsulate plain text from RTF encapsulated text.

Known Issues

  • This library fully unquotes the text it de-encapsulates, because it cannot tell which text was quoted during the RTF conversion process and which was already quoted in the original HTML or text. For instance, escaped Quoted-Printable text will be returned unescaped.
  • This library currently can't combine attachments from a .msg Message object with the de-encapsulated HTML. This is mostly because I could not get a good set of examples of encapsulated HTML with attachment objects that needed to be integrated back into the body of the HTML.

Anti-Features (I don't intend to have this library do this.)

  • Extract plain text from RTF encapsulated HTML. If you want this, then you will have to parse the HTML using another library.
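
For readers who do need plain text from de-encapsulated HTML, one possible approach uses only the Python standard library. This is a minimal sketch, not something RTFDE provides: `html_to_text` and `_TextExtractor` are hypothetical names, the whitespace handling is deliberately crude, and a dedicated HTML library will usually give better results.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    _SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string on a single line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Chunks are joined with a space and runs of whitespace are collapsed.
    # This loses paragraph breaks and can split a word that is partly
    # wrapped in an inline tag.
    return " ".join(" ".join(parser._chunks).split())
```

The value passed to `html_to_text` would be the HTML string obtained after de-encapsulation.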

Installation

To install the package with pip:

pip3 install RTFDE

Usage

De-encapsulating HTML or TEXT

from RTFDE.deencapsulate import DeEncapsulator

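# Read the raw RTF body (for example, one extracted from a .msg file).
with open('rtf_file', 'r') as fp:
    raw_rtf = fp.read()

# Parse the RTF and de-encapsulate the original content.
rtf_obj = DeEncapsulator(raw_rtf)
rtf_obj.deencapsulate()

# content_type reports which kind of content was encapsulated.
if rtf_obj.content_type == 'html':
    print(rtf_obj.html)
else:
    print(rtf_obj.text)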
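
Instead of printing, the result can be written to a file whose extension matches the content type. The helper below is a hypothetical sketch, not part of RTFDE: it relies only on the `content_type`, `html`, and `text` attributes shown above, and it handles both `str` and `bytes` values because the type of those attributes is an assumption here.

```python
from pathlib import Path


def save_deencapsulated(rtf_obj, stem: str) -> Path:
    """Write the de-encapsulated body to '<stem>.html' or '<stem>.txt'.

    rtf_obj is expected to expose content_type, html, and text,
    as in the usage example above.
    """
    if rtf_obj.content_type == "html":
        path, body = Path(stem + ".html"), rtf_obj.html
    else:
        path, body = Path(stem + ".txt"), rtf_obj.text

    # Accept either bytes or str, since the attribute type is assumed.
    if isinstance(body, bytes):
        path.write_bytes(body)
    else:
        path.write_text(body, encoding="utf-8")
    return path
```

Calling `save_deencapsulated(rtf_obj, 'message')` after `rtf_obj.deencapsulate()` would then produce `message.html` or `message.txt`.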

Contribute

Please check the contributing guidelines.

License

Please see the license file for license information on RTFDE. If you have further questions related to licensing, please create an issue about it on GitHub.

