Tool to parse Microsoft Rich Text Format (RTF)

rtfparse

rtfparse is an RTF parser. So far it can only de-encapsulate HTML content from an RTF document, but it parses the RTF structure properly and lets you write your own custom RTF renderers. The HTML de-encapsulator provided with rtfparse is one such renderer: it frees the HTML content from its RTF encapsulation and saves it to a given HTML file.

rtfparse can also decompress RTF from MS Outlook .msg files and parse it.

Installation

Install rtfparse with pip:

pip install rtfparse

Installation creates an executable file named rtfparse in your Python scripts folder, which should be on your $PATH.

Usage From Command Line

Use the rtfparse executable from the command line. Run rtfparse --help to see the available options.

rtfparse writes its logs to the following files in ~/rtfparse/:

rtfparse.debug.log
rtfparse.info.log
rtfparse.errors.log
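
The log locations listed above can be inspected from a short script. The following is a minimal sketch, not part of rtfparse: the helper names `log_paths` and `tail` are hypothetical, and reading the logs as UTF-8 is an assumption.

```python
from pathlib import Path
from typing import List, Optional

# File names as listed in this README.
LOG_NAMES = ("rtfparse.debug.log", "rtfparse.info.log", "rtfparse.errors.log")


def log_paths(home: Optional[Path] = None) -> List[Path]:
    """Return the expected log file paths under <home>/rtfparse/."""
    base = (home or Path.home()) / "rtfparse"
    return [base / name for name in LOG_NAMES]


def tail(path: Path, lines: int = 20) -> List[str]:
    """Return the last `lines` lines of a log file, or [] if it does not exist."""
    if not path.exists():
        return []
    # Assumption: logs are text; undecodable bytes are replaced, not fatal.
    text = path.read_text(encoding="utf-8", errors="replace")
    return text.splitlines()[-lines:]


if __name__ == "__main__":
    for path in log_paths():
        print(path, "exists" if path.exists() else "missing")
```

Calling `tail(log_paths()[2])` would show the most recent entries of the error log, if that file has been written.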

Example: De-encapsulate HTML from an uncompressed RTF file

rtfparse --rtf-file "path/to/rtf_file.rtf" --de-encapsulate-html --output-file "path/to/extracted.html"

Example: De-encapsulate HTML from MS Outlook email file

rtfparse uses extract_msg and compressed_rtf internally to read the .msg file and decompress its RTF:

rtfparse --msg-file "path/to/email.msg" --de-encapsulate-html --output-file "path/to/extracted.html"

Example: Only decompress the RTF from MS Outlook email file

rtfparse --msg-file "path/to/email.msg" --output-file "path/to/extracted.rtf"

Example: De-encapsulate HTML from MS Outlook email file and save (and later embed) the attachments

When extracting the RTF from the .msg file, you can save the attachments (which include images embedded in the email text) in a directory:

rtfparse --msg-file "path/to/email.msg" --output-file "path/to/extracted.rtf" --attachments-dir "path/to/dir"

In rtfparse version 1.x you will be able to embed these images in the de-encapsulated HTML. This functionality will be provided by the package embedimg.

rtfparse --msg-file "path/to/email.msg" --output-file "path/to/extracted.rtf" --attachments-dir "path/to/dir" --embed-img

In the current version the option --embed-img does nothing.
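
The command lines shown in the examples above can also be assembled and run from a Python script. This is a minimal sketch that uses only the flags appearing in this README; `build_command` and `run_rtfparse` are hypothetical helper names, not part of rtfparse.

```python
import shutil
import subprocess
from typing import List, Optional


def build_command(
    msg_file: str,
    output_file: str,
    de_encapsulate_html: bool = False,
    attachments_dir: Optional[str] = None,
) -> List[str]:
    """Assemble an rtfparse command line for an MS Outlook .msg file."""
    cmd = ["rtfparse", "--msg-file", str(msg_file)]
    if de_encapsulate_html:
        cmd.append("--de-encapsulate-html")
    cmd += ["--output-file", str(output_file)]
    if attachments_dir is not None:
        cmd += ["--attachments-dir", str(attachments_dir)]
    return cmd


def run_rtfparse(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run the command; fail early if the executable is not on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("rtfparse executable not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(cmd, check=True)
```

For example, `run_rtfparse(build_command("path/to/email.msg", "path/to/extracted.html", de_encapsulate_html=True))` corresponds to the de-encapsulation example for .msg files.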

Programmatic usage in a Python module

from pathlib import Path
from rtfparse.parser import Rtf_Parser
from rtfparse.renderers.de_encapsulate_html import De_encapsulate_HTML

# Input RTF document and output HTML file
source_path = Path(r"path/to/your/rtf/document.rtf")
target_path = Path(r"path/to/your/html/de_encapsulated.html")

# Parse the RTF file into its structure
parser = Rtf_Parser(rtf_path=source_path)
parsed = parser.parse_file()

# Renderer that extracts the encapsulated HTML
renderer = De_encapsulate_HTML()

# Write the de-encapsulated HTML to the target file
with open(target_path, mode="w", encoding="utf-8") as html_file:
    renderer.render(parsed, html_file)
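
The same calls can be wrapped in a loop to convert a whole directory. This is a sketch under stated assumptions: `html_target_for` and `convert_directory` are hypothetical helpers, every .rtf file in the source directory is assumed to contain encapsulated HTML, and rtfparse is imported lazily so that the path helper can be used without it.

```python
from pathlib import Path
from typing import List


def html_target_for(rtf_path: Path, out_dir: Path) -> Path:
    """Map an input .rtf path to an .html path inside out_dir."""
    return out_dir / (rtf_path.stem + ".html")


def convert_directory(src_dir: Path, out_dir: Path) -> List[Path]:
    """De-encapsulate HTML from every .rtf file in src_dir."""
    # Imported here so the module loads even if rtfparse is not installed.
    from rtfparse.parser import Rtf_Parser
    from rtfparse.renderers.de_encapsulate_html import De_encapsulate_HTML

    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for rtf_path in sorted(src_dir.glob("*.rtf")):
        # A fresh parser and renderer per file, mirroring the example above.
        parsed = Rtf_Parser(rtf_path=rtf_path).parse_file()
        target = html_target_for(rtf_path, out_dir)
        with open(target, mode="w", encoding="utf-8") as html_file:
            De_encapsulate_HTML().render(parsed, html_file)
        written.append(target)
    return written
```

How the renderer behaves on RTF files without encapsulated HTML is not covered here, so it may be worth wrapping the loop body in error handling suited to your data.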

RTF Specification Links

If you find a working official Microsoft link to the RTF specification and add it here, you'll be remembered fondly.
