eml2pdf

Convert .eml (email) files to PDF using Python, making them easier to archive, share, and view without requiring an email client.

eml2pdf depends on GNOME's Pango (used by weasyprint) and various Python libraries, but NOT on a full browser rendering engine such as WebKit or Gecko. Tools built on WebKit, such as python-pdfkit and wkhtmltopdf, are deprecated.

eml2pdf should run on macOS and on Linux distributions that provide Pango and Python. On Windows, the Pango dependency is currently a challenge.

Features

  • Converts HTML or plain-text message bodies.
  • Tries to filter out potential security and privacy issues.
  • Preserves formatting, character encodings, and embedded images.
  • Generates a header section with email metadata: From, To, Subject, and Date.
  • Generates a list of attachments with their size and MD5 sum. (Attachments are not embedded in the PDF.)
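The attachment list can be approximated with Python's standard email package. The sketch below is not eml2pdf's code: list_attachments is a hypothetical helper that returns each attachment's filename, decoded size in bytes, and MD5 hex digest. It relies on EmailMessage.iter_attachments(), which skips the main text/plain and text/html body parts.

```python
import email
import hashlib
from email import policy


def list_attachments(raw: bytes) -> list[tuple[str, int, str]]:
    """Return (filename, size in bytes, MD5 hex digest) for each attachment."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    listing = []
    # iter_attachments() skips the main text/plain and text/html body parts.
    for part in msg.iter_attachments():
        payload = part.get_payload(decode=True) or b""  # undo base64/quoted-printable
        name = part.get_filename() or "unnamed"
        listing.append((name, len(payload), hashlib.md5(payload).hexdigest()))
    return listing
```

Feed it the raw bytes of an .eml file, for example Path("mail.eml").read_bytes().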

Dependencies

Installation

On a desktop system, Pango is most likely already installed. In that case you can install eml2pdf from PyPI using pip:

pip install eml2pdf

If weasyprint can't find Pango, consult weasyprint's installation documentation and install weasyprint using your system's package manager.
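A quick way to tell whether weasyprint can load Pango is to try importing it. The sketch below assumes, based on current weasyprint releases, that a missing Pango or GObject library surfaces as an OSError at import time; weasyprint_available is a hypothetical helper, not part of eml2pdf.

```python
import importlib


def weasyprint_available() -> bool:
    """Return True if weasyprint can be imported, which requires loadable Pango libraries."""
    try:
        importlib.import_module("weasyprint")
    except ImportError:
        return False  # weasyprint itself is not installed
    except OSError:
        # Assumption: weasyprint is installed but failed to load the Pango/GObject
        # shared libraries, which it reports as an OSError at import time.
        return False
    return True


if __name__ == "__main__":
    print("weasyprint OK" if weasyprint_available() else "weasyprint/Pango not usable")
```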

Users of Arch Linux or derived distributions such as Manjaro can use the AUR package eml2pdf.

Check INSTALL.md on GitHub or the installation section of the documentation for more detailed installation instructions.

Usage

eml2pdf has two modes of operation, selected by the subcommands convert_dir and convert_file. Both share options to set the page size, control HTML sanitization, write debug HTML output, and make output quieter or more verbose. See the usage help for each subcommand below.

usage: eml2pdf [-h] {convert_dir,convert_file} ...

Convert EML files to PDF

options:
  -h, --help            show this help message and exit

supported subcommands::
  {convert_dir,convert_file}
                        Use {subcommand} --help for options.
    convert_dir         Convert all EML files in an input dir to PDF files in an output dir.
    convert_file        Convert a single EML file to a single PDF
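Two subcommands sharing one set of options map naturally onto argparse subparsers with a parent parser. The sketch below reconstructs the interface from the help output; it is not eml2pdf's actual source, and build_parser is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Options shared by both subcommands live on a parent parser without its own -h.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("-p", "--page", metavar="size", default="a4")
    common.add_argument("--unsafe", action="store_true")
    common.add_argument("-d", "--debug_html", action="store_true")
    common.add_argument("-v", "--verbose", action="store_true")
    common.add_argument("-q", "--quiet", action="store_true")

    parser = argparse.ArgumentParser(prog="eml2pdf", description="Convert EML files to PDF")
    sub = parser.add_subparsers(dest="command", required=True)

    conv_dir = sub.add_parser("convert_dir", parents=[common])
    # None means "guess from the number of available CPUs".
    conv_dir.add_argument("-n", "--number-of-procs", metavar="number", type=int)
    conv_dir.add_argument("input_dir")
    conv_dir.add_argument("output_dir")

    conv_file = sub.add_parser("convert_file", parents=[common])
    conv_file.add_argument("input_file")
    conv_file.add_argument("output_file")
    return parser
```

For example, build_parser().parse_args(["convert_file", "mail.eml", "mail.pdf"]) yields a namespace with command, input_file, output_file, and the shared flags.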

convert_dir

convert_dir converts all .eml files in an input directory and saves the resulting PDF files to an output directory.

The output filenames are formatted as: YYYY-MM-DD_subject[-counter].pdf, where:

  • The date prefix is taken from the email's sent date.
  • The subject is taken from the email headers.
  • If a filename would otherwise be duplicated, a counter is appended.
  • The extension is changed to .pdf.

For example, some_file.eml with subject "My Email" sent on March 15, 2024 will become 2024-03-15_My_Email.pdf.
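A minimal sketch of this naming scheme, assuming that whitespace in the subject becomes an underscore (as in the My_Email example), that other characters unsafe in filenames are dropped, and that the first duplicate gets counter 1. pdf_filename and these rules are illustrative assumptions, not eml2pdf's implementation.

```python
import re
from email.utils import parsedate_to_datetime


def pdf_filename(date_header: str, subject: str, taken: set[str]) -> str:
    """Build 'YYYY-MM-DD_subject[-counter].pdf' and record it in `taken`."""
    date_prefix = parsedate_to_datetime(date_header).strftime("%Y-%m-%d")
    # Assumed rules: whitespace becomes "_", other filename-unsafe characters are dropped.
    safe_subject = re.sub(r"\s+", "_", subject.strip())
    safe_subject = re.sub(r"[^\w.-]", "", safe_subject)
    stem = f"{date_prefix}_{safe_subject}"
    name, counter = f"{stem}.pdf", 1
    while name in taken:  # duplicate: append -1, -2, ... until the name is free
        name = f"{stem}-{counter}.pdf"
        counter += 1
    taken.add(name)
    return name
```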

convert_dir has one option of its own, -n/--number-of-procs, which sets the number of parallel processes. By default, eml2pdf uses the number of logical CPUs that appear to be available to the eml2pdf process. When verbose output is requested, -n is forced to 1; otherwise the debugging output from different subprocesses would be interleaved.
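One way to guess the number of CPUs available to a process in Python is sketched below. default_procs is a hypothetical helper, and eml2pdf may compute its default differently. On Python 3.13 and later, os.process_cpu_count() provides the same information directly.

```python
import os


def default_procs(verbose: bool = False) -> int:
    """Guess a default for -n/--number-of-procs (hypothetical helper)."""
    if verbose:
        # Output from parallel workers would interleave, so use a single process.
        return 1
    try:
        # Linux: only count the CPUs this process may actually run on.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # macOS and Windows lack sched_getaffinity; fall back to all logical CPUs.
        return os.cpu_count() or 1
```

The result would typically be passed on as multiprocessing.Pool(processes=default_procs()).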

$ eml2pdf convert_dir -h
usage: eml2pdf convert_dir [-h] [-p size] [--unsafe] [-d] [-v] [-q] [-n number] input_dir output_dir

positional arguments:
  input_dir             Directory containing EML files
  output_dir            Directory for PDF output

options:
  -h, --help            show this help message and exit
  -p, --page size       One of a3, a4, a5, b4, b5, letter, legal, or ledger, with or without "landscape", for example: "a4 landscape" or a3. Surround with quotes if there is a space in the argument value.
                        Defaults to "a4", implying portrait.
  --unsafe              Don't sanitize HTML from potentially unsafe elements such as remote images, scripts, etc. This may expose sensitive user information.
  -d, --debug_html      Write intermediate html file next to PDF's
  -v, --verbose         Show a lot of verbose debugging info. Forces number of procs to 1.
  -q, --quiet           Show only errors.
  -n, --number-of-procs number
                        Number of parallel processes. Defaults to the number of
                        available logical CPU's to eml_to_pdf.

The example below renders all .eml files in ./emails to A4 landscape PDFs in ./pdfs:

eml2pdf convert_dir -p "a4 landscape" ./emails ./pdfs

convert_file

convert_file converts a single file. It takes the path of the EML file to convert and the path of the PDF file to write.

$ eml2pdf convert_file -h
usage: eml2pdf convert_file [-h] [-p size] [--unsafe] [-d] [-v] [-q] input_file output_file

positional arguments:
  input_file        Input EML file to convert
  output_file       Output PDF file to convert to

options:
  -h, --help        show this help message and exit
  -p, --page size   One of a3, a4, a5, b4, b5, letter, legal, or ledger, with or without "landscape", for example: "a4 landscape" or a3. Surround with quotes if there is a space in the argument value. Defaults
                    to "a4", implying portrait.
  --unsafe          Don't sanitize HTML from potentially unsafe elements such as remote images, scripts, etc. This may expose sensitive user information.
  -d, --debug_html  Write intermediate html file next to PDF's
  -v, --verbose     Show a lot of verbose debugging info. Forces number of procs to 1.
  -q, --quiet       Show only errors.

Shared options between convert_dir and convert_file

Debug HTML

eml2pdf first parses the email headers (date, subject, etc.) and then the message body. If there is an HTML body, eml2pdf cleans it (see Security below) and prepends a summary table to the resulting HTML.

weasyprint then renders this HTML to a PDF.

The --debug_html flag saves this intermediate HTML next to the PDF. Use it to tell whether a problem comes from email parsing in eml2pdf or from PDF rendering in weasyprint.
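The pipeline described here (parse the headers, pick a body, prepend a summary table, render with weasyprint) can be sketched as follows. email_to_html and convert are hypothetical names, the summary table markup is an assumption, and HTML sanitization is left out; weasyprint's HTML(string=...).write_pdf(...) is the only third-party call.

```python
import html
from email import message_from_bytes, policy
from pathlib import Path

HEADER_FIELDS = ("From", "To", "Subject", "Date")


def email_to_html(raw: bytes) -> str:
    """Summary table of header fields followed by the HTML or plain-text body."""
    msg = message_from_bytes(raw, policy=policy.default)
    rows = "".join(
        f"<tr><th>{name}</th><td>{html.escape(str(msg.get(name, '')))}</td></tr>"
        for name in HEADER_FIELDS
    )
    part = msg.get_body(preferencelist=("html", "plain"))
    if part is None:
        body = ""
    elif part.get_content_subtype() == "html":
        body = part.get_content()  # eml2pdf would sanitize this HTML first
    else:
        # Plain text: escape it and keep its line breaks with <pre>.
        body = f"<pre>{html.escape(part.get_content())}</pre>"
    return f"<html><body><table>{rows}</table>{body}</body></html>"


def convert(raw: bytes, pdf_path: Path, debug_html: bool = False) -> None:
    page = email_to_html(raw)
    if debug_html:
        # Keep the intermediate HTML next to the PDF for troubleshooting.
        pdf_path.with_suffix(".html").write_text(page, encoding="utf-8")
    from weasyprint import HTML  # imported here because it needs Pango at runtime

    HTML(string=page).write_pdf(str(pdf_path))
```

With debug_html=True the intermediate HTML lands next to the PDF, similar in spirit to what --debug_html does.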

Page size

Not all emails are well formatted. If an email does not limit the width of elements such as images or tables, part of it may be cut off in the PDF. Try different page sizes and orientations to accommodate wide emails.
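weasyprint supports the CSS Paged Media size property, and every page size accepted by -p/--page is a valid keyword for it. A plausible way to apply the option, assuming eml2pdf does something similar (not confirmed here), is to translate it into an @page rule; page_css is a hypothetical helper.

```python
# Page sizes accepted by -p/--page; all are also CSS Paged Media size keywords.
PAGE_SIZES = {"a3", "a4", "a5", "b4", "b5", "letter", "legal", "ledger"}


def page_css(page: str = "a4") -> str:
    """Turn a -p value such as 'a4 landscape' into a CSS @page rule."""
    words = page.lower().split()
    if not words or words[0] not in PAGE_SIZES:
        raise ValueError(f"unsupported page size: {page!r}")
    if words[1:] not in ([], ["landscape"]):
        raise ValueError(f"only 'landscape' may follow the size: {page!r}")
    # With no orientation keyword, CSS uses portrait orientation.
    return f"@page {{ size: {' '.join(words)}; }}"
```

The resulting rule can be placed in a style element of the HTML before rendering.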

Security

HTML Sanitization

Emails can contain HTML, and that HTML can contain things you don't expect or want.

In the best case your emails contain clean HTML.

Commonly, they contain deliberate tracking of recipients through specially crafted remote URLs for images and other resources. This is common practice in marketing and mass-mailing solutions.

eml2pdf tries to preserve the formatting of your emails and to clean up potentially malicious content, using custom filtering of tags, remote images, remote stylesheets, etc.

We try to clean up, but we can't give a 100% guarantee. If you're very worried, please clean up your emails yourself.

You can use the --unsafe flag if you don't want eml2pdf to sanitize your emails. Check their content before you use this flag!

MD5 sums of attachments

eml2pdf lists attachments with their MD5 sums for your convenience. A matching MD5 sum is a strong indication that a file has not been altered by accident. MD5 is no longer collision resistant, however, so these sums are not usable as proof in a court of law, and they are not intended to be.
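To check a saved attachment against the MD5 sum listed in the PDF, compute the file's digest and compare. This assumes the listed sum covers the decoded attachment bytes as a mail client saves them; md5sum and matches_listing are hypothetical helpers. On Linux the md5sum command gives the same digest, and Python 3.11+ also offers hashlib.file_digest().

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large attachments don't fill memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: Path, listed_md5: str) -> bool:
    # Tolerate stray whitespace and upper-case hex when copying from the PDF.
    return md5sum(path) == listed_md5.strip().lower()
```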

Reporting issues

We've tested eml2pdf with a number of emails containing embedded images, tables, Unicode, or specific encodings. See the tests for example emails.

Please open an issue if converting an email gives unusable results. Describe what you think the message contains and the output you expect. Attach the verbose eml2pdf output for that single EML file, along with the EML file itself. We can't promise a fix, but we'll have a look.

Please clean up any files you attach and remove anything you don't want to share with the world.

Credits

eml2pdf was originally forked from klokie/eml-to-pdf by Daniel Grossfeld.

Contributors

  • Inline non-image attachments - omusale
  • convert_file mode and 8bit CTE with UTF-8 encoding - bastidest

If you want to work on eml2pdf, read DEVELOPMENT.md on GitHub or the development section of the documentation. PRs welcome ;-).

License

eml2pdf code is published under the MIT license.

Licenses for dependencies:

  • weasyprint: BSD-3
  • python-markdown: BSD-3
  • hurry.filesize: ZPL 2.1
  • beautifulsoup4: MIT
  • Pango: LGPL

Download files

Source Distribution

eml2pdf-2.0.tar.gz (217.4 kB)

Uploaded Source

Built Distribution

eml2pdf-2.0-py3-none-any.whl (19.2 kB)

Uploaded Python 3

File details

Details for the file eml2pdf-2.0.tar.gz.

File metadata

  • Download URL: eml2pdf-2.0.tar.gz
  • Size: 217.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for eml2pdf-2.0.tar.gz

Algorithm     Hash digest
SHA256        eff4091f5af555523fc47448d8b0fbde3ad43edd25201bac9361363d63920531
MD5           e482883bf55da35a9c64546f43adb102
BLAKE2b-256   24b93cbf95b6ab7482fcf54a24db037cdca4960beb8e3c58b783a4a985f818c4

File details

Details for the file eml2pdf-2.0-py3-none-any.whl.

File metadata

  • Download URL: eml2pdf-2.0-py3-none-any.whl
  • Size: 19.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for eml2pdf-2.0-py3-none-any.whl

Algorithm     Hash digest
SHA256        05330eda8ed6cc6e0c3389114a863faeeddbe04c5a6ecc4473df87b66eeb6b7e
MD5           89e36a2e96d56ce8f7fd4062294a9587
BLAKE2b-256   048faf597a05fd8931537e238e8fa8b2de2dda71bbab17e5deb1707ee0a93e01