xmpdf

Extracts email metadata and text body from a PDF containing emails.

Installation

pip install xmpdf

Usage

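from xmpdf import Xmpdf

# path to a PDF containing emails (placeholder; use your own file)
pdf_file = "emails.pdf"

ems = Xmpdf(pdf_file)
# print summary info about the emails found in the PDF file
print(ems.info())
# process each extracted email
for m in ems.emails:
    print(m)  # replace with your own processing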

OS Dependencies

If installing xmpdf fails, check the OS-level dependencies of the pdftotext package, which xmpdf uses, and make sure the libraries it requires (notably Poppler) are installed.
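
As a quick diagnostic, the sketch below checks whether the relevant Python modules can be located by the import system. It is not part of xmpdf; the helper name dependency_available is made up for illustration, and it only reports whether a module is installed, not whether the underlying OS libraries are healthy.

```python
import importlib.util


def dependency_available(module_name):
    """Return True if `module_name` can be located by the import system."""
    return importlib.util.find_spec(module_name) is not None


# "pdftotext" and "xmpdf" are the import names of the two packages.
for name in ("pdftotext", "xmpdf"):
    status = "found" if dependency_available(name) else "missing"
    print(f"{name}: {status}")
```

If pdftotext is reported as missing, installing its OS-level libraries and re-running pip install xmpdf is the usual next step.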

Notes

  • Assumes an email ends when a new email begins
  • Works best with a standard email header (i.e., From:, To:, Sent:, Subject:)
  • The initial development of this package was funded in part by The Mellon Foundation’s “Email Archives: Building Capacity and Community” program.
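
To make the first two notes concrete, here is a minimal sketch of the general idea: treat each line starting with From: as the beginning of a new email, and read the leading From:, To:, Sent:, and Subject: lines as headers. This is an illustration under those assumptions only, not xmpdf's actual implementation; the function names split_emails and parse_email and the sample text are hypothetical.

```python
import re

# A new email is assumed to start at any line beginning with "From:".
EMAIL_START = re.compile(r"^From:", re.MULTILINE)
# Header fields named in the notes above.
FIELD = re.compile(r"^(From|To|Sent|Subject):[ \t]*(.*)$")


def split_emails(text):
    """Split text into chunks; each chunk runs until the next 'From:' line."""
    starts = [m.start() for m in EMAIL_START.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


def parse_email(chunk):
    """Read the leading header lines of a chunk; the rest is the body."""
    headers = {}
    lines = chunk.splitlines()
    i = 0
    while i < len(lines):
        m = FIELD.match(lines[i])
        if not m:
            break
        headers[m.group(1).lower()] = m.group(2).strip()
        i += 1
    body = "\n".join(lines[i:]).strip()
    return {"headers": headers, "body": body}


# Hypothetical text, as it might look after extraction from a PDF.
sample = """From: alice@example.com
To: bob@example.com
Sent: Monday, January 3, 2022 9:00 AM
Subject: Hello

Hi Bob.
From: bob@example.com
To: alice@example.com
Sent: Monday, January 3, 2022 9:30 AM
Subject: Re: Hello

Hi Alice.
"""

emails = [parse_email(c) for c in split_emails(sample)]
for e in emails:
    print(e["headers"]["subject"], "|", e["body"])
```

The sketch also shows why the assumptions matter: a body line that happens to start with From: would be mistaken for the start of a new email, and a non-standard header layout would leave fields unparsed.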
