Extracts email metadata and text from a PDF file

Project description

xmpdf

Extracts email metadata and message body text from a PDF file that contains emails.

Installation

pip install xmpdf

Usage

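from xmpdf import Xmpdf

def process(m):
    # placeholder: replace with your own handling of each email
    print(m)

# "emails.pdf" is an example path. The file is opened in binary mode on the
# assumption that Xmpdf, like pdftotext, expects a binary file object.
with open("emails.pdf", "rb") as pdf_file:
    ems = Xmpdf(pdf_file)
    # print summary info about emails in PDF file
    print(ems.info())
    # process emails
    for m in ems.emails:
        process(m)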

OS Dependencies

If you encounter errors installing xmpdf, check that the OS-level dependencies of the pdftotext package are installed. xmpdf uses pdftotext to extract text, and pdftotext builds against the poppler C++ library, so its development headers must be present on your system.
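Because pdftotext is a compiled extension that links against poppler, a missing system library usually shows up either as a failed pip install or as an ImportError when pdftotext is loaded. A minimal diagnostic sketch using only the standard library (the helper names installed_version and can_import are hypothetical and not part of xmpdf):

```python
import importlib
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def can_import(module: str) -> bool:
    """Return True if the module imports cleanly.

    For a compiled extension such as pdftotext, importing also loads the
    shared libraries it links against, so a missing poppler library fails here.
    """
    try:
        importlib.import_module(module)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    for dist in ("xmpdf", "pdftotext"):
        print(f"{dist}: {installed_version(dist) or 'not installed'}")
    print(f"pdftotext importable: {can_import('pdftotext')}")
```

If pdftotext reports a version but is not importable, the Python package was installed but the system library it needs cannot be loaded.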

Notes

  • Assumes each email continues until the next email begins
  • Works best when each email starts with a standard header (From:, To:, Sent:, Subject:)
  • The initial development of this package was funded in part by The Mellon Foundation’s “Email Archives: Building Capacity and Community” program.
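The first two notes describe a segmentation rule: an email runs until the next header block starts, and the header fields are read from that block. The sketch below illustrates the idea on text that has already been extracted from a PDF. It is not xmpdf's implementation; the function names split_emails and parse_email, and the choice of a line beginning with "From:" as the start-of-email marker, are assumptions for illustration.

```python
import re
from typing import Dict, List

# Assumption for this sketch: a line beginning with "From:" starts a new email.
EMAIL_START = re.compile(r"^From:", re.MULTILINE)
# The standard header fields named in the notes above.
HEADER_LINE = re.compile(r"(From|To|Sent|Subject):\s*(.*)")


def split_emails(text: str) -> List[str]:
    """Split text into emails; each runs until the next one begins.

    Any text before the first "From:" line (e.g. a cover page) is dropped.
    """
    starts = [m.start() for m in EMAIL_START.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


def parse_email(chunk: str) -> Dict[str, str]:
    """Read consecutive header lines at the top; the rest is the body."""
    lines = chunk.splitlines()
    fields: Dict[str, str] = {}
    i = 0
    while i < len(lines):
        m = HEADER_LINE.match(lines[i])
        if not m:
            break  # first non-header line marks the start of the body
        fields[m.group(1)] = m.group(2).strip()
        i += 1
    fields["body"] = "\n".join(lines[i:]).strip()
    return fields
```

This also shows why a standard header matters: an email whose header is missing or uses other labels is not recognized as a new email and is merged into the previous one.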

Download files

Download the file for your platform.

Source Distribution

xmpdf-0.7.0.tar.gz (6.1 kB)

Built Distribution

xmpdf-0.7.0-py3-none-any.whl (6.4 kB)

File details

Details for the file xmpdf-0.7.0.tar.gz.

File metadata

  • Download URL: xmpdf-0.7.0.tar.gz
  • Size: 6.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9

File hashes

Hashes for xmpdf-0.7.0.tar.gz
Algorithm     Hash digest
SHA256        f74e9549f1627e8ffa340c7723270f038254b90f1a8bc2b47aecb42526e9d677
MD5           259798b6bacc0d3ae4ef6eeed78b6799
BLAKE2b-256   06c83dc9b5f3b60711770222d5c9fc4ccef68678868b3da6e5d7924b9a9477ed


File details

Details for the file xmpdf-0.7.0-py3-none-any.whl.

File metadata

  • Download URL: xmpdf-0.7.0-py3-none-any.whl
  • Size: 6.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9

File hashes

Hashes for xmpdf-0.7.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        dd6c183ccc17cd5e67f39f92527e1fbfa7ff05315509d3cfb4465083b5bdffd4
MD5           7d734474391da9da57d4c57c62b7f94b
BLAKE2b-256   85f7e801f53fcddf6b02da2ba4e96f070a9127b4434a614f6c84060c3bd50413
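To confirm that a downloaded file matches the SHA256 digests listed above, hash it locally and compare. A minimal sketch using Python's hashlib; the helper names sha256_of and matches_published are illustrative, and the digests are copied from the tables above.

```python
import hashlib
from pathlib import Path

# SHA256 digests published for xmpdf 0.7.0 (copied from the tables above).
PUBLISHED_SHA256 = {
    "xmpdf-0.7.0.tar.gz": "f74e9549f1627e8ffa340c7723270f038254b90f1a8bc2b47aecb42526e9d677",
    "xmpdf-0.7.0-py3-none-any.whl": "dd6c183ccc17cd5e67f39f92527e1fbfa7ff05315509d3cfb4465083b5bdffd4",
}


def sha256_of(path) -> str:
    """Hash a file in 64 KiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path) -> bool:
    """True only if the file name is known and its digest matches."""
    expected = PUBLISHED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

pip can perform the same check at install time in hash-checking mode: list `xmpdf==0.7.0 --hash=sha256:<digest>` in a requirements file and install it with `pip install --require-hashes -r requirements.txt`.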

