Extracts email metadata and text from a PDF file

Project description

pdf2mbox

A command-line utility and Python package for converting PDF emails to MBOX format.

Installation

pip install pdf2mbox

Usage

# from the command line
% python -m pdf2mbox --help
usage: pdf2mbox.py [-h] [--version] [--overwrite] [--csv [CSV]]
                   pdf_file [mbox_file]

Generates an mbox from a PDF containing emails

positional arguments:
  pdf_file         PDF file provided as input
  mbox_file        Mbox file generated as output

optional arguments:
  -h, --help       show this help message and exit
  --version, -v    show program's version number and exit
  --overwrite, -o  overwrite MBOX file if it exists
  --csv [CSV]      generate CSV file output

# from within python
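from pdf2mbox import pdf2mbox
pdf_file = "emails.pdf"    # example input path (placeholder)
mbox_file = "emails.mbox"  # example output path (placeholder)
pe = pdf2mbox(pdf_file, mbox_file)  # pe holds a dict of the extracted emails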
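
Since the output is an mbox file, it should be readable with Python's standard-library mailbox module. The following is a minimal sketch of that idea, not part of pdf2mbox itself: the helper name summarize_mbox is hypothetical, and it assumes the generated file is a conventional mbox that the mailbox module can parse.

```python
import mailbox


def summarize_mbox(mbox_path):
    """Return a list of (sender, subject) pairs, one per message in the mbox.

    summarize_mbox is an illustrative helper, not a pdf2mbox function.
    """
    # create=False makes a missing file an error instead of creating an empty mbox.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [(msg["From"], msg["Subject"]) for msg in box]
    finally:
        box.close()
```

Calling summarize_mbox on the path passed as mbox_file would then give a quick check that the expected number of messages, senders, and subjects were extracted.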

Notes

  • The initial development of this package was funded in part by The Mellon Foundation’s “Email Archives: Building Capacity and Community” program.


Download files

Source Distribution

pdf2mbox-0.3.2.tar.gz (3.7 kB)

Uploaded Source

File details

Details for the file pdf2mbox-0.3.2.tar.gz.

File metadata

  • Download URL: pdf2mbox-0.3.2.tar.gz
  • Size: 3.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9

File hashes

Hashes for pdf2mbox-0.3.2.tar.gz

Algorithm     Hash digest
SHA256        f8d754891f41a7e891fe6cf12e74fbef049fb6d9b752865363b5f17b088ddfae
MD5           279edaad96e75042f77f97f1e9633b9c
BLAKE2b-256   7dc22ae4c49a98516452141d8500d651766a711ba21de84e67a059b4e4eb4878
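
A downloaded archive can be checked against the SHA256 digest listed above using Python's hashlib. This is a minimal sketch: the function names sha256_of and matches_published_hash are illustrative, and the local path to the downloaded file is left to the caller.

```python
import hashlib

# SHA256 digest listed for pdf2mbox-0.3.2.tar.gz.
EXPECTED_SHA256 = "f8d754891f41a7e891fe6cf12e74fbef049fb6d9b752865363b5f17b088ddfae"


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals the expected value."""
    return sha256_of(path) == expected.lower()
```

If matches_published_hash returns False for the downloaded pdf2mbox-0.3.2.tar.gz, the file differs from the one these digests describe and should not be installed.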
