Extracts email metadata and text from a PDF file

pdf2mbox

A command-line utility and Python package for converting PDF emails to MBOX format.

Installation

pip install pdf2mbox

Usage

# from the command line
% python -m pdf2mbox --help
usage: pdf2mbox.py [-h] [--version] [--overwrite] [--csv [CSV]]
                   pdf_file [mbox_file]

Generates an mbox from a PDF containing emails

positional arguments:
  pdf_file         PDF file provided as input
  mbox_file        Mbox file generated as output

optional arguments:
  -h, --help       show this help message and exit
  --version, -v    show program's version number and exit
  --overwrite, -o  overwrite MBOX file if it exists
  --csv [CSV]      generate CSV file output
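
For batch jobs it can be convenient to drive the command line from a script. The following is a minimal sketch, not part of pdf2mbox itself: `build_command` and `convert` are hypothetical helper names, and the only things taken from the help text above are the `python -m pdf2mbox` invocation, the two positional arguments, and the `--overwrite` and `--csv` flags.

```python
import subprocess
import sys
from typing import List, Optional, Union


def build_command(pdf_file: str,
                  mbox_file: Optional[str] = None,
                  overwrite: bool = False,
                  csv: Union[bool, str] = False) -> List[str]:
    """Build the argument list for `python -m pdf2mbox` (hypothetical helper)."""
    # Positional arguments go first so that a bare --csv at the end
    # cannot accidentally consume the PDF path as its optional value.
    cmd = [sys.executable, "-m", "pdf2mbox", pdf_file]
    if mbox_file is not None:
        cmd.append(mbox_file)
    if overwrite:
        cmd.append("--overwrite")
    if csv:
        cmd.append("--csv")
        if isinstance(csv, str):
            # A string is passed through as the optional CSV value.
            cmd.append(csv)
    return cmd


def convert(pdf_file: str, mbox_file: Optional[str] = None, **options) -> None:
    """Run the converter; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_command(pdf_file, mbox_file, **options), check=True)
```

Calling `convert("emails.pdf", "emails.mbox", overwrite=True)` would then be equivalent to running `python -m pdf2mbox emails.pdf emails.mbox --overwrite`; the file names are placeholders.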

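# from within Python
from pdf2mbox import pdf2mbox
pdf_file = "emails.pdf"    # placeholder: path to the input PDF
mbox_file = "emails.mbox"  # placeholder: path for the generated mbox
pe = pdf2mbox(pdf_file, mbox_file)  # pe contains a dict of emails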
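
Once an mbox file has been generated, it can be inspected with Python's standard `mailbox` module. This is a minimal sketch that assumes the output is a regular mbox file; `summarize_mbox` is a hypothetical helper, not part of pdf2mbox.

```python
import mailbox
from typing import List, Tuple


def summarize_mbox(mbox_path: str) -> List[Tuple[str, str]]:
    """Return (From, Subject) pairs for each message in an mbox file."""
    # create=False makes a missing file an error instead of creating an empty mbox.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return [(msg.get("From", ""), msg.get("Subject", "")) for msg in box]
    finally:
        box.close()
```

For example, `summarize_mbox("emails.mbox")` lists the sender and subject of every converted message, which is a quick way to check that the conversion picked up the headers you expected.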

Notes

  • The initial development of this package was funded in part by The Mellon Foundation’s “Email Archives: Building Capacity and Community” program.

Download files

Source Distribution

pdf2mbox-0.3.3.tar.gz (3.7 kB view details)

Uploaded Source

Built Distribution

pdf2mbox-0.3.3-py3-none-any.whl (4.0 kB view details)

Uploaded Python 3

File details

Details for the file pdf2mbox-0.3.3.tar.gz.

File metadata

  • Download URL: pdf2mbox-0.3.3.tar.gz
  • Upload date:
  • Size: 3.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9

File hashes

Hashes for pdf2mbox-0.3.3.tar.gz

Algorithm    Hash digest
SHA256       0f916980c8133949270f49d89b527df687d9ca64359cc4eb774c6dbb03a8c405
MD5          4dbd6c99a59284beb7828a332bdbc07e
BLAKE2b-256  2f7a84bd8d1f1685bbad8bbb13f1251759c363223a2304535dbc1f757c07834f

File details

Details for the file pdf2mbox-0.3.3-py3-none-any.whl.

File metadata

  • Download URL: pdf2mbox-0.3.3-py3-none-any.whl
  • Upload date:
  • Size: 4.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9

File hashes

Hashes for pdf2mbox-0.3.3-py3-none-any.whl

Algorithm    Hash digest
SHA256       12ecd0aad6fba17453dd40e5a405925eba0d39b002c8a2c8639280cecb693390
MD5          1838505d05d57b99ac7b6e0ee70f1603
BLAKE2b-256  4b9ea2184e4098970f34977df3a22136ac731746df6e5b71acd4fa961b00c5f6
